College of Computer Science and Software Engineering, SZU

Taming System Dynamics on Resource Optimization for Data Processing Workflows: A Probabilistic Approach

IEEE Transactions on Parallel and Distributed Systems (TPDS)

Amelie Chi Zhou¹    Weilin Xue¹    Yao Xiao¹    Bingsheng He²    Shadi Ibrahim³    Reynold Cheng⁴

¹Shenzhen University    ²National University of Singapore    ³Inria    ⁴University of Hong Kong

Abstract

In many data-intensive applications, workflows are a common model for organizing data processing tasks, and resource provisioning is an important and challenging problem for improving workflow performance. Recently, system variations in clouds and large-scale clusters, such as fluctuations in I/O and network performance and failure events, have been observed to significantly affect workflow performance. Traditional resource provisioning methods, which overlook these variations, can produce suboptimal provisioning results. In this article, we provide a general solution for optimizing workflow performance under system variations. Specifically, we model system dynamics as time-dependent random variables and take their probability distributions as optimization input. Although effective, this solution incurs heavy computational overhead. We therefore propose three pruning techniques that simplify the workflow structure and reduce the overhead of probability evaluation. We implement these techniques in a runtime library that allows users to incorporate efficient probabilistic optimization into existing resource provisioning methods. Experiments show that the probabilistic solutions improve performance by up to 65 percent compared with state-of-the-art static solutions, and that our pruning techniques greatly reduce the overhead of the probabilistic approach.
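
To make the idea of "taking probability distributions as optimization input" more concrete, here is a minimal illustrative sketch; it is not the authors' runtime library or algorithm. It assumes task execution times are independent, discrete random variables (ignoring the time dependence the paper models) and that a workflow is composed from sequential steps (sum of times) and parallel branches joined by a barrier (maximum of times). The helper names `seq`, `par`, and `quantile` and the example distributions are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict

# Hypothetical representation: a discrete distribution maps a value
# (e.g., execution time in seconds) to its probability.
Dist = Dict[int, float]


def seq(a: Dist, b: Dist) -> Dist:
    """Distribution of A + B for independent A, B (tasks run back to back)."""
    out: Dict[int, float] = defaultdict(float)
    for (x, px), (y, py) in product(a.items(), b.items()):
        out[x + y] += px * py
    return dict(out)


def par(a: Dist, b: Dist) -> Dist:
    """Distribution of max(A, B) for independent A, B (branches joined by a barrier)."""
    out: Dict[int, float] = defaultdict(float)
    for (x, px), (y, py) in product(a.items(), b.items()):
        out[max(x, y)] += px * py
    return dict(out)


def quantile(d: Dist, q: float) -> int:
    """Smallest value v with P(X <= v) >= q."""
    acc = 0.0
    for v in sorted(d):
        acc += d[v]
        if acc >= q - 1e-12:  # small tolerance for floating-point rounding
            return v
    return max(d)


if __name__ == "__main__":
    # Hypothetical workflow: task A, then tasks B and C in parallel.
    a = {10: 0.5, 20: 0.5}  # e.g., a task sensitive to variable I/O speed
    b = {15: 1.0}           # a task with stable execution time
    c = {5: 0.8, 30: 0.2}   # e.g., a network transfer that is occasionally slow
    makespan = seq(a, par(b, c))
    print(sorted(makespan.items()))
    print("mean:", sum(v * p for v, p in makespan.items()))
    print("90th percentile:", quantile(makespan, 0.9))
```

A static method would plug in a single expected time per task, whereas this sketch keeps the whole distribution, so an optimizer could target, for instance, the 90th-percentile makespan instead of the mean. Each composition multiplies the number of (value, probability) pairs to combine, which illustrates why probability evaluation becomes expensive on large workflows; the pruning techniques described above address this overhead, but the sketch does not implement them.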

Fig. 1. (a) Spatial and (b) temporal features of the I/O and network performance distributions of Windows Azure instances.

Fig. 2. Failure dynamics in Google trace: (a) failure interval distributions of four types of tasks; (b) relationship between task execution time and MTBF.

Fig. 4. An example of pre-processing for the Montage workflow.

Fig. 8. Normalized results of budget-constrained scheduling under tight and loose budgets on Amazon EC2.

Fig. 10. Normalized results of budget-constrained scheduling under tight and loose budgets on Windows Azure.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61802260, in part by the Shenzhen Science and Technology Foundation under Grant JCYJ20180305125737520, and in part by the Natural Science Foundation of SZU under Grant 000370. The work of Bingsheng He was supported in part by a collaborative grant from Microsoft Research Asia. The work of Shadi Ibrahim was supported by the ANR KerStream Project under Grant ANR-16-CE25-0014-01. The work of Reynold Cheng was supported in part by the Research Grants Council of HK through RGC Projects HKU under Grants 17229116, 106150091, and 17205115, by the University of HK under Grants 104004572, 102009508, and 104004129, and in part by the Innovation and Technology Commission of HK through ITF Project MRP/029/18.

BibTeX

@ARTICLE{9462122,
  author={Zhou, Amelie Chi and Xue, Weilin and Xiao, Yao and He, Bingsheng and Ibrahim, Shadi and Cheng, Reynold},
  journal={IEEE Transactions on Parallel and Distributed Systems},
  title={Taming System Dynamics on Resource Optimization for Data Processing Workflows: A Probabilistic Approach},
  year={2022},
  volume={33},
  number={1},
  pages={231--248},
}