Publication date:
21/06/2017
Source: WIPO "hive"
Systems and methods are provided for generating a performance prediction model and for estimating the execution time of applications. The system executes synthetic benchmarks on a first dataset on a first cluster, and each synthetic benchmark includes a MapReduce (MR) job. The system then extracts sensitive parameters for each sub-phase of the MR job and generates a linear regression prediction model for each sub-phase, yielding one or more linear regression prediction models. From these models, the system generates a performance prediction model that uses the sensitive parameters to predict the execution time of a Hive query, represented as a directed acyclic graph (DAG) of one or more MR jobs executed on a second dataset on a second cluster. The first cluster, which holds the first dataset, is smaller than the second cluster, which holds the second dataset.
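The abstract does not specify the sub-phases, the sensitive parameters, or how per-sub-phase predictions are combined. The following is a minimal sketch under stated assumptions, not the patented method. The sub-phase names in `SUB_PHASES` are hypothetical, the only "sensitive parameter" used is a hypothetical `data_per_task_mb` feature, sub-phase times within a job are assumed to add up, and jobs in the Hive query DAG that do not depend on each other are assumed to run concurrently, so the query time is taken as the longest path through the DAG. Each sub-phase gets its own simple ordinary-least-squares model fitted on benchmark runs from the small cluster.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical MR sub-phase names; the abstract does not list them.
SUB_PHASES = ("read", "map", "spill", "shuffle", "reduce", "write")


@dataclass
class LinearModel:
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.intercept + self.slope * x


def fit_linear(xs, ys) -> LinearModel:
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two benchmark runs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the feature must vary across benchmark runs")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearModel(slope, mean_y - slope * mean_x)


def train_phase_models(benchmarks):
    """Fit one linear model per sub-phase from synthetic benchmark runs.

    Each benchmark is a dict with a hypothetical feature "data_per_task_mb"
    and measured "phase_times" (seconds) keyed by sub-phase name.
    """
    xs = [b["data_per_task_mb"] for b in benchmarks]
    return {
        phase: fit_linear(xs, [b["phase_times"][phase] for b in benchmarks])
        for phase in SUB_PHASES
    }


def predict_job_time(models, data_per_task_mb):
    # Assumption: sub-phases run back to back, so job time is their sum.
    return sum(m.predict(data_per_task_mb) for m in models.values())


def predict_query_time(models, jobs, deps):
    """Predict Hive query time for a DAG of MR jobs.

    jobs: {job_id: data_per_task_mb}; deps: {job_id: set of upstream job_ids}.
    Assumption: a job starts when all its upstream jobs finish, so the
    query time is the latest finish time (longest path through the DAG).
    """
    graph = {job: deps.get(job, set()) for job in jobs}
    finish = {}
    for job in TopologicalSorter(graph).static_order():
        start = max((finish[d] for d in graph[job]), default=0.0)
        finish[job] = start + predict_job_time(models, jobs[job])
    return max(finish.values())
```

Fitting on a per-task feature rather than on total dataset size is one way to let models trained on a small cluster extrapolate to a larger one, since the per-task load can stay in a comparable range as the cluster grows. Whether the patented system uses this kind of normalization is not stated in the abstract.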