WORKLOAD GENERATION FOR OPTIMAL STRESS TESTING OF BIG DATA MANAGEMENT SYSTEMS

Fecha de publicación: 19/10/2023
Fuente: Wipo "BigData"
A computer-implemented method, system and computer program product for optimally performing stress testing against big data management systems. A set of random test queries is generated and compiled to determine the data points of the features (e.g., table type being queried) of the set of random test queries. A distance (e.g., Mahalanobis distance) is then measured between the data points of the features and the mean of a distribution of data points corresponding to each same feature of an extracted feature set. Each random test query whose distance exceeds a threshold distance is then ranked. The ranked random test queries are then executed in order of rank. Those executed random test queries which resulted in an error (e.g., system failure) are added to a log, which is used to identify those queries to perform a stress test against the big data management system.