Big Data Performance Enhancement Using Machine Learning Spark-ML Pipeline Auto Parameter Tuning

Publication date: 01/01/2021
Source: WIPO "BigData"
Big data is not only huge and complex but also highly varied, which makes it difficult to analyze and process efficiently with traditional systems. Several frameworks have emerged recently to analyze and process big data efficiently, including Hadoop, Spark, and Flink. Languages and tools used for big data processing include Java, Scala, and Pig, together with Hive and NoSQL databases such as MongoDB and HBase. Spark itself is written in Scala, a language that removes much of the boilerplate code Java requires. PySpark is Spark's Python API, which lets developers process big data efficiently with Spark from Python. SparkR similarly runs Spark from the R language. Here, Spark MLlib (Spark ML) pipelines combined with MLflow and automatic parameter tuning are used to enhance the processing performance of big data.