Apache Spark is an execution engine that broadens the range of computing workloads Hadoop can handle, while also improving the performance of the big data framework.
Hadoop specialist Cloudera recently announced that it will offer commercial support for Apache Spark, which is available as part of Cloudera’s Hadoop-powered Enterprise Data Hub. But why should businesses care about Spark?
Apache Spark has numerous advantages over Hadoop’s MapReduce execution engine, in both the speed with which it carries out batch processing jobs and the wider range of computing workloads it can handle.
Spark can execute batch-processing jobs between 10 and 100 times faster than the MapReduce engine, according to Cloudera, primarily by reducing the number of reads from and writes to disc.
“You have map and reduce tasks and after that there’s a synchronisation barrier and you persist all of the data to disc,” said Mark Grover, Hadoop engineer for Cloudera.
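To make the disc barrier Grover describes more concrete, here is a toy Python sketch. It is not Spark or Hadoop code and uses neither project's API: the function names `run_with_disk_barrier` and `run_in_memory` are hypothetical, and the three arithmetic stages are stand-ins for real map and reduce work. It only illustrates the general idea that persisting every intermediate result to disc adds a write and a read per stage, while keeping intermediate data in memory avoids them.

```python
import json
import os
import tempfile


def run_with_disk_barrier(records, stages):
    """Run each stage, writing its full output to disc and reading it
    back before the next stage starts (a stand-in for a persist barrier)."""
    disk_round_trips = 0
    with tempfile.TemporaryDirectory() as workdir:
        for i, stage in enumerate(stages):
            records = [stage(r) for r in records]
            path = os.path.join(workdir, f"stage_{i}.json")
            # Persist the intermediate result to disc...
            with open(path, "w") as f:
                json.dump(records, f)
            # ...then read it back as the input to the next stage.
            with open(path) as f:
                records = json.load(f)
            disk_round_trips += 1
    return records, disk_round_trips


def run_in_memory(records, stages):
    """Run the same stages, passing intermediate results along in memory."""
    for stage in stages:
        records = [stage(r) for r in records]
    return records, 0


# Hypothetical three-stage job over a tiny dataset.
stages = [lambda x: x * 2, lambda x: x + 1, lambda x: x * x]
data = list(range(5))

disk_result, disk_trips = run_with_disk_barrier(data, stages)
mem_result, mem_trips = run_in_memory(data, stages)

print(disk_result == mem_result)  # both approaches compute the same answer
print(disk_trips, mem_trips)      # disc round trips taken by each approach
```

Both runs produce the same result; the difference is that the first pays one disc write and one disc read per stage. On a real cluster those round trips involve far larger datasets, which is where the saving Cloudera describes would come from.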