Is Apache Hadoop the only option to implement big data?

Hadoop   |   Published December 24, 2015

No, Hadoop is not the only option for solving big data problems. Hadoop is just one of several available solutions.

The HPCC (High-Performance Computing Cluster) Systems technology is an open source, data-intensive processing and delivery platform developed by LexisNexis Risk Solutions. HPCC Systems implements a big data software architecture on commodity, shared-nothing computing clusters to provide high-performance, data-parallel processing and delivery for applications utilizing Big Data.

The HPCC Systems platform includes system configurations to support both parallel batch data processing (Thor) and high-performance online data delivery using indexed data files (Roxie). It also includes the Enterprise Control Language (ECL), a parallel, data-centric, declarative programming language.

HPCC

The HPCC Systems components in detail:

Thor – Data Refinery Cluster, designed to execute big data workflows including extraction, loading, cleansing, transformation, linking, and indexing.
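
As a rough illustration, a Thor-style ETL workflow in ECL might look like the sketch below; the record layout, the logical file names (~tutorial::raw_people, ~tutorial::clean_people), and the cleansing rules are all invented for this example:

    IMPORT Std;

    // Layout of the hypothetical raw input file.
    RawRec := RECORD
        STRING50 Name;
        STRING50 City;
    END;

    RawPeople := DATASET('~tutorial::raw_people', RawRec, THOR);

    // Cleansing transform: strip leading/trailing spaces and
    // normalise case using the ECL Standard Library.
    RawRec CleanIt(RawRec r) := TRANSFORM
        SELF.Name := Std.Str.ToUpperCase(TRIM(r.Name, LEFT, RIGHT));
        SELF.City := Std.Str.ToUpperCase(TRIM(r.City, LEFT, RIGHT));
    END;

    CleanPeople := PROJECT(RawPeople, CleanIt(LEFT));

    // Sort, then de-duplicate - a typical linking/cleansing step.
    Deduped := DEDUP(SORT(CleanPeople, Name, City), Name, City);

    OUTPUT(Deduped, , '~tutorial::clean_people', OVERWRITE);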

Roxie – Rapid Data Delivery Cluster, a separate high-performance engine for online queries against Big Data. Roxie utilizes highly optimized, distributed B-tree index structures and is designed for highly concurrent use: a typical 10-node cluster can process thousands of concurrent requests and respond in fractions of a second.
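
To sketch how such an index might be produced, the example below declares a payload index over the hypothetical cleaned file from the Thor sketch above and builds it; all names are assumptions:

    CleanRec := RECORD
        STRING50 Name;
        STRING50 City;
    END;

    People := DATASET('~tutorial::clean_people', CleanRec, THOR);

    // Key on Name and carry City as payload, so a Roxie query can
    // answer from the index alone with no extra fetch.
    PeopleIdx := INDEX(People, {Name}, {City}, '~tutorial::people_by_name');

    // BUILD runs on Thor; the finished index file is then deployed
    // to the Roxie cluster for query delivery.
    BUILD(PeopleIdx, OVERWRITE);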

ECL – Enterprise Control Language, a declarative, data-centric, distributed processing language for Big Data. ECL is a collaborative, extensible, high-level language that allows the programmer to describe the desired outcome rather than script each step of how to compute it.

ECL

Declarative: describes the what, not the how (see the sketch after this list).
Focused: higher-level code means fewer programmers and shorter time to delivery.
Extensible: as new attributes are defined, they become primitives that other programmers can use.
Implicitly parallel: parallelism is built into the underlying platform; the programmer does not need to manage it.
Maintainable: designed for long-term, large-scale, enterprise use.
Complete: provides a complete programming paradigm.
Homogeneous: one language expresses data algorithms across the entire HPCC Systems platform, from data ETL to high-speed delivery.
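
A minimal sketch of this declarative style, using an invented inline dataset:

    // Record layout for a small, invented dataset.
    PersonRec := RECORD
        STRING20  FirstName;
        STRING20  LastName;
        UNSIGNED1 Age;
    END;

    People := DATASET([{'Jane', 'Doe', 34},
                       {'John', 'Smith', 17},
                       {'Ann', 'Lee', 52}], PersonRec);

    // Each definition names a result, not a step. Once defined,
    // Adults becomes a primitive other attributes can build on.
    Adults       := People(Age >= 18);
    SortedAdults := SORT(Adults, LastName, FirstName);

    // How the filter and sort are parallelised is left to the platform.
    OUTPUT(SortedAdults);

Note that there is no loop, thread, or partitioning logic anywhere: the platform decides how to distribute the filter and sort across the cluster.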

IDE – The Integrated Development Environment, called the ECL IDE, turns code into graphs that facilitate the understanding of large-scale, complex data analytics.

ESP – Enterprise Services Platform, which provides an easy-to-use interface for accessing ECL queries via XML, HTTP, SOAP (Simple Object Access Protocol), and REST (Representational State Transfer).
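
As a sketch of how a query becomes an ESP service: the STORED workflow service marks a definition as a query parameter, and once the query is published to Roxie, ESP's WsECL service exposes SOAP and REST endpoints for it. The parameter name searchName and the index are assumptions carried over from the earlier sketches:

    CleanRec := RECORD
        STRING50 Name;
        STRING50 City;
    END;

    // Reference the previously built (hypothetical) index.
    People    := DATASET('~tutorial::clean_people', CleanRec, THOR);
    PeopleIdx := INDEX(People, {Name}, {City}, '~tutorial::people_by_name');

    // STORED exposes searchName as an input parameter of the
    // published query.
    STRING50 searchName := '' : STORED('searchName');

    // KEYED resolves the filter through the index's B-tree.
    OUTPUT(PeopleIdx(KEYED(Name = searchName)));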

Data Graphs – Many complex data challenges call for a series of advanced functions. With the HPCC Systems technology, such challenges can be represented naturally as a transformative data graph. The nodes of the data graph can be processed in parallel as distinct data flows. Each section of the graph carries information such as the function performed, the number of records processed, and the data skew, and each node can be drilled into for specific details.
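
As a small illustration, the ECL below creates two independent dataflows that the execution graph renders as parallel branches feeding a single JOIN node; the datasets are invented:

    Rec := RECORD
        UNSIGNED4 Id;
        STRING20  Val;
    END;

    A := DATASET([{1, 'alpha'}, {2, 'beta'}],  Rec);
    B := DATASET([{2, 'two'},   {3, 'three'}], Rec);

    // Each filter becomes its own graph node and runs as a
    // distinct data flow.
    FA := A(Id > 0);
    FB := B(Id < 10);

    // TRANSFORM(LEFT) keeps the left record for each matched pair.
    Joined := JOIN(FA, FB, LEFT.Id = RIGHT.Id, TRANSFORM(LEFT));

    OUTPUT(Joined);

In the ECL IDE's graph view, each branch can then be drilled into for the per-node details described above, such as record counts and skew.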