Hadoop

Recent Articles
Apache Spark vs Hadoop: Which is the big data winner?

With the evolution of technology, data is everywhere, thanks to the internet, which has connected millions of devices across the globe. Data usage has grown at an unprecedented rate in recent years and is likely to expand...

What are Hadoop alternatives and should you look for one?

Hadoop’s development from a batch-oriented, large-scale analytics tool into an entire ecosystem of applications, tools, services, and vendors has gone hand in hand with the rise of the big data marketplace. It is predominantly used for large-scale data analysis...

The business of transferring data from Salesforce to Hadoop

The sustained success of Hadoop has brought about a radical change in big data management. This highly popular open-source MapReduce technology allows easy access and provides reliable answers to advanced data questions. Data management has been taken to the next...

Why use Hadoop? Top pros and cons of Hadoop

Big Data is one of the major areas of focus in today’s digital world. Vast amounts of data are generated and collected from the various processes a company carries out, and this data can reveal patterns showing how the company can improve those processes...

How to fetch HBase table data in Apache Phoenix?

This exclusive post is shared by big data services providers to help developers. It explains the best way to fetch HBase table data in Apache Phoenix, and discusses what the providers have to say about Big Data related services. The term 'Big...
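
As a quick taste of what the post covers, here is a minimal sketch of fetching rows from an HBase table through Phoenix's standard JDBC driver (the phoenix-client jar must be on the classpath). The ZooKeeper quorum localhost:2181 and the CUSTOMERS table are placeholder assumptions, not details from the article.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixQueryExample {
    public static void main(String[] args) throws Exception {
        // Phoenix exposes HBase tables through JDBC; the connection URL points
        // at the HBase ZooKeeper quorum (placeholder host/port shown here).
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             Statement stmt = conn.createStatement();
             // CUSTOMERS is a hypothetical table used only for illustration.
             ResultSet rs = stmt.executeQuery("SELECT ID, NAME FROM CUSTOMERS LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getLong("ID") + "\t" + rs.getString("NAME"));
            }
        }
    }
}
```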

Top 11 key tuning checklists for Apache Hadoop

Apache Hadoop is a well-known, de facto framework for processing large data sets through distributed and parallel computing. YARN (Yet Another Resource Negotiator) allowed Hadoop to evolve from a simple MapReduce engine into a big data ecosystem that can run...

Eight breakthrough changes in Apache Flink 1.0.0

Apache Flink is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds...
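
To give a feel for that streaming dataflow model, here is a minimal word count against the Flink 1.x Java DataStream API; the socket source on localhost:9999 is an illustrative assumption standing in for any real stream.

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class StreamingWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source: a plain text socket stream.
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        DataStream<Tuple2<String, Integer>> counts = lines
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                        for (String word : line.toLowerCase().split("\\W+")) {
                            if (!word.isEmpty()) {
                                out.collect(new Tuple2<>(word, 1));
                            }
                        }
                    }
                })
                .keyBy(0)   // key by the word (position-based key of the tuple)
                .sum(1);    // running count per word

        counts.print();
        env.execute("Streaming WordCount");
    }
}
```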

What is SMACK (Spark, Mesos, Akka, Cassandra, and Kafka)?

This blog introduces the convergence of complementary technologies in the SMACK stack – Spark, Mesos, Akka, Cassandra, and Kafka. We will see how Apache Kafka can help us get data under control and what its role is in our data pipeline, and how Spark & Akka help us to...
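
For a rough sense of Kafka's place at the front of such a pipeline, here is a minimal sketch of a Java producer publishing one event to a topic. The broker address, topic name, and JSON payload are illustrative assumptions, not details from the post.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Kafka acts as the durable, ordered buffer at the head of the pipeline;
        // downstream consumers (e.g. Spark Streaming jobs) read from the same topic.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "user-42", "{\"action\":\"click\"}"));
        }
    }
}
```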

Top ten pointers in the new Apache Spark release (version 1.6)

In 2016, the Apache Spark community gave us something to be excited about with the launch of Apache Spark 1.6. Committers – the number of contributors to Apache Spark has doubled, to around 1,000. Patches – the Apache Spark 1.6 release covers 1,000 patches. Run SQL queries on files –...
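
The "run SQL queries on files" feature mentioned above lets Spark 1.6 query a file path directly, without registering a temporary table first. Below is a minimal sketch using the 1.6-era Java API; the local master and the Parquet path are placeholder assumptions.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class SqlOnFiles {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SqlOnFiles").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Query the file in place: no CREATE TABLE or registerTempTable needed.
        // The Parquet path is a placeholder for any file reachable by the cluster.
        DataFrame df = sqlContext.sql("SELECT * FROM parquet.`/data/events.parquet`");
        df.show();

        sc.stop();
    }
}
```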

What is the role of RDDs in Apache Spark? – Part 1

This blog introduces Spark’s core abstraction for working with data, the RDD (Resilient Distributed Dataset). An RDD is simply a distributed collection of elements or objects (Java, Scala, or Python objects, including user-defined functions) spread across the Spark cluster. In Spark, all...
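
As a small illustration of that abstraction, here is a sketch using the Spark Java API: transformations such as map are lazy, and only the reduce action triggers the distributed computation. The numbers and local master are arbitrary examples.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddBasics {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("RddBasics").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // An RDD: an immutable collection partitioned across the cluster.
        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

        // map is a lazy transformation; reduce is an action that runs the job.
        int sumOfSquares = numbers
                .map(x -> x * x)
                .reduce(Integer::sum);

        System.out.println("Sum of squares: " + sumOfSquares);
        sc.stop();
    }
}
```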

Is Apache Hadoop the only option to implement big data?

No, Hadoop is not the only option for tackling a big data problem; it is one of several solutions. The HPCC (High-Performance Computing Cluster) Systems technology is an open source, data-intensive processing and delivery platform developed by LexisNexis Risk...

The top 12 Apache Hadoop challenges

Hadoop is a large-scale distributed batch processing infrastructure. While it can be used on a single machine, its true power lies in its ability to scale to hundreds or thousands of computers, each with several processor cores. Hadoop is also designed to efficiently...

Three reasons why business users may want to learn Hadoop

Big data is a popular topic these days, not only in the tech media, but also among mainstream news outlets. Executives see Big Data as providing significant business benefits – greater insight and learning, the ability to obtain answers and make decisions faster and...

Seven common problems of scaling Hadoop

Every Hadoop implementation encounters the occasional crisis, including moments when the folks running Hadoop feel like their hair is on fire. Sometimes it happens before you get to production, which can cause organizations to throw the Hadoop baby out with the...

Hadoop Glossary: 20 most important terms

This is a list of the most important Hadoop terms you need to know and understand before diving into the Hadoop ecosystem. [To read about the top 10 most popular myths about Hadoop, click here.] Most important Hadoop terms: Apache or Apache Software Foundation (ASF): A...

Top 20 essential Hadoop tools for crunching Big Data

Hadoop is an open source distributed processing framework which is at the center of a growing big data ecosystem. Used to support advanced analytics initiatives, including predictive analytics, data mining and machine learning applications, Hadoop manages data...

Top 10 books to get started with Hadoop

These books are our recommendations if you are planning to start your Big Data journey with Hadoop, an open source distributed processing framework which is at the center of a growing big data ecosystem. The books are listed in no specific order. 1. Hadoop: The...

Learn Hadoop with 10 SlideShare presentations

Want to learn Hadoop? Watch these presentations on SlideShare to understand Hadoop HDFS, the MapReduce algorithm, the Pig Latin language, and the Hive SQL language. 1. Introduction to MapReduce, an Abstraction for Large-Scale Computation by Ilan Horn, Google...

Best LinkedIn groups all Hadoop experts should join

There are hundreds of Hadoop groups on LinkedIn, but these are the best ones you should definitely consider joining. Join them to learn about the latest happenings in the world of Hadoop, and engage in discussions with other professionals online. 1. Hadoop Users Group...

Top 10 most popular myths about Hadoop

Hadoop and Big Data are practically synonymous these days. There is so much info on Hadoop and Big Data out there, but as the Big Data hype machine gears up, there's a lot of confusion about where Hadoop actually fits into the overall Big Data landscape. Let’s have a...

Understanding the power of Hadoop as a Service

Across a wide range of industries from health care and financial services to manufacturing and retail, companies are realizing the value of analyzing data with Hadoop. With access to a Hadoop cluster, organizations are able to collect, analyze, and act on data at a...

Hadoop, big data, and the elephant in the room

In the 1800s, John Godfrey Saxe wrote a poem about six blind men and an elephant based on an old Indian story. In an effort to discover what the elephant is, each man touches a different part of the creature and subsequently draws his own unique — and incorrect —...

A Guide to Checkpointing in Hadoop

Checkpointing is an essential part of maintaining and persisting filesystem metadata in HDFS. It’s crucial for efficient NameNode recovery and restart, and is an important indicator of overall cluster health. However, checkpointing can also be a source of confusion...
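
As a small companion to the article, the checkpoint cadence is governed by two HDFS properties, dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns. The sketch below simply reads them with Hadoop's Configuration API, falling back to the stock defaults of 3600 seconds and 1,000,000 transactions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class CheckpointSettings {
    public static void main(String[] args) {
        // HdfsConfiguration loads hdfs-default.xml / hdfs-site.xml from the classpath.
        Configuration conf = new HdfsConfiguration();

        // A checkpoint is triggered when either threshold below is crossed;
        // the fallback values are the stock Hadoop defaults.
        long periodSeconds = conf.getLong("dfs.namenode.checkpoint.period", 3600L);
        long txnThreshold  = conf.getLong("dfs.namenode.checkpoint.txns", 1000000L);

        System.out.println("Checkpoint every " + periodSeconds
                + " seconds or " + txnThreshold + " transactions, whichever comes first");
    }
}
```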

Big data infrastructure goes far beyond Hadoop

Wikibon Principal Research Contributor Jeff Kelly provides an inclusive basic tutorial of the big data environment, including technologies, skill sets, and use cases, in “Big Data: Hadoop, Business Analytics and Beyond”, and while the environment starts with Hadoop...

Top 3 reasons Hadoop is heading to the Cloud

Cloud computing and big data have been vying for the attention of business owners for several years now. Both initiatives are compelling, as big data analytics promises the potential of powerful new business insights, and cloud computing offers greater flexibility,...

Big data: 5 major advantages of Hadoop

By now, you have probably heard of Apache Hadoop - the name is derived from a cute toy elephant, but Hadoop is anything but a soft toy. Hadoop is an open source project that offers a new way to store and process big data. The software framework is written in Java for...