What is Apache HCatalog ?
What is HCatalog ? Apache HCatalog is a Storage Management Layer for Hadoop that helps to users of different data processing tools in Hadoop ecosystem like Hive, Pig and MapReduce easily read and write data...
What is HCatalog ? Apache HCatalog is a Storage Management Layer for Hadoop that helps to users of different data processing tools in Hadoop ecosystem like Hive, Pig and MapReduce easily read and write data...
Problem Statement : Until Java 7, java.util.Hashmap implementations always suffered with the problem of Hash Collision, i.e. when multiple hashCode() values end up in the same bucket, values are placed in a Linked List implementation, which reduces...
Since Hadoop jobs are often long running, its difficult for newbies to manage the processes in Unix unless they know some useful Unix commands to do so, so that they can increase their efficiency. In...
Big Data / Java / Kafka / Technology
Simple String Example for Setting up Camus for Kafka-HDFS Data Pipeline I came across Camus while building a Lambda Architecture framework recently. I couldn’t find a good Illustration of getting started with Kafk-HDFS pipeline ,...
Big Data / Java / Kafka / Technology
Twitter open-sourced its Hosebird client (hbc), a robust Java HTTP library for consuming Twitter’s Streaming API. In this post, I am going to present a demo of how we can use hbc to create a Kafka twitter stream...
Big Data / HBase / Technology
What is Coprocessor in HBase ? Coprocessor is a mechanism which helps to move computations closer to the data in HBase. It is like a Mapreduce framework to distribute tasks across the cluster. You can...