HashMap Performance Improvements in Java 8

Problem Statement :

Until Java 7, java.util.HashMap suffered from poor performance under hash collisions: when the hashCode() values of multiple keys map to the same bucket, their entries are stored in a linked list, which degrades lookups in that bucket from O(1) to O(n) in the worst case.
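
The collision scenario described above can be reproduced with a small sketch. The class names CollisionDemo and BadKey are hypothetical and exist only for this illustration; BadKey returns a constant hashCode() on purpose, so every entry lands in the same bucket. The map still returns correct results, and only the lookup cost within that bucket is affected.

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {

    /** Hypothetical key type whose hashCode() is deliberately constant. */
    static final class BadKey {
        private final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // every key hashes to the same bucket
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    /** Fills a HashMap with n keys that all collide and returns it. */
    static Map<BadKey, Integer> buildCollidingMap(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = buildCollidingMap(1000);
        // All 1000 entries share one bucket, yet lookups remain correct.
        System.out.println(map.size());
        System.out.println(map.get(new BadKey(999)));
    }
}
```

Timing get() calls on such a map for growing n is one simple way to observe how lookup cost in a single crowded bucket scales.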

Solution :

Improve the performance of java.util.HashMap under high hash-collision conditions by using balanced trees rather than linked lists to store … Continue Reading ››

Unix Job Control Commands – bg, fg, Ctrl+Z, jobs

Since Hadoop jobs are often long-running, it is difficult for newcomers to manage the processes in Unix unless they know a few useful commands. In this post, I will explain some commands that are very useful when executing long-running jobs. We will see how to execute a … Continue Reading ››

How-To : Integrate Kafka with HDFS using Camus (Twitter Stream Example)

How-To : Write a Kafka Producer using Twitter Stream (Twitter HBC Client)

Twitter open-sourced its Hosebird Client (hbc), a robust Java HTTP library for consuming Twitter’s Streaming API. In this post, I am going to present a demo of how we can use hbc to create a Kafka Twitter stream producer, which tracks a few terms in Twitter statuses and produces a Kafka stream out of them, which can be … Continue Reading ››

Hive : SORT BY vs ORDER BY vs DISTRIBUTE BY vs CLUSTER BY

In Apache Hive, how SORT BY, ORDER BY, DISTRIBUTE BY and CLUSTER BY differ is a frequent source of confusion. I have compiled a set of differences between them based on attributes such as how the final output looks and how the data in the output is ordered:

SORT BY

Sort By vs Order By … Continue Reading ››

How-to : Write a CoProcessor in HBase

What is a Coprocessor in HBase?

A coprocessor is a mechanism that helps move computation closer to the data in HBase. It is like a MapReduce framework for distributing tasks across the cluster. You can think of coprocessors as being like either Aspects in Java, which intercept code before and after some critical operations, and … Continue Reading ››

BigData and Hadoop Tutorial
