Problem Statement:
Until Java 7, java.util.HashMap suffered badly from hash collisions: when multiple
keys end up in the same bucket, their entries are stored in a linked list, which degrades lookup performance in that bucket from O(1) to O(n).
Improve the performance of java.util.HashMap under high hash-collision conditions by using balanced trees rather than linked lists to store … Continue Reading ››
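The collision scenario described above is easy to reproduce. The sketch below is only an illustration and not code from the post: CollisionDemo and BadKey are hypothetical names, and the constant hashCode is a deliberate worst case that forces every key into a single bucket. BadKey implements Comparable because, on Java 8 and later, a treeified bucket can use compareTo to order keys that share a hash code.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {

    /** Hypothetical key type: every instance reports the same hash code. */
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) { this.id = id; }

        @Override
        public int hashCode() { return 42; } // worst case: all keys collide

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int compareTo(BadKey other) { return Integer.compare(id, other.id); }
    }

    /** Inserts n colliding keys; all of them land in the same bucket. */
    static Map<BadKey, Integer> buildMap(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    /** Looks up every key once and returns the sum of the stored values. */
    static long sumLookups(Map<BadKey, Integer> map, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += map.get(new BadKey(i));
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 20_000;
        Map<BadKey, Integer> map = buildMap(n);
        long start = System.nanoTime();
        long sum = sumLookups(map, n);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sum=" + sum + ", lookups took " + elapsedMs + " ms");
    }
}
```

Running the same class on Java 7 and on Java 8 or later should show the difference in lookup time as n grows, although the exact numbers depend on the machine and JVM.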
Since Hadoop jobs are often long-running, it is difficult for newcomers to manage the processes in Unix unless they know a few useful Unix commands that make them more efficient.
In this post, I will explain some commands that are very useful when executing long-running jobs. We will see how to execute a … Continue Reading ››
Twitter open-sourced its Hosebird Client (hbc), a robust Java HTTP library for consuming Twitter’s Streaming API. In this post, I am going to present a demo of how we can use hbc
to create a Kafka Twitter stream producer, which tracks a few terms in Twitter statuses and produces a Kafka stream out of them, which can be … Continue Reading ››
In Apache Hive, how SORT BY, ORDER BY, DISTRIBUTE BY and CLUSTER BY differ is a common source of confusion. I have compiled a set of differences between them, based on attributes such as how the final output looks and how the data in the output is ordered:
SORT BY Continue Reading ››
What is a Coprocessor in HBase?
A coprocessor is a mechanism that helps move computation closer to the data in HBase. It is like a MapReduce framework for distributing tasks across the cluster.
You can think of coprocessors as being like either Aspects in Java, which intercept code before and after some critical operations and … Continue Reading ››