What is RDD in Spark? And Why Do We Need It?

A developer-friendly guide to understanding Resilient Distributed Datasets (RDDs) in Apache Spark, their properties, and why they are fundamental for fast, fault-tolerant distributed computing.

2 min read

What is Apache HCatalog?

A developer-friendly introduction to Apache HCatalog, its architecture, features, and how it fits into the Hadoop ecosystem.

2 min read

HashMap Performance Improvements in Java 8

A developer-focused look at how Java 8 improved the performance of HashMap under high-collision scenarios, with code examples and practical explanations.

2 min read

Unix Job Control Commands: bg, fg, Ctrl+Z, jobs

A practical guide for developers and data engineers to manage long-running jobs in Unix, especially useful when working with Hadoop or other big data tools.

2 min read

How-To: Write a Kafka Producer using Twitter Stream (Twitter HBC Client)

A step-by-step guide to building a Kafka producer that streams live tweets using Twitter’s Hosebird Client (HBC) and publishes them to a Kafka topic. This is a practical, developer-focused walkthrough with code, configuration, and troubleshooting tips.

3 min read

How-To: Write a Coprocessor in HBase

What is a Coprocessor in HBase?

A coprocessor is a mechanism that moves computation closer to the data in HBase. It resembles a MapReduce framework in that it distributes tasks across the cluster.

2 min read

Hive: SORT BY vs ORDER BY vs DISTRIBUTE BY vs CLUSTER BY

In Apache Hive's HQL, you can order or sort your data in different ways depending on your ordering and distribution requirements. This post looks at how SORT BY, ORDER BY, DISTRIBUTE BY, and CLUSTER BY behave differently in Hive. Let’s get started.

3 min read

How-To: Use HCatalog with Pig

Using HCatalog with Pig

This post is a step-by-step guide to running HCatalog and using it with Apache Pig.

1 min read

Hive Strict Mode

![Sort By vs Order By vs Group By vs Cluster By in Hive](/assets/uploads/2015/01/images.jpg)

What is Hive Strict Mode?

Hive strict mode (hive.mapred.mode=strict) lets Hive restrict certain performance-intensive operations.

~1 min read