What is RDD in Spark? And Why Do We Need It?
A developer-friendly guide to understanding Resilient Distributed Datasets (RDDs) in Apache Spark, their properties, and why they are fundamental for fast, fault-tolerant distributed computing.
What is Apache HCatalog?
A developer-friendly introduction to Apache HCatalog, its architecture, features, and how it fits into the Hadoop ecosystem.
HashMap Performance Improvements in Java 8
A developer-focused look at how Java 8 improved the performance of HashMap
under high-collision scenarios, with code examples and practical explanations.
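As a rough illustration of the high-collision case, the sketch below forces every key into the same bucket and then times lookups. `CollisionDemo`, `BadKey`, and `buildMap` are hypothetical names made up for this example. On Java 8 and later, `HashMap` turns an overfull bucket into a balanced tree, and making the key `Comparable` helps it order colliding entries. Timings vary by machine, so none are quoted here.

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {

    /** Hypothetical key type: every instance hashes to the same bucket. */
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // deliberately constant, so all keys collide
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int compareTo(BadKey other) {
            // Lets a tree bin order colliding keys instead of scanning them
            return Integer.compare(id, other.id);
        }
    }

    /** Builds a map of n keys that all share one hash code. */
    static Map<BadKey, Integer> buildMap(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        int n = 20_000;
        Map<BadKey, Integer> map = buildMap(n);

        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            if (map.get(new BadKey(i)) != i) {
                throw new IllegalStateException("unexpected value for key " + i);
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(n + " colliding lookups took " + elapsedMs + " ms");
    }
}
```

Running the same class on a Java 7 runtime, where colliding entries stay in a linked list, is one way to compare the two behaviours.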
Unix Job Control Commands: bg, fg, Ctrl+Z, jobs
A practical guide for developers and data engineers to manage long-running jobs in Unix, especially useful when working with Hadoop or other big data tools.
Integrate Kafka with HDFS using Camus (Twitter Stream Example)
A step-by-step guide to building a Kafka-to-HDFS data pipeline using Camus and a Twitter stream. This guide is aimed at developers looking for a practical, detailed walkthrough.
How-To: Write a Kafka Producer using Twitter Stream (Twitter HBC Client)
A step-by-step guide to building a Kafka producer that streams live tweets using Twitter’s Hosebird Client (HBC) and publishes them to a Kafka topic. This is a practical, developer-focused walkthrough with code, configuration, and troubleshooting tips.
How-To: Write a Coprocessor in HBase
What is a Coprocessor in HBase?
A coprocessor is a mechanism for moving computation closer to the data in HBase. It resembles a MapReduce framework in that it distributes tasks across the cluster.
Hive: SORT BY vs ORDER BY vs DISTRIBUTE BY vs CLUSTER BY
In Apache Hive HQL, you can order or sort your data in different ways depending on your ordering and distribution requirements. This post looks at how SORT BY, ORDER BY, DISTRIBUTE BY, and CLUSTER BY behave differently in Hive. Let's get started.
How-To: Use HCatalog with Pig
Using HCatalog with Pig
This post is a step-by-step guide to running HCatalog and using it with Apache Pig:
Hive Strict Mode
![Sort By vs Order By vs Group By vs Cluster By in Hive](/assets/uploads/2015/01/images.jpg)
What is Hive Strict Mode?
Hive Strict Mode (hive.mapred.mode=strict) lets Hive restrict certain performance-intensive operations, such as: