Why is log analysis important for you?

Need for Log Analysis

Logs provide us with necessary information on how our system is behaving. However, the content and format of logs vary among different services, or even among different components of the same system. For example, a scanner may log error messages related to communication with other devices; a web server, on the other hand, logs information on all … Continue Reading ››

Top 15 HDFS Interview Questions

HDFS is the distributed file system used in Hadoop, and it serves the purpose of storing very large files on commodity hardware. While working on Hadoop and BigData in general, it is very important to understand the basic concepts of the underlying file system, i.e. HDFS in the case of Hadoop. When you are appearing for BigData interviews, … Continue Reading ››

Top 20 Hadoop MapReduce Interview Questions

BigData and data analytics jobs are the most sought-after jobs at present. It is important to understand the basics before you appear for an interview. In this post, I cover a few of the basic interview questions for Hadoop MapReduce.
  1. What is MapReduce?
  2. What is a combiner and when you … Continue Reading ››

My Book on ELK Stack: Learning ELK Stack

What is an RDD in Spark, and why do we need it?

Resilient Distributed Datasets (RDDs) in Spark

Apache Spark has already overtaken Hadoop (MapReduce) because of the many benefits it provides, notably faster execution of iterative processing algorithms such as those used in machine learning. In this post, we will try to understand what makes Spark RDDs so useful in batch analytics.

Why RDD?

When it comes to iterative distributed computing, i.e. processing data … Continue Reading ››

What is Apache HCatalog?

What is HCatalog?

Apache HCatalog is a storage management layer for Hadoop that helps users of different data processing tools in the Hadoop ecosystem, such as Hive, Pig, and MapReduce, easily read and write data on the cluster. HCatalog provides a relational view of data stored on HDFS in formats such as RCFile, Parquet, ORC, and SequenceFile. It also exposes a REST API … Continue Reading ››

BigData and Hadoop Tutorial
