Tag Archives: bigdata

Top 15 HDFS Interview Questions

HDFS is the distributed file system used in Hadoop and helps to achieve the purpose of storing very larger files on a commodity Hardware. While working on Hadoop and BigData in general it is very important to understand the basic concepts of they underlying file system, i.e. HDFS in case of Hadoop. When you are appearing in BigData Interviews , … Continue Reading ››

What is RDD in Spark ? and Why do we need it ?

Resilient Distributed Datasets -RDDs in Spark

Apcahe Spark has already taken over Hadoop (MapReduce)  because of plenty of benefits it provides in terms of faster execution in iterative processing algorithms such as Machine learning. In this post, we will try to understand what makes Spark RDDs so useful in batch analytics .

Why RDD ?

When it comes to iterative distributed computing, i.e. processing data … Continue Reading ››

What is Apache HCatalog ?

What is HCatalog ?

Apache HCatalog is a Storage Management Layer for Hadoop that helps to users of different data processing tools in Hadoop ecosystem like Hive, Pig and MapReduce easily read and write data from the cluster.HCatalog enables with relational view of data  from RCFile format, Parquet, ORC files, Sequence files stored on HDFS. It also exposes REST API exposed … Continue Reading ››

Hive Strict Mode

Sort By vs Order By vs Group By vs Cluster By in Hive

What is Hive Strict Mode ?

Hive Strict Mode ( hive.mapred.mode=strict) enables hive to restrict certain performance intensive operations. Such as -
  • It restricts queries of partitioned tables without a WHERE clause.

How-To : Configure MySQL Metastore for Hive ?

Hive by default comes with Derby as its metastore storage, which is suited only for testing purposes and in most of the production scenarios it is recommended to use MySQL as a metastore. This is a step by step guide on How to Configure MySQL Metastore for Hive in place of Derby Metastore (Default). Assumptions : Basic knowledge of Unix is … Continue Reading ››

Getting Started with Hadoop : Free Online Hadoop Trainings

Hadoop Trainings

Oh Yes! It's Free !!!

With the rising popularity , increase in demand and lack of experts in Big Data and Hadoop technologies, various  paid training courses and certifications are available from various Enterprise Hadoop providers like Cloudera, Hortonworks , IBM, MapR etc .But if you don't want to shell out some money … Continue Reading ››