Category: Big Data

What is RDD in Spark, and why do we need it?

Resilient Distributed Datasets (RDDs) in Spark. Apache Spark has already taken over from Hadoop (MapReduce) because of the benefits it provides, notably faster execution of iterative processing algorithms such as machine learning. In...

What is Apache HCatalog?

What is HCatalog? Apache HCatalog is a storage management layer for Hadoop that helps users of different data processing tools in the Hadoop ecosystem, such as Hive, Pig, and MapReduce, easily read and write data...
