HDFS is the distributed file system used in Hadoop, designed to store very large files on commodity hardware. While working on Hadoop and Big Data in general it is very...
Big Data and data analytics jobs are the most sought-after jobs right now. It is important to understand the basics before you appear for an interview. In this post, I am covering a few of the...
Learning ELK Stack: I am writing this post to announce the general availability of my book on the ELK Stack, titled "Learning ELK Stack", published by PacktPub. The book aims to provide individuals/technologists who seek...
Resilient Distributed Datasets (RDDs) in Spark: Apache Spark has already overtaken Hadoop (MapReduce) because of the many benefits it provides in terms of faster execution of iterative processing algorithms such as machine learning. In...
What is HCatalog? Apache HCatalog is a storage management layer for Hadoop that helps users of different data processing tools in the Hadoop ecosystem, like Hive, Pig and MapReduce, easily read and write data...
Since Hadoop jobs are often long-running, it's difficult for newbies to manage the processes in Unix unless they know some useful Unix commands that can increase their efficiency. In...