Category: Big Data

Top 20 Hadoop and Big Data Books


Big Data Books

Hadoop: The Definitive Guide


Hadoop: The Definitive Guide is the ideal guide for anyone who wants to learn about Apache Hadoop and everything that can be done with it. It is a good book on the basics of Hadoop (HDFS, MapReduce, and related technologies) and provides all the details necessary to start working with Hadoop and programming against it.

“Now you have the opportunity to learn about Hadoop from a master – not only of the technology, but also of common sense and plain talk.” — Doug Cutting, Hadoop Founder, Yahoo!

The latest version, the 4th Edition, is available here – Hadoop – The Definitive Guide 4e



How-To: Become a Hadoop Certified Developer?

Hadoop Certified

Hadoop

Apache Hadoop is an open-source framework for distributed storage and processing of large data sets on commodity hardware. Hadoop enables businesses to quickly gain insight from massive amounts of structured and unstructured data.

Hadoop and Big Data are among the hottest trends in the industry these days. Most companies are already implementing them, or have at least started showing interest, in order to remain competitive in the market. Big Data and analytics are certainly key concepts for the current and forthcoming IT generation, as much of today's innovation is driven by the vast amount of data being generated at an exponential rate.

There are many vendors for enterprise Hadoop in the industry – Cloudera, Hortonworks (spun out of Yahoo), MapR, and IBM are some of the front runners among them. Each has its own Hadoop distribution, which differs from the others in one way or another in terms of features while keeping Hadoop at its core. They provide training on various Hadoop and Big Data technologies and, following an industry trend, are now offering certifications around these technologies too.

In this article I am going to list the latest available Hadoop certifications from the different vendors in the industry. Whether certifications are helpful to your career or not is altogether a different debate and out of the scope of this article. The list may be useful for those who think they have done enough reading and now want to test themselves, or for those looking to add value to their portfolios.

Cloudera


CCAH (Administrator) Exams

Cloudera Certified Administrator for Apache Hadoop (CCA-410)

There are currently three versions of this exam:

CCAH CDH 4 Exam
Exam Code: CCA-410
Number of Questions: 60
Time Limit: 90 minutes
Passing Score: 70%
Language: English, Japanese
Price: USD $295

CCAH CDH 5 Exam
Exam Code: CCA-500
Number of Questions: 60
Time Limit: 90 minutes
Passing Score: 70%
Language: English, Japanese (forthcoming)
Price: USD $295

CCAH CDH 5 Upgrade Exam
Exam Code: CCA-505
Number of Questions: 45
Time Limit: 90 minutes
Passing Score: 70%
Language: English, Japanese (forthcoming)
Price: USD $125

CCAH Practice Test

CCAH Study Guide

CCDH (Developer) Exams

Cloudera Certified Developer for Apache Hadoop (CCD-410)

Exam Code: CCD-410
Number of Questions: 50–55 live questions
Time Limit: 90 minutes
Passing Score: 70%
Language: English, Japanese
Price: USD $295

Study Guide: available at the Cloudera site.

Practice Tests: available at the Cloudera site.

Hortonworks


For 2.x Certifications

1) Hadoop 2.0 Java Developer Certification

This certification is intended for developers who design, develop and architect Hadoop-based solutions written in the Java programming language.

Time Limit: 90 minutes

Number of Questions: 50

Passing Score: 75%

Price: $150 USD

Practice tests can be taken by registering at the certification site.

2) Hadoop 2.0 Developer Certification

The Certified Apache Hadoop 2.0 Developer certification is intended for developers who design, develop and architect Hadoop-based solutions, consultants who create Hadoop project proposals and Hadoop development instructors.

Time Limit: 90 minutes

Number of Questions: 50

Passing Score: 75%

Price: $150 USD

Practice tests can be taken by registering at the certification site.

3) Hortonworks Certified Apache Hadoop 2.0 Administrator

This certification is intended for administrators who deploy and manage Apache Hadoop 2.0 clusters; the associated course teaches students how to install, configure, maintain, and scale a Hadoop 2.0 environment.

Time Limit: 90 minutes

Number of Questions: 48

Passing Score: 75%

Price: $150 USD

For 1.x Certifications

1) Hadoop 1.0 Developer Certification

Time Limit: 90 minutes

Number of Questions: 53

Passing Score: 75%

Price: $150 USD

2) Hadoop 1.0 Administrator Certification

Time Limit: 60 minutes

Number of Questions: 41

Passing Score: 75%

Price: $150 USD

 



 

Recommended Readings for Hadoop

I am writing this series to share some recommended reading for understanding Hadoop: its architecture, the minute details of cluster setup, and so on.

Understanding Hadoop Cluster Setup and Network – Brad Hedlund, with his expertise in networks, provides minute details of the cluster setup and data-exchange mechanisms of a typical Hadoop cluster.

MongoDB and Hadoop – a webinar by Mike O’Brien, Software Engineer at MongoDB, on how MongoDB and Hadoop can be used together, using core MapReduce as well as Pig and Hive.

Please post a comment if you have come across a great article or webinar link that explains things in detail and with ease.

Top 10 Hadoop Shell Commands to manage HDFS

“Basically, our goal is to organize the world’s information and to make it universally accessible and useful.” – Larry Page

So you already know what Hadoop is, why it is used, and what problems you can solve with it, and now you want to know how to deal with files on HDFS? Don’t worry, you are at the right place.

In this article I will present the top 10 basic HDFS operations, managed through shell commands, that are useful for working with files on an HDFS cluster. For testing purposes, you can invoke these commands using one of the VMs from Cloudera, Hortonworks, etc., or on your own pseudo-distributed cluster setup.

Let’s get started.

1. Create a directory in HDFS at given path(s).
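A minimal sketch of the command (the paths below are hypothetical):

```shell
# Create a single directory on HDFS
hadoop fs -mkdir /user/hadoop/dir1

# Several directories can be created in one invocation
hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2
```

Newer releases also accept a -p flag to create missing parent directories.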

2. List the contents of a directory.
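For example (paths are hypothetical):

```shell
# List the contents of an HDFS directory
hadoop fs -ls /user/hadoop

# Recursive listing (-lsr in older releases, -ls -R in newer ones)
hadoop fs -lsr /user/hadoop
```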

3. Upload and download a file in HDFS.

Upload:

hadoop fs -put:

Copies a single src file, or multiple src files, from the local file system to HDFS.

Download:

hadoop fs -get:

Copies/downloads files from HDFS to the local file system.
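A sketch of both directions (file names are hypothetical):

```shell
# Upload: copy one or more local files into an HDFS directory
hadoop fs -put localfile.txt /user/hadoop/
hadoop fs -put file1.txt file2.txt /user/hadoop/dir

# Download: copy an HDFS file to the local file system
hadoop fs -get /user/hadoop/localfile.txt /tmp/
```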

4. See the contents of a file

Same as the Unix cat command:
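For example (path is hypothetical):

```shell
# Print the contents of an HDFS file to stdout
hadoop fs -cat /user/hadoop/file.txt
```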

5. Copy a file from source to destination

This command allows multiple sources as well in which case the destination must be a directory.
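A sketch of both forms (paths are hypothetical):

```shell
# Copy a single file within HDFS
hadoop fs -cp /user/hadoop/file1.txt /user/hadoop/dir2/

# Multiple sources: the destination must be a directory
hadoop fs -cp /user/hadoop/file1.txt /user/hadoop/file2.txt /user/hadoop/dir2
```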

6. Copy a file between the local file system and HDFS

copyFromLocal

Similar to put command, except that the source is restricted to a local file reference.

copyToLocal

Similar to get command, except that the destination is restricted to a local file reference.
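Both directions sketched (file names are hypothetical):

```shell
# Local -> HDFS (the source must be a local file)
hadoop fs -copyFromLocal localfile.txt /user/hadoop/

# HDFS -> local (the destination must be a local path)
hadoop fs -copyToLocal /user/hadoop/file.txt /tmp/
```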

7. Move a file from source to destination.

Note: moving files across file systems is not permitted.
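For example (paths are hypothetical):

```shell
# Move (or rename) a file within HDFS
hadoop fs -mv /user/hadoop/file1.txt /user/hadoop/dir2/
```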

8. Remove a file or directory in HDFS.

Removes the files specified as arguments. Deletes a directory only when it is empty.

Recursive version of delete.
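Both variants sketched (paths are hypothetical):

```shell
# Remove a file (or an empty directory)
hadoop fs -rm /user/hadoop/file.txt

# Recursive delete (-rmr in older releases, -rm -r in newer ones)
hadoop fs -rmr /user/hadoop/dir1
```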

9. Display last few lines of a file.

Similar to tail command in Unix.
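For example (path is hypothetical):

```shell
# Show the last kilobyte of an HDFS file on stdout
hadoop fs -tail /user/hadoop/file.txt
```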

10. Display the aggregate length of a file.
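For example (path is hypothetical):

```shell
# Show the space taken by a file or directory, in bytes
hadoop fs -du /user/hadoop/file.txt
```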

 

Please comment on which of these commands you found most useful while dealing with Hadoop/HDFS.


