bigdata Archives - SaurzCode

September 14, 2019

What does Skipped Stage mean in Spark WebUI?

Skipped Stages in Spark UI: You have probably seen a DAG like the one below, in which a few stages are greyed out with the text (skipped) after the stage...

Big Data / Scala / Spark / Technology

June 7, 2018

Dataframe Operations in Spark using Scala

A DataFrame in Apache Spark is a distributed collection of data organized into columns. DataFrames can be transformed into various forms using the DSL operations and functions defined in the DataFrame API. In...

Big Data / Technology

December 19, 2015

What is RDD in Spark, and why do we need it?

Resilient Distributed Datasets (RDDs) in Spark: Apache Spark has largely taken over from Hadoop MapReduce because of the benefits it provides, notably faster execution of iterative processing algorithms such as machine learning. In...

Big Data / Hive / Java

October 18, 2015

What is Apache HCatalog?

What is HCatalog? Apache HCatalog is a storage management layer for Hadoop that helps users of different data processing tools in the Hadoop ecosystem, such as Hive, Pig, and MapReduce, easily read and write data...

Big Data / Hive / Java / Technology

January 27, 2015

Hive: SORT BY vs ORDER BY vs DISTRIBUTE BY vs CLUSTER BY

In Apache Hive HQL, you can order or sort your data in different ways depending on your ordering and distribution requirements. In this post we will look at how SORT BY, ORDER BY, DISTRIBUTE BY and...
