Category Archives: Hive

What is Apache HCatalog ?

What is HCatalog ?

Apache HCatalog is a storage management layer for Hadoop that helps users of different data processing tools in the Hadoop ecosystem, such as Hive, Pig and MapReduce, easily read and write data on the cluster. HCatalog provides a relational view of data stored on HDFS in formats such as RCFile, Parquet, ORC and SequenceFile. It also exposes a REST API … Continue Reading ››

Hive : SORT BY vs ORDER BY vs DISTRIBUTE BY vs CLUSTER BY

In Apache Hive, how SORT BY, ORDER BY, DISTRIBUTE BY and CLUSTER BY differ is a common source of confusion. I have compiled a set of differences between them, based on attributes such as what the final output looks like and how the data in the output is ordered -

SORT BY

Sort By vs Order By … Continue Reading ››
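
To make the distinction concrete, here is a minimal pure-Python sketch that imitates how rows end up on reducers under each clause. It is a toy model, not Hive code: the sample rows, the two-reducer setup and the `reducer_for` hash are all assumptions made for illustration, and the round-robin split in `sort_by` merely stands in for the fact that SORT BY gives no guarantee about which reducer a key lands on.

```python
# Toy model of Hive's ordering clauses. Each inner list is one reducer's output.
# The rows, NUM_REDUCERS and reducer_for() are illustrative assumptions.

rows = [("b", 3), ("a", 2), ("c", 1), ("a", 5), ("b", 4), ("c", 6)]
NUM_REDUCERS = 2


def by_key(row):
    return row[0]


def reducer_for(key, num_reducers=NUM_REDUCERS):
    # Deterministic stand-in for a hash partitioner (not Hive's real hash).
    return sum(key.encode()) % num_reducers


def order_by(rows):
    # ORDER BY: everything passes through a single reducer, so the
    # output is one totally ordered list.
    return sorted(rows, key=by_key)


def sort_by(rows, num_reducers=NUM_REDUCERS):
    # SORT BY: each reducer sorts only its own rows. Rows with the same
    # key may sit on different reducers (modelled here as round-robin).
    buckets = [rows[i::num_reducers] for i in range(num_reducers)]
    return [sorted(bucket, key=by_key) for bucket in buckets]


def distribute_by(rows, num_reducers=NUM_REDUCERS):
    # DISTRIBUTE BY: rows with the same key go to the same reducer,
    # but nothing is sorted inside a reducer.
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[reducer_for(row[0], num_reducers)].append(row)
    return buckets


def cluster_by(rows, num_reducers=NUM_REDUCERS):
    # CLUSTER BY: DISTRIBUTE BY and SORT BY on the same key.
    return [sorted(bucket, key=by_key) for bucket in distribute_by(rows, num_reducers)]


if __name__ == "__main__":
    print("ORDER BY      ", order_by(rows))
    print("SORT BY       ", sort_by(rows))
    print("DISTRIBUTE BY ", distribute_by(rows))
    print("CLUSTER BY    ", cluster_by(rows))
```

In this model the key "c" shows up on both reducers under SORT BY, while DISTRIBUTE BY and CLUSTER BY keep each key on a single reducer and only ORDER BY produces one globally ordered list.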

How-To : Use HCatalog with Pig

 Using HCatalog with Pig :-

This post is a step-by-step guide to running HCatalog and using it with Apache Pig :- Assumptions : Pig and Hive are installed and tested in basic modes. It requires the Hive Metastore and its database to be properly configured ( Refer to Post ) Versions Tested With :-  HCatalog … Continue Reading ››

Hive Strict Mode


What is Hive Strict Mode ?

Hive Strict Mode (hive.mapred.mode=strict) makes Hive reject certain performance-intensive operations, such as -
  • It rejects queries on partitioned tables that have no WHERE clause filtering on a partition column.
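
The partition rule can be pictured with a small Python sketch. This is only a toy model of the idea, not how Hive's planner is implemented: the function name `allowed_in_strict_mode`, its arguments and the example table `logs` partitioned by `dt` are hypothetical.

```python
def allowed_in_strict_mode(partition_columns, where_columns):
    """Toy version of the strict-mode partition rule (illustrative only).

    partition_columns: columns the table is partitioned by (empty if none).
    where_columns: columns referenced in the query's WHERE clause.
    """
    if not partition_columns:
        # Unpartitioned tables are not affected by this rule.
        return True
    # At least one partition column must be filtered on.
    return any(col in partition_columns for col in where_columns)


# Hypothetical table `logs`, partitioned by `dt`:
# SELECT * FROM logs                      -> rejected
# SELECT * FROM logs WHERE user_id = 7    -> rejected (no partition filter)
# SELECT * FROM logs WHERE dt = '...'     -> allowed
```

The point of the rule is that a query without a partition filter would scan every partition of the table, which is exactly the kind of expensive operation strict mode is meant to catch.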

How-To : Connect HiveServer2 service with JDBC Client ?

HiveServer2 (HS2) is a server interface that enables remote clients to execute queries against Hive and retrieve the results. The current implementation, based on Thrift RPC, is an improved version of HiveServer and supports multi-client concurrency and authentication. It is designed to provide better support for open API clients like JDBC and ODBC. Continue Reading ››

How-To : Configure MySQL Metastore for Hive ?

Hive by default comes with Derby as its metastore storage, which is suited only for testing purposes; in most production scenarios it is recommended to use MySQL as the metastore. This is a step-by-step guide on how to configure a MySQL metastore for Hive in place of the default Derby metastore. Assumptions : Basic knowledge of Unix is … Continue Reading ››