How-To: Use HCatalog with Pig

Using HCatalog with Pig

This post is a step-by-step guide to running HCatalog and using it with Apache Pig.

Assumptions:

Pig and Hive are installed and have been tested in their basic modes.

The Hive metastore and its database are properly configured (refer to Post).

Versions tested with:

HCatalog ships with the Hive installation itself (0.11+), under the folder $HIVE_HOME/hcatalog.

Hive version: 0.14 and above (might work with earlier versions)

Pig version: 0.14 and above (might work with earlier versions)

Let’s start with the configuration.

Step 1: Assuming HADOOP_HOME is properly configured, set up HCAT_HOME and PIG_CLASSPATH (so that Pig knows which jars to use for accessing HCatalog storage):
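As a rough sketch of this step, the snippet below sets HCAT_HOME and builds PIG_CLASSPATH from whichever jars actually exist on disk. The default install path /usr/local/hive and the exact jar list are assumptions, not something this post prescribes. As a reader points out in the comments, the jars now carry a hive- prefix (for example hive-hcatalog-core*.jar), and PIG_CLASSPATH is not needed at all when Pig is started with -useHCatalog.

```shell
# Assumed install location; adjust to wherever Hive lives on your machine.
export HIVE_HOME="${HIVE_HOME:-/usr/local/hive}"
export HCAT_HOME="$HIVE_HOME/hcatalog"

# Start with the Hive conf directory so Pig can see hive-site.xml.
PIG_CLASSPATH="$HIVE_HOME/conf"

# Append only the jars that exist; unmatched globs stay literal and are skipped.
for jar in "$HCAT_HOME"/share/hcatalog/hive-hcatalog-core*.jar \
           "$HCAT_HOME"/share/hcatalog/hive-hcatalog-pig-adapter*.jar \
           "$HIVE_HOME"/lib/hive-metastore-*.jar \
           "$HIVE_HOME"/lib/libthrift-*.jar \
           "$HIVE_HOME"/lib/hive-exec-*.jar \
           "$HIVE_HOME"/lib/libfb303-*.jar; do
  if [ -e "$jar" ]; then
    PIG_CLASSPATH="$PIG_CLASSPATH:$jar"
  fi
done
export PIG_CLASSPATH
```

Resolving the globs in a loop avoids putting unexpanded wildcard patterns on the classpath when a jar has been renamed or is missing.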

 

Step 2: The Hive metastore must be running in remote mode so that the metastore client knows where to find it.

In $HIVE_HOME/conf/hive-site.xml, add or edit the hive.metastore.uris property as follows:
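A minimal sketch of what that property can look like is below. It writes a standalone hive-site.xml into a scratch directory (CONF_DIR is a hypothetical location) so that your real configuration is not overwritten; copy the property element into $HIVE_HOME/conf/hive-site.xml yourself. The host localhost is an assumption, and 9083 is the metastore's default Thrift port.

```shell
# Scratch location (hypothetical) so the real hive-site.xml is left untouched.
CONF_DIR="${CONF_DIR:-/tmp/hcat-demo-conf}"
mkdir -p "$CONF_DIR"

# Write a minimal config containing only the remote metastore URI.
cat > "$CONF_DIR/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Assumed host and port; 9083 is the default metastore port. -->
    <value>thrift://localhost:9083</value>
  </property>
</configuration>
EOF
```

Note that this is the metastore's Thrift endpoint, not the HiveServer2 one.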

Then start the metastore service:
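One way to start the metastore is sketched below, assuming the hive launcher from $HIVE_HOME/bin is on your PATH. The helper name start_metastore and the log path are made up for this example; the underlying command is hive --service metastore.

```shell
# Hypothetical helper: start the metastore in the background and log to a file.
start_metastore() {
  log="${METASTORE_LOG:-/tmp/hive-metastore.log}"
  if command -v hive >/dev/null 2>&1; then
    # nohup keeps the service alive after the terminal is closed.
    nohup hive --service metastore > "$log" 2>&1 &
    echo "metastore starting (pid $!), logging to $log"
  else
    echo "hive not found on PATH; add \$HIVE_HOME/bin to PATH first" >&2
    return 1
  fi
}
```

Call start_metastore once and give the service a few seconds to come up before moving on.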

Then test whether it is running:
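A simple check is to see whether anything is listening on the metastore port. The sketch below assumes the default port 9083 and uses whichever of ss or netstat is installed; the helper name metastore_listening is hypothetical.

```shell
# Hypothetical helper: return 0 if a TCP listener exists on the given port.
metastore_listening() {
  port="${1:-9083}"
  if command -v ss >/dev/null 2>&1; then
    # -l listening sockets, -t TCP only, -n numeric ports
    ss -ltn | grep -Eq "[:.]$port[[:space:]]"
  elif command -v netstat >/dev/null 2>&1; then
    netstat -an | grep LISTEN | grep -Eq "[:.]$port[[:space:]]"
  else
    echo "neither ss nor netstat found" >&2
    return 2
  fi
}
```

For example, `metastore_listening 9083 && echo "metastore is up"` prints the message only when a listener is found.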

 

Step 3: Create a table using hcat:

Step 4: Connect Pig to HCatalog:
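As an illustration of this step, the sketch below creates a table named hcatalogtest (the name used in the comments on this post) and then describes it through the hcat command line. The two columns are made up for the example, and HCAT_BIN assumes hcat lives under $HCAT_HOME/bin.

```shell
# Assumed location of the hcat launcher; override HCAT_BIN if yours differs.
HCAT_BIN="${HCAT_HOME:-/usr/local/hive/hcatalog}/bin/hcat"

# Hypothetical helper: create the demo table, then print its definition.
create_test_table() {
  # The column list (id, name) is an assumption for illustration only.
  "$HCAT_BIN" -e "CREATE TABLE IF NOT EXISTS hcatalogtest (id INT, name STRING);" &&
  "$HCAT_BIN" -e "DESC hcatalogtest;"
}
```

Run create_test_table while the metastore from Step 2 is up; the DESC call should list the columns that were just created.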

  • Create a Pig script named hcatalogtest.pig

    Note: use org.apache.hive.hcatalog.pig.HCatLoader and not org.apache.hcatalog.pig.HCatLoader (the older package name, which you will find in most of the examples available).
  • Run the Pig script with the -useHCatalog flag

    This should print the schema of the table as output.
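The two bullets above can be sketched roughly as follows, assuming the hcatalogtest table from Step 3 exists and pig is on your PATH. The script body and the helper name run_hcat_script are illustrative rather than the exact originals.

```shell
# Write the Pig script; the quoted 'EOF' stops the shell from expanding anything inside.
cat > hcatalogtest.pig <<'EOF'
-- Load the table through the metastore using the HCatalog loader.
A = LOAD 'hcatalogtest' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- Print the schema that HCatalog reports for the table.
DESCRIBE A;
EOF

# Hypothetical helper: run a script with the HCatalog jars put on Pig's classpath.
run_hcat_script() {
  pig -useHCatalog "${1:-hcatalogtest.pig}"
}
```

Calling run_hcat_script with no arguments runs hcatalogtest.pig; DESCRIBE should then print the relation's field names and types as Pig sees them.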

     

That’s it! You now have a basic HCatalog configuration up and running and integrated with Pig.

  • Ravi

    You don’t need the PIG_CLASSPATH= line if you use -useHCatalog on the command line. Some of the jars you have listed don’t even exist, because they have been renamed (e.g. hcatalog-core*.jar is now hive-hcatalog-core*.jar).

    • saurzcode

      Hi Ravi,

      Thanks for the comment. I will update this in post.

      Thanks,
      Saurabh

  • Adebiyi abdurrahman

    Step 3. Should have “desc hcatalogtest” not “desc hcataologtest”

    • saurzcode

      Yes, Thanks for pointing the typo.

  • Grégoire Lafortune

    Works well for me with pig-0.15.0 and apache-hive-2.0.0-bin. Thanks a lot! Before finding this post I lost a lot of time trying to use hiveserver2 on thrift://localhost:10000 without success.

    • saurzcode

      Thanks. Good to know that.