How-To : Use HCatalog with Pig

Using HCatalog with Pig

This post is a step-by-step guide to running HCatalog and using it with Apache Pig.

Assumptions :

Pig and Hive are installed and have been tested in their basic modes.

The Hive metastore and its database must be properly configured (refer to Post).

Versions Tested With:

HCatalog ships with the Hive installation itself (0.11+), under the folder $HIVE_HOME/hcatalog.

Hive version: 0.14 and above (might work with earlier versions)

Pig version: 0.14 and above (might work with earlier versions)

Let's start with the configuration.

Step 1 : Assuming HADOOP_HOME is properly configured, set up HCAT_HOME and PIG_CLASSPATH so that Pig knows which jars to use for accessing HCatalog storage:

export HIVE_HOME=/usr/local/hive
export HCAT_HOME=/usr/local/hive/hcatalog
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$HCAT_HOME/bin
export PATH=$PATH:$PIG_HOME/bin
export PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/hive-hcatalog-core*.jar:\
$HCAT_HOME/share/hcatalog/hive-hcatalog-pig-adapter*.jar:\
$HIVE_HOME/lib/hive-metastore-*.jar:$HIVE_HOME/lib/libthrift-*.jar:\
$HIVE_HOME/lib/hive-exec-*.jar:$HIVE_HOME/lib/libfb303-*.jar:\
$HIVE_HOME/lib/jdo2-api-*-ec.jar:$HIVE_HOME/conf:$HADOOP_HOME/conf:\
$HIVE_HOME/lib/slf4j-api-*.jar
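
A mistyped pattern in PIG_CLASSPATH may only show up later as a missing-class error, so it can help to check that each entry matches something on disk. The following is a minimal sketch, assuming a POSIX-style shell such as bash; check_pig_classpath is a hypothetical helper name, not part of Pig or HCatalog:

```shell
# Hypothetical helper (not part of Pig or HCatalog): prints every
# colon-separated classpath entry whose path or wildcard pattern
# matches nothing on disk. Returns non-zero if any entry is missing.
check_pig_classpath() {
  missing=0
  old_ifs=$IFS
  IFS=':'
  # Unquoted on purpose: the shell splits on ':' and expands wildcards.
  for entry in $1; do
    if [ ! -e "$entry" ]; then
      echo "no match: $entry"
      missing=1
    fi
  done
  IFS=$old_ifs
  return "$missing"
}
```

After the exports above, running check_pig_classpath "$PIG_CLASSPATH" should print nothing if every jar pattern and directory resolves on your installation.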

 

Step 2 : The Hive metastore must be running in remote mode so that the metastore client knows where to find it.

In $HIVE_HOME/conf/hive-site.xml, add or edit the hive.metastore.uris property as follows (replace <hostname> with the metastore host):

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<hostname>:9083</value>
</property>

Then start the metastore service in the background:

$ hive --service metastore &

Then check that it is listening on port 9083:

$ netstat -an | grep 9083
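
To avoid hard-coding the port in that check, the URI can be read back from hive-site.xml. This is a minimal sketch that assumes the <name> and <value> elements sit on separate lines, as in the property shown above. metastore_uri is a hypothetical helper name, and awk is only doing line matching here, not real XML parsing:

```shell
# Hypothetical helper: print the value of hive.metastore.uris from a
# hive-site.xml file. Assumes <name> and <value> are on separate lines.
metastore_uri() {
  awk '
    /<name>hive\.metastore\.uris<\/name>/ { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")      # drop everything up to the opening tag
      sub(/<\/value>.*/, "")    # drop the closing tag and anything after
      print
      exit
    }
  ' "$1"
}

# Usage sketch (commented out; needs a configured Hive install):
# uri=$(metastore_uri "$HIVE_HOME/conf/hive-site.xml")
# netstat -an | grep "${uri##*:}"   # ${uri##*:} keeps the text after the last ':'
```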

 

Step 3 : Create a table using hcat –

# Create a table
$ hcat -e "create table hcatalogtest(name string,place string,id int) row format delimited fields terminated by ':' stored as textfile"
OK
# Get the schema for a table
$ hcat -e "desc hcatalogtest"
OK
name	string
place	string
id	int
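
The table is empty at this point. As a minimal sketch, a few rows can be loaded with the hive CLI, since hcat is aimed at DDL statements. The sample rows and the file name below are made up for illustration; the fields follow the name:place:id layout and the ':' delimiter of the table definition:

```shell
# Made-up sample rows; one record per line, fields separated by ':'.
sample_file="${TMPDIR:-/tmp}/hcatalogtest_sample.txt"
printf '%s\n' 'alice:london:1' 'bob:paris:2' 'carol:delhi:3' > "$sample_file"

# Load the file into the table, but only if the hive CLI is on the PATH.
if command -v hive >/dev/null 2>&1; then
  hive -e "LOAD DATA LOCAL INPATH '$sample_file' INTO TABLE hcatalogtest"
fi
```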

Step 4 : Connect Pig to HCatalog:

  • Create a pig script named hcatalogtest.pig
    A = LOAD 'hcatalogtest' USING org.apache.hive.hcatalog.pig.HCatLoader();
    DESCRIBE A;

    Note : Use org.apache.hive.hcatalog.pig.HCatLoader, not org.apache.hcatalog.pig.HCatLoader (the older package name, which you will find in most of the examples available).

  • Run the Pig script with the -useHCatalog flag
    pig -useHCatalog hcatalogtest.pig

    This should give you the schema of the table as output:

    A: {name: chararray,place: chararray,id: int}
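
Once the table holds some rows, the same loader can feed an ordinary Pig pipeline. The following is a minimal sketch that writes a second script and runs it the same way; the file name hcatalogquery.pig and the relation name B are arbitrary choices for illustration:

```shell
# Hypothetical script name; written to a temporary location.
script="${TMPDIR:-/tmp}/hcatalogquery.pig"
cat > "$script" <<'EOF'
-- Load the table through HCatalog (the schema comes from the metastore).
A = LOAD 'hcatalogtest' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- Keep only rows with id greater than 1 and print them.
B = FILTER A BY id > 1;
DUMP B;
EOF

# Run the script only if the pig CLI is on the PATH.
if command -v pig >/dev/null 2>&1; then
  pig -useHCatalog "$script"
fi
```

Because the column names and types come from the table definition, the script can refer to id directly without declaring a schema in the LOAD statement.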
    

     

That's it! You now have a basic HCatalog configuration up and running and integrated with Pig.
