How-to: Write a Coprocessor in HBase

What is a Coprocessor in HBase?

A coprocessor is a mechanism that moves computation closer to the data in HBase. In that sense it resembles MapReduce, which distributes tasks across the cluster so that they run where the data is stored.

You can think of coprocessors in two ways. One is as aspects in Java, which intercept code before and after critical operations and execute user-supplied behavior. The other is as triggers or stored procedures in an RDBMS, which execute at run time close to the data.
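Here is a minimal plain-Java sketch that makes the aspect analogy concrete. It is not HBase code. It shows interception: registered hooks run before and after the real operation, which is the same shape as a coprocessor's pre and post hooks. InterceptionSketch, Hook and invoke are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Plain-Java model of "intercept before and after an operation".
// NOT the HBase API: InterceptionSketch, Hook and invoke() are hypothetical.
class InterceptionSketch {

    // User-supplied behavior that runs around each operation.
    interface Hook {
        void before(String operation, String key);
        void after(String operation, String key, String result);
    }

    private final List<Hook> hooks = new ArrayList<>();

    void register(Hook hook) {
        hooks.add(hook);
    }

    // Runs every hook's before(), then the operation itself, then every after().
    String invoke(String operation, String key, UnaryOperator<String> body) {
        for (Hook h : hooks) {
            h.before(operation, key);
        }
        String result = body.apply(key);
        for (Hook h : hooks) {
            h.after(operation, key, result);
        }
        return result;
    }

    // Registers a logging hook, performs one "get" and returns the log lines.
    static List<String> runExample() {
        final List<String> log = new ArrayList<>();
        InterceptionSketch region = new InterceptionSketch();
        region.register(new Hook() {
            @Override
            public void before(String operation, String key) {
                log.add("before " + operation + " " + key);
            }

            @Override
            public void after(String operation, String key, String result) {
                log.add("after " + operation + " " + key + " = " + result);
            }
        });
        String value = region.invoke("get", "row1", k -> "value-of-" + k);
        log.add("client saw " + value);
        return log;
    }

    public static void main(String[] args) {
        runExample().forEach(System.out::println);
    }
}
```

The operation code never refers to the hook, in the same way a trigger is invisible to the statement that fires it.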

Coprocessors run custom code directly on the region servers, which gives the following benefits –

  • A coprocessor executes its code on a per-region basis and provides functionality similar to RDBMS triggers and stored procedures (we will see how in a bit).
  • It can run aggregations such as sum() and count() on each region server and return only the aggregated result. The individual rows never have to travel to the client, which can be a large performance benefit.
  • It is useful for implementing authorization and authentication mechanisms in HBase (HBase's own AccessController is implemented as a coprocessor).
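The aggregation benefit can be sketched in plain Java. This is a model, not the HBase aggregation API. As an assumption for illustration, each list stands in for one region's rows. Each region computes a partial sum and count locally, and the client only merges one small partial result per region instead of receiving every row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of region-side aggregation; NOT the HBase API.
// RegionAggregationSketch and Partial are hypothetical names.
class RegionAggregationSketch {

    // The small partial result one region sends back to the client.
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Runs "next to the data": touches only this region's rows.
    static Partial aggregateRegion(List<Long> regionRows) {
        long sum = 0;
        for (long v : regionRows) {
            sum += v;
        }
        return new Partial(sum, regionRows.size());
    }

    // Runs on the client: merges one Partial per region.
    static Partial combine(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    static Partial aggregateTable(List<List<Long>> regions) {
        List<Partial> partials = new ArrayList<>();
        for (List<Long> region : regions) {
            partials.add(aggregateRegion(region)); // one call per region
        }
        return combine(partials);
    }

    public static void main(String[] args) {
        List<List<Long>> regions = Arrays.asList(
                Arrays.asList(1L, 2L, 3L),   // region 1
                Arrays.asList(10L, 20L),     // region 2
                Arrays.asList(100L));        // region 3
        Partial total = aggregateTable(regions);
        System.out.println("sum=" + total.sum + " count=" + total.count);
    }
}
```

Sum and count are both additive, so merging the partials gives the same answer as scanning every row on the client. For the same reason, an average should be derived from the combined sum and count, not by averaging the per-region averages.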

How to Implement it?

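Your class should either extend one of the coprocessor base classes (such as BaseRegionObserver) or implement the coprocessor interfaces directly (such as Coprocessor or CoprocessorService). The base classes provide empty implementations of every hook, so a subclass only needs to override the hooks it cares about.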
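This minimal plain-Java sketch shows why extending a base class is convenient. Observer, BaseObserver and AuditObserver are hypothetical names. BaseObserver plays the role of BaseRegionObserver by giving every hook an empty body. The start/stop methods loosely mirror the lifecycle methods of HBase's Coprocessor interface. It models the pattern only and is not the HBase API.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of "extend a base class vs. implement the interface".
// NOT the HBase API: Observer, BaseObserver and AuditObserver are hypothetical.
class BaseClassSketch {

    // Full hook interface: implementing it directly means writing every method.
    interface Observer {
        void start();               // lifecycle: called when loaded
        void stop();                // lifecycle: called when unloaded
        void preGet(String row);
        void postGet(String row);
        void prePut(String row);
        void postPut(String row);
    }

    // Base class with empty defaults, in the spirit of BaseRegionObserver.
    abstract static class BaseObserver implements Observer {
        @Override public void start() { }
        @Override public void stop() { }
        @Override public void preGet(String row) { }
        @Override public void postGet(String row) { }
        @Override public void prePut(String row) { }
        @Override public void postPut(String row) { }
    }

    // Cares only about puts, so it overrides just prePut().
    static final class AuditObserver extends BaseObserver {
        final List<String> audited = new ArrayList<>();

        @Override
        public void prePut(String row) {
            audited.add(row);
        }
    }

    // Simulates a host calling the hooks on the observer.
    static List<String> runExample() {
        AuditObserver observer = new AuditObserver();
        observer.start();
        observer.preGet("a");   // inherited no-op
        observer.prePut("b");   // overridden: recorded
        observer.postPut("b");  // inherited no-op
        observer.prePut("c");   // overridden: recorded
        observer.stop();
        return observer.audited;
    }

    public static void main(String[] args) {
        System.out.println(runExample()); // prints [b, c]
    }
}
```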

Loading the Coprocessor

There are currently two ways to load a coprocessor:

  • Static loading – from the configuration (hbase-site.xml) at startup; the coprocessor then applies to the regions of every table.
  • Dynamic loading – per table, through its HTableDescriptor, either from Java code or from the ‘hbase shell’.
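The difference between the two loading styles can be modeled in plain Java. The names are hypothetical throughout and this is not the HBase API.

* Static loading reads a comma-separated list of class names and instantiates each one by reflection, so the result applies to every table. This mirrors how hbase.coprocessor.region.classes lists region coprocessors.
* Dynamic loading attaches an observer to a single table at run time, as HTableDescriptor does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of static vs. dynamic loading; NOT the HBase API.
// Observer, HelloObserver, loadStatic and addToTable are hypothetical names.
class LoadingSketch {

    interface Observer {
        String name();
    }

    // Demo observer with a public no-arg constructor so it can be created reflectively.
    static final class HelloObserver implements Observer {
        public HelloObserver() { }

        @Override
        public String name() {
            return "hello";
        }
    }

    // Static loading: comma-separated class names from configuration
    // (in HBase, the hbase.coprocessor.region.classes property), used for every table.
    static List<Observer> loadStatic(String commaSeparatedClassNames) {
        List<Observer> loaded = new ArrayList<>();
        for (String raw : commaSeparatedClassNames.split(",")) {
            String className = raw.trim();
            if (className.isEmpty()) {
                continue; // tolerate stray commas and blanks
            }
            try {
                Class<?> cls = Class.forName(className);
                loaded.add((Observer) cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot load " + className, e);
            }
        }
        return loaded;
    }

    // Dynamic loading: observers attached to a single table at run time.
    private final Map<String, List<Observer>> perTable = new HashMap<>();

    void addToTable(String table, Observer observer) {
        perTable.computeIfAbsent(table, t -> new ArrayList<>()).add(observer);
    }

    // Observers that run for a table: the global ones plus the table's own.
    List<Observer> observersFor(String table, List<Observer> global) {
        List<Observer> all = new ArrayList<>(global);
        all.addAll(perTable.getOrDefault(table, new ArrayList<Observer>()));
        return all;
    }

    static List<String> runExample() {
        // Static: as if the configuration listed HelloObserver's class name.
        List<Observer> global = loadStatic(HelloObserver.class.getName());
        LoadingSketch host = new LoadingSketch();
        // Dynamic: attach an extra observer to the "users" table only.
        host.addToTable("users", () -> "users-only");
        List<String> names = new ArrayList<>();
        for (Observer o : host.observersFor("users", global)) {
            names.add("users:" + o.name());
        }
        for (Observer o : host.observersFor("orders", global)) {
            names.add("orders:" + o.name());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(runExample());
    }
}
```

The statically loaded observer shows up for both tables. The dynamically attached one runs only for "users".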

Let’s look at an example that uses the static loading technique. Calling the coprocessor is the easiest part. HBase invokes it transparently, so the client just issues an ordinary Get and needs no coprocessor-specific code.

We will intercept Get calls for the row key “TEST_COPROCESSOR”. Whenever a Get arrives for that particular row, we replace the value in column family “name”, column “first” with “saurzcode”. Let’s see it in action –

Create a Maven project with the HBase dependency –

 

 

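Package the coprocessor code above into a jar with mvn package. Register the class in hbase-site.xml under the hbase.coprocessor.region.classes property, and add the jar to the HBase classpath (for example through HBASE_CLASSPATH in hbase-env.sh).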

Restart HBase so that it picks up the new configuration and the jar on the classpath.

That’s it! We have created a coprocessor that intercepts Get calls for a particular row.
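The plain-Java sketch below roughly illustrates what this coprocessor does, with no HBase dependency. A toy region runs its registered observers before each get. For the row TEST_COPROCESSOR, the observer adds a cell with family "name", qualifier "first" and value "saurzcode", then bypasses the normal lookup. This reads "column family name and column first" as family name, qualifier first, which is an assumption.

In the HBase 0.94 API, the corresponding hook is BaseRegionObserver.preGet. It receives an ObserverContext, the Get and a mutable List<KeyValue> of results, and ObserverContext.bypass() skips the default processing. All class names here (Cell, Context, GetObserver, Region, TestCoprocessorObserver) are hypothetical. Bypassing is this sketch's choice, so that the value replaces the stored cells instead of being added alongside them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the preGet observer described above; NOT the HBase API.
// Cell, Context, GetObserver and Region are hypothetical stand-ins for
// KeyValue, ObserverContext, RegionObserver and a region.
class PreGetSketch {

    // One stored value: family:qualifier=value.
    static final class Cell {
        final String family;
        final String qualifier;
        final String value;

        Cell(String family, String qualifier, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return family + ":" + qualifier + "=" + value;
        }
    }

    // Lets a hook ask the region to skip its normal lookup (cf. ObserverContext.bypass()).
    static final class Context {
        private boolean bypassed;

        void bypass() {
            bypassed = true;
        }

        boolean shouldBypass() {
            return bypassed;
        }
    }

    interface GetObserver {
        void preGet(Context ctx, String row, List<Cell> results);
    }

    // The observer from the article: rewrites Gets for one specific row.
    static final class TestCoprocessorObserver implements GetObserver {
        static final String TEST_ROW = "TEST_COPROCESSOR";

        @Override
        public void preGet(Context ctx, String row, List<Cell> results) {
            if (TEST_ROW.equals(row)) {
                results.add(new Cell("name", "first", "saurzcode"));
                ctx.bypass(); // skip stored data so our value replaces it
            }
        }
    }

    // Minimal in-memory "region" that runs its observers before every get.
    static final class Region {
        private final Map<String, List<Cell>> rows = new HashMap<>();
        private final List<GetObserver> observers = new ArrayList<>();

        void put(String row, Cell cell) {
            rows.computeIfAbsent(row, r -> new ArrayList<>()).add(cell);
        }

        void addObserver(GetObserver observer) {
            observers.add(observer);
        }

        List<Cell> get(String row) {
            List<Cell> results = new ArrayList<>();
            Context ctx = new Context();
            for (GetObserver observer : observers) {
                observer.preGet(ctx, row, results);
            }
            if (!ctx.shouldBypass()) {
                results.addAll(rows.getOrDefault(row, new ArrayList<Cell>()));
            }
            return results;
        }
    }

    public static void main(String[] args) {
        Region region = new Region();
        region.put("TEST_COPROCESSOR", new Cell("name", "first", "stored"));
        region.put("other", new Cell("name", "first", "stored"));
        region.addObserver(new TestCoprocessorObserver());
        System.out.println(region.get("TEST_COPROCESSOR")); // [name:first=saurzcode]
        System.out.println(region.get("other"));            // [name:first=stored]
    }
}
```

In real coprocessor code, row keys are byte arrays. The row check would therefore compare bytes (for example with Bytes.equals) rather than strings.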

  • Bhanu Pratap

    Great article, Saurzcode. Works for me on a single node. Please give the cluster node configuration also.

    • Thanks Bhanu. I will soon post one for Cluster Node Configuration.

      • Bhanu Pratap

        That’s Great.

  • Karan Bansal

    Tried using the above for preGet and preScan, with the following dependency: groupId org.apache.hbase, artifactId hbase, version 0.94.6-cdh4.3.0.

    Get works fine, but preGet doesn’t get called. Please help!

    • saurzcode

      Hi Karan, Are you getting any errors?