How-To : Connect HiveServer2 service with JDBC Client ?

HiveServer2 (HS2) is a server interface that enables remote clients to execute queries against Hive and retrieve the results. The current implementation, based on Thrift RPC, is an improved version of HiveServer and supports multi-client concurrency and authentication. It is designed to provide better support for open API clients like JDBC and ODBC.
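Connecting over JDBC follows the standard `java.sql` pattern; the only HS2-specific parts are the driver class `org.apache.hive.jdbc.HiveDriver` (from the `hive-jdbc` jar) and the `jdbc:hive2://` URL scheme. A minimal sketch, assuming an unsecured HiveServer2 listening on its default port 10000 — the host, database, and user below are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcClient {
    public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver (shipped in the hive-jdbc jar).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // "localhost:10000/default" and "hiveuser" are placeholders;
        // point these at your own HS2 instance and credentials.
        try (Connection con = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hiveuser", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

Running this requires the `hive-jdbc` driver on the classpath and a reachable HiveServer2; with Kerberos or LDAP authentication enabled, the connection URL takes additional parameters.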

3 min read

How-To : Configure MySQL Metastore for Hive ?

How to Configure MySQL Metastore for Hive

Hive ships with Derby as its default metastore storage, which is suited only for testing; in most production scenarios it is recommended to use MySQL as the metastore instead. This is a step-by-step guide on how to configure a MySQL metastore for Hive in place of the default Derby metastore.
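The switch is driven by four JDO properties in `hive-site.xml`. A sketch, assuming a MySQL instance on `localhost` with a pre-created `metastore` database and a `hiveuser` account (all placeholders):

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>
```

The MySQL JDBC connector jar must also be placed in Hive's `lib` directory so the driver class can be loaded.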

2 min read

Java : What does finalize do and How?

Understanding the finalize Method in Java

The finalize() method in the Object class is often a point of discussion regarding whether it should be used at all. Below are some important pointers on the finalize() method:
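The core caveat is that finalize() is invoked by the garbage collector at some unspecified time, and possibly never, so it cannot be relied on for cleanup. A small sketch contrasting it with the deterministic alternative, try-with-resources (class and field names are illustrative):

```java
public class FinalizeDemo {
    // finalize() runs (at most once) when the GC decides to reclaim the
    // object; there is no guarantee it runs at all, or when.
    static class LegacyResource {
        @Override
        protected void finalize() throws Throwable {
            try {
                System.out.println("finalize() called by the garbage collector");
            } finally {
                super.finalize();
            }
        }
    }

    // The recommended alternative: deterministic cleanup via AutoCloseable.
    static class ManagedResource implements AutoCloseable {
        boolean closed = false;
        @Override
        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) {
        new LegacyResource();  // becomes eligible for GC immediately
        System.gc();           // only a hint; finalization is not guaranteed

        ManagedResource res = new ManagedResource();
        try (res) {
            // use the resource here
        }
        System.out.println("closed = " + res.closed);  // closed = true
    }
}
```

Note the asymmetry: `res.closed` is guaranteed to be true the moment the try block exits, while the finalize() message may or may not ever be printed.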

2 min read

How-To : Generate Restful API Documentation with Swagger ?

Swagger is a specification and complete framework implementation for describing, producing, consuming, and visualizing RESTful web services. The goal of Swagger is to enable client and documentation systems to update at the same pace as the server. The documentation of methods, parameters, and models is tightly integrated into the server code, allowing APIs to always stay in sync.
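At the center of this is the machine-readable API description that Swagger tooling generates and that Swagger UI renders. A minimal Swagger 2.0 document for a hypothetical one-endpoint pet API might look like:

```json
{
  "swagger": "2.0",
  "info": { "title": "Petstore API", "version": "1.0.0" },
  "basePath": "/api",
  "paths": {
    "/pets/{id}": {
      "get": {
        "summary": "Fetch a pet by id",
        "parameters": [
          { "name": "id", "in": "path", "required": true, "type": "integer" }
        ],
        "responses": {
          "200": {
            "description": "The pet",
            "schema": { "$ref": "#/definitions/Pet" }
          }
        }
      }
    }
  },
  "definitions": {
    "Pet": {
      "type": "object",
      "properties": {
        "id": { "type": "integer" },
        "name": { "type": "string" }
      }
    }
  }
}
```

In practice this JSON is not written by hand: annotations in the server code generate it, which is what keeps the documentation in sync with the implementation.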

2 min read

How-To : Setup Realtime Analytics over Logs with ELK Stack : Elasticsearch, Logstash, Kibana?

Once we know something, we find it hard to imagine what it was like not to know it.

— Chip & Dan Heath, Authors of Made to Stick, Switch


What is the ELK stack?

The ELK stack consists of the open-source tools Elasticsearch, Logstash, and Kibana. Together, these three provide a complete real-time data analytics pipeline for extracting useful insights from your data.
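The three tools divide the work: Logstash ingests and parses log lines, Elasticsearch indexes them, and Kibana visualizes the result. The wiring lives in a Logstash pipeline config; a sketch for Apache-style access logs, assuming a hypothetical log path and a local Elasticsearch on its default port 9200:

```conf
input {
  file {
    path => "/var/log/nginx/access.log"   # placeholder log file
    start_position => "beginning"
  }
}

filter {
  # Parse each raw line into structured fields (client IP, verb, status, ...)
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
```

Once events flow into Elasticsearch, Kibana dashboards can be built on top of the parsed fields without any further configuration of the pipeline.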

3 min read

Hadoop : Getting Started with Pig

What is Apache Pig?

Apache Pig is a high-level scripting platform used with Apache Hadoop. It enables data analysts to write complex data transformations without knowing Java. Its simple SQL-like scripting language, called Pig Latin, appeals to developers already familiar with scripting languages and SQL. Pig scripts are compiled into MapReduce jobs that run on data stored in HDFS.
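To give a feel for the language, here is a short Pig Latin sketch over a hypothetical tab-delimited log file in HDFS (paths, field names, and the threshold are all illustrative):

```pig
-- Load a tab-delimited log file from HDFS with a declared schema
logs = LOAD '/data/access_logs' USING PigStorage('\t')
       AS (user:chararray, url:chararray, bytes:int);

-- Group records by user and sum the bytes transferred per user
by_user = GROUP logs BY user;
totals  = FOREACH by_user GENERATE group AS user, SUM(logs.bytes) AS total_bytes;

-- Keep only heavy users and write the result back into HDFS
heavy = FILTER totals BY total_bytes > 1000000;
STORE heavy INTO '/data/heavy_users';
```

Each statement builds a relation from the previous one; when the script runs, Pig compiles the whole dataflow into one or more MapReduce jobs.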

3 min read

Top 10 Hadoop Shell Commands to manage HDFS

So you already know what Hadoop is, what it is used for, and what problems you can solve with it, and now you want to know how to work with files on HDFS? Don't worry, you are in the right place.
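As a taste of what the article covers, the everyday HDFS operations all go through `hdfs dfs` (the older `hadoop fs` form also works); the paths below are placeholders and the commands assume a running cluster:

```shell
# List the contents of an HDFS directory
hdfs dfs -ls /user/hadoop

# Create a directory and copy a local file into it
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put localfile.txt /user/hadoop/input/

# Print a file's contents to the terminal
hdfs dfs -cat /user/hadoop/input/localfile.txt

# Show space usage in human-readable units, then delete the file
hdfs dfs -du -h /user/hadoop
hdfs dfs -rm /user/hadoop/input/localfile.txt
```

The subcommands deliberately mirror their Unix namesakes (`ls`, `mkdir`, `cat`, `du`, `rm`), which makes the shell easy to pick up for anyone comfortable on a Linux command line.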

2 min read