Category: Kafka

How-To : Integrate Kafka with HDFS using Camus (Twitter Stream Example)


Simple String Example for Setting up Camus for Kafka-HDFS Data Pipeline

I came across Camus while building a Lambda Architecture framework recently. I couldn’t find a good Illustration of getting started with Kafk-HDFS pipeline ,  In this post we will see how we can use Camus to build a Kafka-HDFS data pipeline using a twitter stream produced by Kafka Producer as mentioned in last post .

What is Camus?

Camus is LinkedIn’s Kafka->HDFS pipeline. It is a mapreduce job that does distributed data loads out of Kafka. It includes the following features: (more…)


How-To : Write a Kafka Producer using Twitter Stream ( Twitter HBC Client)

Twitter opensourced it’s  Hosebird client (hbc) , a robust Java HTTP library for consuming Twitter’s Streaming API . In this post, I am going to present a demo of how we can use hbc to create a Kafka twitter stream producer , which tracks few terms on twitter statuses  and produces a kafka stream out of it, which can be utilized later for counting the terms, or putting that data from Kafka to Storm (Kafka-Storm pipeline ) or HDFS ( as we will see in next post for  using Camus API ).

You can download and run complete Sample here (more…)

%d bloggers like this: