How to Configure Spark Application (Scala and Java 8 Version with Maven) in Eclipse

A step-by-step, developer-friendly guide to setting up Apache Spark applications in Eclipse/Scala IDE using Maven, with both Java and Scala examples.

Introduction

Apache Spark is a popular big data processing engine known for its fast, in-memory computation. This guide will help you set up a Spark project in Eclipse (Scala IDE), configure Maven, and run sample Java and Scala Spark applications.

Tools and Prerequisites

- Scala IDE for Eclipse, for example Scala IDE 4.7.0 (supports both Scala and Java)
- Scala version: 2.11 (your project's Scala compiler must match this)
- Spark version: 2.2.0 (set in the Maven dependency)
- Java version: 1.8
- Maven version: 3.3.9 (embedded in Eclipse)
- winutils.exe (Windows only)
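The Java example relies on lambdas, so it needs Java 8, and Spark 2.2 no longer runs on Java 7. When Eclipse has several JREs installed, it is easy to launch with the wrong one. The sketch below is a hypothetical pre-flight check (the class name JavaVersionCheck is made up) that reads the standard java.specification.version property, which reports "1.8" on Java 8 and only the major number ("9", "11") from Java 9 on.

```java
public class JavaVersionCheck {

    /** Converts java.specification.version ("1.8", "9", "11", ...) to a major version number. */
    static int majorVersion(String specVersion) {
        // Java 8 and earlier report "1.x"; Java 9 and later report the major number directly.
        String major = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = major.indexOf('.');
        return Integer.parseInt(dot >= 0 ? major.substring(0, dot) : major);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (majorVersion(spec) != 8) {
            // Assumption: the setup in this guide (Spark 2.2, Scala 2.11) is meant to run on Java 8.
            System.err.println("Expected Java 8, but Eclipse launched Java " + spec);
        } else {
            System.out.println("Running on Java 8, as this guide assumes.");
        }
    }
}
```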

Windows Note: winutils.exe

If you run Spark on Windows, Hadoop's file-system code (which Spark uses) needs Windows-native helper binaries, and winutils.exe provides them. Set the hadoop.home.dir system property to the directory that contains bin\winutils.exe (for example, C:/hadoop), not to the bin folder itself.

Download winutils.exe
Place it at: C:/hadoop/bin/winutils.exe
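Hadoop reads its home directory from the hadoop.home.dir system property, falls back to the HADOOP_HOME environment variable, and expects winutils.exe in a bin subfolder of that directory. The sketch below checks this layout before Spark starts; it assumes the C:/hadoop location described above, and the class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HadoopHomeCheck {

    /** True if dir contains bin/winutils.exe, i.e. dir is usable as hadoop.home.dir. */
    static boolean isValidHadoopHome(Path dir) {
        return Files.isRegularFile(dir.resolve("bin").resolve("winutils.exe"));
    }

    /** hadoop.home.dir if set, otherwise the HADOOP_HOME environment variable, otherwise null. */
    static String configuredHadoopHome() {
        String fromProperty = System.getProperty("hadoop.home.dir");
        return fromProperty != null ? fromProperty : System.getenv("HADOOP_HOME");
    }

    public static void main(String[] args) {
        // Assumption: winutils.exe was placed at C:/hadoop/bin/winutils.exe.
        Path candidate = Paths.get("C:/hadoop"); // the parent of bin, not bin itself
        if (isValidHadoopHome(candidate)) {
            System.setProperty("hadoop.home.dir", candidate.toString());
        }
        String home = configuredHadoopHome();
        System.out.println("Hadoop home resolves to: " + home);
        if (home == null || !isValidHadoopHome(Paths.get(home))) {
            System.out.println("winutils.exe not found under <hadoop home>/bin");
        }
    }
}
```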

Creating a Sample Spark Application in Eclipse

Maven Project Setup

In Scala IDE, create a new Maven Project.
Replace the generated pom.xml with the following:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>com.saurzcode.spark</groupId>
	<artifactId>spark-app</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<properties>
		<!-- Compile for Java 8 so the lambdas in the Java example build -->
		<maven.compiler.source>1.8</maven.compiler.source>
		<maven.compiler.target>1.8</maven.compiler.target>
	</properties>
	<dependencies>
		<dependency> <!-- Spark core, built for Scala 2.11 -->
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-core_2.11</artifactId>
			<version>2.2.0</version>
			<scope>provided</scope>
		</dependency>
	</dependencies>
</project>

Java WordCount Example

Create a new Java class (e.g., JavaWordCount) and use the following code:

package com.saurzcode.spark;

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class JavaWordCount {
	public static void main(String[] args) throws Exception {
		String inputFile = "src/main/resources/input.txt";
		// Needed on Windows only: the folder that contains bin\winutils.exe
		System.setProperty("hadoop.home.dir", "C:/hadoop");
		// Run locally with 4 worker threads
		SparkConf conf = new SparkConf().setAppName("wordCount").setMaster("local[4]");
		// JavaSparkContext is Closeable; try-with-resources stops it even if the job fails
		try (JavaSparkContext sc = new JavaSparkContext(conf)) {
			// Load data from the input file, one element per line
			JavaRDD<String> input = sc.textFile(inputFile);
			// Split lines into words, pair each word with 1, then sum the 1s per word
			JavaPairRDD<String, Integer> counts = input
				.flatMap(line -> Arrays.asList(line.split(" ")).iterator())
				.mapToPair(word -> new Tuple2<>(word, 1))
				.reduceByKey((a, b) -> a + b);
			// collect() brings the results to the driver so they print in the Eclipse console
			System.out.println(counts.collect());
		}
	}
}
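To see what flatMap, mapToPair and reduceByKey compute without starting Spark, the same pipeline can be expressed with Java 8 streams on an in-memory list. This is only a hypothetical sanity-check helper (the class name LocalWordCount and the sample lines are made up); Spark itself distributes the work across partitions rather than running a single stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class LocalWordCount {

    /** Plain-Java equivalent of flatMap(split) -> mapToPair(word, 1) -> reduceByKey(+). */
    static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
            // flatMap: split every line on single spaces, exactly like the Spark version
            .flatMap(line -> Arrays.stream(line.split(" ")))
            // mapToPair + reduceByKey: group equal words and add 1 per occurrence
            // (a TreeMap keeps the keys sorted so the printed result is stable)
            .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.summingInt(word -> 1)));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello example", "hello world");
        System.out.println(countWords(lines)); // {example=1, hello=3, world=2}
    }
}
```

Because both versions split on a single space, two consecutive spaces produce an empty-string "word" that is counted like any other key.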

Scala WordCount Example

Create a new Scala object (e.g., ScalaWordCount) and use the following code:

package com.saurzcode.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object ScalaWordCount {
	def main(args: Array[String]): Unit = {
		// Needed on Windows only: the folder that contains bin\winutils.exe
		System.setProperty("hadoop.home.dir", "C:/hadoop")
		// Create a Spark context that runs locally with 4 worker threads
		val sc = new SparkContext(new SparkConf().setAppName("Spark WordCount").setMaster("local[4]"))
		try {
			// Load the input file, one element per line
			val lines = sc.textFile("src/main/resources/input.txt")
			// Split lines into words, pair each word with 1, then sum the 1s per word
			val counts = lines
				.flatMap(line => line.split(" "))
				.map(word => (word, 1))
				.reduceByKey(_ + _)
			// collect() brings the results to the driver so they print in the Eclipse console
			counts.collect().foreach(println)
		} finally {
			sc.stop()
		}
	}
}

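Tip: Make sure the project has the Scala nature (in Scala IDE, right-click the project and choose Configure > Add Scala Nature) and that its Scala compiler version matches the _2.11 suffix of the Spark artifact. You can set this in the project's build path (the Scala Library container) or under Properties > Scala Compiler.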
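The _2.11 suffix in the artifactId is the Scala binary version Spark was compiled against: any 2.11.x compiler is compatible with it, while 2.10 or 2.12 is not. The sketch below makes that rule explicit; the class name is hypothetical, and the compiler version "2.11.8" is an assumed example rather than a value taken from Scala IDE.

```java
public class ScalaVersionCheck {

    /** Extracts the Scala binary version from an artifactId such as "spark-core_2.11". */
    static String scalaSuffix(String artifactId) {
        int underscore = artifactId.lastIndexOf('_');
        if (underscore < 0) {
            throw new IllegalArgumentException("No Scala suffix in " + artifactId);
        }
        return artifactId.substring(underscore + 1);
    }

    /** True if a full Scala version like "2.11.8" belongs to a binary version like "2.11". */
    static boolean matches(String binaryVersion, String fullScalaVersion) {
        // Require the trailing dot so that "2.1" does not match "2.12.x"
        return fullScalaVersion.equals(binaryVersion) || fullScalaVersion.startsWith(binaryVersion + ".");
    }

    public static void main(String[] args) {
        String artifactId = "spark-core_2.11"; // from the pom.xml
        String compilerVersion = "2.11.8";     // hypothetical project compiler version
        String binary = scalaSuffix(artifactId);
        System.out.println(binary + " vs " + compilerVersion + ": "
            + (matches(binary, compilerVersion) ? "compatible" : "MISMATCH"));
    }
}
```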

Running the Code in Eclipse

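Create the input file src/main/resources/input.txt and add some text to it. Both programs read this relative path, which Eclipse resolves against the project root (the default working directory for launches).
Run the Java class as a Java Application, or the Scala object as a Scala Application, from the Run As context menu.
You should see the word count output, interleaved with Spark's log lines, in the console.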
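If the input file does not exist yet, a throwaway helper can create it. This is a minimal sketch: the class name CreateSampleInput and the three sample lines are hypothetical, chosen so that the counts line up with the sample output in the next section. It assumes it is launched from the project root, like the word count programs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class CreateSampleInput {

    // Hypothetical sample text; any plain-text file works as input.
    static final List<String> SAMPLE_LINES = Arrays.asList("hello world", "hello example", "hello world");

    /** Writes the sample lines to <projectRoot>/src/main/resources/input.txt and returns that path. */
    static Path writeSampleInput(Path projectRoot) throws IOException {
        Path input = projectRoot.resolve("src/main/resources/input.txt");
        Files.createDirectories(input.getParent());
        Files.write(input, SAMPLE_LINES, StandardCharsets.UTF_8);
        return input;
    }

    public static void main(String[] args) throws IOException {
        // Eclipse launches from the project root by default, so "" resolves to it.
        Path written = writeSampleInput(Paths.get("").toAbsolutePath());
        System.out.println("Wrote " + written);
    }
}
```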

Output

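The Scala program prints one (word,count) tuple per line, similar to:

(hello,3)
(world,2)
(example,1)
...

The Java program prints the same pairs as a single list, for example [(hello,3), (world,2), (example,1)]. In both cases the pairs are not sorted; their order depends on how Spark partitions the words.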
