Spark – How to Run Spark Applications on Windows


Whether you want to unit test your Spark Scala application with ScalaTest or simply run a Spark application on Windows, you need to apply a few basic settings first. In this post, I will explain the configuration that lets you run Spark applications smoothly on a Windows machine. Let’s get started.

First, note that you don’t need a Hadoop installation on your Windows machine to run Spark. What you do need is a way to perform POSIX-like file access operations on Windows, which winutils.exe implements using Windows APIs.

Step 1. Download the winutils.exe binary from https://github.com/steveloughran/winutils and place it in a folder such as C:/hadoop/bin. Make sure you download the build that matches the Hadoop version your Spark distribution was compiled against. You can check that Hadoop version in the POM of the Spark release you are using, for example https://search.maven.org/artifact/org.apache.spark/spark-parent_2.11/2.4.4/pom
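If you have saved the POM locally, the Hadoop version lookup from Step 1 can be sketched in a few lines of Python. This is only a sketch: it assumes the POM declares the version in a top-level `hadoop.version` property, and a build profile may override that value for a particular binary distribution. The function name `hadoop_version_from_pom` and the sample POM are hypothetical, and the version string in the sample is a placeholder rather than the value for any real Spark release.

```python
import xml.etree.ElementTree as ET

# Maven POM files declare this XML namespace on the root <project> element.
POM_NS = "http://maven.apache.org/POM/4.0.0"


def hadoop_version_from_pom(pom_xml):
    """Return the top-level <hadoop.version> property, or None if absent."""
    root = ET.fromstring(pom_xml)
    # Only look at the <properties> element directly under <project>,
    # so that values defined inside build profiles are ignored.
    props = root.find("{%s}properties" % POM_NS)
    if props is None:
        return None
    wanted = "{%s}hadoop.version" % POM_NS
    for child in props:
        if child.tag == wanted:
            return (child.text or "").strip() or None
    return None


# Hypothetical, trimmed-down POM used only to exercise the function.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <properties>
    <hadoop.version>0.0.0-placeholder</hadoop.version>
  </properties>
</project>"""

if __name__ == "__main__":
    print(hadoop_version_from_pom(SAMPLE_POM))
```

To use it on a real file, read the downloaded POM as text and pass the contents to the function, then pick the winutils folder whose Hadoop version is closest to the result.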

Step 2. Set HADOOP_HOME and PATH. Set these environment variables either through the Control Panel (available to all apps, the recommended option) or at the command prompt (for the current session only). Set HADOOP_HOME to C:/hadoop, or to whichever directory contains the bin folder where winutils.exe is located.

set HADOOP_HOME=c:/hadoop

Next, add %HADOOP_HOME%/bin to the PATH.

set PATH=%HADOOP_HOME%/bin;%PATH%
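Before launching Spark, it can help to confirm that both variables point where you expect. The Python sketch below is one possible check, not part of Spark or Hadoop: the function name `check_hadoop_home` is hypothetical, and it only verifies that HADOOP_HOME is set, that bin/winutils.exe exists under it, and that the bin folder appears on PATH.

```python
import os
from pathlib import Path


def check_hadoop_home(env=None):
    """Return a list of problems with the winutils setup; empty means none found."""
    env = os.environ if env is None else env
    home = env.get("HADOOP_HOME")
    if not home:
        return ["HADOOP_HOME is not set"]

    problems = []
    bin_dir = Path(home) / "bin"
    if not (bin_dir / "winutils.exe").is_file():
        problems.append(f"winutils.exe not found in {bin_dir}")

    # PATH entries are separated by os.pathsep (';' on Windows).
    entries = [e for e in env.get("PATH", "").split(os.pathsep) if e]
    # Normalize slashes and case so c:/hadoop/bin matches C:\hadoop\bin on Windows.
    wanted = os.path.normcase(os.path.normpath(str(bin_dir)))
    if not any(os.path.normcase(os.path.normpath(e)) == wanted for e in entries):
        problems.append(f"{bin_dir} is not on PATH")
    return problems


if __name__ == "__main__":
    issues = check_hadoop_home()
    print("Setup looks fine" if not issues else "\n".join(issues))
```

Run it from the same command prompt session in which you set the variables; if you set them through the Control Panel, open a new prompt first so the updated values are picked up.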

That’s all!

Now you can run any Spark application on your local Windows machine in IntelliJ, Eclipse, or spark-shell. Please comment below if you run into any issues!


