Download Apache Maven 3.1.1 from here (choose Maven 3.1.1). NOTE: the Apache Maven installation is an optional step; I am mentioning it here because I want to install SparkR, an R version of Spark.

Setting up the PATH variables in a Windows environment:

Remember, Spark is an engine built over Hadoop, and the official release of Hadoop 2.6 does not include the required binaries (like winutils.exe) which are required to run Hadoop. If the Path variables are not properly set up, you will not be able to start the spark shell.

- Right click on Computer, then left click on Properties.
- Under Start up & Recovery, click on the button labelled "Environment Variables". You will see the window divided into two parts: the upper part will read User variables for your username and the lower part will read System variables.
- We will create a few new system variables, so click on the "New" button under System variables.
- Set the first variable's value as the JDK path. In my case it is 'C:\Program Files\Java\jdk1.7.0_79\' (please type the path without the single quotes). In case Java is not installed on your computer, follow these steps first.
- Similarly, create a new system variable and set its value as the Python path on your computer.
- Create a new system variable for the Hadoop path. (Note: there is no need to install Hadoop; the spark shell only requires the Hadoop path, which in this case holds the value to winutils and will let us compile a spark program on a Windows environment.)
- Finally, assign the variable value as the path to your Spark binary location.
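The environment-variable setup above can be sanity-checked before launching the shell. Here is a minimal sketch in Python, assuming the conventional Spark variable names `JAVA_HOME`, `HADOOP_HOME` and `SPARK_HOME` (substitute whatever names you created in the System variables dialog; the original post does not spell them out):

```python
import os

# Conventional names Spark looks for on Windows; adjust to match
# the system variables you actually created (these are assumptions).
REQUIRED_VARS = ["JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"]

def missing_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Set these system variables before starting spark-shell:", missing)
    else:
        print("All required variables are set.")
```

If any name is reported missing, re-open the Environment Variables dialog and set it there rather than per-session, so that spark-shell picks it up from any fresh command prompt.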
Once downloaded, I unzipped the *.tar file using WinRAR to the D drive (you can unzip it to any drive on your computer). The benefit of using a pre-built binary is that you will not have to go through the trouble of building the spark binaries from scratch. Download and install Scala version 2.10.4 from here, only if you are a Scala user; otherwise this step is not required, and you also do not need to set up the Scala path as an environment variable. Download winutils.exe and place it in any location on the D drive.
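Hadoop's Windows shim is loaded from a `bin` folder under the directory your Hadoop variable points to, so it is worth checking that winutils.exe landed where the shell will look. A small sketch (the layout shown in the usage note is a hypothetical example, not from the original post):

```python
import os

def winutils_present(hadoop_home):
    """Hadoop resolves its Windows shim as <hadoop_home>\\bin\\winutils.exe,
    so the Hadoop variable must point one level *above* the bin folder."""
    return os.path.isfile(os.path.join(hadoop_home, "bin", "winutils.exe"))
```

For example, if you saved the file as D:\hadoop\bin\winutils.exe, set the Hadoop variable to D:\hadoop and `winutils_present("D:\\hadoop")` should return `True`.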
Apache Spark is a lightning-fast cluster computing engine conducive to big data processing. In order to learn how to work on it, there is currently a MOOC conducted by UC Berkeley here. However, it uses a pre-configured VM setup specific to the MOOC and its lab exercises, and I wanted to get a taste of this technology on my personal computer. I invested two days searching the internet trying to find out how to install and configure Spark on a Windows-based environment, and finally I was able to come up with the following brief steps that led me to a working installation of Apache Spark.

To install Spark on a Windows-based environment, the following prerequisites should be fulfilled first:

- If you are a Python user, install Python 2.6 or above; otherwise this step is not required, and you also do not need to set up the Python path as an environment variable.
- Download a pre-built Spark binary for Hadoop. I chose Spark release 1.2.1, package type Pre-built for Hadoop 2.3 or later, from here.
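Once the binary is unpacked and the variables are set, the quickest smoke test is a tiny job in the pyspark shell. The RDD calls in the comments below match the Spark 1.2-era API; the README path and the pure-Python helper illustrating the same filter-and-count logic are my own examples, not from the original post:

```python
def count_matching_lines(lines, needle="Spark"):
    """Pure-Python equivalent of the RDD filter/count shown below."""
    return sum(1 for line in lines if needle in line)

# Inside the pyspark shell, `sc` (the SparkContext) is created for you:
#   rdd = sc.textFile("D:/spark-1.2.1-bin-hadoop2.3/README.md")  # hypothetical path
#   rdd.filter(lambda line: "Spark" in line).count()
```

If the count comes back without errors, the shell, the Hadoop path, and the Spark binaries are all wired up correctly.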