Installing Spark on Windows 10

 

Prerequisites

  • Install and configure Hadoop (see the sketch below)
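On Windows, a working Hadoop setup usually means HADOOP_HOME is set and winutils.exe sits in its bin folder. A minimal sketch, assuming the Hadoop binaries were placed under C:\hadoop (adjust to your layout):

C:\>setx HADOOP_HOME "C:\hadoop"

winutils.exe should then be available at %HADOOP_HOME%\bin\winutils.exe.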

 

01. Download Apache Spark
  • https://spark.apache.org/downloads.html
  • Under the Download Apache Spark heading, there are two drop-down menus. Use the current non-preview version.
  • In our case, in the Choose a Spark release drop-down menu, select 2.4.5
  • In the second drop-down, Choose a package type, leave the default selection Pre-built for Apache Hadoop 2.7
  • Click the spark-2.4.5-bin-hadoop2.7.tgz link

02. Create the folder path ‘C:\Spark’ and extract the downloaded Spark archive from the ‘Downloads’ folder into ‘C:\Spark’ (one way is sketched below)
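One way to do this from the command line, assuming the tar utility bundled with recent Windows 10 builds (any archiver such as 7-Zip works just as well; --strip-components drops the top-level spark-2.4.5-bin-hadoop2.7 folder so the contents land directly in C:\Spark):

C:\>mkdir C:\Spark
C:\>tar -xzf "%USERPROFILE%\Downloads\spark-2.4.5-bin-hadoop2.7.tgz" -C C:\Spark --strip-components=1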

03. Set the ‘Environment Variable’ entries: SPARK_HOME pointing to ‘C:\Spark’, with ‘%SPARK_HOME%\bin’ added to PATH (sketched below)
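For example, from an administrator Command Prompt (a sketch, assuming Spark was extracted directly into C:\Spark as in step 02):

C:\>setx SPARK_HOME "C:\Spark"

Then add %SPARK_HOME%\bin to the PATH variable, either with another setx call or through Control Panel → System → Advanced system settings → Environment Variables.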

04. Launch Spark

  • Open a new command-prompt window using right-click and Run as administrator
  • Run the command ‘spark-shell’ from C:\Spark\bin

4.1 Finally, the Spark logo appears, and the prompt displays the Scala shell.

4.2 Open a web browser and navigate to http://localhost:4040/

4.3 To exit Spark and close the Scala shell, press ctrl-d in the command-prompt window.

4.4 Start Spark as a ‘PySpark’ shell by running the command ‘pyspark’ from C:\Spark\bin

The PySpark shell is responsible for linking the Python API to the Spark core and initializing the Spark context; a quick sanity check is sketched below.
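Once the prompt appears, the shell has already created a SparkContext named sc, so a minimal smoke test can be typed straight in:

# sc is provided by the PySpark shell itself
rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.map(lambda x: x * x).collect())  # expected output: [1, 4, 9, 16]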

5. Start Master and Slave

  1. Set up and run Spark Master and Slave on the machine (Standalone)
  • Run Master
  • Open the ‘Command Prompt’ from the path ‘C:\Spark\bin’
  • Run either of the commands below

C:\Spark\bin>spark-class2.cmd org.apache.spark.deploy.master.Master

C:\Spark\bin>spark-class org.apache.spark.deploy.master.Master

2. Run Slave

  • Open the ‘Command Prompt’ from the path ‘C:\Spark\bin’
  • Run the command below, replacing spark://10.0.0.4:7077 with the master URL printed in the master’s console output; -c sets the worker’s core count and -m its memory

C:\Spark\bin>spark-class2.cmd org.apache.spark.deploy.worker.Worker -c 1 -m 4G spark://10.0.0.4:7077

Note: Make sure both the Master and Slave Command Prompt windows stay running

6. Web GUI

  • Apache Spark provides a suite of web UIs for monitoring the status of your Spark/PySpark application, resource
    consumption of the Spark cluster, and Spark configurations.
  • The Apache Spark Web UI tabs:

— Jobs
— Stages
— Tasks
— Storage
— Environment
— Executors
— SQL

Open a web browser and navigate to http://localhost:4040/ for the application UI; by default the Master UI is served at http://localhost:8080/ and the Slave (worker) UI at http://localhost:8081/

Note: Master and Slave should be started

Create a Python program as below and save it as spark_basic.py on the desktop

  • spark_basic.py

import findspark
findspark.init(r'C:\Spark')  # point findspark at the local Spark installation

from pyspark import SparkConf
from pyspark import SparkContext

conf = SparkConf()
conf.setMaster('spark://10.0.0.4:7077')  # mention the Master node URL
conf.setAppName('spark-basic')
sc = SparkContext(conf=conf)

def mod(x):
    import numpy as np  # imported inside the function so the import also runs on the workers
    return (x, np.mod(x, 2))

rdd = sc.parallelize(range(1000)).map(mod).take(10)  # first 10 (n, n % 2) pairs
print(rdd)
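One way to run the program, assuming Python with the findspark and numpy packages is installed and the file was saved on the desktop (adjust the path to your layout):

C:\>python "%USERPROFILE%\Desktop\spark_basic.py"

It can also be handed to Spark directly via spark-submit:

C:\Spark\bin>spark-submit --master spark://10.0.0.4:7077 "%USERPROFILE%\Desktop\spark_basic.py"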

Refresh the Master Web GUI

Refresh the Slave Web GUI

Note: While running the Spark application (the code from the Python file), make sure the Master and Slave are running