Muhammad Ehsan : Installing Apache Zeppelin on a Hadoop Cluster

Installing Apache Zeppelin on a Hadoop Cluster

Apache Zeppelin (https://zeppelin.incubator.apache.org/) is a web-based notebook that enables interactive data analytics. You can make data-driven, interactive, and collaborative documents with SQL, Scala, and more.

This document describes the steps you can take to install Apache Zeppelin on a CentOS 7 machine.

Steps

Note: Run all the commands as root.

Configure the Environment

Install Maven (If not already done)

cd /tmp/

wget https://archive.apache.org/dist/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz

tar xzf apache-maven-3.1.1-bin.tar.gz -C /usr/local

cd /usr/local

ln -s apache-maven-3.1.1 maven

Configure Maven (If not already done)

#Run the following

export M2_HOME=/usr/local/maven

export M2=${M2_HOME}/bin

export PATH=${M2}:${PATH}

Note: If you log in as a different user or log out, these settings will be wiped out and you won’t be able to run any mvn commands. To prevent this, append the export statements to the end of your ~/.bashrc file:

#append the export statements

vi ~/.bashrc

#apply the export statements

source ~/.bashrc
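If you would rather not edit the file by hand, the append step can be scripted. The following is a minimal sketch, not part of the original steps: append_once is a hypothetical helper name, and it assumes the exports should go into ~/.bashrc exactly as written above. It adds a line only when that exact line is not already present, so running it twice does not duplicate entries.

```shell
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper, not a standard command.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x matches whole lines, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example usage (commented out so that nothing is modified by accident):
# append_once 'export M2_HOME=/usr/local/maven' ~/.bashrc
# append_once 'export M2=${M2_HOME}/bin' ~/.bashrc
# append_once 'export PATH=${M2}:${PATH}' ~/.bashrc
```

The single quotes keep ${M2_HOME} and ${PATH} unexpanded, so they are evaluated each time ~/.bashrc is sourced.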

Install NodeJS

Note: Steps referenced from https://nodejs.org/en/download/package-manager/

curl --silent --location https://rpm.nodesource.com/setup_5.x | bash -

yum install -y nodejs

Install Dependencies

Note: These packages are used by the Zeppelin web app.

yum install -y bzip2 fontconfig

Install Apache Zeppelin

Select the version you would like to install

View the available releases and select the latest:

https://github.com/apache/zeppelin/releases

Replace the {APACHE_ZEPPELIN_VERSION} placeholder in the commands below with the release you selected.

Download Apache Zeppelin

cd /opt/

wget https://github.com/apache/zeppelin/archive/{APACHE_ZEPPELIN_VERSION}.zip

unzip {APACHE_ZEPPELIN_VERSION}.zip

ln -s /opt/zeppelin-{APACHE_ZEPPELIN_VERSION-WITHOUT_V_INFRONT} /opt/zeppelin

rm {APACHE_ZEPPELIN_VERSION}.zip
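The archive is named after the release tag, while the unpacked directory drops the leading "v". A small sketch of that relationship, assuming a tag of the form v0.6.2 (used here only as an illustration) and a hypothetical helper named strip_v:

```shell
# Strip one leading "v" from a release tag to get the directory suffix.
# strip_v is a hypothetical helper name.
strip_v() {
    printf '%s\n' "${1#v}"
}

# "v0.6.2" is only an illustrative tag; substitute the release you selected.
tag="v0.6.2"
echo "archive:   ${tag}.zip"
echo "directory: /opt/zeppelin-$(strip_v "$tag")"
```

If the tag has no leading "v", the helper returns it unchanged.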

Get Build Variable Values

Get Spark Version

Run the following command:

spark-submit --version

Replace the {SPARK_VERSION} placeholder with this value.

Example: 1.6.0

Get Hadoop Version

Run the following command:

hadoop version

Replace the {HADOOP_VERSION} placeholder with this value.

Example: 2.6.0-cdh5.9.0

Take this value and extract the major and minor version of Hadoop. Replace the {SIMPLE_HADOOP_VERSION} placeholder with the result.

Example: 2.6
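The major.minor value can also be derived in the shell instead of by eye. This is a minimal sketch; simple_hadoop_version is a hypothetical helper name, and the input is the example version string shown above.

```shell
# Reduce a full Hadoop version string to "major.minor".
# simple_hadoop_version is a hypothetical helper name.
simple_hadoop_version() {
    # Keep only the first two dot-separated fields.
    printf '%s\n' "$1" | cut -d. -f1,2
}

simple_hadoop_version "2.6.0-cdh5.9.0"   # prints: 2.6

# Assumption: the first line of `hadoop version` reads "Hadoop <version>".
# If so, the full version could be captured with:
# HADOOP_VERSION=$(hadoop version | awk 'NR==1 {print $2}')
```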

Build Apache Zeppelin

Update the placeholders below and run:

cd /opt/zeppelin

mvn clean package -Pspark-{SPARK_VERSION} -Dhadoop.version={HADOOP_VERSION} -Phadoop-{SIMPLE_HADOOP_VERSION} -Pvendor-repo -DskipTests

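Note: This process will take a while. Zeppelin's Spark build profiles are usually named by major.minor version (for example, -Pspark-1.6), so if Maven warns that the requested Spark profile could not be activated, use the major.minor form of the Spark version in the -Pspark- option.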
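To avoid typos when filling in the three placeholders, the command line can be assembled from variables and reviewed before it is run. The sketch below only prints the command; build_cmd is a hypothetical helper name, the option layout mirrors the mvn command given in this section, and the arguments are the example values from the previous steps.

```shell
# Print the Maven build command for the given versions without running it.
# build_cmd is a hypothetical helper; arguments are:
#   $1 = Spark version, $2 = full Hadoop version, $3 = major.minor Hadoop version
build_cmd() {
    spark=$1
    hadoop=$2
    simple_hadoop=$3
    printf 'mvn clean package -Pspark-%s -Dhadoop.version=%s -Phadoop-%s -Pvendor-repo -DskipTests\n' \
        "$spark" "$hadoop" "$simple_hadoop"
}

# Example values taken from the steps above; replace them with your own.
build_cmd "1.6.0" "2.6.0-cdh5.9.0" "2.6"
```

Once the printed line looks right, it can be pasted into the shell from /opt/zeppelin.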

Configure Apache Zeppelin

Base Zeppelin Configuration

Setup Conf

cd /opt/zeppelin/conf/

cp zeppelin-env.sh.template zeppelin-env.sh

cp zeppelin-site.xml.template zeppelin-site.xml

Setup Hive Conf

# note: verify that the path to your hive-site.xml is correct

ln -s /etc/hive/conf/hive-site.xml /opt/zeppelin/conf/

Edit zeppelin-env.sh

Uncomment the export HADOOP_CONF_DIR line.

Set it to export HADOOP_CONF_DIR="/etc/hadoop/conf" (use plain straight quotes; curly quotes will not work in a shell script).
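The same edit can be made without opening an editor. This is a sketch under one assumption: that the template contains a line starting with "# export HADOOP_CONF_DIR" (with or without the "#"). If no such line is found, the setting is appended instead. set_hadoop_conf_dir is a hypothetical helper name.

```shell
# Uncomment and set HADOOP_CONF_DIR in a zeppelin-env.sh style file.
# set_hadoop_conf_dir is a hypothetical helper; $1 = file, $2 = directory.
# The directory must not contain "|" or "&", which are special to sed here.
set_hadoop_conf_dir() {
    file=$1
    dir=$2
    setting="export HADOOP_CONF_DIR=\"$dir\""
    if grep -q '^#* *export HADOOP_CONF_DIR' "$file"; then
        # Replace the existing (possibly commented) line with the new setting.
        sed "s|^#* *export HADOOP_CONF_DIR.*|$setting|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp"
    else
        # No such line in the file: append the setting at the end.
        printf '%s\n' "$setting" >> "$file"
    fi
}

# Example usage (commented out so that nothing is modified by accident):
# set_hadoop_conf_dir /opt/zeppelin/conf/zeppelin-env.sh /etc/hadoop/conf
```

Writing the result back with cat rather than mv keeps the original file's permissions.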

Starting/Stopping Apache Zeppelin

Start Zeppelin

/opt/zeppelin/bin/zeppelin-daemon.sh start

Restart Zeppelin

/opt/zeppelin/bin/zeppelin-daemon.sh restart

Stop Zeppelin

/opt/zeppelin/bin/zeppelin-daemon.sh stop

Viewing Web UI

Once the Zeppelin process is running, you can view the web UI by opening a web browser and navigating to:

http://{HOST}:8080/

Note: Network and firewall rules must allow traffic to port 8080 on this host.

Runtime Apache Zeppelin Configuration

Further configuration may be needed for certain operations to work.

Configure Hive in Zeppelin

Open Cloudera Manager and get the public host name of the machine that has the HiveServer2 role. This is the value for the {HIVESERVER2_HOST} placeholder.

Open the Zeppelin web UI and click the Interpreter tab.

Change the Hive default.url option to: jdbc:hive2://{HIVESERVER2_HOST}:10000
