Saturday, October 17, 2020

Cloudera HUE: Connecting HUE with Spark Livy Cluster

Hue is a web-based interface for the Hadoop environment and its ecosystem tools. The Hue notebook development environment is used for Hive, Pig, Spark, Impala, HBase, and more.

Here we are going to connect Apache Spark with the Hue web-based interface and run a Spark job from Hue.


Tools Required:

      Apache Spark 2.2.0

      Hue 4.1.0

      Apache Livy server 0.5.0

Here we assume Spark and Hue are pre-installed on the system.

Apache Livy server:

The Livy Spark server is a RESTful API for Apache Spark, giving the user remote interaction with an Apache Spark cluster.

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, as well as Spark Context management, all via a simple REST interface or an RPC client library. Apache Livy also simplifies the interaction between Spark and application servers, thus enabling the use of Spark for interactive web/mobile applications (a minimal sketch of the REST workflow follows the feature list below). Additional features include:

      Have long-running Spark Contexts that can be used for multiple Spark jobs, by multiple clients

      Share cached RDDs or DataFrames across multiple jobs and clients

      Multiple Spark Contexts can be managed simultaneously, and the Spark Contexts run on the cluster (YARN/Mesos) instead of on the Livy server, for good fault tolerance and concurrency

      Jobs can be submitted as precompiled jars, snippets of code, or via the Java/Scala client API

      Ensure security via secure authenticated communication
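
A minimal sketch of that REST workflow in Python, assuming Livy is reachable at its default address localhost:8998 (which we configure later in this post) and that the third-party requests library is installed:

import time
import requests

LIVY_URL = "http://localhost:8998"

# Create an interactive PySpark session.
session = requests.post(LIVY_URL + "/sessions", json={"kind": "pyspark"}).json()
session_url = "{}/sessions/{}".format(LIVY_URL, session["id"])

# Wait until the session is ready to accept statements.
while requests.get(session_url).json()["state"] != "idle":
    time.sleep(2)

# Submit a snippet of Spark code and poll for its result.
stmt = requests.post(session_url + "/statements",
                     json={"code": "print(sc.version)"}).json()
stmt_url = "{}/statements/{}".format(session_url, stmt["id"])
while True:
    result = requests.get(stmt_url).json()
    if result["state"] == "available":
        print(result["output"])  # stdout of the statement, e.g. the Spark version
        break
    time.sleep(1)

# Clean up the session when done.
requests.delete(session_url)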


Step 1: Download Apache Livy from the link below:

https://www.apache.org/dyn/closer.lua/incubator/livy/0.5.0-incubating/livy-0.5.0-incubating-bin.zip

Make a directory in /usr/local with a proper name and unzip the archive into that directory.

             sudo mkdir /usr/local/livy

             unzip livy-0.5.0-incubating-bin.zip -d /usr/local/livy

 

Step 2: Add the following lines to the .bashrc file on your system.

Command: sudo gedit ~/.bashrc

#set environment variables for Livy

export LIVY_HOME=/usr/local/livy/livy-0.5.0-incubating-bin

export LIVY_LOG_DIR=$LIVY_HOME/logs

export PATH=$PATH:$LIVY_HOME/bin

export SPARK_HOME=/usr/local/spark    # adjust to your Spark installation path

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export HUE_SECRET_KEY=<your hue superuser passkey>

And run the source ~/.bashrc command.

 

Step 3: Make a directory in $LIVY_HOME to save logs.

            sudo mkdir $LIVY_HOME/logs

 

Step 4: Open the hue.ini file at $HUE_HOME/desktop/conf/hue.ini and add the below parameters to that file.

 

[spark]

  # Host address of the Livy Server.

  livy_server_host=localhost

 

  # Port of the Livy Server.

  livy_server_port=8998

 

  # Configure Livy to start in local 'process' mode, or 'yarn' workers.

  livy_server_session_kind=yarn

 

  # Whether Livy requires the client to perform Kerberos authentication.

  security_enabled=false

 

  # Host of the SQL Server

  sql_server_host=localhost

 

  # Port of the SQL Server

  sql_server_port=10000

 

Save the hue.ini file.

 

Step 5: Run the Livy server before starting the Hue server.

            $LIVY_HOME/bin/livy-server

And start the Hue server after it.
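
Before opening Hue, you can confirm that Livy is up by querying its /sessions endpoint; a minimal check in Python, assuming the default localhost:8998 address:

import requests

# Livy listens on port 8998 by default; an empty session list confirms it is up.
resp = requests.get("http://localhost:8998/sessions")
print(resp.status_code)   # expect 200
print(resp.json())        # e.g. {'from': 0, 'total': 0, 'sessions': []}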

 

Step 6: Now open localhost:8000 and log in to Hue with your user ID and password.

Go to Query > Editor > Notebook


And write a print statement to check Spark.
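
For example, a minimal PySpark snippet to run in the notebook (Livy's PySpark sessions expose a ready-made SparkContext named sc):

# sc is the SparkContext that Livy provides inside a PySpark notebook session
print("Spark version:", sc.version)

# a tiny job to confirm the cluster actually executes work
rdd = sc.parallelize(range(100))
print("sum 0..99 =", rdd.sum())   # expect 4950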


Congrats, now here you can write your PySpark, Scala, and R code for Spark.

spark-submit style jobs can also be run from it, via Livy's batch API.
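
A sketch of such a batch submission (the REST counterpart of spark-submit) posted to Livy's /batches endpoint, where the jar path and class name are hypothetical placeholders for your own application:

import requests

# POST to Livy's /batches endpoint: the REST counterpart of spark-submit.
payload = {
    "file": "/user/hue/jars/my-spark-app.jar",   # application jar on HDFS (placeholder)
    "className": "com.example.MySparkApp",       # main class (placeholder)
    "args": ["arg1", "arg2"],
}
resp = requests.post("http://localhost:8998/batches", json=payload)
print(resp.json())   # contains the batch id and its current state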

 

 
