Saturday, October 17, 2020

Cloudera HUE: Connecting HUE with Spark Livy Cluster

Hue is a web-based interface for the Hadoop environment and its ecosystem tools. The Hue notebook development environment is used for Hive, Pig, Spark, Impala, HBase, and more.

Here we are going to connect Apache Spark with the Hue web-based interface and run a Spark job from Hue.


Tools Required:

      Apache Spark 2.2.0

      Hue 4.1.0

      Apache Livy server 0.5.0

Here we assume Spark and Hue are pre-installed on the system.

Apache Livy server:

The Livy Spark server is a RESTful API for Apache Spark, giving the user remote interaction with an Apache Spark cluster.

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, as well as Spark Context management, all via a simple REST interface or an RPC client library. Apache Livy also simplifies the interaction between Spark and application servers, thus enabling the use of Spark for interactive web/mobile applications (a minimal sketch of the REST workflow follows the feature list below). Additional features include:

      Have long-running Spark Contexts that can be used for multiple Spark jobs, by multiple clients

      Share cached RDDs or DataFrames across multiple jobs and clients

      Multiple Spark Contexts can be managed simultaneously, and the Spark Contexts run on the cluster (YARN/Mesos) instead of on the Livy server, for good fault tolerance and concurrency

      Jobs can be submitted as precompiled jars, snippets of code, or via the Java/Scala client API

      Ensure security via secure authenticated communication
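
A minimal sketch of that REST workflow in Python, assuming Livy is reachable at its default address localhost:8998 (which we configure later in this post) and that the third-party requests library is installed:

import time
import requests

LIVY_URL = "http://localhost:8998"

# Create an interactive PySpark session.
session = requests.post(LIVY_URL + "/sessions", json={"kind": "pyspark"}).json()
session_url = "{}/sessions/{}".format(LIVY_URL, session["id"])

# Wait until the session is ready to accept statements.
while requests.get(session_url).json()["state"] != "idle":
    time.sleep(2)

# Submit a snippet of Spark code and poll for its result.
stmt = requests.post(session_url + "/statements",
                     json={"code": "print(sc.version)"}).json()
stmt_url = "{}/statements/{}".format(session_url, stmt["id"])
while True:
    result = requests.get(stmt_url).json()
    if result["state"] == "available":
        print(result["output"])  # stdout of the statement, e.g. the Spark version
        break
    time.sleep(1)

# Clean up the session when done.
requests.delete(session_url)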


Step 1: Download Apache Livy from the link below:

https://www.apache.org/dyn/closer.lua/incubator/livy/0.5.0-incubating/livy-0.5.0-incubating-bin.zip

Make a directory in /usr/local with a proper name and unzip the archive into that directory.

             sudo mkdir /usr/local/livy

             unzip livy-0.5.0-incubating-bin.zip -d /usr/local/livy

 

Step 2: Add the following lines to the .bashrc file on your system.

Command: sudo gedit ~/.bashrc

#set environment variables for Livy

export LIVY_HOME=/usr/local/livy/livy-0.5.0-incubating-bin

export LIVY_LOG_DIR=$LIVY_HOME/logs

export PATH=$PATH:$LIVY_HOME/bin

export SPARK_HOME=/usr/local/spark    # adjust to your Spark installation path

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export HUE_SECRET_KEY=<your hue superuser passkey>

And run the source ~/.bashrc command.

 

Step 3: Make a directory in $LIVY_HOME to save logs.

            sudo mkdir $LIVY_HOME/logs

 

Step 4: Open the hue.ini file at $HUE_HOME/desktop/conf/hue.ini and add the below parameters to that file.

 

[spark]

  # Host address of the Livy Server.

  livy_server_host=localhost

 

  # Port of the Livy Server.

  livy_server_port=8998

 

  # Configure Livy to start in local 'process' mode, or 'yarn' workers.

  livy_server_session_kind=yarn

 

  # Whether Livy requires the client to perform Kerberos authentication.

  security_enabled=false

 

  # Host of the SQL Server

  sql_server_host=localhost

 

  # Port of the SQL Server

  sql_server_port=10000

 

Save the hue.ini file.

 

Step 5: Run the Livy server before starting the Hue server.

            $LIVY_HOME/bin/livy-server

And start the Hue server after it.
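
Before opening Hue, you can confirm that Livy is up by querying its /sessions endpoint; a minimal check in Python, assuming the default localhost:8998 address:

import requests

# Livy listens on port 8998 by default; an empty session list confirms it is up.
resp = requests.get("http://localhost:8998/sessions")
print(resp.status_code)   # expect 200
print(resp.json())        # e.g. {'from': 0, 'total': 0, 'sessions': []}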

 

Step 6: Now open localhost:8000 and log in to Hue with your user ID and password.

Go to Query > Editor > Notebook


And write a print statement to check Spark.
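
For example, a minimal PySpark snippet to run in the notebook (Livy's PySpark sessions expose a ready-made SparkContext named sc):

# sc is the SparkContext that Livy provides inside a PySpark notebook session
print("Spark version:", sc.version)

# a tiny job to confirm the cluster actually executes work
rdd = sc.parallelize(range(100))
print("sum 0..99 =", rdd.sum())   # expect 4950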


Congrats, now here you can write your PySpark, Scala, and R code for Spark.

spark-submit style jobs can also be run from it, via Livy's batch API.
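
A sketch of such a batch submission (the REST counterpart of spark-submit) posted to Livy's /batches endpoint, where the jar path and class name are hypothetical placeholders for your own application:

import requests

# POST to Livy's /batches endpoint: the REST counterpart of spark-submit.
payload = {
    "file": "/user/hue/jars/my-spark-app.jar",   # application jar on HDFS (placeholder)
    "className": "com.example.MySparkApp",       # main class (placeholder)
    "args": ["arg1", "arg2"],
}
resp = requests.post("http://localhost:8998/batches", json=payload)
print(resp.json())   # contains the batch id and its current state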

 

 
