Hue is a web-based interface for the Hadoop environment and its ecosystem tools. The Hue notebook development environment can be used with Hive, Pig, Spark, Impala, HBase, and more.
Here we are going to connect Apache Spark with the Hue web interface and run a Spark job from Hue.
Tools Required:
● Apache Spark (2.2.0)
● Hue (4.1.0)
● Apache Livy server (0.5.0)
Here we assume Spark and Hue are already installed on the system.
Apache Livy server:
Livy is a RESTful server for Apache Spark that gives users remote interaction with a Spark cluster.
Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, as well as Spark context management, all via a simple REST interface or an RPC client library. Apache Livy also simplifies the interaction between Spark and application servers, thus enabling the use of Spark for interactive web/mobile applications. Additional features include:
● Long-running Spark contexts that can be used for multiple Spark jobs, by multiple clients
● Cached RDDs or DataFrames can be shared across multiple jobs and clients
● Multiple Spark contexts can be managed simultaneously, and they run on the cluster (YARN/Mesos) instead of the Livy server, for good fault tolerance and concurrency
● Jobs can be submitted as precompiled jars, snippets of code, or via the Java/Scala client API
● Security is ensured via secure, authenticated communication
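To make the REST interface concrete, here is a minimal Python sketch that creates an interactive PySpark session and runs a single statement through Livy. It assumes Livy is listening on localhost:8998 (its default port, and the one configured for Hue below); the /sessions and /statements endpoints are part of the standard Livy REST API, but treat the snippet as an illustration rather than production code.

import json
import time
import requests

LIVY_URL = "http://localhost:8998"            # assumed default Livy host/port
HEADERS = {"Content-Type": "application/json"}

# 1. Create an interactive PySpark session.
session = requests.post(LIVY_URL + "/sessions",
                        data=json.dumps({"kind": "pyspark"}),
                        headers=HEADERS).json()
session_url = "{}/sessions/{}".format(LIVY_URL, session["id"])

# 2. Wait until the session is ready to accept code.
while requests.get(session_url, headers=HEADERS).json()["state"] != "idle":
    time.sleep(2)

# 3. Submit a snippet of code and poll for its result.
stmt = requests.post(session_url + "/statements",
                     data=json.dumps({"code": "1 + 1"}),
                     headers=HEADERS).json()
stmt_url = "{}/statements/{}".format(session_url, stmt["id"])
while stmt["state"] != "available":
    time.sleep(1)
    stmt = requests.get(stmt_url, headers=HEADERS).json()
print(stmt["output"])    # e.g. {'status': 'ok', ..., 'data': {'text/plain': '2'}}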
Step 1: Download Apache Livy from the link below:
https://www.apache.org/dyn/closer.lua/incubator/livy/0.5.0-incubating/livy-0.5.0-incubating-bin.zip
Create a directory in /usr/local with a suitable name and unzip the archive into it.
sudo mkdir /usr/local/livy
sudo unzip livy-0.5.0-incubating-bin.zip -d /usr/local/livy
Step 2: Add the following environment variables to the .bashrc file on your system.
Command: sudo gedit ~/.bashrc
# set environment variables for Livy
export LIVY_HOME=/usr/local/livy/livy-0.5.0-incubating-bin
export LIVY_LOG_DIR=$LIVY_HOME/logs
export PATH=$PATH:$LIVY_HOME/bin
# SPARK_HOME and HADOOP_HOME are assumed to be set already by your Spark/Hadoop installation
export SPARK_HOME=$SPARK_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HUE_SECRET_KEY='<hue superuser passkey>'
Then run the command: source ~/.bashrc
Step 3: Make a directory in $LIVY_HOME to store the logs.
sudo mkdir $LIVY_HOME/logs
Step 4: Open the hue.ini file at $HUE_HOME/desktop/conf/hue.ini and add the parameters below to it.
[spark]
# Host address of the Livy Server.
livy_server_host=localhost
# Port of the Livy Server.
livy_server_port=8998
# Configure Livy to start in local 'process' mode, or 'yarn' workers.
livy_server_session_kind=yarn
# Whether Livy requires the client to perform Kerberos authentication.
security_enabled=false
security_enabled=false
# Host of the SQL Server
sql_server_host=localhost
# Port of the SQL Server
sql_server_port=10000
Save the hue.ini file.
Step 5: Run the Livy server before starting the Hue server.
$LIVY_HOME/bin/livy-server
Then start the Hue server.
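Before starting Hue, you can optionally verify that Livy is up by querying its REST endpoint; this minimal sketch assumes the default port 8998 and uses the standard /sessions listing.

import requests

# A fresh Livy server should answer with HTTP 200 and an empty session list.
resp = requests.get("http://localhost:8998/sessions")
print(resp.status_code)   # 200 when Livy is running
print(resp.json())        # e.g. {'from': 0, 'total': 0, 'sessions': []}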
Step 6: Now open localhost:8000 and log into Hue with your user ID and password.
Go to Query > Editor > Notebook
and write a small print statement to check Spark, for example the snippet below.
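A minimal PySpark snippet you could paste into the Hue notebook, assuming the Livy session exposes the usual sc SparkContext:

# Quick check that the Spark context provided by the Livy session works.
print(sc.version)

# A tiny job: sum the numbers 1..10 on the cluster.
rdd = sc.parallelize(range(1, 11))
print(rdd.sum())    # expected output: 55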
Congrats, now here you can write your PySpark, Scala, and R code for Spark.
Batch jobs (the spark-submit equivalent) can also be submitted through Livy, as sketched below.
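For completeness, Livy's /batches endpoint is the REST counterpart of spark-submit: it runs a packaged application instead of an interactive snippet. The file path below is a hypothetical example; point it at your own script or jar on a path the cluster can read.

import json
import requests

payload = {
    "file": "hdfs:///user/hue/jobs/pi.py",   # hypothetical PySpark application
    "name": "pi-from-livy",
}
resp = requests.post("http://localhost:8998/batches",
                     data=json.dumps(payload),
                     headers={"Content-Type": "application/json"})
print(resp.json())    # batch id and state, e.g. {'id': 0, 'state': 'starting', ...}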