
Sunday, September 9, 2018

Big Data: Apache Tez Execution engine on Apache Hive




What is Tez?
Tez is a new application framework built on Hadoop YARN that can execute complex directed acyclic graphs (DAGs) of general data processing tasks. In many ways, it can be considered a much more flexible and powerful successor to the MapReduce framework.
Tez provides developers with an API framework to write native YARN applications on Hadoop that bridge the spectrum of interactive and batch workloads. It allows those data access applications to work with petabytes of data over thousands of nodes.
In simple terms, Tez is a processing engine on top of the Hadoop ecosystem that runs on YARN and performs well within mixed-workload clusters.

Tools information:
The tools used in this project are listed below.
  • Hadoop: 2.7.5 version
  • Hive: 2.2.0 version
  • Tez: 0.9.0 version
  • Protobuf: 2.5.0 version

Tez Installation

Step 1: After installing Maven, install Protocol Buffers 2.5.0 or a higher version. You can download protobuf-2.5.0 from the following link:
By default, it will be downloaded into your ‘Downloads’ folder.


Step 2: Now, untar the file using the below command:
tar -xvf protobuf-2.5.0.tar.gz
Now, you can view the extracted folder in the ‘Downloads’ folder itself.
Step 3: Next, let’s move the protobuf-2.5.0 folder into your desired location. Here, we are moving it into the home directory.
mv protobuf-2.5.0 $HOME/


Step 4: Now, change into the protobuf-2.5.0 folder (which is now in the home directory) using the command:
cd $HOME/protobuf-2.5.0


Step 5: Now type the below commands to configure Protocol Buffers.
sudo apt-get install autoconf autogen
./autogen.sh
./configure --prefix=/usr/local
Step 6: Execute the make command
make
Once configure has done its job, we can invoke make to build the software. This runs a series of tasks defined in a Makefile to build the finished program from its source code.
Step 7: Type the make install command
sudo make install
Now that the software is built and ready to run, the files can be copied to their final destinations. The make install command copies the built program, along with its libraries and documentation, to the correct locations.
It will take some time for the process to complete.
Step 8: Now, verify that Protocol Buffers is installed with the below command:
protoc --version
Output: libprotoc 2.5.0
Note: If you see an error like the one below:
ERROR: protoc: error while loading shared libraries: libprotoc.so.8: cannot open shared object file: No such file or directory
then run this command: export LD_LIBRARY_PATH=/usr/local/lib
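The export above only applies to the current shell and overwrites any existing value. As a minimal sketch, assuming a POSIX-style shell, the snippet below prepends /usr/local/lib only when it is not already listed. The helper name prepend_path is hypothetical and not part of protobuf or Hadoop.

```shell
# prepend_path (hypothetical helper): $1 = directory, $2 = colon-separated list.
# Prints the list with the directory prepended, unless it is already present.
prepend_path() {
  dir="$1"
  list="$2"
  case ":${list}:" in
    *":${dir}:"*) printf '%s\n' "$list" ;;            # already present: unchanged
    *) printf '%s\n' "${dir}${list:+:${list}}" ;;     # prepend without a trailing colon
  esac
}

LD_LIBRARY_PATH="$(prepend_path /usr/local/lib "${LD_LIBRARY_PATH:-}")"
export LD_LIBRARY_PATH

# Only query protoc if it is actually on the PATH.
if command -v protoc >/dev/null 2>&1; then
  protoc --version || echo "protoc was found but failed to run"
fi
```

To keep the setting across sessions, the same lines could be placed in a shell startup file such as ~/.bashrc.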
We have successfully installed the protocol buffer! Let’s install Apache Tez now.
Step 9: You can download Apache Tez with the following command:
wget http://redrockdigimark.com/apachemirror/tez/0.9.0/apache-tez-0.9.0-bin.tar.gz
wget saves the archive into the current directory. You can move it to your desired location before extracting it.
Step 10: Now, untar the file using the below command:
tar -xvf apache-tez-0.9.0-bin.tar.gz
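Step 11: Now move this folder to a permanent location such as /usr/local or $HOME. Here, we are moving it into /usr/local.
sudo mv apache-tez-0.9.0-bin /usr/local
Also create a new directory in HDFS and copy the untarred Tez folder into it, so that it ends up at /app/tez/apache-tez-0.9.0-bin.
hdfs dfs -mkdir -p /app/tez
hdfs dfs -copyFromLocal /usr/local/apache-tez-0.9.0-bin /app/tez/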
Step 12: Now edit tez-site.xml and mapred-site.xml in the tez/conf directory as below (in both the local and HDFS Tez folders). If either file doesn’t exist there, create it first.
  • tez-site.xml (edit the below property in tez-site.xml)
<property>
   <name>tez.lib.uris</name>
   <value>hdfs://localhost:9000/app/tez/apache-tez-0.9.0-bin/share/tez.tar.gz</value>
   <type>string</type>
</property>
  • mapred-site.xml (add this property in mapred-site.xml)
<configuration>
<property>
   <name>mapreduce.framework.name</name>
   <value>yarn-tez</value>
</property>
</configuration>
After making these changes, save both files.
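To confirm that a saved file really contains the value you expect, the property can be read back from the XML. The following is a rough sketch using awk, not a full XML parser: it assumes each name and value tag sits on a single line, as in the fragments above. The helper get_property is a hypothetical name, and the sample file is written to a temporary directory only for illustration.

```shell
# get_property (hypothetical helper): $1 = Hadoop-style XML file, $2 = property name.
# Prints the first <value> that follows the matching <name>, or nothing if absent.
get_property() {
  awk -v want="$2" '
    /<name>/ {
      line = $0
      sub(/.*<name>/, "", line); sub(/<\/name>.*/, "", line)
      found = (line == want)
    }
    found && /<value>/ {
      line = $0
      sub(/.*<value>/, "", line); sub(/<\/value>.*/, "", line)
      print line
      exit
    }
  ' "$1"
}

# Illustration only: write a sample tez-site.xml into a temporary directory.
conf_dir="$(mktemp -d)"
cat > "$conf_dir/tez-site.xml" <<'EOF'
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>hdfs://localhost:9000/app/tez/apache-tez-0.9.0-bin/share/tez.tar.gz</value>
  </property>
</configuration>
EOF

get_property "$conf_dir/tez-site.xml" tez.lib.uris
# prints hdfs://localhost:9000/app/tez/apache-tez-0.9.0-bin/share/tez.tar.gz
```

Against the real installation you would point it at /usr/local/apache-tez-0.9.0-bin/conf/tez-site.xml instead of the temporary file.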
Step 13: Add the below properties to the files in the Hadoop configuration directory (hadoop/etc/hadoop/).
  • hadoop-env.sh (add these lines to the hadoop-env.sh file)
# for tez engine
export TEZ_CONF_DIR=/usr/local/apache-tez-0.9.0-bin/conf
export TEZ_JARS=/usr/local/apache-tez-0.9.0-bin
export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*:${HADOOP_CLASSPATH}:${JAVA_JDBC_LIBS}:${MAPREDUCE_LIBS}


  • yarn-site.xml (add these properties to the yarn-site.xml file)
<property>
   <name>yarn.nodemanager.vmem-check-enabled</name>
   <value>false</value>
   <description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
   <name>yarn.nodemanager.vmem-pmem-ratio</name>
   <value>4</value>
   <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
Note: Setting yarn.nodemanager.vmem-check-enabled to false stops YARN from enforcing the virtual memory limit on containers, so a container that exceeds it is no longer killed.
If we don’t do that, we may face an error like the one below while running Tez as the Hive execution engine.
ErrorCode: failed execution error return code 1 from org.apache.hadoop.hive.ql.exec.tez.teztask
Container [pid=22291,containerID=container_1521621072290_0018_02_000001] is running beyond virtual memory limits. Current usage: 62.0 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used. Killing container.
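The 2.1 GB figure in this message follows from the container's 1 GB physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, whose default value is 2.1. The arithmetic can be sketched as below. The helper name vmem_limit_mb is hypothetical, and the truncation to whole megabytes is a simplification of how YARN computes the limit internally.

```shell
# vmem_limit_mb (hypothetical helper): $1 = container physical memory in MB,
# $2 = vmem-pmem ratio. Prints the resulting virtual memory limit in whole MB.
vmem_limit_mb() {
  awk -v pmem="$1" -v ratio="$2" 'BEGIN { printf "%d\n", pmem * ratio }'
}

vmem_limit_mb 1024 2.1   # default ratio: prints 2150 (about 2.1 GB)
vmem_limit_mb 1024 4     # ratio set in yarn-site.xml above: prints 4096 (4 GB)
```

With a ratio of 4, the 2.4 GB of virtual memory reported in the error would fit within the 4 GB limit even if the check were left enabled.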
Step 14: Add the Hive lib jars to Tez, both locally and in HDFS (to use the Tez engine in Hive, the Hive lib jars must also be present in tez/lib).
To add the Hive lib jars to the local Tez folder:
sudo cp /usr/local/hive/lib/*.jar /usr/local/apache-tez-0.9.0-bin/lib/
To add the Hive lib jars to the Tez folder in HDFS:
hdfs dfs -copyFromLocal -f /usr/local/hive/lib/*.jar /app/tez/apache-tez-0.9.0-bin/lib/
All the installation steps are now complete, and the Tez engine will work on top of Hadoop.
Now it’s time to test the Tez engine:
To check our Tez engine, we will test a query in Hive.
Follow these commands:
Step 1: Start the Hadoop environment
start-all.sh
Then check that everything is running with the jps command.
Command: jps
Output:
7074 RunJar
8805 NameNode
9766 Jps
9432 NodeManager
9146 SecondaryNameNode
9309 ResourceManager
8926 DataNode
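Rather than reading the jps listing by eye, the check can be scripted. The sketch below assumes the five daemons shown in the output above are the ones required on this single-node setup. The helper missing_daemons and the variable EXPECTED_DAEMONS are hypothetical names introduced for illustration.

```shell
# Daemons this walkthrough expects on a single-node cluster (taken from the jps listing).
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"

# missing_daemons (hypothetical helper): $1 = raw jps output.
# Prints every expected daemon whose name does not appear in column 2.
missing_daemons() {
  for d in $EXPECTED_DAEMONS; do
    printf '%s\n' "$1" \
      | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' \
      || printf '%s\n' "$d"
  done
}

# Only run the live check when jps (shipped with the JDK) is available.
if command -v jps >/dev/null 2>&1; then
  missing="$(missing_daemons "$(jps)")"
  if [ -n "$missing" ]; then
    echo "Not running: $missing"
  else
    echo "All expected Hadoop daemons are running"
  fi
fi
```

The comparison is an exact match on the process name, so NameNode is not mistaken for SecondaryNameNode.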

Step 2: Start Hive from the command shell by typing hive.
hduser@rdharm:/home/dharm$ hive
Now run a query in MR mode (MapReduce as the Hive engine), and you will see the map-reduce steps.
Command:
Select category, count(*) as cnt from youtube.youtube_videos group by category order by cnt desc;
Output:
         

Step 3: Now set hive.execution.engine=tez and run the same query again. You will see output as below.
Commands:
Set hive.execution.engine=tez;
Select category, count(*) as cnt from youtube.youtube_videos group by category order by cnt desc;

If everything works fine, then your Tez engine is working well with Hive.
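The same comparison can also be run non-interactively, which makes it easier to repeat. The sketch below uses the Hive CLI's -e and --hiveconf options to run the query once per engine and print the elapsed wall-clock time. The function run_with_engine and the HIVE_BIN override are hypothetical names, and the timings are only a rough indication rather than a benchmark.

```shell
# run_with_engine (hypothetical helper): $1 = execution engine (mr or tez),
# $2 = HiveQL statement. HIVE_BIN may point to a different hive executable.
run_with_engine() {
  "${HIVE_BIN:-hive}" --hiveconf "hive.execution.engine=$1" -e "$2"
}

# The query used in the steps above.
query='select category, count(*) as cnt from youtube.youtube_videos group by category order by cnt desc;'

# Only run when the hive command is available on this machine.
if command -v "${HIVE_BIN:-hive}" >/dev/null 2>&1; then
  for engine in mr tez; do
    start=$(date +%s)
    run_with_engine "$engine" "$query" || echo "query failed on $engine"
    echo "$engine: $(( $(date +%s) - start )) s"
  done
fi
```

Passing the engine through --hiveconf leaves the default in hive-site.xml untouched, so each run is independent of the previous one.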
