Hadoop is a big data framework used for ETL on petabyte-scale data. Apache Hadoop
comes with the HDFS file system, the YARN resource manager and the MapReduce data-processing engine.
Hadoop processes data in parallel with the MapReduce algorithm, distributing the data across the nodes of the cluster.
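To get a feel for the MapReduce model before installing anything, here is a tiny word-count sketch that emulates the map, shuffle and reduce phases with plain shell pipes (purely illustrative; input.txt is a hypothetical sample file, and real Hadoop runs these phases in parallel across the cluster):
cat input.txt | tr -s ' ' '\n' | sed 's/$/ 1/' > mapped.txt #map: emit "word 1" for every word
sort mapped.txt > shuffled.txt #shuffle: group identical keys together
awk '{count[$1] += $2} END {for (w in count) print w, count[w]}' shuffled.txt #reduce: sum the counts per word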
The process for installation will be:
1. Installation
2. Update $HOME/.bashrc
3. Configuration: hadoop-env.sh and the *-site.xml files
4. Formatting the HDFS filesystem via the NameNode
5. Starting your single-node cluster
Hadoop installation steps:
1. Installing Java:
sudo apt-get update #Update the package lists known to apt-get
sudo apt-get install default-jdk #Install Java on the system
java -version #Now check the installed version
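The JAVA_HOME path used later in .bashrc and hadoop-env.sh depends on which JDK apt actually installed; one way to look it up (a quick check, so the java-7 path used below may differ on your machine):
readlink -f $(which javac) #prints e.g. /usr/lib/jvm/java-7-openjdk-amd64/bin/javac; JAVA_HOME is the part before /bin/javac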
2. Installing SSH:
ssh: The command we use to connect to remote machines - the client.
sshd: The daemon that is running on the server and allows clients to connect to the server.
The ssh client is pre-installed on most Linux systems, but in order to run the sshd daemon we need to install the ssh server package first.
Use this command to do that.
sudo apt-get install ssh
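After the install you can confirm the sshd daemon is actually running (a quick sanity check; on Ubuntu the service is simply called ssh):
sudo service ssh status #should report that the ssh/sshd service is running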
3. Create and Setup SSH Certificates:
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus our local machine.
For our single-node setup of Hadoop, therefore, we need to configure SSH access to localhost.
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
The second command adds the newly created key to the list of authorized keys so that
Hadoop can use ssh without prompting for a password.
We can check whether ssh works:
ssh localhost
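If ssh localhost still prompts for a password, the usual culprit is the permissions on the .ssh directory; tightening them is a common fix (a suggestion only, your setup may already be fine):
chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys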
4. Download Hadoop:
We will be installing Hadoop version 2.7.5 from the link below.
http://www-eu.apache.org/dist/hadoop/common/
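For example, the tarball can be fetched directly with wget (the exact mirror path is an assumption; older releases sometimes move to archive.apache.org, so adjust the URL if the download fails):
wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz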
Make a directory for Hadoop in /usr/local:
sudo mkdir /usr/local/hadoop
And copy the downloaded file into this directory with the command below:
sudo cp <path to hadoop tar.gz> /usr/local/hadoop
5. Installing Hadoop:
Now that you have the Hadoop archive in the /usr/local/hadoop folder, open a terminal, go to that directory and extract the file:
cd /usr/local/hadoop
sudo tar -xzvf hadoop-2.7.5.tar.gz
Now add the Hadoop paths to the .bashrc file on your system.
sudo gedit ~/.bashrc
The .bashrc file will open; add the lines below to it:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop/hadoop-2.7.5
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
Reload the .bashrc file now:
source ~/.bashrc
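As a quick sanity check that the new environment variables took effect (assuming Hadoop was extracted to the path used above):
echo $HADOOP_INSTALL #should print /usr/local/hadoop/hadoop-2.7.5
hadoop version #should print the Hadoop 2.7.5 version banner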
6. Changing Files:
The configuration files live in the /usr/local/hadoop/hadoop-2.7.5/etc/hadoop directory.
Open the files below one by one and edit them.
1. hadoop-env.sh: open this file in an editor and add the line below to it.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
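If you prefer not to open an editor, the same change can be made non-interactively with sed (a sketch, assuming the stock hadoop-env.sh still contains its default export JAVA_HOME line):
sudo sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64|' /usr/local/hadoop/hadoop-2.7.5/etc/hadoop/hadoop-env.sh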
2. core-site.xml: create a directory named data in the hadoop-2.7.5 directory; the namenode and datanode directories will be created under it.
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/hadoop-2.7.5/data</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
3. mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9000</value>
<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
</configuration>
4. hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/hadoop-2.7.5/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hadoop-2.7.5/data/datanode</value>
</property>
</configuration>
Note: if any of these files has a .template suffix (for example mapred-site.xml.template), copy or rename it to the plain .xml name and then edit it.
Now save every file you have edited and close them.
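The hdfs-site.xml values above point at data/namenode and data/datanode directories that do not exist yet; creating them now (and making sure your user can write to them) avoids a failed format later. A minimal sketch, assuming Hadoop will run as your current user:
sudo mkdir -p /usr/local/hadoop/hadoop-2.7.5/data/namenode
sudo mkdir -p /usr/local/hadoop/hadoop-2.7.5/data/datanode
sudo chown -R $USER:$USER /usr/local/hadoop/hadoop-2.7.5/data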
7. Time to run Hadoop:
Now the Hadoop file system needs to be formatted so that we can start using it.
The format command should be issued with write permission since it creates a current directory
under the /usr/local/hadoop/hadoop-2.7.5/data/namenode folder:
hadoop namenode -format
Now start the Hadoop system with
start-all.sh
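Note that start-all.sh is deprecated in Hadoop 2.x; if you prefer, the HDFS and YARN daemons can be started separately (same effect, assuming the sbin directory is on your PATH as set in .bashrc):
start-dfs.sh #starts NameNode, DataNode and SecondaryNameNode
start-yarn.sh #starts ResourceManager and NodeManager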
Now check whether everything is running properly with the jps command.
hduser@rdharm:/home/dharm$ jps
6016 Jps
5232 SecondaryNameNode
5461 ResourceManager
5048 DataNode
4924 NameNode
5596 NodeManager
Note: If you want to monitor the ResourceManager and the NameNode/DataNodes through the web UI, then check
http://localhost:8088/cluster for resource manager monitoring.
http://localhost:50070 for namenode monitoring.
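Optionally, submit one of the example MapReduce jobs that ship with the distribution to confirm the cluster accepts work (the jar path below assumes the standard 2.7.5 layout under $HADOOP_INSTALL):
hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar pi 2 5 #estimates pi with 2 map tasks and 5 samples each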
If everything works fine, then congrats, your Hadoop installation is working.