Installing Hadoop
SSH Localhost
Configuring Hadoop
Starting and Stopping Hadoop
Good to know
- Additional Resources
- GitHub WordCount example
Install HomeBrew
Found here: http://brew.sh or simply paste this inside the terminal
$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Install Hadoop
$ brew install hadoop
Hadoop will be installed in the following directory
/usr/local/Cellar/hadoop
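To verify that the install worked, print the version (assuming brew has linked the hadoop binary onto your PATH):
$ hadoop version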
Configuring Hadoop
Edit hadoop-env.sh
The file can be located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/hadoop-env.sh, where 2.6.0 is the Hadoop version.
Find the line with
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
and change it to
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
Should =true also be appended at the end here??? (For now I am using the original line, unchanged.)
Edit core-site.xml
The file can be located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
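The hadoop.tmp.dir directory above does not exist yet; creating it up front (an extra step, in case formatting does not create it for you) avoids permission surprises later:
$ mkdir -p /usr/local/Cellar/hadoop/hdfs/tmp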
Edit mapred-site.xml
The file can be located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/mapred-site.xml and by default will be blank, or missing entirely (see the note after the snippet).
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
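If the file is missing rather than blank, Hadoop 2.x ships a template you can copy into place first (the path assumes the 2.6.0 layout used above):
$ cd /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml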
Edit hdfs-site.xml
The file can be located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/Cellar/hadoop/2.6.0/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/Cellar/hadoop/2.6.0/hadoop_data/hdfs/datanode</value>
</property>
</configuration>
Edit yarn-site.xml
The file can be located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/yarn-site.xml
$ sudo vi yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
The following properties work around the datanode always seeing the namenode's IP address as 0.0.0.0 (still to be confirmed). Add them inside the same <configuration> element:
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8031</value>
</property>
To simplify life, edit your ~/.profile using vim and add the following two aliases
alias hstart="/usr/local/Cellar/hadoop/2.6.0/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.6.0/sbin/start-yarn.sh"
alias hstop="/usr/local/Cellar/hadoop/2.6.0/sbin/stop-yarn.sh;/usr/local/Cellar/hadoop/2.6.0/sbin/stop-dfs.sh"
and execute
$ source ~/.profile
in the terminal to update.
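To confirm the aliases were picked up by the shell:
$ type hstart hstop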
Before we can run Hadoop, we first need to format HDFS using
$ hdfs namenode -format
SSH Localhost
Nothing needs to be done here if you have already generated SSH keys. To verify, just check for the existence of the ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub files. If not, the keys can be generated using
$ ssh-keygen -t rsa
Enable Remote Login
Open “System Preferences” -> “Sharing” and check “Remote Login”.
Authorize SSH Keys
To allow your system to accept the login, we have to make it aware of the keys that will be used:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
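sshd is strict about permissions on these files; if the login below still prompts for a password, tightening them usually helps:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys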
Let’s try to log in (no password should be required this time).
$ ssh localhost
> Last login: Fri Mar 6 20:30:53 2015
$ exit
Change owner of folders
$ mkdir -p /usr/local/Cellar/hadoop/2.6.0/hadoop_data/hdfs/namenode
$ mkdir -p /usr/local/Cellar/hadoop/2.6.0/hadoop_data/hdfs/datanode
$ sudo chown -R Tiger /usr/local/Cellar/hadoop
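Tiger is this machine's username; a generic version of the ownership change, for whoever is currently logged in, would be:
$ sudo chown -R $(whoami) /usr/local/Cellar/hadoop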
Running Hadoop
Now we can run Hadoop just by typing
$ hstart
and stopping using
$ hstop
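After hstart, jps should list the running Java daemons; on a healthy single-node setup you would expect to see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (plus Jps itself):
$ jps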
(After this, everything else came up fine, but the resourcemanager did not start.)
$ sbin/yarn-daemon.sh start resourcemanager (run from the Hadoop directory)
starting resourcemanager, logging to /usr/local/Cellar/hadoop/2.6.0/libexec/logs/yarn-Tiger-resourcemanager-MacMini.out
nohup: can't detach from console: No such file or directory
(But that directory clearly exists???? FYI: on a successful start, the nohup line does not appear.)
The solution is:
Download Examples
To run examples, Hadoop needs to be started.
Hadoop Examples 1.2.1 (Old)
Hadoop Examples 2.6.0 (Current)
Test them out using:
$ hadoop jar <path to the hadoop-examples file> pi 10 100
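For example, with the 2.6.0 examples jar in the default Homebrew layout (the exact path may differ on your machine):
$ hadoop jar /usr/local/Cellar/hadoop/2.6.0/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 10 100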
Good to know
We can access the Hadoop web interfaces by connecting to
NameNode (HDFS): http://localhost:50070
ResourceManager (YARN): http://localhost:8088
Specific Node Information: http://localhost:8042
The NameNode interface can be used to browse the HDFS filesystem and to look at any resulting output files.
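The same filesystem can also be inspected from the command line with the hdfs client, e.g. creating a home directory for the current user and listing the root:
$ hdfs dfs -mkdir -p /user/$(whoami)
$ hdfs dfs -ls /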