Installing Hadoop
SSH Localhost
Configuring Hadoop
Starting and Stopping Hadoop
Good to know
- Additional Resources
- GitHub WordCount example
Install HomeBrew
Found here: http://brew.sh or simply paste this inside the terminal
$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Install Hadoop
$ brew install hadoop
Hadoop will be installed in the following directory
/usr/local/Cellar/hadoop
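To verify that the install worked, print the version (assuming brew has linked the hadoop binary onto your PATH):
$ hadoop version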
Configuring Hadoop
Edit hadoop-env.sh
The file can be located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/hadoop-env.sh, where 2.6.0 is the Hadoop version.
Find the line with
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
and change it to
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
Should =true also be appended at the end here??? (For now I am using the original line, unchanged.)
Edit core-site.xml
The file can be located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
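The hadoop.tmp.dir directory above does not exist yet; creating it up front (an extra step, in case formatting does not create it for you) avoids permission surprises later:
$ mkdir -p /usr/local/Cellar/hadoop/hdfs/tmp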
Edit mapred-site.xml
The file can be located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/mapred-site.xml and by default will be blank, or missing entirely (see the note after the snippet).
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
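If the file is missing rather than blank, Hadoop 2.x ships a template you can copy into place first (the path assumes the 2.6.0 layout used above):
$ cd /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml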
Edit hdfs-site.xml
The file can be located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/Cellar/hadoop/2.6.0/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/Cellar/hadoop/2.6.0/hadoop_data/hdfs/datanode</value>
</property>
</configuration>
Edit yarn-site.xml
The file can be located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/yarn-site.xml
$ sudo vi yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
The following properties work around the datanode always seeing the namenode's IP address as 0.0.0.0 (still to be confirmed). Add them inside the same <configuration> element:
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8031</value>
</property>
To simplify life, edit your ~/.profile using vim and add the following two aliases
alias hstart="/usr/local/Cellar/hadoop/2.6.0/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.6.0/sbin/start-yarn.sh"
alias hstop="/usr/local/Cellar/hadoop/2.6.0/sbin/stop-yarn.sh;/usr/local/Cellar/hadoop/2.6.0/sbin/stop-dfs.sh"
and execute
$ source ~/.profile
in the terminal to update.
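To confirm the aliases were picked up by the shell:
$ type hstart hstop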
Before we can run Hadoop, we first need to format HDFS using
$ hdfs namenode -format
SSH Localhost
Nothing needs to be done here if you have already generated SSH keys. To verify, just check for the existence of the ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub files. If not, the keys can be generated using
$ ssh-keygen -t rsa
Enable Remote Login
Open “System Preferences” -> “Sharing” and check “Remote Login”.
Authorize SSH Keys
To allow your system to accept the login, we have to make it aware of the keys that will be used:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
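sshd is strict about permissions on these files; if the login below still prompts for a password, tightening them usually helps:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys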
Let’s try to log in (no password should be required this time).
$ ssh localhost
> Last login: Fri Mar 6 20:30:53 2015
$ exit
Change owner of folders
$ mkdir -p /usr/local/Cellar/hadoop/2.6.0/hadoop_data/hdfs/namenode
$ mkdir -p /usr/local/Cellar/hadoop/2.6.0/hadoop_data/hdfs/datanode
$ sudo chown -R Tiger /usr/local/Cellar/hadoop
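Tiger is this machine's username; a generic version of the ownership change, for whoever is currently logged in, would be:
$ sudo chown -R $(whoami) /usr/local/Cellar/hadoop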
Running Hadoop
Now we can run Hadoop just by typing
$ hstart
and stopping using
$ hstop
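After hstart, jps should list the running Java daemons; on a healthy single-node setup you would expect to see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (plus Jps itself):
$ jps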
(After this, everything else came up fine, but the resourcemanager did not start.)
$ sbin/yarn-daemon.sh start resourcemanager (run from the Hadoop directory)
starting resourcemanager, logging to /usr/local/Cellar/hadoop/2.6.0/libexec/logs/yarn-Tiger-resourcemanager-MacMini.out
nohup: can't detach from console: No such file or directory
(But that directory clearly exists???? FYI: on a successful start, the nohup line does not appear.)
The solution is:
Download Examples
To run examples, Hadoop needs to be started.
Hadoop Examples 1.2.1 (Old)
Hadoop Examples 2.6.0 (Current)
Test them out using:
$ hadoop jar <path to the hadoop-examples file> pi 10 100
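For example, with the 2.6.0 examples jar in the default Homebrew layout (the exact path may differ on your machine):
$ hadoop jar /usr/local/Cellar/hadoop/2.6.0/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 10 100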
Good to know
We can access the Hadoop web interfaces by connecting to
NameNode (HDFS): http://localhost:50070
ResourceManager (YARN): http://localhost:8088
Specific Node Information: http://localhost:8042
The NameNode interface can be used to browse the HDFS filesystem and to look at any resulting output files.
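The same filesystem can also be inspected from the command line with the hdfs client, e.g. creating a home directory for the current user and listing the root:
$ hdfs dfs -mkdir -p /user/$(whoami)
$ hdfs dfs -ls /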