Wednesday, 13 January 2016

How to add a new datanode in existing hadoop cluster without restarting.

Follow the below instructions to add  a new datanode in existing hadoop cluster without restarting.

1. Create a file "includes" under /conf directory.

2. Include the IP of the datanode in this file.

3. Add the property below to hdfs-site.xml

<property>
    <name>dfs.hosts</name>
    <value>[HADOOP-HOME]/conf/includes</value>
    <final>true</final>
</property>
4. Add the property below to mapred-site.xml
<property>
    <name>mapred.hosts</name>
    <value>[HADOOP-HOME]/conf/includes</value>
</property>
5. In Namenode, execute
 bin/hadoop dfsadmin -refreshNodes
6. In Jobtracker node, execute
 bin/hadoop mradmin -refreshNodes

7. Login to the new slave node and execute:

$ cd path/to/hadoop
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker

8. Add IP of the new datanode in conf/slaves file

Finally, Execute the below command during non-peak hour

$ bin/start-balancer.sh

Thursday, 7 January 2016

Oozie Map reduce action with send mail

Following things are to be set up for doing a mapred action and sending email after that. In your oozie-site.xml, Add the below settings and restart oozie. Replace values with the same specific to your environment.
<!-- SMTP params -->
<property>
    <name>oozie.email.smtp.host</name>
    <value></value>
</property>
<property>
    <name>oozie.email.smtp.port</name>
    <value>25</value>
</property>
<property>
    <name>oozie.email.from.address</name>
    <value></value>
</property>
<property>
    <name>oozie.email.smtp.auth</name>
    <value>false</value>
</property>
<property>
    <name>oozie.email.smtp.username</name>
    <value></value>
</property>
<property>
    <name>oozie.email.smtp.password</name>
    <value></value>
</property>
In your workflow.xml, add the below instructions with your environment settings. This includes the email triggering action once the mapreduce is completed.
<workflow-app name="WorkFlowJavaMapReduceAction" xmlns="uri:oozie:workflow:0.1">
    <start to="mapReduceAction" />
    <action name="mapReduceAction">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputDir}" />
            </prepare>
            <configuration>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapreduce.map.class</name>
                    <value></value>
                </property>
                <property>
                    <name>mapreduce.reduce.class</name>
                    <value></value>
                </property>
                <property>
                    <name>mapred.mapoutput.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapred.mapoutput.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <property>
                    <name>mapred.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapred.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
                <property>
                    <name>mapreduce.job.acl-view-job</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.acl-view-job</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.use.system.libpath</name>
                    <value>false</value>
                </property>
                <property>
                    <name>oozie.libpath</name>
                    <value>${appPath}/lib</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="emailCommands" />
        <error to="killJob" />
    </action>

    <action name="emailCommands">
        <fs>
            <mkdir path='${makeDirectoryAbsPath}' />
            <move source='${dataInputDirectoryAbsPath}' target='${dataDestinationDirectoryRelativePath}' />
        </fs>
        <ok to="sendEmailSuccess" />
        <error to="sendEmailKill" />
    </action>
    <action name="sendEmailSuccess">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailToAddress}</to>
            <subject>Status of workflow ${wf:id()}</subject>
            <body>The workflow ${wf:id()} completed successfully</body>
        </email>
        <ok to="end" />
        <error to="end" />
    </action>
    <action name="sendEmailKill">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailToAddress}</to>
            <subject>Status of workflow ${wf:id()}</subject>
            <body>The workflow ${wf:id()} had issues and was killed. The error
                message is: ${wf:errorMessage(wf:lastErrorNode())}</body>
        </email>
        <ok to="end" />
        <error to="killJob" />
    </action>
    
    <kill name="killJob">
        <message>"Killed job due to error:
            ${wf:errorMessage(wf:lastErrorNode())}"</message>
    </kill>
    <end name="end" />    
</workflow-app>