Friday, 26 February 2016

Generic type Insertion Sort


The code below can perform insertion sort on multiple data types (String, double and int as of now). It can be altered for any object that implements Comparable; a short sketch for a custom type follows the listing.
import java.util.Scanner;

public class MyInsertionSort {

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);

        System.out.print("Enter data type to sort : ");
        String type = in.nextLine();

        System.out.print("Enter number of elements : ");
        int num = Integer.parseInt(in.nextLine());

        // Read every element as a String; it is converted later based on the chosen type.
        String[] array = new String[num];
        for (int i = 0; i < array.length; i++) {
            System.out.print("Input the element at array index " + i + ": ");
            array[i] = in.nextLine();
        }
        MyInsertionSort.insertionSortByType(array, type);
        in.close();
    }

    // Converts the raw input to the requested type and delegates to the generic sort.
    public static void insertionSortByType(String[] array, String type) {
        switch (type) {
        case "double":
            Double[] convertedArrayDouble = new Double[array.length];
            for (int i = 0; i < array.length; i++)
                convertedArrayDouble[i] = Double.parseDouble(array[i]);
            MyInsertionSort.insertionSort(convertedArrayDouble);
            break;
        case "int":
            Integer[] convertedArrayInt = new Integer[array.length];
            for (int i = 0; i < array.length; i++)
                convertedArrayInt[i] = Integer.parseInt(array[i]);
            MyInsertionSort.insertionSort(convertedArrayInt);
            break;
        default:
            // Any other type is sorted as plain Strings.
            MyInsertionSort.insertionSort(array);
        }
    }

    // Generic insertion sort: works for any element type that is comparable to itself.
    public static <E extends Comparable<? super E>> void insertionSort(E[] array) {
        int n = array.length;
        for (int j = 1; j < n; j++) {
            E key = array[j];
            int i = j - 1;
            // Shift larger elements one position to the right to make room for the key.
            while ((i > -1) && (array[i].compareTo(key) > 0)) {
                array[i + 1] = array[i];
                i--;
            }
            array[i + 1] = key;
        }

        printNumbers(array);
    }

    public static <E> void printNumbers(E[] array) {
        for (E i : array) {
            System.out.println(i);
        }
    }
}
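To show how this extends to other objects, here is a minimal sketch using a made-up Person class (not part of the original listing), assumed to live in the same package as MyInsertionSort. Any class implementing Comparable can be sorted the same way.

// Person and PersonSortDemo are hypothetical example classes.
class Person implements Comparable<Person> {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public int compareTo(Person other) {
        return Integer.compare(this.age, other.age); // order by age
    }

    @Override
    public String toString() {
        return name + " (" + age + ")";
    }
}

class PersonSortDemo {
    public static void main(String[] args) {
        Person[] people = {
            new Person("Asha", 31),
            new Person("Ravi", 24),
            new Person("Mala", 28)
        };
        // Reuses the generic sort above; printNumbers lists the people ordered by age.
        MyInsertionSort.insertionSort(people);
    }
}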

Tuesday, 16 February 2016

Maven Build JAR once and share it offline

In Maven, you can build your project once and get a JAR file packed with all of its dependencies, so that you can share this JAR with other machines offline.
Below are the steps.
  1. First, update your pom.xml with the following plugin configuration:
    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>com.thanga.MyTest[REPLACE WITH YOUR MAIN CLASS]
                            </mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
        </plugins>
    </build>
  2. Package your project with the goals package assembly:single (for example, from your IDE's Maven run configuration).
This is equivalent to running "mvn package assembly:single" from the command line, as shown below.
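For example (the artifact name below assumes the project's artifactId is MyFullPack and version 0.0.1-SNAPSHOT, as in this post):

$ mvn clean package assembly:single
$ ls target/*.jar
target/MyFullPack-0.0.1-SNAPSHOT.jar
target/MyFullPack-0.0.1-SNAPSHOT-jar-with-dependencies.jar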


  3. Run this and you will get two JAR files. One of them, MyFullPack-0.0.1-SNAPSHOT-jar-with-dependencies.jar, has all the dependencies packed in.

  4. You can open the JAR to verify that the dependencies are packed inside it.

  5. You can share this JAR with other machines offline without any further build, as shown below.
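For instance, on the other machine the packed JAR can be run directly, since the main class (com.thanga.MyTest in the example above) is already set in its manifest:

$ java -jar MyFullPack-0.0.1-SNAPSHOT-jar-with-dependencies.jar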

Wednesday, 13 January 2016

How to add a new datanode to an existing Hadoop cluster without restarting

Follow the instructions below to add a new datanode to an existing Hadoop cluster without restarting it.

1. Create a file named "includes" under the [HADOOP-HOME]/conf directory.

2. Add the IP address of the new datanode to this file, as shown below.
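For example (10.0.0.15 is only a placeholder; use your new datanode's IP):

$ echo "10.0.0.15" >> [HADOOP-HOME]/conf/includes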

3. Add the property below to hdfs-site.xml

<property>
    <name>dfs.hosts</name>
    <value>[HADOOP-HOME]/conf/includes</value>
    <final>true</final>
</property>
4. Add the property below to mapred-site.xml
<property>
    <name>mapred.hosts</name>
    <value>[HADOOP-HOME]/conf/includes</value>
</property>
5. On the Namenode, execute
 bin/hadoop dfsadmin -refreshNodes
6. On the Jobtracker node, execute
 bin/hadoop mradmin -refreshNodes

7. Log in to the new slave node and execute:

$ cd path/to/hadoop
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker

8. Add the IP of the new datanode to the conf/slaves file, as shown below
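For example, with the same placeholder IP:

$ echo "10.0.0.15" >> [HADOOP-HOME]/conf/slaves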

Finally, execute the command below during a non-peak hour so that blocks are rebalanced across the cluster

$ bin/start-balancer.sh

Thursday, 7 January 2016

Oozie Map reduce action with send mail

The following things need to be set up to run a MapReduce action and send an email after it completes. In your oozie-site.xml, add the settings below and restart Oozie. Replace the values with ones specific to your environment.
<!-- SMTP params -->
<property>
    <name>oozie.email.smtp.host</name>
    <value></value>
</property>
<property>
    <name>oozie.email.smtp.port</name>
    <value>25</value>
</property>
<property>
    <name>oozie.email.from.address</name>
    <value></value>
</property>
<property>
    <name>oozie.email.smtp.auth</name>
    <value>false</value>
</property>
<property>
    <name>oozie.email.smtp.username</name>
    <value></value>
</property>
<property>
    <name>oozie.email.smtp.password</name>
    <value></value>
</property>
In your workflow.xml, add the definition below with your environment settings. It includes the email actions that are triggered once the MapReduce action completes.
<workflow-app name="WorkFlowJavaMapReduceAction" xmlns="uri:oozie:workflow:0.1">
    <start to="mapReduceAction" />
    <action name="mapReduceAction">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputDir}" />
            </prepare>
            <configuration>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapreduce.map.class</name>
                    <value></value>
                </property>
                <property>
                    <name>mapreduce.reduce.class</name>
                    <value></value>
                </property>
                <property>
                    <name>mapred.mapoutput.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapred.mapoutput.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <property>
                    <name>mapred.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapred.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
                <property>
                    <name>mapreduce.job.acl-view-job</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.acl-view-job</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.use.system.libpath</name>
                    <value>false</value>
                </property>
                <property>
                    <name>oozie.libpath</name>
                    <value>${appPath}/lib</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="emailCommands" />
        <error to="killJob" />
    </action>

    <action name="emailCommands">
        <fs>
            <mkdir path='${makeDirectoryAbsPath}' />
            <move source='${dataInputDirectoryAbsPath}' target='${dataDestinationDirectoryRelativePath}' />
        </fs>
        <ok to="sendEmailSuccess" />
        <error to="sendEmailKill" />
    </action>
    <action name="sendEmailSuccess">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailToAddress}</to>
            <subject>Status of workflow ${wf:id()}</subject>
            <body>The workflow ${wf:id()} completed successfully</body>
        </email>
        <ok to="end" />
        <error to="end" />
    </action>
    <action name="sendEmailKill">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailToAddress}</to>
            <subject>Status of workflow ${wf:id()}</subject>
            <body>The workflow ${wf:id()} had issues and was killed. The error
                message is: ${wf:errorMessage(wf:lastErrorNode())}</body>
        </email>
        <ok to="end" />
        <error to="killJob" />
    </action>
    
    <kill name="killJob">
        <message>"Killed job due to error:
            ${wf:errorMessage(wf:lastErrorNode())}"</message>
    </kill>
    <end name="end" />    
</workflow-app>
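To run this workflow, a job.properties file must supply the parameters referenced above. The sketch below is only illustrative; every host, port, path and address is a placeholder to be replaced with values from your environment:

nameNode=hdfs://namenode-host:8020
jobTracker=jobtracker-host:8021
queueName=default
appPath=${nameNode}/user/oozie-user/mapreduce-email-app
oozie.wf.application.path=${appPath}
inputDir=/user/oozie-user/input
outputDir=/user/oozie-user/output
makeDirectoryAbsPath=/user/oozie-user/archive
dataInputDirectoryAbsPath=/user/oozie-user/output
dataDestinationDirectoryRelativePath=archive
emailToAddress=admin@example.com

The workflow can then be submitted with the Oozie CLI (the Oozie server URL is again a placeholder):

$ oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run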



Wednesday, 23 December 2015

Hive 1.2.1 Installation with Mysql



Download a binary package from http://apache.claz.org/hive/stable/

user@host:~$  cd /opt/
user@host:~$  sudo mkdir hive
user@host:~$  cd ~/Downloads
user@host:~$  sudo mv <HIVE-BIN-FILE> /opt/hive
user@host:~$  cd /opt/hive
user@host:~$  sudo tar -xzf <HIVE-BIN-FILE> --strip-components=1
user@host:~$  sudo cp conf/hive-default.xml.template conf/hive-site.xml
 
Edit hive-site.xml with the settings below.
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
  <description>the URL of the MySQL database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>root</value>
</property>
<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
</property>
<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.12.0.war</value>
</property>

<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>

<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>

<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>

<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>

<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
 

MySQL Connector

Download a MySQL connector (Connector/J) JAR file.
Put it in the /opt/hive/lib folder with the name mysql-connector.jar, for example:
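(The connector file name below is only illustrative; use whichever version you downloaded.)

user@host:~$  sudo cp ~/Downloads/mysql-connector-java-5.1.38-bin.jar /opt/hive/lib/mysql-connector.jar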
 
/opt/hive> bin/hive
 
Happy Hiving... 
 
 

Some Errors and fixes

 
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx--x--x
 at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx--x--x
 at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
 at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
 at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
 
For this issue, give write permission to /tmp/hive in HDFS with the command below:
 
bin/hadoop dfs -chmod 777 /tmp/hive
 
Then run hive again.