Friday 26 February 2016

Generic type Insertion Sort

The code below implements insertion sort for multiple data types (currently String, double and int) and can be adapted to any object that implements Comparable.
 import java.util.Scanner;

 public class MyInsertionSort {

     public static void main(String[] args) {
         Scanner in = new Scanner(System.in);

         System.out.print("Enter data type to sort : ");
         String type = in.nextLine();

         System.out.print("Enter number of elements : ");
         int num = Integer.parseInt(in.nextLine());

         // Read every element as a String; it is converted later based on the chosen type.
         String[] array = new String[num];
         for (int i = 0; i < array.length; i++) {
             System.out.print("Input the element at array index " + i + ": ");
             array[i] = in.nextLine();
         }

         MyInsertionSort.insertionSortByType(array, type);
         in.close();
     }

     // Converts the raw String input to the requested type and delegates to the generic sort.
     public static void insertionSortByType(String[] array, String type) {
         switch (type) {
         case "double":
             Double[] convertedDoubles = new Double[array.length];
             for (int i = 0; i < array.length; i++)
                 convertedDoubles[i] = Double.parseDouble(array[i]);
             MyInsertionSort.insertionSort(convertedDoubles);
             break;
         case "int":
             Integer[] convertedInts = new Integer[array.length];
             for (int i = 0; i < array.length; i++)
                 convertedInts[i] = Integer.parseInt(array[i]);
             MyInsertionSort.insertionSort(convertedInts);
             break;
         default:
             // Any other type is sorted lexicographically as Strings.
             MyInsertionSort.insertionSort(array);
         }
     }

     // Generic insertion sort: works for any element type that implements Comparable.
     public static <E extends Comparable<? super E>> void insertionSort(E[] array) {
         int n = array.length;
         for (int j = 1; j < n; j++) {
             E key = array[j];
             int i = j - 1;
             // Shift elements greater than the key one position to the right.
             while ((i > -1) && (array[i].compareTo(key) > 0)) {
                 array[i + 1] = array[i];
                 i--;
             }
             array[i + 1] = key;
         }

         printNumbers(array);
     }

     public static <E> void printNumbers(E[] array) {
         for (E element : array) {
             System.out.println(element);
         }
     }
 }
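
As a quick sketch of the "any object that implements Comparable" point, the example below sorts a user-defined type with the same insertionSort method. Person and PersonSortDemo are hypothetical names used only for illustration; they are not part of the program above.

 // Hypothetical demo: sorting a user-defined Comparable type with the generic insertionSort.
 public class PersonSortDemo {

     static class Person implements Comparable<Person> {
         final String name;
         final int age;

         Person(String name, int age) {
             this.name = name;
             this.age = age;
         }

         @Override
         public int compareTo(Person other) {
             return Integer.compare(this.age, other.age); // order people by age
         }

         @Override
         public String toString() {
             return name + " (" + age + ")";
         }
     }

     public static void main(String[] args) {
         Person[] people = { new Person("Asha", 31), new Person("Ravi", 24), new Person("Kumar", 40) };
         // insertionSort also prints the sorted array: Ravi (24), Asha (31), Kumar (40)
         MyInsertionSort.insertionSort(people);
     }
 }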

Tuesday 16 February 2016

Maven Build JAR once and share it offline

In Maven, you can build your project once and get a JAR file packed with all of its dependencies, so that the JAR can be copied to other machines and run offline.
Below are the steps.
  1. First, update your pom.xml with the following build configuration:
    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>com.thanga.MyTest[REPLACE WITH YOUR MAIN CLASS]
                            </mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
        </plugins>
    </build>
  2. Package your project with the goals package assembly:single, which is equivalent to running "mvn package assembly:single" on the command line.


  3. Run this and you get two JAR files; the one named MyFullPack-0.0.1-SNAPSHOT-jar-with-dependencies.jar has all the dependencies packed in.

  4. You can open the JAR to verify that the dependencies are packed inside.

  5. You can share this JAR with other machines offline without any further build.
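
On the target machine the fat JAR can then be run directly, for example with java -jar MyFullPack-0.0.1-SNAPSHOT-jar-with-dependencies.jar, since the manifest built above already names the main class.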

Wednesday 13 January 2016

How to add a new datanode to an existing Hadoop cluster without restarting

Follow the instructions below to add a new datanode to an existing Hadoop cluster without restarting it.

1. Create a file named "includes" under the [HADOOP-HOME]/conf directory.

2. Include the IP of the datanode in this file.
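
Note that once dfs.hosts points at this file, only the hosts listed in it are allowed to register as datanodes, so it should list every datanode in the cluster (the existing ones plus the new one), one host per line.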

3. Add the property below to hdfs-site.xml

<property>
    <name>dfs.hosts</name>
    <value>[HADOOP-HOME]/conf/includes</value>
    <final>true</final>
</property>
4. Add the property below to mapred-site.xml
<property>
    <name>mapred.hosts</name>
    <value>[HADOOP-HOME]/conf/includes</value>
</property>
5. On the namenode, execute
 bin/hadoop dfsadmin -refreshNodes
6. On the jobtracker node, execute
 bin/hadoop mradmin -refreshNodes

7. Log in to the new slave node and execute:

$ cd path/to/hadoop
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker

8. Add the IP of the new datanode to the conf/slaves file.

Finally, execute the command below during a non-peak hour to rebalance data across the cluster:

$ bin/start-balancer.sh
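
You can confirm at any time that the new datanode has registered by running bin/hadoop dfsadmin -report on the namenode and checking that it appears among the live nodes.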

Thursday 7 January 2016

Oozie map-reduce action with email notification

The following needs to be set up to run a map-reduce action and send an email after it completes. In your oozie-site.xml, add the settings below and restart Oozie, replacing the values with ones specific to your environment.
<!-- SMTP params -->
<property>
    <name>oozie.email.smtp.host</name>
    <value></value>
</property>
<property>
    <name>oozie.email.smtp.port</name>
    <value>25</value>
</property>
<property>
    <name>oozie.email.from.address</name>
    <value></value>
</property>
<property>
    <name>oozie.email.smtp.auth</name>
    <value>false</value>
</property>
<property>
    <name>oozie.email.smtp.username</name>
    <value></value>
</property>
<property>
    <name>oozie.email.smtp.password</name>
    <value></value>
</property>
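With oozie.email.smtp.auth set to false, the username and password can be left empty; the host and from-address values (for example smtp.yourcompany.com and oozie@yourcompany.com, both hypothetical) should match your mail relay.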
In your workflow.xml, add the definition below with your environment settings. It includes the email action that is triggered once the map-reduce action completes.
<workflow-app name="WorkFlowJavaMapReduceAction" xmlns="uri:oozie:workflow:0.1">
    <start to="mapReduceAction" />
    <action name="mapReduceAction">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputDir}" />
            </prepare>
            <configuration>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapreduce.map.class</name>
                    <value></value>
                </property>
                <property>
                    <name>mapreduce.reduce.class</name>
                    <value></value>
                </property>
                <property>
                    <name>mapred.mapoutput.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapred.mapoutput.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <property>
                    <name>mapred.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapred.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
                <property>
                    <name>mapreduce.job.acl-view-job</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.acl-view-job</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.use.system.libpath</name>
                    <value>false</value>
                </property>
                <property>
                    <name>oozie.libpath</name>
                    <value>${appPath}/lib</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="emailCommands" />
        <error to="killJob" />
    </action>

    <action name="emailCommands">
        <fs>
            <mkdir path='${makeDirectoryAbsPath}' />
            <move source='${dataInputDirectoryAbsPath}' target='${dataDestinationDirectoryRelativePath}' />
        </fs>
        <ok to="sendEmailSuccess" />
        <error to="sendEmailKill" />
    </action>
    <action name="sendEmailSuccess">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailToAddress}</to>
            <subject>Status of workflow ${wf:id()}</subject>
            <body>The workflow ${wf:id()} completed successfully</body>
        </email>
        <ok to="end" />
        <error to="end" />
    </action>
    <action name="sendEmailKill">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailToAddress}</to>
            <subject>Status of workflow ${wf:id()}</subject>
            <body>The workflow ${wf:id()} had issues and was killed. The error
                message is: ${wf:errorMessage(wf:lastErrorNode())}</body>
        </email>
        <ok to="end" />
        <error to="killJob" />
    </action>
    
    <kill name="killJob">
        <message>"Killed job due to error:
            ${wf:errorMessage(wf:lastErrorNode())}"</message>
    </kill>
    <end name="end" />    
</workflow-app>
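
The mapreduce.map.class and mapreduce.reduce.class values are left empty above. Below is a minimal sketch of new-API classes that would fit there; WordMapper and WordReducer are hypothetical names, chosen only so that their key and value types match the Text/IntWritable classes declared in the configuration.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical new-API mapper: emits (word, 1) for every word in the input.
public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Hypothetical new-API reducer: sums the counts emitted for each word.
class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

The JAR containing these classes goes into the workflow's lib directory (${appPath}/lib, as referenced by oozie.libpath), and variables such as ${jobTracker}, ${nameNode}, ${inputDir}, ${outputDir} and ${emailToAddress} are expected to be defined in the job.properties file submitted with the workflow.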



Wednesday 23 December 2015

Hive 1.2.1 Installation with Mysql


Download a binary pack from http://apache.claz.org/hive/stable/

user@host:~$  cd /opt/
user@host:~$  sudo mkdir hive
user@host:~$  cd ~/Downloads
user@host:~$  sudo mv <HIVE-BIN-FILE> /opt/hive
user@host:~$  cd /opt/hive 
user@host:~$  cp conf/hive-default.xml.template conf/hive-site.xml
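(If the downloaded binary pack is a tar.gz archive, extract it after moving it, for example with sudo tar -xzf, so that the conf directory used below is available.)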
 
Edit hive-site.xml with the settings below.
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
  <description>the URL of the MySQL database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>root</value>
</property>
<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
</property>
<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.12.0.war</value>
</property>

<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>

<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>

<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>

<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>

<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
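
The first four properties point Hive's metastore at a local MySQL database (created automatically on first use because of createDatabaseIfNotExist=true). The hive.hwi.* properties configure the Hive Web Interface, and the remaining properties (concurrency, bucketing enforcement, nonstrict dynamic partitioning and the compactor settings) are the usual prerequisites for enabling Hive transactions.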
 

MySQL Connector

Download the MySQL connector JAR file and put it in the /opt/hive/lib folder with the name mysql-connector.jar.
 
~/opt/hive/> bin/hive
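
Once the Hive prompt appears, a quick command such as show databases; confirms that Hive can reach the MySQL metastore.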
 
Happy Hiving... 
 
 

Some Errors and fixes

 
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx--x--x
 at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx--x--x
 at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
 at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
 at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
 
For this issue, give write permission to /tmp/hive on HDFS with the command below:
 
bin/hadoop dfs -chmod 777 /tmp/hive
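
Note that hadoop dfs is deprecated in newer Hadoop releases; bin/hdfs dfs -chmod 777 /tmp/hive (or bin/hadoop fs -chmod 777 /tmp/hive) does the same thing.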
 
Then run hive again.