Thursday, 1 September 2016

MapReduce WordCount Program


PROGRAM:

package wordcount;
       
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount
{
    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable>
    {
        // Reused across calls so no new objects are allocated per token.
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException
        {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);

            while (tokenizer.hasMoreTokens())
            {
                word.set(tokenizer.nextToken());
                output.collect(word, ONE);
            }
        }
    }

    // Reducer: sums all the counts collected for one word.
    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable>
    {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException
        {
            int sum = 0;
            while (values.hasNext())
            {
                sum += values.next().get();
            }

            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception
    {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0] is the input path, args[1] is the output path (it must not exist yet).
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
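To see what the job computes without starting a cluster, the map and reduce steps can be imitated in plain Java. This is only a rough sketch: WordCountSketch and countWords are made-up names, there is no Hadoop dependency, and a TreeMap stands in for the shuffle and sort that Hadoop does between the two phases.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-alone class (not part of the program above): it mimics
// the map and reduce steps in memory, with no Hadoop dependency.
public class WordCountSketch {

    // Map step: split each line with StringTokenizer, as the Mapper does.
    // Reduce step: add 1 to the running total for each token seen.
    static Map<String, Integer> countWords(String... lines) {
        // TreeMap keeps keys sorted, standing in for Hadoop's sort phase.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("hello hadoop", "hello world");
        // Same layout as TextOutputFormat: key, a tab, then the value.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the two sample lines this prints hadoop 1, hello 2 and world 1, one pair per line, which is the kind of content the real job writes to its output directory.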




Tuesday, 16 August 2016

Installation of Hadoop-2.6.0

                         Steps to Install Hadoop on Ubuntu

1. Install JDK 1.6 or later.
2. Download the required Hadoop tar file (hadoop-2.6.0.tar.gz).
3. Extract it by running tar xvzf hadoop-2.6.0.tar.gz in a terminal.
4. Update JAVA_HOME inside the hadoop-env.sh file (in etc/hadoop).
5. Update your .bashrc (it is your own file, so sudo is not needed):
    $ gedit ~/.bashrc

Paste these export lines at the end of the file:
export HADOOP_HOME=/home/ratul/hadoop-2.6.0
export HADOOP_CONF_DIR=/home/ratul/hadoop-2.6.0/etc/hadoop
export HADOOP_MAPRED_HOME=/home/ratul/hadoop-2.6.0
export HADOOP_COMMON_HOME=/home/ratul/hadoop-2.6.0
export HADOOP_HDFS_HOME=/home/ratul/hadoop-2.6.0
export YARN_HOME=/home/ratul/hadoop-2.6.0

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-i386
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_USER_CLASSPATH_FIRST=true

6. Modify core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml in etc/hadoop (if mapred-site.xml does not exist, copy it from mapred-site.xml.template).
7. Install SSH on your system using sudo apt-get install ssh.
8. Run the two commands below to create a key without a passphrase and authorize it for login to localhost.
       $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
       $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
9. Hadoop is now installed and configured. Format your namenode:
   > go to the bin folder of hadoop-2.6.0, then run the command below
     $ ./hadoop namenode -format
10. Now go to the sbin folder of hadoop-2.6.0.
11. Run ./start-all.sh to start the namenode, datanode, secondarynamenode, resourcemanager and nodemanager.
12. You can view the namenode web UI at http://localhost:50070
13. You can view the cluster at http://localhost:8088

Tuesday, 9 August 2016

HDFS Commands

                         
                                   <<<<<<HDFS COMMANDS>>>>>>>>>




hadoop fs ls:

The hadoop ls command is used to list out the directories and files. An example is shown below:


$./hadoop fs -ls input/
Found 1 items
drwxr-xr-x   - hadoop hadoop  0 2013-09-10 09:47 /input/abc.txt

-------------------------------------
hadoop fs lsr:

The hadoop lsr command recursively lists the directories, subdirectories and files under the specified directory. In Hadoop 2.x it is deprecated in favor of hadoop fs -ls -R. A usage example is shown below:

$./hadoop fs -lsr /user/hadoop/dir
Found 2 items
drwxr-xr-x   - hadoop hadoop  0 2013-09-10 09:47 /user/hadoop/dir/products
-rw-r--r--   2 hadoop hadoop    1971684 2013-09-10 09:47 /user/hadoop/dir/products/products.dat

------------------------------------
hadoop fs cat:

The hadoop cat command prints the contents of a file to the terminal. A usage example is shown below:
EX:
$./hadoop fs -cat input/abc.txt

-------------------------------
hadoop fs chmod:

The hadoop chmod command is used to change the permissions of files. The usage is shown below:
SYNTAX:
hadoop fs -chmod  <octal mode> <file or directory name>
EX:
$./hadoop fs -chmod 700 input/abc.txt
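
Each digit of the octal mode stands for three permission bits (read = 4, write = 2, execute = 1) for the owner, the group and others, in that order. As a small illustration, here is a sketch that turns a mode like 700 into the rwx form printed by -ls. OctalModeSketch and toSymbolic are made-up names for this example and are not part of Hadoop.

```java
// Hypothetical helper class for illustration only; it is not a Hadoop API.
public class OctalModeSketch {

    // Converts a three-digit octal mode such as "700" into the nine-character
    // form that -ls prints (without the leading file-type character).
    static String toSymbolic(String octalMode) {
        StringBuilder sb = new StringBuilder();
        for (char digit : octalMode.toCharArray()) {
            int bits = digit - '0'; // one octal digit = three permission bits
            sb.append((bits & 4) != 0 ? 'r' : '-');
            sb.append((bits & 2) != 0 ? 'w' : '-');
            sb.append((bits & 1) != 0 ? 'x' : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSymbolic("700")); // rwx------
        System.out.println(toSymbolic("755")); // rwxr-xr-x
        System.out.println(toSymbolic("644")); // rw-r--r--
    }
}
```

So chmod 700 leaves the owner with full access and removes all access for the group and for others.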

--------------------------------------------
hadoop fs chown:

The hadoop chown command is used to change the ownership of files. The usage is shown below:
SYNTAX
hadoop fs -chown <NewOwnerName> <file or directory name>
EX:
$./hadoop fs -chown hadoop input/abc.txt
---------------------------------------
hadoop fs mkdir:

The hadoop mkdir command is for creating directories in the hdfs. You can use the -p option for creating parent directories. This is similar to the unix mkdir command. The usage example is shown below:


$./hadoop fs -mkdir -p input/

The above command creates the input directory under /user/ratul, because a path without a leading slash is resolved against the user's HDFS home directory.
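
The difference between input/abc.txt and /input/abc.txt in the examples on this page comes down to that rule. A minimal sketch of it, assuming the default home directory /user/<username> (HdfsPathSketch and resolve are made-up names, not Hadoop classes):

```java
// Hypothetical helper class for illustration only; it is not a Hadoop API.
public class HdfsPathSketch {

    // Assumption: the HDFS home directory is the default /user/<username>.
    static String resolve(String user, String path) {
        if (path.startsWith("/")) {
            return path; // absolute paths are used as given
        }
        return "/user/" + user + "/" + path; // relative paths start at the home directory
    }

    public static void main(String[] args) {
        System.out.println(resolve("ratul", "input/abc.txt"));  // /user/ratul/input/abc.txt
        System.out.println(resolve("ratul", "/input/abc.txt")); // /input/abc.txt
    }
}
```

If a command cannot find a file you just copied, check whether one of the two paths was written with a leading slash and the other without.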
-------------------------------
hadoop fs copyFromLocal:

The hadoop copyFromLocal command is used to copy a file from the local file system to the hadoop hdfs. The syntax and usage example are shown below:

Syntax:
hadoop fs -copyFromLocal <source> <destination>

Example:

Check the data in local file
> cat sales.txt
2000,iphone
2001, htc

Now copy this file to hdfs

$./hadoop fs -copyFromLocal /home/ratul/sales.txt input/

View the contents of the hdfs file.

$./hadoop fs -cat input/sales.txt
2000,iphone
2001, htc
-----------------------------------
hadoop fs copyToLocal:

The hadoop copyToLocal command is used to copy a file from HDFS to the local file system. The syntax and a usage example are shown below:
SYNTAX
hadoop fs -copyToLocal <source> <destination>
EX:
$./hadoop fs -copyToLocal input/sales.txt /home/ratul/

---------------------------
hadoop fs cp:

The hadoop cp command copies the source to the target. It can also copy multiple files, in which case the target must be a directory. The syntax is shown below:
SYNTAX
hadoop fs -cp <source> <destination>
EX:
$./hadoop fs -cp input/sales.txt new/

----------------------------
hadoop fs put:

The hadoop put command copies one or more local sources to a destination in HDFS. Usage examples are shown below:

Syntax1: copy single file to hdfs

$./hadoop fs -put /home/ratul/abc.txt input/

Syntax2: copy multiple files to hdfs

$./hadoop fs -put /home/ratul/abc.txt /home/ratul/qwerty.txt /new_folder

---------------------------------
hadoop fs get:

Hadoop get command copies the files from hdfs to the local file system. The syntax of the get command is shown below:
SYNTAX:
hadoop fs -get <source_from_hdfs> <destination_to_local>
EX:
$./hadoop fs -get input/abc.txt /home/ratul/
-------------------------------------

hadoop fs moveFromLocal:

The hadoop moveFromLocal command moves a file from local file system to the hdfs directory. It removes the original source file. The usage example is shown below:
SYNTAX:
hadoop fs -moveFromLocal <source_from_local> <destination_to_hdfs>
EX:
$./hadoop fs -moveFromLocal /home/ratul/abc.txt input/

-------------------------------
hadoop fs mv:

It moves the files from source hdfs to destination hdfs. Hadoop mv command can also be used to move multiple source files into the target directory. The syntax is shown below:
SYNTAX:
hadoop fs -mv <SrcFile> <destinationFile>
EX:
$./hadoop fs -mv input/abc.txt input/a/


----------------------
hadoop fs du:

The du command displays the aggregate length of the files contained in a directory, or the length of the file if the path is just a file. A usage example is shown below:

$./hadoop fs -du abc.txt
------------------------------
hadoop fs rm:

Removes the specified files. To delete a directory together with its contents, use -rm -r. An example is shown below:

$./hadoop fs -rm input/file.txt
--------------------------------
hadoop fs rmr:

Recursively deletes files and subdirectories. In Hadoop 2.x, -rmr is deprecated in favor of -rm -r. The usage of rmr is shown below:
EX:
$./hadoop fs -rmr input/folder/
-------------------------------------
hadoop fs setrep:

Hadoop setrep changes the replication factor of a file. The new replication factor is given as a number before the path.

EX:
$./hadoop fs -setrep 3 /input/abc.txt


---------------------------------
hadoop fs stat:

Hadoop stat returns status information on a path. By default it prints the modification time. An example is shown below:
EX:
$./hadoop fs -stat /input/abc.txt
2013-09-24 07:53:04
----------------------------
hadoop fs tail:

The hadoop tail command prints the last kilobyte of the file.
EX:
$./hadoop fs -tail /user/hadoop/abc.txt

12345 abc
2456 xyz
---------------------
hadoop fs text:

The hadoop text command displays the source file in text format. The syntax is shown below:
SYNTAX:
hadoop fs -text <src>
EX:
$./hadoop fs -text input/abc.txt

----------------------------------------
hadoop fs touchz:

The hadoop touchz command creates a zero byte file. This is similar to the touch command in unix. The syntax is shown below:
EX:
$./hadoop fs -touchz /input/aaa.txt

Saturday, 23 July 2016

Learn about Hadoop

 Hadoop

Apache Hadoop is an open-source software framework, written in Java and created by Doug Cutting and Michael J. Cafarella, that supports data-intensive distributed applications and is licensed under the Apache v2 license. It supports the running of applications on large clusters of commodity hardware. Hadoop was derived from Google's MapReduce and Google File System (GFS) papers.

The name "Hadoop" was chosen by Doug Cutting, who named it after his son's toy elephant. Doug used the name for his open source project because it was easy to pronounce and to Google.

The Hadoop framework transparently provides both reliability and data motion to applications. Hadoop implements a computational paradigm named MapReduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. It also provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the distributed file system are designed so that node failures are automatically handled by the framework. It enables applications to work with thousands of computation-independent computers and petabytes of data. The entire Apache Hadoop platform is commonly considered to consist of the Hadoop kernel, MapReduce and the Hadoop Distributed File System (HDFS), and a number of related projects including Apache Hive, Apache HBase, Apache Pig, ZooKeeper etc.

Before you proceed with Hadoop, you should have prior exposure to Core Java, database concepts, and any flavor of the Linux operating system.