Big Data Series: Running Our First Application on Hadoop

Hadoop ships with a number of sample applications for you to run. To list them, type ‘hadoop jar /usr/jars/hadoop-examples.jar’ into the terminal with no further arguments. For this article, we’re going to use the wordcount example.

First, we should verify that our input file (words.txt) still exists in HDFS by running ‘hadoop fs -ls’.

Next, we should learn how to run the wordcount app by examining its command line arguments. Running it with no arguments prints the usage: ‘hadoop jar /usr/jars/hadoop-examples.jar wordcount’.

Once we’ve done that, we can start the job: ‘hadoop jar /usr/jars/hadoop-examples.jar wordcount words.txt out’. While it runs, Hadoop prints progress updates for the map and reduce tasks. Note: the ‘out’ at the end names the HDFS directory that the results will be written to.
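One thing to watch for when re-running the job: MapReduce refuses to write into an output directory that already exists, so a second run with the same ‘out’ name will fail. Below is a minimal sketch of a wrapper that clears the old directory first. The names run, run_wordcount and DRY_RUN are made up for this example, and DRY_RUN defaults to 1 so the script only prints the job command; set DRY_RUN=0 on a machine where hadoop is installed. Be careful, because in that mode it deletes the old results.

```shell
# DRY_RUN=1 (the default in this sketch) only prints the job command.
# Set DRY_RUN=0 on a machine where the hadoop command is available.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: $1 = input file in HDFS, $2 = output directory in HDFS.
run_wordcount() {
  input="$1"
  output="$2"
  # The job will not overwrite an existing output directory, so remove a
  # leftover one first. This check is skipped in dry-run mode.
  if [ "$DRY_RUN" != "1" ] && hadoop fs -test -d "$output"; then
    hadoop fs -rm -r "$output"
  fi
  run hadoop jar /usr/jars/hadoop-examples.jar wordcount "$input" "$output"
}

run_wordcount words.txt out
```

If you would rather keep old results, pass a fresh directory name (for example ‘out2’) instead of deleting ‘out’.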

We should now check HDFS for the ‘out’ directory: ‘hadoop fs -ls’.

Now, let’s check what’s in the ‘out’ directory: ‘hadoop fs -ls out’.

The file called ‘part-r-00000’ contains the results of the job. The ‘_SUCCESS’ file is an empty marker that simply means the job completed successfully.
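To get a feel for what ‘part-r-00000’ looks like before opening it, here is a rough local approximation of the same count using ordinary shell tools, with no Hadoop involved. It assumes the example splits the text on whitespace and writes one word and its count per line, separated by a tab and sorted by word. The file sample.txt and the function local_wordcount are made-up names for illustration.

```shell
# sample.txt is a hypothetical stand-in for words.txt.
printf 'the cat sat\nthe dog sat\n' > sample.txt

# Hypothetical helper: approximate the wordcount output for a local file.
local_wordcount() {
  # Put one token per line, drop blank lines, sort in byte order,
  # count duplicates, then print "word<TAB>count".
  tr -s '[:space:]' '\n' < "$1" \
    | sed '/^$/d' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ printf "%s\t%s\n", $2, $1 }'
}

local_wordcount sample.txt
```

For the two sample lines this prints cat and dog with a count of 1, and sat and the with a count of 2. Note that, like the Hadoop example, this treats ‘The’ and ‘the’ as different words and leaves punctuation attached.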

We can now copy the results from HDFS to our local drive: ‘hadoop fs -copyToLocal out/part-r-00000 local.txt’.

Now, we can type ‘more local.txt’ to read the contents.
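Paging through the whole file works, but the results are ordered by word rather than by count. As a small sketch, assuming each line of the results is a word, a tab, and a count, the most frequent words can be pulled out with sort. The file sample_results.txt and the function top_words are made-up names; on a real run you would point the function at local.txt.

```shell
# Hypothetical sample in the same word<TAB>count layout as part-r-00000.
printf 'apple\t3\nbanana\t12\ncherry\t7\n' > sample_results.txt

# Hypothetical helper: $1 = results file, $2 = number of rows to show.
top_words() {
  # Sort on the second tab-separated field, numerically, largest first.
  sort -t "$(printf '\t')" -k2,2nr "$1" | head -n "$2"
}

top_words sample_results.txt 2
```

With the sample data this prints banana (12) followed by cherry (7). For the real output, ‘top_words local.txt 10’ would show the ten most common words.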