So, you’ve got your Hadoop cluster all set up and you’re ready to begin your data analysis. You could go down the route of learning Pig, whose data-flow scripting language, Pig Latin, runs on Hadoop, but it has its own syntax to learn (and custom functions are typically written in Java), so that’s probably going to be a bit of a challenge.
So, what can you do? Well, if you’re already familiar with SQL, you can take advantage of HiveQL (Hive Query Language), an SQL-like language for querying the data stored in your Hadoop cluster.
It should be noted that Hive is very good for batch-style querying of large datasets, but it’s not the best way to extract real-time data. Each query is run as one or more jobs across the cluster, so there is a noticeable start-up latency before any results come back, on top of the time needed to scan the sheer amount of data held within Hadoop.
You can access Hive in three ways: through the web user interface, through Microsoft HDInsight, or through the command line interface.
The great thing about Hive is that the user does not need to know about MapReduce (described in a little detail here), as Hive automatically converts your HiveQL queries into MapReduce jobs.
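To give a feel for the work Hive saves you, here is a minimal Python sketch of the map, shuffle and reduce steps behind a query such as `SELECT page, COUNT(*) FROM page_views GROUP BY page;`. The `page_views` table, its rows and the function names are all made up for illustration, and the plan Hive really generates is more elaborate than this.

```python
from collections import defaultdict

# Hypothetical rows from a "page_views" table: (user, page).
ROWS = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/about"),
    ("carol", "/home"),
    ("bob", "/pricing"),
]


def map_phase(rows):
    """Emit a (key, 1) pair for every row, keyed on the GROUP BY column."""
    for _user, page in rows:
        yield page, 1


def shuffle_phase(pairs):
    """Gather all values that share a key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_page(rows):
    """Chain the three phases: the hand-written equivalent of GROUP BY + COUNT(*)."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_page(ROWS))
```

With Hive, the one-line query replaces all of the above, and the cluster takes care of running the phases in parallel.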
Hive supports the following clauses and join types:
- Group By
- Order By
- Sort By
- Left Join
- Right Join
- Inner Join
- Full Outer Join
- Cross Join
This list grows with each Hive release, so as time goes on, more of the SQL you already know will be available to you.
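Most of these behave as they do in SQL, but Sort By is Hive-specific and easy to confuse with Order By. Order By produces one fully sorted result, whereas Sort By only sorts the rows within each reducer. The Python sketch below imitates that difference; the sample values, the two-reducer setup and the round-robin split are assumptions chosen purely for illustration, not a description of how Hive assigns rows.

```python
# Hypothetical values from the "amount" column of a made-up "sales" table.
AMOUNTS = [42, 7, 19, 88, 3, 56]


def order_by(values):
    """ORDER BY: all rows go through a single sort, giving one total order."""
    return sorted(values)


def sort_by(values, num_reducers=2):
    """SORT BY: rows are split across reducers and each sorts only its own share."""
    # Round-robin assignment is an assumption made for this illustration.
    partitions = [values[i::num_reducers] for i in range(num_reducers)]
    return [sorted(part) for part in partitions]


if __name__ == "__main__":
    print("ORDER BY:", order_by(AMOUNTS))
    print("SORT BY: ", sort_by(AMOUNTS))
```

Each reducer’s output is sorted on its own, but reading the reducers one after another does not give a globally sorted list, which is why Order By is the one to reach for when the overall order matters.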