Today, I was provided with a beta version of a data feed that will eventually be consumed by the Hadoop platform. As the feed has not yet been configured to load into the platform, there was no way to query the data and confirm that we had all the raw data we’d eventually need to extract insights and run vital reports for business consumers.
The data I was provided was a feed from the test environment, in CSV format. For this data to be useful, I wanted to load it into the Hadoop cluster and run some queries, calculations & aggregations on it using Hue. To do this, I needed to create a new table and populate it with my shiny new data set. So, I created a new folder in my user area called /newfeed and uploaded the CSV to this directory.
I then opened the Hive query engine & executed the below query:
create external table mynewtablename(
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
In the above query, it’s important to note a few things:
- mynewtablename should be replaced with your desired table name
- '/user/username/newfeed/' should be replaced with the directory in which you placed your CSV
- DELIMITED FIELDS TERMINATED BY "," sets the field delimiter to a comma, as used in the CSV file
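Putting those pieces together, a complete statement might look something like the sketch below. The three column names and types are purely hypothetical placeholders (the real feed's columns aren't listed here), so swap in your own, in the same order as they appear in the CSV. The STORED AS TEXTFILE line and the commented-out header property are my assumptions about a typical plain-text CSV rather than part of the original query.

```sql
-- Sketch only: id, event_time and amount are made-up example columns.
-- Replace them with the columns of your own CSV, in file order.
CREATE EXTERNAL TABLE mynewtablename (
  id          INT,
  event_time  STRING,
  amount      DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE                      -- assumption: CSV is an uncompressed text file
LOCATION '/user/username/newfeed/';     -- the directory (not the file) holding the CSV

-- Optional, if your CSV has a header row you want skipped, add this
-- before the final semicolon:
-- TBLPROPERTIES ("skip.header.line.count"="1")
```

Because the table is external, dropping it later removes only the table definition and leaves the CSV in place. One caveat: a plain comma delimiter won't cope with quoted fields that contain commas, so a feed like that would need a CSV-aware SerDe instead.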
If your CSV is in the correct directory & its columns match the table definition, the data will be available as soon as you query the table. Simple, huh?
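A quick way to check that everything lined up is to run a couple of sanity queries against the new table. This is just a minimal sketch using the placeholder table name from above:

```sql
-- Eyeball the first few rows to confirm the columns landed where expected
SELECT * FROM mynewtablename LIMIT 10;

-- Compare this against the number of lines in the CSV
SELECT COUNT(*) FROM mynewtablename;
```

If a column comes back full of NULLs, the usual suspects are a declared type that doesn't match the values in the file or a column order that differs from the CSV.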