cs9223 - Massive Data Analysis - Homework Assignment 1
Hands-on experience with Hadoop/MapReduce
As a good citizen of NYC, you want to be prepared for potential disaster scenarios and you want to learn about hurricane evacuation centers in NYC (https://www.google.com/fusiontables/DataSource?docid=1BiOUN5JP94FT5pV9UNCQIvFloHqS_NTboOTvVD8). You will:
- Install Hadoop and HDFS on your laptop
- Copy the hurricane center data to HDFS. The data is in the files located at: http://vgc.poly.edu/~juliana/courses/cs9223/Homework/HW1
- File1 (hurricane-center-addresses.csv) contains the addresses of the hurricane centers, while File2 (hurricane-center-coordinates.csv) contains their coordinates.
- Write and run MapReduce programs that:
a) Select the tuples from File1 where zipcode<10030
b) Eliminate duplicates from File1
c) Compute the natural join between File1 and File2, i.e., create a new file that contains both the addresses and coordinates for the hurricane centers
- You should submit your Java program source and binary (named hw1.java and hw1.jar). The program should accept the question letter (a, b, or c) as an argument and output the answer for that question.
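
To make the three operations concrete, here is a minimal plain-Java sketch of the logic behind questions a), b), and c). It deliberately avoids the Hadoop API and only mimics the map, shuffle, and reduce steps in memory, so it is not the expected submission. The class name Hw1Sketch, the method names, and the column positions (ZIP_COLUMN, KEY_COLUMN) are all assumptions made for illustration; check the real CSV headers before relying on them. The naive comma split also assumes no quoted commas inside fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: in-memory stand-in for the three MapReduce jobs.
final class Hw1Sketch {
    // Assumption: the zipcode is in this column of File1.
    static final int ZIP_COLUMN = 2;
    // Assumption: both files carry their shared join attribute in column 0.
    static final int KEY_COLUMN = 0;

    // (a) Selection is map-only: emit a row only if its zipcode is below the limit.
    static List<String> selectByZip(List<String> rows, int limit) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            String[] fields = row.split(",", -1);
            try {
                if (Integer.parseInt(fields[ZIP_COLUMN].trim()) < limit) {
                    out.add(row);
                }
            } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
                // Header line or malformed row: skip it.
            }
        }
        return out;
    }

    // (b) Duplicate elimination: map emits the whole row as the key, the shuffle
    // groups identical keys, and reduce emits each key exactly once.
    static List<String> distinct(List<String> rows) {
        Map<String, Integer> groups = new LinkedHashMap<>();
        for (String row : rows) {
            groups.merge(row, 1, Integer::sum); // map + shuffle
        }
        return new ArrayList<>(groups.keySet()); // reduce
    }

    // (c) Reduce-side join: group rows of both files by key, then pair them up.
    static List<String> join(List<String> file1, List<String> file2) {
        Map<String, List<String>> left = new LinkedHashMap<>();
        Map<String, List<String>> right = new LinkedHashMap<>();
        for (String row : file1) {
            left.computeIfAbsent(key(row), k -> new ArrayList<>()).add(row);
        }
        for (String row : file2) {
            right.computeIfAbsent(key(row), k -> new ArrayList<>()).add(rest(row));
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : left.entrySet()) {
            List<String> matches = right.get(entry.getKey());
            if (matches == null) {
                continue; // no partner in File2, so the row drops out of the join
            }
            for (String l : entry.getValue()) {
                for (String r : matches) {
                    out.add(l + "," + r);
                }
            }
        }
        return out;
    }

    static String key(String row) {
        return row.split(",", -1)[KEY_COLUMN].trim();
    }

    // Everything after the key column (valid only because KEY_COLUMN is 0).
    static String rest(String row) {
        int comma = row.indexOf(',');
        return comma < 0 ? "" : row.substring(comma + 1);
    }

    // Dispatch on the question letter, mirroring the required program argument.
    static List<String> answer(String question, List<String> file1, List<String> file2) {
        switch (question) {
            case "a": return selectByZip(file1, 10030);
            case "b": return distinct(file1);
            case "c": return join(file1, file2);
            default: throw new IllegalArgumentException("expected a, b, or c: " + question);
        }
    }
}
```

In an actual Hadoop job, each method would typically be split into a Mapper and, for b) and c), a Reducer, with the framework performing the grouping that the maps above imitate.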