Course: Massive Data Analysis 2014/Hadoop Exercise

From VistrailsWiki
Revision as of 19:27, 3 October 2014 by Fchirigati

Before you start

  • You must have Hadoop installed and working on your local machine. You also need to set up your Amazon AWS account. Refer to the instructions on the course page.
  • Download the following package: http://vgc.poly.edu/~fchirigati/mda-class/hadoop-exercise.zip. This package contains the basic WordCount example to help you get started.
  • What to submit
    • Code: place your code in a public GitHub repository
    • Results: put the results in your S3 bucket (don't forget to make it public)
    • Complete this form to add the links to your GitHub repository and S3 bucket

Exercise 0: WordCount

  • Run the basic WordCount example both on your local machine and on AWS.
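Before running the job, it can help to have a mental model of what the Mapper and Reducer do. The sketch below is a hypothetical, Hadoop-free illustration in plain Java (the class `WordCountSketch` and its methods are names made up here, not code from hadoop-exercise.zip). It mirrors the MapReduce data flow in memory: map each line to (word, 1) pairs, group the pairs by word as the shuffle does, and sum each group in the reduce step. It assumes words are whitespace-separated tokens counted case-sensitively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of WordCount's map -> shuffle -> reduce flow.
// It does NOT use the Hadoop API; it only mirrors the logic of a Mapper and Reducer.
public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) { // leading whitespace yields an empty first token
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: group all emitted values by key (a TreeMap keeps keys sorted).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the counts collected for one word.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs the three steps over a list of input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog");
        System.out.println(wordCount(input));
    }
}
```

Running `main` prints every word with its count. The grouping step stands in for Hadoop's shuffle, which delivers all values for a given key to a single reduce call.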


Exercise 1: Fixed-Length WordCount

Exercise 2: InitialCount

Exercise 3: Top-K WordCount
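This page does not spell out the exercise, so the following is only a sketch under an assumption: that "Top-K" means reporting the K words with the highest counts, for example from WordCount's output. The selection step can use a bounded min-heap, which keeps memory at O(K) regardless of vocabulary size. The names `TopKSketch` and `topK`, and the alphabetical tie-break, are hypothetical choices made here for deterministic output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: pick the K most frequent words from (word, count) pairs.
// Assumption: ties on count are broken alphabetically (the smaller word ranks higher).
public class TopKSketch {

    // Orders entries worst-first: lower count first; for equal counts the
    // alphabetically larger word counts as worse.
    static final Comparator<Map.Entry<String, Integer>> WORST_FIRST = (a, b) -> {
        int byCount = Integer.compare(a.getValue(), b.getValue());
        if (byCount != 0) {
            return byCount;
        }
        return b.getKey().compareTo(a.getKey());
    };

    static List<Map.Entry<String, Integer>> topK(Map<String, Integer> counts, int k) {
        // Min-heap whose head is always the worst candidate kept so far.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(WORST_FIRST);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(Map.entry(e.getKey(), e.getValue())); // copy into an immutable entry
            if (heap.size() > k) {
                heap.poll(); // evict the worst, so at most k entries remain
            }
        }
        // Drain worst-first, then reverse to get best-first order.
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("the", 5, "fox", 2, "dog", 2, "quick", 1, "lazy", 3);
        System.out.println(topK(counts, 3));
    }
}
```

In a MapReduce job, one common pattern is to keep such a heap in each mapper, emit its contents from the mapper's `cleanup()` method, and merge the partial lists in a single reducer (`job.setNumReduceTasks(1)`). Whether that fits depends on the exercise's actual requirements.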