Course: Massive Data Analysis 2014/Hadoop Exercise

From VistrailsWiki

Revision as of 19:23, 3 October 2014

Before you start

  • You must have Hadoop installed and working on your local machine. You also need to set up your Amazon AWS account. Refer to the instructions on the course page.
  • Download the following package: http://vgc.poly.edu/~fchirigati/mda-class/hadoop-exercise.zip. This package contains the basic WordCount example to help you get started.
  • What to submit:
    • Code: place your code in a public GitHub repository
    • Results: put the results in your S3 bucket (don't forget to make it public)
    • Complete this form to add the links to your GitHub repository and S3 bucket

Exercise 0: WordCount

  • Run the basic WordCount example both on your local machine and on AWS.
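To make the WordCount data flow concrete before running it on Hadoop, here is a minimal local simulation in plain Python. It is a sketch, not Hadoop's API and not the code shipped in hadoop-exercise.zip; the names map_phase, reduce_phase and word_count are hypothetical. It assumes whitespace tokenization and case-sensitive words, like the classic Hadoop WordCount tutorial, and imitates the sort-and-group (shuffle) step that Hadoop performs between the map and reduce phases.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    for word in line.split():
        yield word, 1


def reduce_phase(word, counts):
    """Reducer: add up all the 1s emitted for a single word."""
    return word, sum(counts)


def word_count(lines):
    # Map: run the mapper over every input line.
    pairs = [pair for line in lines for pair in map_phase(line)]
    # Shuffle/sort: sort map output by key so equal keys become adjacent.
    pairs.sort(key=itemgetter(0))
    # Reduce: one reducer call per distinct key, fed only that key's values.
    return dict(
        reduce_phase(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog and the fox"]))
    # {'and': 1, 'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

On a real cluster the map calls run in parallel over input splits and the framework performs the sort and grouping across machines; the per-word logic stays the same.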


Exercise 1: Fixed-Length WordCount

Exercise 2: InitialCount

Exercise 3: Top-K WordCount
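This page does not spell out the exercise. Assuming it asks for the K most frequent words, one common MapReduce pattern runs a second pass over WordCount's output: each mapper keeps only a local top K, and a single reducer merges those candidates into the global top K. The plain-Python sketch below only simulates that data flow; local_top_k and global_top_k are hypothetical names, not Hadoop API calls. The local filter is safe only because every word's total count appears in exactly one record of the WordCount output, so no word's count is split across mappers.

```python
import heapq


def local_top_k(counts, k):
    """Map-side step: keep only the k largest (count, word) pairs of one input split."""
    return heapq.nlargest(k, ((count, word) for word, count in counts.items()))


def global_top_k(candidate_lists, k):
    """Single-reducer step: merge every mapper's candidates and keep the overall k largest."""
    merged = [pair for candidates in candidate_lists for pair in candidates]
    return [(word, count) for count, word in heapq.nlargest(k, merged)]


if __name__ == "__main__":
    # Two hypothetical splits of WordCount output; each word appears in exactly one split.
    split_a = {"the": 3, "fox": 2, "and": 1}
    split_b = {"dog": 1, "quick": 1, "brown": 1, "lazy": 1}
    k = 2
    candidates = [local_top_k(split_a, k), local_top_k(split_b, k)]
    print(global_top_k(candidates, k))  # [('the', 3), ('fox', 2)]
```

Each mapper sends at most K records to the reducer, so the reducer's input stays small regardless of vocabulary size. Because pairs are compared as (count, word) tuples, ties on count are broken by reverse alphabetical order of the word; a real solution should pick and document its own tie-breaking rule.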