Course: Massive Data Analysis 2014/Hadoop Exercise

From VistrailsWiki

Revision as of 19:27, 3 October 2014

Before you start

  • You must have Hadoop installed and working on your local machine. You also need to set up your Amazon AWS account. Refer to the instructions on the course page.
  • Download the following package: http://vgc.poly.edu/~fchirigati/mda-class/hadoop-exercise.zip. This package contains the basic WordCount example to help you get started.
  • What to submit:
    • Code: place your code in a public GitHub repository
    • Results: put the results in your S3 bucket (don't forget to make it public)
    • Complete this form (http://bit.ly/1vAxovu) to add the links to your GitHub repository and S3 bucket

Exercise 0: WordCount

  • Run the basic WordCount example on your local machine and on AWS.
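A WordCount job has three stages. The map phase emits a (word, 1) pair for every token. The framework then sorts and groups the pairs by key. Finally, the reduce phase sums each group. The Python sketch below simulates these stages in a single process to make the data flow concrete. It is not the code shipped in hadoop-exercise.zip. It assumes whitespace tokenization and case-sensitive words, and the function names are hypothetical. It runs without a Hadoop installation.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple


def map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase (hypothetical helper): emit (word, 1) for every
    whitespace-separated token. Assumes no lowercasing or punctuation stripping."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce phase (hypothetical helper): sum the counts for each word.

    Assumes the pairs arrive sorted by key, which is what Hadoop's
    shuffle/sort step provides to each reducer."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate map -> shuffle/sort -> reduce in one process."""
    shuffled = sorted(map_words(lines), key=itemgetter(0))  # stands in for the shuffle
    return dict(reduce_counts(shuffled))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(sample))
```

On these three sample lines, `the` is counted twice and `The` once, because the sketch does not lowercase tokens. Whether the real exercise should normalize case is not stated on this page.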


Exercise 1: Fixed-Length WordCount

Exercise 2: InitialCount

Exercise 3: Top-K WordCount