Course: Massive Data Analysis 2014/Hadoop Exercise

From VistrailsWiki
Revision as of 19:50, 3 October 2014

Before you start

  • You must have Hadoop installed and working on your local machine. You also need to set up your Amazon Web Services (AWS) account; refer to the instructions on the course page.
  • Download the following package: http://vgc.poly.edu/~fchirigati/mda-class/hadoop-exercise.zip. This package contains the basic WordCount example to help you get started.
  • What to submit
    • Code: place your code in a public GitHub repository
    • Results: put the results in your S3 bucket (don't forget to make it public)
    • Complete this form (http://bit.ly/1vAxovu) to add the links to your GitHub repository and S3 bucket. Deadline: 11:59 PM on the day of class (Oct 6, 2014)

Hands-on exercises

  • Exercise 0: WordCount
    • Run the basic WordCount example on your local machine and AWS
    • Follow the instructions here to create your Amazon Elastic MapReduce (EMR) cluster: http://vgc.poly.edu/~fchirigati/mda-class/RunHadoopAWS.pdf
    • Instructions for running WordCount on your local machine and on the EMR cluster will be given in class
    • Note: You don't have to submit code and results for this exercise.
  • Exercise 1: Fixed-Length WordCount
    • For this exercise, count only words that have exactly 5 characters
  • Exercise 2: InitialCount
  • Exercise 3: Top-K WordCount
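Exercises 0 to 2 share the same map, shuffle/sort, reduce shape. The Python sketch below simulates that flow in a single process, in the style of a Hadoop Streaming job; it is not the code from the course package, which may be structured differently. Assumptions: words are whitespace-separated tokens with no case folding or punctuation stripping, and "InitialCount" is taken to mean counting words by their first character, since the assignment text does not define it. The names `wordcount_mapper`, `initial_mapper`, `sum_reducer`, and `run_local` are hypothetical.

```python
"""Local, single-process sketch of the WordCount-family jobs in MapReduce style.

Assumptions (not from the course package): words are whitespace-separated
tokens, and "InitialCount" means counting words by their first character.
"""
from itertools import groupby
from operator import itemgetter
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

Pair = Tuple[str, int]


def wordcount_mapper(line: str, length: Optional[int] = None) -> Iterator[Pair]:
    """Emit (word, 1) per token; if `length` is set, keep only words of exactly that length."""
    for word in line.split():
        if length is None or len(word) == length:
            yield word, 1


def initial_mapper(line: str) -> Iterator[Pair]:
    """Hypothetical InitialCount mapper: emit (first character of the word, 1)."""
    for word in line.split():  # split() never yields empty strings, so word[0] is safe
        yield word[0], 1


def sum_reducer(key: str, values: Iterable[int]) -> Pair:
    """Sum all counts received for one key (could also act as a combiner)."""
    return key, sum(values)


def run_local(lines: Iterable[str], mapper: Callable[[str], Iterable[Pair]]) -> Dict[str, int]:
    """Simulate map -> shuffle/sort -> reduce in one process."""
    pairs = [pair for line in lines for pair in mapper(line)]
    pairs.sort(key=itemgetter(0))  # stand-in for Hadoop's shuffle: brings equal keys together
    result: Dict[str, int] = {}
    for key, group in groupby(pairs, key=itemgetter(0)):
        k, total = sum_reducer(key, (count for _, count in group))
        result[k] = total
    return result


if __name__ == "__main__":
    text = ["hello world hello", "apple pie and apple juice"]
    print(run_local(text, wordcount_mapper))                              # Exercise 0
    print(run_local(text, lambda line: wordcount_mapper(line, length=5)))  # Exercise 1
    print(run_local(text, initial_mapper))                                # Exercise 2 (assumed meaning)
```

Only the mapper changes between the three exercises; the reducer just adds counts. On a real cluster, the sort-and-group step inside `run_local` is what Hadoop's shuffle phase does for you.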
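For Exercise 3, one common approach (assumed here, not prescribed by the assignment) is a second pass over the WordCount output: each mapper keeps only its local top K, and a single reducer merges those candidates. The sketch assumes each word appears in exactly one input partition, which holds for WordCount output because all counts for a word end up at one reducer; under that assumption the global top K is always among the local candidates. The tie-breaking rule (alphabetical) and the function names `local_top_k` and `top_k` are illustrative.

```python
"""Sketch of a Top-K stage over WordCount output (Exercise 3).

Assumption: the input is (word, count) pairs, and each word appears in
exactly one partition, as it does in the output files of a WordCount job.
"""
import heapq
from typing import Iterable, List, Tuple

Pair = Tuple[str, int]


def local_top_k(pairs: Iterable[Pair], k: int) -> List[Pair]:
    """Return the k pairs with the largest counts; ties broken alphabetically by word."""
    return heapq.nsmallest(k, pairs, key=lambda p: (-p[1], p[0]))


def top_k(partitions: Iterable[Iterable[Pair]], k: int) -> List[Pair]:
    """Each 'mapper' keeps a local top k; a single 'reducer' merges the candidates."""
    candidates = [pair for part in partitions for pair in local_top_k(part, k)]
    return local_top_k(candidates, k)


if __name__ == "__main__":
    parts = [[("apple", 5), ("pie", 1), ("hello", 3)], [("world", 3), ("juice", 7)]]
    print(top_k(parts, 2))
```

Using a bounded heap keeps each mapper's memory proportional to K rather than to the vocabulary size. In Hadoop's Java API, the local top K could be emitted from a mapper's `cleanup()` method, with `job.setNumReduceTasks(1)` forcing a single reducer to do the merge.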