
Course: Massive Data Analysis 2014

From VisTrailsWiki


Revision as of 15:56, 20 October 2014


CS-GY 6333 Massive Data Analysis: Tentative Schedule -- subject to change

  • Lecture: Mondays, 1:00pm-3:25pm at 2MTC, room 9.011.

News

  • On Sept 22nd, I distributed AWS tokens that will be needed for your assignments. If you have not received your token, let me know.
  • Your first assignment has been posted -- see details below and in NYU Classes.

Background (4 weeks)

Week 1 -- Sept 8: Course Overview; the evolution of Data Management

Week 2 -- Sept 15: Provenance and Reproducibility

  • GitHub setup:

Week 3 -- Sept 22: Introduction to Databases; Relational Model and SQL

Week 4 -- Sept 29: Overview: Advanced SQL and Query Optimization

Big Data Foundations and Infrastructure (3 weeks)

Week 5 -- Oct 6: Cloud computing, MapReduce and Hadoop

  • Lab: after the lecture, you will work on an in-class exercise. For this, you need to install Hadoop on your laptop and have your AWS account set up. See the instructions below.
  • You will use two different Hadoop configurations:
    • Local (on your laptop)
    • Amazon AWS: each student should have received a token with $100 of credit toward computing time on AWS. If you have not received your token yet, contact us immediately! When using AWS, always remember to terminate your instances; otherwise they keep accruing charges, and you are responsible for any charges beyond your credit.
    • See the instructions for installing Hadoop on your local machine and setting up your AWS account at http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/HadoopExerciseInstructions.pdf
    • Warning: install Hadoop on your machine and set up your AWS account before class starts. There will be no time to install software during the in-class exercise.


  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2, Sections 2.1 and 2.2 (Large-Scale File Systems and Map-Reduce).
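
The map/shuffle/reduce model covered in the Week 5 readings is usually introduced through word count. The sketch below is a minimal, single-process Python simulation of that model. It is not Hadoop and not the in-class exercise, and the function names map_phase, shuffle_phase, reduce_phase and word_count are hypothetical, chosen only for illustration. On a real cluster, the framework runs map tasks in parallel over input splits and performs the shuffle across the network before the reducers run.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated token in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all intermediate values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Sort by key so reducers see keys in order, mimicking Hadoop's sorted shuffle.
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce sequentially in a single process (illustrative only)."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because each reduce call only sees the values for a single key, the reduce step can be split across many machines by partitioning keys; this is the property that algorithm design for MapReduce builds on.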

Week 6 -- Oct 13: Fall Break

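Week 7 -- Oct 20: Big Data Analysis with Myria

  • Lecture notes:
    • http://bigdata.poly.edu/~fchirigati/mda-class/dan-myria.pdf
  • Useful reading:
    • Myria Demo Paper: http://myria.cs.washington.edu/publications/Halperin_Myria_demo_SIGMOD_2014.pdf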

Week 7 -- Oct 27: Algorithm Design for MapReduce

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2.

Week 8 -- Nov 3: Parallel Databases vs MapReduce, Query Processing on MapReduce and High-level Languages



Big Data Algorithms and Techniques (3 weeks)

Week 9 -- Nov 10: Association Rules


Week 10 -- Nov 17: Finding similar items


Week 11 -- Nov 25: Graph Analysis


Week 12 -- Dec 1: Large-Scale Visualization -- Invited lecture by Dr. Lauro Lins (AT&T Research)

  • Reading:
    • The Value of Visualization, Jarke Van Wijk: http://www.win.tue.nl/~vanwijk/vov.pdf
    • Tamara Munzner's book (draft 2), available online: http://www.cs.ubc.ca/~tmm/courses/533/book/
    • Nanocubes paper: http://nanocubes.net/assets/pdf/nanocubes_paper_preprint.pdf (project site: http://nanocubes.net)


Week 13 -- Dec 8: Data Cleaning and Integration

Week 14 -- Dec 15: Project Presentations
