Difference between revisions of "Course: Massive Data Analysis 2014"

From VistrailsWiki
Line 13: Line 13:
= Background (4 weeks) =
= Background (4 weeks) =


== Week 1 -- Jan 27: Course Overview; the evolution of Data Management ==
== Week 1 -- Sept 8: Course Overview; the evolution of Data Management ==


* Lecture notes:  http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/course-overview.pdf
* Lecture notes:  http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/course-overview.pdf
Line 19: Line 19:
* Course survey: https://docs.google.com/spreadsheet/embeddedform?formkey=dFpwTjROVzhLUWY2NVNXb0xvNTVLMnc6MA
* Course survey: https://docs.google.com/spreadsheet/embeddedform?formkey=dFpwTjROVzhLUWY2NVNXb0xvNTVLMnc6MA


== Week 2 -- Feb 3: Introduction to Databases ==
== Week 2 -- Sept 15: Introduction to Databases ==
* Lecture notes:  http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/intro-to-db.pdf
* Lecture notes:  http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/intro-to-db.pdf
* Other useful reading:  
* Other useful reading:  
** [http://philip.greenspun.com/sql/introduction.html Greenspun's SQL for Web Nerds Intro]
** [http://philip.greenspun.com/sql/introduction.html Greenspun's SQL for Web Nerds Intro]
** [http://philip.greenspun.com/sql/data-modeling.html SQL/Nerds Modeling (parts)]
** [http://philip.greenspun.com/sql/data-modeling.html SQL/Nerds Modeling (parts)]
* Feb 6: Lab: Data Exploration and Reproducibility ==
** [[Lab notes 02/06/14]]


* Homework assignment: [[Assignment 1 - Data Exploration]]
* Homework assignment: [[Assignment 1 - Data Exploration]]


== Week 3 -- Feb 10: Overview: Relational Model and SQL  ==
== Week 3 -- Sept 22: Overview: Relational Model and SQL  ==
* Lecture notes:   
* Lecture notes:   
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/relational-algebra.pdf
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/relational-algebra.pdf
Line 39: Line 36:
** [http://philip.greenspun.com/sql/data-modeling.html SQL/Nerds Modeling (parts)]
** [http://philip.greenspun.com/sql/data-modeling.html SQL/Nerds Modeling (parts)]


* Feb 13: Lab: Canceled -- University closed due to snow ==
== Week 3.1 -- Feb 17:  Holiday ==
* No class, holiday
* Feb 20 Lab: hands-on SQL
** [[Big Data Lab notes 02/19/14]]


== Week 4 -- Feb 24: Overview: Advanced SQL and Query Optimization  ==
== Week 4 -- Sept 29: Overview: Advanced SQL and Query Optimization  ==


* Lecture notes:   
* Lecture notes:   
Line 57: Line 47:
= Big Data Foundations and Infrastructure (2 weeks) =
= Big Data Foundations and Infrastructure (2 weeks) =


== Week 5 -- Mar 3: Cloud computing, Map Reduce and  Hadoop ==
== Week 5 -- Oct 6: Cloud computing, Map Reduce and  Hadoop ==
* Lecture notes:   
* Lecture notes:   
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/mapreduce-intro.pdf
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/mapreduce-intro.pdf
Line 70: Line 60:
* Homework Assignment -- Your first quiz is available on [http://www.newgradiance.com Gradiance]. It is ''due on March 17th at 5pm.''
* Homework Assignment -- Your first quiz is available on [http://www.newgradiance.com Gradiance]. It is ''due on March 17th at 5pm.''


== Week 6 -- Mar 10: Algorithm Design for MapReduce  ==
== Week 6 -- Oct  13: Algorithm Design for MapReduce  ==


* Lecture notes:   
* Lecture notes:   
Line 82: Line 72:
= Machine Learning and Big Data  (3 weeks) =
= Machine Learning and Big Data  (3 weeks) =


== Week 7 -- Mar 23: Hashing and AllReduce ==
== Week 7 -- Oct  20: Hashing and AllReduce ==
* Invited lecture by John Langford
* Invited lecture by John Langford


Line 93: Line 83:
* Homework assignment: [[Assignment 3 - MapReduce algorithm design]]
* Homework assignment: [[Assignment 3 - MapReduce algorithm design]]


== Week 8 -- Mar 30: Bandits ==
== Week 8 -- Oct 27: Bandits ==
* Invited lecture by John Langford
* Invited lecture by John Langford


Line 101: Line 91:
** http://cilvr.cs.nyu.edu/diglib/lsml/lecture10_doing_exploration.pdf
** http://cilvr.cs.nyu.edu/diglib/lsml/lecture10_doing_exploration.pdf


== Week 9 -- Apr 7: Large Scale Machine Learning in the Real World ==
== Week 9 -- Nov 3: Large Scale Machine Learning in the Real World ==
* Invited lecture by Leon Bottou
* Invited lecture by Leon Bottou


Line 111: Line 101:
= Big Data Foundations and Infrastructure -- cont. (2 weeks) =
= Big Data Foundations and Infrastructure -- cont. (2 weeks) =


== Week 10 -- April 14:  Parallel Databases vs MapReduce, Query Processing on Mapreduce and High-level Languages ==
== Week 10 -- Nov 10:  Parallel Databases vs MapReduce, Query Processing on Mapreduce and High-level Languages ==


* Lecture notes:
* Lecture notes:
Line 130: Line 120:
= Big Data Algorithms and Techniques (3 weeks) =
= Big Data Algorithms and Techniques (3 weeks) =


== Week 11 -- April 21: Data Management for Big Data (cont) and Association Rules  ==
== Week 11 -- Nov 17: Data Management for Big Data (cont) and Association Rules  ==


* Lecture notes:
* Lecture notes:
Line 139: Line 129:
* Homework Assignment -- Your  quiz is available on [http://www.newgradiance.com Gradiance]. It is ''due on April  28th.''
* Homework Assignment -- Your  quiz is available on [http://www.newgradiance.com Gradiance]. It is ''due on April  28th.''


== Week 12 -- Apr 28: Finding similar items: Invited lecture by Dr. Harish Doraiswami  ==
== Week 12 -- Nov 24: Finding similar items: Invited lecture by Dr. Harish Doraiswami  ==


* Lecture notes:
* Lecture notes:
Line 150: Line 140:
** Your final assignment is available at http://www.vistrails.org/index.php/Assignment_4_-_Querying_with_Pig_and_Mapreduce. This is an optional assignment and will count towards extra credit
** Your final assignment is available at http://www.vistrails.org/index.php/Assignment_4_-_Querying_with_Pig_and_Mapreduce. This is an optional assignment and will count towards extra credit


== Week 13 -- May 5: Graph Analysis and Exam Review ==
== Week 13 -- Dec 1: Graph Analysis and Exam Review ==


* Lecture notes:
* Lecture notes:
Line 156: Line 146:
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/exam-review.pdf
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/exam-review.pdf


== Week 14 -- May 12: Final Exam  ==
== Week 14 -- Dec 8: Final Exam  ==




== Week 15 -- May 19: Large-Scale Visualization -- Invited lecture by Dr. Lauro Lins (AT&T Research) ==
== Week 15 -- Dec 15: Large-Scale Visualization -- Invited lecture by Dr. Lauro Lins (AT&T Research) ==


* Lecture notes:
* Lecture notes:

Revision as of 01:57, 8 September 2014

CS-GY 6333 Massive Data Analysis: Tentative Schedule -- subject to change

  • Lecture: Mondays, 1:00pm-3:25pm at 2MTC, room 9.011.

News

  • Welcome!

Background (4 weeks)

Week 1 -- Sept 8: Course Overview; the evolution of Data Management

Week 2 -- Sept 15: Introduction to Databases

Week 3 -- Sept 22: Overview: Relational Model and SQL


Week 4 -- Sept 29: Overview: Advanced SQL and Query Optimization

Big Data Foundations and Infrastructure (2 weeks)

Week 5 -- Oct 6: Cloud computing, MapReduce and Hadoop

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2, Sections 2.1 and 2.2 (Large-Scale File Systems and Map-Reduce).
  • Homework Assignment -- Your first quiz is available on Gradiance. It is due on March 17th at 5pm.

Week 6 -- Oct 13: Algorithm Design for MapReduce

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2.


Machine Learning and Big Data (3 weeks)

Week 7 -- Oct 20: Hashing and AllReduce

  • Invited lecture by John Langford

Week 8 -- Oct 27: Bandits

  • Invited lecture by John Langford

Week 9 -- Nov 3: Large Scale Machine Learning in the Real World

  • Invited lecture by Leon Bottou

Big Data Foundations and Infrastructure -- cont. (2 weeks)

Week 10 -- Nov 10: Parallel Databases vs MapReduce, Query Processing on MapReduce and High-level Languages


Big Data Algorithms and Techniques (3 weeks)

Week 11 -- Nov 17: Data Management for Big Data (cont) and Association Rules

  • Homework Assignment -- Your quiz is available on Gradiance. It is due on April 28th.

Week 12 -- Nov 24: Finding similar items: Invited lecture by Dr. Harish Doraiswami

Week 13 -- Dec 1: Graph Analysis and Exam Review

Week 14 -- Dec 8: Final Exam

Week 15 -- Dec 15: Large-Scale Visualization -- Invited lecture by Dr. Lauro Lins (AT&T Research)

  • Reading:
    • The Value of Visualization, Jarke van Wijk: http://www.win.tue.nl/~vanwijk/vov.pdf
    • Tamara Munzner's book, draft 2, available online: http://www.cs.ubc.ca/~tmm/courses/533/book/
    • Nanocubes paper: http://nanocubes.net http://nanocubes.net/assets/pdf/nanocubes_paper_preprint.pdf