Difference between revisions of "Course: Massive Data Analysis 2014"

From VistrailsWiki
Line 9:
= News =


* The final exam will take place on May 12th.
* Welcome!
 
* We will have our last class on May 19th.
 
* 4/21/2014: There are two new quizzes on Gradiance. They are due on 2014-04-28 23:59 PST.
 
* Homework assignment 4 has been posted: [[Assignment 4 - Querying with Pig and Mapreduce]]
 
* Homework assignment 3 has been posted: [[Assignment 3 - MapReduce algorithm design]]
** You can find instructions on how to log into the NYU Hadoop cluster at: http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/MapReduceExample/readme-nyu-hadoop.txt
** I have created a list of frequently-asked questions which I hope will help you with your assignment: [[Assignment 3 - FAQ]]
 
* Your first assignment has been posted and it is due on Feb 17, 2014 5:00 pm. Here are the instructions: http://vistrails.org/index.php/Assignment_1_-_Data_Exploration
 
* I have sent a test email to the class list. If you have not received the message, make sure to sign up:  http://www.cs.nyu.edu/mailman/listinfo/csci_ga_2568_001_sp14
 
* Starting on Feb 10th, our class will meet at a new location: Cantor 101
 
* We will have lab on Thu at CIWW, room 109. ''Bring your laptop!''


= Background (4 weeks) =

Revision as of 01:44, 8 September 2014

CS-GY 6333 Massive Data Analysis: Tentative Schedule -- subject to change

  • Lecture: Mondays, 1:00pm-3:25pm at 2MTC, room 9.011.

News

  • Welcome!

Background (4 weeks)

Week 1 -- Jan 27: Course Overview; the evolution of Data Management


Week 2 -- Feb 3: Introduction to Databases

Week 3 -- Feb 10: Overview: Relational Model and SQL

  • Feb 13: Lab: Canceled -- University closed due to snow


Week 3.1 -- Feb 17: Holiday

Week 4 -- Feb 24: Overview: Advanced SQL and Query Optimization

Big Data Foundations and Infrastructure (2 weeks)

Week 5 -- Mar 3: Cloud Computing, MapReduce, and Hadoop

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2 - 2.1 and 2.2 (Large-Scale File Systems and Map-Reduce).
  • Homework Assignment -- Your first quiz is available on Gradiance. It is due on March 17th at 5pm.

Week 6 -- Mar 10: Algorithm Design for MapReduce

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2.


Machine Learning and Big Data (3 weeks)

Week 7 -- Mar 23: Hashing and AllReduce

  • Invited lecture by John Langford

Week 8 -- Mar 30: Bandits

  • Invited lecture by John Langford

Week 9 -- Apr 7: Large Scale Machine Learning in the Real World

  • Invited lecture by Leon Bottou

Big Data Foundations and Infrastructure -- cont. (2 weeks)

Week 10 -- April 14: Parallel Databases vs MapReduce, Query Processing on MapReduce and High-level Languages


Big Data Algorithms and Techniques (3 weeks)

Week 11 -- April 21: Data Management for Big Data (cont.) and Association Rules

  • Homework Assignment -- Your quiz is available on Gradiance. It is due on April 28th.

Week 12 -- Apr 28: Finding Similar Items -- Invited lecture by Dr. Harish Doraiswamy

Week 13 -- May 5: Graph Analysis and Exam Review

Week 14 -- May 12: Final Exam

Week 15 -- May 19: Large-Scale Visualization -- Invited lecture by Dr. Lauro Lins (AT&T Research)

  • Reading:
    • The Value of Visualization, Jarke van Wijk: http://www.win.tue.nl/~vanwijk/vov.pdf
    • Tamara Munzner's book (draft 2), available online: http://www.cs.ubc.ca/~tmm/courses/533/book/
    • Nanocubes paper: http://nanocubes.net and http://nanocubes.net/assets/pdf/nanocubes_paper_preprint.pdf