
Course: Big Data 2016


Juliana (Talk | contribs)
(Created page with '= DS-GA 1004- Big Data: Tentative Schedule -- ''subject to change'' = * Course Web page: * Instructor: Professor Juliana Freire…')

Revision as of 16:27, 6 January 2016


DS-GA 1004- Big Data: Tentative Schedule -- subject to change

  • Lecture: Mondays, 4:55pm-7:35pm at 19 University Pl., room 102.
  • Some classes will include a lab session, so please always bring your laptop.


  • 1/25/2016: Amazon has kindly donated time on AWS for all the students in this class. To obtain your credit, please follow the instructions at
  • 1/25/2016: Access your NYU HPC account, which you will use for in-class exercises and homework assignments. See instructions at NYU Hadoop

Background (2 weeks)

Week 1 - Feb 2: Course Overview; The evolution of Data Management and introduction to Big Data

Week 2 - Feb 9: Introduction to Databases, Relational Model and SQL

  • Programming assignment: Using SQL for data analysis and cleaning (see NYU Classes)

Feb 16: Holiday

Big Data Foundations and Infrastructure (3 weeks)

Week 3 - Feb 23: Introduction to MapReduce

Week 4 - March 2: Algorithm Design for MapReduce: Relational Operations

  • Lab: Hands-on Hadoop (local)
  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2.
  • Programming assignment: MapReduce (check NYU Classes)

Week 5 - March 9: MapReduce Algorithm Design Patterns; Parallel Databases vs MapReduce

  • Programming assignment: check NYU Classes on March 10th

March 16th: Spring Break

Transparency and Reproducibility (1 week)

Week 6 - March 23: Data Exploration and Reproducibility

  • Programming assignment 4: Exploring urban data (see NYU Classes)

Big Data Algorithms, Mining Techniques, and Visualization (6 weeks)

Week 7 - March 30th: Finding similar items

  • Homework Assignment
    • See quizzes on Gradiance -- Distance measures and document similarity.

Week 8 - April 6th: Association Rules

  • Suggested additional reading:
    • Fast algorithms for mining association rules, Agrawal and Srikant, VLDB 1994.
    • Data Mining Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann
    • Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997.
  • Homework Assignment
    • See quizzes on Gradiance -- Distance measures and document similarity.

Week 9 - April 13th: Visualization and Spatio-Temporal Data -- Invited lecture by Dr. Harish Doraiswamy (NYU CUSP)

Week 10 - April 20th: Parallel Databases

Week 11 - April 27th: Graph Analysis

  • Required reading: Data-Intensive Text Processing with MapReduce, Chapter 5 -- Graph Algorithms

Week 12 - May 4: Final Exam

Week 13 - May 11: Project Presentations

Week 14 - May 18: Project Presentations
