Course: Big Data 2015

From VistrailsWiki
Revision as of 05:49, 26 January 2015 by Juliana

DS-GA 1004 - Big Data: Tentative Schedule (subject to change)

  • Lecture: Mondays, 4:55pm-7:35pm at Silver, room 208.
  • Some classes will include a lab session; please always bring your laptop.

Background (4 weeks)

Week 1: Course Overview; The Evolution of Data Management and Introduction to Big Data

Week 2: Introduction to Databases, the Relational Model, and SQL

Week 3: Other Data Models and Query Optimization

  • Lab: SQL
  • Programming assignment: Using SQL for data analysis and cleaning

Week 4: Data Exploration and Reproducibility

  • Lab: VisTrails
  • Programming assignment: Exploring urban data


Big Data Foundations and Infrastructure (3 weeks)

Week 5: Cloud Computing, MapReduce, and Hadoop

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2, Sections 2.1 and 2.2 (Large-Scale File Systems and Map-Reduce).
  • Lab: Hands-on Hadoop
  • Homework assignment: your first quiz is available on Gradiance and is due on March 17th at 5pm.

Week 6: Algorithm Design for MapReduce

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2.


Week 7: Parallel Databases vs. MapReduce; Query Processing on MapReduce and High-Level Languages


Big Data Algorithms, Mining Techniques, and Visualization (3 weeks)

Week 8: Visualization and Spatio-Temporal Data (invited lecture by Dr. Huy Vo, NYU CUSP)


Week 9: Association Rules

  • Suggested additional reading:
    • Fast Algorithms for Mining Association Rules, Agrawal and Srikant, VLDB 1994.
    • Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann.
    • Dynamic Itemset Counting and Implication Rules for Market Basket Data, Brin et al., SIGMOD 1997. http://www-db.stanford.edu/~sergey/dic.html


Week 10: Finding Similar Items

Week 11: Graph Analysis

Week 12: TBD

Week 13: TBD

Week 14: Final Exam

Week 15: Project Presentations