Course: Big Data 2016

Revision as of 22:01, 23 January 2016 by Juliana

DS-GA 1004 - Big Data: Tentative Schedule (subject to change)

  • TAs:
    • Yuan Feng
    • Kevin Ye
  • Lecture: Mondays, 4:55pm-7:35pm at 19 University Pl., room 102.
  • Some classes will include a lab session; please always bring your laptop.

News

Week 1 - Jan 25: Course Overview

Week 2 - Feb 1: The evolution of Data Management and introduction to Big Data; Introduction to Databases, Relational Model and SQL

Week 3 - Feb 8: Introduction to Databases, Relational Model and SQL (cont.)

Week 4 - Feb 15: Holiday

Big Data Foundations and Infrastructure (3 weeks)

Week 5 - Feb 22: Introduction to Map Reduce

Week 6 - Feb 29: MapReduce Algorithm Design Patterns

Week 7 - March 7: Parallel Databases vs MapReduce; Storage Solutions; Introduction to SPARK

Week 8 - March 14th: Spring Break

Transparency and Reproducibility (1 week)

Week 9 - March 21: Data Exploration and Reproducibility

  • Programming assignment 4: Exploring urban data (see NYU Classes)

Big Data Algorithms, Mining Techniques, and Visualization (6 weeks)

Week 10 - March 28th: Finding similar items

  • Homework Assignment
    • See quizzes on Gradiance -- Distance measures and document similarity.

Week 11 - April 4th: Association Rules

  • Suggested additional reading:
    • Fast Algorithms for Mining Association Rules. Agrawal and Srikant, VLDB 1994.
    • Data Mining: Concepts and Techniques. Jiawei Han and Micheline Kamber, Morgan Kaufmann.
    • Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997. http://www-db.stanford.edu/~sergey/dic.html
  • Homework Assignment
    • See quizzes on Gradiance -- Distance measures and document similarity.

Week 12 - April 11th: Visualization and Spatio-Temporal Data -- Invited lecture by Dr. Harish Doraiswamy (NYU CUSP)

Week 13 - April 18th: Parallel Databases

Week 14 - April 25th: Graph Analysis

  • Required Reading: Data-Intensive Text Processing with MapReduce, Chapter 5: Graph Algorithms

Week 15 - May 2: Final Exam

Week 16 - May 9: Project Presentations

Week 17 - May 16: Project Presentations