Course: Big Data 2015

Revision as of 05:45, 26 January 2015 by Juliana (talk | contribs) (Created page with '= DS-GA 1004- Big Data: Tentative Schedule -- ''subject to change'' = * Course Web page: http://vgc.poly.edu/~juliana/courses/BigData2015 * Instructor: Professor Juliana Freire…')

DS-GA 1004 - Big Data: Tentative Schedule -- subject to change

  • Lecture: Mondays, 4:55pm-7:35pm at Silver, room 208.
  • Some classes will include a lab session; please always bring your laptop.

Background (4 weeks)

Week 1 Course Overview; The evolution of Data Management and introduction to Big Data

Week 2 Introduction to Databases, Relational Model and SQL

Week 3 Other Data Models and Query Optimization

  • Lab: SQL
  • Programming assignment: Using SQL for data analysis and cleaning

Week 4 Data Exploration and Reproducibility

  • Lab: VisTrails
  • Programming assignment: Exploring urban data


Big Data Foundations and Infrastructure (3 weeks)

Week 5 -- Cloud Computing, MapReduce and Hadoop

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2, Sections 2.1 and 2.2 (Large-Scale File Systems and Map-Reduce).
  • Lab: Hands-on Hadoop
  • Homework Assignment -- Your first quiz is available on Gradiance. It is due on March 17th at 5pm.

Week 6 -- Algorithm Design for MapReduce

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2.


Week 7 -- Parallel Databases vs. MapReduce, Query Processing on MapReduce, and High-level Languages


Big Data Algorithms, Mining Techniques, and Visualization (3 weeks)

Week 8 -- Visualization and Spatio-Temporal Data -- Invited lecture by Dr. Huy Vo (NYU CUSP)


Week 9 -- Association Rules

  • Suggested additional reading:
    • Fast algorithms for mining association rules, Agrawal and Srikant, VLDB 1994.
    • Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann.
    • Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997. http://www-db.stanford.edu/~sergey/dic.html


Week 10 -- Finding Similar Items

Week 13 -- May 5: Graph Analysis

Week 14 -- May 12: Final Exam

Week 15 -- Project Presentations