Course: Big Data 2015

Revision as of 16:40, 2 February 2015

DS-GA 1004 - Big Data: Tentative Schedule -- subject to change

  • Lecture: Mondays, 4:55pm-7:35pm at Silver, room 208.
  • Some classes will include a lab session; please always bring your laptop.

Background (4 weeks)

Week 1 - Feb 2: Course Overview; The evolution of Data Management and introduction to Big Data

  • Lecture notes: http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/course-overview.pdf

Week 2: Introduction to Databases, Relational Model and SQL

Week 3: Other Data Models and Query Optimization

  • Lab: SQL
  • Programming assignment: Using SQL for data analysis and cleaning

Week 4: Data Exploration and Reproducibility

  • Lab: VisTrails
  • Programming assignment: Exploring urban data

Big Data Foundations and Infrastructure (3 weeks)

Week 5: Cloud computing, MapReduce and Hadoop

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2, Sections 2.1 and 2.2 (Large-Scale File Systems and Map-Reduce).
  • Lab: Hands-on Hadoop
  • Homework assignment: Your first quiz is available on Gradiance. It is due on March 17th at 5pm.

Week 6: Algorithm Design for MapReduce

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2.


Week 7: Parallel Databases vs MapReduce, Query Processing on MapReduce and High-level Languages


Big Data Algorithms, Mining Techniques, and Visualization (6 weeks)

Week 8: Visualization and Spatio-Temporal Data -- Invited lecture by Dr. Huy Vo (NYU CUSP)


Week 9: Association Rules

  • Suggested additional reading:
    • Fast Algorithms for Mining Association Rules, Agrawal and Srikant, VLDB 1994.
    • Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann.
    • Dynamic Itemset Counting and Implication Rules for Market Basket Data, Brin et al., SIGMOD 1997. http://www-db.stanford.edu/~sergey/dic.html


Week 10: Finding similar items

Week 11: Graph Analysis

Week 12: TBD

Week 13: TBD

Week 14: Final Exam

Week 15: Project Presentations