Course: Big Data 2016

From VisTrailsWiki

(Difference between revisions)
(Week 2 - Feb 1: The evolution of Data Management and introduction to Big Data; Introduction to Databases, Relational Model and SQL)
Line 28: Line 28:
* '''Course survey:''' https://docs.google.com/forms/d/1LTiJwkDVvp0cF62Fw_d9Y86US5LCkorRUIQtV2T8KWE/viewform?usp=send_form
- == Week 2 - Feb 1:  The evolution of Data Management and introduction to Big Data; Introduction to Databases, Relational Model and SQL==
+ == Week 2 - Feb 1:  The evolution of Data Management and introduction to Big Data; Introduction to Databases and Relational Model ==
* '''Lecture notes:'''
- ** http://vgc.poly.edu/~juliana/courses/BigData2016/Lectures/datamanagement.pdf
+ ** http://vgc.poly.edu/~juliana/courses/BigData2016/Lectures/data-management-evolution.pdf
** http://vgc.poly.edu/~juliana/courses/BigData2016/Lectures/intro-db.pdf
- * '''Lab:''' in-class assignment on relational algebra
+ * '''Lab:''' getting started with MySQL
- * '''Readings:'''
+ * '''Required Reading:'''
+ ** Chapter 1 of Mining of Massive Datasets
+ * '''Suggested Reading:'''
** [http://philip.greenspun.com/sql/introduction.html Greenspun's SQL for Web Nerds Intro]
** [http://philip.greenspun.com/sql/data-modeling.html SQL/Nerds Modeling (parts)]
+ ** [https://docs.google.com/file/d/0B7lNUaak0bK1NDBWZU5XTmItdGc/edit History Repeats Itself: Sensible and NonsenSQL Aspects of the NoSQL Hoopla], by C. Mohan, EDBT 2013
== Week 3 - Feb 8: Introduction to Databases, Relational Model and SQL (cont.) ==

Revision as of 13:02, 1 February 2016

DS-GA 1004 - Big Data: Tentative Schedule -- subject to change

  • TAs:
    • Yuan Feng
    • Kevin Ye
  • Lecture: Mondays, 4:55pm-7:35pm at 19 University Pl., room 102.
  • Some classes will include a lab session; please always bring your laptop.

News

Week 1 - Jan 25: Course Overview

Week 2 - Feb 1: The evolution of Data Management and introduction to Big Data; Introduction to Databases and Relational Model

Week 3 - Feb 8: Introduction to Databases, Relational Model and SQL (cont.)

Week 4 - Feb 15: Holiday

Big Data Foundations and Infrastructure (3 weeks)

Week 5 - Feb 22: Introduction to MapReduce

Week 6 - Feb 29: MapReduce Algorithm Design Patterns

Week 7 - March 7: Parallel Databases vs MapReduce; Storage Solutions; Introduction to Spark

Week 8 - March 14th: Spring Break

Transparency and Reproducibility (1 week)

Week 9 - March 21: Data Exploration and Reproducibility

Big Data Algorithms, Mining Techniques, and Visualization (6 weeks)

Week 10 - March 28th: Finding similar items

  • Homework Assignment
    • See quizzes on Gradiance -- Distance measures and document similarity.

Week 11 - April 4th: Association Rules


  • Suggested additional reading:
    • Fast Algorithms for Mining Association Rules. Agrawal and Srikant, VLDB 1994.
    • Data Mining Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann
    • Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997. http://www-db.stanford.edu/~sergey/dic.html
  • Homework Assignment
    • See quizzes on Gradiance -- Distance measures and document similarity.

Week 12 - April 11th: Visualization and Spatio-Temporal Data -- Invited lecture by Dr. Harish Doraiswamy (NYU CDS)

Week 13 - April 18th: Data Cleaning - Invited lecture by Dr. Divesh Srivastava, AT&T Research

Week 14 - April 25th: Graph Analysis

  • Required Reading: Data-Intensive Text Processing with MapReduce. Chapter 5 -- Graph Algorithms

Week 15 - May 2: TBD

Week 16 - May 9: Final Exam

Week 17 - May 16: Project Presentations
