Difference between revisions of "Course: Big Data 2016"

From VistrailsWiki
Line 50: Line 50:


== Week 4 - Feb 15: Holiday ==
= Transparency and Reproducibility  (1 week) =
== Week 5 - Feb 22: Data Exploration and Reproducibility  ==
* '''Lecture notes:'''  http://vgc.poly.edu/~juliana/courses/BigData2016/Lectures/data-science-reproducibility.pdf
* '''Lab:''' Hands-on reproducibility.
* '''Programming assignment:''' Exploring urban data (see NYU Classes)


= Big Data Foundations and Infrastructure (3 weeks) =


== Week 5 - Feb 22:  Introduction to Map Reduce ==
== Week 6 - Feb 29:  Introduction to Map Reduce ==


*''' Lecture notes:''' http://vgc.poly.edu/~juliana/courses/BigData2016/Lectures/mapreduce-intro.pdf
Line 59: Line 68:
* Quiz 1 (Map Reduce) -- check http://www.newgradiance.com/services


== Week 6 - Feb 29: MapReduce Algorithm Design Patterns  ==
== Week 7 - March 7: MapReduce Algorithm Design Patterns  ==


*''' Lecture notes:''' http://vgc.poly.edu/~juliana/courses/BigData2016/Lectures/mapreduce-algo-design.pdf
Line 68: Line 77:
** Data-Intensive Text Processing with MapReduce (Jan 27, 2013), Chapter 6 -- Processing Relational Data (this chapter appears in the 2013 version of the textbook -- http://lintool.github.io/MapReduceAlgorithms/ed1n/MapReduce-algorithms.pdf)


== Week 7 - March 7: Parallel Databases vs MapReduce; Storage Solutions; Introduction to SPARK==  
 
== Week 8-- March 14th: Spring Break ==
 
 
== Week 9- March 21st: Parallel Databases vs MapReduce; Storage Solutions; Introduction to SPARK==  


*''' Lecture notes:''' ** http://vgc.poly.edu/~juliana/courses/BigData2016/Lectures/paralleldb-vs-hadoop.pdf
Line 82: Line 95:
** BigTable: http://fcoffice.googlecode.com/svn/%E4%B9%A6%E7%B1%8D/bigtable-osdi06.pdf
** Spark: Cluster Computing with Working Sets. http://static.usenix.org/legacy/events/hotcloud10/tech/full_papers/Zaharia.pdf
== Week 8 -- March 14th: Spring Break ==
= Transparency and Reproducibility  (1 week) =
== Week 9 - March 21: Data Exploration and Reproducibility  ==
* '''Lecture notes:'''  http://vgc.poly.edu/~juliana/courses/BigData2016/Lectures/data-science-reproducibility.pdf
* '''Lab:''' Hands-on reproducibility.
* '''Programming assignment:''' Exploring urban data (see NYU Classes)


= Big Data Algorithms, Mining Techniques, and Visualization (6 weeks) =

Revision as of 18:40, 22 February 2016

DS-GA 1004 - Big Data: Tentative Schedule -- subject to change

  • TAs:
    • Yuan Feng
    • Kevin Ye
  • Lecture: Mondays, 4:55pm-7:35pm at Silver 207
  • Some classes will include a lab session; please always bring your laptop.

News

Week 1 - Jan 25: Course Overview

Week 2 - Feb 1: The evolution of Data Management and introduction to Big Data; Introduction to Databases and Relational Model

Week 3 - Feb 8: Introduction to Databases, Relational Model and SQL (cont.)

Week 4 - Feb 15: Holiday

Transparency and Reproducibility (1 week)

Week 5 - Feb 22: Data Exploration and Reproducibility


Big Data Foundations and Infrastructure (3 weeks)

Week 6 - Feb 29: Introduction to Map Reduce

Week 7 - March 7: MapReduce Algorithm Design Patterns


Week 8 - March 14th: Spring Break

Week 9 - March 21st: Parallel Databases vs MapReduce; Storage Solutions; Introduction to Spark

Big Data Algorithms, Mining Techniques, and Visualization (6 weeks)

Week 10 - March 28th: Finding similar items

  • Homework Assignment
    • See quizzes on Gradiance -- Distance measures and document similarity.

Week 11 - April 4th: Association Rules


  • Suggested additional reading:
    • Fast algorithms for mining association rules, Agrawal and Srikant, VLDB 1994.
    • Data Mining Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann
    • Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997. http://www-db.stanford.edu/~sergey/dic.html
  • Homework Assignment
    • See quizzes on Gradiance -- Distance measures and document similarity.

Week 12 - April 11th: Visualization and Spatio-Temporal Data -- Invited lecture by Dr. Harish Doraiswamy (NYU CDS)

Week 13 - April 18th: Data Cleaning -- Invited lecture by Dr. Divesh Srivastava (AT&T Research)

Week 14 - April 25th: Graph Analysis

  • Required Reading: Data-Intensive Text Processing with MapReduce, Chapter 5 -- Graph Algorithms

Week 15 - May 2: TBD

Week 16 - May 9: Final Exam

Week 17 - May 16: Project Presentations