Course: Massive Data Analysis 2014

From VistrailsWiki
== Week 1 -- Sept 8: Course Overview; the evolution of Data Management ==

* Lecture notes:  http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/course-overview.pdf
* Reading: Chapter 1 of Mining of Massive Datasets (version 1.1)
* Course survey: https://docs.google.com/spreadsheet/embeddedform?formkey=dFpwTjROVzhLUWY2NVNXb0xvNTVLMnc6MA

== Week 2 -- Sept 15: Introduction to Databases ==
* Lecture notes:  http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/intro-to-db.pdf
* Other useful reading:  
** [http://philip.greenspun.com/sql/introduction.html Greenspun's SQL for Web Nerds Intro]
== Week 3 -- Sept 22: Overview: Relational Model and SQL  ==
* Lecture notes:   
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/relational-algebra.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/sql-intro.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/sql-more.pdf
* Other useful reading:  
** [http://philip.greenspun.com/sql/introduction.html Greenspun's SQL for Web Nerds Intro]
== Week 4 -- Sept 29: Overview: Advanced SQL and Query Optimization ==

* Lecture notes:   
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/xml_schema_query.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/query-opt.pdf

* Homework assignment: [[Assignment 2 - Data Exploration using SQL]]
== Week 5 -- Oct 6: Cloud computing, MapReduce and Hadoop ==
* Lecture notes:   
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/mapreduce-intro.pdf

* Required reading:  
== Week 7 -- Oct 20: Algorithm Design for MapReduce ==

* Lecture notes:   
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/mapreduce-algo-design.pdf

* Required reading:  
== Week 8 -- Oct 27: Parallel Databases vs MapReduce, Query Processing on MapReduce and High-level Languages ==

* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/paralleldb-vs-hadoop-2014.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/hive-pig.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/data-analysis-mapreduce.pdf

* Required reading:  
** Data-Intensive Text Processing with MapReduce (Jan 27, 2013), Chapter 6 -- Processing Relational Data (this chapter appears in the 2013 version of the textbook -- I have placed this version in http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Textbooks/MapReduce-algorithms-Jan2013-draft.pdf)
** Benchmark DBMS vs MapReduce (2009): http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf
** MapReduce: A Flexible Data Processing Tool: http://cacm.acm.org/magazines/2010/1/55744-mapreduce-a-flexible-data-processing-tool/fulltext
== Week 9 -- Nov 3: Association Rules ==

* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/association-rules.pdf

* Reading: Chapter 6 [http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Textbooks/ullman-book-v1.1-mining-massive-data.pdf Mining of Massive Datasets]  

* Homework Assignment -- Your quiz is available on [http://www.newgradiance.com Gradiance]. It is ''due on April 28th.''
== Week 10 -- Nov 10: Finding similar items ==

* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/similarity.pdf

* Reading: Chapter 3 [http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Textbooks/ullman-book-v1.1-mining-massive-data.pdf Mining of Massive Datasets]  

* Homework Assignment
== Week 11 -- Nov 17: Graph Analysis ==

* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/graph-algos.pdf

== Week 12 -- Nov 25: Large-Scale Visualization -- Invited lecture by Dr. Lauro Lins (AT&T Research) ==

* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/intro-to-visualization.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/nanocubes.pdf

* Reading:  

Revision as of 02:57, 8 September 2014

CS-GY 6333 Massive Data Analysis: Tentative Schedule -- subject to change

  • Lecture: Mondays, 1:00pm-3:25pm at 2MTC, room 9.011.

News

  • Welcome!

Background (4 weeks)

Week 1 -- Sept 8: Course Overview; the evolution of Data Management

Week 2 -- Sept 15: Introduction to Databases

Week 3 -- Sept 22: Overview: Relational Model and SQL


Week 4 -- Sept 29: Overview: Advanced SQL and Query Optimization

Big Data Foundations and Infrastructure (3 weeks)

Week 5 -- Oct 6: Cloud computing, MapReduce and Hadoop

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2, Sections 2.1 and 2.2 (Large-Scale File Systems and Map-Reduce).
  • Homework Assignment -- Your first quiz is available on Gradiance. It is due on March 17th at 5pm.

Week 6 -- Oct 13: Fall Break

Week 7 -- Oct 20: Algorithm Design for MapReduce

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2.


Week 8 -- Oct 27: Parallel Databases vs MapReduce, Query Processing on MapReduce and High-level Languages



Big Data Algorithms and Techniques (3 weeks)

Week 9 -- Nov 3: Association Rules

  • Homework Assignment -- Your quiz is available on Gradiance. It is due on April 28th.



Week 10 -- Nov 10: Finding similar items


Week 11 -- Nov 17: Graph Analysis


Week 12 -- Nov 25: Large-Scale Visualization -- Invited lecture by Dr. Lauro Lins (AT&T Research)

  • Reading:
    • The Value of Visualization, Jarke van Wijk: http://www.win.tue.nl/~vanwijk/vov.pdf
    • Tamara Munzner's book (draft 2), available online: http://www.cs.ubc.ca/~tmm/courses/533/book/
    • Nanocubes paper: http://nanocubes.net/assets/pdf/nanocubes_paper_preprint.pdf (project site: http://nanocubes.net)


Week 13 -- Dec 1:

Week 14 -- Dec 8: Final Exam

Week 15 -- Dec 15: