Difference between revisions of "Course: Massive Data Analysis 2014"

From VistrailsWiki


== Week 2 -- Sept 15: Provenance and Reproducibility ==
* Lecture notes:  http://vgc.poly.edu/~fchirigati/mda-class/provenance-reproducibility.pdf
* The class will have a lab component. Please bring your laptops.
* Before class, follow the instructions below to install and set up VisTrails and GitHub:
** Download VisTrails 2.1.4 from http://www.vistrails.org/index.php/Downloads and follow the installation instructions. Start the system and then quit.
** Download the following packages:
*** http://vgc.poly.edu/~fchirigati/mda-class/gmaps.zip
*** http://vgc.poly.edu/~fchirigati/mda-class/tabledata-backport.zip
** After you extract the contents of the zip files, place the extracted contents under $HOME/.vistrails/userpackages
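The extraction step can also be scripted. The following is a minimal sketch, not part of the official instructions: it assumes both archives have already been downloaded, and the helper name `install_packages` is hypothetical. It does not check how each archive is laid out internally, so verify afterwards that the package folders sit directly under the userpackages directory.

```python
import zipfile
from pathlib import Path


def install_packages(zip_paths, dest=None):
    """Extract each zip archive into the VisTrails userpackages directory.

    `install_packages` is a hypothetical helper for illustration only.
    """
    # Default location taken from the setup instructions:
    # $HOME/.vistrails/userpackages
    if dest is None:
        dest = Path.home() / ".vistrails" / "userpackages"
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)

    for zip_path in zip_paths:
        # Unpack the full contents of the archive into the destination.
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(dest)
    return dest


# Example call, assuming both archives are in the current directory:
# install_packages(["gmaps.zip", "tabledata-backport.zip"])
```

Passing an explicit `dest` lets you try the script on a scratch directory before touching your real VisTrails configuration.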


** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/xml_schema_query.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/query-opt.pdf


= Big Data Foundations and Infrastructure (3 weeks) =

Revision as of 20:15, 13 September 2014

CS-GY 6333 Massive Data Analysis: Tentative Schedule -- subject to change

  • Lecture: Mondays, 1:00pm-3:25pm at 2MTC, room 9.011.

News

  • Welcome!

Background (4 weeks)

Week 1 -- Sept 8: Course Overview; the evolution of Data Management

Week 2 -- Sept 15: Provenance and Reproducibility

  • GitHub setup:

Week 3 -- Sept 22: Introduction to Databases; Relational Model and SQL


Week 4 -- Sept 29: Overview: Advanced SQL and Query Optimization

Big Data Foundations and Infrastructure (3 weeks)

Week 5 -- Oct 6: Cloud computing, MapReduce and Hadoop

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2, Sections 2.1 and 2.2 (Large-Scale File Systems and Map-Reduce).


Week 6 -- Oct 13: Fall Break

Week 7 -- Oct 20: Algorithm Design for MapReduce

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2.


Week 8 -- Oct 27: Parallel Databases vs MapReduce, Query Processing on MapReduce and High-level Languages



Big Data Algorithms and Techniques (3 weeks)

Week 9 -- Nov 3: Association Rules


Week 10 -- Nov 10: Finding similar items


Week 11 -- Nov 17: Graph Analysis


Week 12 -- Nov 25: Large-Scale Visualization -- Invited lecture by Dr. Lauro Lins (AT&T Research)

  • Reading:
    • The Value of Visualization, Jarke van Wijk: http://www.win.tue.nl/~vanwijk/vov.pdf
    • Tamara Munzner's book, draft 2, available online: http://www.cs.ubc.ca/~tmm/courses/533/book/
    • Nanocubes paper: http://nanocubes.net and http://nanocubes.net/assets/pdf/nanocubes_paper_preprint.pdf


Week 13 -- Dec 1:

Week 14 -- Dec 8: Project Presentations

Week 15 -- Dec 15: Project Presentations