Course: Big Data 2016

From VisTrailsWiki

Revision as of 12:48, 23 January 2016

DS-GA 1004 - Big Data: Tentative Schedule -- subject to change

  • TAs:
    • Yuan Feng
    • Kevin Ye
  • Lecture: Mondays, 4:55pm-7:35pm at 19 University Pl., room 102.
  • Some classes will include a lab session; please always bring your laptop.

News

Week 1 - Jan 25: Course Overview; Lab: Computing infrastructure for the course

Week 2 - Feb 1: The evolution of Data Management and introduction to Big Data; Introduction to Databases, Relational Model and SQL

  • In-class assignment: relational algebra

Week 3 - Feb 8: Introduction to Databases, Relational Model and SQL (cont.)

  • Lab: SQL
  • Programming assignment: Using SQL for data analysis and cleaning

Week 4 - Feb 15: Holiday

Big Data Foundations and Infrastructure (3 weeks)

Week 5 - Feb 22: Introduction to MapReduce

  • Lab: Hands-on Hadoop (local and AWS)

Week 6 - Feb 29: MapReduce Algorithm Design Patterns

  • Lab: Hands-on Hadoop (HPC)
  • Programming assignment: MapReduce (check NYU Classes)

Week 7 - March 7: Parallel Databases vs MapReduce; Introduction to Spark

  • Lab: Hands-on Spark (HPC)
  • Programming assignment: check NYU Classes on March 10th

Week 8 - March 14th: Spring Break

Transparency and Reproducibility (1 week)

Week 9 - March 21: Data Exploration and Reproducibility

  • Programming assignment 4: Exploring urban data (see NYU Classes)

Big Data Algorithms, Mining Techniques, and Visualization (6 weeks)

Week 10 - March 28th: Finding similar items

  • Homework Assignment
    • See quizzes on Gradiance -- Distance measures and document similarity.

Week 11 - April 4th: Association Rules


  • Suggested additional reading:
    • Fast algorithms for mining association rules, Agrawal and Srikant, VLDB 1994.
    • Data Mining Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann
    • Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997. http://www-db.stanford.edu/~sergey/dic.html
  • Homework Assignment
    • See quizzes on Gradiance -- Distance measures and document similarity.

Week 12 - April 11th: Visualization and Spatio-Temporal Data -- Invited lecture by Dr. Harish Doraiswamy (NYU CUSP)

Week 13 - April 18th: Parallel Databases

Week 14 - April 25th: Graph Analysis

  • Required Reading: Data-Intensive Text Processing with MapReduce. Chapter 5 -- Graph Algorithms

Week 15 - May 2: Final Exam

Week 16 - May 9: Project Presentations

Week 17 - May 16: Project Presentations
