Course: Big Data 2014
DS-GA 1004/CSCI-GA 2568 Big Data: Tentative Schedule -- subject to change
- Course Web page: http://cs.nyu.edu/courses/spring14/CSCI-GA.2568-001/index.html
- Instructor: Professor Juliana Freire (http://vgc.poly.edu/~juliana/)
- Lecture: Mondays, 7:10pm-9:00pm at Cantor, room 101. Note new location!
- Cantor Film Center (CANTR), 36 E 8th St, New York, NY 10003
- Lab: Thursdays, 7:10pm-8:00pm at CIWW, room 109. Always bring your laptop.
- Warren Weaver Hall (CIWW), 251 Mercer St, New York, NY 10012
News
- The final exam will take place on May 12th.
- We will have our last class on May 19th.
- 4/21/2014: There are two new quizzes on Gradiance. They are due on 2014-04-28 23:59 PST.
- Homework assignment 4 has been posted: Assignment 4 - Querying with Pig and Mapreduce
- Homework assignment 3 has been posted: Assignment 3 - MapReduce algorithm design
- You can find instructions on how to log into the NYU Hadoop cluster at: http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/MapReduceExample/readme-nyu-hadoop.txt
- I have created a list of frequently-asked questions which I hope will help you with your assignment: Assignment 3 - FAQ
- Your first assignment has been posted and it is due on Feb 17, 2014 5:00 pm. Here are the instructions: http://vistrails.org/index.php/Assignment_1_-_Data_Exploration
- I have sent a test email to the class list. If you have not received the message, make sure to sign up: http://www.cs.nyu.edu/mailman/listinfo/csci_ga_2568_001_sp14
- Starting on Feb 10th, our class will meet at a new location: Cantor 101
- We will have lab on Thu at CIWW, room 109. Bring your laptop!
Background (4 weeks)
Week 1 -- Jan 27: Course Overview; the evolution of Data Management
- Lecture notes: http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/course-overview.pdf
- Reading: Chapter 1 of Mining of Massive Data Sets (version 1.1)
- Course survey: https://docs.google.com/spreadsheet/embeddedform?formkey=dDRoTVcyMnRQUXhFUjl0cFFuTEVER1E6MA
Week 2 -- Feb 3: Introduction to Databases
- Lecture notes: http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/intro-to-db.pdf
- Other useful reading:
- Feb 6: Lab: Data Exploration and Reproducibility
- Homework assignment: Assignment 1 - Data Exploration
Week 3 -- Feb 10: Overview: Relational Model and SQL
- Lecture notes:
- Other useful reading:
- Feb 13: Lab: Canceled -- University closed due to snow
Week 3.1 -- Feb 17: Holiday
- No class, holiday
- Feb 20: Lab: Hands-on SQL
Week 4 -- Feb 24: Overview: Advanced SQL and Query Optimization
- Lecture notes:
- Homework assignment: Assignment 2 - Data Exploration using SQL
Big Data Foundations and Infrastructure (2 weeks)
Week 5 -- Mar 3: Cloud computing, Map Reduce and Hadoop
- Required reading:
- Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
- Mining of Massive Datasets (2nd Edition), Chapter 2, Sections 2.1 and 2.2 (Large-Scale File Systems and Map-Reduce).
- Other useful reading:
- Hadoop: The Definitive Guide. http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520
- Homework Assignment -- Your first quiz is available on Gradiance. It is due on March 17th at 5pm.
Week 6 -- Mar 10: Algorithm Design for MapReduce
- Required reading:
- Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
- Mining of Massive Datasets (2nd Edition), Chapter 2.
Machine Learning and Big Data (3 weeks)
Week 7 -- Mar 23: Hashing and AllReduce
- Invited lecture by John Langford
- Lecture notes:
- Homework assignment: Assignment 3 - MapReduce algorithm design
Week 8 -- Mar 30: Bandits
- Invited lecture by John Langford
- Lecture notes:
Week 9 -- Apr 7: Large Scale Machine Learning in the Real World
- Invited lecture by Leon Bottou
- Lecture notes:
Big Data Foundations and Infrastructure -- cont. (2 weeks)
Week 10 -- April 14: Parallel Databases vs MapReduce, Query Processing on MapReduce and High-level Languages
- Lecture notes:
- Required reading:
- Data-Intensive Text Processing with MapReduce (Jan 27, 2013), Chapter 6 -- Processing Relational Data (this chapter appears in the 2013 version of the textbook, which I have placed at http://vgc.poly.edu/~juliana/courses/BigData2014/Textbooks/MapReduce-algorithms-Jan2013-draft.pdf)
- Benchmark DBMS vs MapReduce (2009): http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf
- MapReduce: A Flexible Data Processing Tool: http://cacm.acm.org/magazines/2010/1/55744-mapreduce-a-flexible-data-processing-tool/fulltext
- Additional reading:
- Pig Latin: A Not-So-Foreign Language for Data Processing: http://pages.cs.brandeis.edu/~olga/cs228/Reading%20List_files/piglatin.pdf
- Hive - A Warehousing Solution Over a Map-Reduce Framework: http://www.vldb.org/pvldb/2/vldb09-938.pdf
Big Data Algorithms and Techniques (3 weeks)
Week 11 -- April 21: Data Management for Big Data (cont) and Association Rules
- Reading: Chapter 6 Mining of Massive Datasets
- Homework Assignment -- Your quiz is available on Gradiance. It is due on April 28th.
Week 12 -- Apr 28: Finding similar items: Invited lecture by Dr. Harish Doraiswami
- Reading: Chapter 3 Mining of Massive Datasets
- Homework Assignment
- There are two new quizzes on Gradiance -- Distance measures and document similarity. They are due on May 5th.
- Your final assignment is available at http://www.vistrails.org/index.php/Assignment_4_-_Querying_with_Pig_and_Mapreduce. This is an optional assignment and will count towards extra credit.
Week 13 -- May 5: Graph Analysis and Exam Review
- Lecture notes:
Week 14 -- May 12: Final Exam
Week 15 -- May 19: Large-Scale Visualization -- Invited lecture by Dr. Lauro Lins (AT&T Research)
- Lecture notes:
- Reading:
- The Value of Visualization, Jarke van Wijk: http://www.win.tue.nl/~vanwijk/vov.pdf
- Tamara Munzner's book (draft 2), available online: http://www.cs.ubc.ca/~tmm/courses/533/book/
- Nanocubes paper: http://nanocubes.net and http://nanocubes.net/assets/pdf/nanocubes_paper_preprint.pdf