Difference between revisions of "Course: Big Data Analysis"

From VistrailsWiki
Line 1: Line 1:
== Fall 2013 ==
'''''This schedule is tentative and subject to change'''''


Line 6: Line 7:
[http://www.vistrails.org/index.php/Course_Project:_Wikipedia_Analysis Project description]


== Week 1: Monday Sept. 10th - Course Overview ==
== Week 1: Monday Sept. 9th - Course Overview ==


* Course overview and introduction to Big Data Analysis
* Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/intro.pdf
* [https://docs.google.com/spreadsheet/viewform?fromEmail=true&formkey=dFdHT3BST2l1TW9KeHYzYjBDaTU0V1E6MQ Student survey] -- to be filled out today!
Line 24: Line 25:
* [http://practicalanalytics.wordpress.com/2011/12/12/big-data-analytics-use-cases/ BigData Analytics Usecases]


== Week 2:  Monday Sept. 17th - Map-Reduce ==
== Week 2:  Monday Sept. 16th - Map-Reduce/Hadoop ==


* Introduction to Map-Reduce
* Introduction to Map-Reduce and high-level data processing languages
* Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/Hadoop.pdf
* Introduction to [http://hadoop.apache.org/Hadoop]
Line 33: Line 34:
=== Required Reading ===
* [http://infolab.stanford.edu/~ullman/mmds/ch2.pdf Mining of Massive Datasets, Chapter 2]
* [http://lintool.github.com/MapReduceAlgorithms/MapReduce-book-final.pdf Data-Intensive Text Processing with MapReduce, Chapter 2, Chapter 3]
* [http://lintool.github.com/MapReduceAlgorithms/MapReduce-book-final.pdf Data-Intensive Text Processing with MapReduce, Chapter 2 and Chapter 3]
* [http://research.google.com/archive/mapreduce.html original google map-reduce paper]


Line 41: Line 42:
* [http://www.vldb.org/pvldb/2/vldb09-938.pdf Hive - A Warehousing Solution Over a Map-Reduce Framework]


== Week 3: Monday Sept. 24th - Databases and Big Data ==
== Week 3: Monday Sept. 23rd - Data Management for Big Data ==


* Databases and Big Data: Persistence, Querying, Indexing, Transactions
* Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/paralleldb-vs-hadoop.pdf
* In-class exercise (to be distributed in class)
* In-class exercise on Map-Reduce (to be distributed in class)


=== Related Topics ===
Line 68: Line 69:
* [http://research.google.com/pubs/pub36726.html Large-scale Incremental Processing Using Distributed Transactions and Notifications]


== Week 4:  Monday Oct. 1st - Statistics is easy - Invited Speaker: Dennis Shasha ==
== Week 4:  Monday Sept 30th - Statistics is easy - Invited Speaker: Dennis Shasha ==


* Guest lecture by [http://cs.nyu.edu/shasha/ Dennis Shasha]:  [http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/stateasy.pdf Statistics is Easy]

Revision as of 19:55, 3 September 2013

Fall 2013

This schedule is tentative and subject to change

Make sure to check my.poly.edu for course announcements

News

Project description

Week 1: Monday Sept. 9th - Course Overview

Required Reading

Additional References

Week 2: Monday Sept. 16th - Map-Reduce/Hadoop

Required Reading

Additional References

Week 3: Monday Sept. 23rd - Data Management for Big Data

Related Topics

Required Reading

Additional Readings

Week 4: Monday Sept 30th - Statistics is easy - Invited Speaker: Dennis Shasha

Required Reading

Homework Assignment

Due October 9th: BigDataHW1

Week 5: Monday Oct. 8th - Finding Similar Items

Required Reading

Homework Assignment

Due October 15th at noon. Your assignment is at http://www.newgradiance.com/services. Please see http://vgc.poly.edu/~juliana/courses/cs9223 for instructions on how to access this service.

Week 6: Wednesday Oct. 17th - Invited Speaker: Torsten Suel

Note: this class will be held on Wednesday!

Readings

Week 7: Monday Oct. 22nd - Invited lecture by and Lauro Lins

Readings

The Value of Visualization. IEEE Visualization 2005. Jarke J. van Wijk. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.78.1138

Visualization Analysis and Design: Principles, Methods, and Practice. Tamara Munzner (Book Draft 2 from Sep. 2012). http://www.cs.ubc.ca/~tmm/courses/533-11/book/vispmp-draft.pdf

Week 8: Monday Oct. 29th - Class canceled due to storm

Week 9: Monday Nov. 5th - Data infrastructure and information integration

Readings

  • HBase: The Definitive Guide. Random Access to Your Planet-Size Data: http://shop.oreilly.com/product/0636920014348.do
  • HBase book, Chapter 8 (Architecture): covers transactional processing, notably the write-ahead log, and how consistency is maintained.

Week 10: Monday Nov. 12th - Frequent Itemsets

Readings

  • Mining of Massive Datasets, Chapter 4

Additional Reading

Week 11: Monday Nov. 19th - Algorithms on MapReduce: text processing

Readings

  • Data-Intensive Text Processing with MapReduce, Chapter 4

Week 12: Monday Nov. 26th - Graph Algorithms and Phase-I project presentations

Readings

  • Data-Intensive Text Processing with MapReduce, Chapter 4
  • Pregel: A System for Large-Scale Graph Processing. Google.

Week 13: Monday Dec. 3rd - Clustering

Readings

Week 14: Monday Dec. 10th - EM algorithms for text processing

Readings

  • Data-Intensive Text Processing with MapReduce, Chapter 6


Week 15: Monday Dec. 17th - Phase-II Project presentation

Further Readings

Other topics

Provenance

Juliana Freire and Claudio Silva. In Computing in Science and Engineering 14(4): 18-25, 2012.

Juliana Freire, David Koop, Emanuele Santos, and Claudio T. Silva. In IEEE Computing in Science & Engineering, 2008.