Difference between revisions of "Course: Big Data Analysis"

From VistrailsWiki
Line 157: Line 157:
Your assignment is in  http://www.newgradiance.com/services. Please see http://vgc.poly.edu/~juliana/courses/cs9223 for instructions on how to access this service.


== Week 11: Monday Nov 18th- Frequent Itemsets ==
== Week 11: Monday Nov 18th- MapReduce Algorithm Design and Graph Processing ==  
** Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/assoc-rules1.pdf, http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/assoc-rules1.pdf
** Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/mapreduce-indexing-graph.pdf
 
=== Homework Assignment ===
Your Mapreduce/Pig assignment is available from Blackboard. '''It is Due December  1st'''.
 


=== Required Reading ===
* Mining of Massive Datasets, Chapter 6
* Chapter 5, Data-Intensive Text Processing with MapReduce
 
=== Additional Reading ===
* [http://infolab.stanford.edu/pub/papers/google.pdf 1998 PageRank Paper]
* [http://lintool.github.com/MapReduceAlgorithms/MapReduce-book-final.pdf Data-Intensive Text Processing with MapReduce, Chapter 4 (Inverted Indexing for Text Retrieval) and 5(Graph Algorithms)]
* [http://infolab.stanford.edu/~ullman/mmds/ch5.pdf Mining of Massive Datasets, Chapter 5 (Link Analysis)]
* Pregel: A System for Large-Scale Graph Processing. Google. [http://kowshik.github.com/JPregel/pregel_paper.pdf]


=== Homework Assignment ===
'''Due November 24th'''


=== Additional Reading ===
== Week 12: Monday Nov. 25th - Large-Scale Visualization ==
* Mining association rules between sets of items in large databases. Agrawal et al., SIGMOD 1993. http://delivery.acm.org/10.1145/180000/170072/p207-agrawal.pdf?ip=128.238.251.32&acc=ACTIVE%20SERVICE&CFID=198467341&CFTOKEN=23537886&__acm__=1352747519_b80a516e0f5e294b36dc021f13f55bbb
* Fast algorithms for mining association rules. Agrawal and Srikant, VLDB 1994. https://www.seas.upenn.edu/~jstoy/cis650/papers/Apriori.pdf
* An effective hash-based algorithm for mining association rules. Park et al., SIGMOD 1995. http://dl.acm.org/citation.cfm?id=223813


== Week 12: Monday Nov. 25th - Clustering ==
* Invited lectures by:
** Dr. Lauro Lins (AT&T Research)
** Dr. Huy Vo (NYU Center for Urban Science and Progress)


* Lecture notes:  
** Graph algorithms: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/mapreduce-graph.pdf
** https://www.dropbox.com/s/7t2vqryj5zgs44n/intro-to-visualization.pdf
**Clustering: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/clustering.pdf, http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/form-clustering-icde2007.pdf
** https://www.dropbox.com/s/btb3ocupkmpgefi/nanocubes.pdf
 
 
=== Required Reading ===
The Value of Visualization, Jarke Van Wijk
http://www.win.tue.nl/~vanwijk/vov.pdf


=== Homework Assignment ===
Tamara Munzner's Book draft 2 available online
'''Due Dec 1st'''
http://www.cs.ubc.ca/~tmm/courses/533/book/


=== Readings ===
Nanocubes Paper
* Mining of Massive Datasets, Chapter 7
http://nanocubes.net
* See readings for previous class
http://nanocubes.net/assets/pdf/nanocubes_paper_preprint.pdf
* Web Mining, by Bing Liu. http://www.cs.uic.edu/~liub/WebMiningBook.html
* Information Retrieval. http://nlp.stanford.edu/IR-book/pdf/irbookprint.pdf


== Further Readings ==
=== Additional Reading ===
* [http://www-users.cs.umn.edu/~kumar/papers/anomaly-survey.php Anomaly Detection: A Survey]
imMens Paper (to contrast with nanocubes)
http://vis.stanford.edu/papers/immens




== Week 13: Monday Dec. 2nd - Invited lecture by Enrico Bertini ==
== Week 13: Monday Dec. 2nd - Frequent Itemsets ==
* Introduction to Visual Analytics
** Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/assoc-rules1.pdf, http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/assoc-rules1.pdf
** Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/intro-to-visualization.pdf


=== Readings ===
=== Additional Reading ===
The Value of Visualization. IEEE Visualization 2005. Jarke J. van Wijk. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.78.1138
* Mining association rules between sets of items in large databases. Agrawal et al., SIGMOD 1993. http://delivery.acm.org/10.1145/180000/170072/p207-agrawal.pdf?ip=128.238.251.32&acc=ACTIVE%20SERVICE&CFID=198467341&CFTOKEN=23537886&__acm__=1352747519_b80a516e0f5e294b36dc021f13f55bbb
* Fast algorithms for mining association rules. Agrawal and Srikant, VLDB 1994. https://www.seas.upenn.edu/~jstoy/cis650/papers/Apriori.pdf
* An effective hash-based algorithm for mining association rules. Park et al., SIGMOD 1995. http://dl.acm.org/citation.cfm?id=223813


Visualization Analysis and Design: Principles, Methods, and Practice. Tamara Munzner (Book Draft 2 from Sep. 2012). http://www.cs.ubc.ca/~tmm/courses/533-11/book/vispmp-draft.pdf
=== Homework Assignment ===
'''Due Dec 8th'''




== Week 14: Monday Dec. 9th - - Graph Algorithms ==
== Week 14: Monday Dec. 9th - - Clustering ==


TODO
* Lecture notes:  http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/clustering.pdf, http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/form-clustering-icde2007.pdf


=== Readings ===
* [http://infolab.stanford.edu/pub/papers/google.pdf 1998 PageRank Paper]
* Chapter 7 (Clustering), Mining of Massive Data Sets
* [http://lintool.github.com/MapReduceAlgorithms/MapReduce-book-final.pdf Data-Intensive Text Processing with MapReduce, Chapter 4 (Inverted Indexing for Text Retrieval) and 5(Graph Algorithms)]
* [http://infolab.stanford.edu/~ullman/mmds/ch5.pdf Mining of Massive Datasets, Chapter 5 (Link Analysis)]
* Pregel: A System for Large-Scale Graph Processing. Google. [http://kowshik.github.com/JPregel/pregel_paper.pdf]
 


=== Homework Assignment ===
'''Due Dec 15th'''


== Week 15  Monday Dec. 16th -  Final Exam ==
Line 222: Line 233:
* [http://vgc.poly.edu/~juliana/pub/freire-cise2008.pdf Provenance for Computational Tasks: A Survey]
Juliana Freire, David Koop, Emanuele Santos, and Claudio T. Silva. In IEEE Computing in Science & Engineering, 2008.
== Week 9: Monday Nov 4th - Invited Speaker: Torsten Suel ==
* '''Professor Suel's lecture has been postponed'''
* Big Data and Information Retrieval. Invited lecture by Torsten Suel.
** Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/search-data.pdf

Revision as of 15:21, 22 November 2013

Fall 2013

This schedule is tentative and subject to change

Make sure to check my.poly.edu for course announcements

News

The deadline is Nov 15th. Please make sure that you have your correct name and Poly ID in your Gradiance account.

For frequently asked questions about the course and homework assignments, please check our BigDataAnalysisFAQ.

Week 1: Monday Sept. 9th - Course Overview

Required Reading

Additional References

Week 2: Monday Sept. 16th - Map-Reduce/Hadoop

Assignment

For more details see http://cis.poly.edu/policies.

  • Your assignment is due on Sun Sept 29th. Make sure you can log in and access my.poly.edu!
  • If you have questions about the assignment, we will hold office hours on Sept 23, 2013 from 2:30-3:30pm at 2 Metrotech, room 10.018

Required Reading

Week 3: Monday Sept. 23rd - Data Management for Big Data

Related Topics

Required Reading

Additional References

Week 4: Monday Sept 30th - Invited lecture by Dr. C. Mohan (IBM)

  • Note that we will meet at a different location: NYU CUSP, 1 Metrotech Center, 19th floor
  • Abstract: This tutorial is targeted at a broad set of database systems and applications people. It is intended to let the attendees better appreciate what is really behind the covers of many of the modern database systems (e.g., NoSQL and NewSQL systems), going beyond the hype associated with these open source and commercial systems. The capabilities and limitations of such systems will be addressed. Modern extensions to decades old relational DBMSs will also be described. Some application case studies will also be presented. An outline of problems for which no adequate solutions exist will be included. Such problems could be fertile grounds for new research work.
  • Presenter: Dr. C. Mohan, IBM Fellow, IBM Almaden Research Center, San Jose, CA 95120, USA.
  • Bio: Dr. C. Mohan has been an IBM researcher for 31 years in the information management area, impacting numerous IBM and non-IBM products, the research community and standards, especially with his invention of the ARIES family of locking and recovery algorithms, and the Presumed Abort commit protocol. This IBM, ACM and IEEE Fellow has also served as the IBM India Chief Scientist. In addition to receiving the ACM SIGMOD Innovation Award, the VLDB 10 Year Best Paper Award and numerous IBM awards, he has been elected to the US and Indian National Academies of Engineering, and has been named an IBM Master Inventor. This distinguished alumnus of IIT Madras received his PhD at the University of Texas at Austin. He is an inventor of 38 patents. He serves on the advisory board of IEEE Spectrum and on the IBM Software Group Architecture Board’s Council. More information can be found in his home page at http://bit.ly/CMohan

Week 5: Monday Oct. 7th - Query Processing on MapReduce and High-level Languages

Required Reading

Additional References

Week 6: Mon Oct. 14th - Fall Break - No class

Week 6: Wed Oct. 16th - Fall Break - Make-up class


Week 7: Monday Oct. 21st - Invited Speaker: Alberto Lerner

  • Inside MongoDB

Week 8: Monday Oct 28th - Statistics is easy - Invited Speaker: Dennis Shasha

Required Reading


  • We will cover the material planned for "Week 10: Monday Nov. 11th": Finding Similar Items

Week 9: Monday Nov. 4th - Finding Similar Items, Information Integration

Required Reading

Homework Assignment

Due Nov 15th, 2013. Your assignment is in http://www.newgradiance.com/services. Please see http://vgc.poly.edu/~juliana/courses/cs9223 for instructions on how to access this service.

Week 10: Monday Nov. 11th - MapReduce Algorithm Design

Required Reading

  • Chapters 3 and 4 in textbook: Data-Intensive Text Processing with MapReduce, by Lin and Dyer

Homework Assignment

Due Nov 15th, 2013. Your assignment is in http://www.newgradiance.com/services. Please see http://vgc.poly.edu/~juliana/courses/cs9223 for instructions on how to access this service.

Week 11: Monday Nov 18th - MapReduce Algorithm Design and Graph Processing

Homework Assignment

Your MapReduce/Pig assignment is available on Blackboard. It is due December 1st.


Required Reading

  • Chapter 5, Data-Intensive Text Processing with MapReduce

Additional Reading


Week 12: Monday Nov. 25th - Large-Scale Visualization

  • Invited lectures by:
    • Dr. Lauro Lins (AT&T Research)
    • Dr. Huy Vo (NYU Center for Urban Science and Progress)


Required Reading

The Value of Visualization, Jarke Van Wijk http://www.win.tue.nl/~vanwijk/vov.pdf

Tamara Munzner's Book draft 2 available online http://www.cs.ubc.ca/~tmm/courses/533/book/

Nanocubes Paper http://nanocubes.net http://nanocubes.net/assets/pdf/nanocubes_paper_preprint.pdf

Additional Reading

imMens Paper (to contrast with nanocubes) http://vis.stanford.edu/papers/immens


Week 13: Monday Dec. 2nd - Frequent Itemsets

Additional Reading

Homework Assignment

Due Dec 8th


Week 14: Monday Dec. 9th - Clustering

Readings

  • Chapter 7 (Clustering), Mining of Massive Datasets

Homework Assignment

Due Dec 15th

Week 15: Monday Dec. 16th - Final Exam

Other topics

Provenance

Juliana Freire and Claudio Silva. In Computing in Science and Engineering 14(4): 18-25, 2012.

Provenance for Computational Tasks: A Survey. Juliana Freire, David Koop, Emanuele Santos, and Claudio T. Silva. In IEEE Computing in Science & Engineering, 2008.