Difference between revisions of "Course: Big Data Analysis"

From VistrailsWiki
Line 39: Line 39:
* "NewSQL" stores: more on [http://hive.apache.org/ Hive], [http://voltdb.com/ VoltDB], [http://db.cs.yale.edu/hadoopdb/hadoopdb.html HadoopDB],
* Beyond MapReduce: [http://spark-project.org/ Berkeley's Spark], [http://asterix.ics.uci.edu/ UC Irvine's Asterix], Google's [http://code.google.com/p/dremel/ Dremel]
=== Readings ===
* [http://research.google.com/archive/bigtable.html Bigtable: A Distributed Storage System for Structured Data]
* [http://cs-www.cs.yale.edu/homes/dna/papers/hadoopdb.pdf HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads]
* [http://cs-www.cs.yale.edu/homes/dna/papers/hstore-cc.pdf Low Overhead Concurrency Control for Partitioned Main Memory Databases]
* [http://asterix.ics.uci.edu/pub/ASTERIX-DPD-2011.pdf ASTERIX: Towards a Scalable, Semistructured Data Platform for Evolving-World Models.]
* [http://research.google.com/pubs/pub36632.html Dremel: Interactive Analysis of Web-Scale Datasets]
* [http://research.google.com/pubs/pub36726.html Large-scale Incremental Processing Using Distributed Transactions and Notifications]


== Week 4:  Monday Oct. 1st - Statistics is easy - Invited Speaker: Dennis Shasha ==
Line 49: Line 57:
* JF: add references for issues related to stats and big data  


== Week 5: Monday Oct. 8th - Finding Similar Items ==

Revision as of 20:48, 6 September 2012

Make sure to check my.poly.edu for course announcements

Week 1: Monday Sept. 10th - Course Overview

  • Course overview (First day of classes!)
  • Student survey
  • Introduction to Big Data

Readings

Week 2: Monday Sept. 17th - Map-Reduce

Readings

Week 3: Monday Sept. 24th - Databases and Big Data

Readings

Week 4: Monday Oct. 1st - Statistics is easy - Invited Speaker: Dennis Shasha

Readings


Week 5: Monday Oct. 8th - Finding Similar Items

  • Overview of information integration

Readings

Week 6: Monday Oct. 15th - Invited Speaker: Torsten Suel

  • Reading: inverted index and crawling (Lin chapter 4)
  • Ask Torsten (tentative, ask him for reading material)

Readings

Week 7: Monday Oct. 22nd - Invited Speakers: Claudio Silva and Lauro Lins

  • Introduction to Visualization; Data stewardship and provenance
  • Guest lecture by Claudio Silva and Lauro Lins

Readings

  • Hellerstein (ask Claudio for additional references)
  • ADD: provenance and reproducibility

Week 8: Monday Oct. 29th - Graph Analysis

  • Graph algorithms, link analysis, social networks

Readings

  • Data-Intensive Text Processing with MapReduce, Chapter 5


Week 9: Monday Nov. 5th - Frequent Itemsets

Readings

  • Mining of Massive Datasets, Chapter 6


Week 10: Monday Nov. 12th - Mining Data Streams

Readings

  • Mining of Massive Datasets, Chapter 4


Week 11: Monday Nov. 19th - Clustering

Readings

  • Mining of Massive Datasets, Chapter 7

Week 12: Monday Nov. 26th - Recommendation Systems

Readings

  • Mining of Massive Datasets, Chapter 9

Week 13: Monday Dec. 3rd - EM algorithms for text processing

  • Data-Intensive Text Processing with MapReduce, Chapter 6

Week 14: Monday Dec. 10th - Project presentation

Further Readings