Difference between revisions of "Course: Big Data Analysis"

From VistrailsWiki
Line 1: Line 1:
'''''Make sure to check my.poly.edu for course announcements'''''
'''''Make sure to check my.poly.edu for course announcements'''''


== Week 1: Monday Sept. 10th --- Course Overview ==
== Week 1: Monday Sept. 10th - Course Overview ==


* Course overview  (First day of classes!)  
* Course overview  (First day of classes!)  
Line 7: Line 7:
* Introduction to Big Data
* Introduction to Big Data


== Week 2:  Monday Sept. 17th --- Map-Reduce ==
== Week 2:  Monday Sept. 17th - Map-Reduce ==


* Introduction to map-reduce
* Introduction to map-reduce
Line 16: Line 16:
* Data-Intensive Text Processing with MapReduce, Chapter 2
* Data-Intensive Text Processing with MapReduce, Chapter 2


== Week 3: Monday Sept. 24th --- Statistics is easy ==
== Week 3: Monday Sept. 24th - Statistics is easy ==


* Guest lecture by [http://cs.nyu.edu/shasha/ Dennis Shasha]
* Guest lecture by [http://cs.nyu.edu/shasha/ Dennis Shasha]
Line 25: Line 25:
* JF: add references for issues related to stats and big data
* JF: add references for issues related to stats and big data


== Week 4:  Monday Oct. 1st -- Databases and Big Data ==
== Week 4:  Monday Oct. 1st - Databases and Big Data ==


* Databases and Big Data
* Databases and Big Data
Line 34: Line 34:
Overview of different architectures, distributed databases vs. hadoop, transaction support...
Overview of different architectures, distributed databases vs. hadoop, transaction support...


== Week 5: Monday Oct. 8th --- Finding Similar Items ==
== Week 5: Monday Oct. 8th - Finding Similar Items ==
* Overview of information integration
* Overview of information integration


Line 41: Line 41:




== Week 6:  Monday Oct. 15th --- Graph Analysis ==
== Week 6:  Monday Oct. 15th - Graph Analysis ==


* Graph algorithms, link analysis, social networks
* Graph algorithms, link analysis, social networks
Line 50: Line 50:




== Week 7:  Monday Oct. 22nd --- Introduction to Visualization; Data stewardship and provenance ==
== Week 7:  Monday Oct. 22nd - Introduction to Visualization; Data stewardship and provenance ==
* Guest lecture by Claudio Silva and Lauro Lins
* Guest lecture by Claudio Silva and Lauro Lins


Line 58: Line 58:




== Week 8: Monday Oct. 29th --- TBD swap oct 15==  
== Week 8: Monday Oct. 29th - TBD swap oct 15==  
* Reading: inverted index and crawling (Lin chapter 4)
* Reading: inverted index and crawling (Lin chapter 4)
* Ask Torsten (tentative, ask him for reading material)
* Ask Torsten (tentative, ask him for reading material)
Line 66: Line 66:




== Week 9: Monday Nov. 12th --- Frequent Itemsets ==
== Week 9: Monday Nov. 12th - Frequent Itemsets ==


=== Reading ===
=== Reading ===
Line 72: Line 72:




== Week 10: Monday Nov. 5th --- Mining Data Streams ==
== Week 10: Monday Nov. 5th - Mining Data Streams ==


=== Readings ===
=== Readings ===
Line 78: Line 78:




== Week 11: Monday Nov. 19th --- Clustering ==
== Week 11: Monday Nov. 19th - Clustering ==


=== Readings ===
=== Readings ===
* Mining of Massive Datasets, Chapter 7
* Mining of Massive Datasets, Chapter 7


== Week 12: Monday Nov. 26th --- Recommendation Systems ==
== Week 12: Monday Nov. 26th - Recommendation Systems ==


=== Readings ===
=== Readings ===
* Mining of Massive Datasets, Chapter 9
* Mining of Massive Datasets, Chapter 9


== Week 13: Monday Dec. 3rd --- EM algorithms for text processing ==
== Week 13: Monday Dec. 3rd - EM algorithms for text processing ==


* Data-Intensive Text Processing with MapReduce, Chapter 6
* Data-Intensive Text Processing with MapReduce, Chapter 6

Revision as of 00:11, 27 August 2012

Make sure to check my.poly.edu for course announcements

Week 1: Monday Sept. 10th - Course Overview

  • Course overview (First day of classes!)
  • Student survey
  • Introduction to Big Data

Week 2: Monday Sept. 17th - Map-Reduce

  • Introduction to map-reduce

Readings

  • Google's original MapReduce paper
  • Mining of Massive Datasets, Chapter 2
  • Data-Intensive Text Processing with MapReduce, Chapter 2

Week 3: Monday Sept. 24th - Statistics is easy

Readings

Week 4: Monday Oct. 1st - Databases and Big Data

  • Databases and Big Data

Readings

  • JF: ADD: NoSQL databases (reading papers from literature)

Column store vs. tuple store: HBase, MongoDB, VaultDB, Cassandra, HadoopDB (Facebook). Overview of different architectures, distributed databases vs. Hadoop, transaction support...

Week 5: Monday Oct. 8th - Finding Similar Items

  • Overview of information integration

Readings

  • Mining of Massive Datasets, chapter 3; information integration; entity resolution


Week 6: Monday Oct. 15th - Graph Analysis

  • Graph algorithms, link analysis, social networks

Readings

  • Mining of Massive Datasets, Chapter 5
  • Data-Intensive Text Processing with MapReduce, Chapter 5


Week 7: Monday Oct. 22nd - Introduction to Visualization; Data stewardship and provenance

  • Guest lecture by Claudio Silva and Lauro Lins

Readings

  • Hellerstein (ask Claudio for additional references)
  • ADD: provenance and reproducibility


Week 8: Monday Oct. 29th - TBD swap oct 15

  • Reading: inverted index and crawling (Lin chapter 4)
  • Ask Torsten (tentative, ask him for reading material)

Readings

  • Data-Intensive Text Processing with MapReduce, Chapter 4


Week 9: Monday Nov. 12th - Frequent Itemsets

Reading

  • Mining of Massive Datasets, Chapter 6


Week 10: Monday Nov. 5th - Mining Data Streams

Readings

  • Mining of Massive Datasets, Chapter 4


Week 11: Monday Nov. 19th - Clustering

Readings

  • Mining of Massive Datasets, Chapter 7

Week 12: Monday Nov. 26th - Recommendation Systems

Readings

  • Mining of Massive Datasets, Chapter 9

Week 13: Monday Dec. 3rd - EM algorithms for text processing

  • Data-Intensive Text Processing with MapReduce, Chapter 6

Week 14: Monday Dec. 10th

  • Project presentation