Course: Big Data Analysis



* [http://www.nytimes.com/2012/08/12/business/how-big-data-became-so-big-unboxed.html?ref=stevelohr New York Times: "How Big Data Became So Big"]
* [http://www3.weforum.org/docs/WEF_TC_MFS_BigDataBigImpact_Briefing_2012.pdf World Economic Forum: Big Data, Big Impact]
* [http://www.analytics-magazine.org/november-december-2010/54-the-analytics-journey.html The Analytics Journey]



Revision as of 17:30, 2 September 2012

Make sure to check my.poly.edu for course announcements

Week 1: Monday Sept. 10th - Course Overview

  • Course overview (First day of classes!)
  • Student survey
  • Introduction to Big Data

Readings

Week 2: Monday Sept. 17th - Map-Reduce

  • Introduction to map-reduce

Readings

Week 3: Monday Sept. 24th - Statistics is easy

Readings

Week 4: Monday Oct. 1st - Databases and Big Data

  • Databases and Big Data

Readings

  • JF: ADD: NoSQL databases (reading papers from literature)

Column store vs. tuple store: HBase, MongoDB, VaultDB, Cassandra, HadoopDB (Facebook). Overview of different architectures, distributed databases vs. Hadoop, transaction support...

Week 5: Monday Oct. 8th - Finding Similar Items

  • Overview of information integration

Readings

  • Mining of Massive Datasets, chapter 3; information integration; entity resolution


Week 6: Monday Oct. 15th - Graph Analysis

  • Graph algorithms, link analysis, social networks

Readings

  • Mining of Massive Datasets, Chapter 5
  • Data-Intensive Text Processing with MapReduce, Chapter 5


Week 7: Monday Oct. 22nd - Introduction to Visualization; Data stewardship and provenance

  • Guest lecture by Claudio Silva and Lauro Lins

Readings

  • Hellerstein (ask Claudio for additional references)
  • ADD: provenance and reproducibility


Week 8: Monday Oct. 29th - TBD (swap with Oct. 15)

  • Reading: inverted index and crawling (Lin chapter 4)
  • Ask Torsten (tentative, ask him for reading material)

Readings

  • Data-Intensive Text Processing with MapReduce, Chapter 4


Week 9: Monday Nov. 12th - Frequent Itemsets

Readings

  • Mining of Massive Datasets, Chapter 6


Week 10: Monday Nov. 5th - Mining Data Streams

Readings

  • Mining of Massive Datasets, Chapter 4


Week 11: Monday Nov. 19th - Clustering

Readings

  • Mining of Massive Datasets, Chapter 7

Week 12: Monday Nov. 26th - Recommendation Systems

Readings

  • Mining of Massive Datasets, Chapter 9

Week 13: Monday Dec. 3rd - EM algorithms for text processing

  • Data-Intensive Text Processing with MapReduce, Chapter 6

Week 14: Monday Dec. 10th - Project presentation