CS6093/Selected Papers and Topics


Provenance and Databases

  • Peter Buneman, Sanjeev Khanna, Wang Chiew Tan: Why and Where: A Characterization of Data Provenance. ICDT 2001: 316-330


  • A. Das Sarma, M. Theobald, and J. Widom. LIVE: A Lineage-Supported Versioned DBMS. Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, Heidelberg, Germany, June 2010.


  • Total Recall | Oracle Database


Additional Suggested Reading:

Graph Indexing

  • Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao: Answering pattern match queries in large graph databases via graph embedding. http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011

  • Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)


Provenance Applications: Reproducible Publications

  • Papers from challenge

Web Schema Matching and Integration

NoSQL Databases

  • Intro to Hadoop (TBD)
  • Automatic optimization for MapReduce programs. Eaman Jahani, Michael J. Cafarella, Christopher Ré. PVLDB, 2011.


  • Parallel data processing with MapReduce: a survey. Lee et al., SIGMOD Record 2011.


  • Scalable SQL and NoSQL Data Stores. Rick Cattell, SIGMOD Record 2011 (overview of current data stores).


Additional Suggested Reading:

Relational Data on the Large

  • Swoosh: a generic approach to entity resolution. Omar Benjelloun, Hector Garcia-Molina, David Menestrina, Qi Su, Steven Euijong Whang and Jennifer Widom


  • Automatically incorporating new sources in keyword search-based data integration. Talukdar et al., SIGMOD 2010


  • Discovering data quality rules. Chiang and Miller. PVLDB 2008


  • Data cleaning: Problems and current approaches. Rahm and Do, IEEE DEB 2000.


Deep Web

  • Google's Deep Web crawl. Jayant Madhavan, David Ko, Lucja Kot, Vignesh Ganapathy, Alex Rasmussen, Alon Y. Halevy. PVLDB 1(2): 1241-1252 (2008)

Information Extraction

  • Efficiently Incorporating User Feedback into Information Extraction and Integration Programs. Chai et al., SIGMOD 2009


Using and Analyzing Social Networking Data