Difference between revisions of "CS6093/Selected Papers and Topics"

* Answering pattern match queries in large graph databases via graph embedding
Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011


* Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/evolving-graphs-vldb11.pdf


==  Provenance Applications: Reproducible Publications ==
== NoSQL Databases ==  


* Intro to Hadoop (TBD)
* Languages


== Relational Data on the Web ==  
* Automatic optimization for MapReduce programs. Eaman Jahani, Michael J. Cafarella, Christopher Ré. PVLDB, 2011.
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/jahani-vldb2011.pdf
 
* Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf
 
* Scalable SQL and NoSQL Data Stores. Rick Cattell, SIGMOD Record 2011. (overview of current data stores)
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/cattel-sigrec2011.pdf
 
* [http://infolab.stanford.edu/~usriv/papers/pig-latin.pdf Pig Latin: a not-so-foreign language for data processing.] C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins. SIGMOD 2008.
 
* [http://infolab.stanford.edu/~usriv/papers/pnuts.pdf PNUTS: Yahoo!'s Hosted Data Serving Platform.] Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, et al. Proceedings of the VLDB Endowment (2008).
 
Additional suggested reading:
 
* [http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publications/2010/SQLvsNoSQLDatabases.pdf SQL databases v. NoSQL databases.] Michael Stonebraker, CACM 2010.
 
* [http://www.christof-strauch.de/nosqldbs.pdf NoSQL Databases.] Christof Strauch. 2010.
 
== Relational Data on the Large ==
 
* [http://fleixeiras.cs.utah.edu/researchTopics/images/e/e7/Webtables-vldb08.pdf WebTables: exploring the power of tables on the web. ] Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wang, Eugene Wu, Yang Zhang: PVLDB 1(1): 538-549 (2008)
 
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/dassarma-sigmod2008.pdf Bootstrapping pay-as-you-go data integration systems.] Anish Das Sarma, Xin Dong, Alon Y. Halevy, SIGMOD Conference 2008: 861-874.
 
 
* Swoosh: a generic approach to entity resolution. Omar Benjelloun, Hector Garcia-Molina, David Menestrina, Qi Su, Steven Euijong Whang and Jennifer Widom
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/swoosh-vldbj2009pdf
 
* Automatically incorporating new sources in keyword search-based data integration. Talukdar et al, SIGMOD 2010
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/ives-sigmod2010pdf
 
* Discovering data quality rules. Chiang and Miller. PVLDB 2008
[http://vgc.poly.edu/~juliana/courses/cs6093/Readings/chiang-vldb2008.pdf]
 
* Data cleaning: Problems and current approaches. Rahm, IEEE DEB 2000.
http://dc-pubs.dbs.uni-leipzig.de/files/Rahm2000DataCleaningProblemsand.pdf


== Deep Web ==
* [http://www.cs.cornell.edu/~lucja/Publications/I03.pdf Google's Deep Web crawl.]  Jayant Madhavan, David Ko, Lucja Kot, Vignesh Ganapathy, Alex Rasmussen, Alon Y. Halevy. PVLDB 1(2): 1241-1252 (2008)
== Information Extraction ==
* Efficiently Incorporating User Feedback into Information Extraction and Integration Programs. Chai et al., SIGMOD 2009
[http://vgc.poly.edu/~juliana/courses/cs6093/Readings/chai-sigmod2009pdf]
* [http://www.it.iitb.ac.in/~sunita/papers/ieSurvey.pdf Information extraction] Sunita Sarawagi.  FnT Databases, 1(3), 2008.
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/huang-vldb2008.pdf On the Provenance of Non-Answers to Queries over Extracted Data.] J. Huang, T. Chen, A. Doan, J. Naughton. VLDB-08.
* [http://turing.cs.washington.edu/papers/kdd08.pdf Information Extraction From Wikipedia:  Moving Down the Long Tail] Fei Wu, Raphael Hoffmann, Daniel S. Weld


== Using and Analyzing Social Networking Data ==

Latest revision as of 21:33, 7 February 2012

Provenance and Databases

  • Peter Buneman, Sanjeev Khanna, Wang Chiew Tan: Why and Where: A Characterization of Data Provenance. ICDT 2001: 316-330

http://db.cis.upenn.edu/DL/whywhere.pdf

  • A. Das Sarma, M. Theobald, and J. Widom. LIVE: A Lineage-Supported Versioned DBMS. Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, Heidelberg, Germany, June 2010.

http://ilpubs.stanford.edu:8090/926/1/versioning-TR.pdf

  • Total Recall | Oracle Database

http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf

Additional Suggested Reading:

Graph Indexing

  • Answering pattern match queries in large graph databases via graph embedding

Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011

  • Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/evolving-graphs-vldb11.pdf

Provenance Applications: Reproducible Publications

- papers from challenge

Web Schema Matching and Integration

NoSQL Databases

  • Intro to Hadoop (TBD)
  • Automatic optimization for MapReduce programs. Eaman Jahani, Michael J. Cafarella, Christopher Ré. PVLDB, 2011.

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/jahani-vldb2011.pdf

  • Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf

  • Scalable SQL and NoSQL Data Stores. Rick Cattell, SIGMOD Record 2011. (overview of current data stores)

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/cattel-sigrec2011.pdf

Additional suggested reading:

Relational Data on the Large


  • Swoosh: a generic approach to entity resolution. Omar Benjelloun, Hector Garcia-Molina, David Menestrina, Qi Su, Steven Euijong Whang and Jennifer Widom

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/swoosh-vldbj2009pdf

  • Automatically incorporating new sources in keyword search-based data integration. Talukdar et al, SIGMOD 2010

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/ives-sigmod2010pdf

  • Discovering data quality rules. Chiang and Miller. PVLDB 2008

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/chiang-vldb2008.pdf

  • Data cleaning: Problems and current approaches. Rahm, IEEE DEB 2000.

http://dc-pubs.dbs.uni-leipzig.de/files/Rahm2000DataCleaningProblemsand.pdf

Deep Web

  • Google's Deep Web crawl. Jayant Madhavan, David Ko, Lucja Kot, Vignesh Ganapathy, Alex Rasmussen, Alon Y. Halevy. PVLDB 1(2): 1241-1252 (2008)


Information Extraction

  • Efficiently Incorporating User Feedback into Information Extraction and Integration Programs. Chai et al., SIGMOD 2009

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/chai-sigmod2009pdf


Using and Analyzing Social Networking Data