Difference between revisions of "Reading List"

From VistrailsWiki
== Provenance ==

=== Overview ===
* [http://www.cs.utah.edu/~juliana/pub/freire-tutorial-sigmod2008.pdf Provenance and Scientific Workflows: Challenges and Opportunities] Susan Davidson and Juliana Freire. In Proceedings of ACM SIGMOD International Conference on Management of Data, 2008. [http://www.vistrails.org/index.php?title=Tutorials/SIGMOD2008 Tutorial resources]
* [http://www.cs.utah.edu/~juliana/pub/freire-cise2008.pdf Provenance for Computational Tasks: A Survey] Juliana Freire, David Koop, Emanuele Santos, and Claudio T. Silva. In IEEE Computing in Science & Engineering, 2008.
* [http://homepages.inf.ed.ac.uk/rbose/pubs/bose_2005_ACM_CS.pdf Lineage retrieval for scientific data processing: a survey] R. Bose and J. Frew. ACM Computing Surveys, 37(1):1-28, 2005.
* [http://www.soe.ucsc.edu/~wctan/papers/2007/pdb-ieee.pdf Provenance in Databases: Past, Current, and Future] W. Tan.  IEEE Data Engineering Bulletin.
* [http://www.sigmod.org/sigmod/record/issues/0509/p31-special-sw-section-5.pdf A survey of data provenance in e-science] Yogesh L. Simmhan, Beth Plale, Dennis Gannon. SIGMOD Record, September 2005.
=== Provenance in Databases ===
* [http://www.soe.ucsc.edu/~wctan/papers/2007/pdb-ieee.pdf Provenance in Databases: Past, Current, and Future] W. Tan. IEEE Data Engineering Bulletin. (short overview)
* [http://www.soe.ucsc.edu/~wctan/papers/2008/curateddbs.pdf Curated Databases] W. Tan, P. Buneman, J. Cheney, S. Vansummeren. ACM Symposium on Principles of Database Systems (PODS), 2008.
* [http://arxiv.org/abs/0812.0564 Provenance as Dependency Analysis] James Cheney, Amal Ahmed, Umut A. Acar. DBPL 2007: 138-152
* [http://www.soe.ucsc.edu/~wctan/papers/2007/DBProvenance.ppt Database Provenance Tutorial] W. Tan and P. Buneman
=== Provenance Management: Storage, Indexing and Querying  ===
* [http://www.cs.utah.edu/~juliana/pub/tvcg-analogy2007.pdf  Querying and Creating Visualizations by Analogy.] Carlos E. Scheidegger, Huy T. Vo, David Koop, Juliana Freire and Claudio T. Silva. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1560-1567, 2007. Best paper in IEEE Visualization 2007.
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/chapman-sigmod2008.pdf Efficient Provenance Storage.] Adriane Chapman, H. V. Jagadish, and Prakash Ramanan. SIGMOD 2008.
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/heinis-sigmod2008.pdf Efficient lineage tracking for scientific workflows.] Thomas Heinis, Gustavo Alonso. SIGMOD Conference 2008: 1007-1018
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/bitton-icde2008.pdf Querying and Managing Provenance through User Views in Scientific Workflows.] Olivier Biton, Sarah Cohen Boulakia, Susan B. Davidson, Carmem S. Hara. ICDE 2008: 1072-1081
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/beeri-vldb06.pdf Querying Business Processes.] Catriel Beeri, Anat Eyal, Simon Kamenkovich, Tova Milo.  VLDB 2006: 343-354
===  Provenance/Workflow/Graph Indexing ===
* [http://www.cs.nyu.edu/shasha/papers/graphgrep/pods2002.pdf Algorithmics and Applications of Tree and Graph Searching] D. Shasha, J. T. L. Wang, and R. Giugno.  PODS 2002.
* [http://www.vldb.org/conf/2007/papers/research/p938-zhao.pdf Graph Indexing: Tree + Delta >= Graph] P. Zhao, J. X. Yu, and P. S. Yu.  VLDB 2007.
* [http://www.cs.ucsb.edu/~dbl/papers/he_icde_2006.pdf Closure-Tree: An Index Structure for Graph Queries] H. He and A. K. Singh.  ICDE 2006.
Additional papers:
* [http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/34/20643/00954600.pdf?arnumber=954600 Efficient Matching and Indexing of Graph Models in Content-Based Retrieval] by Stefano Berretti, Alberto Del Bimbo, Enrico Vicario, IEEE TPAMI 2001
* [http://portal.acm.org/citation.cfm?id=844380.844749 Computing Frequent Graph Patterns from Semistructured Data] by N. Vanetik, E. Gudes, S. E. Shimony, ICDM 2002
* [http://www.xifengyan.net/papers/sigmod04_gindex.pdf Graph indexing based on discriminative frequent structure analysis] by Xifeng Yan, Philip S. Yu, Jiawei Han TODS 2004
* [http://www.xifengyan.net/papers/sigmod04_gindex.pdf Graph Indexing: A Frequent Structure-based Approach] by Xifeng Yan, Philip S. Yu, Jiawei Han, SIGMOD 2004
* [http://www.cs.unc.edu/~weiwang/paper/ICDE07_1.pdf Graph Database Indexing Using Structured Graph Decomposition] by David W. Williams, Jun Huan, Wei Wang ICDE 2007
* [http://www.vldb.org/conf/2007/papers/research/p926-chen.pdf Towards graph containment search and indexing] by Chen Chen, Xifeng Yan, Philip S. Yu, Jiawei Han, Dong-Qing Zhang, Xiaohui Gu, VLDB 2007
* [http://vorlon.case.edu/~jiong/research/papers/67.pdf Treepi: A novel graph indexing method] by S. Zhang, M. Hu, J. Yang, ICDE 2007
* [http://www.leizou.net/papers/zlDASFAA08.pdf Summarization Graph Indexing: Beyond Frequent Structure-based Approach] by Lei Zou, Lei Chen, Huaming Zhang, Yansheng Lu, and Qiang Lou
* [http://www.info.uqam.ca/Members/mili_h/Enseignement/inf980x-aut08/graph_indexing.pdf Graph Indexing Algorithms]
Video lecture:
* [http://videolectures.net/mlg07_han_miasg/ Mining, Indexing, and Searching Graphs in Large Data Sets] by Jiawei Han Nature 2007
== Provenance Mining  ==
* [http://www.cs.utah.edu/~juliana/pub/tvcg-recommendation2008.pdf VisComplete: Automating Suggestions for Visualization Pipelines.] David Koop, Carlos E. Scheidegger, Steven P. Callahan, Huy T. Vo, Juliana Freire and Claudio T. Silva. In IEEE Transactions on Visualization and Computer Graphics, 14(6), pp. 1691-1698, 2008.
* [http://www.cs.utah.edu/~juliana/pub/cluster-ipaw2008.pdf A First Study on Clustering Collections of Workflow Graphs] E. Santos, L. Lins, J. P. Ahrens, J. Freire, C. Silva. In Proceedings of IPAW, pp. 160-173, 2008
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/medeiros-bpm2008.pdf Process Mining Based on Clustering: A Quest for Precision.] A.K. Alves de Medeiros, A. Guzzo, G. Greco, W.M.P. van der Aalst, A.J.M.M. Weijters, B. van Dongen, and D. Saccà. In A. ter Hofstede, B. Benatallah, and H.-Y. Paik, editors, BPM 2007 Workshops, LNCS 4928: 17–29, 2008.
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/greco-tkde2005.pdf Mining and Reasoning on Workflows] Greco et al. TKDE 2005
===  Provenance Applications: Publications ===
* [http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4720217&isnumber=4720211 Reproducible Research] Fomel, Sergey; Claerbout, Jon F. CiSE Volume: 11 Issue: 1 Date: Jan.-Feb. 2009 Page(s): 5-7 Digital Object Identifier 10.1109/MCSE.2009.14
* [http://www.bepress.com/bioconductor/paper3 Reproducible Research: A Bioinformatics Case Study] Robert Gentleman.  Bioconductor Project Working Papers. Working Paper 3. (May 2004).
* [http://gking.harvard.edu/files/dvn.pdf An Introduction to the Dataverse Network as an Infrastructure for Data Sharing.] Gary King. Sociological Methods and Research. Vol. 32, No. 2 (November 2007): pp. 173-199.
== Provenance: Security and Privacy ==
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/braun-hotsec2008.pdf Securing provenance.] U. Braun, A. Shinnar, and M. Seltzer. In HotSec’08, 2008.
* [http://www.ragibhasan.com/publications/papers/hasan-fast2009-provenance.pdf The Case of the Fake Picasso: Preventing History Forgery with Secure Provenance], Ragib Hasan, Radu Sion, and Marianne Winslett, USENIX FAST 2009
* [http://www.ragibhasan.com/publications/papers/storagess2007-rhasan.pdf Introducing Secure Provenance: Problems and Challenges], Ragib Hasan, Radu Sion, Marianne Winslett, in ACM StorageSS 2007.
* [http://www.ragibhasan.com/research/provenance.html Secure Provenance Project at UIUC]
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/cirillo-esop2008.pdf TAPIDO: Trust and Authorization via Provenance and Integrity in Distributed Objects.] A. Cirillo, R. Jagadeesan, C. Pitcher, and J. Riely. In European Symposium on Programming (ESOP), Lecture Notes in Computer Science, Springer, 2008.
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/vaughan-csf2008 Evidence-Based Audit.] Jeffrey A. Vaughan, Limin Jia, Karl Mazurak, Steve Zdancewic. CSF 2008: 177-191
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/swamy-ieeessp2008 SELinks: End to end security for Web applications.] Hicks, Swamy, and Corcoran. [http://www.cs.umd.edu/projects/PL/selinks/ Project Web Site]
== Data on the Web ==
===Web Schema Matching and Integration ===
* [http://portal.acm.org/citation.cfm?id=1007582 An interactive clustering-based approach to integrating source query interfaces on the deep Web] Wensheng Wu, Clement Yu, AnHai Doan, Weiyi Meng, SIGMOD 2004
* [http://portal.acm.org/citation.cfm?id=1132863.1132872&coll=GUIDE&dl=GUIDE Automatic complex schema matching across Web query interfaces] Bin He, Kevin Chuan Chang, ACM Trans. Database Syst. 2006
* [http://www.cidrdb.org/cidr2007/papers/cidr07p40.pdf Web-scale Data Integration: You can only afford to Pay As You Go] Jayant Madhavan, Shawn R. Jeffery, Shirley Cohen, Xin (Luna) Dong, David Ko, Cong Yu, Alon Halevy. CIDR 2007
* [http://www.vldb.org/conf/2007/papers/research/p687-dong.pdf Data Integration with Uncertainty] Xin Dong, Alon Y. Halevy, Cong Yu. VLDB 2007
Additional papers:
* [http://portal.acm.org/citation.cfm?id=767154 A survey of approaches to automatic schema matching] Erhard Rahm and Philip Bernstein, VLDB Journal 2001
* [http://www.dit.unitn.it/~p2p/RelatedWork/Matching/JoDS-IV-2005_SurveyMatching-SE.pdf A Survey of Schema-based Matching Approaches] Pavel Shvaiko and Jerome Euzenat, JoDS 2005
* [http://delivery.acm.org/10.1145/1230000/1228269/p2-gal.pdf?key1=1228269&key2=7951094321&coll=GUIDE&dl=GUIDE&CFID=22967505&CFTOKEN=83376470 Why is schema matching tough and what can we do about it?] Avigdor Gal. ACM SIGMOD Record
* [http://citeseerx.ist.psu.edu/viewdoc/summary?doi= Wise-integrator: An automatic integrator of web search interfaces for e-commerce.] Hai He and Weiyi Meng. VLDB 2003
* [http://www.springerlink.com/content/7g5n755552m9n356/fulltext.pdf Holistic query interface matching using parallel schema matching.] W. Su, J. Wang, and F. Lochovsky. ICDE '06
* [http://www.dit.unitn.it/~p2p/RelatedWork/Matching/corpus-icde05.pdf Corpus-based schema matching.] Jayant Madhavan, Philip A. Bernstein, Anhai Doan, Alon Halevy. ICDE 05
* [http://ieeexplore.ieee.org/iel5/10810/34089/01623841.pdf A Robust Approach to Schema Matching over Web Query Interfaces] Jin Pei, Jun Hong, David Bell. ICDE 06
* [http://eagle.cs.uiuc.edu/pubs/2003/unifiedschema-sigmod03-hc-mar03.pdf Statistical Schema Matching across Web Query Interfaces] Bin He, Kevin Chen-Chuan Chang, SIGMOD 2003
* [http://www.dit.unitn.it/~accord/RelatedWork/Matching/RS10P4.pdf Instance-based Schema Matching for Web Databases by Domain-specific Query Probing] J. Wang, VLDB 04
* [http://www.cs.binghamton.edu/~meng/pub.d/ICDE2006_06203.pdf Merging Source Query Interfaces on Web Databases] Eduard Dragut, ICDE 06
Additional papers on Dataspaces:
* [http://portal.acm.org/citation.cfm?id=1107499.1107502 From databases to dataspaces: a new abstraction for information management] by Michael Franklin, Alon Halevy, David Maier, SIGMOD 2005
* [http://portal.acm.org/citation.cfm?id=1454159.1454217 A first tutorial on dataspaces] by Michael Franklin, Alon Halevy, David Maier, VLDB 2008
=== Relational data on the Web ===
* [http://fleixeiras.cs.utah.edu/researchTopics/images/e/e7/Webtables-vldb08.pdf WebTables: exploring the power of tables on the web.] Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wang, Eugene Wu, Yang Zhang. PVLDB 1(1): 538-549 (2008)
* [http://fleixeiras.cs.utah.edu/researchTopics/images/0/0a/Relweb-webdb08.pdf Uncovering the Relational Web.] Michael J. Cafarella, Alon Y. Halevy, Yang Zhang, Daisy Zhe Wang, Eugene Wu. WebDB 2008
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/dasu-sigmod2002 Mining database structure; or, how to build a data quality browser.] Tamraparni Dasu, Theodore Johnson, S. Muthukrishnan, Vladislav Shkapenyuk. SIGMOD 2002
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/andritsos-sigmod2004.pdf Information-theoretic tools for mining database structure from large data sets.] Periklis Andritsos, Renee J. Miller and Panayiotis Tsaparas. SIGMOD 2004
Additional Papers:
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/elmargamid-tkde2007.pdf Duplicate Record Detection: A Survey.] Ahmed K. Elmagarmid, Panagiotis G. Ipeirotis, Vassilios S. Verykios. IEEE TKDE, 2007
* [http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=655802&isnumber=14296 Efficient Discovery of Functional and Approximate Dependencies Using Partitions] Yka Huhtala, Juha Karkkainen, Pasi Porkka, and Hannu Toivonen. In Proc. IEEE Intl. conf. on Data Engineering, 1998.
* [http://rakesh.agrawal-family.com/papers/sigmod93assoc.pdf Mining Association Rules between Sets of Items in Large Databases] Rakesh Agrawal, Tomasz Imielinski, Arun Swami. SIGMOD 1993
* [http://www.cs.toronto.edu/~periklis/pubs/edbt04.pdf LIMBO: Scalable Clustering of Categorical Data] Periklis Andritsos, Panayiotis Tsaparas, Renée J. Miller, and Kenneth C. Sevcik. In EDBT 2004.
=== Data integration on the fly (or almost...) ===
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/franklin-sigmodrecord2005.pdf From databases to dataspaces: a new abstraction for information management.] Michael Franklin, Alon Halevy, David Maier. Sigmod Record, 2005
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/dong-sigmod2007 Indexing dataspaces.] Xin Dong and Alon Halevy. SIGMOD 2007.
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/jeffery-sigmod2008.pdf Pay-as-you-go user feedback for dataspace systems.] Shawn R. Jeffery, Michael J. Franklin, Alon Y. Halevy. SIGMOD Conference 2008: 847-860
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/dassarma-sigmod2008.pdf Bootstrapping pay-as-you-go data integration systems.] Anish Das Sarma, Xin Dong, Alon Y. Halevy, SIGMOD Conference 2008: 861-874.
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/derose-icde2008.pdf Building Community Wikipedias: A Human-Machine Approach.] P. DeRose, X. Chai, B. Gao, W. Shen, A. Doan, P. Bohannon, J. Zhu. ICDE-08.
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/doan-cidr2009.pdf The Case for a Structured Approach to Managing Unstructured Data.]  A. Doan, J. F. Naughton, A. Baid, X. Chai, F. Chen, T. Chen, E. Chu, P. DeRose, B. Gao, C. Gokhale, J. Huang, W. Shen, B. Vuong. CIDR-09.
=== Usable query interfaces for structured data ===
* [http://db.ucsd.edu/people/vagelis/publications/discover.pdf Discover: keyword search in relational databases.] Vagelis Hristidis, Yannis Papakonstantinou. VLDB 2002.
* [http://www.vldb2005.org/program/paper/wed/p505-kacholia.pdf Bidirectional Expansion For Keyword Search on Graph Databases.] Varun Kacholia, Shashank Pandit, Soumen Chakrabarti, S Sudarshan, Rushi Desai and Hrishikesh Karambelkar, VLDB 2005
* [http://www.cse.iitb.ac.in/~aru/Publications/BanksICDE2002.pdf Keyword Searching and Browsing in databases using BANKS.] Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, S. Sudarshan. ICDE 2002
* [http://www.cs.uic.edu/~fliu1/Sigmod06_Keyword_FangLiu_UIC.pdf Effective keyword search in relational databases.] Fang Liu, Clement Yu, Weiyi Meng, Abdur Chowdhury. SIGMOD 2006, pp. 563-574.
=== Snippet Generation and Ranking ===
* [http://portal.acm.org/citation.cfm?id=1183703 A system for query-specific document summarization]. Ramakrishna Varadarajan, Vagelis Hristidis. CIKM, 2006
* [http://portal.acm.org/citation.cfm?id=1277741.1277766 Fast generation of result snippets in web search]. Andrew Turpin, Yohannes Tsegay, David Hawking, Hugh E. Williams. ACM SIGIR, 2007 (Ramesh: Will present this)
* [http://portal.acm.org/citation.cfm?id=1060745.1060828 Object-level ranking: bringing order to Web objects]. Zaiqing Nie, Yuanzhi Zhang, Ji-Rong Wen, Wei-Ying Ma. WWW, 2005
* [http://portal.acm.org/citation.cfm?id=1066220 Page quality: in search of an unbiased web ranking]. Junghoo Cho, Sourashis Roy, Robert E. Adams. SIGMOD, 2005
=== The Deep Web ===
* [http://www.cs.cornell.edu/~lucja/Publications/I03.pdf Google's Deep Web crawl.]  Jayant Madhavan, David Ko, Lucja Kot, Vignesh Ganapathy, Alex Rasmussen, Alon Y. Halevy. PVLDB 1(2): 1241-1252 (2008) (*Ramesh will present this)
* [http://www.cs.utah.edu/~juliana/pub/freire-sbbd2004.pdf Siphoning Hidden-Web Data through Keyword-Based Interfaces.] Luciano Barbosa and Juliana Freire. In Proceedings of Brazilian Symposium on Databases (SBBD), 2004. (*Huong will present this)
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/wang-vldb2004.pdf Instance-based schema matching for web databases by domain-specific query probing.] Jiying Wang, Ji-Rong Wen, Fred Lochovsky, Wei-Ying Ma. VLDB 2004
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/wu-icde2006.pdf Query Selection Techniques for Efficient Crawling of Structured Web Sources.] Ping Wu, Ji-Rong Wen, Huan Liu, Wei-Ying Ma. ICDE 2006
=== Information Extraction ===
* [http://www.it.iitb.ac.in/~sunita/papers/ieSurvey.pdf Information extraction] Sunita Sarawagi.  FnT Databases, 1(3), 2008.
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/huang-vldb2008.pdf On the Provenance of Non-Answers to Queries over Extracted Data.] J. Huang, T. Chen, A. Doan, J. Naughton. VLDB-08.
* [http://turing.cs.washington.edu/papers/kdd08.pdf Information Extraction From Wikipedia:  Moving Down the Long Tail] Fei Wu, Raphael Hoffmann, Daniel S. Weld
* [http://turing.cs.washington.edu/papers/aaai08.pdf Intelligence in Wikipedia] Daniel S. Weld, Fei Wu, Eytan Adar
* [http://www.isi.edu/integration/papers/michelson05-ijcai.pdf Semantic annotation of unstructured and ungrammatical text] Matthew Michelson and Craig A. Knoblock. IJCAI 2005
* [http://www.it.iitb.ac.in/~sunita/papers/sigmodRecord08.pdf Domain adaptation of information extraction models.] Rahul Gupta and Sunita Sarawagi. In Sigmod Record, 2008.
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/doan-sigrec2008.pdf Information Extraction Challenges in Managing Unstructured Data.] AnHai Doan et al.  SIGMOD Record, Winter 08, Special Issue on Managing Information Extraction.
* [http://www.cis.upenn.edu/~pereira/papers/crf.pdf Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data] John Lafferty, Andrew McCallum, Fernando Pereira, ICML 2001.
* [http://research.microsoft.com/en-us/um/people/jrwen/jrwen_files/publications/2d-crf.pdf 2D Conditional Random Fields for Web Information Extraction] Jun Zhu, Wei-Ying Ma. ICML 2005
* [http://delivery.acm.org/10.1145/1160000/1150457/p494-zhu.pdf?key1=1150457&key2=5287448321&coll=GUIDE&dl=GUIDE&CFID=28958275&CFTOKEN=60944724 Simultaneous record detection and attribute labeling in web data extraction] Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang, Wei-Ying Ma. KDD 2006
* [http://www2005.org/cdrom/docs/p76.pdf Web Data Extraction Based on Partial Tree Alignment] Yanhong Zhai, Bing Liu. WWW 2005

Latest revision as of 18:05, 31 January 2012