CS6093/Lectures
Make sure to check my.poly.edu for course announcements
Every week, you must write position papers for the papers in the Required Readings list
Week 1 - Jan 24
- Course overview (First day of classes!)
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/lecture1.pdf
- Provenance and Workflows
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf
Readings
- Provenance and Scientific Workflows: Challenges and Opportunities Susan Davidson and Juliana Freire. In Proceedings of ACM SIGMOD International Conference on Management of Data, 2008. Tutorial resources
- Provenance for Computational Tasks: A Survey Juliana Freire, David Koop, Emanuele Santos, and Claudio T. Silva. In IEEE Computing in Science & Engineering, 2008.
- Querying and Creating Visualizations by Analogy. Carlos E. Scheidegger, Huy T. Vo, David Koop, Juliana Freire and Claudio T. Silva. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1560-1567, 2007. Best paper in IEEE Visualization 2007.
Week 2 - Jan 31
- Provenance and Workflows (cont.)
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf
- Discussion about literature search
Readings
same as last week
Week 3 - Feb 7
- Information extraction: survey
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/information-extraction.pdf
Announcements
- The topic winners were: Information Extraction, Deep Web, Relational Data on the Web, Web Schema Matching, NoSQL DB, Provenance in DB, Graph Indexing, Usable query interfaces
- I will email you the preliminary assignments tomorrow
Assignment
- Write a position paper for the article: ONDUX: on-demand unsupervised learning for information extraction
Readings
- A survey of approaches to automatic schema matching. Erhard Rahm and Philip A. Bernstein, VLDB Journal 2001
- A Brief Survey of Web Data Extraction Tools. Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Altigran Soares da Silva, Juliana S. Teixeira: SIGMOD Record 31(2): 84-93 (2002)
- ONDUX: on-demand unsupervised learning for information extraction. Eli Cortez, Altigran Soares da Silva, Marcos André Gonçalves, Edleno Silva de Moura: SIGMOD Conference 2010: 807-818
Some history and perspective:
- Data integration: the teenage years. A. Halevy, A. Rajaraman, J. Ordille. VLDB 2006.
- Generic Schema Matching, Ten Years Later. Philip A. Bernstein, Jayant Madhavan, Erhard Rahm: PVLDB 4(11): 695-701 (2011)
Week 4 - Feb 14
- Provenance and Databases
- Graph Indexing
Assignment
- Write 2 position papers, one for each of the articles in the required reading for this week (see below)
Required Reading
- Peter Buneman, Sanjeev Khanna, Wang Chiew Tan: Why and Where: A Characterization of Data Provenance. ICDT 2001: 316-330 http://db.cis.upenn.edu/DL/whywhere.pdf
- Presenter: Fernando Seabra
- Rebuttal: Joe Miller (tentative)
- Graph Indexing: Tree + Delta >= Graph P. Zhao, J. X. Yu, and P. S. Yu. VLDB 2007.
- Presenter: Nivan Ferreira
- Rebuttal: Sergey Nepomnyachiy
Additional Suggested Reading
- A. Das Sarma, M. Theobald, and J. Widom. LIVE: A Lineage-Supported Versioned DBMS. Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, Heidelberg, Germany, June 2010.
http://ilpubs.stanford.edu:8090/926/1/versioning-TR.pdf
- Total Recall | Oracle Database
http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf
- Provenance in Databases: Past, Current, and Future W. Tan. IEEE Data Engineering Bulletin.
- Closure-Tree: An Index Structure for Graph Queries H. He and A. K. Singh. ICDE 2006.
- Answering pattern match queries in large graph databases via graph embedding
Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011
- Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/evolving-graphs-vldb11.pdf
- Algorithmics and Applications of Tree and Graph Searching D. Shasha, J. T. L. Wang, and R. Giugno. PODS 2002.
Week 5 - Feb 21
- NoSQL databases
Assignment
- Write a position paper for each of the required papers
Required Reading
- MapReduce: simplified data processing on large clusters Jeffrey Dean and Sanjay Ghemawat, CACM 2008
- Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf
- Presenters: Dmitriy Gromov, Xiang Liu
- Rebuttal: Fernando Seabra, Shoshana Gottesman
Additional Suggested Reading
- Debate between MR and DB people:
- SQL databases v. NoSQL databases. Michael Stonebraker, CACM 2010.
- NoSQL Databases. Christof Strauch. 2010.
For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics
Week 6 - Feb 28
Introduction to Visualization. Lecture will be given by Professors Claudio Silva and Lauro Lins
There will be no assignment this week, but I plan to give you a quiz on visualization next week.
Week 7 - March 6
- NoSQL Databases
Assignment
- Write a position paper for each of the required papers
Required Reading
- HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel J. Abadi, Avi Silberschatz, Alex Rasin. VLDB 2009.
- Efficient Processing of Data Warehousing Queries in a Split Execution Environment. Bajda-Pawlikowski et al., SIGMOD 2011
- Pig Latin: a not-so-foreign language for data processing. C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins. SIGMOD 2008.
- Presenters: Julie Odongo, Majed Hakami, Yuan Ding
- Rebuttal: Nivan Ferreira, Dmitriy Gromov, Juliana Freire
For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics
Week 8 - March 13
Spring break - no class
Week 9 - March 20
TBD
Week 10 - March 27
- Web information integration
Assignment
- Write a position paper for each of the required papers
Required Reading
- iMAP: Discovering Complex Semantic Matches between Database Schemas. R. Dhamankar, Y. Lee, A. Doan, A. Halevy, and P. Domingos. SIGMOD 2004.
- Automatic complex schema matching across Web query interfaces. Bin He, Kevin Chuan Chang, ACM Trans. Database Syst. 2006
- Presenters: Joe Miller, Vineet Meghani
- Rebuttal: Yuan Ding, Chunqing Jiang
Additional Reading
- A survey of approaches to automatic schema matching. Erhard Rahm and Philip A. Bernstein, VLDB Journal 2001
Week 11 - April 3
- Wikipedia
Assignment
- Write a position paper for each of the required papers
Required Reading
- Information Arbitrage in Multi-Lingual Wikipedia. E. Adar, M. Skinner, and D. Weld. Second ACM International Conference on Web Search and Data Mining (WSDM'09)
- Yago - A Core of Semantic Knowledge. Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. 16th international World Wide Web conference (WWW 2007)
- DBpedia - A crystallization point for the Web of Data Bizer et al., Web Semantics 2009.
- Presenters: Sergey Nepomnyachiy, Shoshana Gottesman, Haibo Zeng
- Rebuttal: Wei Jiang, Maneli Kadkhodazadeh, Majed Hakami
Additional Reading
- The YAGO-NAGA Approach to Knowledge Discovery Gjergji Kasneci, Fabian M. Suchanek, Maya Ramanath, Gerhard Weikum SIGMOD Record 37:4, December 2008
- Multilingual Schema Matching for Wikipedia Infoboxes Nguyen et al., VLDB 2012
Week 12 - April 10
- Information extraction
Assignment
- Write a position paper for each of the required papers
Required Reading
- Optimizing Complex Extraction Programs over Evolving Text Data. F. Chen, B. Gao, A. Doan, J. Yang, R. Ramakrishnan. SIGMOD 2009
- On the Provenance of Non-Answers to Queries over Extracted Data. Huang et al, VLDB 2008
- Information Extraction From Wikipedia: Moving Down the Long Tail. Fei Wu, Raphael Hoffmann, Daniel S. Weld
- Presenters: Chunqing Jiang, Bhaktavatsalam Nallanthighal, Sameer More
- Rebuttal: Xiang Liu, May Thazin, Haibo Zeng
Additional Reading
- Information extraction Sunita Sarawagi. FnT Databases, 1(3), 2008.
- Introduction to the Special Issue on Managing Information Extraction Doan et al., SIGMOD Record 2008.
Week 13 - April 17
- Twitter and News: finding entities and trends
Assignment
- Write a position paper for each of the required papers
Required Reading
- Named Entity Recognition in Tweets: An Experimental Study. EMNLP 2011
- Recognizing Named Entities in Tweets ACL 2011
- Tracking Trends: Incorporating Term Volume into Temporal Topic Models. KDD 2011
- Presenters: Maneli Kadkhodazadeh, Wei Jiang
- Rebuttal: Sameer More, Bhaktavatsalam Nallanthighal, Julie Ondongo
Additional Reading
- Unified Analysis of Streaming News WWW 2011
- Transferring Topical Knowledge from Auxiliary Long Texts for Short Text Clustering CIKM 2011
Week 14 - April 24
- Keyword queries over relational data
Assignment
- Write a position paper for each of the required papers
Required Reading
- Toward Scalable Keyword Search over Relational Data Baid et al., VLDB 2010
- BANKS: Browsing and Keyword Searching in Relational Databases Aditya et al., VLDB 2002
- Presenters: May Thazin, Tehila Minkus
- Rebuttal: Vineet Meghani, Tehila Minkus
Week 15 - May 1
Project presentation