Difference between revisions of "CS6093/Lectures"

From VistrailsWiki
 
(42 intermediate revisions by the same user not shown)
Line 69: Line 69:


* Peter Buneman, Sanjeev Khanna, Wang Chiew Tan: Why and Where: A Characterization of Data Provenance. ICDT 2001: 316-330 http://db.cis.upenn.edu/DL/whywhere.pdf
* Peter Buneman, Sanjeev Khanna, Wang Chiew Tan: Why and Where: A Characterization of Data Provenance. ICDT 2001: 316-330 http://db.cis.upenn.edu/DL/whywhere.pdf
** Presenter: Fernando Seabra
** Presenter: Fernando Seabra [http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/WhyWherePresentation.pdf Presentation]
** Rebuttal: Joe Miller (tentative)
** Rebuttal: Joe Miller (tentative)


Line 105: Line 105:


===Required Reading ===
===Required Reading ===
* [http://vgc.poly.edu/~juliana/courses/cs6093/Readings/dean-cacm2008.pdf MapReduce: simplified data processing on large clusters] Jeffrey Dean and  Sanjay Ghemawat, CACM 2008


* Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011
* Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf
**Presenters: Dmitriy Gromov [http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/MapReducePresentation_DmitriyGromov.pdf Presentation], Xiang Liu
**Rebuttal: Fernando Seabra,  Shoshana Gottesman
=== Additional suggested reading ===


* [http://infolab.stanford.edu/~usriv/papers/pig-latin.pdf Pig latin: a not-so-foreign language for data processing]. C Olston, B Reed, U Srivastava, R Kumar, A. Tomkins. SIGMOD 2008.
* Debate between MR and DB people:
**http://cacm.acm.org/magazines/2010/1/55743-mapreduce-and-parallel-dbmss-friends-or-foes/fulltext
**http://cacm.acm.org/magazines/2010/1/55744-mapreduce-a-flexible-data-processing-tool/fulltext


=== Additional suggested reading ===
* http://www.computerworld.com/s/article/9224180/What_s_the_big_deal_about_Hadoop_


* [http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publications/2010/SQLvsNoSQLDatabases.pdf SQL databases v. NoSQL databases.] Michael Stonebraker, CACM 2010.  
* [http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publications/2010/SQLvsNoSQLDatabases.pdf SQL databases v. NoSQL databases.] Michael Stonebraker, CACM 2010.  
Line 121: Line 129:
== Week 6 - Feb 28 ==
== Week 6 - Feb 28 ==


TBD
[http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/intro-to-visualization.pdf Introduction to Visualization.]  Lecture will be given by Professors Claudio Silva and Lauro Lins
 
There will be no assignment this week, but I plan to give you a quiz on visualization next week.
 
=== Suggested Reading ===
 
Visualization. Tamara Munzner. Chapter 27, pp. 675-707, of Fundamentals of Computer Graphics, Third Edition
http://www.cs.ubc.ca/labs/imager/tr/2009/VisChapter/akp-vischapter.pdf
 
Lecture notes. Claudio Silva
http://www.cs.utah.edu/~csilva/courses/cs5630/lec01-notes.pdf


== Week 7 - March 6 ==
== Week 7 - March 6 ==
Line 135: Line 153:


* [http://cs-www.cs.yale.edu/homes/dna/papers/split-execution-hadoopdb.pdf Efficient Processing of Data Warehousing Queries in a Split Execution Environment.]  Bajda-Pawlikowski et al., SIGMOD 2011
* [http://cs-www.cs.yale.edu/homes/dna/papers/split-execution-hadoopdb.pdf Efficient Processing of Data Warehousing Queries in a Split Execution Environment.]  Bajda-Pawlikowski et al., SIGMOD 2011
* [http://infolab.stanford.edu/~usriv/papers/pig-latin.pdf Pig latin: a not-so-foreign language for data processing]. C Olston, B Reed, U Srivastava, R Kumar, A. Tomkins. SIGMOD 2008.
** Presenters: Julie Odongo, Majed Hakami [http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/majed-hadoopdb.pdf Presentation], Yuan Ding
** Rebuttal:  Nivan Ferreira, Dmitriy Gromov, Juliana Freire


For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics
For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics
Line 157: Line 180:


* [http://portal.acm.org/citation.cfm?id=1132863.1132872&coll=GUIDE&dl=GUIDE Automatic complex schema matching across Web query interfaces] Bin He, Kevin Chuan Chang, ACM Trans. Database Syst. 2006
* [http://portal.acm.org/citation.cfm?id=1132863.1132872&coll=GUIDE&dl=GUIDE Automatic complex schema matching across Web query interfaces] Bin He, Kevin Chuan Chang, ACM Trans. Database Syst. 2006
** Presenters: Joe Miller, Vineet Meghani
** Rebuttal:  Yuan Ding,  Chunqing Jiang


=== Additional Reading ===
=== Additional Reading ===
* [http://portal.acm.org/citation.cfm?id=767154 A survey of approaches to automatic schema matching] Erhard Rahm and Philip A. Bernstein, VLDB Journal 2001
* [http://portal.acm.org/citation.cfm?id=767154 A survey of approaches to automatic schema matching] Erhard Rahm and Philip A. Bernstein, VLDB Journal 2001


== Week 11 - April 3 ==
== Week 11 - April 3 ==
Line 173: Line 197:
*  [http://www.cs.washington.edu/homes/weld/papers/adar-wsdm09.pdf  Information Arbitrage in Multi-Lingual Wikipedia.] Adar, E. and Skinner, M. and Weld, D., Second ACM International Conference on Web Search and Data Mining (WSDM'09)
*  [http://www.cs.washington.edu/homes/weld/papers/adar-wsdm09.pdf  Information Arbitrage in Multi-Lingual Wikipedia.] Adar, E. and Skinner, M. and Weld, D., Second ACM International Conference on Web Search and Data Mining (WSDM'09)
* [http://suchanek.name/work/publications/www2007.pdf Yago - A Core of Semantic Knowledge. Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. ] 16th international World Wide Web conference (WWW 2007)
* [http://suchanek.name/work/publications/www2007.pdf Yago - A Core of Semantic Knowledge. Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. ] 16th international World Wide Web conference (WWW 2007)
* [http://vgc.poly.edu/~juliana/courses/cs6093/Readings/bizer-web-sem2009..pdf DBpedia - A crystallization point for the Web of Data] Bizer et al., Web Semantics 2009.
** Presenters: Sergey Nepomnyachiy, Shoshana Gottesman, Haibo Zeng
** Rebuttal: Wei Jiang,  Juliana Freire, Majed Hakami


=== Additional Reading ===
=== Additional Reading ===
Line 190: Line 215:
===Required Reading ===
===Required Reading ===


* [http://vgc.poly.edu/~juliana/courses/cs6093/Readings/bizer-web-sem2009..pdf DBpedia - A crystallization point for the Web of Data] Bizer et al., Web Semantics 2009.
* [http://pages.cs.wisc.edu/~anhai/papers/delex-sigmod09.pdf  Optimizing Complex Extraction Programs over Evolving Text Data.] F. Chen, B. Gao, A. Doan, J. Yang, R. Ramakrishnan. SIGMOD 2009
* [http://pages.cs.wisc.edu/~anhai/papers/delex-sigmod09.pdf  Optimizing Complex Extraction Programs over Evolving Text Data.] F. Chen, B. Gao, A. Doan, J. Yang, R. Ramakrishnan. SIGMOD 2009
* [http://pages.cs.wisc.edu/~anhai/papers/ie-provenance-vldb08.pdf On the Provenance of Non-Answers to Queries over Extracted Data]. Huang et al, VLDB 2008
* [http://pages.cs.wisc.edu/~anhai/papers/ie-provenance-vldb08.pdf On the Provenance of Non-Answers to Queries over Extracted Data]. Huang et al, VLDB 2008
* [http://turing.cs.washington.edu/papers/kdd08.pdf Information Extraction From Wikipedia: Moving Down the Long Tail] Fei Wu, Raphael Hoffmann, Daniel S. Weld
** Presenters: Haibo Zeng, Chunqing Jiang, Bhaktavatsalam Nallanthighal
** Rebuttal:   Majed Hakami, Xiang Liu, May Thazin


=== Additional Reading ===
=== Additional Reading ===
* [http://turing.cs.washington.edu/papers/kdd08.pdf Information Extraction From Wikipedia:  Moving Down the Long Tail] Fei Wu, Raphael Hoffmann, Daniel S. Weld
* [http://www.it.iitb.ac.in/~sunita/papers/ieSurvey.pdf Information extraction] Sunita Sarawagi.  FnT Databases, 1(3), 2008.
* [http://www.it.iitb.ac.in/~sunita/papers/ieSurvey.pdf Information extraction] Sunita Sarawagi.  FnT Databases, 1(3), 2008.
* [http://pages.cs.wisc.edu/~anhai/papers/spec-issue-intro-sigmodrec08.pdf Introduction to the Special Issue on Managing Information Extraction] Doan et al., SIGMOD Record 2008.
* [http://pages.cs.wisc.edu/~anhai/papers/spec-issue-intro-sigmodrec08.pdf Introduction to the Special Issue on Managing Information Extraction] Doan et al., SIGMOD Record 2008.


== Week 13 - April 17 ==
== Week 13 - April 17 ==
* Keyword queries over relational data


=== Assignment ===
=== Assignment ===
* Write a position paper for each of the required papers
* Write a position paper for each of the required papers
* Twitter and News: finding entities and trends


===Required Reading ===
===Required Reading ===


* [http://pages.cs.wisc.edu/~anhai/papers/scalable-kws-vldb10.pdf Toward Scalable Keyword Search over Relational Data] Baid et al., VLDB 2010
* [http://vgc.poly.edu/wiki/vgc/index.php/File:D11-1141.pdf Named Entity Recognition in Tweets: An Experimental Study.] EMNLP 2011
* [http://www.vldb.org/conf/2002/S33P11.pdf BANKS: Browsing and Keyword Searching in Relational Databases] Aditya et al., VLDB 2002
* [http://vgc.poly.edu/wiki/vgc/index.php/File:NerTwitter.pdf Recognizing Named Entities in Tweets]   ACL 2011
* [http://vgc.poly.edu/wiki/vgc/index.php/File:TrackingTrends.pdf Tracking Trends: Incorporating Term Volume into Temporal Topic Models.] KDD 2011
** Presenters:  Fernando Seabra, Wei Jiang, Nivan Ferreira
** Rebuttal:  Juliana Freire, Bhaktavatsalam Nallanthighal,  Julie Ondongo
 
=== Additional reading ===
 
* [http://www.www2011india.com/proceeding/proceedings/p267.pdf Unified Analysis of Streaming News] WWW 2011
*  [http://www.cs.ust.hk/~qyang/Docs/2011/cikm-short-text.pdf Transferring Topical Knowledge from Auxiliary Long Texts for Short Text Clustering] CIKM 2011


== Week 14 - April 24 ==
== Week 14 - April 24 ==
* Keyword queries over relational data


=== Assignment ===
=== Assignment ===
Line 216: Line 255:


===Required Reading ===
===Required Reading ===
* [http://pages.cs.wisc.edu/~anhai/papers/scalable-kws-vldb10.pdf Toward Scalable Keyword Search over Relational Data] Baid et al., VLDB 2010
* [http://www.vldb.org/conf/2002/S33P11.pdf BANKS: Browsing and Keyword Searching in Relational Databases] Aditya et al., VLDB 2002
* [http://pages.cs.wisc.edu/~anhai/papers/ie-provenance-vldb08.pdf On the Provenance of Non-Answers to Queries over Extracted Data]. Huang et al, VLDB 2008
** Presenters:  May Thazin,  Tehila Minkus, Bhaktavatsalam Nallanthighal
** Rebuttal:  Tehila Minkus, Vineet Meghani, May Thazin


== Week 15 - May 1 ==
== Week 15 - May 1 ==
Project presentation
Project presentation

Latest revision as of 19:55, 24 April 2012

Make sure to check my.poly.edu for course announcements

Every week, you must write position papers for the papers in the Required Readings list

Week 1 - Jan 24

  • Course overview (First day of classes!)

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/lecture1.pdf

  • Provenance and Workflows

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf

Readings

  • Querying and Creating Visualizations by Analogy. Carlos E. Scheidegger, Huy T. Vo, David Koop, Juliana Freire and Claudio T. Silva. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1560-1567, 2007. Best paper in IEEE Visualization 2007.

Week 2 - Jan 31

  • Provenance and Workflows (cont.)

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf

  • Discussion about literature search

Readings

same as last week

Week 3 - Feb 7

  • Information extraction: survey

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/information-extraction.pdf

Announcements

  • The topic winners were: Information Extraction, Deep Web, Relational Data on the Web, Web Schema Matching, NoSQL DB, Provenance in DB, Graph Indexing, Usable query interfaces
  • I will email you the preliminary assignments tomorrow

Assignment

  • Write a position paper for the article: ONDUX: on-demand unsupervised learning for information extraction

Readings

Some history and perspective:

Week 4 - Feb 14

  • Provenance and Databases
  • Graph Indexing

Assignment

  • Write 2 position papers, one for each of the articles in the required reading for this week (see below)


Required Reading

Additional Suggested Reading

  • A. Das Sarma, M. Theobald, and J. Widom. LIVE: A Lineage-Supported Versioned DBMS. Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, Heidelberg, Germany, June 2010.

http://ilpubs.stanford.edu:8090/926/1/versioning-TR.pdf

  • Total Recall | Oracle Database

http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf

  • Answering pattern match queries in large graph databases via graph embedding

Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011

  • Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/evolving-graphs-vldb11.pdf

Week 5 - Feb 21

  • NoSQL databases

Assignment

  • Write a position paper for each of the required papers

Required Reading

  • Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf

    • Presenters: Dmitriy Gromov [http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/MapReducePresentation_DmitriyGromov.pdf Presentation], Xiang Liu
    • Rebuttal: Fernando Seabra, Shoshana Gottesman

Additional suggested reading

For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics

Week 6 - Feb 28

Introduction to Visualization. Lecture will be given by Professors Claudio Silva and Lauro Lins

There will be no assignment this week, but I plan to give you a quiz on visualization next week.

Suggested Reading

Visualization. Tamara Munzner. Chapter 27, pp. 675-707, of Fundamentals of Computer Graphics, Third Edition http://www.cs.ubc.ca/labs/imager/tr/2009/VisChapter/akp-vischapter.pdf

Lecture notes. Claudio Silva http://www.cs.utah.edu/~csilva/courses/cs5630/lec01-notes.pdf

Week 7 - March 6

  • NoSQL Databases

Assignment

  • Write a position paper for each of the required papers

Required Reading

    • Presenters: Julie Odongo, Majed Hakami Presentation, Yuan Ding
    • Rebuttal: Nivan Ferreira, Dmitriy Gromov, Juliana Freire

For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics

Week 8 - March 13

Spring break - no class

Week 9 - March 20

TBD

Week 10 - March 27

  • Web information integration

Assignment

  • Write a position paper for each of the required papers

Required Reading

Additional Reading

Week 11 - April 3

  • Wikipedia

Assignment

  • Write a position paper for each of the required papers

Required Reading

Additional Reading

Week 12 - April 10

  • Information extraction

Assignment

  • Write a position paper for each of the required papers

Required Reading

Additional Reading

Week 13 - April 17

Assignment

  • Write a position paper for each of the required papers
  • Twitter and News: finding entities and trends

Required Reading

Additional reading

Week 14 - April 24

  • Keyword queries over relational data

Assignment

  • Write a position paper for each of the required papers

Required Reading

Week 15 - May 1

Project presentation