Difference between revisions of "CS6093/Lectures"

From VistrailsWiki
 
(48 intermediate revisions by the same user not shown)
'''''Make sure to check my.poly.edu for course announcements'''''


'''''Every week, you must write position papers for the papers in the Required Readings list'''''


== Week 1 - Jan 24 ==
=== Required Reading ===


* Peter Buneman, Sanjeev Khanna, Wang Chiew Tan: Why and Where: A Characterization of Data Provenance. ICDT 2001: 316-330 http://db.cis.upenn.edu/DL/whywhere.pdf
** Presenter: Fernando Seabra [http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/WhyWherePresentation.pdf Presentation]
** Rebuttal: Joe Miller (tentative)


* [http://www.vldb.org/conf/2007/papers/research/p938-zhao.pdf Graph Indexing: Tree + Delta >= Graph] P. Zhao, J. X. Yu, and P. S. Yu.  VLDB 2007.
** Presenter: Nivan Ferreira
** Rebuttal: Sergey Nepomnyachiy


===Additional Suggested Reading===


===Required Reading ===
* [http://vgc.poly.edu/~juliana/courses/cs6093/Readings/dean-cacm2008.pdf MapReduce: simplified data processing on large clusters] Jeffrey Dean and  Sanjay Ghemawat, CACM 2008


* Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf
** Presenters: Dmitriy Gromov [http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/MapReducePresentation_DmitriyGromov.pdf Presentation], Xiang Liu
** Rebuttal: Fernando Seabra, Shoshana Gottesman


* [http://infolab.stanford.edu/~usriv/papers/pig-latin.pdf Pig Latin: a not-so-foreign language for data processing] C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins. SIGMOD 2008.


=== Additional suggested reading ===
* Debate between the MapReduce and database communities:
**http://cacm.acm.org/magazines/2010/1/55743-mapreduce-and-parallel-dbmss-friends-or-foes/fulltext
**http://cacm.acm.org/magazines/2010/1/55744-mapreduce-a-flexible-data-processing-tool/fulltext
 
* http://www.computerworld.com/s/article/9224180/What_s_the_big_deal_about_Hadoop_


* [http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publications/2010/SQLvsNoSQLDatabases.pdf SQL databases v. NoSQL databases.] Michael Stonebraker, CACM 2010.  
== Week 6 - Feb 28 ==


[http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/intro-to-visualization.pdf Introduction to Visualization.] The lecture will be given by Professors Claudio Silva and Lauro Lins.
 
There will be no assignment this week, but I plan to give you a quiz on visualization next week.
 
=== Suggested Reading ===
 
Visualization. Tamara Munzner. Chapter 27, pp. 675-707, of Fundamentals of Computer Graphics, Third Edition
http://www.cs.ubc.ca/labs/imager/tr/2009/VisChapter/akp-vischapter.pdf
 
Lecture notes. Claudio Silva
http://www.cs.utah.edu/~csilva/courses/cs5630/lec01-notes.pdf


== Week 7 - March 6 ==


* [http://cs-www.cs.yale.edu/homes/dna/papers/split-execution-hadoopdb.pdf Efficient Processing of Data Warehousing Queries in a Split Execution Environment.] Bajda-Pawlikowski et al., SIGMOD 2011
* [http://infolab.stanford.edu/~usriv/papers/pig-latin.pdf Pig Latin: a not-so-foreign language for data processing] C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins. SIGMOD 2008.
** Presenters: Julie Odongo, Majed Hakami [http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/majed-hadoopdb.pdf Presentation], Yuan Ding
** Rebuttal:  Nivan Ferreira, Dmitriy Gromov, Juliana Freire


For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics


* [http://portal.acm.org/citation.cfm?id=1132863.1132872&coll=GUIDE&dl=GUIDE Automatic complex schema matching across Web query interfaces] Bin He, Kevin Chuan Chang, ACM Trans. Database Syst. 2006
** Presenters: Joe Miller, Vineet Meghani
** Rebuttal:  Yuan Ding,  Chunqing Jiang


=== Additional Reading ===
* [http://portal.acm.org/citation.cfm?id=767154 A survey of approaches to automatic schema matching] Erhard Rahm and Philip Bernstein, VLDB Journal 2001


== Week 11 - April 3 ==
*  [http://www.cs.washington.edu/homes/weld/papers/adar-wsdm09.pdf  Information Arbitrage in Multi-Lingual Wikipedia.] Adar, E. and Skinner, M. and Weld, D., Second ACM International Conference on Web Search and Data Mining (WSDM'09)
* [http://suchanek.name/work/publications/www2007.pdf Yago: A Core of Semantic Knowledge] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. 16th International World Wide Web Conference (WWW 2007)
* [http://vgc.poly.edu/~juliana/courses/cs6093/Readings/bizer-web-sem2009..pdf DBpedia - A crystallization point for the Web of Data] Bizer et al., Web Semantics 2009.
** Presenters: Sergey Nepomnyachiy, Shoshana Gottesman, Haibo Zeng
** Rebuttal: Wei Jiang,  Juliana Freire, Majed Hakami
 
=== Additional Reading ===
 
* [http://www.mpi-inf.mpg.de/yago-naga/yago/publications/YAGO-NAGA-Appr.pdf The YAGO-NAGA Approach to Knowledge Discovery] Gjergji Kasneci, Fabian M. Suchanek, Maya Ramanath, Gerhard Weikum SIGMOD Record 37:4, December 2008
 
* [http://vgc.poly.edu/~juliana/pub/wikimatch-vldb2012.pdf Multilingual Schema Matching for Wikipedia Infoboxes] Nguyen et al., VLDB 2012


== Week 12 - April 10 ==
===Required Reading ===


* [http://vgc.poly.edu/~juliana/courses/cs6093/Readings/bizer-web-sem2009..pdf DBpedia - A crystallization point for the Web of Data] Bizer et al., Web Semantics 2009.
* [http://pages.cs.wisc.edu/~anhai/papers/delex-sigmod09.pdf  Optimizing Complex Extraction Programs over Evolving Text Data.] F. Chen, B. Gao, A. Doan, J. Yang, R. Ramakrishnan. SIGMOD 2009
* [http://pages.cs.wisc.edu/~anhai/papers/ie-provenance-vldb08.pdf On the Provenance of Non-Answers to Queries over Extracted Data]. Huang et al, VLDB 2008
* [http://turing.cs.washington.edu/papers/kdd08.pdf Information Extraction from Wikipedia: Moving Down the Long Tail] Fei Wu, Raphael Hoffmann, Daniel S. Weld, KDD 2008
** Presenters: Haibo Zeng, Chunqing Jiang, Bhaktavatsalam Nallanthighal
** Rebuttal:   Majed Hakami, Xiang Liu, May Thazin


=== Additional Reading ===
* [http://turing.cs.washington.edu/papers/kdd08.pdf Information Extraction From Wikipedia:  Moving Down the Long Tail] Fei Wu, Raphael Hoffmann, Daniel S. Weld
* [http://www.it.iitb.ac.in/~sunita/papers/ieSurvey.pdf Information extraction] Sunita Sarawagi. Foundations and Trends in Databases, 1(3), 2008.
* [http://pages.cs.wisc.edu/~anhai/papers/spec-issue-intro-sigmodrec08.pdf Introduction to the Special Issue on Managing Information Extraction] Doan et al., SIGMOD Record 2008.


== Week 13 - April 17 ==
* Twitter and News: finding entities and trends


=== Assignment ===
* Write a position paper for each of the required papers


===Required Reading ===


* [http://vgc.poly.edu/wiki/vgc/index.php/File:D11-1141.pdf Named Entity Recognition in Tweets: An Experimental Study.] EMNLP 2011
* [http://vgc.poly.edu/wiki/vgc/index.php/File:NerTwitter.pdf Recognizing Named Entities in Tweets]   ACL 2011
* [http://vgc.poly.edu/wiki/vgc/index.php/File:TrackingTrends.pdf Tracking Trends: Incorporating Term Volume into Temporal Topic Models.] KDD 2011
** Presenters: Fernando Seabra, Wei Jiang, Nivan Ferreira
** Rebuttal: Juliana Freire, Bhaktavatsalam Nallanthighal, Julie Odongo
 
=== Additional reading ===
 
* [http://www.www2011india.com/proceeding/proceedings/p267.pdf Unified Analysis of Streaming News] WWW 2011
*  [http://www.cs.ust.hk/~qyang/Docs/2011/cikm-short-text.pdf Transferring Topical Knowledge from Auxiliary Long Texts for Short Text Clustering] CIKM 2011


== Week 14 - April 24 ==
* Keyword queries over relational data


=== Assignment ===


===Required Reading ===
* [http://pages.cs.wisc.edu/~anhai/papers/scalable-kws-vldb10.pdf Toward Scalable Keyword Search over Relational Data] Baid et al., VLDB 2010
* [http://www.vldb.org/conf/2002/S33P11.pdf BANKS: Browsing and Keyword Searching in Relational Databases] Aditya et al., VLDB 2002
* [http://pages.cs.wisc.edu/~anhai/papers/ie-provenance-vldb08.pdf On the Provenance of Non-Answers to Queries over Extracted Data]. Huang et al, VLDB 2008
** Presenters:  May Thazin,  Tehila Minkus, Bhaktavatsalam Nallanthighal
** Rebuttal:  Tehila Minkus, Vineet Meghani, May Thazin


== Week 15 - May 1 ==
Project presentation

Latest revision as of 19:55, 24 April 2012

Make sure to check my.poly.edu for course announcements

Every week, you must write position papers for the papers in the Required Readings list

Week 1 - Jan 24

  • Course overview (First day of classes!)

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/lecture1.pdf

  • Provenance and Workflows

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf

Readings

  • Querying and Creating Visualizations by Analogy. Carlos E. Scheidegger, Huy T. Vo, David Koop, Juliana Freire and Claudio T. Silva. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1560-1567, 2007. Best paper in IEEE Visualization 2007.

Week 2 - Jan 31

  • Provenance and Workflows (cont.)

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf

  • Discussion about literature search

Readings

Same as last week.

Week 3 - Feb 7

  • Information extraction: survey

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/information-extraction.pdf

Announcements

  • The winning topics were: Information Extraction, Deep Web, Relational Data on the Web, Web Schema Matching, NoSQL DB, Provenance in DB, Graph Indexing, Usable query interfaces
  • I will email you preliminary assignments tomorrow.

Assignment

  • Write a position paper for the article: ONDUX: on-demand unsupervised learning for information extraction

Readings

Some history and perspective:

Week 4 - Feb 14

  • Provenance and Databases
  • Graph Indexing

Assignment

  • Write two position papers, one for each of the articles in this week's required reading (see below)


Required Reading

Additional Suggested Reading

  • A. Das Sarma, M. Theobald, and J. Widom. LIVE: A Lineage-Supported Versioned DBMS. Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, Heidelberg, Germany, June 2010.

http://ilpubs.stanford.edu:8090/926/1/versioning-TR.pdf

  • Total Recall | Oracle Database

http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf

  • Answering pattern match queries in large graph databases via graph embedding

Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011

  • Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/evolving-graphs-vldb11.pdf

Week 5 - Feb 21

  • NoSQL databases

Assignment

  • Write a position paper for each of the required papers

Required Reading

  • Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf

    • Presenters: Dmitriy Gromov (presentation: http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/MapReducePresentation_DmitriyGromov.pdf), Xiang Liu
    • Rebuttal: Fernando Seabra, Shoshana Gottesman

Additional suggested reading

For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics

Week 6 - Feb 28

Introduction to Visualization. The lecture will be given by Professors Claudio Silva and Lauro Lins.

There will be no assignment this week, but I plan to give you a quiz on visualization next week.

Suggested Reading

Visualization. Tamara Munzner. Chapter 27, pp. 675-707, of Fundamentals of Computer Graphics, Third Edition http://www.cs.ubc.ca/labs/imager/tr/2009/VisChapter/akp-vischapter.pdf

Lecture notes. Claudio Silva http://www.cs.utah.edu/~csilva/courses/cs5630/lec01-notes.pdf

Week 7 - March 6

  • NoSQL Databases

Assignment

  • Write a position paper for each of the required papers

Required Reading

    • Presenters: Julie Odongo, Majed Hakami Presentation, Yuan Ding
    • Rebuttal: Nivan Ferreira, Dmitriy Gromov, Juliana Freire

For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics

Week 8 - March 13

Spring break - no class

Week 9 - March 20

TBD

Week 10 - March 27

  • Web information integration

Assignment

  • Write a position paper for each of the required papers

Required Reading

Additional Reading

Week 11 - April 3

  • Wikipedia

Assignment

  • Write a position paper for each of the required papers

Required Reading

Additional Reading

Week 12 - April 10

  • Information extraction

Assignment

  • Write a position paper for each of the required papers

Required Reading

Additional Reading

Week 13 - April 17

  • Twitter and News: finding entities and trends

Assignment

  • Write a position paper for each of the required papers

Required Reading

Additional reading

Week 14 - April 24

  • Keyword queries over relational data

Assignment

  • Write a position paper for each of the required papers

Required Reading

Week 15 - May 1

Project presentation