Difference between revisions of "CS6093/Lectures"

From VistrailsWiki
 
(24 intermediate revisions by the same user not shown)


* Peter Buneman, Sanjeev Khanna, Wang Chiew Tan: Why and Where: A Characterization of Data Provenance. ICDT 2001: 316-330 http://db.cis.upenn.edu/DL/whywhere.pdf
** Presenter: Fernando Seabra [http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/WhyWherePresentation.pdf Presentation]
** Rebuttal: Joe Miller (tentative)


===Required Reading ===


* [http://vgc.poly.edu/~juliana/courses/cs6093/Readings/dean-cacm2008.pdf MapReduce: simplified data processing on large clusters] Jeffrey Dean and  Sanjay Ghemawat, CACM 2008


* Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf
**Presenters: Dmitriy Gromov [http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/MapReducePresentation_DmitriyGromov.pdf Presentation], Xiang Liu
**Rebuttal: Fernando Seabra,  Shoshana Gottesman


* [http://infolab.stanford.edu/~usriv/papers/pig-latin.pdf Pig Latin: a not-so-foreign language for data processing.] C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins. SIGMOD 2008.
=== Additional suggested reading ===
** Presenters: Dmitriy Gromov, Xiang Liu, Yuan Ding
 
**Rebuttal: Nivan Ferreira,  Shoshana Gottesman
* Debate between the MapReduce and parallel database communities:
**http://cacm.acm.org/magazines/2010/1/55743-mapreduce-and-parallel-dbmss-friends-or-foes/fulltext
**http://cacm.acm.org/magazines/2010/1/55744-mapreduce-a-flexible-data-processing-tool/fulltext


=== Additional suggested reading ===
* http://www.computerworld.com/s/article/9224180/What_s_the_big_deal_about_Hadoop_


* [http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publications/2010/SQLvsNoSQLDatabases.pdf SQL databases v. NoSQL databases.] Michael Stonebraker, CACM 2010.  
== Week 6 - Feb 28 ==


[http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/intro-to-visualization.pdf Introduction to Visualization.]  Lecture will be given by Professors Claudio Silva and Lauro Lins
 
There will be no assignment this week, but I plan to give you a quiz on visualization next week.
 
=== Suggested Reading ===
 
Visualization. Tamara Munzner. Chapter 27, pp. 675-707, of Fundamentals of Computer Graphics, Third Edition
http://www.cs.ubc.ca/labs/imager/tr/2009/VisChapter/akp-vischapter.pdf
 
Lecture notes. Claudio Silva
http://www.cs.utah.edu/~csilva/courses/cs5630/lec01-notes.pdf


== Week 7 - March 6 ==
* [http://cs-www.cs.yale.edu/homes/dna/papers/split-execution-hadoopdb.pdf Efficient Processing of Data Warehousing Queries in a Split Execution Environment.]  Bajda-Pawlikowski et al., SIGMOD 2011


* [http://infolab.stanford.edu/~usriv/papers/pig-latin.pdf Pig Latin: a not-so-foreign language for data processing.] C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins. SIGMOD 2008.
** Rebuttal:  Fernando Seabra, Dmitriy Gromov
 
** Presenters: Julie Odongo, Majed Hakami [http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/majed-hadoopdb.pdf Presentation], Yuan Ding
** Rebuttal:  Nivan Ferreira, Dmitriy Gromov, Juliana Freire


For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics
*  [http://www.cs.washington.edu/homes/weld/papers/adar-wsdm09.pdf  Information Arbitrage in Multi-Lingual Wikipedia.] Adar, E. and Skinner, M. and Weld, D., Second ACM International Conference on Web Search and Data Mining (WSDM'09)
* [http://suchanek.name/work/publications/www2007.pdf Yago - A Core of Semantic Knowledge. Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. ] 16th international World Wide Web conference (WWW 2007)
* [http://vgc.poly.edu/~juliana/courses/cs6093/Readings/bizer-web-sem2009..pdf DBpedia - A crystallization point for the Web of Data] Bizer et al., Web Semantics 2009.
** Presenters: Sergey Nepomnyachiy, Shoshana Gottesman, Haibo Zeng
** Rebuttal: Wei Jiang,  Maneli Kadkhodazadeh, Majed Hakami
** Rebuttal: Wei Jiang,  Juliana Freire, Majed Hakami


=== Additional Reading ===
===Required Reading ===


* [http://vgc.poly.edu/~juliana/courses/cs6093/Readings/bizer-web-sem2009..pdf DBpedia - A crystallization point for the Web of Data] Bizer et al., Web Semantics 2009.
* [http://pages.cs.wisc.edu/~anhai/papers/delex-sigmod09.pdf  Optimizing Complex Extraction Programs over Evolving Text Data.] F. Chen, B. Gao, A. Doan, J. Yang, R. Ramakrishnan. SIGMOD 2009
* [http://pages.cs.wisc.edu/~anhai/papers/ie-provenance-vldb08.pdf On the Provenance of Non-Answers to Queries over Extracted Data]. Huang et al, VLDB 2008
* [http://turing.cs.washington.edu/papers/kdd08.pdf Information Extraction From Wikipedia:  Moving Down the Long Tail] Fei Wu, Raphael Hoffmann, Daniel S. Weld
** Presenters: Haibo Zeng, Chunqing Jiang, Bhaktavatsalam Nallanthighal
** Presenters: Chunqing Jiang, Bhaktavatsalam Nallanthighal,  Sameer More
** Rebuttal:  Majed Hakami, Xiang Liu,  May Thazin
** Rebuttal:  Xiang Liu,  May Thazin, Haibo Zeng


=== Additional Reading ===
* [http://turing.cs.washington.edu/papers/kdd08.pdf Information Extraction From Wikipedia:  Moving Down the Long Tail] Fei Wu, Raphael Hoffmann, Daniel S. Weld
* [http://www.it.iitb.ac.in/~sunita/papers/ieSurvey.pdf Information extraction] Sunita Sarawagi.  FnT Databases, 1(3), 2008.
* [http://pages.cs.wisc.edu/~anhai/papers/spec-issue-intro-sigmodrec08.pdf Introduction to the Special Issue on Managing Information Extraction] Doan et al., SIGMOD Record 2008.
* [http://vgc.poly.edu/wiki/vgc/index.php/File:NerTwitter.pdf Recognizing Named Entities in Tweets]  ACL 2011
* [http://vgc.poly.edu/wiki/vgc/index.php/File:TrackingTrends.pdf Tracking Trends: Incorporating Term Volume into Temporal Topic Models.] KDD 2011
** Presenters:  Maneli Kadkhodazadeh, Wei Jiang
** Presenters:  Fernando Seabra, Wei Jiang, Nivan Ferreira
** Rebuttal: Sameer More, Bhaktavatsalam Nallanthighal, Julie Odongo
** Rebuttal: Juliana Freire, Bhaktavatsalam Nallanthighal, Julie Odongo


=== Additional reading ===
* [http://pages.cs.wisc.edu/~anhai/papers/scalable-kws-vldb10.pdf Toward Scalable Keyword Search over Relational Data] Baid et al., VLDB 2010
* [http://www.vldb.org/conf/2002/S33P11.pdf BANKS: Browsing and Keyword Searching in Relational Databases] Aditya et al., VLDB 2002
* [http://pages.cs.wisc.edu/~anhai/papers/ie-provenance-vldb08.pdf On the Provenance of Non-Answers to Queries over Extracted Data]. Huang et al, VLDB 2008
** Rebuttal: Vineet Meghani, Tehila Minkus
** Presenters:  May Thazin,  Tehila Minkus, Bhaktavatsalam Nallanthighal
** Rebuttal:   Tehila Minkus, Vineet Meghani, May Thazin


== Week 15 - May 1 ==
Project presentation

Latest revision as of 19:55, 24 April 2012

Make sure to check my.poly.edu for course announcements.

Every week, you must write position papers for the papers in the Required Reading list.

Week 1 - Jan 24

  • Course overview (First day of classes!)

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/lecture1.pdf

  • Provenance and Workflows

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf

Readings

  • Querying and Creating Visualizations by Analogy. Carlos E. Scheidegger, Huy T. Vo, David Koop, Juliana Freire and Claudio T. Silva. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1560-1567, 2007. Best paper in IEEE Visualization 2007.

Week 2 - Jan 31

  • Provenance and Workflows (cont.)

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf

  • Discussion about literature search

Readings

Same as last week.

Week 3 - Feb 7

  • Information extraction: survey

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/information-extraction.pdf

Announcements

  • The topic winners were: Information Extraction, Deep Web, Relational Data on the Web, Web Schema Matching, NoSQL DB, Provenance in DB, Graph Indexing, Usable query interfaces
  • I will email you preliminary assignments tomorrow.

Assignment

  • Write a position paper for the article: ONDUX: on-demand unsupervised learning for information extraction

Readings

Some history and perspective:

Week 4 - Feb 14

  • Provenance and Databases
  • Graph Indexing

Assignment

  • Write 2 position papers, one for each of the articles in the required reading for this week (see below)


Required Reading

Additional Suggested Reading

  • A. Das Sarma, M. Theobald, and J. Widom. LIVE: A Lineage-Supported Versioned DBMS. Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, Heidelberg, Germany, June 2010.

http://ilpubs.stanford.edu:8090/926/1/versioning-TR.pdf

  • Total Recall | Oracle Database

http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf

  • Answering pattern match queries in large graph databases via graph embedding

Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011

  • Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/evolving-graphs-vldb11.pdf

Week 5 - Feb 21

  • NoSQL databases

Assignment

  • Write position papers for the required papers

Required Reading

  • Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf

    • Presenters: Dmitriy Gromov (presentation: http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/MapReducePresentation_DmitriyGromov.pdf), Xiang Liu
    • Rebuttal: Fernando Seabra, Shoshana Gottesman

Additional suggested reading

For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics

Week 6 - Feb 28

Introduction to Visualization. Lecture will be given by Professors Claudio Silva and Lauro Lins

There will be no assignment this week, but I plan to give you a quiz on visualization next week.

Suggested Reading

Visualization. Tamara Munzner. Chapter 27, pp. 675-707, of Fundamentals of Computer Graphics, Third Edition http://www.cs.ubc.ca/labs/imager/tr/2009/VisChapter/akp-vischapter.pdf

Lecture notes. Claudio Silva http://www.cs.utah.edu/~csilva/courses/cs5630/lec01-notes.pdf

Week 7 - March 6

  • NoSQL Databases

Assignment

  • Write position papers for the required papers

Required Reading

    • Presenters: Julie Odongo, Majed Hakami Presentation, Yuan Ding
    • Rebuttal: Nivan Ferreira, Dmitriy Gromov, Juliana Freire

For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics

Week 8 - March 13

Spring break - no class

Week 9 - March 20

TBD

Week 10 - March 27

  • Web information integration

Assignment

  • Write position papers for the required papers

Required Reading

Additional Reading

Week 11 - April 3

  • Wikipedia

Assignment

  • Write position papers for the required papers

Required Reading

Additional Reading

Week 12 - April 10

  • Information extraction

Assignment

  • Write position papers for the required papers

Required Reading

Additional Reading

Week 13 - April 17

  • Twitter and News: finding entities and trends

Assignment

  • Write position papers for the required papers

Required Reading

Additional reading

Week 14 - April 24

  • Keyword queries over relational data

Assignment

  • Write position papers for the required papers

Required Reading

Week 15 - May 1

Project presentation