Difference between revisions of "New Database Schema"

From VistrailsWiki
Jump to navigation Jump to search
Line 1: Line 1:
We are proposing an update to the schema for storing a vistrail in a MySQL database. The motivation is:
We are proposing an update to the schema for storing a vistrail in a MySQL database.


* The current schema requires multiple SQL commands to access and update a vistrail. One SELECT query is required for each table we want to extract information from.  This gets really slow when the server is far away from the client.
The current schema requires multiple SQL commands to access and update a vistrail. One SELECT query is required for each table we want to extract information from.  This gets really slow when the server is far away from the client. For instance, there are currently around 28 tables used. A round-trip query from USA to Europe takes 0.17 seconds. That is about 5 seconds in total time. This means that each time you press the save button, VisTrails locks up for 5 seconds. We have not found a simple way of getting around this problem. We need to store the vistrail in a database because we want to be able to perform detailed queries across all the provenance information.


== Proposed Improvements ==
== Proposed Improvements ==

Revision as of 15:45, 18 February 2011

We are proposing an update to the schema for storing a vistrail in a MySQL database.

The current schema requires multiple SQL commands to access and update a vistrail. One SELECT query is required for each table we want to extract information from. This gets really slow when the server is far away from the client. For instance, there are currently around 28 tables used. A round-trip query from USA to Europe takes 0.17 seconds. That is about 5 seconds in total time. This means that each time you press the save button, VisTrails locks up for 5 seconds. We have not found a simple way of getting around this problem. We need to store the vistrail in a database because we want to be able to perform detailed queries across all the provenance information.

Proposed Improvements

We have come up with 3 possible solutions that each have its advantages and disadvantages. Comments and other suggestions are appreciated.

  1. Store all information in a single table.
    Pro: One single command can be used to access and update a vistrail.
    Pro: We can still use a vanilla MySQL server.
    Con: A lot of redundancy in this table since different objects need different kinds of fields such as dates and integers.
    Con: One big table might lead to slow performance.
    Con: Difficult to implement.
  2. Add an intermediate server application that can receive and perform multiple SQL commands in batches.
    Pro: No changes to the schema necessary.
    Con: We have to install a new application on the server side that handles communication between VisTrails and the MySQL server.
  3. Use a more object-focused database such as an XML database.
    Pro: A single query can retrieve a complete vistrail.
    Con: We would need to install a new database such as eXist
    Con: We would need to use a different query language such as XQuery
  4. Other ideas?