SciVisFall2008/Assignment 1

From VistrailsWiki
Revision as of 15:17, 23 September 2008

This is your second assignment for CS 5630/6630.

The assignment is due at midnight on October ??th, 2008. You will need to use the CADE handin functionality to turn in your assignment. The class account is "cs5630".

This assignment was successfully tested in release 1.2rev1263. It should work fine in releases >=1.2rev1263. Check your release before starting your work and upgrade it if necessary.

The VisTrails User's Guide will probably be helpful for this assignment.

The purpose of this assignment is to make sure you become familiar with the basic concepts of the VisTrails system, VTK, and matplotlib. As you work on it, we encourage you to read the documentation for those tools (links are available on the class wiki).

Use the VisTrails file Assignment0.vt as the starting point for all problems in this assignment. Open this file, start working on the problems, and save your progress. Don't worry if you make mistakes: that is the beauty of VisTrails. You can always undo, redo, and/or branch from any point in the history tree. In the end you will have an updated Assignment0.vt file containing the original file plus all your work. This is the file you should turn in.

Exercise 1: Principles of plotting

The file stocks.dat has the first quote for each
month from January 2006 to September 2008 for the shares
of Apple Inc. (AAPL) and Microsoft Corporation (MSFT).
Below we present the first three lines and the last two
lines of this file.

month,apple,microsoft
2008-09,140.91,25.16
2008-08,169.53,27.29
...
2006-02,68.49,25.92
2006-01,75.51,27.06
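The rows of stocks.dat run from the most recent month to the oldest, so they need to be put in chronological order before plotting. A minimal Python sketch, assuming the file is plain comma-separated text with exactly the header shown above (the helper name `load_stocks` is hypothetical, and the sample reuses only the rows listed above):

```python
import csv
import io


def load_stocks(text):
    """Parse text in the stocks.dat layout into chronological rows.

    Returns (month, apple, microsoft) tuples sorted oldest-first, because
    the file lists the most recent month first.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = [(r["month"], float(r["apple"]), float(r["microsoft"])) for r in reader]
    # "YYYY-MM" strings sort chronologically as plain strings.
    rows.sort(key=lambda row: row[0])
    return rows


# Only the rows listed above; the elided middle of the file is omitted.
sample = """month,apple,microsoft
2008-09,140.91,25.16
2008-08,169.53,27.29
2006-02,68.49,25.92
2006-01,75.51,27.06
"""

rows = load_stocks(sample)
print(rows[0])  # ('2006-01', 75.51, 27.06)
```

To read the real file, pass it the contents of `open("stocks.dat").read()`. The resulting months and Apple prices can then be handed to matplotlib's `plot` with a format string such as `"o-"` to get a connected symbol plot.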


a. Apply the principles of plotting described in class
and in the class notes to generate a simple connected
symbol plot for all of Apple's quotes in the file
stocks.dat. Tag the final version of this plot as
"Problem 1a" and annotate it with an explanation
of the plotting principles you used to make this
a clear plot.

b. Using the January 2006 quote as the reference, directly
compare the progress of Apple's and Microsoft's shares by
generating a plot using superposition (both curves in the
same plot). Tag the final version of this plot as "Problem 1b"
and annotate it with the conclusions you can draw from it.

c. Repeat item b, but now using juxtaposition: split the
two curves (i.e. the progress of Apple's shares relative to
January 2006 and the progress of Microsoft's shares relative
to January 2006) into two different plots, each in a different
spreadsheet cell. Tag the final version as "Problem 1c" and
annotate it describing which technique (superposition vs.
juxtaposition) makes more sense for this data and why.
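Items b and c both ask for each company's quotes relative to January 2006. One straightforward way to do this, sketched below in Python, is to divide every quote by that company's January 2006 quote so that both curves start at 1.0. This ratio normalization is an assumption on our part, not a requirement of the assignment (a percentage change would work equally well); the function name is hypothetical and the sample rows are the ones listed above:

```python
def relative_to_reference(rows, reference_month="2006-01"):
    """Divide every quote by the same company's quote in reference_month.

    rows: (month, apple, microsoft) tuples. Returns tuples of the same shape
    in which 1.0 means "same price as in the reference month".
    """
    ref_apple, ref_msft = next(
        (apple, msft) for month, apple, msft in rows if month == reference_month
    )
    return [(month, apple / ref_apple, msft / ref_msft) for month, apple, msft in rows]


# Only the rows listed above; the elided middle of the file is omitted.
rows = [
    ("2006-01", 75.51, 27.06),
    ("2006-02", 68.49, 25.92),
    ("2008-08", 169.53, 27.29),
    ("2008-09", 140.91, 25.16),
]
for month, apple, msft in relative_to_reference(rows):
    print(f"{month}  AAPL x{apple:.2f}  MSFT x{msft:.2f}")
```

With these rows, Apple's September 2008 quote is about 1.87 times its January 2006 quote, while Microsoft's is about 0.93 times. Because both normalized curves share the same scale, they can be superposed in one plot or juxtaposed in two cells without further conversion.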

Exercise 2: Histogram and number of bins

As we are doing this year, in Fall 2007 we collected all the students' assignments for the Scientific Visualization course in VisTrails format. The file actions_fall_2007.dat contains the timestamps of all the actions of all the students in all the assignments: a total of 132131 actions. Using matplotlib in VisTrails, create a histogram of the distribution of these timestamps and highlight the following due dates in the histogram. (Note that, for some reason, assignment 6 was due before assignment 5.)

| Assignment | Due Date            |
|------------+---------------------|
|          0 | 2007-09-18 12:00:00 |
|          1 | 2007-09-18 12:00:00 |
|          2 | 2007-10-04 12:00:00 |
|          3 | 2007-10-25 12:00:00 |
|          4 | 2007-11-27 12:00:00 |
|          5 | 2007-12-15 12:00:00 |
|          6 | 2007-12-11 12:00:00 |
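To draw the due dates on a time axis, the timestamps first need to be parsed into datetime objects. The sketch below assumes that actions_fall_2007.dat stores its timestamps in the same `YYYY-MM-DD HH:MM:SS` format as the table above (this is not confirmed here), and `daily_bin_edges` is a hypothetical helper that produces one histogram bin per calendar day:

```python
from datetime import datetime, timedelta

# Due dates copied from the table above.
DUE_DATES = {
    0: "2007-09-18 12:00:00",
    1: "2007-09-18 12:00:00",
    2: "2007-10-04 12:00:00",
    3: "2007-10-25 12:00:00",
    4: "2007-11-27 12:00:00",
    5: "2007-12-15 12:00:00",
    6: "2007-12-11 12:00:00",
}
FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(text):
    """Convert a 'YYYY-MM-DD HH:MM:SS' string into a datetime."""
    return datetime.strptime(text.strip(), FORMAT)


def daily_bin_edges(timestamps):
    """Return midnight-aligned bin edges, one bin per day, covering all timestamps."""
    first = min(timestamps).replace(hour=0, minute=0, second=0, microsecond=0)
    last = max(timestamps).replace(hour=0, minute=0, second=0, microsecond=0)
    edges = [first]
    # Keep adding days until the day containing the last timestamp is closed.
    while edges[-1] <= last:
        edges.append(edges[-1] + timedelta(days=1))
    return edges


due = {a: parse_timestamp(t) for a, t in DUE_DATES.items()}
# Listing the assignments by due date shows that 6 comes before 5.
print(sorted(due, key=due.get))  # [0, 1, 2, 3, 4, 6, 5]
```

In matplotlib, these edges can be converted with `matplotlib.dates.date2num` and passed as the `bins` argument of `Axes.hist`, and each due date can be marked with `Axes.axvline`.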

When you finish your histogram, tag its pipeline version as "Problem 2" and annotate it with your answers to the following questions:

a. How did you select the bins for the histogram and why?

b. What hypothesis can you make about the amount of work (i.e. number of actions) for the different assignments just by looking at this histogram?

c. What pattern can you observe for the amount of work (i.e. number of actions) close to the deadlines?
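For question a, two standard rules of thumb for choosing bins are Sturges' rule, which sets the number of bins to ceil(log2 n) + 1, and the Freedman-Diaconis rule, which sets the bin width to 2 * IQR / n^(1/3). Neither is prescribed by the assignment; the Python sketch below simply computes both, assuming the timestamps have already been converted to numbers (for example, seconds since the first action). The function names are hypothetical:

```python
import math
import statistics


def sturges_bins(n):
    """Sturges' rule: number of bins for n observations."""
    return math.ceil(math.log2(n)) + 1


def freedman_diaconis_width(values):
    """Freedman-Diaconis rule: bin width = 2 * IQR / n ** (1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return 2 * (q3 - q1) / len(values) ** (1 / 3)


print(sturges_bins(132131))  # 19
```

With 132131 actions, Sturges' rule yields only 19 bins, which is likely too coarse to show activity around individual deadlines; a rule based on the spread of the data, or a natural unit such as one bin per day, may be easier to justify.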

Exercise 3: Dot plots for labeled data

Each line of the file microprocessors.dat has two quantitative values associated with a label. The quantitative values are "year of introduction" and "number of transistors", and the label is the name of the microprocessor. Generate two horizontally juxtaposed dot plots for these microprocessors: one for "year of introduction" and the other for "number of transistors". For "number of transistors", use a base-10 logarithmic scale.
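For horizontally juxtaposed dot plots, both panels should share one label axis so that each row refers to the same microprocessor in both plots. A minimal Python sketch, assuming the data have already been read into `(name, year, transistors)` tuples (the actual column layout of microprocessors.dat is not shown here, and `dot_plot_columns` is a hypothetical helper): it orders the rows once and returns both value columns in that common order, with the transistor counts converted to log10.

```python
import math


def dot_plot_columns(records, sort_key="year"):
    """Prepare two juxtaposed dot plots that share one label axis.

    records: iterable of (name, year, transistors) tuples (hypothetical
    layout; adapt it to the actual columns of microprocessors.dat).
    Returns (labels, years, log10_transistors) in a common row order, so
    row i of both panels refers to the same microprocessor.
    """
    if sort_key not in ("year", "transistors"):
        raise ValueError("sort_key must be 'year' or 'transistors'")
    index = 1 if sort_key == "year" else 2
    ordered = sorted(records, key=lambda r: r[index])
    labels = [name for name, _, _ in ordered]
    years = [year for _, year, _ in ordered]
    log_counts = [math.log10(count) for _, _, count in ordered]
    return labels, years, log_counts
```

With matplotlib, `plt.subplots(1, 2, sharey=True)` gives two side-by-side axes that share the label axis. Instead of plotting the log10 values directly, one can also plot the raw transistor counts and call `set_xscale("log")` on the second axis, which uses base 10 by default.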

Exercise 4: Correlation, scatterplot and regression

Let A, B, C, and D be four genes. A scientist measured the activity (i.e. the expression) of these genes under 100 different conditions. The results are given in the file genes.dat. Generate a 4 x 4 matrix of scatter plots to understand the correlations between the four genes. Visually analyze the plots and rank the genes B, C, and D in decreasing order of correlation with A. Now draw a linear best-fit line in the plots of A against its most correlated gene, a quadratic best fit in the plots of A against its second most correlated gene, and a cubic best fit in the plots of A against its least correlated gene. Tag the final pipeline version that does all this work as "Problem 4".
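The visual ranking in Exercise 4 can be cross-checked numerically. The Python sketch below, written with the standard library only, computes Pearson's correlation coefficient, ranks candidate genes by the absolute value of their correlation with A (treating a strong negative correlation as strong is our assumption), and fits least-squares polynomials by solving the normal equations. All function names are hypothetical, and the layout of genes.dat is not assumed; in practice `numpy.polyfit` is the usual tool and is numerically more robust than the normal equations.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(target, others):
    """others: dict name -> values. Names sorted by |r| with target, strongest first."""
    return sorted(others, key=lambda name: abs(pearson(target, others[name])), reverse=True)


def polyfit(x, y, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] of c0 + c1*x + ... + c_degree*x**degree.
    """
    m = degree + 1
    # Normal equations (V^T V) c = V^T y, where V is the Vandermonde matrix.
    a = [[float(sum(xi ** (i + j) for xi in x)) for j in range(m)] for i in range(m)]
    b = [float(sum(yi * xi ** i for xi, yi in zip(x, y))) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for row in reversed(range(m)):
        s = b[row] - sum(a[row][k] * coeffs[k] for k in range(row + 1, m))
        coeffs[row] = s / a[row][row]
    return coeffs
```

Once the three expression columns are loaded, `rank_by_correlation(A, {"B": B, "C": C, "D": D})` gives the order, and `polyfit` with degree 1, 2, and 3 gives the coefficients of the three best-fit curves to draw over the corresponding scatter plots.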