This is your first real assignment for CS 5630/6630.
The assignment is due at midnight on September ??, 2007. You will need to use the CADE handin functionality to turn in your assignment. The class account is "cs5630".
The purpose of this initial assignment is to make sure you understand the basic plotting concepts covered in class. As you work on it, we encourage you to read the available documentation on both matplotlib and python (links available from the class wiki).
Here is the initial vistrail file hw1.vt that you should use for completing your work. Use this vistrail to do your assignment. Remember what we discussed in class about "showing your work": you should submit the complete vistrail file that you used for solving the problems. More details below.
This problem deals with correlation (for an example, see the Correlation.vt example). The data we will be using comes from weather measurements near Snowbird Ski Resort in Little Cottonwood Canyon (original data found here). To make things simpler, the data we are giving you has been reformatted so that it is easy to parse. The temp_precip.dat file contains one line for each day of the year, giving the air temperature in Celsius and the amount of precipitation in inches (in the form "10:0.5" for 10 degrees C and 0.5 inches). Note that this is similar to the format in which the labeled data in the MammalScaling.vt example is provided, so you can use a similar parser. Perform the following tasks and label the nodes "Problem4a", "Problem4b", etc.
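As a rough illustration of the "10:0.5" format described above, the following sketch splits each line on the colon and converts both halves to floats. The function name `parse_temp_precip` and the sample lines are hypothetical, and the sketch assumes every non-blank line holds exactly one colon and no header; check the actual file before relying on that.

```python
def parse_temp_precip(lines):
    """Parse lines of the form "temp:precip" into two lists of floats.

    Assumes one colon per non-blank line (an assumption about the file,
    based only on the "10:0.5" example in the assignment text).
    """
    temps, precips = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        temp_str, precip_str = line.split(":")
        temps.append(float(temp_str))
        precips.append(float(precip_str))
    return temps, precips


# Made-up sample lines, not taken from temp_precip.dat.
sample = ["10:0.5", "-3:1.2", "", "4.5:0"]
temps, precips = parse_temp_precip(sample)
```

Because the function accepts any iterable of strings, an open file object for temp_precip.dat could be passed in place of `sample`.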
a. (Grads and UGrads) Plot the data using a scatterplot with temperature on the X axis and precipitation on the Y axis. Be sure to use the basic principles of plotting. In the notes for this node, describe any correlation that you can perceive (rough judgement, not calculated) and any conclusions that could be drawn.
b. (Grads only) Because of the limited resolution of the measurements, the data takes a regular spacing and points are stacked. This makes it difficult to analyze concentrations of the data. Resolve this problem by using one of the following techniques:
- jittering: Perturb the points by a small amount of randomness such that the overlap is reduced.
- symbols: Find stacked points and represent them using one point that is drawn differently (heavier weight or different symbol).
- colormap: Find stacked points and color them differently depending on how many are in the stack.
In the notes for the node, describe what you did.
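The techniques listed above reduce to two small operations: adding bounded random noise to each coordinate, and counting how many points share the same coordinates. The sketch below shows both using only the standard library. The helper names `jitter` and `count_stacks`, the sample values, and the jitter amounts are all hypothetical; a sensible jitter amount depends on the actual spacing of the measurements.

```python
import random
from collections import Counter


def jitter(values, amount, rng):
    """Return a copy of values with uniform noise in [-amount, amount] added."""
    return [v + rng.uniform(-amount, amount) for v in values]


def count_stacks(xs, ys):
    """Map each distinct (x, y) pair to the number of points stacked on it."""
    return Counter(zip(xs, ys))


# Made-up sample data with three points stacked at (10.0, 0.5).
temps = [10.0, 10.0, 10.0, -3.0, 4.5]
precips = [0.5, 0.5, 0.5, 1.2, 0.0]

rng = random.Random(0)  # fixed seed so the jittered plot is reproducible
jittered_temps = jitter(temps, 0.25, rng)      # illustrative amount only
jittered_precips = jitter(precips, 0.02, rng)  # illustrative amount only

stacks = count_stacks(temps, precips)
```

The jittered lists can be plotted directly in place of the originals, while the counts in `stacks` could drive either the symbol size or the color of one marker per distinct location.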
c. (Grads only) Perform a linear regression to fit a line through the data. Is a degree 1 polynomial (line) sufficient? What happens with a higher degree polynomial such as a cubic (degree 3) polynomial? Note that the third argument of the scipy.polyfit function defines the degree of the polynomial, and a fit of degree n returns n+1 coefficients. Thus (ar,br) = scipy.polyfit(x,y,1) would need to become (ar,br,cr) = scipy.polyfit(x,y,2) for a quadratic, and a cubic returns four coefficients. The call to polyval would need to be changed in a similar way. Also note that the data may need to be sorted by x so that the points passed to polyval are monotonic in x and the plotted curve does not double back on itself. In the notes, describe what fit you settled on and why.
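One way to avoid rewriting the tuple unpacking for every degree is to keep the coefficients as a single array and pass that array to polyval. The sketch below does this on synthetic data that lies exactly on a cubic, so it says nothing about the weather data itself. It uses numpy.polyfit and numpy.polyval on the assumption that they behave like the scipy.polyfit and polyval functions named above; the variable names are hypothetical.

```python
import numpy as np

# Synthetic, deliberately unsorted data lying exactly on y = 2x^3 - x + 1.
x = np.array([3.0, -1.0, 2.0, 0.0, 1.0, -2.0])
y = 2.0 * x**3 - x + 1.0

# Sort by x so the fitted curve is drawn left to right without doubling back.
order = np.argsort(x)
xs, ys = x[order], y[order]

# A degree-n fit returns n + 1 coefficients, highest power first.
line_coeffs = np.polyfit(xs, ys, 1)   # 2 coefficients
cubic_coeffs = np.polyfit(xs, ys, 3)  # 4 coefficients

# polyval accepts the whole coefficient array, whatever its length.
line_fit = np.polyval(line_coeffs, xs)
cubic_fit = np.polyval(cubic_coeffs, xs)

# Sum of squared residuals gives a rough numeric comparison of the two fits.
line_sse = float(np.sum((ys - line_fit) ** 2))
cubic_sse = float(np.sum((ys - cubic_fit) ** 2))
```

On this synthetic data the cubic residual is essentially zero by construction. On real measurements a higher degree always lowers the residual, so the residual alone does not justify the choice of fit.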