Assignment 1: Provenance and Data Exploration
During class last week, you explored MTA data about subway fares. For this assignment, you will explore the data set further and find at least four interesting facts or observations. Use your creativity!
You should look at data from different weeks (http://web.mta.info/developers/fare.html) and you can also use other data sets (e.g., http://web.mta.info/developers/turnstile.html) that might be interesting to integrate with the fare data.
You will use VisTrails for this assignment. You can start from the example you submitted on git, or from this example trail: http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/Provenance/mta-analysis.vt
You can find more information about VisTrails in the Users' Guide.
You can exchange ideas with your classmates, but the work you submit should be your own. Copying is not allowed.
You will submit the .vt file containing the trail of your analysis to NYU Classes. Follow these guidelines:
- The pipelines that correspond to the interesting facts you discover should be tagged using the following convention: Fact <number>. For example, Fact 1, Fact 2, etc. You can set the tag on the left pane in the History view (see screenshot below).
- You should add notes to these pipelines explaining your findings. The notes field is located below the tag.
- Make sure your pipelines are portable, i.e., I should be able to run them on my own machine. For example, you should avoid using files stored in your local file system.
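
One way to keep a pipeline portable is to read input data from a URL instead of a local path. The following is a minimal standalone Python 3 sketch of that idea, not part of VisTrails itself: the helper names `is_remote` and `read_rows` are made up for illustration, and the Python version inside your VisTrails installation may differ, so adapt it as needed.

```python
import csv
import io
from urllib.parse import urlparse
from urllib.request import urlopen


def is_remote(source):
    """Return True if source is an http(s) URL rather than a local path."""
    return urlparse(source).scheme in ("http", "https")


def read_rows(source, timeout=30):
    """Download a CSV file over HTTP and return its rows as lists of strings.

    Hypothetical helper: it refuses local paths so that a pipeline using it
    cannot silently depend on files that exist on only one machine.
    """
    if not is_remote(source):
        raise ValueError("not portable: %r is not an http(s) URL" % source)
    with urlopen(source, timeout=timeout) as response:
        text = response.read().decode("utf-8", errors="replace")
    return list(csv.reader(io.StringIO(text)))
```

Calling `read_rows` with the URL of a CSV file linked from the fare data page fetches the same data on any machine with network access, whereas a path such as `/home/me/fares.csv` is rejected immediately.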
The deadline for submission is Sunday, Oct 5th, at 11:59pm.