<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.vistrails.org//index.php?action=history&amp;feed=atom&amp;title=Big_Data_2015%3A_Final_Project</id>
	<title>Big Data 2015: Final Project - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.vistrails.org//index.php?action=history&amp;feed=atom&amp;title=Big_Data_2015%3A_Final_Project"/>
	<link rel="alternate" type="text/html" href="https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;action=history"/>
	<updated>2026-04-15T06:13:37Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.36.2</generator>
	<entry>
		<id>https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9573&amp;oldid=prev</id>
		<title>Juliana: /* Weather data */</title>
		<link rel="alternate" type="text/html" href="https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9573&amp;oldid=prev"/>
		<updated>2015-04-20T18:28:29Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Weather data&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 18:28, 20 April 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l58&quot;&gt;Line 58:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 58:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Weather data===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Weather data===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* http://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;-- choose &lt;/del&gt;either &amp;quot;Surface Data, Global Summary of the Day&amp;quot;, or &amp;quot;Surface Data, Hourly Global&amp;quot; for a more detailed analysis. You can choose NY state, and &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;then &lt;/del&gt;select the &amp;quot;John F Kennedy International Airport&amp;quot; station&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;.&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* http://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets  &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* http://www7.ncdc.noaa.gov/CDO/dataproduct -- select &amp;quot;Surface Data, Hourly Global&amp;quot;, and then when it comes to select &lt;/del&gt;the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;region&lt;/del&gt;, &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;choose NY and the three main stations (&lt;/del&gt;Central Park, JFK and LaGuardia).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* http://www7.ncdc.noaa.gov/CDO/dataproduct &lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;** Select &lt;/ins&gt;either &amp;quot;Surface Data, Global Summary of the Day&amp;quot;, or &amp;quot;Surface Data, Hourly Global&amp;quot; for a more detailed analysis.  &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;** &lt;/ins&gt;You can &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;then &lt;/ins&gt;choose NY state, and select the &amp;quot;John F Kennedy International Airport&amp;quot; station &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;(or all &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;stations&lt;/ins&gt;, Central Park, JFK and LaGuardia).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Property and Construction data===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Property and Construction data===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wikidb-vistrails_:diff::1.12:old-9572:rev-9573 --&gt;
&lt;/table&gt;</summary>
		<author><name>Juliana</name></author>
	</entry>
	<entry>
		<id>https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9572&amp;oldid=prev</id>
		<title>Juliana: /* Weather data */</title>
		<link rel="alternate" type="text/html" href="https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9572&amp;oldid=prev"/>
		<updated>2015-04-20T18:25:19Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Weather data&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 18:25, 20 April 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l58&quot;&gt;Line 58:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 58:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Weather data===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Weather data===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* http://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* http://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;-- choose either &amp;quot;Surface Data, Global Summary of the Day&amp;quot;, or &amp;quot;Surface Data, Hourly Global&amp;quot; for a more detailed analysis. You can choose NY state, and then select the &amp;quot;John F Kennedy International Airport&amp;quot; station.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* http://www7.ncdc.noaa.gov/CDO/dataproduct -- select &amp;quot;Surface Data, Hourly Global&amp;quot;, and then when it comes to select the region, choose NY and the three main stations (Central Park, JFK and LaGuardia).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* http://www7.ncdc.noaa.gov/CDO/dataproduct -- select &amp;quot;Surface Data, Hourly Global&amp;quot;, and then when it comes to select the region, choose NY and the three main stations (Central Park, JFK and LaGuardia).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wikidb-vistrails_:diff::1.12:old-9478:rev-9572 --&gt;
&lt;/table&gt;</summary>
		<author><name>Juliana</name></author>
	</entry>
	<entry>
		<id>https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9478&amp;oldid=prev</id>
		<title>Juliana at 03:09, 6 April 2015</title>
		<link rel="alternate" type="text/html" href="https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9478&amp;oldid=prev"/>
		<updated>2015-04-06T03:09:51Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 03:09, 6 April 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l86&quot;&gt;Line 86:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 86:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* You should describe the experimental setup (cluster configuration, number of mappers/reducers, tools you used) as well as report on the performance of your approach (e.g., report the running times of the scripts) and any optimizations you applied to speed up your code.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* You should describe the experimental setup (cluster configuration, number of mappers/reducers, tools you used) as well as report on the performance of your approach (e.g., report the running times of the scripts) and any optimizations you applied to speed up your code.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* ''You should describe the individual contributions of each of the project's members.''&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* ''You should describe the individual contributions of each of the project's members.''&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;== Some Notes ==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* For your analyses, it may be useful to have the fare and trip files merged. You already know how to do this ;-)&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* If you want to restrict your analyses to trips that start or end in Manhattan, you need to obtain the shape files for Manhattan and check whether the Lat/Long of the trip start (or end) falls inside the polygon defined by the shape files. Here are some links where you can find shape files: &lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;** NYC Neighborhoods shapefile in geojson: http://nycdata.pediacities.com/dataset?tags=neighborhoods&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;** NYC ZIP: http://nycdata.pediacities.com/dataset/nyc-zip-code-tabulation-areas&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* To avoid float precision issues, convert money amounts to cents&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* Note that datetime is local datetime (with daylight saving time). You can compute the dropoff_datetime by adding trip_time_in_secs to pickup_datetime -- this will help deal with clock changes.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* Payments in cash often do not have a tip. Go figure...&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* Distances reported are in miles&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wikidb-vistrails_:diff::1.12:old-9424:rev-9478 --&gt;
&lt;/table&gt;</summary>
		<author><name>Juliana</name></author>
	</entry>
	<entry>
		<id>https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9424&amp;oldid=prev</id>
		<title>Juliana: /* Project Mechanics */</title>
		<link rel="alternate" type="text/html" href="https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9424&amp;oldid=prev"/>
		<updated>2015-03-26T16:31:40Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Project Mechanics&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 16:31, 26 March 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l75&quot;&gt;Line 75:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 75:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* April 3: Submit the form with the information for your group. In the Google Doc, indicate your choice for the project, the data sets you will use, the tasks you will carry out,  and a proposed timeline with weekly milestones.  &lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* April 3: Submit the form with the information for your group. In the Google Doc, indicate your choice for the project, the data sets you will use, the tasks you will carry out,  and a proposed timeline with weekly milestones.  &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* April 10: Submit a status report describing any issues you encountered and updates to your initial plan.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* April 10: Submit a status report describing any issues you encountered and updates to your initial plan.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* April &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;17&lt;/del&gt;: Submit a status report with preliminary results.  &lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* April &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;20&lt;/ins&gt;: Submit a status report with preliminary results.  &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* May 11th: Final project report due.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* May 11th: Final project report due.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* May 11th,18th: Project presentation: each group will present their results to the class. Each student will grade all the presentations (except, of course, their own ;-)&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* May 11th,18th: Project presentation: each group will present their results to the class. Each student will grade all the presentations (except, of course, their own ;-)&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wikidb-vistrails_:diff::1.12:old-9423:rev-9424 --&gt;
&lt;/table&gt;</summary>
		<author><name>Juliana</name></author>
	</entry>
	<entry>
		<id>https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9423&amp;oldid=prev</id>
		<title>Juliana at 16:29, 26 March 2015</title>
		<link rel="alternate" type="text/html" href="https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9423&amp;oldid=prev"/>
		<updated>2015-03-26T16:29:54Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 16:29, 26 March 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l49&quot;&gt;Line 49:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 49:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*Fare data: http://chriswhong.com/wp-content/uploads/2014/06/nycTaxiFareData2013.torrent&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*Fare data: http://chriswhong.com/wp-content/uploads/2014/06/nycTaxiFareData2013.torrent&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Taxi data 2010-2013 ===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;=&lt;/ins&gt;==Taxi data 2010-2013 ===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*https://uofi.app.box.com/NYCtaxidata&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*https://uofi.app.box.com/NYCtaxidata&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wikidb-vistrails_:diff::1.12:old-9422:rev-9423 --&gt;
&lt;/table&gt;</summary>
		<author><name>Juliana</name></author>
	</entry>
	<entry>
		<id>https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9422&amp;oldid=prev</id>
		<title>Juliana at 16:29, 26 March 2015</title>
		<link rel="alternate" type="text/html" href="https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9422&amp;oldid=prev"/>
		<updated>2015-03-26T16:29:12Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 16:29, 26 March 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l9&quot;&gt;Line 9:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 9:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;In this project, you will create the infrastructure to automatically generate Fact Books for different years. The output could be a Web site where users can explore and compare the statistics for different years.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;In this project, you will create the infrastructure to automatically generate Fact Books for different years. The output could be a Web site where users can explore and compare the statistics for different years.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Detecting Gentrification==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;=&lt;/ins&gt;==Detecting Gentrification&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;=&lt;/ins&gt;==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Gentrification in NYC is a problem that has received substantial attention both in the media and in academia. See e.g.,  &lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Gentrification in NYC is a problem that has received substantial attention both in the media and in academia. See e.g.,  &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* http://www.nytimes.com/2015/02/25/opinion/the-gentrification-effect.html&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* http://www.nytimes.com/2015/02/25/opinion/the-gentrification-effect.html&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wikidb-vistrails_:diff::1.12:old-9421:rev-9422 --&gt;
&lt;/table&gt;</summary>
		<author><name>Juliana</name></author>
	</entry>
	<entry>
		<id>https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9421&amp;oldid=prev</id>
		<title>Juliana: /* Task */</title>
		<link rel="alternate" type="text/html" href="https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9421&amp;oldid=prev"/>
		<updated>2015-03-26T16:28:41Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Task&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 16:28, 26 March 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Task ==  &lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Task ==  &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;You will analyze &lt;/del&gt;NYC taxi data. &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Different &lt;/del&gt;groups will use the data to study &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;different &lt;/del&gt;aspects urban life that can be detected from the taxi data and other related data sets&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;.&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;By now, you are already an expert on the &lt;/ins&gt;NYC taxi data. &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;For the final project, different &lt;/ins&gt;groups will use the data to study &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;various &lt;/ins&gt;aspects urban life that can be detected from the taxi data and other related data sets.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;You can select one or more of the areas below to focus your analysis on. While I provide suggestions for what you can look for, I expect you to use your creativity and go beyond those suggestions.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;You can select one or more of the areas below to focus your analysis on. While I provide suggestions for what you can look for, I expect you to use your creativity and go beyond those suggestions.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wikidb-vistrails_:diff::1.12:old-9420:rev-9421 --&gt;
&lt;/table&gt;</summary>
		<author><name>Juliana</name></author>
	</entry>
	<entry>
		<id>https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9420&amp;oldid=prev</id>
		<title>Juliana: Created page with '== Task ==  You will analyze NYC taxi data. Different groups will use the data to study different aspects urban life that can be detected from the taxi data and other related dat…'</title>
		<link rel="alternate" type="text/html" href="https://www.vistrails.org//index.php?title=Big_Data_2015:_Final_Project&amp;diff=9420&amp;oldid=prev"/>
		<updated>2015-03-26T15:53:41Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;#039;== Task ==  You will analyze NYC taxi data. Different groups will use the data to study different aspects urban life that can be detected from the taxi data and other related dat…&amp;#039;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== Task == &lt;br /&gt;
You will analyze NYC taxi data. Different groups will use the data to study different aspects of urban life that can be detected from the taxi data and other related data sets.&lt;br /&gt;
&lt;br /&gt;
You can select one or more of the areas below to focus your analysis on. While I provide suggestions for what you can look for, I expect you to use your creativity and go beyond those suggestions.&lt;br /&gt;
Besides exploring the taxi data, you must also use at least one additional data set in your analysis.&lt;br /&gt;
&lt;br /&gt;
===Taxi Factbook===&lt;br /&gt;
Periodically, the Taxi &amp;amp; Limousine Commission releases a Fact Book (see http://www.nyc.gov/html/tlc/downloads/pdf/2014_taxicab_fact_book.pdf) where, for a given year,  they list different statistics, e.g., the total number of medallions, the average number of miles traveled per year, number of passengers, etc.&lt;br /&gt;
In this project, you will create the infrastructure to automatically generate Fact Books for different years. The output could be a Web site where users can explore and compare the statistics for different years.&lt;br /&gt;
&lt;br /&gt;
==Detecting Gentrification==&lt;br /&gt;
Gentrification in NYC is a problem that has received substantial attention both in the media and in academia. See e.g., &lt;br /&gt;
* http://www.nytimes.com/2015/02/25/opinion/the-gentrification-effect.html&lt;br /&gt;
* http://opinionator.blogs.nytimes.com/2014/12/03/a-gentrification-story/?_r=0&lt;br /&gt;
* http://www.huffingtonpost.com/news/new-york-city-gentrification&lt;br /&gt;
* http://observer.com/2015/01/gentrification-may-be-complicated-but-its-not-a-myth-and-neither-is-displacement/&lt;br /&gt;
* http://socialwork.nyu.edu/news/2014/04/28/seminar-explores-complex-micro-and-macro-level-effects-of-gentrification-on-nyc-neighborhoods.html&lt;br /&gt;
&lt;br /&gt;
Taxis can serve as sensors for economic activity in NYC: a higher density of taxis in a region can indicate increased activity in that region.&lt;br /&gt;
Can we combine taxi data with other data sets to better detect gentrification? For example: &lt;br /&gt;
* ACRIS (sales data): https://data.cityofnewyork.us/City-Government/ACRIS-Real-Property-Master/bnx9-e6tj&lt;br /&gt;
* Multi Agency Permits (including all applications for construction activity): https://data.cityofnewyork.us/City-Government/Multi-Agency-Permits/xfyi-uyt5&lt;br /&gt;
&lt;br /&gt;
===Understanding taxi usage===&lt;br /&gt;
*Which neighborhoods are better served by taxis? How does this correlate with the median household income per neighborhood?&lt;br /&gt;
*How does taxi density vary over time? During the day? On weekdays vs. weekends and holidays?&lt;br /&gt;
*Are there regions where the taxi density is always high (or low)? &lt;br /&gt;
*What are the most popular destinations? Do these change over time? For example, summer vs the other seasons?&lt;br /&gt;
*What are the most popular trips (source, destination)? &lt;br /&gt;
*Is the number of trips and taxis affected by weather? E.g., are there fewer cabs when it is raining/snowing?&lt;br /&gt;
&lt;br /&gt;
===Understanding taxi economics===&lt;br /&gt;
*How does revenue vary across neighborhoods and how does it correlate with the median household income in the neighborhood?&lt;br /&gt;
*How does revenue vary over time? Are there months or seasons when taxi companies make more (or less) money?&lt;br /&gt;
*How long do cab drivers drive without passengers? How does this vary over time?&lt;br /&gt;
*Are revenues affected during major events? E.g., parades, presidential visits, storms&lt;br /&gt;
&lt;br /&gt;
===Understanding driver behavior===&lt;br /&gt;
*How do different drivers work? Do drivers (or groups of drivers) have a preferred neighborhood (or set of neighborhoods)? What does the pickup/dropoff distribution look like? Does this preference change over time?&lt;br /&gt;
*Are there patterns shared among different drivers? &lt;br /&gt;
*Do some drivers get higher tips on average than others? &lt;br /&gt;
*Do some drivers take longer routes than others?&lt;br /&gt;
*Can you identify patterns for drivers that have higher income?&lt;br /&gt;
*How many hours do drivers often work each day/week? Are there outliers?&lt;br /&gt;
&lt;br /&gt;
== Data sources == &lt;br /&gt;
===Taxi data 2013===&lt;br /&gt;
*Trip data: http://chriswhong.com/wp-content/uploads/2014/06/nycTaxiTripData2013.torrent&lt;br /&gt;
*Fare data: http://chriswhong.com/wp-content/uploads/2014/06/nycTaxiFareData2013.torrent&lt;br /&gt;
&lt;br /&gt;
==Taxi data 2010-2013 ===&lt;br /&gt;
*https://uofi.app.box.com/NYCtaxidata&lt;br /&gt;
&lt;br /&gt;
===Census data===&lt;br /&gt;
*Demographics: http://www.nyc.gov/html/dcp/html/census/demo_tables_2010.shtml &lt;br /&gt;
*Income information: http://www.nyc.gov/html/dcp/html/census/socio_tables.shtml&lt;br /&gt;
* Shape files for census tracts: http://www.nyc.gov/html/dcp/html/bytes/districts_download_metadata.shtml (search for &amp;quot;tract&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
===Weather data===&lt;br /&gt;
* http://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets&lt;br /&gt;
* http://www7.ncdc.noaa.gov/CDO/dataproduct -- select &amp;quot;Surface Data, Hourly Global&amp;quot;, and then when it comes to select the region, choose NY and the three main stations (Central Park, JFK and LaGuardia).&lt;br /&gt;
&lt;br /&gt;
===Property and Construction data===&lt;br /&gt;
* ACRIS (sales data): https://data.cityofnewyork.us/City-Government/ACRIS-Real-Property-Master/bnx9-e6tj&lt;br /&gt;
* Multi Agency Permits (including all applications for construction activity): https://data.cityofnewyork.us/City-Government/Multi-Agency-Permits/xfyi-uyt5&lt;br /&gt;
&lt;br /&gt;
== Project Mechanics ==&lt;br /&gt;
You should form a group with '''at most 3 people'''.&lt;br /&gt;
You have to use the Hadoop environment to carry out your analyses -- you can write MapReduce programs, use Pig, or use any other tool that works on Hadoop. &lt;br /&gt;
Your code and scripts should be made available on GitHub and they should be '''reproducible''' -- you should include enough information so that others can reproduce what you did.&lt;br /&gt;
You will also maintain a GoogleDoc that you will share with me that describes your project, the questions you are investigating, and what you have done so far.&lt;br /&gt;
Please use this form to register your group and provide the information about your GitHub repo and Google Doc: https://docs.google.com/forms/d/1feAXUfUfgt2NgrHXf3xku3AdxPUWcfaxRv-h-cEfC1E/viewform?usp=send_form&lt;br /&gt;
&lt;br /&gt;
Here are your milestones:&lt;br /&gt;
* April 3: Submit the form with the information for your group. In the Google Doc, indicate your choice for the project, the data sets you will use, the tasks you will carry out,  and a proposed timeline with weekly milestones. &lt;br /&gt;
* April 10: Submit a status report describing any issues you encountered and updates to your initial plan.&lt;br /&gt;
* April 17: Submit a status report with preliminary results. &lt;br /&gt;
* May 11th: Final project report due.&lt;br /&gt;
* May 11th and 18th: Project presentations: each group will present its results to the class. Each student will grade all the presentations (except, of course, their own ;-)&lt;br /&gt;
&lt;br /&gt;
== Project Report ==&lt;br /&gt;
You can use your Google Doc as the starting point for your report. &lt;br /&gt;
* You should describe your experience, issues you encountered (e.g., dirty data) and how you dealt with them.&lt;br /&gt;
* You should report on and explain the findings of your analysis -- the use of insightful visualizations is highly encouraged. &lt;br /&gt;
* All results in your report should be reproducible -- the code/scripts you used should be made available together with instructions on how to run them to derive the results you obtained.&lt;br /&gt;
* You should describe the experimental setup (cluster configuration, number of mappers/reducers, tools you used) as well as report on the performance of your approach (e.g., report the running times of the scripts) and any optimizations you applied to speed up your code.&lt;br /&gt;
* ''You should describe the individual contributions of each of the project's members.''&lt;/div&gt;</summary>
		<author><name>Juliana</name></author>
	</entry>
</feed>