Accessing Wikipedia data on S3

From VistrailsWiki
Revision as of 19:17, 12 November 2012 by Juliana

There are two ways to access the Wikipedia segments:

  • By HTTP. Here are the links to the 27 files:

  • Through Hadoop on EC2. Hadoop can access S3 directly, so anyone with an access key and secret key configured in core-site.xml can read the data. For example:

bin/hadoop fs -ls s3n://cs9223/enwiki-20121001/

(s3n is the S3 native filesystem.)

To access these files from any machine with Hadoop installed, open core-site.xml and add <property> entries for your S3 access key and secret key.
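
The property entries themselves are not preserved on this page. As a rough sketch, assuming the standard Hadoop credential properties for the s3n filesystem (fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey), the additions inside the <configuration> element of core-site.xml would look something like the following. Both values are placeholders to be replaced with your own keys.

```xml
<!-- Sketch only: credentials for the s3n (S3 native) filesystem.
     Place these inside the existing <configuration> element.
     YOUR_ACCESS_KEY_ID and YOUR_SECRET_ACCESS_KEY are placeholders. -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

With entries like these in place, the bin/hadoop fs -ls command shown earlier should be able to list the bucket without passing credentials on the command line.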






See for more information