Internet Archive Celebrates 10 Petabytes

October 28, 2012

The Internet Archive, a non-profit organization dedicated to creating a digital archive and library of Internet content, has just celebrated its collection reaching 10 petabytes (10,000,000,000,000,000, or 1.0×10¹⁶, bytes). The collection contains approximately 150 billion historical Web pages, as well as texts, images, audio, and video. The Internet Archive provides the Wayback Machine to allow retrieval of archived pages, as well as more general search tools.

The Internet Archive also announced the availability, for research purposes, of 80 terabytes (8.0×10¹³ bytes) of archived Web crawl data from 2011. The characteristics of the data set are:

  • Crawl start date: 09 March, 2011
  • Crawl end date: 23 December, 2011
  • Number of captures: 2,713,676,341
  • Number of unique URLs: 2,273,840,159
  • Number of hosts: 29,032,069
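
To put these figures in perspective, a few rough ratios can be worked out from them. The short Python sketch below is only illustrative: the helper name `crawl_summary` is made up for this example, the sizes are taken as decimal units (as the byte counts quoted above imply), and the "bytes per capture" figure assumes the 80-terabyte total is spread evenly over all captures, which the announcement does not say.

```python
from datetime import date

# Figures copied from the data set description above.
CRAWL_START = date(2011, 3, 9)
CRAWL_END = date(2011, 12, 23)
CAPTURES = 2_713_676_341
UNIQUE_URLS = 2_273_840_159
HOSTS = 29_032_069
DATASET_BYTES = 80 * 10**12      # 80 terabytes, decimal units (8.0e13 bytes)
COLLECTION_BYTES = 10 * 10**15   # 10 petabytes, decimal units (1.0e16 bytes)


def crawl_summary():
    """Return rough derived ratios (hypothetical helper, not an Archive tool)."""
    days = (CRAWL_END - CRAWL_START).days
    return {
        "crawl_days": days,
        "captures_per_url": CAPTURES / UNIQUE_URLS,
        "urls_per_host": UNIQUE_URLS / HOSTS,
        # Assumes the total size is spread evenly over all captures.
        "bytes_per_capture": DATASET_BYTES / CAPTURES,
        "captures_per_day": CAPTURES / days,
        "share_of_collection": DATASET_BYTES / COLLECTION_BYTES,
    }


if __name__ == "__main__":
    for name, value in crawl_summary().items():
        print(f"{name}: {value:,.3f}")
```

Under these assumptions, the crawl spans 289 days, averages a little under 1.2 captures per unique URL, and the research data set amounts to 0.8% of the full 10-petabyte collection.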

Interested researchers can get in touch with the Archive to arrange access. From the Archive's announcement:

If you would like access to this set of crawl data, please contact us at info at archive dot org and let us know who you are and what you’re hoping to do with it.  We may not be able to say “yes” to all requests, since we’re just figuring out whether this is a good idea, but everyone will be considered.

The San Francisco Chronicle recently ran a front-page profile of the Internet Archive and its founder, Brewster Kahle.


Mozilla Updates Firefox to 16.0.2

October 28, 2012

Mozilla has released a new version of its Firefox browser, 16.0.2, for Windows, Linux, and Mac OS X. This update fixes a single security vulnerability involving Location objects, which Mozilla rates as Critical. The Release Notes have been updated to reflect this change.

You can get a copy of the new version via the built-in update mechanism (Help / About Firefox / Check for Updates), or you can get an installation package from the downloads page.

