Newton’s Manuscripts Available Online

December 12, 2011

The University of Cambridge, as part of its Cambridge Digital Library project, is making part of its substantial collection of Sir Isaac Newton’s scientific and mathematical manuscripts available online:

Cambridge University Library is pleased to present the first items in its Foundations of Science collection: a selection from the Papers of Sir Isaac Newton. The Library holds the most important and substantial collection of Newton’s scientific and mathematical manuscripts and over the next few months we intend to make most of our Newton papers available on this site.

The first installment of manuscripts includes some of Newton’s college notebooks, some early work on the calculus, early papers on optics, and Newton’s own annotated copy of the first edition of his Philosophiæ Naturalis Principia Mathematica (often called just the Principia), the work that cemented his international reputation.  The Principia is over 1,000 pages; with the other manuscripts, this segment of the library’s collection comprises about 4,000 pages.  Wired has an image gallery illustrating some sample pages from Sir Isaac’s work.

This is a great example of one of the real benefits of the Internet.  These landmark documents in the history of science can now be seen by millions of people, most of whom would never have had the opportunity in person.

Fifty Years of Catch-22

October 11, 2011

Today is the 50th anniversary of the publication of Joseph Heller’s best, and best known, novel, Catch–22.   It is a novel about an American Army Air Force bomber squadron, operating in Italy during World War II; but it is not the typical sort of war story.  Its central character is Captain Yossarian, a bombardier, who is becoming increasingly convinced that everyone, his own government included, is trying to kill him; nonetheless, he has a strong urge to keep on breathing.  Among the other notable characters are Colonel Cathcart, the unit’s commanding officer; Major Major Major Major, the squadron leader, who can be seen in his office only when he is not there; Nately, a fellow aviator, whose idealism keeps surfacing despite ample evidence against it  (Heller writes, “Nately’s mother was a Daughter of the American Revolution, and his father was a Son of a Bitch.”); and the chaplain, who is asked by Col. Cathcart to lead a prayer before each mission, but to avoid prayers about “valleys and rivers and God”.  Although Catch–22 is a very funny book, its humor is decidedly dark.  Clevinger, one of Yossarian’s friends, who tries to buck up his enthusiasm, is described thus:

Clevinger was dead.  That was the basic flaw in his philosophy.

Yossarian tries to get the medical officer to ground him because he’s crazy, after flying so many missions, and in the process discovers the eponymous “Catch”:

[Yossarian] “Can’t you ground someone who’s crazy?”

[Doc Daneeka] “Oh sure.  I have to.  There’s a rule saying that I have to ground anyone who’s crazy.  …But first he has to ask me.  That’s part of the rule.”

“And then you can ground him?”, Yossarian asked.

“No.  Then I can’t ground him.”

“You mean there’s a catch?”

“Sure there’s a catch,” Doc Daneeka replied.  “Catch–22.  Anyone who wants to get out of combat duty isn’t really crazy.”

There was only one catch, and that was Catch–22, which specified that a concern for one’s own safety in the face of dangers that were real and immediate was the process of a rational mind.

As The Economist points out in an article reviewing a Heller biography in this week’s magazine, Catch–22 was Heller’s first novel, and by common consensus his best.  He did continue to write throughout his life, but he never reached quite the same level again.

LATE in life, Joseph Heller was occasionally asked why he had never written anything else as good as “Catch-22”. “Who has?” he’d reply with a self-satisfied grin. Heller was haunted by the long shadow cast by his absurdist first novel, which has sold over 10m copies since it was published in 1961.

The book, in Heller’s original manuscript, was titled “Catch-18”, but the title was changed because Leon Uris’s Mila 18 was published the same year.

Along with many of my friends, I first read Catch–22 when I was in high school, back in the late 1960s.  (I need hardly add that it was not on the list of assigned reading.)   At a time when the Vietnam War was in full swing, the book’s theme of the senselessness and absurdity of war definitely touched a nerve.


Dead Sea Scrolls Online

September 26, 2011

Just after I finished writing the post about Princeton’s new policy of making all scholarly papers available to the public, I came across a story at the BBC News site, which reports that Google has worked together with the Israel Museum in Jerusalem to put facsimiles of some of the Dead Sea Scrolls online.   The Scrolls, originally discovered at Qumran on the northwest shore of the Dead Sea, preserve the oldest existing copies of some parts of the Hebrew Bible, as well as some other texts.  The scrolls that have been made available so far include:

  • The Temple Scroll
  • The Great Isaiah Scroll
  • The War Scroll
  • The Community Rule Scroll
  • The Commentary of Habakkuk Scroll

The online edition contains very high resolution images of the scrolls (1,200 megapixels), so that users can inspect the text in detail.  Additional scrolls may be added in the future.  More details are available on the museum site.

This is another aspect of Google’s project to make more of the world’s cultural heritage available online.  I’ve written before about the Google Art project, and about some of the work done on the Google Books project.  It’s good to see some of the positive potential of the Web realized.

The Official Google Blog also has a post on this project.

National Academies Press Offers Free E-Books

June 3, 2011

The National Academies Press [NAP]  has announced that it is making all PDF editions of its books available for free download, effective yesterday, June 2.   The NAP is the publishing organization for the National Academy of Sciences, the National Academy of Engineering, the National Research Council, and the Institute of Medicine.  The list of available books currently includes  4,000+ titles, and will include most future titles.  Some older books are not covered because a PDF edition was never made; and there are a few specific publications on “Nutritional Requirements of Domestic Animals” that are not included.

The NAP has a FAQ page that covers the licensing and use of the PDF editions. The material is copyrighted, and not in the public domain; there are a few reasonable restrictions, aimed at ensuring that distributed copies are authentic.

This is another welcome step  toward making one of the hoped-for functions of the Internet materialize, by making a wide range of information generally available.

Yale to Put Public-Domain Works Online

June 2, 2011

The “Babbage” blog at The Economist site has a post reporting that Yale University will make available online a collection of high-resolution digital images of those works from its extensive collections that are in the public domain.

In an announcement on May 10th, the university says its libraries, museums and archives will provide free universal access to high-resolution digitisations of holdings in the public domain. A teaser in the shape of 250,000 images (in low resolution) from its central catalog of 1.5m is already available.

This is welcome news.  Yale’s libraries contain ~ 10 million books, as well as many other documents.  The University’s natural history museum has a collection of ~12 million specimens.   Yale does not yet know exactly how many of these works are in the public domain, but this step will surely make many more works available to anyone with an Internet connection.

The images themselves, being newly produced, are not in the public domain, but they will be licensed under a Creative Commons Attribution 3.0 license, which grants permission to copy, modify, and redistribute the work, as long as the original source is credited.  (This blog is licensed under a similar Creative Commons license — see the “Legal Stuff” sidebar.)

As “Babbage” points out, this step, though significant and welcome, is not the first of its kind.  Google has had an ongoing project for some time to scan books and make the resulting images available.  Although there has been some controversy over just how Google should handle works that are currently protected by copyright, Google Books already has about one million public domain titles available.  The Flickr photo sharing site has a large collection of images in the Commons, provided by a number of institutions, including the Smithsonian, the Library of Congress, and the National Library of Scotland.  Several other universities have made parts of their collections available on the Internet, and Google also has its Art Project, which I wrote about in February, that is making the collections of some of the world’s greatest art museums available online.

The article also mentions that some people have concerns that making this material easily accessible may lead to undesirable results.

Most controversially, without legal recourse museum pieces and specimens from an earlier age risk being travestied in unseemly ways.

That sort of thing — drawing a mustache on the Mona Lisa, or something even more tasteless — is bound to happen.  But it seems to me that is a very small price to pay to make our common cultural heritage available to a much wider audience.

Google Words?

December 18, 2010

Some of you may have used Google’s Books search tool for identifying books related to a particular phrase or set of keywords.  It is part of a very ambitious project to, in effect, create a digital card catalogue for most of the world’s books.  There are three main sources for the information:

  • Books whose copyright has expired, and which are in the public domain.  Google makes the complete text of these available.
  • Books from a group of academic libraries around the world.
  • Books included by arrangement with their authors or publishers.

In the case of copyrighted material, Google will show you basic bibliographic data and perhaps some excerpts from the text, and tell you where you can buy the book, or borrow it from a library.

This week, the New York Times has an article describing a fascinating new resource that Google has made available as an outgrowth of the Books project.  It is a database of word usage, culled from approximately 5.2 million books that have been scanned and digitized by the project.

The digital storehouse, which comprises words and short phrases as well as a year-by-year count of how often they appear, represents the first time a data set of this magnitude and searching tools are at the disposal of Ph.D.’s, middle school students and anyone else who likes to spend time in front of a small screen. It consists of the 500 billion words contained in books published between 1500 and 2008 in English, French, Spanish, German, Chinese and Russian.

In other words, the database is something like a concordance of word and phrase usage, as reflected in the sample of books used. The sample contains only about one-third of the books that Google has digitized; the subset was selected from those that have the best available “metadata” (that is, information like date and place of publication) and scan quality.

The database does not contain the actual texts of the included books (that would present copyright issues); instead, Google constructed what it calls n-grams: sequences of n words found together in the text.  For example, ‘hamburger’ is a 1-gram, and ‘Jimmy Carter’ a 2-gram.
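To make the idea concrete, here is a minimal sketch of how n-grams might be extracted and tallied by year of publication.  It is only an illustration, not Google's actual method: the function names (ngrams, yearly_counts) and the two tiny sample "books" are made up for this example, and splitting text on whitespace is a simplifying assumption that ignores punctuation and capitalization.

```python
from collections import Counter, defaultdict


def ngrams(text, n):
    """Return every run of n consecutive words in text, as strings.

    Assumption: words are separated by whitespace; a real system
    would need proper tokenization.
    """
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def yearly_counts(books, n):
    """Tally n-grams per publication year.

    books is an iterable of (year, text) pairs (hypothetical input
    format); the result maps each year to a Counter of n-grams.
    """
    counts = defaultdict(Counter)
    for year, text in books:
        counts[year].update(ngrams(text, n))
    return counts


# Two invented one-sentence "books", purely for illustration.
books = [
    (1976, "Jimmy Carter ate a hamburger"),
    (1977, "Jimmy Carter met Jimmy Carter"),
]

unigrams = yearly_counts(books, 1)
bigrams = yearly_counts(books, 2)

print(unigrams[1976]["hamburger"])    # 1
print(bigrams[1977]["Jimmy Carter"])  # 2
```

Only the counts are kept, not the order in which the n-grams appeared, so the original text cannot be read back out of tables like these.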

A group of researchers from Harvard, working with Google, has published an initial paper [abstract] in Science, which uses the data to try to identify cultural and linguistic trends, an endeavor the research team calls culturomics.  (Science also has an overview article available online.)  For example, one phenomenon examined was the transformation of irregular verb forms (such as ‘burnt’) into regular forms (such as ‘burned’); it is interesting that the pattern of this change was often different in the US and the UK (as Shaw said, two countries divided by a common language).  The team also looked for cultural trends:

With a click you can see that “women,” in comparison with “men,” is rarely mentioned until the early 1970s, when feminism gained a foothold. The lines eventually cross paths about 1986.

The team estimates that the data set contains information from about 4% of all the books ever published.  That may not seem like much, but it’s a very respectable sample percentage for such a large underlying population.

One of the most intriguing aspects of this announcement is that Google is making the entire n-gram database available online, for viewing or downloading; there is also a Web-based query tool that you can use to explore questions of interest.

As with any new idea — especially in the social sciences — there will be some controversy about what this data means, how representative it is, possible biases in the sample selection,  and so on.  Nonetheless, having it available is of great potential value; and it is something that just could not have been done, in any practical way, without the assistance of technology.

There are also articles on this development at Wired, Technology Review, New Scientist, and Ars Technica.

Update, Saturday, 18 December, 15:05 EST

The “Law & Disorder” blog at Ars Technica has an amusing post, using the Google n-gram database to chronicle “A History of computing flamewars”.

Update, Saturday, 18 December, 17:48 EST

The “Short, Sharp Science” blog at New Scientist has results from another initial exploration of the data, including some of its warts.  Note to George W. Bush: sorry, you didn’t invent “misunderestimate”.

More on Mark Twain’s Memoirs

July 10, 2010

“It is the will of God that we must have critics, and missionaries, and Congressmen, and humorists, and we must bear the burden.”
— Mark Twain

A few weeks ago, I posted a note about the forthcoming publication of Mark Twain’s memoirs by the University of California, Berkeley, which has custody of Twain’s papers.  At Twain’s direction, the manuscript was withheld from publication for 100 years after his death in 1910.  Twain was not known for being particularly bashful about expressing his opinions, but the memoirs are apparently fairly outspoken even by his standards.

The New York Times has an article that gives a bit more detail about the forthcoming Volume 1, to be published in November, along with a little more about the contents.  Much of the material in the first volume has been published before, but the editors say that Volumes 2 and 3 (each volume is expected to be about 600 pages) will contain a significant proportion of material that has never been published.
