Oldest Bible Online

July 7, 2009

Back in early May, I posted a note, “Blasts from the Past”, about a number of projects to digitize old books, manuscripts, and so on, in order to make them available to a much wider audience via the Internet. The BBC News site has a story on the online launch of one of these projects, which will make available about 800 pages of the Codex Sinaiticus, the earliest known version of the Bible from the Christian era, dating back to the fourth century AD.  The British Library is holding an exhibition of documents and related artifacts to mark the event.

The Codex Sinaiticus was uncovered in 1844, after lying undisturbed in a monastery in the Sinai peninsula for about 1,500 years. The arid desert conditions are thought to have played an important role in preserving the manuscript. The original manuscript consisted of about 1,460 pages, each 40×35 cm. It contains the earliest known complete version of what is now called the New Testament, written in koine Greek, the vernacular Greek of the period, and the Old Testament in the Greek translation known as the Septuagint. The entire manuscript contains many annotations and emendations by contemporary scribes; thus, it is believed to be of great value in discerning how the accepted form of the text developed.

The publication of the manuscript, and associated research, is a joint project of four institutions: the British Library, the National Library of Russia, St. Catherine’s Monastery in the Sinai, and the Leipzig University Library.   Each of these institutions holds some part of the complete Codex.  The project Web site has images of the original manuscript, with transcription and translation (in English, German, Greek, and Russian).

Dr Scot McKendrick, head of Western manuscripts at the British Library, said the wide availability of the document presented many research opportunities.    …

“The availability of the virtual manuscript for study by scholars around the world creates opportunities for collaborative research that would not have been possible just a few years ago.”

Having been around not quite as long as the Codex, but long enough to remember the beginnings of the Internet and the World Wide Web, I’m pleased to see that some of the possibilities for cooperation and greater accessibility of information that were so appealing are being realized.


Pick a Number

July 7, 2009

Today’s Washington Post has a story by Brian Krebs (who also writes the “Security Fix” blog) about some research done at Carnegie Mellon University on the possibility of guessing a person’s Social Security number.  They used public information on how Social Security numbers [SSNs] are assigned:

The Social Security number’s first three digits — called the “area number” — is issued according to the Zip code of the mailing address provided in the application form. The fourth and fifth digits — known as the “group number” — transition slowly, and often remain constant over several years for a given region. The last four digits are assigned sequentially.

(It was pretty common in the early days of building data bases to use identifiers that encoded some information, just as the telephone Area Code originally specified a geographic area.  We now have learned that this is usually a Bad Idea, but then Social Security was started in the late 1930s.)
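
To make the three-part layout quoted above concrete, here is a minimal Python sketch that splits a nine-digit number into its area, group, and serial fields. The names `SSNParts` and `split_ssn` are my own, invented for illustration, and the sample number is a made-up placeholder, not anyone's real SSN.

```python
from typing import NamedTuple


class SSNParts(NamedTuple):
    area: str    # first three digits (the "area number")
    group: str   # fourth and fifth digits (the "group number")
    serial: str  # last four digits, assigned sequentially


def split_ssn(ssn: str) -> SSNParts:
    """Split a nine-digit string (hyphens optional) into its three fields.

    This only checks the shape of the input; it does not check whether
    the number was ever actually issued.
    """
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected nine digits, optionally written AAA-GG-SSSS")
    return SSNParts(digits[:3], digits[3:5], digits[5:])


# Made-up placeholder value, used only to show the field boundaries.
parts = split_ssn("000-12-3456")
print(parts.area, parts.group, parts.serial)
```

The point of the sketch is simply that two of the three fields carry information about where and when the number was issued, which is what makes the number partly predictable.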

They also used a data base that I had not heard of before, the Social Security Administration’s rather grimly named “Death Master File”. This apparently contains names, SSNs, state, and dates of birth and death for everyone who had an SSN and is deceased (to the knowledge of the Social Security Administration).

The researchers, Alessandro Acquisti and Ralph Gross, found that, by using this information and an individual’s place and date of birth, they could get a good start on discovering someone’s SSN:

The two tested their hunch using the Death Master File of people who died between 1972 and 2003, and found that on the first try they could correctly guess the first five digits of the SSN for 44 percent of deceased people who were born after 1988, and for 7 percent of those born between 1973 and 1988.

Their success rate was materially better for people born after 1988:

Acquisti and Gross found that it was far easier to predict SSNs for people born after 1988, when the Social Security Administration began an effort to ensure that U.S. newborns obtained their SSNs shortly after birth.

They were able to identify all nine digits for 8.5 percent of people born after 1988 in fewer than 1,000 attempts. For people born recently in smaller states, researchers sometimes needed just 10 or fewer attempts to predict all nine digits.

Now, a thousand tries may seem like a lot, but there are lots of Internet sites that allow on-line credit applications; it is not much of a stretch to imagine an enterprising crook writing a small computer program to automate the probing process – and then deploying it using a “botnet” of compromised PCs.  As Krebs points out in his blog post, some sites do not even require all nine digits to be correct, to make life easier despite data base errors.
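
To put the quoted figures in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes, as a simplification of my own, that every digit string is a possible number; in practice some ranges are never issued, so the real search spaces are somewhat smaller. The function name `search_space` is likewise just for illustration.

```python
def search_space(unknown_digits: int) -> int:
    """Number of candidates when `unknown_digits` decimal digits are unknown."""
    return 10 ** unknown_digits


ALL_DIGITS = 9     # nothing known about the number
SERIAL_DIGITS = 4  # first five digits (area + group) already guessed

blind = search_space(ALL_DIGITS)        # naive search over all nine digits
serial_only = search_space(SERIAL_DIGITS)  # only the serial number is unknown
attempts = 1_000                        # the figure quoted from the study

print(f"all nine digits unknown: {blind:,} candidates")
print(f"only the serial unknown: {serial_only:,} candidates")
print(f"{attempts:,} attempts cover {attempts / serial_only:.0%} of the serial range")
```

Under that simplifying assumption, guessing the first five digits correctly shrinks the problem from a billion candidates to ten thousand, which is why a thousand automated tries is not nearly as hopeless as it sounds.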

There will probably be some reaction to the effect that the process of assigning numbers needs to be changed. That entirely misses the point: the SSN was only supposed to be an account number for keeping track of Social Security taxes. My original Social Security card (yes, I still have it) says across the front, “Not to be Used for Identification”. Unfortunately, financial services firms and others more or less appropriated the SSN for an authentication role it was never meant to play. Undoubtedly, it was easier than devising a new method: virtually every working person had a number, and all you needed to do was put a 9-digit field in your data base. And, as is so often the case, the people and organizations responsible for designing the data bases and selling them for commercial purposes don’t bear the direct cost of the fraud that this sloppy design enables.

Perhaps it will be possible at some point to convince policy makers to do something about this:

Ross Anderson, a professor of security engineering at Cambridge University, said the findings suggest that businesses using SSNs as a password are being negligent, and should find other ways of verifying the claims to identity that are being made by their customers.

I’m personally not holding my breath.

The complete study is available for free download at the Proceedings of the National Academy of Sciences web site. The authors have put together a FAQ that covers the substance of their results. Perhaps the most important lesson one can draw is summarized there:

More broadly, our findings highlight the unexpected consequences of the interaction of multiple data sources in modern information economies. They show how non-sensitive personal data (such as information people reveal about themselves online) can be combined with other data sources, also non-sensitive, leading to the inference of much more sensitive information.

The fact that so much data is now available on the Internet has significantly reduced the effort involved in finding out a great deal of information about a person that heretofore would have been scattered around in various paper files.  I don’t think we as a society have really come to grips with this yet.


Active Exploit of Internet Explorer

July 7, 2009

Brian Krebs, in his “Security Fix” blog at the Washington Post, has an article describing a newly discovered security vulnerability in Microsoft’s Internet Explorer running on Windows XP or Windows Server 2003. The problem is in a dynamic link library (.dll file) used by Internet Explorer in processing video content. The vulnerability is serious, and can be used to attack any system that visits a compromised Web site. The SANS Institute is reporting that a large number of otherwise legitimate Web sites have been compromised with infected files.

According to the Microsoft Security Advisory, Windows Vista and Server 2008 are not affected.  Microsoft has not yet released a patch for the affected software, but there is a work-around which will disable the dangerous video control.  The manual work-around, which requires editing the Registry (not for the faint-of-heart or the ten-thumbed), is in the “Suggested Actions / Workarounds” section of the advisory.  Alternatively, you can download a small installer file that does the same thing (as well as an uninstaller).  According to Microsoft, functions other than video playback should not be affected.   The Security Advisory has more details.

Because this seems to be spreading, and because code to exploit the vulnerability is publicly available, I recommend implementing the work-around as soon as possible.

