Taming Device Drivers

June 30, 2010

Device drivers are a seldom-discussed but vital component of every computer operating system.  They are the (relatively) small bits of software “glue” that allow the system to communicate with specific devices, translating a general functional request from the system (such as reading the next block of a file) into the specific device commands needed to accomplish it.  (Readers old enough to have used an MS-DOS-era word processor like WordPerfect may remember that installing the right printer driver was crucial to getting good-quality output; a driver for an HP LaserJet II simply would not work with a PostScript printer, for example.)   Although most drivers are not large individually, a typical system requires quite a few of them.  It is estimated, for example, that more than 60% of the code in a typical Linux installation is device drivers.
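To make the “translation” role concrete, here is a toy sketch in Python rather than the C a real driver would use.  The class names and the in-memory “device” are my invention; real hardware would involve programming registers, DMA, and interrupts rather than slicing a byte string:

```python
# A toy sketch of a driver as a translation layer. The names and the
# in-memory "device" are invented for illustration only.

class BlockDevice:
    """The generic interface the rest of the system programs against."""
    def read_block(self, n: int) -> bytes:
        raise NotImplementedError

class HypotheticalDiskDriver(BlockDevice):
    """A device-specific driver: it knows how this 'hardware' is addressed."""
    BLOCK_SIZE = 512

    def __init__(self, device_image: bytes):
        self.device = device_image  # stands in for the physical disk

    def read_block(self, n: int) -> bytes:
        # Translate the generic request "read block n" into a
        # device-level operation; here that is just a byte offset.
        offset = n * self.BLOCK_SIZE
        return self.device[offset:offset + self.BLOCK_SIZE]
```

The operating system only ever calls `read_block`; everything device-specific is hidden behind that boundary, which is exactly why a driver for one printer (or disk) cannot serve for another.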

Device drivers are also a pain in the neck, for several reasons.  Device manufacturers know that they will not be able to sell their whizzy new gadgets unless they also provide drivers for popular operating environments (meaning in practice, Microsoft Windows, and to some extent Mac OS X).  Often, the development of these drivers is outsourced, and the goal is to produce something quickly that more-or-less works.  Drivers for Linux are often not provided by the device manufacturer; sometimes, the manufacturer will not even provide the interface specifications, which have to be reverse-engineered.  (The fact that many drivers have been developed in just this way is a tribute to the ingenuity of their open-source developers.)  Regardless of how they are developed, though, the drivers are largely invisible to the user (at least until they fail), and are not really the focus of anyone’s attention.  Software and device vendors tend to focus on product features, and users generally install whatever drivers are provided without much thought.

This is unfortunate, because device drivers have a very significant capacity to cause trouble.  They typically run as extensions of the core operating system, at the highest execution privilege level.  This means that bugs in a driver can crash the entire system, and that security vulnerabilities in a driver can lead to a complete breakdown of system security.  (Even back in the days of the IBM System/360 mainframes, the hacker’s preferred route to attack system security was via device drivers.)    Microsoft researchers have estimated that ~85% of Windows crashes are caused by device driver problems.  Examination of the code in Linux device drivers, compared to the code in the Linux kernel, indicates an error rate in the driver code that is 3-7 times that of the kernel code.

At the recent USENIX Annual Technical Conference in Boston, Vitaly Chipounov, of the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland, presented a paper [PDF] describing a new tool that he and his colleagues have developed for testing device drivers.   The tool, called DDT, is particularly interesting because it allows testing of binary drivers for which the source code is unavailable. It does this by executing the driver in a virtual machine environment, using a specialized symbolic execution engine to exercise the driver thoroughly.  When it finds suspect results, it provides an execution trace of how things went wrong.

The team tested six Microsoft-certified device drivers for Windows, from four different vendors, with code segments ranging in size from 14 to 120 KB, and containing from 48 to 525 functions.  They discovered 14 previously unreported serious bugs in these drivers.  (I was interested to see that one of the common causes of driver bugs was the driver’s failure to check error status when a kernel function call returned.  This is completely consistent with my personal observation of software development: some programmers seem to have difficulty understanding that error return codes exist for a reason.)
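To illustrate that last point, here is a minimal sketch of the bug pattern.  The “kernel call” and status codes are hypothetical stand-ins, but the shape of the mistake is the same one DDT keeps finding:

```python
# Hypothetical illustration of the bug class: the status codes and the
# "kernel call" are stand-ins, not a real kernel API.

STATUS_SUCCESS, STATUS_NO_MEMORY = 0, 1

def allocate_buffer(size, fail=False):
    """Stand-in for a kernel allocation call returning (status, buffer)."""
    if fail:
        return STATUS_NO_MEMORY, None
    return STATUS_SUCCESS, bytearray(size)

def buggy_driver_path(fail=False):
    # The bug: status is never checked, so buf can be None downstream,
    # and the driver crashes (taking the system with it) on failure.
    _status, buf = allocate_buffer(64, fail)
    return len(buf)

def careful_driver_path(fail=False):
    status, buf = allocate_buffer(64, fail)
    if status != STATUS_SUCCESS:
        return None  # propagate the failure instead of crashing
    return len(buf)
```

Both paths behave identically until the allocation fails; then the buggy one falls over, which is precisely why this class of bug survives casual testing.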

The Technology Review has a short article with an overview of this work.  The tool is at present a prototype, and works only for Windows drivers.  However, the concept seems sound, and better-quality drivers would make everyone’s life easier.


Adobe Fixes Reader, Acrobat

June 29, 2010

Earlier this month, I posted a note about a newly-discovered critical security vulnerability that affected Adobe Systems’ Flash player, and its Reader and Acrobat products.  That flaw was soon exploited by the Bad Guys, although there were mitigation steps available, as I mentioned in that earlier post.  Adobe released a new version of Flash Player, which fixed the problem, on June 10, and promised a fix for Reader and Acrobat by the end of this month.

Adobe has now released updates for Reader and Acrobat, to version 9.3.3, which fix the specific vulnerability discovered earlier this month, as well as a number of other security flaws.  More information about the fixes can be found in the Adobe Security Bulletin APSB10-15.   The update for Reader can be obtained through the built-in update mechanism (Help / Check for Updates); alternatively, you can download the update packages directly for Windows or Mac OS X.  (Note that these are update packages, not complete new versions.)   Linux users can download a package for the new version here.

(The Linux download link still seems to be pointing to version 9.3.2.  You can get the 9.3.3 packages via FTP by following this link.  The directory you want, for the English versions, is: ftp://ftp.adobe.com/pub/adobe/reader/unix/9.x/9.3.3/enu/ . Debian and RPM packages, and tarballs for Solaris, are available.)

For Acrobat updates, please check the instructions in the Security Bulletin.

Because at least one of the vulnerabilities fixed in this version is being actively exploited, I recommend that you apply the update as soon as possible.

Update, Tuesday, 29 June, 21:35 EDT

The download link for the Linux version has been corrected in the Security Bulletin, and now points to the correct, 9.3.3 version.

Update, Tuesday, 29 June, 22:25 EDT

Adobe has some additional information about this update in a post by Steve Gottwals on the Adobe Reader Blog.


Speedy Trading, Revisited

June 28, 2010

Back in early May, I posted a note about the unusually wide swings in US stock prices that took place on Thursday, May 6.  (For example, the value of the Dow-Jones Industrial Average fell by more than 1,000 points within a 15-minute interval.)   There was considerable speculation that the relatively new phenomenon of high-frequency trading was somehow implicated in the odd behavior of the markets, although there was little hard evidence available.  I also observed that some aspects of the incident reminded me of the market crash of 1987, when the price and trade reporting systems at times simply could not keep pace with market activity.

I have just finished reading a very interesting analysis of the events of May 6, conducted by Nanex, a firm that specializes in systems and software to distribute and process market data.  Because of their business, they have a particularly good view of activity in the overall market:

Our business is supplying a real-time data feed comprising trade and quote data for all US equity, option, and futures exchanges.

The activity in these markets today is enormous, even compared to 20 years ago, because of the proliferation of electronic trading systems.  Nanex claims, and I have no reason to doubt them, that quote activity can at times exceed 1 million quotes per second.  (Back around 1990, when I was involved in the development of a system to distribute digital market data, the peak rate observed was something like 500 updates per second.)   Current price quotes from the various equity exchanges and electronic trading venues are aggregated, so that at any time there is a National Best Bid and Offer [NBBO] quote: that is, the highest price [bid] anyone is willing to pay for a share of XYZ Corp, and the lowest price [ask] at which anyone is willing to sell.  Normally the best bid is below the best ask; a trade takes place whenever a buyer is willing to pay the ask, or a seller is willing to accept the bid.
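Computing the NBBO from per-exchange quotes is conceptually simple; here is a small Python sketch, with quote values invented for illustration:

```python
# Sketch of NBBO aggregation; the quotes below are invented for illustration.

def nbbo(quotes):
    """quotes: list of (exchange, bid, ask) tuples for one stock."""
    best_bid = max(quotes, key=lambda q: q[1])  # highest price anyone will pay
    best_ask = min(quotes, key=lambda q: q[2])  # lowest price anyone will take
    return best_bid, best_ask

quotes = [("NYSE", 10.01, 10.03),
          ("ARCA", 10.00, 10.02),
          ("BATS",  9.99, 10.04)]
(bid_ex, bid, _), (ask_ex, _, ask) = nbbo(quotes)
# NBBO here: best bid 10.01 (NYSE), best ask 10.02 (ARCA)
```

Note that the best bid and best ask can come from different exchanges, which is why a stale quote from any single venue can distort the national picture.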

The Nanex analysis [text summary] points to two key factors that may have exacerbated the market’s volatility:

  • Quote-propagation delays that were not visible to all market participants
  • Apparent attempts to “clog” the quote reporting system by generating large numbers of spurious updates

The first problem, according to the Nanex analysis, affected primarily quotes from the New York Stock Exchange [NYSE].  It appears that quotes generated at the NYSE are put in a processing queue to be transmitted to the national system.  Sometimes there is a transmission backlog, but this is not easily detected by market participants, because the quotes are only time-stamped when they are actually transmitted (that is, when they exit the queue).  Thus, the apparent prices posted by the NYSE were actually older than the current state of the market [the emphasis is in the original]:

In summary, quotes from NYSE began to queue, but because they were time stamped after exiting the queue, the delay was undetectable to systems processing those quotes. On 05/06/2010 the delay was enough to cause the NYSE bid to be just slightly higher than the lowest offer price from competing exchanges, but small enough that it was difficult to detect. …  This caused sell order flow to route to NYSE — thus removing any buying power that existed on other exchanges. When these sell orders arrived at NYSE, the actual bid price was lower because new lower quotes were still waiting to exit a queue for dissemination.

In other words, because of the delay in processing the NYSE quotes, the posted figures did not reflect the current (down) state of the market; and this was not evident to market participants, because of the way the quotes were time-stamped.

Nanex very sensibly recommends that the rules should require all quotes to be time-stamped when they are originated, rather than when they are transmitted (although having both time stamps would be even better).  This would make processing delays visible to other market participants, and help prevent order disruptions caused by automated systems “chasing” what appears to be an attractive price, but which in fact is out of date.
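A toy example shows why the time-stamping rule matters.  The numbers below are invented, but the arithmetic is the whole point: a quote stamped on exit from the queue can look fresh while being badly stale:

```python
# Invented numbers, but the arithmetic is the point: a quote stamped only
# on exit from the queue looks fresh; an origination stamp reveals its age.

def staleness(quote, now):
    """Apparent age of a quote under each time-stamping rule, in seconds."""
    return {"exit": now - quote["exit_ts"],
            "origin": now - quote["origin_ts"]}

quote = {"origin_ts": 100.000,  # when the price was actually set
         "exit_ts": 100.750,    # when it finally left the queue
         "bid": 10.01}
ages = staleness(quote, now=100.751)
# Stamped at exit, the quote appears to be 1 ms old; stamped at
# origination, it is seen to be 751 ms old -- ancient in a fast market.
```

An automated system seeing only the exit stamp has no way to know it is chasing a 751 ms old price.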

The second problem identified by Nanex is more disturbing.  It appears that, during the May 6 incident, there were certain market makers that generated a very high volume of quote updates for individual stocks — updates that in some cases appear rather suspicious.

During May 6, there were hundreds of times that a single stock had over 1,000 quotes from one exchange in a single second. Even more disturbing, there doesn’t seem to be any economic justification for this. In many of the cases, the bid/offer is well outside the National Best Bid/Offer (NBBO).

The fact that in some cases the prices quoted were not close to the current NBBO is suspicious; there is really no reason to post or update a quote that has essentially no chance of generating a trade.  As Nanex observes, these might have been caused by a programming error, or by malicious software; but they might also represent a deliberate attempt to “clog” the quote reporting system.  Their hypothesis is that it may be an attempt to gain a transient advantage for a high-frequency trading system.  In a business where microseconds matter, someone might generate a large number of quote updates that his competitors would have to process (thus consuming time), but which he, having generated them, could ignore.  Although I  think it would be exceedingly hard to prove that something like this was going on, it would not surprise me in the least.  Having worked in the investment banking world for many years, I think it is generally a safe assumption that, if it is possible to “game” a trading system, someone will try it.
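The kind of screen implied by the Nanex numbers is easy to sketch: count quote updates per stock, per exchange, per second, and flag bursts above a threshold.  The data stream below is simulated; the 1,000-per-second threshold comes from the figures quoted above:

```python
from collections import Counter

# Count quote updates per (stock, exchange, second) and flag bursts over
# a threshold. The stream below is simulated data.

def flag_bursts(quotes, threshold=1000):
    """quotes: iterable of (symbol, exchange, timestamp_in_seconds)."""
    counts = Counter((sym, ex, int(ts)) for sym, ex, ts in quotes)
    return {key: n for key, n in counts.items() if n > threshold}

# 1,500 updates for one stock within a single second; 5 for another.
stream = [("XYZ", "NYSE", 34200 + i / 1500) for i in range(1500)]
stream += [("ABC", "ARCA", 34200 + i / 5) for i in range(5)]
suspicious = flag_bursts(stream)  # only the XYZ burst is flagged
```

Detection, of course, is the easy part; proving intent behind such a burst is another matter entirely.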

Nanex proposes another rule change to address this: a specified interval (they suggest 50 milliseconds) must elapse before a quote can be updated, unless the quote is “hit” (that is, has a trade executed against it).   As they point out, 50 ms is approximately the minimum round-trip time for messages to travel from New York to California, or vice versa, so it is hard to see how this could legitimately handicap anyone.
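As a sketch of how such a rule might be enforced (the 50 ms interval is Nanex’s suggestion; the class structure is my own):

```python
# Sketch of the proposed rule: a quote may not be replaced until it is at
# least 50 ms old, unless it has been "hit" (traded against). The interval
# is Nanex's suggestion; the code structure is my own.

MIN_REST_MS = 50

class QuoteBook:
    def __init__(self):
        self.last = {}  # symbol -> (price, posted_at_ms, was_hit)

    def try_update(self, symbol, price, now_ms):
        prev = self.last.get(symbol)
        if prev is not None:
            _price, posted_at, was_hit = prev
            if not was_hit and now_ms - posted_at < MIN_REST_MS:
                return False  # too soon: the update is rejected
        self.last[symbol] = (price, now_ms, False)
        return True

    def record_trade(self, symbol):
        price, posted_at, _ = self.last[symbol]
        self.last[symbol] = (price, posted_at, True)  # hit quotes may be replaced
```

Under such a rule, the thousand-updates-per-second bursts described above would simply be rejected at the exchange, while normal quoting would be unaffected.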

All of this is fascinating stuff, and I urge you to have a look at the Nanex report if you are interested in securities markets.  It may be another piece of evidence that we are able to increase the complexity of our systems faster than we can increase our ability to understand and control them.

Update, Tuesday, 29 June, 6:55 EDT

I forgot to mention that the SEC and CFTC have issued a preliminary report [PDF, 151 pp], dated May 18, 2010, on their analysis of the May 6 incident.

I also want to thank Nanex for their work in doing the analysis that I discussed above, and for making it available.  Although they obviously have an interest in having the markets work well, it is a pleasure to see that they are being responsible participants.


Firefox 3.6.6 Released

June 27, 2010

Mozilla has released a new version of the Firefox browser, version 3.6.6.   This is a bug-fix update to Firefox 3.6.4, which was released last week.  The new version gives a plugin (such as Flash) a little more time to respond before deciding that the plugin has crashed; apparently, the original timeout was a little too aggressive on heavily loaded systems.   Further information is in the Release Notes; installation binaries for all platforms (Linux, Mac OS X, and Windows) can be downloaded here.
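If you want to inspect or adjust the timeout yourself, the relevant about:config preference is, if memory serves, the one below; do verify the name and default against your own build:

```
// about:config, Firefox 3.6.x (out-of-process plugins):
// dom.ipc.plugins.timeoutSecs : seconds to wait before a hung plugin
// is terminated (3.6.6 raised the default)
```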

Although it is always a good idea to keep your software reasonably current, I don’t see any reason to be in a panic to install this update unless you are experiencing plugin-related problems.


Trusted Identity Strategy Proposed

June 26, 2010

In a posting on the White House blog yesterday, Howard Schmidt, the President’s Cybersecurity Coordinator, announced a proposal to establish a new online environment, the Identity Ecosystem, that would provide a robust method of identifying individuals and organizations on the Internet.

Today, I am pleased to announce the latest step in moving our Nation forward in securing our cyberspace with the release of the draft National Strategy for Trusted Identities in Cyberspace (NSTIC).  This first draft of NSTIC was developed in collaboration with key government agencies, business leaders and privacy advocates. What has emerged is a blueprint to reduce cybersecurity vulnerabilities and improve online privacy protections through the use of trusted digital identities.

The idea of creating a uniform identity credential, rather than the current hodge-podge of user IDs and passwords for various Web facilities, is not a new one, of course.  Systems such as Microsoft’s more or less abortive Passport system, and projects like OpenID, have many of the same objectives.  (They do not, of course, have the government’s imprimatur.)  Any endeavor like this has to sort out a very complicated tangle of trust and privacy issues, in addition to getting the security right.

To that end, as Mr. Schmidt says in his announcement, a draft version of the National Strategy for Trusted Identities in Cyberspace has been published; the actual draft document [PDF, 39 pp] can be downloaded here.

(The document is being hosted at the Web site of IdeaScale, an “idea and innovation management” firm.  The site has a provision for entering comments, which requires you to register.  Unfortunately, I have not yet been able to get the registration process to work.  I’ll update this if I learn more.)

As with many security projects, the devil is likely to be in the details.  I have a copy of the draft report; the Executive Summary is long on ideals and short on details.  I’ll post a follow-up note when I’ve had a chance to read it all.

Update, Monday, 28 June, 15:35 EDT

The registration function at the IdeaScale project site seems to have been fixed; I was able to register successfully today.


Mozilla Thunderbird 3.1 Released

June 25, 2010

The folks over at the Mozilla Messaging project have been busy lately.  Just about a week after making the bug-fix release 3.0.5, they have released a new version, 3.1, of the Thunderbird E-mail client.  This version incorporates a number of new features and improvements:

  • Faster searching and filtering
  • A new Migration Assistant for those switching from other E-mail programs
  • A Saved Files manager to keep track of attachments
  • An improved Setup Wizard, which has an internal database of settings for popular mail providers, so the user doesn’t have to enter them all by hand

Further details about the changes are in the Release Notes.  The installation binaries for all platforms (Mac OS X, Linux, and Windows), in many languages, can be downloaded here.


Another Sleep Aid

June 24, 2010

Last week, I posted a note about a new sleep management system, developed by Microsoft Research, which provides a mechanism to use the sleep state, common in modern PCs, to save energy in a managed way on a network.  Now there is a report on the PhysOrg.com site about another approach to tackling the same problem.  The SleepServer system, developed by computer scientists at the University of California, San Diego, has many features similar to the Microsoft system I discussed before; yet there are some key differences that make it worth examining.  SleepServer is described in a paper [PDF download] to be presented at the USENIX Annual Technical Conference in Boston this week.

Like the Microsoft Research system, SleepServer uses a sleep proxy, running on a server, to stand in for each sleeping client machine.  The overall SleepServer system is managed by the SSR-Controller application, running on the server.  Each managed client machine runs an SSR-Client application, which does two things: it keeps the SSR-Controller informed of the client’s current “network state” (e.g., on which ports it is listening), and it informs the SSR-Controller when the client is about to enter the sleep state.   When the client goes to sleep, the SSR-Controller sets up the sleep proxy to stand in for it, using gratuitous ARP probes to redirect traffic destined for the client’s IP address to the proxy.
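The division of labor can be sketched in a few lines of Python.  To be clear, this is my own illustration of the two messages, not code from the SleepServer paper:

```python
import json

# My own illustration (not the SleepServer code) of the SSR-Client's two
# jobs: report the client's network state, and announce an imminent sleep.

def network_state(listening_ports, ip, mac):
    """The snapshot the SSR-Controller needs to impersonate this client."""
    return {"ip": ip, "mac": mac, "ports": sorted(listening_ports)}

def sleep_message(state):
    """Message a client would send just before entering the sleep state."""
    return json.dumps({"type": "going_to_sleep", **state}).encode()

state = network_state({22, 445, 3389}, "10.0.0.17", "aa:bb:cc:dd:ee:01")
# On receiving sleep_message(state), the controller would start the proxy
# and answer ARP for 10.0.0.17 itself (the gratuitous ARP step), so that
# packets for the sleeping client reach the proxy instead.
```

The listening-port list is what lets the proxy answer, or decide to wake the client for, incoming connections on the right services.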

So far, this is almost the same as the Microsoft system.  What makes SleepServer different is that the proxy is not just an application that processes network requests; it is a virtual machine image of the client, running under a hypervisor (virtual machine monitor) such as Xen.  This is potentially a more powerful solution, since the images can reflect the idiosyncratic characteristics of individual clients; the images can also incorporate what the authors call “stub applications” — minimal but functional versions of real applications running on the client.  (For example, a long-running data transfer might continue to run in the virtual machine, using only the data-communications “core” of the application, without any user interface.)   This approach offers more flexibility, if the network’s owner is willing to expend some effort on customization, and it is better suited to a mixed-platform environment, since network profiles and responses can be tailored at the client level.
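Here is one way to picture the stub-application idea, again my own sketch rather than anything from the paper: the transfer logic is factored out of the full application so the proxy VM can drive it headlessly:

```python
# My own sketch of the "stub application" idea: the communications core is
# factored out of the full app so a proxy VM can run it with no UI.

class DownloadCore:
    """Protocol/transfer logic shared by the full app and the stub."""
    def __init__(self, total_chunks):
        self.total = total_chunks
        self.done = 0

    def step(self):
        if self.done < self.total:
            self.done += 1  # in reality: receive one network chunk
        return self.done == self.total

class FullApp(DownloadCore):
    """The desktop version adds UI work the stub omits."""
    def step(self):
        finished = super().step()
        self.render_progress()
        return finished

    def render_progress(self):
        pass  # draw a progress bar, etc.

# The stub is just DownloadCore driven headlessly inside the proxy VM:
stub = DownloadCore(total_chunks=3)
while not stub.step():
    pass
```

The design choice is the interesting part: by sharing the core between the real application and the stub, the proxy can make meaningful progress on the user’s behalf while the PC itself sleeps.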

The authors claim that their tests demonstrate that the SleepServer system can achieve significant energy savings in a real network environment.

We detail results from our experience in deploying SleepServer in a medium scale enterprise with a sample set of thirty machines instrumented to provide accurate real-time measurements of energy consumption. Our measurements show significant energy savings for PCs ranging from 60%-80%, depending on their use model.

Probably there is no “one size fits all” approach to managing network “sleep” that applies to all environments; but it is good to see that some thoughtful work is being done on the problem.
