Google Expands Unsafe Site Reporting

June 26, 2013

For some time now, Google has published its Transparency Report, which gives a high-level overview of how Google relates to events in the world at large. The report has historically included several sections:

  • Traffic to Google services (current and historical, highlighting disruptions)
  • Information removal requests (by copyright holders and governments)
  • Requests for user data (by governments)

This information can be interesting in light of current events. For example, at this writing, Google reports ongoing disruptions to their services in Pakistan, China, Morocco, Tajikistan, Turkey, and Iran.

Now, according to a post on the Official Google Blog, a new section will be added to the Transparency Report. The new section is an outgrowth of work begun in 2006 with Google’s Safe Browsing Initiative.

So today we’re launching a new section on our Transparency Report that will shed more light on the sources of malware and phishing attacks. You can now learn how many people see Safe Browsing warnings each week, where malicious sites are hosted around the world, how quickly websites become reinfected after their owners clean malware from their sites, and other tidbits we’ve surfaced.

Google says that they flag about 10,000 sites per day for potentially malicious content. Many of these are legitimate sites that have been compromised in some way. The “Safe Browsing” section of the Transparency Report shows the number of unsafe sites detected per week, as well as the average time required to fix them.

Google, because its search engine “crawlers” cover so much of the Web, has an overview of what’s out there that few organizations can match. I think they are to be commended for making this information available.


A Tastier Selection of Cookies

June 24, 2013

I’ve written here a number of times about browser cookies: small pieces of text that your browser stores on your system at the request of a Web server.  The cookie’s contents can be returned to the server with a later HTTP request.  The cookie mechanism was developed to provide a means of maintaining state information in the otherwise stateless HTTP protocol, which deals only in page requests and responses; the concept of logging in to a Web site, or having a session, is grafted onto the underlying protocol via the cookie mechanism.  This can lead to some security problems; it also impacts users’ privacy, since cookies are very widely used to track users as they browse to different sites.  (For example, those ubiquitous “Like” buttons from Facebook can set tracking cookies in your browser, even if you never visit the Facebook site itself.)
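
To make the mechanism concrete, here is a minimal sketch, using only Python’s standard library, of a server that sets a cookie and reads it back on later requests.  The handler name, cookie name, and port are illustrative choices of mine, not anything prescribed by the HTTP specification.

# Minimal sketch of the cookie mechanism (illustrative names and values).
from http.server import BaseHTTPRequestHandler, HTTPServer
from http.cookies import SimpleCookie

class SessionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Parse any cookies the browser sent back in the Cookie header.
        cookies = SimpleCookie(self.headers.get("Cookie", ""))
        if "session_id" in cookies:
            body = "Welcome back, session " + cookies["session_id"].value
        else:
            body = "First visit; asking the browser to store a session cookie"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        if "session_id" not in cookies:
            # Set-Cookie asks the browser to store this value and return it
            # automatically with later requests to this server.
            self.send_header("Set-Cookie", "session_id=abc123; HttpOnly")
        self.end_headers()
        self.wfile.write(body.encode())

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), SessionHandler).serve_forever()

Tracking cookies work the same way; the only difference is that the Set-Cookie header comes from a third-party server whose content (an ad, a “Like” button) happens to be embedded in the page you are visiting.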

For some time now, several browsers have offered an option to disallow so-called “third party” cookies: those set by sites other than the one you are visiting.  And both Apple’s Safari browser and development builds of Mozilla’s Firefox have included heuristics to accomplish something similar.  These are helpful, but imperfect, since the definition of a “third party” is not as precise as one might like.  For example, XYZ.COM might have a companion domain for videos, XYZ-MEDIA.COM; logically, both are part of the same site, but simple heuristics won’t see things that way.

Now, according to an article at Ars Technica, Stanford University, along with the browser makers Mozilla and Opera Software, is establishing a Cookie Clearinghouse to serve as a sort of central cookie rating agency.

The Cookie Clearinghouse intends to provide lists of cookies that should be blocked or accepted. Still in the planning stages, it will be designed to work in concert with the heuristics found in Firefox in order to correct the errors that the algorithmic approach makes.
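
As a rough illustration of how such lists could work alongside a browser heuristic, here is a sketch of the decision logic in Python.  The function, parameter names, and example domains are hypothetical; this is not Firefox’s actual implementation, nor the Clearinghouse’s eventual design.

def allow_cookie(cookie_domain, page_domain, visited_first_party,
                 allow_list, block_list):
    # Decide whether to accept a cookie set by cookie_domain while the
    # user is visiting page_domain.  Purely illustrative logic.
    if cookie_domain == page_domain:
        return True                  # first-party cookie
    if cookie_domain in block_list:
        return False                 # clearinghouse list: known tracker
    if cookie_domain in allow_list:
        return True                  # clearinghouse list: legitimate companion domain
    # Fall back to a Safari/Firefox-style heuristic: accept third-party
    # cookies only from domains the user has visited directly.
    return cookie_domain in visited_first_party

# XYZ-MEDIA.COM is logically part of XYZ.COM, but the heuristic alone would
# block it; an allow-list entry corrects that error.
print(allow_cookie("xyz-media.com", "xyz.com", visited_first_party=set(),
                   allow_list={"xyz-media.com"}, block_list={"tracker.example"}))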

The Clearinghouse is just being set up, so it’s too early to say how much it will help.  Similar cooperative efforts have helped reduce the impact of spam, phishing, and malicious Web sites, though, so we should hope for the best.


China’s New Supercomputer is Tops

June 17, 2013

Today, at the opening session of the 2013 International Supercomputing Conference in Leipzig, Germany, the latest update (the 41st version) to the Top 500 list of supercomputers was announced, and a new Chinese system, the Tianhe-2, has taken first place honors.  The system achieved a performance of 33.86 petaflops (3.386 × 10¹⁶ floating point operations per second) on the LINPACK benchmark; the Tianhe-2 (in English, Milky Way-2) will be deployed at the National Supercomputer Center in Guangzhou, China, by the end of the year. The system has 16,000 nodes, each with multiple Intel processors, for a total of 3,120,000 processor cores.

The Titan system at Oak Ridge National Laboratory, ranked number 1 in the November 2012 list, and the Sequoia system, at the Lawrence Livermore National Laboratory, previously ranked number 2, have both moved down one place, to number 2 and number 3, respectively.  The two systems are still noteworthy as being among the most energy-efficient in use.  Titan delivers 2,143 Megaflops/Watt, and Sequoia 2,031.6 Megaflops/Watt.

The total capacity of the list has continued to grow quickly.

The last system on the newest list was listed at position 322 in the previous TOP500 just six months ago.  The total combined performance of all 500 systems has grown to 223 petaflop/sec, compared to 162 petaflop/sec six months ago and 123 petaflop/sec one year ago.

You can view the top ten systems, and the complete list, at this page.


Password Angst Again, Part 1

June 1, 2013

I’ve written here on several occasions about the problems of passwords as a user authentication mechanism, especially as the sole authentication mechanism.  When confronted with the necessity of choosing a password, many users make eminently lousy choices.  Examination of some actual password lists that have been hacked reveals a large number of passwords like ‘password’, ‘123456’, ‘qwerty123’, and the like.  Many thousands of words have been written in attempts to teach users how to choose “good” passwords.  Many Web sites and enterprises have password policies that impose requirements on users’ passwords; for example, “must contain a number”, or “must have both lower- and upper-case letters”.  It is not clear that these help all that much; if they are too cumbersome, they are likely to be circumvented.

The Official Google Blog has a recent post on the topic of password security, which contains (mostly) some very good advice.  The main suggestions are:

  •  Use a different password for each important service.  This is a very important point.  Many people use the same password for multiple Web sites or services.  This is a Real Bad Idea for important accounts: online banking or shopping, sites that have sensitive data, or E-mail. It’s really essential that your E-mail account(s) be secure; the “I forgot my password” recovery for most sites includes sending you a new access token by E-mail.  If the Bad Guys can get all your E-mail, you’re hosed.
  • Make your password hard to guess.  Don’t pick obviously dumb things like ‘password’.  Avoid ordinary words, family names, common phrases, and anything else that would be easy to guess.  The best choice is a long, truly random character string; see the sketch after this list.  Giving specific rules or patterns for passwords is not a good idea; paradoxically, these can have the effect of making the search for passwords easier.  (I’ll have more to say about this in a follow-up post.)
  • Keep your password somewhere safe.  Often, people are exhorted never to write their passwords down.  This is one of those suggestions that can actually be counter-productive.  If having to remember a large number of passwords is too difficult, the user is likely to re-use passwords for multiple accounts, or choose simple, easily guessed passwords.  It’s better to use good passwords, and write them down, as long as you keep in mind Mark Twain’s advice: “Put all your eggs in one basket, and watch that basket!”† You could, for example, write passwords on a piece of paper you keep in your wallet.  Most of us have some practice in keeping pieces of paper in our wallets secure.
  • Set a recovery option.  If you can, provide a recovery option other than the so-called “secret questions” that many sites use.  A non-Internet option, like a cell phone number, is good because it’s less likely to be compromised by a computer hack.
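
For what it’s worth, here is one way to generate the kind of long, random password recommended above, using Python’s standard secrets module; the length and character set are arbitrary choices of mine, not part of Google’s advice.

import secrets
import string

def random_password(length=20):
    # Draw each character independently from a cryptographically secure RNG.
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(random_password())    # different on every run; store it somewhere safe

A password like this is impractical to memorize, which is exactly why the “write it down and guard the paper” advice above is sensible.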

All of this is good advice (and Google has been giving it for some time). There is also a short video included in Google’s blog post that gives advice on choosing a good password, but part of that advice is a bit troubling. The video starts off by saying, very sensibly, that one should not choose dictionary words or keyboard sequences (like ‘qwerty’).  It goes on to recommend starting with a phrase (in itself, OK), and then modifying it with special characters.  The example used starts with the phrase:

ilovesandwiches

and turns it into:

ilove$@nDwich3s

The problem with this is that this sort of substitution (sometimes called ‘133t speak’) is very well known.  There are password cracking tools that try substitutions like this automatically.  More generally, you don’t want to introduce any kind of predictable pattern into your password choices, even if it’s one that you, personally, have not used before.  Hackers can analyze those lists of leaked passwords, too.  Avoiding predictability is harder than it sounds; I’ll talk more about that in a follow-up post.
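
To see why, consider a minimal sketch of the kind of substitution rule a cracking tool can apply to an ordinary word list.  The mapping below is a small illustrative sample; real rule sets are far larger.

from itertools import product

# A few common "leet" substitutions; real cracking rules include many more.
SUBS = {"a": ["a", "@", "4"], "e": ["e", "3"], "i": ["i", "1", "!"],
        "o": ["o", "0"], "s": ["s", "$", "5"]}

def leet_variants(word):
    # Generate every combination of substitutions for one dictionary word.
    choices = [SUBS.get(ch, [ch]) for ch in word.lower()]
    return ("".join(combo) for combo in product(*choices))

variants = list(leet_variants("sandwiches"))
print(len(variants))              # 162 variants of this one word alone
print("$@ndwich3s" in variants)   # True: the "clever" spelling is on the list

Checking a few hundred extra variants per dictionary word is trivial for modern cracking hardware, so the substitutions buy very little.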

——

† from Pudd’nhead Wilson, Chapter 15


A Supercomputer ARM Race?

May 28, 2013

The PC World site has a report of an interesting presentation made at the EDAworkshop13 in Dresden, Germany, this month, on possible future trends in the high-performance computing [HPC] market.  The work, by a team of researchers from the Barcelona Supercomputing Center in Spain, suggests that we may soon see a shift in HPC architecture, away from the commodity x86 chips common today, and toward the simpler processors (e.g., those from ARM) used in smart phones and other mobile devices.

Looking at historical trends and performance benchmarks, a team of researchers in Spain have concluded that smartphone chips could one day replace the more expensive and power-hungry x86 processors used in most of the world’s top supercomputers.

The presentation material is available here [PDF].  (Although PC World calls it “a paper”, it is a set of presentation slides.)

As the team points out, significant architectural shifts have occurred before in the HPC market.  Originally, most supercomputers employed special purpose vector processors, which could operate on multiple data items simultaneously.  (The machines built by Cray Research are prime examples of this approach.)  The first Top 500 list, published in June 1993, was dominated by vector architectures  — notice how many systems are from Cray, or from Thinking Machines, another vendor of similar systems.  These systems tended to be voracious consumers of electricity; many of them required special facilities, like cooling with chilled water.

Within a few years, though, the approach had begun to change.  A lively market had developed in personal UNIX workstations, using RISC processors provided by vendors such as Sun Microsystems, IBM, and HP.  (In the early 1990s, our firm, and many others in the financial industry, used these machines extensively.)  The resulting availability of commodity CPUs made building HPC systems using those processors economically attractive.  They were not quite as fast as the vector processors, but they were a lot cheaper.  Slightly later on, a similar transition, also motivated by economics, took place away from RISC processors and toward the x86 processors used in the by-then ubiquitous PC.

[Chart: Top 500 Processor Architectures]

The researchers point out that current mobile processors have some limitations for this new role:

  • The CPUs are mostly 32-bit designs, limiting the amount of usable memory
  • Most lack support for error-correcting memory
  • Most use non-standard I/O interfaces
  • Their thermal engineering does not necessarily accommodate continuous full-power operation

But, as they also point out, these are implementation decisions made for business reasons, not insurmountable technical problems.  They predict that newer designs will be offered that will remove these limitations.

This seems to me a reasonable prediction. Using many simpler components in parallel has often been a sensible alternative to fewer, more powerful and complex systems.  Even back in the RISC workstation days, in the early 1990s, we were running large simulation problems at night, using our network of 100+ Sun workstations as a massively parallel computer.  The trend in the Top 500 lists is clear; we have even seen a small supercomputer built using Raspberry Pi computers and Legos.  Nature seems to favor this approach, too; our individual neurons are not particularly powerful, but we have a lot of them.


Weather Forecasts: Improving

May 25, 2013

Although there are a lot of different sources from which you can get a weather forecast, those forecasts all come from one of a few sources: national weather services that run large, numerical weather prediction models on their computer systems.  Two of the major suppliers are the US National Weather Service’s [NWS] Global Forecast System [GFS] (the source for most US forecasts), and the European Centre for Medium-Range Weather Forecasts [ECMWF], located in Reading, England.  Over the last few years, there has been a growing feeling that the US effort was not keeping up with the progress being made at ECMWF.  The criticism became considerably more pointed in the aftermath of last year’s Hurricane Sandy.  Initial forecasts from the GFS projected that the storm would head away from the US East Coast into the open Atlantic.  The ECMWF models correctly predicted that Sandy would make a left turn, and strike the coast in the New Jersey / New York region.

According to a story in Monday’s Washington Post, and a post on the paper’s “Capital Weather Gang” blog, at least one good thing will come out of this rather embarrassing forecasting error.  It’s anticipated that the NWS will get additional appropriated funds to allow the computers and the models they run to be updated.

Congress has approved large parts of NOAA’s spending plan under the Disaster Relief Appropriations Act of 2013 that will direct $23.7 million (or $25 million before sequestration), a “Sandy supplemental,” to the NWS for forecasting equipment and computer infrastructure.

This should go a long way toward addressing one of the most pressing needs for the GFS: more computing horsepower.

Computer power is vital to modern weather forecasting, most of which is done using mathematical models of the Earth’s climatic systems.  These models represent various weather features, such as winds, heat transfer, solar radiation, and relative humidity, using a system of partial differential equations.  (A fundamental set of these is called the primitive equations.)  The equations typically describe functions that are very far from linear; also, except for a few special cases, the equations do not have analytic solutions, but must be solved by numerical methods.

The standard technique for numerical solution of equations of this type involves approximating the differential equations with difference equations on a grid of points.  This is somewhat analogous to approximating a curve by using a number of line segments; as we increase the number of segments and decrease their length, the approximation gets closer to the true value.  Similarly, in weather models, increasing the resolution of the grid (that is, decreasing the distance between points) allows better modeling of smaller-scale phenomena.  But increasing the resolution means that correspondingly more data must be processed and more sets of equations solved, all of which takes computer power.  Numerical weather prediction, although it had been worked on for some years, really only began to be practical in the 1950s, with the advent of digital computers, and the early weather models had to incorporate sizable simplifications to be at all practical.  (It is not too useful to have a forecasting model, no matter how accurate, that requires more than 24 hours to produce a forecast for tomorrow.)
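
As a toy illustration of the difference-equation idea (not an actual weather model), here is a one-dimensional heat-equation solver on a grid, written in Python with NumPy; the grid size, time step, and initial condition are arbitrary.

import numpy as np

def diffuse(u, k=1.0, dx=1.0, steps=100):
    # Advance the 1-D heat equation u_t = k * u_xx with an explicit
    # finite-difference scheme (central in space, forward in time).
    dt = 0.4 * dx * dx / k        # stability requires dt <= dx^2 / (2k)
    for _ in range(steps):
        u[1:-1] += k * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u

# A "hot spot" in the middle of the grid gradually spreads outwards.
grid = np.zeros(101)
grid[50] = 100.0
print(diffuse(grid)[45:56].round(2))

Note what happens if the grid spacing dx is halved: there are twice as many points, and the stability limit forces a time step one quarter as long, so even this one-dimensional toy needs roughly eight times the work.  In a three-dimensional atmospheric model the penalty for higher resolution is correspondingly steeper, which is where the demand for computing power comes from.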

The computation problem is made worse by the problems inherent in data acquisition.  For this type of numerical analysis, the three-dimensional grid would ideally consist of evenly spaced points, covering the surface of the Earth and extending upwards into the atmosphere.  Clearly, this ideal is unlikely to be achieved in practice; getting observations from the center of Antarctica, or the mid-Pacific Ocean, is not terribly convenient.  There are also ordinary measurement errors to deal with, of course.  This means that a good deal of data pre-processing and massaging is required, in addition to running the model itself, adding even more to the computing resources needed.

Many observers point to the GFS’s limited computer power as one of the chief weaknesses in the US effort.  (For example, see this blog post by Cliff Mass, Professor of Atmospheric Sciences at the University of Washington, or this post by Richard Rood, Professor at the University of Michigan in the Department of Atmospheric, Oceanic and Space Sciences.)  The processing speed of the current GFS system is rated at 213 teraflops (1 teraflop = 1 × 10¹² floating point operations per second); the current ECMWF system is rated at 754 teraflops (and is listed as number 38 in the most recent Top 500 supercomputer list, released in November 2012 — the GFS system does not make the top 100).

The projected improvements to the GFS system will raise its capacity to approximately 2600 teraflops; in terms of the most recent Top 500 list, that would put it between 8th and 9th places.  (Over the same period, the ECMWF system is projected to speed up to about 2200 teraflops.)   This will enable the resolution of the GFS to be increased.

The NWS projects the Sandy supplemental funds will help enhance the horizontal resolution of the GFS model by around a factor of 3 by FY2015, enough to rival the ECMWF.

There are also plans to make other improvements in the model’s physics, and in its associated data acquisition and processing systems.

These improvements are worth having.  The projected $25 million cost is a very small percentage of the total Federal budget (about $3.6 trillion for fiscal 2012).  As we are reminded all too often, extreme weather events can come with a very large price tag, especially when they are unexpected.  Better forecasts have the potential to save money and lives.


Homomorphic Encryption Library Released

May 2, 2013

One of the issues that tends to come up when people consider the possible use of “cloud computing” is data security.  The data can be stored in an encrypted form, of course, but it generally has to be decrypted in order to be used; this means that the data is potentially vulnerable while it is in plain text (decrypted) form.

I’ve written before about the potential of homomorphic encryption to address this problem.  The basic idea is that we would like to discover an encryption function such that a defined set of operations can be performed on encrypted data (say, adding them) to produce an encrypted result that, when decrypted, is the same as that obtained by operating on the plain text data.

In other words, if we have two numbers, α and β, and suitable encryption and decryption functions E(x) and D(x), respectively, and if

α + β = S

Then, if

E(α) + E(β) = S*

it will be true that

D(S*) = S

So we are able to add the two encrypted values to get a sum that, when decrypted, is the sum of the original (unencrypted) numbers.
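
The library discussed below implements a much more general scheme, but the additive property described above can be illustrated with a toy version of the older Paillier cryptosystem, which is additively homomorphic.  The tiny hard-coded primes below are for demonstration only and provide no real security; this is a sketch of the idea, not of the library’s algorithms.

from math import gcd

# Toy Paillier key from two small primes (real keys use primes of 1024+ bits).
p, q = 47, 59
n = p * q                  # public modulus
n2 = n * n
g = n + 1                  # standard choice of generator
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lambda = lcm(p - 1, q - 1)
mu = pow(lam, -1, n)       # with g = n + 1, mu is just lambda^-1 mod n

def encrypt(m, r):
    # c = g^m * r^n mod n^2; r should be random and coprime to n
    # (fixed values are used below so the example is reproducible)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # m = L(c^lambda mod n^2) * mu mod n, with L(x) = (x - 1) // n
    return ((pow(c, lam, n2) - 1) // n) * mu % n

a, b = 123, 456
ca, cb = encrypt(a, 17), encrypt(b, 23)
c_sum = (ca * cb) % n2          # multiplying ciphertexts adds the plaintexts
print(decrypt(c_sum), a + b)    # both print 579

Schemes like BGV, the one implemented in the library below, support homomorphic multiplication as well as addition, which is what makes general computation on encrypted data possible.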

This sounds almost like magic, but it has been proved to be theoretically possible, and there is a considerable amount of ongoing work to try to reach a practical implementation.  (For example, back in 2011, a group of researchers at MIT introduced CryptDB, database software that incorporates homomorphic encryption in a form suitable for some applications.)

Now, according to an article at the I Programmer site, researchers from IBM’s Thomas J. Watson Research Center have released an open-source library for homomorphic encryption, HElib (documentation and source code available on GitHub).

HElib is a software library that implements homomorphic encryption (HE). Currently available is an implementation of the Brakerski-Gentry-Vaikuntanathan (BGV) scheme, along with many optimizations to make homomorphic evaluation runs faster, focusing mostly on effective use of the Smart-Vercauteren ciphertext packing techniques and the Gentry-Halevi-Smart optimizations.

Although, as Bruce Schneier has observed, it will take a fair while for any of this technology to be scrutinized thoroughly enough by enough knowledgeable people to ensure that it doesn’t have serious flaws, getting algorithms and code out and available for inspection is an essential part of that process.

Update Thursday, May 2, 22:59 EDT

I have changed the headline/title of this post; originally, it was “IBM Releases Homomorphic Encryption Library”; that could be interpreted as describing an “official”  corporate action of IBM.  Since I have no knowledge, one way or another, about this sort of thing, I thought the new headline was less likely to lead to misunderstanding.

