Microsoft to Use Hadoop for Big Data

November 20, 2011

It is hardly a secret that, traditionally, Microsoft has not had a particularly friendly view of open-source software, having at various times likened it to communism and, according to Microsoft CEO Steve Ballmer, a cancer.  There is some evidence that Microsoft’s attitude is changing; last fall, for example, Microsoft announced that it was abandoning its Windows Live Spaces blogging platform, and would migrate its users to WordPress.com — that is, of course, the platform I use for this blog.

This past week saw another announcement about Microsoft’s adoption of an open-source technology.  As reported in an article at Wired, Microsoft has decided not only to offer the open-source Hadoop big-data system on its Windows platform, integrating it with future releases of SQL Server, but also to drop development of its own system that does essentially the same thing; the announcement came in a TechNet blog post from Microsoft’s High-Performance Computing team.

Last week, a blog post from Redmond announced that the company would stop development on LINQ to HPC, aka Dryad, a distributed number-crunching platform developed in Microsoft’s Research Lab. Instead, the company will focus on its effort to port Hadoop to its Windows Server operating system and Windows Azure, its online service for building and deploying applications.

Microsoft had announced the planned availability of Hadoop back in October, but now has apparently decided to embrace the platform fully.  Hadoop is designed for the analysis of very large, unstructured data sets; it got its real start at Yahoo!, and is also used by Facebook, Twitter, and eBay, among others.  The Hadoop software project is now under the stewardship of the Apache Software Foundation.  It was originally developed in Java for Linux, so Microsoft will have to port it to the Windows platform.  That work will eventually be fed back into the main Hadoop project.
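
Hadoop programs are typically written to its MapReduce model: a map step that emits key/value pairs, and a reduce step that combines the values for each key.  Although Hadoop itself is written in Java, its Streaming interface lets both steps be written in any language that reads standard input and writes standard output.  The word-count sketch below is purely illustrative; the file name and the hadoop command line are assumptions and will vary from one installation to another.

```python
#!/usr/bin/env python
# wordcount.py -- a minimal mapper/reducer pair for Hadoop Streaming.
# Illustrative invocation (paths and jar names vary by installation):
#   hadoop jar hadoop-streaming.jar \
#       -input /data/text -output /data/counts \
#       -mapper "python wordcount.py map" \
#       -reducer "python wordcount.py reduce" \
#       -file wordcount.py
import sys

def map_stdin():
    # Emit "word<TAB>1" for every word read from standard input.
    for line in sys.stdin:
        for word in line.split():
            print("%s\t1" % word.lower())

def reduce_stdin():
    # Hadoop sorts mapper output by key, so identical words arrive on
    # consecutive lines; sum the count for each run of identical keys.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print("%s\t%d" % (current, total))
            current, total = word, 0
        total += int(count)
    if current is not None:
        print("%s\t%d" % (current, total))

if __name__ == "__main__":
    map_stdin() if sys.argv[1:] == ["map"] else reduce_stdin()
```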

Doug Leland, general manager of product management for SQL Server, told Wired that the company plans to eventually release its work back to the open source community.

These changes probably are something of a bitter pill for Microsoft, since they are a de facto acknowledgement that the company will not be able to control the future of personal computing as it has controlled the past.  But the changes are good news for users, most of whom care more about getting the best tools than they do about where they come from.


Slimming Down Add-Ons

November 19, 2011

Since early this year, the developers at the Mozilla organization have been working on reducing the memory requirements of the Firefox browser, in an effort called MemShrink.  Firefox has had an acknowledged problem with memory leaks; after the browser has been running for a while, it tends to get slower as more and more memory is lost to leaks and becomes unavailable for useful work.  They’ve made a good deal of progress; beginning with Firefox 7.0, released in September, the memory footprint of Firefox has become noticeably smaller, and many of the more serious problems have been fixed.

Now the Firefox team is planning to turn its attention to memory allocation and use by add-ons.  The wide range of add-ons available for Firefox is one of the things that makes the browser attractive to many users; on the other hand, because the add-ons are generally developed by third parties, Mozilla does not control how they work (other than to screen them to eliminate obvious malware).  In a blog post last week, Justin Lebar outlined a new strategy for eliminating memory leaks in add-ons.

Given that add-ons are so frequently fingered as the causes of leaks, and given that so many add-ons leak, my thesis is that we should stop using the fact that we don’t control add-ons’ code as an excuse not to try to fix this situation.

He sees this as a logical extension of the work Mozilla already does to screen out malicious add-ons, and advocates a three-pronged approach, which he calls “the Carrot, the Stick, and the Wrench” (these have been filed as bugs in Bugzilla).

The Carrot (bug 695471) would establish a process under which all new add-ons would be screened for the most common types of memory leaks.  This should help ensure better-behaved add-ons in the future.

The Stick (bug 695481) would identify add-ons with memory leak problems, so that users could make an informed choice before installing them.  As Lebar notes, the process needs to be set up to be fair to add-on developers, but a similar exercise conducted earlier for slow-starting add-ons produced some real improvements.

The Wrench (bug 695348, initially), “used as a tool, not a bludgeon,” is an attempt to provide add-on developers with better diagnostic tools to track down and squash memory allocation bugs.  Probably a range of tools will be needed.

This is a welcome effort to provide developers with the means to identify and fix memory leaks in their add-ons, to give them an incentive to do so, and to empower users to make informed choices.


Top 500, November 2011

November 17, 2011

Since 1993, the TOP500 project has been publishing a semi-annual list of the 500 most powerful computer systems in the world, as a barometer of trends and accomplishments in high-performance computing.   The systems are ranked based on their speed in floating-point operations per second (FLOP/s), measured on the LINPACK benchmark, which involves the solution of a dense system of linear equations.
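
The benchmark’s figure of merit is essentially the operation count of the solve divided by the wall-clock time.  The Python sketch below is only a toy illustration of that idea (using NumPy rather than the official HPL code, and a matrix far smaller than anything on the list), applying the conventional operation count of 2/3·n³ + 2·n² for solving an n×n dense system.

```python
import time
import numpy as np

# Toy illustration of what the Linpack benchmark measures: solve a dense
# n-by-n system A x = b and report floating-point operations per second,
# using the conventional count of 2/3*n^3 + 2*n^2 operations for the solve.
n = 4000                       # tiny by TOP500 standards; keeps the demo quick
A = np.random.rand(n, n)
b = np.random.rand(n)

start = time.time()
x = np.linalg.solve(A, b)
elapsed = time.time() - start

flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
print("%.1f Gflop/s" % (flops / elapsed / 1e9))
```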

This fall’s edition of the list has recently been announced, in conjunction with the SC11 supercomputer conference being held this week in Seattle.   The fastest system is still the Japanese K Computer.

Japan’s “K Computer” maintained its position atop the newest edition of the TOP500 List of the world’s most powerful supercomputers, thanks to a full build-out that makes it four times as powerful as its nearest competitor. Installed at the RIKEN Advanced Institute for Computational Science (AICS) in Kobe, Japan, the K Computer achieved an impressive 10.51 Petaflop/s on the Linpack benchmark using 705,024 SPARC64 processing cores.

The K Computer is the first system to surpass 10 petaflops (1.0 × 10¹⁶ floating point operations per second).  Unlike many large systems, it does not use graphics processors or other special chips as accelerators; it is also one of the more energy-efficient systems on the list.

The second place system is still the Chinese Tianhe-1A system with 2.57 Petaflops.  In fact, the top ten systems have remained in place from June’s ranking.  The entry level for the Top 500 has gone up, though.

In the latest list, the entry level to the list moved up to the 50.9 Teraflop/s mark on the Linpack benchmark, compared to 39.1 Teraflop/s six months ago.

The total performance of all systems on the Top 500 list is now 74.2 petaflops, compared to 58.7 petaflops in the previous survey.

Intel is the leading supplier of processors for these systems; its processors are used in 384 systems.  AMD Opteron processors are used in 63 systems, and IBM POWER processors in 49 systems.  Graphics processors, principally from NVIDIA, are used to accelerate computations in 39 systems.

As one might expect, the K Computer, now fully built out, uses a good deal of electricity, 12.66 megawatts; however, it is one of the most efficient systems, delivering 830 megaflops per watt.  The average system’s efficiency is 282 megaflops / watt, up from 248 megaflops / watt in the last survey.  The most efficient systems are the IBM BlueGene/Q systems, at 2,029 megaflops / watt.
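
The efficiency figure is just sustained Linpack performance divided by power draw; a quick back-of-the-envelope check of the numbers quoted above (in Python, purely for illustration):

```python
# Back-of-the-envelope check of the K Computer efficiency figure.
rmax = 10.51e15        # Linpack result: 10.51 petaflops, in flop/s
power = 12.66e6        # reported power draw: 12.66 megawatts, in watts

print("%.0f megaflops per watt" % (rmax / power / 1e6))   # roughly 830
```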

As has been true for some time, the distribution of operating systems used is rather different from that in the desktop computing market:

OS Family     Systems    Share of List (%)
Linux            457          91.4
Unix              30           6.0
BSD-based          1           0.2
Windows            1           0.2
Mixed             11           2.2

In the early days of computing, supercomputers often used highly specialized components and software, and were built in very small numbers.  Today, although the complete systems are still customized for their intended use, the use of commodity processors and open source software has become the norm.  It is interesting, for example, that the number 42 system on the list is Amazon’s EC2 cluster.  Who would have guessed, even twenty years ago, that a bookseller would have more computing horsepower than Sandia National Laboratories (at number 50)?


Google Releases Chrome 15.0.874.121

November 16, 2011

Google has released a new stable version, 15.0.874.121, of its Chrome browser, for Windows, Linux, Mac OS X, and Chrome Frame.  The new version contains some miscellaneous fixes, a new version of the V8 engine, and a security fix.  More details are available in the announcement on the official Chrome Releases blog.

Windows and Mac users should get the new version via the built-in update mechanism.  Linux users should get the updated package from their distributions’ repositories, using their standard package maintenance tools.

Update Thursday, 17 November, 22:40 EST

Clarification: the V8 engine is the component that interprets and executes JavaScript.  The security fix in this release was for a vulnerability in V8.


More DuQu, Stuxnet Similarities

November 16, 2011

A month or so ago, the first news reports began to surface about a new piece of malware called DuQu.  At the time, there was some suspicion that it had been created by the same group that had created the Stuxnet worm, used to attack centrifuge systems in Iran, based on some similarities in the code.  However, since the amount of information available was limited, this was far from certain.

Now, according to an article posted today at ThreatPost, the security news service from Kaspersky Labs, the gradual accumulation of additional evidence has reinforced the similarities, despite the feeling among researchers that they don’t have the whole DuQu story yet.

Researchers are fairly confident now that whoever wrote the Duqu malware also was involved in some way in developing the Stuxnet worm. They’re also confident that they have not yet identified all of the individual components of Duqu, meaning that there are potentially some other capabilities that haven’t been documented yet.

DuQu has been mentioned in the industry press fairly often, and I’ve talked about it here, but it is not particularly widespread.  It has been introduced in a very deliberate, targeted way.  Kaspersky Labs estimates there may be something on the order of fifty infections world-wide, a far cry from some of the “mass market” malware we have seen.   DuQu attacks have been directed at specific targets; different attacks use different encryption schemes, and employ different malware components.  All of this suggests that the people or organization responsible are skilled and well-organized, just as with Stuxnet.

Once again, we are reminded that the malware game has changed a lot since the early days of the Internet.  The attackers are no longer socially- and hygienically-challenged adolescents, but organized crime operations, and perhaps governments.


Malware Signed with Stolen Certificate

November 15, 2011

We are all familiar with the use of secure connections by our Web browsers, which encrypt communications between the browser and the server, in order to prevent eavesdroppers from intercepting confidential information.  (Your browser will indicate a secure session by highlighting the domain name in the URL bar, or with a little padlock icon.)  The SSL/TLS mechanism for establishing these secure connections depends on an infrastructure of cryptographic certificates, issued by a network of Certificate Authorities (CAs).  We saw, back in September, how the compromise of a single CA, DigiNotar in the Netherlands, could create a large headache; there is also some suspicion that the whole CA infrastructure has fundamental problems.
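
From a client’s point of view, the trust machinery works roughly as sketched below: the browser (or, here, a short Python script using the standard ssl module, written purely for illustration) opens a TLS connection, validates the certificate chain the server presents against a local bundle of trusted CA certificates, and only then trusts the identity in the certificate.  The host name is just an example.

```python
import socket
import ssl

# Open a TLS connection, let the platform's trusted-CA bundle validate the
# certificate chain the server presents, and show who vouched for the site.
host = "www.example.com"
context = ssl.create_default_context()        # loads the system's trusted CAs
with socket.create_connection((host, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=host) as tls:
        cert = tls.getpeercert()              # parsed, already-validated certificate
        print("subject:", cert["subject"])
        print("issuer: ", cert["issuer"])
        print("expires:", cert["notAfter"])
```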

Yesterday, the ThreatPost security news service from Kaspersky Labs reported that a stolen certificate from the Malaysian government was being used to sign malicious software.

F-Secure researchers claim that malware spreading via malicious PDF files is signed with a valid certificate stolen from the Government of Malaysia, in just the latest evidence that scammers are using gaps in the security of digital certificates to help spread malicious code.

F-Secure identified the malware as the Trojan horse program Agent.DTIW.  It apparently exploits a vulnerability in Adobe Reader 8, and comes embedded in a PDF file signed with the stolen certificate.

The malicious PDF was signed using a valid digital certificate for mardi.gov.my, the Agricultural Research and Development Institute of the Government of Malaysia. According to F-Secure, the Government of Malaysia confirmed that the certificate was legitimate and had been stolen “quite some time ago.”

Stolen and bogus certificates have become more common recently.  In addition to the DigiNotar hack, the Stuxnet worm used stolen certificates to infect its target systems. It is somewhat disturbing, in this case, that the Malaysian government has apparently known for some time that the certificate was stolen, but it seems no action was taken to revoke it.

More details are available in a post on the F-Secure News from the Lab blog.

