Searching for things on the Internet has become such a common activity that it is sometimes hard, even for those of us who have been around for a while, to remember that there was a time Before Google (BG). The search refinements introduced by Google and others have turned what was, in times BG, mostly an exercise in frustration into a genuinely useful tool.
Yesterday’s New York Times has an article on a new trend in search and data gathering technology, called sentiment analysis, which attempts to extract information about public opinion from, for example, postings on social networking sites:
An emerging field known as sentiment analysis is taking shape around one of the computer world’s unexplored frontiers: translating the vagaries of human emotion into hard data.
The Web, with its social networking sites, product reviews (including customer reviews on sites like Amazon), blogs, and other user-driven sites, has become an enormous source of news, opinions, and gossip about products and services. Not surprisingly, the sellers of these products and services are interested in knowing what is being said about them.
For many businesses, online opinion has turned into a kind of virtual currency that can make or break a product in the marketplace.
Several start-up firms are attempting to develop a business in supplying this kind of information. They claim that their analysis methods and software are able to troll through a vast quantity of Internet postings, and extract summary data about the public’s reaction to a new product, for example.
This is certainly an interesting idea. To the extent that it provides a way to identify opinion data of interest, it is potentially valuable. For many years, businesses have used clipping services, which employ people to look through published material, such as newspapers and magazines, to find articles relevant to the business. These new services may offer a better or more efficient way to accomplish the same thing.
But I’m fairly skeptical about some of the broader claims for this technology. In essence, what it is claimed to do is:
- Identify relevant postings
- Process and analyze the (natural language) content of these postings
- Assign a “sentiment rating” to the feelings expressed in the posting
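To make the scale of that claim concrete, here is a deliberately naive sketch of the three steps above as a keyword filter plus word-list scorer. This is purely illustrative and is not any vendor's actual method; the keyword set, the sentiment lexicon, and the scoring rule are all assumptions invented for the example.

```python
# A toy version of the claimed pipeline: (1) filter for relevant
# postings, (2) process the text, (3) assign a sentiment rating.
# The word lists below are made-up assumptions, not a real lexicon.
import re

PRODUCT_KEYWORDS = {"heater", "warmth"}           # step 1: relevance filter
POSITIVE = {"great", "cool", "love", "reliable"}  # toy sentiment lexicon
NEGATIVE = {"wicked", "broken", "awful", "noisy"}

def tokenize(text):
    """Step 2 (crudely): lowercase and split into words."""
    return re.findall(r"[a-z']+", text.lower())

def is_relevant(tokens):
    return any(t in PRODUCT_KEYWORDS for t in tokens)

def sentiment_rating(text):
    """Step 3: return a rating in [-1, 1], or None if irrelevant."""
    tokens = tokenize(text)
    if not is_relevant(tokens):
        return None
    rated = [t for t in tokens if t in POSITIVE or t in NEGATIVE]
    if not rated:
        return 0.0
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return score / len(rated)

# A bag-of-words scorer has no notion of idiom: in a review calling a
# heater "wicked cool", the slang "wicked" cancels out "cool" even
# though the reviewer meant unqualified praise.
print(sentiment_rating("This heater is wicked cool, I love it"))
```

Even this trivial sketch shows where the claim gets hard: the relevance filter, the tokenizer, and the lexicon each throw away context, and real systems have to do much better than word-counting to cope with negation, sarcasm, and slang.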
This is biting off quite a considerable chunk to chew. The difficulties and ambiguities inherent in processing natural language are fairly well known, and are a key reason why machine translation is a difficult problem. Furthermore, the correct evaluation of feelings or emotions from written material is also notoriously hard: witness the almost universal advice to new users of E-mail to be careful to avoid ambiguities of emotion or “tone”. (Think, for example, of how a program might evaluate a review of a portable heater, described by a customer as “wicked cool”.) One of the firms marketing this technology says its technology is not perfect, but is “70 to 80 percent accurate”. Even apart from the obvious question, “70 to 80 percent of what?”, I suspect this is to a certain degree wishful thinking.
Now if this technology, whatever its merits or lack thereof, is just used as a new way to sell things, it’s probably something we can learn to live with. But I think there’s a potential darker outcome. It seems to me that this kind of technology might have a lot of superficial appeal to the kind of security folks that are keen on wholesale trolling of people’s communications. After all, many of these same folks are quite keen on polygraph tests, despite the fact that a 2003 report from the National Academy of Sciences said that most of the evidence supporting use of the polygraph was “unreliable, unscientific, and biased”. We really don’t need any more secret “potential terrorist” lists, compiled on the basis of undisclosed evidence.