Trawling for Trouble

August 13, 2012

Banks have gotten quite a bit of bad press in the last few years, much of it well deserved.  A recent article at Technology Review describes a new type of analytical software that claims to help bank management spot activity that is ethically or legally problematic.  The software, from a company called Digital Reasoning, uses machine learning techniques to look for potential problems in unstructured data, such as E-mails, tweets, and document files.

The software uses statistical models to break down sentences and infer their meaning. This is important because finding warning signs may not be as simple as matching a string of text.

One can see that string matching would probably not do the job; even the dimmest of potential swindlers probably does not put “Proposed Fraudulent Trade” as the Subject: line of his E-mail.  On second thought, though, that may be an unwarranted assumption:

U.S. Senate hearings later revealed that in 2007, before the financial meltdown, Goldman Sachs employees wrote e-mails bragging of selling blatantly terrible investments to clients.

(For that case, though, it is not clear that identifying the problem requires any particularly sophisticated analysis.)
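To make the string-matching point concrete, here is a minimal Python sketch.  The watch list, the sample messages, and the function name are all invented for illustration; none of this reflects how Digital Reasoning's product works.

```python
# Hypothetical watch list and messages, invented purely for illustration.
WATCH_TERMS = ("fraud", "fraudulent", "illegal")

def flag_by_keyword(message, terms=WATCH_TERMS):
    """Return True if any watch term appears in the message (case-insensitive)."""
    text = message.lower()
    return any(term in text for term in terms)

messages = [
    "Subject: Proposed Fraudulent Trade",
    "Let's move this one off the books before the auditors show up.",
    "Reminder: annual fraud-awareness training is on Friday.",
]

# One flag per message, in the same order as the list above.
flags = [flag_by_keyword(m) for m in messages]
for message, flagged in zip(messages, flags):
    print("FLAG" if flagged else "ok  ", message)
```

The naive matcher catches the cartoonishly obvious first message, misses the second (which is the one a compliance officer would actually want to see), and flags the harmless training reminder.  That gap is presumably what the statistical approach is meant to close.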

I haven’t seen Digital Reasoning’s products, and so really can’t comment on them.  But I think this is an interesting example of a general trend: businesses are (belatedly) realizing the potential value of the huge masses of unstructured information that they possess, information that is not captured in standard databases.  Businesses have, of course, had this sort of data for a very long time; for many years, though, it was on sheets of paper spread across hundreds or thousands of filing cabinets, and there was no practical way to get at it.  Now, because it is available in machine-readable form, it can be examined via statistical and artificial intelligence techniques.

The trend has even acquired its own buzzword: “Big Data”.  Google’s search engine is probably the best-known example of the approach.  IBM’s Watson system, which beat human Jeopardy! champions, is another good example.  (Dr. Stephen Wolfram, developer of the Mathematica and Wolfram|Alpha software, discussed the different classes of data in his comments on Watson.)

Historically, business computing has focused on collecting structured data (in relational databases, for example) to be processed using well-defined procedures (payroll, or trade settlement, for example).  The interest in Big Data marks a shift toward a less procedural world view; there is an interesting parallel, I think, in the evolution of machine translation.  The new approach will undoubtedly produce some bogus results, just as Watson came up with a few classic bloopers on Jeopardy!.  Still, it is a fascinating area, and its development may also give us some new insights into how we think.

