I’ve written here before about the concept of homomorphic encryption, a technique that allows encrypted data to be processed to get an encrypted result; when the result is decrypted, it will be the same as would be obtained using unencrypted data from the beginning. Craig Gentry, of IBM, published a mathematical proof of the concept in 2009, and some trial implementations have been built. Although there is considerable interest in the concept, to date the very large performance penalty associated with the technique has been a problem.
CryptDB, a piece of database software the researchers presented in a paper at the Symposium on Operating System Principles in October, allows users to send queries to an encrypted set of data and get almost any answer they need from it without ever decrypting the stored information, a trick that keeps the info safe from hackers, accidental loss and even snooping administrators.
The paper is available here [PDF].
The MIT group had two key insights. The first was that, although a fully general homomorphic encryption algorithm was very complex, there were existing algorithms that had similar properties for particular operations. (For example, a method called Paillier encryption allows addition of the encrypted data.) The second insight was that standard SQL queries do not require all that many types of operations. Their approach was to try to find a collection of algorithms that, taken together, would allow the necessary operations to be performed.
“The insight we had, the cool idea, is that SQL queries in a database are composed of relatively few types of operations: equal to, less than, summing up, sorting,” says MIT professor of software technology Nickolai Zeldovich. “For each operation, we were able to find an encryption scheme that is quite efficient at computing on encrypted data.”
The CryptDB approach “wraps” the data with layers of encryption, each of which allows different kinds of operations to be performed on the encrypted data. Depending on the processing to be performed, different layers of encryption are removed, but the data is never fully decrypted. Not surprisingly, this does not provide perfect security.
CryptDB has its limits, the MIT researchers warn–no square roots, for one example. And while the data is never completely decrypted, it does “leak” information about the underlying data when enough outer layers of encryption are removed, revealing attributes like which data points are equal to each other.
Still, the method does provide some significant protection, and its limitations did not seem to have too severe an impact on a sample of real data base queries. Further work will certainly be needed to delineate what can and can’t be done with the CryptDB approach, but it does seem to be the most practical method for processing encrypted data developed so far.