LWN.net Logo

IBM's homomorphic encryption library

By Nathan Willis
May 8, 2013

IBM has released a new GPL-licensed encryption library called HElib which boasts the property of being fully homomorphic. This means that the encrypted message can be manipulated (subject to certain constraints), and the transformations applied to it will propagate correctly to the plaintext message. For example, given a number n and its encrypted "ciphertext" E(n), a homomorphic encryption scheme might allow you to multiply by a constant, and decrypting 42 * E(n) would give you 42 * n.

Exactly which operations preserve this property varies with the encryption scheme used. Schemes which are partially homomorphic preserve the homomorphic property only for addition or for multiplication, but not both. A fully homomorphic scheme preserves computation for both addition and multiplication, which is considerably more powerful. Such a system also preserves the useful algebraic properties of a ring:

  • The associative property of addition: (a + b) + c = a + (b + c)
  • The commutative property of addition: a + b = b + a
  • The associative property of multiplication: (a * b) * c = a * (b * c)
  • The distributive property: a * (b + c) = (a * b) + (a * c)

In short, with a fully homomorphic encryption scheme, you can perform a wide variety of computations on a file without decrypting it first. Consequently, untrusted programs can operate on encrypted data without risking the disclosure of its information. Remote servers could adjust financial documents without knowing their contents (for example, crediting an account, or computing sales tax on a purchase), voting machines could track totals without disclosing what the intermediate counts are, sensitive databases can be recklessly stored on public web sites, and so on (well, perhaps not that final example). There are several well-known cryptosystems which are partially homomorphic; for example, RSA encryption is homomorphic over multiplication only. But only recently have there been practical systems that are fully homomorphic.

The downside is that the few known fully homomorphic systems are considerably slower than are partially homomorphic schemes. The very first was described in 2009 by Craig Gentry. It used lattice-based cryptography, but it was not particularly practical, because the complexity of the operations needed scaled exponentially with the degree of attack-resistance desired. Just as importantly, the tendency is for ciphertext in a homomorphic encryption scheme to get longer (often drastically longer) every time additional operations are applied to it. Thus, the scheme was inefficient in space as well as in time. Gentry's original method also involved setting up an initial lattice that is "somewhat" homomorphic (meaning that it is very limited in the functions that it can be used to evaluate), then "bootstrapping" it by iteratively improving it. Subsequent advances allowed faster computation of Gentry's original method, eliminating bootstrap stages and replacing some of the lattice-based operations with integer math.

HElib implements Brakerski-Gentry-Vaikuntanathan (BGV) encryption, first published in 2011 by Zvika Brakerski, Gentry, and Vinod Vaikuntanathan. It too is a derivative of Gentry's work, building on several intermediate schemes devised during the years in between. HElib also incorporates a number of speed optimizations, including single instruction, multiple data (SIMD) operations to parallelize some encryption steps, and other methods for keeping the ciphertext as compact as possible (thus making it faster to process). For a bit of real-world comparison, one commenter on the Hacker News coverage of the release looked at the results described in one of the optimization papers and estimated that computing AES-128 rounds was about 10 billion times slower in the homomorphic system than when using the straightforward mathematical approach.

HElib is written in C++ using the NTL mathematical library. Currently, one does need to have a good understanding of homomorphic encryption to make much use of HElib—one does not simply pipe a text file to it and drag the result into a spreadsheet cell. So its importance is more about the impact that will be felt in the long term than it is about the prospect of drag-and-drop homomorphic encryption for the average user.

In the future, homomorphic storage might make sense in databases or in a number of different places where a form of "write only" access is desirable. An example is the voting-machine scenario mentioned above. The voting machine's tally for each candidate can be encrypted, so no intruder can read the result. But each voter's ballot can add an encrypted 1 to their selection and an encrypted 0 to each of the others. The voter could theoretically decrypt his or her ballot and verify its contents, but no one else can; the count in the machine is still incremented correctly, without ever needing to be in possession of the ballot in plaintext form. The same approach could be applied to secure event logging, message routing, or any number of other tasks.

Even if practical applications remain years away, it is a big win that HElib is being made available to the public under the GPL, rather than debuting in a proprietary tool. For decades, homomorphic encryption was not even known to be possible; now there is considerable work still remaining in order to develop a practical tool (as well as scrutiny by other cryptographers to analyze the scheme for overlooked weaknesses)—but there is also working code where before there was not.


(Log in to post comments)

"homomorphic"

Posted May 9, 2013 12:13 UTC (Thu) by smitty_one_each (subscriber, #28989) [Link]

Took about three reads of the title to figure out that there was no bigotry afoot. ;-)

"homomorphic"

Posted May 9, 2013 13:49 UTC (Thu) by kpfleming (subscriber, #23250) [Link]

How do I calculate what "10 billion times slower" actually means? Is there some reference for which AES-128 calculations are already "slower" which I can multiply by 10 billion?

"homomorphic"

Posted May 9, 2013 17:36 UTC (Thu) by patrick_g (subscriber, #44470) [Link]

AES encryption at 19 seconds per byte on a 18MB cache, 256GB RAM machine ?
Oh my God !

"homomorphic"

Posted May 9, 2013 17:46 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

IBM motto: "It may be slow, but it's hard to use!"

"homomorphic"

Posted May 9, 2013 21:37 UTC (Thu) by nix (subscriber, #2304) [Link]

Not yet practical, but it was a *lot* less practical only a few years ago, and not known to be possible at all before that. It is clearly on the up and up... and, to be honest, this is one of those advances that to me looks like magic technology that has dropped through a wormhole from the 25th century. If it's not useful until then, I won't be particularly surprised. :)

"homomorphic"

Posted May 10, 2013 20:25 UTC (Fri) by felixfix (subscriber, #242) [Link]

Quick answer: Most processors are a few billion cycles per second. Regardless of how many cores or how many instructions per cycle, the first rough answer is anything which took one cycle before would take one second at the stated ratio.

IBM's homomorphic encryption library

Posted May 10, 2013 12:01 UTC (Fri) by ras (subscriber, #33059) [Link]

Thank you Nathan. I saw the IBM PR release, but I missed them releasing it as a GPL library.

A working homomorphic system will change the world. It's a bit of pity they are concentrating on mathematical operations like addition and multiplication. The only thing you need to change the world is range searches. Perhaps the traditional operations can be translated into that.

I guess you've set my homework assignment.

IBM's homomorphic encryption library

Posted May 10, 2013 14:32 UTC (Fri) by servilio-ap (subscriber, #56287) [Link]

The only thing you need to change the world is range searches.

Can you explain this a bit, please?

IBM's homomorphic encryption library

Posted May 11, 2013 4:00 UTC (Sat) by ras (subscriber, #33059) [Link]

> Can you explain this a bit, please?

Homomorphic encryption doesn't really line up without how we use data stored "in the cloud". We don't as a rule do much in the way computation on it.

Instead, we break down the task like this:

- We have databases that are very good at two things: storing huge amounts of data reliably, and finding it again - ie searching. This has been true for decades, and looks like it will hold true in the cloud era.

- We have powerful CPU's, that ask the storage engine for a small subset of that data, then display and/or modify it and ask the storage engine to save it again.

If you look at this break down in terms what we have to trust - it turns out we have to trust the thing displaying the data to the human. We don't have a choice in that because humans can't do decryption, so the CPU must do it, so it must inevitably will see the plaintext. So in the end we have to trust it will keep that plaintext private.

What about the storage backends? Do we have to trust them? Well until we starting subcontracting storage out to 3rd parties (ie, in marketing speak place our data in the cloud) we didn't have to think about it, as while we had separated out the storage and compute functions for other reasons it was still all owned by us. But now it's not. But we have continued apply the same level of trust as before, and as a consequence for instance, GMail gets to read all your mail even though all it is doing it is storing it. And DropBox is unencrypted, even though there is no need for it to be that way.

Speaking for myself, I would prefer it wasn't that way. Speaking from the viewpoint of someone selling cloud services, they are probably better off if it wasn't that way. I know my government refuses to give them business because they can see into every piece of data given to them, and I hope my insurance company, my doctor, my tax agent and just about everybody else that stores my private data refuses to deal with them for the same reason.

Having said that, I would love to be able to delegate storage of my data to someone like Google. They are so darned good at it, and the price is absurdly cheap.

OK, so I would love Google to be my storage engine. Remember what the storage engine did - it stored and searched. Sending them encrypted stuff so they don't know what is in it is drop dead easy. But searching the data while it remains encrypted so a suitably small subset can be returned to the CPU we must trust - well that is an interesting problem. Finding exact matches - that's easy. But after thinking about it, I think you need more - you need to be able to select all data that sits in a particular range.

There are some systems seem to say they can do this. I have the papers on my laptop, but I haven't got around to reading with the care required yet.

> Wouldn't a range search be a means of decryption?

Yes, to an extent. But remember the purpose of all this searching is to whittle down the data into something that is small enough can be packaged up and sent to a remote CPU for processing. The purpose is not to "find the exact item". The first step would to not allow the bulk of the data to be search at all. For example, for email, just restricting it to date ranges might be sufficient. Certainly just the making the header data searches would almost all searches I do. And you can use tricks - like returning encrypted rubbish even when the search returns nothing, and restricting the precision of the search - say by ensuring you can't reduce the date range to less that 10 emails.

It seems to me a workable compromise could be reached.

IBM's homomorphic encryption library

Posted May 11, 2013 5:03 UTC (Sat) by dlang (✭ supporter ✭, #313) [Link]

The problem is, if you can search to see if a record matches some range and get the matches back, a series of searches will let you discover the exact value.

And once you have a search that matches just one record, you can now do searches to discover the exact value of every field in that record

Using your e-mail example, if there are fields that you are willing to let people discover through searches, then have those fields (or copies of them) exist in the database in cleartext and you don't need to jump through any hoops to do your matching on that data, and the rest of the data remains encrypted.

IBM's homomorphic encryption library

Posted May 11, 2013 5:43 UTC (Sat) by ras (subscriber, #33059) [Link]

> The problem is, if you can search to see if a record matches some range and get the matches back, a series of searches will let you discover the exact value.

Not if you do it right. If you say put the dates into buckets, such that say each bucket contains at least 10 emails. If your date range include any item in the bucket you get all of it.

> then have those fields (or copies of them) exist in the database in cleartext and you don't need to jump through any hoops to do your matching on that data

Yeah, well the only reason you would suffer the complexity and speed hit of homomorphic operations is if you paranoid enough to to make revealing dates an unacceptable choice. If you are happy to store the data in plain text then that is always going to be a better choice.

IBM's homomorphic encryption library

Posted May 11, 2013 6:37 UTC (Sat) by dlang (✭ supporter ✭, #313) [Link]

In theory you are right, in practice, not so much.

You have to _always_ return the same 10 e-mails (what do you do if one of them gets deleted)

the variety of the 10 e-mails is not likely to be very large, so you may not be able to confirm every detail about one particular e-mail, but you can do a pretty good job of confirming every detail about all of the 10 e-mails, even if you can't absolutely know which permutations of details belong to each message (and you can make pretty intelligent guesses)

the bucket size needs to be pretty large, and contain a 'good' mix of data. At that point what was the value of your search again?

IBM's homomorphic encryption library

Posted May 11, 2013 17:39 UTC (Sat) by paulj (subscriber, #341) [Link]

Also, you need to think about what happens when there are multiple ranges supported. Each range search might return a bucket of 10 emails, but their intersection might be just be a smaller number. And that fact of itself might be revealing.

IBM's homomorphic encryption library

Posted May 12, 2013 1:49 UTC (Sun) by dlang (✭ supporter ✭, #313) [Link]

that's the type of thing I was referring to when I said that you would always have to return the same 10 e-mails as a given bucket.

IBM's homomorphic encryption library

Posted May 12, 2013 7:21 UTC (Sun) by ras (subscriber, #33059) [Link]

> Also, you need to think about what happens when there are multiple ranges supported.

No, again not if you do it right. Lets say we have a bucket, and it ends up covering the range 1111-01-01...1111-02-01. Any query that has any part of its range will return every email in that bucket. There may be no emails that actually match in the bucket. What the attacker gets back is effectively a lump of random data. All he knows is that it always the same random data - not that any of the data actually matches his query. The object of the exercise here is to reduce the amount of data that is transferred to something tolerable. A bucket of 100 email-id's would take 5 KB or so. Combining queries with AND's is out of course, but "in this date range OR sender contains 'ABCDEF'" would be fine, and then the back end must calculate the AND to get "find all emails from ABCDEF in this date range".

You are also assuming the attacker knows what you are querying, which he doesn't. What he knows is you have sent some encrypted data to the engine, the engine is applying some homomorphic operation to the payload and what it has stored to yield a result, and in this case the result always returns the same answer. The whole point of homomorphic operation is no one, including CPU executing the operation, as any idea what it means. So neither the CPU doing the operation nor anyone watching knows you are doing a date range query, let alone for what dates. I presume it would be possible to know the homomorphic operation is a range check of some sort - but that is all they know.

I think you may be confusing research showing how it often possible to get information about one specific person out of data that supposedly has been aggregated to prevent that that. You are right to be nervous about that, but it has no relevance here.

IBM's homomorphic encryption library

Posted May 12, 2013 7:25 UTC (Sun) by dlang (✭ supporter ✭, #313) [Link]

Actually, I was assuming that the attacker is crafting the query, and can see the result, but not necessarily decrypt it.

IBM's homomorphic encryption library

Posted May 12, 2013 8:23 UTC (Sun) by ras (subscriber, #33059) [Link]

> Actually, I was assuming that the attacker is crafting the query, and can see the result, but not necessarily decrypt it.

We would try to ensure only an authenticate entities could ask queries, but it is reasonable to assume the attacker has been clever enough to figure a way around that. And in particular if the attacker is the storage system itself, it does have a way around it.

However "Craft a query" implies to me know what you are asking. The storage has encrypted data, does some homomorphic operations on the data it has and encrypted data it has been sent - ("the query"). This operation yields unencrypted data - which effectively a "yes/no" to send some data it has stored back.

So yes, in the sense that the server can dream up throw random queries at the database and see what it says - it can "craft queries". But since it has no idea what it is asking it is of limited usefulness.

The trigger for my comment is what these people aiming for - a complete and fast set of homomorphic operations, seems like it is a long, long away away. But we don't need that to change the world. All we need are a set of homomorphic operations for querying a database. And even that turns out to be simpler than it sounds - all you need to implement is a homomorphic range query. That sounds plausible to me, so plausible I've made it a personal hobby project.

IBM's homomorphic encryption library

Posted May 12, 2013 9:57 UTC (Sun) by paulj (subscriber, #341) [Link]

Yes, I was thinking of the difficulties of anonymising data, and worried if the problems there could apply to this. If they don't apply, as you say, then that is good.

IBM's homomorphic encryption library

Posted May 10, 2013 20:21 UTC (Fri) by felixfix (subscriber, #242) [Link]

"The only thing you need to change the world is range searches."

Wouldn't a range search be a means of decryption? Keep narrowing it down until you have something useful. Even a partial search like SQL provides would be dangerous, seems to me. Only a full match search wouldn't disclose much.

IBM's homomorphic encryption library

Posted May 14, 2013 12:27 UTC (Tue) by ortalo (subscriber, #4654) [Link]

As ras pointed in an earlier comment, the search query itself is made with encrypted parameters. If you can craft that query (meaningfully not at random), it basically means to me you have already broken a lot of the system security. That's the whole point of homomorphic libraries I think: the inputs are encrypted too. So you could theoretically offload some computation to untrusted systems. Well, the idea may well still be far far away from practical use; but like ras, I think too it needs careful studying. BTW, the LAAS lab where I did my PhD pioneered these techniques in the 90s. The overall architecture is still pretty interesting (google for Fragmentation-Redundancy-Scattering , e.g. here).

IBM's homomorphic encryption library

Posted May 10, 2013 14:04 UTC (Fri) by johill (subscriber, #25196) [Link]

Would the voting machine scenario really work?

I don't think even homomorphic cryptosystems can do any useful work with E(n) and E'(m), and even if they could you'd only end up with something like E(E'(n+m)) or something like that?

So I'm not sure I see the application to voting machines.

If you want to keep your vote secret by calculating E(0) or E(1), then you need to know how to calculate E() but maybe not the decryption D(). But then whatever you know as a voter basically has to be public knowledge, so others can calculate E(0) and E(1) and compare to your (encrypted) vote. Or they can just calculate D(E(x)) if D() is public.

If E() on the other hand E() was really parametrized with your key, then you get E_k(), but if k varies for each voter then I don't think the results can be added in a meaningful way?

IBM's homomorphic encryption library

Posted May 20, 2013 14:02 UTC (Mon) by Jan_Zerebecki (guest, #70319) [Link]

As to the usefulness of homomorphic crypto for voting: Verifying Elections with Cryptography by Ben Adida ( http://www.youtube.com/watch?v=ZDnShu5V99s ) explains a cryptographic voting system which uses the homomorphic properties of ElGamal encryption. As far as I remember it ensures everyone can check that the votes are counted correctly but no one can find out what a specific voter voted for.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds