User: Password:
|
|
Subscribe / Log in / New account

IBM's homomorphic encryption library

IBM's homomorphic encryption library

Posted May 11, 2013 4:00 UTC (Sat) by ras (subscriber, #33059)
In reply to: IBM's homomorphic encryption library by servilio-ap
Parent article: IBM's homomorphic encryption library

> Can you explain this a bit, please?

Homomorphic encryption doesn't really line up without how we use data stored "in the cloud". We don't as a rule do much in the way computation on it.

Instead, we break down the task like this:

- We have databases that are very good at two things: storing huge amounts of data reliably, and finding it again - ie searching. This has been true for decades, and looks like it will hold true in the cloud era.

- We have powerful CPU's, that ask the storage engine for a small subset of that data, then display and/or modify it and ask the storage engine to save it again.

If you look at this break down in terms what we have to trust - it turns out we have to trust the thing displaying the data to the human. We don't have a choice in that because humans can't do decryption, so the CPU must do it, so it must inevitably will see the plaintext. So in the end we have to trust it will keep that plaintext private.

What about the storage backends? Do we have to trust them? Well until we starting subcontracting storage out to 3rd parties (ie, in marketing speak place our data in the cloud) we didn't have to think about it, as while we had separated out the storage and compute functions for other reasons it was still all owned by us. But now it's not. But we have continued apply the same level of trust as before, and as a consequence for instance, GMail gets to read all your mail even though all it is doing it is storing it. And DropBox is unencrypted, even though there is no need for it to be that way.

Speaking for myself, I would prefer it wasn't that way. Speaking from the viewpoint of someone selling cloud services, they are probably better off if it wasn't that way. I know my government refuses to give them business because they can see into every piece of data given to them, and I hope my insurance company, my doctor, my tax agent and just about everybody else that stores my private data refuses to deal with them for the same reason.

Having said that, I would love to be able to delegate storage of my data to someone like Google. They are so darned good at it, and the price is absurdly cheap.

OK, so I would love Google to be my storage engine. Remember what the storage engine did - it stored and searched. Sending them encrypted stuff so they don't know what is in it is drop dead easy. But searching the data while it remains encrypted so a suitably small subset can be returned to the CPU we must trust - well that is an interesting problem. Finding exact matches - that's easy. But after thinking about it, I think you need more - you need to be able to select all data that sits in a particular range.

There are some systems seem to say they can do this. I have the papers on my laptop, but I haven't got around to reading with the care required yet.

> Wouldn't a range search be a means of decryption?

Yes, to an extent. But remember the purpose of all this searching is to whittle down the data into something that is small enough can be packaged up and sent to a remote CPU for processing. The purpose is not to "find the exact item". The first step would to not allow the bulk of the data to be search at all. For example, for email, just restricting it to date ranges might be sufficient. Certainly just the making the header data searches would almost all searches I do. And you can use tricks - like returning encrypted rubbish even when the search returns nothing, and restricting the precision of the search - say by ensuring you can't reduce the date range to less that 10 emails.

It seems to me a workable compromise could be reached.


(Log in to post comments)

IBM's homomorphic encryption library

Posted May 11, 2013 5:03 UTC (Sat) by dlang (subscriber, #313) [Link]

The problem is, if you can search to see if a record matches some range and get the matches back, a series of searches will let you discover the exact value.

And once you have a search that matches just one record, you can now do searches to discover the exact value of every field in that record

Using your e-mail example, if there are fields that you are willing to let people discover through searches, then have those fields (or copies of them) exist in the database in cleartext and you don't need to jump through any hoops to do your matching on that data, and the rest of the data remains encrypted.

IBM's homomorphic encryption library

Posted May 11, 2013 5:43 UTC (Sat) by ras (subscriber, #33059) [Link]

> The problem is, if you can search to see if a record matches some range and get the matches back, a series of searches will let you discover the exact value.

Not if you do it right. If you say put the dates into buckets, such that say each bucket contains at least 10 emails. If your date range include any item in the bucket you get all of it.

> then have those fields (or copies of them) exist in the database in cleartext and you don't need to jump through any hoops to do your matching on that data

Yeah, well the only reason you would suffer the complexity and speed hit of homomorphic operations is if you paranoid enough to to make revealing dates an unacceptable choice. If you are happy to store the data in plain text then that is always going to be a better choice.

IBM's homomorphic encryption library

Posted May 11, 2013 6:37 UTC (Sat) by dlang (subscriber, #313) [Link]

In theory you are right, in practice, not so much.

You have to _always_ return the same 10 e-mails (what do you do if one of them gets deleted)

the variety of the 10 e-mails is not likely to be very large, so you may not be able to confirm every detail about one particular e-mail, but you can do a pretty good job of confirming every detail about all of the 10 e-mails, even if you can't absolutely know which permutations of details belong to each message (and you can make pretty intelligent guesses)

the bucket size needs to be pretty large, and contain a 'good' mix of data. At that point what was the value of your search again?

IBM's homomorphic encryption library

Posted May 11, 2013 17:39 UTC (Sat) by paulj (subscriber, #341) [Link]

Also, you need to think about what happens when there are multiple ranges supported. Each range search might return a bucket of 10 emails, but their intersection might be just be a smaller number. And that fact of itself might be revealing.

IBM's homomorphic encryption library

Posted May 12, 2013 1:49 UTC (Sun) by dlang (subscriber, #313) [Link]

that's the type of thing I was referring to when I said that you would always have to return the same 10 e-mails as a given bucket.

IBM's homomorphic encryption library

Posted May 12, 2013 7:21 UTC (Sun) by ras (subscriber, #33059) [Link]

> Also, you need to think about what happens when there are multiple ranges supported.

No, again not if you do it right. Lets say we have a bucket, and it ends up covering the range 1111-01-01...1111-02-01. Any query that has any part of its range will return every email in that bucket. There may be no emails that actually match in the bucket. What the attacker gets back is effectively a lump of random data. All he knows is that it always the same random data - not that any of the data actually matches his query. The object of the exercise here is to reduce the amount of data that is transferred to something tolerable. A bucket of 100 email-id's would take 5 KB or so. Combining queries with AND's is out of course, but "in this date range OR sender contains 'ABCDEF'" would be fine, and then the back end must calculate the AND to get "find all emails from ABCDEF in this date range".

You are also assuming the attacker knows what you are querying, which he doesn't. What he knows is you have sent some encrypted data to the engine, the engine is applying some homomorphic operation to the payload and what it has stored to yield a result, and in this case the result always returns the same answer. The whole point of homomorphic operation is no one, including CPU executing the operation, as any idea what it means. So neither the CPU doing the operation nor anyone watching knows you are doing a date range query, let alone for what dates. I presume it would be possible to know the homomorphic operation is a range check of some sort - but that is all they know.

I think you may be confusing research showing how it often possible to get information about one specific person out of data that supposedly has been aggregated to prevent that that. You are right to be nervous about that, but it has no relevance here.

IBM's homomorphic encryption library

Posted May 12, 2013 7:25 UTC (Sun) by dlang (subscriber, #313) [Link]

Actually, I was assuming that the attacker is crafting the query, and can see the result, but not necessarily decrypt it.

IBM's homomorphic encryption library

Posted May 12, 2013 8:23 UTC (Sun) by ras (subscriber, #33059) [Link]

> Actually, I was assuming that the attacker is crafting the query, and can see the result, but not necessarily decrypt it.

We would try to ensure only an authenticate entities could ask queries, but it is reasonable to assume the attacker has been clever enough to figure a way around that. And in particular if the attacker is the storage system itself, it does have a way around it.

However "Craft a query" implies to me know what you are asking. The storage has encrypted data, does some homomorphic operations on the data it has and encrypted data it has been sent - ("the query"). This operation yields unencrypted data - which effectively a "yes/no" to send some data it has stored back.

So yes, in the sense that the server can dream up throw random queries at the database and see what it says - it can "craft queries". But since it has no idea what it is asking it is of limited usefulness.

The trigger for my comment is what these people aiming for - a complete and fast set of homomorphic operations, seems like it is a long, long away away. But we don't need that to change the world. All we need are a set of homomorphic operations for querying a database. And even that turns out to be simpler than it sounds - all you need to implement is a homomorphic range query. That sounds plausible to me, so plausible I've made it a personal hobby project.

IBM's homomorphic encryption library

Posted May 12, 2013 9:57 UTC (Sun) by paulj (subscriber, #341) [Link]

Yes, I was thinking of the difficulties of anonymising data, and worried if the problems there could apply to this. If they don't apply, as you say, then that is good.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds