Google's fully homomorphic encryption package

Posted Jun 16, 2021 21:43 UTC (Wed) by njwhite (guest, #51848)
In reply to: Google's fully homomorphic encryption package by kleptog
Parent article: Google's fully homomorphic encryption package

Indeed, for me as someone who knew nothing about FHE beforehand, the restrictions on only using "data-independent" computations in the paper were quite a shock - up until that point it very much sounded like this transpiler should work out of the box on normal C++ code.

Can anyone recommend some pointers to learn more about coding within these constraints, which from the "as you can see with graphics cards" comment above I assume are also more common in GPU coding? It's not something I've had to think about before, but sounds interesting.

Google's fully homomorphic encryption package

Posted Jun 17, 2021 8:55 UTC (Thu) by kleptog (subscriber, #1183) [Link] (2 responses)

I don't have any good references, but the basic idea is straightforward.

Suppose you have code like:

if(a == 0)
   r = comp1()
else
   r = comp2()

this can be rewritten as:

r = (a==0)*comp1() + (a!=0)*comp2()

Now there's no branches, but you're doing twice the work. But GPUs in particular excel in parallel processing so as long as you have enough hardware it doesn't matter. What you cant do is loops with an unbounded number of iterations, a limited number you can simply unroll. Lookup tables can be converted to big if statements and flattened.

Put another way, you have to turn your program into a mathematical formula, because FHE works based on the structures of mathematical groups. If you can't turn your program into a formula, it's hard you to see how you could calculate it holomorphically. So it's like quantum computing, good for some tasks, but it won't replace general computing.

Now, crypto algorithms tend to be all made all fixed time for various reasons, which makes them perfect for this kind of construction. 3D graphics is generally long pipelines of various mappings with not much in the way of loops so also work well.

Google's fully homomorphic encryption package

Posted Jun 17, 2021 10:12 UTC (Thu) by excors (subscriber, #95769) [Link]

I think that only really applies to ancient GPUs - they've supported dynamic flow control and loops since about 2004 (with Shader Model 3.0), and some rendering techniques make heavy use of that.

In more modern GPUs, the tricky part is that you write a GPU shader as scalar code but it gets executed as SIMD (with typically around 32 SIMD lanes), and the hardware uses a combination of dynamic flow control and masking. Like in your example "if(a == 0) r = comp1(); else r = comp2();", if the condition is true for all 32 lanes then the GPU will execute comp1() and jump over comp2(). But if the condition is only true for 31 lanes, it has to step through all the instructions of both comp1() and comp2() for all 32 lanes, with a mask register to suppress execution on individual lanes based on the condition. The divergent control flow in 1 out of 32 lanes is doubling the total number of instructions processed for all 32 lanes, so you want to be careful to avoid that if you care about performance.

That masking is not quite the same as manually writing "r = (a==0)*comp1() + (a!=0)*comp2()", because the masking will likely also suppress any memory reads performed by comp1()/comp2() in the disabled lanes. That's good for performance, though obviously not good if your goal is to avoid leaking information. On the other hand, if there are no memory reads and a high likelihood of divergent control flow, then you might get better performance by doing more arithmetic and avoiding the overhead of branches.

Google's fully homomorphic encryption package

Posted Jun 17, 2021 17:18 UTC (Thu) by smurf (subscriber, #17840) [Link]

Yeah, they can be rewritten that way, but only if you have a cryptographic comparison-with-zero operator (which doesn't have to exhibit its plaintext value of course). If you can only do addition and multiplication (which is best-of-class for numeric homomorphic encryption systems AFAIK), no dice.

However, the library Google uses doesn't work that way. Instead, it works by basically simulating a boolean hardware circuit that carries out the operation. Computer hardware doesn't have if/then/else – it has boolean and/or/not. So you want a simple 32-bit addition? Fine, just write a loop over a 32-bit field with hardware carry and all that.

This is where the 9-orders-of-magnitude slowdown WRT "real" computer hardware comes from. That's a lot: a one-second calculation would take 30 years. Thus yes it's nice but still a research proof of concept until somebody manages to speed those bit operations up a whole lot *or* somebody manages to implement subtraction and test-for-zero.