Google's fully homomorphic encryption package
Google's fully homomorphic encryption package
Posted Jun 16, 2021 21:43 UTC (Wed) by njwhite (guest, #51848)In reply to: Google's fully homomorphic encryption package by kleptog
Parent article: Google's fully homomorphic encryption package
Can anyone recommend some pointers to learn more about coding within these constraints, which from the "as you can see with graphics cards" comment above I assume are also more common in GPU coding? It's not something I've had to think about before, but sounds interesting.
Posted Jun 17, 2021 8:55 UTC (Thu)
by kleptog (subscriber, #1183)
[Link] (2 responses)
Suppose you have code like:
this can be rewritten as:
Now there's no branches, but you're doing twice the work. But GPUs in particular excel in parallel processing so as long as you have enough hardware it doesn't matter. What you cant do is loops with an unbounded number of iterations, a limited number you can simply unroll. Lookup tables can be converted to big if statements and flattened.
Put another way, you have to turn your program into a mathematical formula, because FHE works based on the structures of mathematical groups. If you can't turn your program into a formula, it's hard you to see how you could calculate it holomorphically.
So it's like quantum computing, good for some tasks, but it won't replace general computing.
Now, crypto algorithms tend to be all made all fixed time for various reasons, which makes them perfect for this kind of construction. 3D graphics is generally long pipelines of various mappings with not much in the way of loops so also work well.
Posted Jun 17, 2021 10:12 UTC (Thu)
by excors (subscriber, #95769)
[Link]
In more modern GPUs, the tricky part is that you write a GPU shader as scalar code but it gets executed as SIMD (with typically around 32 SIMD lanes), and the hardware uses a combination of dynamic flow control and masking. Like in your example "if(a == 0) r = comp1(); else r = comp2();", if the condition is true for all 32 lanes then the GPU will execute comp1() and jump over comp2(). But if the condition is only true for 31 lanes, it has to step through all the instructions of both comp1() and comp2() for all 32 lanes, with a mask register to suppress execution on individual lanes based on the condition. The divergent control flow in 1 out of 32 lanes is doubling the total number of instructions processed for all 32 lanes, so you want to be careful to avoid that if you care about performance.
That masking is not quite the same as manually writing "r = (a==0)*comp1() + (a!=0)*comp2()", because the masking will likely also suppress any memory reads performed by comp1()/comp2() in the disabled lanes. That's good for performance, though obviously not good if your goal is to avoid leaking information. On the other hand, if there are no memory reads and a high likelihood of divergent control flow, then you might get better performance by doing more arithmetic and avoiding the overhead of branches.
Posted Jun 17, 2021 17:18 UTC (Thu)
by smurf (subscriber, #17840)
[Link]
However, the library Google uses doesn't work that way. Instead, it works by basically simulating a boolean hardware circuit that carries out the operation. Computer hardware doesn't have if/then/else – it has boolean and/or/not. So you want a simple 32-bit addition? Fine, just write a loop over a 32-bit field with hardware carry and all that.
This is where the 9-orders-of-magnitude slowdown WRT "real" computer hardware comes from. That's a lot: a one-second calculation would take 30 years. Thus yes it's nice but still a research proof of concept until somebody manages to speed those bit operations up a whole lot *or* somebody manages to implement subtraction and test-for-zero.
I don't have any good references, but the basic idea is straightforward.
Google's fully homomorphic encryption package
if(a == 0)
r = comp1()
else
r = comp2()
r = (a==0)*comp1() + (a!=0)*comp2()
Google's fully homomorphic encryption package
Google's fully homomorphic encryption package