LWN: Comments on "Parallel page rendering with Mozilla Servo"

Parallel page rendering with Mozilla Servo

zlynx — Sat, 27 Jun 2015 23:49:27 +0000

Oddly, this is true of the mid-range and low-end x86_64 chips but not the high-end CPUs.

For example, the Xeon CPUs and the 5960X do not contain an Intel GPU. The AMD FX chips don't have GPUs either. I suppose that they give up the GPU for more cores and cache.

In the future though, I think the high-end CPUs will pick up the GPU as well, just for the features like accelerated video codecs for remote desktop access.

raven667 — Sat, 27 Jun 2015 18:19:19 +0000

Pretty much every x86_64 CPU has a GPU on-die as well, just like ARM SoCs usually do, even if you are using a remote GPU over PCIe to actually drive the displays. If you treat your OpenCL programs as just an extension of your x86_64 programs, like the FPU or SSE, you can use whole sections of the CPU hardware that would otherwise be sitting idle. There is no requirement that you have a discrete GPU.

excors — Sat, 27 Jun 2015 17:04:25 +0000

> Another audience member asked if they had explored other forms of parallelism, such as GPU parallelism. The team had explored it, they replied, but found that as of now, the I/O overhead required to move page contents in and out of the GPU erased all performance gains.

I wonder if they explored it on mobile SoC GPUs, or just discrete GPUs on PCs?

My understanding is that discrete GPUs suffer from high latency (~10usec or more?) for accesses over PCI Express. That latency is negligible if you can copy a very large batch of data into the GPU's VRAM at once and process it all on the GPU, but some tasks can't be easily collected into large batches and will be limited by the latency; and designing data structures to be efficiently copied can be hard (you probably need to pack all the data for one task into as few 4KB pages as possible and avoid pointers).
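
The batching argument above can be made concrete with a rough cost model. This is only a sketch under stated assumptions: the 10 usec per-transfer latency is the guess from the paragraph above, the bandwidth figure is an illustrative round number rather than a measurement, and `transfer_time_us` is a hypothetical helper, not part of any real API.

```python
def transfer_time_us(num_items, item_bytes, batch_size,
                     latency_us=10.0, bandwidth_bytes_per_us=8000.0):
    """Estimate the time to ship num_items to a discrete GPU in batches.

    Assumed model: every batch pays a fixed latency once, plus the time
    to stream its bytes at a fixed bandwidth. Both defaults are
    illustrative assumptions, not measured values.
    """
    num_batches = -(-num_items // batch_size)  # ceiling division
    total_bytes = num_items * item_bytes
    return num_batches * latency_us + total_bytes / bandwidth_bytes_per_us


# 1000 small items sent one at a time: the latency term dominates.
one_by_one = transfer_time_us(1000, 64, batch_size=1)

# The same items sent as a single batch: the latency is paid only once.
single_batch = transfer_time_us(1000, 64, batch_size=1000)
```

Under these assumed numbers the one-at-a-time case is dominated almost entirely by latency, which is the situation described above for tasks that cannot easily be collected into large batches.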

Mobile SoCs can't afford dedicated VRAM, so their GPUs just use system RAM (plus a wide variety of small caches inside the GPU), and that means they are designed with relatively efficient access to RAM (latency <1usec?). There's rarely any need to copy data, since a page of physical memory can be mapped into CPU and GPU at the same time. Some hardware has full cache coherency between the GPU and CPU so you don't even have the flush/invalidate cost.

Modern hardware and compute APIs (OpenCL 2.0+, CUDA, HSA) support shared virtual memory, where the GPU and CPU essentially use the same page tables, so the CPU can construct a complex data structure full of pointers and the GPU can access it directly with no special pointer translation and no copying.
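
For contrast, here is roughly what the "avoid pointers" packing looks like when shared virtual memory is not available. It is a minimal sketch, assuming a simple tree of hypothetical `Node` objects: the tree is flattened breadth-first so that child links become integer indices into one contiguous buffer, which could then be copied in a single transfer. The record layout (`<fii`) is an arbitrary choice for illustration.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical pointer-linked tree node, as the CPU side might build it."""
    value: float
    children: list = field(default_factory=list)


def flatten(root):
    """Flatten a tree breadth-first into (value, first_child, child_count) records.

    Because nodes are visited in breadth-first order, each node's children
    end up in consecutive slots, so one index plus a count replaces the
    child pointers. A first_child of -1 marks a leaf.
    """
    order = [root]
    records = []
    i = 0
    while i < len(order):
        node = order[i]
        first_child = len(order) if node.children else -1
        order.extend(node.children)
        records.append((node.value, first_child, len(node.children)))
        i += 1
    return records


def pack(records):
    """Pack records into one contiguous little-endian byte buffer (12 bytes each)."""
    return b"".join(struct.pack("<fii", v, first, count)
                    for v, first, count in records)


tree = Node(1.0, [Node(2.0, [Node(4.0)]), Node(3.0)])
records = flatten(tree)
buffer = pack(records)
```

With shared virtual memory this flattening step would be unnecessary, since the GPU could follow the original pointers directly.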

In principle, all those features should reduce the cost and difficulty of offloading work onto the GPU. And the GPU typically has much higher FLOPS and higher FLOPS-per-watt than the CPU, so it's worth using it when you can.

In practice I suspect support for those features on mobile devices is currently somewhere between spotty and non-existent. But judging by most phone reviews, web browser benchmarks help sell phones, so a browser that benefits from these new features might encourage the SoC vendors to support the features better, which would be nice...

metajack — Thu, 18 Jun 2015 18:02:16 +0000

It's not that it's one long page of text. It's the floats that kill parallel performance. Wikipedia's sidebar is one big unclosed float.

For contrast, the mobile wikipedia page parallelizes amazingly well, and it's the same text :)

marcH — Thu, 18 Jun 2015 16:25:18 +0000

> but even "worst-case scenario" sites like Wikipedia (which is one long page of text) can be parallelized to some degree

Well, maybe this is partly because Wikipedia, using very few bells and whistles, is also some kind of "best case" for today's single-threaded renderers, no? As in: the type of page that is *already* rendered blazing fast and is the least in need of optimization.

(Fascinating work and great article - thx)

mcatanzaro — Thu, 18 Jun 2015 15:09:28 +0000

I should note that WebKitGTK+ provides API and ABI stability for WebKit (with the exception of the removal of the original WebKit1 API last year, a major one-time event).

Wishing the best of luck to Servo -- it's hard to overstate how impressive it will be to have a rendering engine that's immune to all the most common security vulnerabilities, once it's matured enough for use by a major browser.