LWN.net Weekly Edition for September 15, 2016
Automating hinting for every script
At TypeCon 2016 in Seattle, Dave Crossland presented a look at the recent advancements in the free-software utility ttfautohint, which has rapidly grown from being a stop-gap measure with plenty of critics into a core part of many font-development workflows—including commercial foundries. The program automatically adds hinting information to fonts by analyzing the shapes of the included glyphs, alleviating the need to add hints manually. Challenges still remain for ttfautohint, though, including expanding the set of writing systems it can handle.
Crossland began with a disclaimer that, although he is known largely for his work with Google Fonts in recent years, ttfautohint is a distinct side project. That said, one of the early challenges faced when building the Google Fonts library was that font hinting is a notoriously complicated, labor-intensive process. The library was growing by more than one font per week in 2011, which left considerably less time than a typical type foundry would have allotted for tasks like hinting.
So the team needed a different solution. Hinting is vital for reading text on screen, and particularly on Windows machines (which, unlike other platforms, never adopted a text-rendering model that would produce eye-pleasing results without relying on hinted fonts). The essence of hinting is the addition of embedded instructions to be read by the font renderer; instructions that keep similar letters the same size (in Latin alphabets, such alignment efforts keep lower-case letters aligned to the x-height, capitals to the capital height, and so on) and that keep glyph strokes aligned with pixel boundaries. PostScript fonts and their successors, Compact Font Format (CFF)-flavored OpenType fonts, rarely use any hinting because the PostScript rendering model takes care of such alignments internally. But TrueType fonts rely on hints that are embedded in the font and use a particular instruction set for the TrueType virtual machine.
But, many years ago, FreeType developer David Turner had developed a workaround for making those fonts look good without using their hints: he wrote an "autofit" module for FreeType that took many of the same concepts from the CFF renderer and applied them to TrueType fonts. It was developed, initially, because FreeType had been threatened over patent-infringement claims about the use of built-in TrueType hints. Autofit let FreeType make TrueType fonts look good on screen, essentially, through its own means.
The patents in question eventually expired, but the autofit workaround gave Crossland an idea: why not take the instructions generated and turn them into actual TrueType hints that could be merged into the font file? In 2011, he posed the question to then FreeType maintainer Werner Lemberg, who took the project on. In a few months there was a working prototype called ttfautohint.
Awkward fit
Although the "Inception-style" approach used by ttfautohint (using CFF rendering analysis of a font to inject TrueType rendering hints that will then be used by a TrueType renderer) seemed like it might be workable, Crossland said, there were some practical problems. First, hinting is usually done as an interactive process: the engineer adds hints, examines the rendered result immediately, then adjusts them. But ttfautohint was a command-line-only program with no interactive features. Second, hinting is ideally done during the design process, starting with a few core glyphs, then branching out as the font grows. But ttfautohint was a one-and-done program run only on a completed font file.
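For readers unfamiliar with the tool, that command-line operation is a single pass over a finished font. Below is a minimal sketch of driving it from Python; it assumes the ttfautohint binary is installed and on the PATH, and the font file names are placeholders.

```python
# A minimal sketch of running the command-line ttfautohint tool from Python.
# It assumes the ttfautohint binary is installed and on the PATH; the font
# file names are placeholders.
import subprocess

def autohint(src, dst):
    """Add autogenerated hints to a finished TrueType font."""
    # ttfautohint takes an input font and an output font as positional
    # arguments; everything else is controlled through optional flags.
    subprocess.run(["ttfautohint", src, dst], check=True)

autohint("MyFont-Regular.ttf", "MyFont-Regular-hinted.ttf")
```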
So the project set up a crowdfunding campaign to fund Lemberg's time, specifically to add a GUI and to further instrument ttfautohint with the controls needed to fine-tune the results. Other new features were targeted in the campaign, too, including hinting families of related fonts and supporting scripts other than Latin. In 2012, a few months after that campaign ended successfully, Google Fonts began using ttfautohint on fonts in its library.
But even then, the reaction from other font projects was not entirely enthusiastic. Göran Söderström famously criticized ttfautohint's output in a 2012 blog post [Wayback link], calling some of the results borderline unreadable and saying "I can’t really see the point of using ttfautohint at all." He also made a point of lamenting the fact that the hints generated by ttfautohint cannot be edited or adjusted; they are simply generated and inserted into the font file.
Work, however, continued. In 2012, Frederik Berlaen of RoboFont designed a Python-based GUI that would fit in with Apple's Mac OS X user-interface guidelines, although Crossland noted that the GUI's complexity has continued to grow and may have "gotten a little bit out of control." Other contributions began to come in from the developers of various font libraries and editors. In 2013, he noted, the project ran into a nasty bug encountered on certain laser printers, where rounding errors in the printer were causing wildly distorted results. The ttfautohint team released a fix.
Stability and extension
In 2014, ttfautohint version 1.0 was released. That version added support for a control file, which designers could use to tweak hints produced by the program—thus answering one of Söderström's criticisms. By the end of 2014, ttfautohint was in use at the commercial foundry ProductionType and was included with the proprietary Glyphs font editor.
In 2015, Lemberg began to look at how to support additional writing systems. He started by taking the Noto fonts by Google, which are open source and were extensively hand-hinted, stripping out all of the hinting, then working to reproduce the original results with ttfautohint.
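As a rough illustration of the first step of that experiment, the sketch below strips TrueType hinting from a font with the fontTools library by clearing per-glyph instruction programs and deleting the font-wide hinting tables. It is an assumption about one workable approach, not the exact procedure Lemberg used for the Noto work.

```python
# A sketch of stripping TrueType hints with fontTools, illustrating the
# "remove the hand hinting, then re-hint" experiment described above.
from fontTools.ttLib import TTFont
from fontTools.ttLib.tables import ttProgram

def dehint(src, dst):
    font = TTFont(src)
    glyf = font["glyf"]
    for name in glyf.keys():
        glyph = glyf[name]
        # Simple and composite glyphs may carry an instruction program.
        if hasattr(glyph, "program"):
            glyph.program = ttProgram.Program()
            glyph.program.fromBytecode(b"")
    # Remove the font-wide hinting tables, if present.
    for tag in ("fpgm", "prep", "cvt "):
        if tag in font:
            del font[tag]
    font.save(dst)

dehint("NotoSans-Regular.ttf", "NotoSans-Regular-dehinted.ttf")
```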
There are now 22 scripts supported by ttfautohint. Crossland referred the audience to the online manual for a better description of how the hinting process works. In a nutshell, ttfautohint first analyzes each font file to automatically determine the width of vertical and horizontal strokes. It does this by examining characters that resemble the Latin letter "o" (there are similarly round characters in most, although not all, writing systems). Each glyph in the font is then examined to locate the stems and each stroke segment that can be adjusted as a single unit, using an approach described in a 2003 paper [PDF] by Turner and Lemberg.
Next, ttfautohint identifies what are called "blue zones" (after terminology used in PostScript), which roughly correspond to the vertical alignment areas of the script—in Latin, as mentioned above, those are the baseline, the x-height, the capital height, and so on. Naturally, each writing system has its own intrinsic set of these blue zones; ttfautohint looks at specific character codes in each script to assess where the blue zones of the font are.
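To make the idea concrete, here is a toy approximation of blue-zone detection for a Latin font using fontTools. The reference glyph names are assumptions about standard naming, and ttfautohint's real, per-script analysis is considerably more careful.

```python
# A toy approximation of blue-zone detection for a Latin font: measure the
# vertical extrema of a few reference glyphs with fontTools. This illustrates
# the concept only, not ttfautohint's actual analysis.
from fontTools.ttLib import TTFont
from fontTools.pens.boundsPen import BoundsPen

def _bounds(glyphset, name):
    pen = BoundsPen(glyphset)
    glyphset[name].draw(pen)
    return pen.bounds            # (xMin, yMin, xMax, yMax) in font units

def latin_blue_zones(path):
    glyphset = TTFont(path).getGlyphSet()

    def top(name):
        return _bounds(glyphset, name)[3]

    def bottom(name):
        return _bounds(glyphset, name)[1]

    return {
        "baseline": bottom("n"),     # flat-bottomed letters sit on the baseline
        "overshoot": bottom("o"),    # round letters dip slightly below it
        "x-height": top("x"),
        "cap-height": top("H"),
        "descender": bottom("p"),
    }

print(latin_blue_zones("MyFont-Regular.ttf"))
```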
Subsequently, ttfautohint will use this information to make sure that hinting adjustments do not push glyph components out of the zones, in theory keeping the entire font well-aligned. Sets of hints are generated to align edges to the blue zones, then to align components to pixel boundaries, then to align serifs. For now, all of ttfautohint's hinting is done only to change vertical alignments, because altering horizontal alignments usually has the unwanted side effect of changing the number of lines of text.
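A simplified model of that ordering, with an invented closeness threshold, might look like the following sketch: an edge is pulled onto a nearby blue zone first and otherwise rounded to the pixel grid.

```python
# A simplified model of the vertical snapping order described above; the
# threshold value and example numbers are invented for illustration.
def snap_edge(y_units, blue_zones, upem, ppem, threshold=8):
    """Snap a vertical edge coordinate (font units) for rendering at ppem."""
    # Pull the edge onto a blue zone if it is close enough to one.
    for zone_y in blue_zones:
        if abs(y_units - zone_y) <= threshold:
            y_units = zone_y
            break
    # Convert to pixels and round to the pixel grid.
    return round(y_units * ppem / upem)

# An x-height edge at 498 units snaps to the 500-unit zone, then lands on a
# whole pixel at 12 ppem in a 1000-unit em.
print(snap_edge(498, blue_zones=[0, 500, 700], upem=1000, ppem=12))  # -> 6
```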
In practice, as one might expect, there is more to the process than this: there are many special cases, and matters get more complicated once composite glyphs (like accented characters) are added to the mix. Adding support for a new writing system, in most cases, begins with the question of identifying the key characters for ttfautohint to examine and the blue zones that are important to the writing system—followed by a great deal of testing.
Acceptance among designers has increased in recent years, too, Crossland said. In March, Ramiro Espinoza announced on the TypeDrawers discussion forum that he found ttfautohint's results to now be comparable to those from Microsoft's interactive hinting program Visual TrueType—and, in some cases, ttfautohint's to be better. Jasper de Waard discovered ttfautohint and used it to build a large font family called Proza Libre, a process he wrote about online.
Wide world of scripts
Looking forward, Crossland identified two areas where ttfautohint needs to improve next. The first is its support for hinting font families. Although a family with a lot of weights will hopefully look harmonious when the weights are used together on a page, a bolder glyph is almost always taller than a lighter-weight version of the same glyph, because the strokes are thicker. Or, at least, that is what happens with lowercase glyphs; capitals tend to stay the same height for a given point size.
![sfdhanautohint](https://static.lwn.net/images/2016/09-typecon-sfdhanautohint.png)
This presents a challenge to ttfautohint's analysis, which tries to align all of the glyphs to a single height. The approach currently being explored is to provide separate controls for uppercase and lowercase blue zones.
The other big issue is adding support for Chinese, Japanese, and Korean (CJK) fonts, which do not follow the patterns used by the writing systems currently supported in ttfautohint. Thus, new algorithms will need to be added. Crossland noted that there is some existing work in this area already, specifically Belleve Invis's sfdhanautohint.
There will, no doubt, be additional challenges in the years to come. But ttfautohint has already come a long way from its genesis as a workaround, so it may not be out of tricks just yet.
Backports and long-term stable kernels
One of the longest running debates in the kernel community has to do with the backporting of patches from newer kernels to older ones. Substantial effort goes into these backports, with the resulting kernels appearing in everything from enterprise distributions to mobile devices. A recent resurgence of this debate on the Kernel Summit discussion list led to no new conclusions, but it does show how the debate has shifted over time.

Anybody wanting to use the kernel in a production setting tends to be torn between two conflicting needs. On one hand, the desire for stability and a lack of surprises argues for the use of an older kernel that has been maintained under a fixes-only policy for some time. But such kernels tend to lack features and hardware support; one needs to run a recent kernel for those. The answer that results from that conflict is backporting: pick a long-term support (LTS) stable kernel as a base, then port back the few things from newer kernels that one simply cannot do without. With luck, this process will produce a kernel that is the best of both worlds. In practice, the results are rather more mixed.
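In practice, that workflow often amounts to little more than cherry-picking selected upstream commits onto an LTS base. The sketch below illustrates the pattern with placeholder branch names and commit IDs, not any real vendor tree.

```python
# A sketch of the "LTS base plus backports" pattern, driven from Python for
# illustration. The base tag, branch name, and commit IDs are placeholders.
import subprocess

LTS_BASE = "v4.4.21"          # hypothetical LTS release to build on
BACKPORTS = [                 # upstream commits considered indispensable
    "0123456789ab",           # placeholder: a driver backport
    "ba9876543210",           # placeholder: a security fix
]

def git(*args):
    """Run a git command in the current kernel tree, stopping on any error."""
    subprocess.run(["git", *args], check=True)

git("checkout", "-b", "vendor/4.4-backports", LTS_BASE)
for commit in BACKPORTS:
    # "-x" appends a "(cherry picked from commit ...)" line, the usual way of
    # recording the upstream origin of a backported patch.
    git("cherry-pick", "-x", commit)
```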
The problems with backporting are numerous. It will never be possible to pick out all of the important patches to port, so these kernels will always lack important fixes, some of which could leave open severe security holes. The mainline kernel benefits from widespread testing, but a backported kernel is a unique beast that certainly contains bugs introduced by the backporting process itself. The effort that goes into the creation of backport-heavy kernels is unavailable for the task of getting vendor changes upstream, with costs to both the vendor and the community as a whole. Users of highly divergent kernels are dependent on their vendor for support and updates; the community lacks the ability to help them. And so on.
Backports in the embedded industry
Alex Shi started the discussion by mentioning Linaro's stable kernel (LSK) backporting effort and asking whether there were ways that groups doing this sort of backporting could collaborate. The development community wasn't much interested in discussing backporting collaboration, though; the conversation turned quickly to the value of backporting efforts in general instead. Sasha Levin got there first with a statement that "what LSK does is just crazy" and a suggestion that, if vendors want the latest features and fixes, the best way to get them is to run mainline kernels. He was not alone in this opinion.
James Bottomley pointed out that the backporting happening in the embedded industry looks a lot like what the enterprise vendors did in the 2.4 kernel era. They ended up with patch sets that were, in some cases, larger than the kernel itself and were a nightmare to maintain. To get away from these issues, the kernel's development model was changed in 2.6 and the distributors focused on getting features upstream prior to shipping them. That has greatly reduced the load of patches they have to carry, allowed them to run newer kernels, and reduced fragmentation in the kernel community. Why, he asked, can't embedded vendors do the same?
From the discussion, it would seem that, while there are many reasons cited for shipping backported kernels, there is one overwhelming issue that keeps vendors stuck on that model: out-of-tree code. A typical kernel found in an embedded or consumer electronics setting has a vast pile of patches applied, the bulk of which have never made their way into a mainline kernel. Every time that a vendor moves to a new base kernel, this out-of-tree code, perhaps millions of lines of it, must be forward-ported. That is a huge effort with risks of its own. It is unsurprising that vendors will tend to delay doing that work as long as possible; if an older kernel can support a new device through the addition of a few more backported drivers, that is what they will do.
The longstanding "upstream first" policy says that these vendors should have pushed their code into the mainline before shipping it in their devices; then they would have no forward-porting issues when moving to a newer kernel. But "upstream first" has never been the rule in this part of the market. These products are designed, built, shipped, and obsoleted on an accelerated schedule; there is no space in that schedule for the process of getting code upstream, even if the process goes quickly — which is not always the case. Upstreaming after shipping can be done, but the vendor's attention has probably moved on to the next product and the business case for getting the code upstream is not always clear. Five or ten years in, when vendors find themselves struggling under millions of lines of out-of-tree code, they might have cause to wish they had worked more with the development community, but the people making these decisions rarely look that far ahead.
There was some talk about how the out-of-tree code problem could be addressed, but with few new solutions. As Linus Walleij noted, the only reliable solution seems to be customers demanding upstream support from their suppliers. He suggested that if Google were ever to make such a requirement for its Nexus Android devices, "then things will happen". Until then, the best that can be done is to continue to talk to and pressure companies and help them to change slowly. Some of this pressure could yet take the form of changes in how stable kernels are managed.
How stable kernels fit in
While much of the conversation talked about the evils of backporting, another branch was focused on the role that stable kernels play in that ecosystem. Vendors naturally gravitate toward kernels with long-term support, including LSK, the Long-Term Support Initiative (LTSI), or the mainline LTS kernels, as the base for their backports, though, as it turns out, they don't use those kernels as their maintainers might wish.
As Tim Bird described, the kernels shipped in devices are often subject to more than one level of backporting. The system-on-chip (SoC) within the device will have been provided with a kernel containing plenty of backported code, but then the integrator who is building a product from that system will have another set of patches to add. The value of initiatives like LSK and LTSI is that they have reduced the number of kernel versions being provided by SoC vendors, making life easier for those doing backports at the integrator level. Projects like LTSI also list upstreaming of vendor code among their goals, and some of that has been done, but their most important role seems to be to serve as a common base for vendor-specific kernel forks.
There was a certain amount of unhappiness with how these long-term-supported kernels are used, though. An LTS kernel like 4.4 will be supported for at least two years; the LSK and LTSI kernels, based on LTS kernels, will have a similar support period. But SoC vendors are not actually making use of that support. Instead, they grab whatever version of the kernel is available at the time and simply stick with it going forward, ignoring any further updates to that kernel. Should a fix that cannot be done without land in the kernel they started from (a highly publicized security fix, for example), the vendors will, naturally, backport it. Product vendors then take a snapshot of the SoC vendor's kernel and ignore any updates from the SoC vendor in a similar manner. This pattern has led developers like Ted Ts'o to question the value of the entire stable-kernel process and suggest, once again, that vendors would be better off just using mainline kernels.
Or, he said, SoC vendors could just start with a mainline release and pick their patches from subsequent releases rather than from the LTS kernel.
Time for a change?
Greg Kroah-Hartman, the maintainer of the long-term support kernels, agreed with this assessment of the situation, noting that even serious security fixes don't find their way through to the kernels shipped by vendors despite being promptly included in the LTS kernels. So he is mulling the idea of stopping the maintenance of the LTS kernels entirely.
Getting to the point where companies might actually see the wisdom of that approach will take some time, he acknowledged, and there will be a great deal of education required. But, he said, he has been talking to people at some vendors in the hope of improving the situation. He closed by saying there might not be a long-term support kernel next year, since it wouldn't be needed. Or, at least, "one has to dream".
In this context, it's interesting to look at this recent post from Mel Gorman, which talks about the problem of performance regressions in newer kernels. The performance hit caused by moving to a newer kernel can often be surprisingly large. It can also be difficult to fix, since it is usually the result of many patches adding a 0.1% cost rather than one or two big mistakes. The work required to get that performance back is significant, and it helps him to understand why vendors in general might be reluctant to move to newer kernels.
If embedded-systems vendors were to come to a similar conclusion, the result could be significant changes in how that part of the market works. The benefits could be huge. The upstream kernel would, hopefully, gain the best of the work that those vendors are carrying out of tree for now; the rest would be replaced with more general solutions that would better serve all users. Kernels shipped with devices would have more features and fewer bugs while being more secure than what is shipped now. It might actually become possible to run mainline kernels on these devices, opening up a range of possibilities from third-party support to simply enabling hobbyist developers to do interesting hacking on them. The considerable energy that goes into backporting now could be directed toward testing and improving the mainline kernel. And so on.
All of this seems like a distant dream at the moment, but our community has made a number of dreams come true over the years. It has also been quite successful in convincing companies that working with the community is the best approach for long-term success with Linux. Perhaps we are getting closer to a point where embedded-systems vendors will be willing to rethink their approach to the Linux kernel and find ways to work more closely with the development community that, in the end, they depend on to produce the software they ship. One does indeed have to dream.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Security: Filesystem images & unprivileged containers; Minijail; New vulnerabilities in libarchive, mysql, webkit2gtk, xen, ...
- Kernel: Exclusive page-frame ownership; TTY slave devices.
- Distributions: BlackArch: a distribution for pen testing; Elementary OS, ...
- Development: Network access during Debian package builds; Vim 8.0; NetBeans and Apache Incubator; Success with interns; ...
- Announcements: ArduPilot and DroneCode, ...