November 19, 2007
This article was contributed by Jaya Kumar
The familiar CRT monitors or backlit LCD screens on our desks continuously
consume power in order to hold an image. Electronic paper (e-paper) is
different: power is only needed to change the image. Just like paper,
e-paper is able to hold the image permanently without consuming any power.
Displays using CRT, backlit LCD, plasma and OLED technologies are all
emissive, meaning that they have to produce the photons that reach the eye.
This implies that they have to compete in brightness with ambient lighting,
which can result in eye strain. E-paper is the opposite: it is reflective,
which makes it possible to read the display using ambient light even in the
brightness of a hot sunny day.
E-paper is referred to as a bistable or non-volatile technology because of
its ability to hold a specific pixel state without power. There are several
variations of e-paper; they differ in terms of which physical mechanism
is used to achieve the non-volatility of the display. These mechanisms
include interferometric
modulation, bi-stable twisted nematic liquid
crystal, cholesteric
liquid crystal, and electrophoretic
phenomena.
Interferometric modulation uses the same principle of light wave
interference that results in the rainbow of colors seen with oil floating on
water. Control of wave interference through bi-stable or multi-stable
micro-electro-mechanical systems (MEMS) is what enables electronic control
of the color of a pixel.
In standard twisted nematic liquid crystal displays (TNLCD), the liquid
crystal is sandwiched between two rubbed polymer orthogonal alignment
layers. Bi-stable twisted nematic implementations such as Zenithal liquid
crystal replace one or both alignment layers with a sub-micron
relief profile that weakens anchoring to the surface and makes it possible
to latch various stable orientations of a liquid crystal pixel using
electrical pulses.
Cholesteric liquid crystal provides the ability to selectively reflect
various ranges of wavelengths of visible light based on the pitch of the
liquid crystal. The pitch can then be electronically controlled to set
various pixel states.
Electrophoresis is the motion of particles suspended in a fluid under the
influence of an electric field. Basically, applying a voltage
pulse causes pigment particles within a solvent solution to move. This
concept is what is used to control whether a pixel appears black, white or a
shade of gray. This article will focus on electrophoretic displays
since they are relatively easy to obtain.
Controllers
Traditional display controllers are interfaced to the host using a bus such
as PCI Express or AGP. These controllers have local framebuffer memory or
sufficient internal line buffering to utilize shared host memory; they
expose their framebuffers through memory mappable regions.
Display servers like Xorg or Xfbdev that utilize the kernel's fbdev
interface expect to be able to mmap() the device framebuffer. The
implication is that a driver implementing only write()/seek() access to
the framebuffer would be of limited use.
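To make that expectation concrete, here is a minimal sketch of the userspace
fbdev pattern that Xfbdev and similar clients rely on; it assumes the display
is exposed as /dev/fb0 and omits error handling:

    #include <fcntl.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/fb.h>

    int main(void)
    {
        struct fb_var_screeninfo var;
        struct fb_fix_screeninfo fix;
        int fd = open("/dev/fb0", O_RDWR);

        /* query the current video mode and the fixed screen parameters */
        ioctl(fd, FBIOGET_VSCREENINFO, &var);
        ioctl(fd, FBIOGET_FSCREENINFO, &fix);

        /* map the framebuffer into the process's address space */
        uint8_t *fb = mmap(NULL, fix.smem_len, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);

        /* draw by writing directly to the mapping: clear the screen */
        memset(fb, 0, (size_t)fix.line_length * var.yres);

        munmap(fb, fix.smem_len);
        close(fd);
        return 0;
    }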
Electrophoretic displays require specialized controllers that are capable of
driving suitable waveforms in order to control the display media. This is
because of subtle issues around the behavior of pigment particles within a
solvent. The controller must drive waveforms that result in fast,
reproducible and optimal movement of pigment particles. These waveforms are
a key factor in minimizing pixel update latency, achieving good contrast and
reducing ghosting effects in the output image. Currently, electrophoretic
display updates are significantly slower than CRT or LCD display updates.
For example, a grayscale update of E-Ink's most recent Vizplex display
material can take up to 740ms. This latency has an effect on how hardware is
interfaced with electrophoretic display controllers and how software should
then interact with the display.
One of the electrophoretic display controllers for which Linux support has
been posted (tarball) is
a controller from E-Ink called Apollo. This controller is
interfaced to the host through 8-bit data and 6-bit control over General
Purpose IO (GPIO) interfaces. The implication of the use of GPIO is that it
is not a memory mappable interface. Each pixel of the framebuffer has to be
wiggled to the controller by turning individual GPIO lines on and off. Display
updates on the Apollo with an E-Ink 6" panel at a resolution of 800x600
and 2 bits of grayscale take between 500ms and 1200ms. Given these
circumstances, one option would have been to implement a userspace library
or support code that performed the GPIO wiggling. However, such an
implementation would forfeit support from Xfbdev and other common
fbdev-compatible applications.
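As a rough illustration of that wiggling, the sketch below bit-bangs a
single byte out over the generic GPIO calls; the EPD_* line numbers and the
latch sequence are made up for illustration and do not reflect the actual
Apollo protocol or the hecubafb driver's board hooks:

    #include <asm/gpio.h>
    #include <linux/types.h>

    #define EPD_D0_GPIO     32      /* first of 8 data lines (hypothetical) */
    #define EPD_DS_GPIO     40      /* data strobe line (hypothetical) */

    /* the lines are assumed to have been requested and configured as
     * outputs at initialization time */
    static void epd_send_byte(u8 byte)
    {
        int i;

        /* drive the 8 data lines with the bits of this byte */
        for (i = 0; i < 8; i++)
            gpio_set_value(EPD_D0_GPIO + i, (byte >> i) & 1);

        /* pulse the strobe so the controller latches the byte */
        gpio_set_value(EPD_DS_GPIO, 1);
        gpio_set_value(EPD_DS_GPIO, 0);
    }

An 800x600 framebuffer at 2 bits per pixel is 120,000 bytes, so each display
update involves pushing on the order of a hundred thousand bytes this way,
on top of the display media's own update latency.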
An early driver implementation has also been posted for an E-Ink controller
named Metronome.
This controller interfaces to the host using the Active
Matrix LCD (AMLCD) bus. AMLCD is a 16-bit data bus used to interface LCDs
with CPUs. Normally, the AMLCD bus is used to transfer video display
data only, but, in the case of the Metronome controller, the host transfers a whole
slew of things including waveform, command and image data. The Metronome
becomes a secondary display controller feeding on the output of the primary
display controller on the host. Since AMLCD is an output-only data path, two
GPIO pins are used to retrieve status from the controller.
Many embedded processors provide a built-in LCD controller (LCDC) that is
compatible with the AMLCD interface. For example, the Xscale pxa255 CPU has
an LCDC that has DMA support and is able to pull data directly from host
memory at specified intervals. This type of capability allows drivers to
remap host memory to form an mmap-able framebuffer. However, the Metronome
controller imposes an additional requirement beyond delivering image data
for each display update. This is the need for a specific display update
command that has to be formed and sent each time the display is to be
updated. This means that the framebuffer driver needs to know when the
framebuffer has been updated. That is not a trivial task because the nature
of a memory-mapped framebuffer is that the driver is not involved in
changes to the buffer; it is
therefore unaware of when the framebuffer has been written to by a userspace
application.
The three problems described so far can therefore be summarized as follows:
- How to memory map a "non memory mappable" IO interface like GPIO.
- How to mitigate the latency associated with display updates.
- How to cheaply detect when userspace has written to a memory mapped
address.
One early solution to problem 3 was to use a timer and perform framebuffer
differencing to detect the changed pixels. The drawbacks of this solution
are that it requires a large amount of redundant memory for the shadow copy
and consumes significant CPU time and memory bandwidth every time the
differencing is done. Both of those resources are scarce on embedded systems,
so that solution was not satisfactory.
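A sketch of that rejected scheme shows where those costs come from; the
fbdiff_* names are hypothetical:

    #include <linux/jiffies.h>
    #include <linux/mm.h>
    #include <linux/string.h>
    #include <linux/timer.h>
    #include <linux/types.h>

    struct fbdiff_par {
        struct timer_list timer;
        u8 *fb_mem;             /* live framebuffer in host memory */
        u8 *shadow;             /* redundant shadow copy of the framebuffer */
        size_t fb_size;
    };

    static void fbdiff_push_page(struct fbdiff_par *par, unsigned long off)
    {
        /* placeholder: send PAGE_SIZE bytes at 'off' out to the panel */
    }

    static void fbdiff_timer_fn(unsigned long data)
    {
        struct fbdiff_par *par = (struct fbdiff_par *)data;
        unsigned long off;

        /* every tick, compare the whole framebuffer against the shadow */
        for (off = 0; off < par->fb_size; off += PAGE_SIZE) {
            if (memcmp(par->shadow + off, par->fb_mem + off, PAGE_SIZE)) {
                memcpy(par->shadow + off, par->fb_mem + off, PAGE_SIZE);
                fbdiff_push_page(par, off);
            }
        }
        mod_timer(&par->timer, jiffies + HZ);
    }

The shadow copy doubles the framebuffer's memory footprint, and the
comparison runs on every tick whether or not anything has changed.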
Deferred IO
Deferred IO is an alternative method of solving these problems. The key concept
behind deferred IO is that one can periodically mark an active page of host
memory as read-only in order to catch writes to it. The way it works is quite
straightforward: page table entries for framebuffer pages in host memory are
initially marked as read-only. When the application first writes to any memory
address that maps to any of those pages, a deferred IO specific page fault
handler is reached. This handler records the touched page, lets the write
proceed by making the page writable again, and schedules a delayed workqueue
job. In the interval before this work executes, the application can continue
to write to that page at no additional cost. When the workqueue task runs,
it marks the page table entry read-only again and then processes the
framebuffer data stored in that page. At that point, the processed data can
be delivered to the device through its native IO interface, which could be
GPIO, AMLCD, USB, or anything else. Since the page was re-marked read-only,
the sequence repeats if the application writes to that page again. This is
somewhat similar
to a writeback cache. Host memory is used as a cache for device memory or
any output destined for the device. The page fault is then used as a
trigger to determine when to actually "writeback" this memory to the device.
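From a driver's point of view, opting into this mechanism amounts to
describing the delay and providing the flush callback; the sketch below
follows the pattern documented in Documentation/fb/deferred_io.txt, with
hypothetical epdfb_* names:

    #include <linux/fb.h>
    #include <linux/jiffies.h>
    #include <linux/list.h>

    static void epdfb_deferred_io(struct fb_info *info,
                                  struct list_head *pagelist)
    {
        /* runs from the delayed workqueue: push the dirtied pages out
         * to the controller over GPIO, the AMLCD bus, USB, ... */
    }

    static struct fb_deferred_io epdfb_defio = {
        .delay          = HZ,   /* wait one second after the first write */
        .deferred_io    = epdfb_deferred_io,
    };

    /* in the probe routine, before register_framebuffer():
     *      info->fbdefio = &epdfb_defio;
     *      fb_deferred_io_init(info);
     * and fb_deferred_io_cleanup(info) when the driver is removed */

The delay field is what gets tuned to the panel's update latency, as
described below.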
This technique solves problem 1 because host memory is used and can therefore
be memory mapped. The output from the application intended for the device is
written to host memory and, unlike hardware-supported memory-mapped IO, is
not transferred to the device for each memory write. Only after the
driver-specified delay has expired is the collected data transferred to the
device. Whether the transfer happens over GPIO or any other mechanism is
transparent to the application and requires no intervention on its part.
The delay between the page fault and the IO is what addresses problem 2. The
application sees only a framebuffer which happens to be in host memory.
Writes to the framebuffer are therefore as fast as writes to any other part
of memory. The display update latency is therefore transparent to the
application. The driver-specified interval should be chosen to match the
latency of the device. For example, if the device has a one-second display
update latency, then a one-second delay is reasonable. A longer delay would
make the display less interactive than it is capable of being, while a
shorter delay would cause host updates to build up faster than the device
can keep up with them. Applications that
require display synchronization primitives could use fsync() or the
FBIO_WAITFORVSYNC ioctl depending on their needs.
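Continuing the userspace sketch shown earlier, and assuming the driver wires
the framebuffer device's fsync() handler to a deferred IO flush, forcing and
waiting for an update could look like this:

    /* fd and fb are the descriptor and mapping from the earlier sketch */
    memset(fb, 0xff, (size_t)fix.line_length * var.yres);  /* draw */
    fsync(fd);      /* returns after the driver's flush has run */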
Problem 3 is solved because the address that caused the page fault is known.
Internally, deferred IO uses the memory management subsystem's
page_mkwrite() callback and page_mkclean() to implement the core of its
functionality. The current deferred IO implementation passes a list of
page structures to the framebuffer driver's deferred IO callback. The driver
can then use page->index to identify which part of the framebuffer was
written to. This provides PAGE_SIZE granularity in identifying the updated
pixels.
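Filling in the body of the callback from the earlier driver sketch, making
use of that granularity might look roughly like this (epdfb_send_region() is
a hypothetical helper):

    #include <linux/mm.h>   /* for struct page and PAGE_SHIFT */

    static void epdfb_deferred_io(struct fb_info *info,
                                  struct list_head *pagelist)
    {
        struct page *page;

        /* walk the pages dirtied since the last flush */
        list_for_each_entry(page, pagelist, lru) {
            /* byte offset of this page within the framebuffer */
            unsigned long offset = page->index << PAGE_SHIFT;

            /* only PAGE_SIZE bytes starting at 'offset' need to be
             * pushed out to the display controller */
            epdfb_send_region(info, offset, PAGE_SIZE);
        }
    }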
Status
This method works fine in common use cases. For example, if one were to run
xpdf and use page-up to flip through pages, then xpdf would render to the
framebuffer in host memory on each page-up. Then, at the end of each
write-induced interval, the driver would deliver the current image to the
display. The effect is that one sees the most recent page on the display
rather than every single page that was flipped through.
This enables the system to be reasonably interactive. Applications like
xclock (an analog clock ticking every second) as well as playback
applications (displaying a slider showing playback position) behave in a
similar fashion.
Deferred IO support was merged into the Linux kernel in 2.6.22;
Documentation/fb/deferred_io.txt contains additional information. The driver
for the Apollo controller was also merged in 2.6.22 and is in
drivers/video/hecubafb.c. The driver for the Metronome controller
is posted
but not yet complete; it also
includes necessary bugfixes for deferred IO.
The current development focus is on the Metronome controller. It is being
tested with a Gumstix Connex board which has an Xscale pxa255 CPU. The
display media that is being used is an E-Ink Vizplex 6" 800x600 panel with 3
bits of grayscale. The metronomefb driver for this controller uses deferred
IO and is still a work-in-progress but it is capable of running Xfbdev. X
clients such as xclock, xeyes, xlogo and xloadimage have been run without
problems. It is not yet clear how to measure framebuffer performance on such
a system, because most display benchmarks use the time for a drawing
operation to complete as the basis for their statistics.
On this system, such a benchmark would be merely measuring time to render to
host memory rather than time to deliver to the actual display. It may be
necessary to develop an alternate method of measuring display system
performance for e-paper displays.
All is not yet perfect.
Rendering that changes only a small number of pixels but crosses multiple
pages because of the framebuffer layout (e.g. a thin vertical image) is
handled with reduced efficiency, because the ratio of changed pixels to the
number of written pages is low.
The architectural weakness of deferred IO is that it depends on the system
having an MMU. It may be possible to implement a similar approach using the
lower level memory protection capabilities that are available on some no-MMU
systems. For example, the Blackfin architecture has a Data Cacheability
Protection Lookaside Buffer (DCPLB) that has notions of read/write
permissions on its entries. This will be an interesting area for future
exploration.
The current implementation only works with framebuffers allocated from
virtual memory. Support needs to be implemented to achieve the same functionality
with memory obtained from kmalloc() or the DMA layer.
There have been suggestions that this technique may be useful in other
areas. One scenario that has been mentioned is optimizing display bandwidth
consumption by switching between DMA and plain memory copies based on the
number of written pages. Another scenario is USB-to-VGA adapters. It may
also be the case that any device connected via a relatively slow bus where
the data flow is primarily output could benefit from a similar approach.
Acknowledgments:
the author is grateful to the E-Ink engineers for their extensive support
and hardware help, and to Peter Zijlstra, Antonino Daplas, Paul Mundt, Geert
Uytterhoeven, Hugh Dickins, James Simmons and others for mm, fbdev, and
general help.