June 24, 2009
This article was contributed by Goldwyn Rodrigues
Many embedded systems have a block of non-volatile RAM (NVRAM)
separate from normal system memory. A recent patch,
posted
by Marco Stornelli, is a filesystem for these kinds of NVRAM
devices, where the device could store frequently accessed data (such as
the address book for a cellphone). Protected RAMFS (PRAMFS) protects the
NVRAM-based filesystem
from errant or stray writes to the protected portion of the RAM caused
by kernel bugs. Because it is stored in the NVRAM, the filesystem can
survive a reboot, and hence can also be used to keep important crash
information.
Basic Features
PRAMFS is robust in the face of errant writes to the protected area, which could
arise due to kernel bugs. The page table entries that map the
backing-store RAM are marked read-only on initialization. Write
operations to the filesystem temporarily mark the pages to be written
as writable, the write operation is carried out with locks held, and
then the pte is marked read-only again. This limits the writes to the
filesystem in the window when the locks are held. The
write-protection feature can be disabled by the kernel config option
CONFIG_PRAMFS_NOWP.
PRAMFS forces all files to use direct-IO. The filp->f_flags
is set to O_DIRECT when the files are opened. Opening all files as
O_DIRECT avoids page caching, and data is written immediately to a
storage device. This is nearly equal to the speed of the system
RAM, but it forces applications to do block-aligned I/O.
PRAMFS does not have recovery facilities, such as journaling, to
survive a crash or power failure during a write operation. The
filesystem maintains checksums for the superblock and inode to check
the validity of the stored object. An inode with an incorrect checksum
is marked as bad, which may lead to data loss in case of power failure
during a write operation.
PRAMFS also supports execute in place
(XIP), which is a technique that executes programs directly from the
storage instead of copying it into RAM. For a RAM filesystem, XIP makes
sense since the system can execute from the storage device as fast as it
can from the system RAM, and it does not make a duplicate copy in RAM.
Usage
There is no mkfs utility to create a PRAMFS. The filesystem is
automatically created when the filesystem is mounted with the
init
option. The command to create and mount a PRAMFS is:
# mount -t pramfs -o physaddr=0x20000000,init=0x2F000,bs=1024 none /mnt/pram
This command creates a filesystem of 0x2F000 bytes, with a block size of
1024 bytes, and locates it
at the physical address 0x20000000.
To retrieve an existing filesystem, mount the PRAMFS with the physaddr
parameter that was used in the previous mount. The details of the
filesystem such as blocksize and filesystem size are read from the
superblock:
# mount -t pramfs -o physaddr=0x20000000 none /mnt/pram
Other filesystem parameters are:
- bpi: specifies the bytes-per-inode ratio. For every
bpi bytes in
the filesystem, an inode is created.
- N: specifies the number of inodes to allocate in the inode
table. If the option is not specified, the bytes-per-inode ratio is
used to calculate the number of inodes.
If the init option is not specified, the bs,
bpi, or N options are ignored
by the mount, since this information is picked up from the existing
filesystem. When creating the filesystem, if no option for the inode
reservation is specified, by default 5% of the filesystem space is
used for the inode table.
To test the memory protection of PRAMFS, the developers
have written a kernel module that attempts to write within the
PRAMFS memory with the intention of corrupting the memory space. This
causes a kernel protection fault, and, after a reboot, you may re-mount
the filesystem to find that the test module was not capable of
corrupting the filesystem.
Filesystem Layout
PRAMFS has a simple layout, with the super-block in the first
128 bytes of the RAM block, followed by the inode table, the block
usage map, and finally the data blocks. The superblock is 128 bytes
long and contains all of the important information, such as filesystem
size, block size, etc., needed to remount the filesystem.
![[PRAMFS layout]](/images/pramfs_layout.png)
The inode table
consists of the inodes required for the filesystem. The number of inodes
are computed when the filesystem is initialized. Each inode is 128
bytes long. Directory entry information, such as filename and owning
inode, are contained within the inode. This presents a problem for
hard links because a hard link requires two directory entries under different
directories for the same inode. Hence, PRAMFS does not support hard
links. The inode format also limits the filename to 48 characters. The inode
number is the absolute offset of that inode from the
beginning of the filesystem.
Regular PRAMFS file inodes contain the i_type.reg.row_block field,
which points to a data block which contains doubly-indirect pointers to the
file's data blocks. This is similar to the double
indirect block field of the ext2 filesystem inode. But, that means that a file
smaller than 1 block will require 3 blocks to store it.
![[PRAMFS inode]](/images/pramfs_inode.png)
Inodes within a directory are linked together in
a doubly-linked list. The directory inode stores the first and last
inode in the directory listing. The previous entry of the first inode
and the next entry of the last inode are null terminated.
Write Protection
PRAMFS utilizes the system's paging unit by mapping its RAM
initially as read-only. Writes to data objects first mark the
corresponding page table entries as writable, perform the write and
then mark them read-only again. This operation is done atomically by
holding the page-table spin-lock with interrupts disabled. Following a
write, stale entries in the system TLB are flushed. Write locks are
held at the superblock, inode, or block level, depending on the
granularity of modification.
Since PRAMFS attempts to avoid filesystem corruption caused because of
kernel bugs, shared mmap() regions can only be read. Dirty pages
in the page
cache cannot be written back to the filesystem. For this reason,
PRAMFS defines only the readpage() member of
struct address_space_operations; the writepage() entry
is declared as NULL.
Acceptance
This is the second attempt to get PRAMFS in the mainline. The
previous attempt was done in
2004 by Steve Longerbeam of Montavista.
The home page of PRAMFS claims
the filesystem to be fully-featured. But, as part of the linux-kernel
discussion, Henrique de Moraes Holschuh strongly disagreed:
It is not full-featured if it doesn't have support for hardlinks,
security labels, extended attributes, etc. Please call it a
specialized filesystem instead, that seems to be much more in line
with the comments about pramfs use cases in this thread...
There are not enough performance benchmarks information against other
filesystems, yet, to form an opinion. Performance tests
done while adding Execute in Place (XIP) reveal a performance as low as
13Mbps for per-character writes and 35Mbps for block writes using bonnie.
Pavel Machek considers these numbers to be
pretty low, especially for a
RAM-based filesystem:
Even on real embedded hardware you should get better than 13MB/sec
writing to _RAM_. I guess something is seriously wrong with pramfs.
No tests have been performed using existing solutions, such as ramdisk
on the same hardware, to compare apples with apples. The low
performance is attributed to the excessive locking done for writes.
Pavel believes the developers of PRAMFS
are confused
regarding the goals of the filesystem, and whether they are designing for
speed, completeness, or robustness.
PRAMFS is a niche filesystem, mostly for embedded devices with NVRAM,
and hence lacks important features, such as hard links and shared
mmap()s. However, for quite a number of situations an entire
filesystem seems like overkill. Pavel suggests a special NVRAM-based block device
with a traditional filesystem or a filesystem based on Solid State Device
(SSD) filesystems would be a better option. With the current number of
objections, PRAMFS is unlikely to go into the mainline. However, Marco plans
to further improve the code with more features, and to update the
PRAMFS homepage to better reflect the filesystem's goals.
(
Log in to post comments)