
Protected RAMFS

June 24, 2009

This article was contributed by Goldwyn Rodrigues

Many embedded systems have a block of non-volatile RAM (NVRAM) separate from normal system memory. A recent patch posted by Marco Stornelli adds a filesystem for such NVRAM devices, which could hold frequently accessed data such as a cellphone's address book. Protected RAMFS (PRAMFS) protects the NVRAM-based filesystem from errant or stray writes to the protected portion of RAM caused by kernel bugs. Because it is stored in NVRAM, the filesystem survives a reboot, so it can also be used to keep important crash information.

Basic Features

PRAMFS is robust in the face of errant writes to the protected area, which could arise from kernel bugs. The page table entries (PTEs) that map the backing-store RAM are marked read-only at initialization. A write to the filesystem temporarily marks the PTEs of the pages being written as writable, carries out the write with locks held, and then marks those PTEs read-only again. This confines writes to the filesystem to the window during which the locks are held. Write protection can be disabled with the kernel config option CONFIG_PRAMFS_NOWP.

PRAMFS forces all files to use direct I/O: O_DIRECT is set in filp->f_flags whenever a file is opened. Opening every file with O_DIRECT bypasses the page cache, so data is written immediately to the storage device. Because that device is itself RAM, this is nearly as fast as writing to system RAM, but it forces applications to do block-aligned I/O.

PRAMFS has no recovery facilities, such as journaling, to survive a crash or power failure during a write operation. The filesystem maintains checksums for the superblock and inodes to check the validity of the stored objects. An inode whose checksum does not match is marked bad, so a power failure in the middle of a write can lead to data loss.

PRAMFS also supports execute in place (XIP), a technique that runs programs directly from storage instead of first copying them into RAM. XIP makes sense for a RAM-based filesystem: the system can execute from the storage device as fast as from system RAM, and no duplicate copy is made in RAM.

Usage

There is no mkfs utility for PRAMFS; the filesystem is created automatically when it is mounted with the init option. The command to create and mount a PRAMFS is:

    # mount -t pramfs -o physaddr=0x20000000,init=0x2F000,bs=1024 none /mnt/pram

This command creates a filesystem of 0x2F000 bytes, with a block size of 1024 bytes, and locates it at the physical address 0x20000000.

To retrieve an existing filesystem, mount PRAMFS with the same physaddr parameter that was used in the previous mount. Details of the filesystem, such as the block size and filesystem size, are read from the superblock:

    # mount -t pramfs -o physaddr=0x20000000 none /mnt/pram

Other filesystem parameters are:

  • bpi: specifies the bytes-per-inode ratio. For every bpi bytes in the filesystem, an inode is created.

  • N: specifies the number of inodes to allocate in the inode table. If the option is not specified, the bytes-per-inode ratio is used to calculate the number of inodes.

If the init option is not specified, the bs, bpi, and N options are ignored, since this information is read from the existing filesystem. When creating the filesystem, if neither N nor bpi is given, 5% of the filesystem space is used for the inode table by default.

To test the memory protection of PRAMFS, the developers have written a kernel module that deliberately writes into the PRAMFS memory in an attempt to corrupt it. The write causes a kernel protection fault, and after a reboot you can re-mount the filesystem and confirm that the test module was unable to corrupt it.

Filesystem Layout

PRAMFS has a simple layout: the superblock occupies the first 128 bytes of the RAM block, followed by the inode table, the block usage map, and finally the data blocks. The superblock contains all of the information needed to remount the filesystem, such as the filesystem size and block size.

[PRAMFS layout]

The inode table holds the filesystem's inodes; their number is computed when the filesystem is initialized. Each inode is 128 bytes long. Directory entry information, such as the filename and the owning inode, is contained within the inode itself. This is a problem for hard links, which require two directory entries, possibly in different directories, for the same inode; hence PRAMFS does not support hard links. The inode format also limits filenames to 48 characters. The inode number is the absolute offset of the inode from the beginning of the filesystem.

Regular PRAMFS file inodes contain the i_type.reg.row_block field, which points to a block holding the first level of the file's doubly-indirect block pointers. This is similar to the double indirect block field of the ext2 inode. It also means that even a file smaller than one block requires three blocks: the top-level pointer block, one second-level pointer block, and the data block itself.

[PRAMFS inode]
Inodes within a directory are linked together in a doubly-linked list. The directory inode stores the first and last inodes in the list; the previous pointer of the first inode and the next pointer of the last inode are null.

Write Protection

PRAMFS uses the system's paging unit by initially mapping its RAM read-only. A write to a data object first marks the corresponding page table entries as writable, performs the write, and then marks them read-only again. This sequence is made atomic by holding the page-table spinlock with interrupts disabled. After the write, stale entries in the TLB are flushed. Write locks are held at the superblock, inode, or block level, depending on the granularity of the modification.

Since PRAMFS aims to prevent filesystem corruption caused by kernel bugs, shared mmap() regions can only be read: dirty pages in the page cache cannot be written back to the filesystem. For this reason, PRAMFS defines only the readpage() member of struct address_space_operations; writepage() is left NULL.

Acceptance

This is the second attempt to get PRAMFS into the mainline. The previous attempt was made in 2004 by Steve Longerbeam of MontaVista.

The PRAMFS home page describes the filesystem as fully featured, but in the linux-kernel discussion Henrique de Moraes Holschuh strongly disagreed:

It is not full-featured if it doesn't have support for hardlinks, security labels, extended attributes, etc. Please call it a specialized filesystem instead, that seems to be much more in line with the comments about pramfs use cases in this thread...

There is not yet enough benchmark data comparing PRAMFS with other filesystems to form an opinion. Bonnie tests run while XIP support was being added showed write throughput as low as 13MB/s for per-character writes and 35MB/s for block writes. Pavel Machek considers these numbers to be pretty low, especially for a RAM-based filesystem:

Even on real embedded hardware you should get better than 13MB/sec writing to _RAM_. I guess something is seriously wrong with pramfs.

No tests have been run with existing solutions, such as a ramdisk on the same hardware, to give an apples-to-apples comparison. The low performance is attributed to excessive locking on writes. Pavel believes the PRAMFS developers are unclear about the goals of the filesystem: whether they are designing for speed, completeness, or robustness.

PRAMFS is a niche filesystem, aimed mostly at embedded devices with NVRAM, and it lacks important features such as hard links and writable shared mmap()s. For quite a number of situations, however, an entire filesystem seems like overkill. Pavel suggests that a special NVRAM-based block device running a traditional filesystem, or a filesystem derived from existing solid-state device (SSD) filesystems, would be a better option. With the current number of objections, PRAMFS is unlikely to go into the mainline. However, Marco plans to improve the code further with more features, and to update the PRAMFS home page to better reflect the filesystem's goals.



Why yet another filesystem?

Posted Jun 25, 2009 11:37 UTC (Thu) by epa (subscriber, #39769)

It's not clear why this is a whole new filesystem; surely the question of protecting a RAM device against kernel crashes is orthogonal to how the data is laid out on that device. If blocks need to be written, the device driver can lock and unlock the pages as needed. Similarly, checksumming each block could be done in a device driver layer below the filesystem (so the filesystem might see 4096 byte blocks, although they are stored in memory as 5000 bytes to allow checksums). Even if you do need the filesystem to do some special checking, why not start with an existing one such as minixfs?

These are all ignorant questions but the article doesn't have much rationale for why the new filesystem is needed.

Why yet another filesystem?

Posted Jun 27, 2009 3:04 UTC (Sat) by giraffedata (subscriber, #1954)

> It's not clear why this is a whole new filesystem; surely the question of protecting a RAM device against kernel crashes is orthogonal to how the data is laid out on that device.

I think what you're saying is that this could be done as a whole new block device type with an existing block-device-based filesystem type instead of as a whole new filesystem type.

I believe pramfs recognizes that block devices are appropriate for disk drives and any other use is a stretch. Over the years, people have used block devices for things other than disk drives -- essentially emulating disk drives -- in order to leverage existing filesystem code intended for disk drives. But if you're willing to write the filesystem code, you can get a better result without the emulation.

Indeed, we used to have traditional disk filesystems on a ramdisk; now we prefer ramfs.

Protected RAMFS

Posted Jun 25, 2009 15:01 UTC (Thu) by nix (subscriber, #2304)

With all those pagetable changes and TLB flushes I'm surprised that performance is as high as 13Mb/s. 26 million pagetable changes per second is a hell of a lot more than I thought we could manage even on a fast system.

Protected RAMFS

Posted Jun 25, 2009 15:09 UTC (Thu) by jake (editor, #205)

> 26 million pagetable changes per second

I don't think it does 2 page table changes per *byte*, presumably per page, which is still rather a lot.

jake

Protected RAMFS

Posted Jun 25, 2009 19:05 UTC (Thu) by nix (subscriber, #2304)

Ah, right: I was stuck behind a stupid corporate firewall at the time so couldn't check the code. I was assuming a pagetable change on every write(), but I guess they could amortize it to some extent.

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds