The libferris virtual filesystem

November 19, 2008

This article was contributed by Ben Martin

The Unix mantra "everything is a file" gives you great flexibility over where you store your data and how information is manipulated and replicated. Unfortunately, many things in Unix and Linux are not files, or are files that you would not want to interact with directly. For example, a PostgreSQL database is ultimately stored in a collection of binary files, though you probably wouldn't want to work with those files yourself. Likewise, instead of storing settings in a collection of tiny files, many applications use XML to store settings in a single file, but then have to deal with parsing XML instead of just reading little files. libferris lets you mount both PostgreSQL and XML and gives you a useful way to interact with the data contained in both as a virtual filesystem.

Other operating systems like Plan 9 pushed the envelope further than Unix, making more things "just a file". Unfortunately, to use Plan 9 you had to abandon your trusty old Unix roots and jump to an entirely new operating system.

I started the libferris virtual filesystem project back in 2001 to push the "everything is a file" concept further while staying on a Linux base. Libferris is a virtual filesystem implemented as a shared library with FUSE bindings. Because FUSE is already in the Linux kernel, you don't have to do any kernel patching to use libferris. Because libferris is a shared library and not in the kernel, it can use other libraries to help it mount data sources like XML, relational databases, and Emacs, to name a few. And as an upshot of being out of the kernel, I can work on letting libferris mount anything I like, no matter how strange it might be, without any third-party approval.

There are actually two ways to use libferris -- through a native C++ interface, and through the normal Unix APIs with FUSE. The FUSE interface is very useful if you want to rsync(1) some structured information from an XML file into a PostgreSQL database: just mount them both with FUSE and rsync away. Two other interesting things you can do with the FUSE interface are exposing data as a virtual office document, using XSLT stylesheets that libferris processes for you, and geotagging with Google Earth.

The design of libferris revolves around two primitives: exposing file contents as C++ std::iostreams, and rich metadata support through an interface similar to the Extended Attributes (EA) call attr_get(3). On top of those primitives libferris has since gained sophisticated support for indexing both the full text contents of files and their metadata. Libferris is written in C++ and aims to take full advantage of the language. Interfaces are designed to be as easy to pick up for C++ programmers as possible. For example, displaying a directory can be done using iterators, find(), begin() to end(), and so on.

Both the types of things that libferris can provide as virtual filesystems and the metadata handling are done through a plugin interface. The handling of metadata is done through the Extended Attributes (EA) interface. This EA interface is also virtualized -- if you write an attribute to file:///foo/bar and the kernel filesystem supports extended attributes, then the value will be saved in a kernel level EA using attr_set(3). On the other hand if file:///foo/bar happens to exist on a network filesystem that does not support EA, then your value is saved in RDF by libferris. In both cases the value can be read again using an identical interface.
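
The fallback behaviour described above can be sketched in a few lines of plain C++. This is only an illustration under stated assumptions: AttributeBackend and VirtualAttributes are hypothetical names that do not exist in libferris, and two in-memory maps stand in for the kernel EA store and the RDF store.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in, not a libferris type: one place where
// (path, attribute name) -> value pairs can be kept.
class AttributeBackend {
public:
    explicit AttributeBackend(bool supported) : supported_(supported) {}

    // Returns false when this backend cannot store attributes at all,
    // mimicking a filesystem without extended attribute support.
    bool set(const std::string& path, const std::string& name,
             const std::string& value) {
        assert(!name.empty());  // an attribute needs a name
        if (!supported_) return false;
        values_[std::make_pair(path, name)] = value;
        return true;
    }

    // Copies the stored value into out and returns true if it exists.
    bool get(const std::string& path, const std::string& name,
             std::string& out) const {
        auto it = values_.find(std::make_pair(path, name));
        if (it == values_.end()) return false;
        out = it->second;
        return true;
    }

private:
    bool supported_;
    std::map<std::pair<std::string, std::string>, std::string> values_;
};

// Hypothetical wrapper: tries the native store first and falls back to
// a side store, so callers use one interface in both cases.
class VirtualAttributes {
public:
    VirtualAttributes(AttributeBackend native, AttributeBackend fallback)
        : native_(std::move(native)), fallback_(std::move(fallback)) {}

    void set(const std::string& path, const std::string& name,
             const std::string& value) {
        if (!native_.set(path, name, value))
            fallback_.set(path, name, value);
    }

    std::string get(const std::string& path, const std::string& name,
                    const std::string& def = "") const {
        std::string value;
        if (native_.get(path, name, value)) return value;
        if (fallback_.get(path, name, value)) return value;
        return def;
    }

private:
    AttributeBackend native_;
    AttributeBackend fallback_;
};
```

Constructing a VirtualAttributes whose first backend reports no support models the network filesystem case: set() quietly lands in the second store, and get() returns the same value the caller would have seen with native support.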

Looking at filesystems in an abstract way -- a hierarchy of files, file contents, and metadata associated with files and directories as key-value pairs -- there is somewhat of a resemblance to the data model of XML. There are obvious differences, though: XML elements can have multiple text nodes as contents, an XML element does not need to have a unique name for each child element, and so on. In many cases it can be advantageous to smooth over the differences and view a filesystem as XML and vice versa. Over the years libferris has gained the ability to interact with its virtual filesystems as virtual Document Object Models (DOMs). The reverse is also true: you can take a xerces-c DOM and interact with it as a virtual filesystem. Using virtual DOMs makes it easy to create a view of a filesystem using a browser and XSLT. See xml.com for information on using XQuery against a libferris virtual filesystem.
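
To make the resemblance concrete, here is a minimal sketch of one direction of that mapping: a file tree serialized as XML. FileNode, escapeXml() and toXml() are hypothetical names invented for this example, not libferris or xerces-c API, and the sketch assumes that every file and attribute name happens to be a valid XML name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory file node; not a libferris type.
struct FileNode {
    std::string name;                        // becomes the element name
    std::map<std::string, std::string> ea;   // become XML attributes
    std::string content;                     // becomes the text node
    std::vector<FileNode> children;          // become child elements
};

// Escapes the characters that are special in XML text and attributes.
std::string escapeXml(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Renders a node and everything below it as one XML element.
std::string toXml(const FileNode& n) {
    assert(!n.name.empty());  // an element needs a name
    std::string xml = "<" + n.name;
    for (const auto& kv : n.ea)
        xml += " " + kv.first + "=\"" + escapeXml(kv.second) + "\"";
    if (n.content.empty() && n.children.empty()) return xml + "/>";
    xml += ">" + escapeXml(n.content);
    for (const auto& child : n.children) xml += toXml(child);
    return xml + "</" + n.name + ">";
}
```

A real implementation would also have to deal with names that are not legal XML names, which is one of the differences that has to be smoothed over.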

The ability to mount XML and Berkeley db4 data as filesystems has long been a part of libferris. If you want to store a filesystem in a platform-independent format, then XML is great, whereas the speed of individual file lookup in a Berkeley db4 database holding a great many file records can come in handy. Each format has its advantages, but they are all just virtual filesystems as far as libferris is concerned.

When a filesystem can offer what it likes through key-value pairs (EA) associated with files, relational databases can also be viewed as virtual filesystems. Databases, views, tables and result sets become directories, tuples become files named by the value of their primary key, and the individual values of tuples are exposed as Extended Attributes on their tuple file. Again, PostgreSQL appears just like another virtual filesystem. For relational data there are a few caveats. For example, to create a new "file" in a table you must supply at least the primary key EA as well as any EA that is explicitly marked "not null" in the database.
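
The table-as-directory mapping and its caveat can be modelled without a database at all. The sketch below is an assumption-laden toy, not libferris code: TableDirectory and its members are hypothetical names, and an in-memory map plays the part of the table.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Attribute name -> value, as supplied when creating a tuple "file".
using Attributes = std::map<std::string, std::string>;

// Hypothetical model of a table viewed as a directory.
struct TableDirectory {
    std::string primaryKey;                    // column that names each file
    std::set<std::string> notNull;             // columns marked "not null"
    std::map<std::string, Attributes> files;   // file name -> tuple values

    // Creating a "file" inserts a tuple. It fails unless the primary key
    // and every not-null column are supplied, and the key is unused.
    bool createFile(const Attributes& ea) {
        assert(!primaryKey.empty());  // the table must define a key column
        auto pk = ea.find(primaryKey);
        if (pk == ea.end() || pk->second.empty()) return false;
        for (const auto& column : notNull)
            if (ea.find(column) == ea.end()) return false;
        if (files.count(pk->second)) return false;
        files[pk->second] = ea;
        return true;
    }

    // Listing the directory yields one name per tuple.
    std::vector<std::string> list() const {
        std::vector<std::string> names;
        for (const auto& entry : files) names.push_back(entry.first);
        return names;
    }

    // Reading an attribute of a tuple file returns the column value.
    std::string getAttr(const std::string& file, const std::string& column,
                        const std::string& def = "") const {
        auto f = files.find(file);
        if (f == files.end()) return def;
        auto c = f->second.find(column);
        return c == f->second.end() ? def : c->second;
    }
};
```

With a table keyed on "id" and a not-null "email" column, createFile() rejects a request that carries only one of the two and accepts one that carries both, after which list() shows a file named after the id value.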

Libferris will automatically mount many filesystems for the user. For example, if you try to read an XML file as though it is a directory then libferris will implicitly mount it as one for you. This does blur the lines between what is a directory and what is a file in the system. There is some additional metadata that libferris makes available if you would like to avoid the automatic mounting. For example, if you wish not to descend into XML files then read the is-file metadata and if it is true do not attempt to descend into the file.
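
A client that wants to avoid implicit mounting can guard its recursion on that flag. The following is a sketch of the idea only: Entry and walk() are hypothetical, and the isFile member merely plays the role of the is-file metadata that a real client would read from libferris.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type; not part of libferris.
struct Entry {
    std::string name;
    bool isFile;                   // stands in for the is-file metadata
    std::vector<Entry> children;   // what reading it as a directory yields
};

// Appends the path of e and of everything reachable below it to out.
// When descendIntoFiles is false, entries flagged as files are listed
// but never read as directories.
void walk(const Entry& e, const std::string& parent, bool descendIntoFiles,
          std::vector<std::string>& out) {
    assert(!e.name.empty());
    const std::string path = parent + "/" + e.name;
    out.push_back(path);
    if (e.isFile && !descendIntoFiles) return;
    for (const auto& child : e.children)
        walk(child, path, descendIntoFiles, out);
}
```

Given a directory holding an XML file with one element inside it, walk() with descendIntoFiles set to false stops at the XML file, while true also lists the element as if it were a directory entry.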

One of the motivations for creating libferris as a project of its own was to be able to expose, as a filesystem, anything that I felt could be interacted with in an interesting manner that way. So libferris can mount some things that folks might not think of as filesystems -- including Firefox, Emacs, DBus, LDAP, Evolution, Amarok, klipper, xmms, the X Window System and gphoto2.

The metadata plugins for libferris currently support extracting information from file formats automatically, for example, EXIF, XMP and ID3 tags. Metadata overlays are also supported, so you can see what tags you have associated with an image in f-spot through extended attributes in libferris. I use the term overlays because a central repository of tag data (in this case from f-spot) is scattered over an entire filesystem in libferris. The lower level metadata plugins handle more standard extended attributes usage, for example using attr_set(3) to store values or saving them in RDF.

Many of the standard utilities have been rewritten to use the native libferris API and take advantage of the extra features it offers. Things like ls, cp, mv, rm, cat, io-redirection, touch, head and tail all have native libferris versions, which are shipped with the main tarball. These also serve as code samples for how to use the libferris API. Extensions to the normal clients include the ability of ferrisls to output directory listings in XML, and the ability of ferriscp to use memory mapped IO as well as the more standard open(), read() and write() calls to perform the copy. When memory mapped IO is used this way, ferriscp also calls madvise(2) with MADV_SEQUENTIAL to let the kernel select an appropriate caching policy.
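
For readers who have not used the memory mapped approach, the following is a minimal sketch of such a copy using only standard POSIX calls. It is not taken from ferriscp: copyWithMmap() is a hypothetical helper written for this example, and it skips refinements such as retrying interrupted writes or preserving file modes.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Copies src to dst by mapping the source file and hinting that it will
// be read sequentially. Returns true on success.
bool copyWithMmap(const char* src, const char* dst) {
    assert(src != nullptr && dst != nullptr);

    int in = open(src, O_RDONLY);
    if (in < 0) return false;

    struct stat st;
    if (fstat(in, &st) != 0) { close(in); return false; }

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return false; }

    bool ok = true;
    if (st.st_size > 0) {  // mmap() rejects zero-length mappings
        const std::size_t len = static_cast<std::size_t>(st.st_size);
        void* map = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, in, 0);
        if (map == MAP_FAILED) {
            ok = false;
        } else {
            // Advisory only: tells the kernel the pages are read in order.
            madvise(map, len, MADV_SEQUENTIAL);

            const char* p = static_cast<const char*>(map);
            std::size_t left = len;
            while (left > 0) {
                ssize_t n = write(out, p, left);
                if (n <= 0) { ok = false; break; }
                p += n;
                left -= static_cast<std::size_t>(n);
            }
            munmap(map, len);
        }
    }

    close(in);
    if (close(out) != 0) ok = false;
    return ok;
}
```

The madvise() call is purely a hint, so the copy is still correct if the kernel ignores it; the benefit, where there is one, comes from more aggressive read-ahead and earlier release of pages that have already been written out.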

The indexing support in libferris is also handled using plugins. Two different indexing plugin types exist: full text and metadata. There are two types because the strategy for creating an index can be quite different depending on whether you are searching for some words in a document's text or looking for files with certain metadata values. Inverted files can be great for resolving a ranked full text query for "alice wonderland", but finding all files in either my home directory or /pictures that were modified in December 2008 can be solved in a number of other ways.
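
The difference between the two strategies shows up even in a toy version. The sketch below is not how any libferris index plugin is written: InvertedIndex and MtimeIndex are hypothetical classes, the text query is a plain unranked AND of terms rather than a ranked query, and modification times are bare integers.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Toy inverted file: each term maps to the set of files containing it.
class InvertedIndex {
public:
    void add(const std::string& path, const std::string& text) {
        std::istringstream words(text);
        std::string w;
        while (words >> w) postings_[w].insert(path);
    }

    // Files containing every whitespace-separated term of the query.
    std::set<std::string> query(const std::string& q) const {
        std::istringstream words(q);
        std::string w;
        std::set<std::string> result;
        bool first = true;
        while (words >> w) {
            std::set<std::string> hits;
            auto it = postings_.find(w);
            if (it != postings_.end()) hits = it->second;
            if (first) {
                result = hits;
                first = false;
            } else {
                std::set<std::string> both;
                for (const auto& p : result)
                    if (hits.count(p)) both.insert(p);
                result.swap(both);
            }
        }
        return result;
    }

private:
    std::map<std::string, std::set<std::string>> postings_;
};

// Toy metadata index: files kept sorted by modification time so that a
// date range becomes a walk over one contiguous run of entries.
class MtimeIndex {
public:
    void add(const std::string& path, long mtime) {
        byTime_.insert(std::make_pair(mtime, path));
    }

    // Files whose modification time lies in [from, to].
    std::set<std::string> between(long from, long to) const {
        assert(from <= to);
        std::set<std::string> out;
        for (auto it = byTime_.lower_bound(from);
             it != byTime_.end() && it->first <= to; ++it)
            out.insert(it->second);
        return out;
    }

private:
    std::multimap<long, std::string> byTime_;
};
```

The text index answers "which files contain these words" by intersecting posting sets, while the metadata index answers "which files fall in this range" from a sorted structure; neither layout is much use for the other kind of question, which is the reason for keeping the plugin types separate.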

There are currently indexing plugins for CLucene, Lucene, LDAP, federations of other libferris indexes, ODBC, PostgreSQL, Redland (RDF), Xapian, Beagle, Strigi and some custom designs. There are likely to be more index plugins explicitly designed to work on NAND Flash in the future. Those interested in indexing and libferris should see this article.

A major advantage of closely combining the index and search operations into the virtual filesystem is that anything the virtual filesystem can see can be indexed. When searches are performed you should also be able to interact with any of the results as a virtual filesystem. This avoids the issue where a discrete search library might return a URL that the client can not do anything with.

So, what does it look like to code using libferris? Most objects in ferris are smart pointers, many using intrusive reference counting. The type for such objects is prefixed with "fh_" to indicate a ferris handle. The notion of files and directories is amalgamated into a single "Context" abstraction. To get a smart pointer to a filesystem path the Resolve() function is used. So without further ado, to get a file and its metadata with libferris:

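// Assumes the libferris headers are included and that the std and
// Ferris namespaces are in scope.
fh_context c  = Resolve( "~/myfile" );
{
  // let the scope close it for me
  // the stream is written to, so it must be a read/write fh_iostream
  fh_iostream ss = c->getIOStream( ios::trunc );
  ss << "Bah!" << endl;
}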
// std::string getStrAttr( fh_context, eaname, default-value, ... )
string filename = getStrAttr( c, "name", "" );
string md5sum   = getStrAttr( c, "md5", "" );
cout << "the filename should be myfile:" << filename << endl;
cout << "the md5 checksum is:" << md5sum << endl;
setStrAttr( c, "foo", "bar" );
fh_attribute a = c->getAttribute("foo");
fh_istream ass = a->getIStream();
cout << "Getting the metadata again:";
copy( istreambuf_iterator<char>(ass),
      istreambuf_iterator<char>(),
      ostreambuf_iterator<char>(cout));
cout << endl;

Libferris is steadily gaining commercial interest. Currently I provide things like custom builds of libferris, explicit support for new test cases in the core regression test suite that are important to clients and of course extensions to libferris to perform a specific task that might be desired.

There are packages available for both 32-bit and 64-bit Fedora 8 and 9 and Ubuntu 7.10 (Gutsy), as well as 32-bit packages for openSUSE 10.3. Unfortunately there is currently a bug in building 64-bit stldb4 on openSUSE. Install the libferris-suite package to pull in all the dependencies.

Feel free to email the witme-ferris mailing list or add comments to this article suggesting any weird and wonderful (and obscure) filesystems you have experienced in the past. Though my libferris.TODO file always grows more than it shrinks, I'm always happy to add new and exciting suggestions near the top of it.

Augeas

Posted Nov 20, 2008 12:18 UTC (Thu) by rwmj (subscriber, #5474)

[Shameless self-promotion of another Red Hat project ...]

You might also want to have a look at Augeas, similar idea, written in C and comes with a nice command line tool. Augeas works with existing file formats through a series of flexible modular extensions called "lenses", eg. there are lenses for all the usual /etc files found on a typical Linux system.

The command line interface is neat too. See this example.

Rich.

Augeas

Posted Nov 20, 2008 12:45 UTC (Thu) by monkeyiq (guest, #55153)

The "another" in the first line might incorrectly imply that there was a shameless promotion of a first Red Hat project in the article. Anyway...

I've been meaning to mount Augeas since I dug into it a while back. I think that the XQuery/XUpdate stuff in libferris working on a ferris mounted Augeas would be kind of a fun toy :)

The libferris virtual filesystem

Posted Nov 20, 2008 14:31 UTC (Thu) by ericvh (guest, #29315)

Of course the Plan 9 idea of virtual files is also directly available in
Linux (without kernel patches) via the 9P file system support. Various
language bindings and packages exist for building virtual file systems
directly on top of 9P: http://9p.cat-v.org/implementations. The one big
difference is all the 9P file systems are implicitly accessible over the
network.

So, what is libferris?

Posted Nov 20, 2008 18:55 UTC (Thu) by bcopeland (subscriber, #51750)

It sounds like an interesting application.

However, I find the article a bit confusing (or I am dumb). Is it a C++ FUSE wrapper? Is it a set of virtual filesystems? Is it a single FUSE filesystem that mounts config files? Is it a single FUSE filesystem that mounts a random collection of items? It would be nice to have an intro statement of the form "libferris is an X that does Y." It sounds like it can do a lot more than mount XML files or databases, for example.

So, what is libferris?

Posted Nov 20, 2008 22:16 UTC (Thu) by monkeyiq (guest, #55153)

* Is it a C++ FUSE wrapper?

No, it has an implementation that lets you use FUSE to run it through the kernel if you want. I also have a C++ FUSE wrapper as a separate project but that's another story.

* Is it a set of virtual filesystems?

Yes, there are a bunch of virtual filesystems contained in libferris. One for mounting Berkeley db, one for XML, one for PostgreSQL, etc, etc.

* Is it a single FUSE filesystem that mounts config files?

The whole show can be exposed through the single FerrisFUSE filesystem. Including PostgreSQL, XML etc. The config file mounting is actually a weak point in libferris right now compared to mounting other things.

* It would be nice to have an intro statement of the form "libferris is an X that does Y."

The problem with this is that the project has become large over time so there are various values for X and Y. You could say X=VFS and Y=expose many strange data stores as a filesystem. Equally, X=XQuery store and Y=Allow XQuery on PostgreSQL. Or using desktop search, intranet search, metadata stuff including RDF as the pool to draw X from.

* It sounds like it can do a lot more than mount XML files or databases, for example.

Yep. For example, the index and search stuff took quite a bit of hacking. On the metadata front interesting things like overlaying F-Spot metadata as EA, so you can change your image tags using the same interface as the rest of the metadata in libferris. Having all metadata available through libferris means a single search interface can find files using metadata from multiple locations.

So, what is libferris?

Posted Nov 21, 2008 16:13 UTC (Fri) by bcopeland (subscriber, #51750)

Thanks, that explanation helps a lot. I'll have to check it out.