The future of the Linux filesystem
The upcoming release of Microsoft's "Longhorn" version of Windows is two
years off by the best estimates, but some people are beginning to worry
about whether Linux will be able to compete with the features that Longhorn
is promising. Even when factoring in the (often significant) differences
between what Microsoft promises ahead of time and what it actually
delivers, some feel that Longhorn might be good enough to be worth thinking
about.
The Longhorn feature that attracts the most attention is WinFS, a new filesystem. WinFS will push an SQL-based database management system into the filesystem layer, enabling users to use searches to find their files. With some attention to metadata, Longhorn users will be able to ask the system to find, say, all of their William Shatner MP3s or all images of Tux the penguin in a swimsuit. Applications will be able to set up their own schemas for their specific object types; if mail agents can agree on an email message schema, then users should be able to switch easily between them.

Making all of this work well could be an interesting challenge. Making applications work well on top of WinFS will be another one. Even so, some people get the sense that Microsoft might just come out with something that people will want to use. If Linux wants to be able to compete on the desktop, it may have to provide a WinFS-like interface too.

There are two projects out there which could provide something similar to WinFS's capabilities. Thankfully, neither one proposes to put an SQL query engine into the kernel.

One is ReiserFS, a topic which has been covered here before. Hans Reiser believes that the existence of any sort of storage layer above the filesystem implies that the filesystem itself has failed in its duty to organize information in the required way. His Naming System Venture paper describes a world where filesystems impose no structure on data, leaving that task instead to the user. A query language (not SQL) would enable files to be found via free-form searches. In the Reiser vision, everything - even complex databases - could be implemented directly in the filesystem.

The current state of ReiserFS is far from that vision. Work so far has concentrated on the infrastructure that will be necessary to implement the wider vision - and on the features that can attract funding for their development.
The Reiser4 filesystem, which is in testing now, adds features like built-in transactions, even better small file performance, and a well-developed plugin architecture which makes it easier to add advanced features to the filesystem. The Reiser4 developers hope to get it into the 2.6 kernel, but it is not clear whether that will happen.

The other approach doesn't involve the kernel at all. The GNOME Storage project plans to "replace the traditional filesystem with a new document store," but, in fact, it is built on top of existing filesystems and operates entirely in user space. GNOME Storage is accessed via (a modified version of) gnome-vfs, so it can operate in user space and be used by GNOME applications without modifying those applications. Underneath the hood, GNOME Storage uses PostgreSQL as its object store, though efforts are being made to make the system portable to other databases. GNOME Storage has an ambitious set of goals; see the features document to see where they are heading - and what has already been done.

Where either of these projects will end up is unclear at this time. What is clear, however, is that interesting work is being done in the area of Linux object storage. By the time Longhorn starts showing up on desktops, it might not be the only system with an interesting new approach to storing user data.
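As a rough illustration of the kind of metadata search described above, the sketch below keeps one row of metadata per file in an SQL database and answers a query such as "all of my William Shatner MP3s". It is not how WinFS or GNOME Storage is implemented; the table layout, column names, sample paths, and the `build_store` and `find_files` helpers are all assumptions made up for this example, and SQLite stands in for whatever database a real system would use.

```python
import sqlite3

# Hypothetical metadata columns; a real schema would be set by applications.
ALLOWED_COLUMNS = {"kind", "artist", "subject"}


def build_store():
    """Create an in-memory store holding one metadata row per file."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files ("
        "path TEXT PRIMARY KEY, kind TEXT, artist TEXT, subject TEXT)"
    )
    # Sample rows invented for the example.
    db.executemany(
        "INSERT INTO files VALUES (?, ?, ?, ?)",
        [
            ("/home/u/music/shatner_track.mp3", "audio/mpeg", "William Shatner", None),
            ("/home/u/music/other_track.mp3", "audio/mpeg", "Someone Else", None),
            ("/home/u/pics/tux_swimsuit.png", "image/png", None, "Tux"),
        ],
    )
    return db


def find_files(db, **criteria):
    """Return the paths whose metadata matches every column=value pair."""
    unknown = set(criteria) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown metadata columns: {sorted(unknown)}")
    # Column names come from the fixed set above; values are bound parameters.
    where = " AND ".join(f"{column} = ?" for column in criteria) or "1"
    rows = db.execute(
        f"SELECT path FROM files WHERE {where} ORDER BY path",
        tuple(criteria.values()),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = build_store()
    print(find_files(store, kind="audio/mpeg", artist="William Shatner"))
    print(find_files(store, subject="Tux"))
```

The point of the sketch is only that the query is expressed against metadata rather than against directory names; where the database lives (kernel, filesystem plugin, or user space) is exactly what the projects above disagree on.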
The future of the Linux filesystem
Posted Nov 6, 2003 2:38 UTC (Thu) by flewellyn (subscriber, #5047)

Y'know, SOME kind of database would not necessarily be a bad thing for filesystems. As it is now, we have slocate, which works for files that are already there, but not when you put new ones on, unless you redo the whole db after doing so. Cron jobs can pick up the slack of updating it every night or so, but that's still not the same as having a reliably up-to-date database for your files.

Perhaps hacking Berkeley DB support into the VFS layer, such that it records file names to a database on disk? This would avoid the code duplication of having each filesystem use its own scheme. A locating program could then just query the database. I don't know how feasible this is, really, because I'm not a kernel hacker, but if it could be made to work okay, it'd be pretty nice. No more running slocate cron jobs in the middle of the night, or lengthy disk searches using find(1). Needless to say, this would be 2.7 work.
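The idea in this comment, recording each file name as the file is created so that no nightly rebuild is needed, can be sketched in user space. The example below is only an illustration under stated assumptions: it uses Python's standard `dbm` module as a stand-in for Berkeley DB, nothing here runs in the VFS, and `record_file` and `locate` are hypothetical helpers that a real system would have to call from whatever hook notices new files.

```python
import dbm
import os


def record_file(index_path, file_path):
    """Add one file to the index under its base name (incremental update)."""
    name = os.path.basename(file_path).encode()
    with dbm.open(index_path, "c") as db:
        # Each key is a base name; the value is a newline-separated path list.
        known = db[name].decode().split("\n") if name in db else []
        if file_path not in known:
            known.append(file_path)
        db[name] = "\n".join(known).encode()


def locate(index_path, fragment):
    """Return every indexed path whose base name contains `fragment`."""
    needle = fragment.encode()
    hits = []
    with dbm.open(index_path, "r") as db:
        # A key-value store has no substring index, so scan the keys.
        for name in db.keys():
            if needle in name:
                hits.extend(db[name].decode().split("\n"))
    return sorted(hits)


if __name__ == "__main__":
    import tempfile

    index = os.path.join(tempfile.mkdtemp(), "names")
    record_file(index, "/home/u/notes.txt")
    record_file(index, "/srv/notes.txt")
    print(locate(index, "notes"))
```

Lookups still scan every key, much as slocate scans its database; what changes is that the index is updated one file at a time instead of being rebuilt from a full disk walk.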
The future of the Linux filesystem
Posted Nov 6, 2003 8:27 UTC (Thu) by nix (subscriber, #2304)

"Perhaps hacking Berkeley DB support into the VFS layer, such that it records file names to a database on disk?"

It sounds like you want a Linux filesystem driver to access Subversion repositories :)
The future of the Linux filesystem
Posted Nov 6, 2003 7:44 UTC (Thu) by arcticwolf (guest, #8341)

"Applications will be able to set up their own schemas for their specific object types [...]"

That sounds suspiciously like OS/2's Extended Attributes in the HPFS filesystem. And I think the next sentence really hits the nail on the head:

"if mail agents can agree on an email message schema, then users should be able to switch easily between them."

*If* they do, the user *should* be able to. I think it's not reasonable to expect that this will actually happen with proprietary products, though - to stay with the email example, there already are standard formats, but most vendors still seem to choose their own, proprietary format so that migrating away from their product to another client becomes more difficult. One could just as well expect Microsoft to use a truly open and standardized format for their Office suite, for example.
Do it yourself :)
Posted Nov 6, 2003 13:33 UTC (Thu) by melo@isp.novis.pt (subscriber, #4380)

eheh... Well, if you change the ls command today, so that whenever someone lists a dir (avoid :)
Separating functionality from implementation.
Posted Nov 6, 2003 13:52 UTC (Thu) by brugolsky (subscriber, #28)

Discussing semi-structured data, indexing, etc., seems to degenerate rapidly into "we're falling behind!" v. "not in my kernel!" I'm happy to see projects like Storage and Dashboard going at this in GNOME / Mono. Perhaps after *extensive* experience with a working implementation, the hard task can begin of separating out the core functionality, figuring out what sort of assistance userland needs from the kernel (e.g., we already have DNOTIFY), and whittling down the userland side to flatten the web of shared object dependencies.

Unix has evolved other mechanisms for naming above the filesystem, e.g., the whole nsswitch subsystem, with pieces in libc, config files, nscd, Solaris doors, etc. It generally sucks (IMHO), but more-or-less works. In principle, gnome-vfs or Storage is not all that different from libresolv. Linux POSIX threading used to be done entirely in userland, and did about 90% of the job, slowly. After many years of experience with LinuxThreads, and spurred on by NGPT and Rusty Russell's FUTEX work, Ulrich Drepper and Ingo Molnar figured out what functionality needed to go into the kernel to get it right, and made it happen, within the limits of good taste.
WinFS history...
Posted Nov 6, 2003 15:52 UTC (Thu) by Cato (subscriber, #7643)

WinFS used to be called OFS (Object File System) and was part of the Cairo project, a Windows NT successor announced in 1992 (yes, that long ago) but never shipped. See http://www.computerworld.com/softwaretopics/os/story/0,10801,69882,00.html and http://news.com.com/2009-1017-857509.html for some background.

I suspect the issues with delivering an OFS/WinFS model are organisational as much as technical - you have to get different product groups (Word, Outlook, etc.) or projects (OpenOffice, AbiWord, etc.) to agree on a common data definition for similar objects. This is reminiscent of what enterprises tried to do a long time ago in defining enterprise-wide data models - it usually turned into a bureaucratic mess, and people ended up going for a suite of products such as SAP, Oracle Applications, etc. This means that Microsoft may have a chance of doing this for its own Office apps, but others will have to follow its lead quite closely. The Linux suites could also do something similar but it might take quite a bit of cooperation to get diverse applications to agree.
WinFS prehistory...
Posted Nov 6, 2003 21:14 UTC (Thu) by roelofs (subscriber, #2599)

"WinFS used to be called OFS (Object File System) and was part of the Cairo project, a Windows NT successor announced in 1992 (yes, that long ago) but never shipped."

Those who remember Dave Cutler's involvement with NT will not be surprised to hear that VMS had database-backed filesystems at least 15 years ago and probably closer to 20 or more. In fact, Unix-like unstructured byte streams ("Stream-LF" format) were almost unheard of in VMS and had all sorts of interoperability problems with the more common record-oriented (IBM-like) files. The actual database-like formats were almost as rare, and I never worked with them myself, but I read about them (more than I ever wanted to) in the Big Orange Wall and its successor.

Greg
The future of the Linux filesystem
Posted Nov 6, 2003 16:45 UTC (Thu) by stevef (subscriber, #7712)

Although this description of WinFS over NTFS is reminiscent of OS/2 Extended Attributes, it misses discussion of the functional requirements in the filesystem. Even without standardization (about a dozen named attributes were standardized across applications historically), Extended Attributes had value because they were useful to the program but transparent to the user (streams are even more valuable). For advanced searching in a traditional filesystem model, an obvious choice would be to leverage Alternate Data Streams (similar in some ways to, although more flexible than, Macintosh Forks, OS/2 Extended Attributes or xattrs). This appears to be the pragmatic approach described - putting database link function in user space that leverages minor NTFS extensions. A straw person list of the filesystem requirements this would generate is:

1) The API must be invokable from userspace (Sun's "openat" and the existing xattr API may be reasonable)
2) The API must allow for quick search for the existence of a particular xattr on a particular object (Microsoft seems to do this mostly by labelling their streams with UUIDs) and retrieval of the value of the xattr
3) The filesystem must support xattrs (or streams) larger than 64K (probably of arbitrary size) - an example from past history shows why this is important - when each stream contains a translation of the program's messages into a different language.
4) Common utilities must be supported (i.e. modified, or new ones created) for copying, moving and archiving the file (including all of its streams) as one command (users should not have to write code to list through the file's streams and copy each individually)
5) The VFS layer and at least one :) local filesystem must support a rename operation that renames a file and all of its streams as one operation.
6) The API should consider the requirements of some of the Servers too - not just Samba (which has needed stream support for a while to provide compat. with a few client apps running in a network with Linux servers) but also for NFSv4 (which even in an all Unix/Linux environment should support opcode 19 which is reminiscent of alternate data streams)

All of this can be done [only] in libc or in applications like Apache and Samba if necessary (treating groups of files or stream pseudo-directories with reserved naming conventions ...) but it is much less efficient for file renaming, links and file copy to [solely] rely on userspace changes than it would be with minor extensions to the VFS.
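The userspace-only fallback mentioned at the end of this comment, treating a group of files with a reserved naming convention as one file plus its streams, can be sketched as follows. This is a minimal illustration, not an existing API: the `.__stream__.` separator and every function name are invented for the example, and the rename is deliberately the naive version, one `os.rename` call per side file, which is the inefficiency (and lack of atomicity) that requirement 5 asks the VFS to remove.

```python
import os

# Hypothetical reserved naming convention: "<file>.__stream__.<stream name>".
STREAM_SEP = ".__stream__."


def stream_path(path, stream):
    """Return the side-file path that holds one named stream of `path`."""
    return f"{path}{STREAM_SEP}{stream}"


def write_stream(path, stream, data):
    """Store `data` (bytes) as a named stream of `path`."""
    with open(stream_path(path, stream), "wb") as handle:
        handle.write(data)


def read_stream(path, stream):
    """Read back the bytes of a named stream of `path`."""
    with open(stream_path(path, stream), "rb") as handle:
        return handle.read()


def list_streams(path):
    """List the stream names attached to `path`, found by directory scan."""
    directory, base = os.path.split(os.path.abspath(path))
    prefix = base + STREAM_SEP
    return sorted(
        name[len(prefix):]
        for name in os.listdir(directory)
        if name.startswith(prefix)
    )


def rename_with_streams(src, dst):
    """Rename a file and all of its streams.

    This takes one rename per stream plus one for the file itself, so a
    crash part-way through leaves the group split across two names.
    """
    for stream in list_streams(src):
        os.rename(stream_path(src, stream), stream_path(dst, stream))
    os.rename(src, dst)
```

Ordinary tools such as cp or mv know nothing about the convention, so they would copy or move the main file and leave its streams behind, which is the concern behind requirement 4.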
The future of the Linux filesystem
Posted Nov 6, 2003 18:42 UTC (Thu) by ksmathers (subscriber, #2353)

There are several active projects (both commercial and open) in the W3C community that seek to implement faceted resource browsing based on metadata associated with those resources. Off the top of my head, there are the Simile project, Siderean Software's Seamark, and MIT's Haystack.
The Future of Data Recovery
Posted Nov 7, 2003 5:29 UTC (Fri) by stock (subscriber, #5849)

The Future of Data Recovery will become immensely more complicated too.

Today we have rescue ISOs and floppies which will just routinely load a vfat.o (FAT32) or ntfs.o (NTFS up to version 3.1) driver, and any administrator with human skillz will be able to extract his employer's valuable data in a breeze. What will happen with WinFS? I wouldn't be surprised if a whole new Windows-based Data Recovery Industry emerges, selling Enterprise-labeled recovery software which now also features a WinFS Longhorn Edition. Think maybe about prices starting from $1500,= and up?

And indeed then Microsoft just might (after announcing the release of Longhorn at the end of 2005, just after DRM got out for mandatory implementation on all new Longhorn-ready hardware, and just after Novell, IBM and RedHat have beaten each other into a coma state of business affairs) happily announce to small and medium business owners that putting their valuable data on Microsoft's Online Public SAN is more cost effective than ever. Well, much more TCO aware than storing data on private Longhorn systems.

And don't forget that the Pentagon with its Total Information Awareness Project is only now seeing its true dream come true.

Robert
On the business side
Posted Nov 13, 2003 4:12 UTC (Thu) by ezyrider (guest, #14889)

Linux doesn't have to be afraid of Longhorn. In fact it may prove to be a blessing in disguise, for the following reasons:

1. On the consumer side they'll probably try to shove DRM down everybody's throats. Assume that file swapping continues to thrive using newer protocols that make it harder to sue swappers (like freenet maybe). Would you upgrade if the new OS wouldn't allow you to play most of your files?
2. On the corporate side they'll try to innovate greatly. This will mean a lot of changes. That means a painful upgrade process, training users to adjust to the new OS, changing business processes, etc. If you have to go through a painful change then Linux suddenly becomes an option too. Expect a good portion of users to possibly switch to Linux.
3. MS will probably introduce a lot of changes for security. The greater the security, the lower the ease of use (well, most of the time anyway). Then MS loses one of its great advantages over Linux.
4. Even if Longhorn arrives in 2006 the bulk of users don't usually upgrade
On the business side
Posted Nov 13, 2003 15:27 UTC (Thu) by pbakker (guest, #16829)

Apart from technical considerations, the decision for many as to whether to consider WinFS can only be made after reading the accompanying licenses and legal ramifications.

When your business data is stored in a proprietary, patented, binary format, you depend on MS for continued access to the data. If the license is by subscription model, you must continue to pay the subscription price. As soon as you stop paying, your access to your data is cut off, and the cutoff may be automatically enforced. Storing all your data in a format only accessible by MS tools is risky. A patent could prevent third party tools from being developed. Trusting your data to WinFS comes down to how much you trust MS and your continuing ability to pay any requested fees.
Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds