The Lack of a Small Unified Database (Linux Journal) [LWN.net]

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 28, 2004 19:11 UTC (Tue) by jwb (guest, #15467) [Link] (5 responses)

It seems SQL and the portable, single-file backing store are two orthogonal concepts. Why not tackle them separately?

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 28, 2004 20:25 UTC (Tue) by MathFox (guest, #6104) [Link] (4 responses)

mysqldump and pg_dump are perfectly capable of generating a database dump in SQL format. They can recreate the database (schema plus data) from their respective dumps. As SQL is a standard it should be possible to move these dumpd to another database engine; if not: add a few hacks to the dump program that forces them to emit standard SQL.

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 28, 2004 20:32 UTC (Tue) by jwb (guest, #15467) [Link] (3 responses)

I'm not seeing the point here. You can't reliably move dumps between Pg and MySQL because Pg supports concepts not implemented in MySQL, such as schemas, and in Pg 8, tablespaces.

But it the idea really to make an interchange format between random backends, or to make a portable one-file database? The latter seems like a more interesting goal, even if Berkeley DB probably already fills most of the desired functionality. A SQL interface on top of the portable database is, as I said, a neat trick but completely orthogonal.

The point is: force it to be standard

Posted Sep 29, 2004 1:54 UTC (Wed) by leonbrooks (guest, #1494) [Link] (2 responses)

Between sticking to standard features, exporting as close to standard as one can go, and translating on import, it should be possible to freely migrate the data for almost all of the sort of apps typified in Microsoft-Land by a .MDB file - the kind that are only one step better (if that) than flat-file databases anyway, and could realistically be handled by BerkeleyDB or similar.

Even beyond this, however, is the need for an MDB-like package that would work for even one type of database regardless of the distro's packaging system (RPM, DEB, slack TGZ, hand-installed), which specifies ownership and access and things like that in a general enough way that a *user* can reasonably expect to "install" a database and all of the fluff to go with it.

At the moment, SQL databases are dealt with either by including a set of generic SQL commands, or a different feed-this-to-your-DB script for each database type, or by supplying code to create the databases and populate them through a DBI or similar interface which is at least nominally back-end agnostic.

There are aspects of specific databases like MS SQL Server's almost-inherent case-blindness which are difficult to blip over but by and large statements like SELECT forename,surname FROM clients WHERE last_contact > '2004-08-29' will work across the board. It's only when you get down to tricky stuff like volume-spanning tables or non-trivial date manipulation that the differences start to become hard.

The point is: force it to be standard

Posted Sep 29, 2004 10:14 UTC (Wed) by adegert (guest, #5503) [Link] (1 responses)

SELECT forename,surname FROM clients WHERE last_contact > '2004-08-29'

... illegal value: 2004 for 'day of month'... ? ;-)

The point is: force it to be standard

Posted Oct 4, 2004 16:32 UTC (Mon) by kreutzm (guest, #4700) [Link]

Why ? I always use year-month-day, and not the (for me illogical) "month first".

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 28, 2004 19:11 UTC (Tue) by hisdad (subscriber, #5375) [Link] (2 responses)

Sqlite does the job, but why bother?

SQLite does indeed do (most of) the job.

Posted Sep 28, 2004 22:16 UTC (Tue) by AJWM (guest, #15888) [Link] (1 responses)

SQLite was my first thought, too.

As for "why bother?", well, there are many apps out there that could use some sort of query-based access to structured data, but that neither handle a large volume of such data nor need the overhead of a dedicated RDBMS. The sort of "mini database" that too many people end up using spreadsheets for.

Think data sets of no more than a few thousand records. Is it really worth the overhead of PostgreSQL or even MySQL for, to use a canonical example, a recipe database? There are probably thousands of other applications (think hobby and educational software) that use such small databases and would benefit from a "one file" way to update the app with a fresh copy of the DB. One example -- an application for mineral collectors and rock hounds to help identify and catalog their collections, requiring a prepopulated -- and updateable -- DB of minerals and their characteristics. Ditto for numerous other collecting type hobbies (a DB of Beanie Babies, anyone? ;-), and related educational apps.

And sure, those could all be coded with custom code seeking through formatted text files, if you want to use 1980s technology, or even Berkely DB, but both those approaches are pretty inflexible compared to SQL and modern DB APIs (eg JDBC, Qt's SQL classes, Perl DBI, etc.)

SQLite does indeed do (most of) the job.

Posted Oct 4, 2004 16:40 UTC (Mon) by kreutzm (guest, #4700) [Link]

Well, I don't get it:

If I want to use data, I use an application for it, even it being grep if the data is a flat file. Even these "mini-databases" need the table definition. And do you really expect Joe Buck to learn SQL? He sure does not want to, and would still prefer a SpreadSheet or a flat file.

So, instead of looking for a small database, I suggest asking the people writing the applications to improve their import/export (maybe via flat file for people wanting to do hand edits --- but thats defeating the purpose of databases).

Have you ever measured the "Overhead of an RDBMS"? If I install an app which -- in the background -- installs say PostgreSQL, I think many users wouldn't notice at all. And for the user, the UI is all he cares. If the UI presents an "export" and "import" to share data, than this should be more valuable than an abstract "flat file" for interchange.

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 28, 2004 19:21 UTC (Tue) by jhknight (guest, #22426) [Link] (5 responses)

With all the inconsistancies between current SQL-like implementations, this could cause lots of problems... For instance, older versions of MySQL doesn't enforce foriegn keys (I think newer ones do). So a database/schema created with postgresql, sent to a mysql user in this new magic format, editted, and sent back might have inconsistant data.

There is sqlite (sqlite.org) that stores a database in a single file. But you can only use that file with sqlite. On the other hand, many dbms can read/write data to/from xml files. So, a well defined xml-based language could accomplish this, with the drawbacks mentioned above. Any format would require an engine that would handle them appropriately. It's basically the same problem with word processors, etc.

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 28, 2004 19:34 UTC (Tue) by dang (guest, #310) [Link] (4 responses)

Well the attractive part of SQL is that it is a standard. Kind of the point of it in fact. So keeping everything SQL92-centric is an appropriate design constraint. And if your DB of choice isn't SQL-compliant, or if you want to use non-sql compliant oddities, then you are already off on your own anyway ( and good luck if you ever want to migrate to a different backend :) )

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 28, 2004 20:14 UTC (Tue) by oak (guest, #2786) [Link]

Standard? There are several things that are not properly standardized in
SQL92 and for which all databases have their own ideas about (it's some
time since I did any SQL :-), but I think e.g. date formats and sequences
were differently implemented)

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 28, 2004 21:51 UTC (Tue) by AJWM (guest, #15888) [Link] (2 responses)

SQL is a standard in the same way that RS-232 is a standard.

For those of you too young to have had the, ah, pleasure of trying to get two RS-232 serial devices talking to each other, I'll leave it up to your imagination, just throwing out a few relevant terms: null modem, gender bender, 25-9 pin adapter, breakout box, ... you get the idea.

(Oh, and as if SQL wasn't bad enough, try Embedded SQL. Again a nominal standard, but you have to resort to m4 macros to write code that can be handled by the PostgreSQL, Oracle, and Sybase preprocessors. And of course MySQL doesn't support it at all.)

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 28, 2004 22:13 UTC (Tue) by dang (guest, #310) [Link] (1 responses)

But the funny thing is that you can hew very closely to the standard for SQL and do just fine ( even 92, though 99 is much richer ). And for things that are slightly off, all of the abstraction that you need to cast things correctly is available or easily implimented. Especially for a database that is small by spec, there are many good reasons to keep the bugger SQL92 or 99 compliant.

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 28, 2004 22:19 UTC (Tue) by AJWM (guest, #15888) [Link]

Oh, certainly, I'd rather see the thing more SQL compliant than less. I'd favor SQLite over BerkeleyDB, for example.

Just a caveat on "standards".

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 28, 2004 19:47 UTC (Tue) by gh (guest, #25066) [Link] (1 responses)

I did not bother to read the article but what is needed is something to replace ms works. What is missing is the front-end. There are lots of ways to get the backend cheaply.

This is something I plan to play with in the near future using wxwidgets (formerly wxwindows) and python. I'd rather use perl but python seems to have a more mature interface. If someone else does it first, I would be glad to drop this project.

--GH.

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 29, 2004 1:21 UTC (Wed) by josh_stern (guest, #4868) [Link]

I'm not totally sure what you mean by front-end, but I
was just looking into adding an SQLite driver to
Rekall for a project where I wanted a database without
much size or admin overhead and an interface. Drop
me a line, (josh &&&at&&& neurovia.umn.edu) if you
are interested.

Net Packaging Frontier

Posted Sep 28, 2004 20:17 UTC (Tue) by AnswerGuy (guest, #1256) [Link]

I can see why they think of SQL schema as the next packaging frontier.

I see an increasing number of packages that require an RDBMS backend. Thinks like the mediawiki engine that runs Wikipedia and several other wiki systems, webmail gateways, blog packages, GForge, and many of the forum packages like Slashcode and phpBB, the GNUe project, the SQL-Ledger package.

I suspect that many cases will require a specific backend database (MySQL or PostgreSQL). Some will be flexible enough to configure into which ever a given side wanted (when the Perl, Python, or PHP code can just load the appropriate DBD, DBAPI, or whatever driver for the back end). Perhaps an rc file could provide the server DSN to the server, and the package manager could login, add the necessary database, create the tables, etc. and write back a configuration file (in a standard format) telling the application the DSN with the database name included. (Note the db name might have been munged to avoid conflicts).

This would imply that applications/packages conforming to this spec. (for DB applications and middleware) would have to be able to cope with munged db names. This would allow them to co-exist peacefully on the same db servers.

It should be interesting and it would be nice if this packaging standard would play well with existing packaging standards (RPM, dpkg/APT)

JimD

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 28, 2004 21:13 UTC (Tue) by denials (subscriber, #3413) [Link] (2 responses)

Apache Derby goes unmentioned by the author, but it offers some features similar to SQLite:

open source under the Apache License, Version 2.0
SQL92 compliant (actually, statement for statement compatible with DB2 Universal Database -- so there's a scalability option if your application grows more than you anticipated)
small (2.0MB JAR file)
ODBC access

Derby also offers features that SQLite does not offer today:

referential integrity
user-defined functions
stored procedures

Derby would probably be a good choice for any project in which the front end is implemented in Java (Eclipse). While Derby lacks the author's 'single-file' requirement, tarring or zipping the database directory for the transfer scenarios in the article is probably an acceptable tradeoff for the I/O benefits of storing logs separately from the data.

IMHO, the larger problem posed by the author's scenarios is the interchange of data between applications. So let's assume you can use one file to store addresses in OO.o, Koffice, Kmail, Evolution, and a zillion other applications -- along with the data, you need a standard data definition so that each application can use the same tables / columns / data types to store and retrieve data. There's one technology that already has a well-defined answer to this problem: XML (through either Document Type Definitions or Schemas).

So my final comment is -- who really cares about the underlying database as long as you have a well-defined data format that you can use interchangeably between applications (and the applications stick to that agreed-upon format). In which case, all that you need from your database is the ability to publish to XML.

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 30, 2004 2:38 UTC (Thu) by piman (guest, #8957) [Link] (1 responses)

Derby: "small" (2.0MB JAR file)
SQLite: "small" (less than 30K lines of C code, less than 250KB code space)

Which one of those is *really* small?

> all that you need from your database is the ability to publish to XML.

And the assumption that no one (person or program) will write bad XML, and everyone will be happy using the same Schema (unlikely, or we'd be using the same schema already...)

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 30, 2004 12:38 UTC (Thu) by Wol (subscriber, #4433) [Link]

Oh - and XML fits *really* *badly* into the SQL (or relational) model.

If you want to play with XML, you'd be far better off with a Pick-style DBMS system.

Certainly when dealing with data, the transform between Pick and XML is trivial in BOTH directions.

Cheers,
Wol

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 28, 2004 21:28 UTC (Tue) by petegn (guest, #847) [Link]

At long last ..

Myself and a few others have been saying this for some considerable time now and kept getting to to go away and use MySql or Posrgress ect .. well they aint not the way and at last people may just have to sit up shut up listen and do what is wanted/needed yes Linux is Superbe but it has lacked in this department for the longest time maybe NOW we get it fixed at long last ..

Pete geting happy for once

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 28, 2004 21:52 UTC (Tue) by manuel.flury (guest, #7880) [Link] (4 responses)

In fact, they're talking about an clone of Microsoft Access, isn't it ?

A single .mdb file can contain an application as far as the runtime is installed.

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 29, 2004 8:31 UTC (Wed) by hippy (subscriber, #1488) [Link] (3 responses)

I think they are talking about a representation standard, or collection of
representation standards that can be used to exchange all the data
associated with a personal database application. If that is what .mdb
format does, then yes they are talking about a clone of its capabilities.

The important point, that many people appear to have missed, is that the
representation standard needs to cover more than just the data itself. It
should also, IMO, not be specified in terms of an application (e.g.
Sqlite). It needs to support the representation of relational data,
queries, reports and forms etc. In a manner that is portable across
architectures, applications and extensible over time.

XML might well be a suitable starting point for this representation, the
performance arguments against this rather miss the point. The is no need
of an application to keep the data in the standard representation. If
performance is a problem the an import/export facility could be used to
transform the data into a more performance orientated format. The point of
the standard representation is for interoperability across application and
platforms. It should also be viewed as a format that ensures the longevity
of databases.

I for one hope that OASIS pick this up.

Richard

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 29, 2004 9:51 UTC (Wed) by tzafrir (subscriber, #11501) [Link] (2 responses)

an MDB file is easy to currpt, right?

You want a standard format for exchangeing data.

Create a zip file containing:

a. a separate db file for each table (db can be berkely db, sqlite, or one of many other alternatives)
b. a subdirectory "queries" with a separate file for each query
c. a subdirectory "forms" with a separate file for each form

You can also make B and C one XML file, a-la OOo documents. ATM I see no need for the extra complication, but maybe there is.

Those data are expected to compress well, so saving them in a zip archive is a sane choice. It also allows relatively checp extraction.

An application will need to extract the data to its own internal representation and preform the actions (be that even by delegating it to other programs: 'mktmpdir; cd_to_tmpdir unzip /path/to/dbfile; dbprogram runquery queries:a:bc' )

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 29, 2004 10:16 UTC (Wed) by hippy (subscriber, #1488) [Link] (1 responses)

The file layout is the easy bit.

The specification of the representation within those files is where OASIS
can really play a part. Just representing the data as the data files of an
existing database application does not really get to the nub of the
problem. I don't want to find that when I access a database in 10 years
time that the only way I am going to read it is to find an application
that has been dead for 5 years. A standard representation is required that
is endorsed by OASIS and supported by lots of database applications. A
pipe dream maybe.

Richard

The Lack of a Small Unified Database (Linux Journal)

Posted Sep 29, 2004 12:36 UTC (Wed) by josh_stern (guest, #4868) [Link]

The file specification would also be an easy part if one
didn't care much about performance. As soon as one does,
we are talking things like btrees and their binary layout
on disk which is far from standardized and complex to
optimize. The least understood component of the btree
in this context is probably the disk-based analog of malloc.
What features should it optimize (space vs. time?,
coarse grained vs. fine grained locking?,
64 bit addresses?, etc).

GNU Schema Definition File Format

Posted Sep 29, 2004 11:28 UTC (Wed) by siesel (guest, #5021) [Link]

The GNU Enterprise Project (www.gnuenterprise.org) has created a file format called GNUe Schema Definition (gsd)*. It stores database schemata and data in one XML file and can be converted into database specific SQL.
There is no way to directly change and access data through SQL querys (like sqlite). But its the perfect data exchange format between sql databases, or for a initial database setup for every kind of project using a sql database.

* Part of the gnue-common base library.