LWN: Comments on "Status.net service to phase out, replaced by pump.io" https://lwn.net/Articles/529681/ This is a special feed containing comments posted to the individual LWN article titled "Status.net service to phase out, replaced by pump.io". en-us Fri, 05 Sep 2025 05:38:22 +0000 Fri, 05 Sep 2025 05:38:22 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Cool new Free software https://lwn.net/Articles/531518/ https://lwn.net/Articles/531518/ nix <div class="FormattedComment"> I might note that you can in fact store lists in PostgreSQL. It's had arrays since before it was called PostgreSQL. :)<br> <p> </div> Fri, 04 Jan 2013 16:56:39 +0000 Cool new Free software https://lwn.net/Articles/530670/ https://lwn.net/Articles/530670/ Wol <div class="FormattedComment"> Which is why it sounds like Pick is a good fit for people who understand both relational and NoSQL.<br> <p> Your "contorted language" is spot on. Basically, in relational you cannot STORE that most natural of data concepts, the list. You have to MODEL it. At which point your database becomes unnecessarily complex and complicated.<br> <p> Referring back to the comment in your earlier post about "You can either read everything from many different "tables" or you can just store every piece of data multiple times. With relational databases you can also denormalize data, but they are usually less flexible as to how it is stored."<br> <p> But that's exactly what Pick does! NFNF is what relational purists would call "denormalised". The fact that it can be mechanically normalised by the DB seems to have passed them by. So in Pick, I don't "read from many tables OR store the data many times", I just store the data ONCE in ONE table. imho it is *relational* that needs to "store the data many times in many tables" - just try to store a list, then ask yourself how many times you have to store the name of the list (or its id, the same thing...) As a system that requires (as part of its definition) that you don't store duplicate data, the relational system is very poor at living up to its own definition!<br> <p> Cheers,<br> Wol<br> </div> Tue, 25 Dec 2012 17:56:52 +0000 Cool new Free software https://lwn.net/Articles/530668/ https://lwn.net/Articles/530668/ man_ls Well, designing a relational database is not trivial: modeling n-m relationships can be challenging for beginners. On the other hand, using a document store like MongoDB is trivial until you get to the serious stuff -- or to transactions. No wonder it is the most popular of the NoSQL bunch. <p> Other NoSQL families, like key-value (Dynamo-like) or graph databases, are more specialized and need more effort to keep going. But I would argue that none are near the level of sophistication of a normalized database. Most of you probably don't feel it because you have worked with SQL for many years, but it is a contorted language that uses a highly unnatural data model. Yes, relational databases are highly optimized, but you pay the price every time you read or write anything to them. Tue, 25 Dec 2012 17:38:41 +0000 Cool new Free software https://lwn.net/Articles/530652/ https://lwn.net/Articles/530652/ raven667 <div class="FormattedComment"> <font class="QuotedText">&gt; You can try to use PostgreSQL as a NoSQL store, but you will be swimming upstream for the rest of your career. How do you share the load, or replicate between nodes? How do you deal with consistency if you need it in a non-relational table? 
How do you optimize a single database for both consistency and lack of it?</font><br> <p> There are certainly specific use cases that the various NoSQL databases were designed for and are optimized for, and if you have one of those cases then by all means use the right tool for the job, but there seem to be a lot of inexperienced developers who think that traditional SQL databases are slow, creaking dinosaurs, unsuitable for any purpose, when in fact they contain highly optimized data stores which have been developed over decades by people competent in the relevant math and computer science. Different does not always mean better.<br> </div> Tue, 25 Dec 2012 05:26:19 +0000 Cool new Free software https://lwn.net/Articles/530594/ https://lwn.net/Articles/530594/ man_ls There is a subtle difference between premature optimization and sensible design. Premature optimization tends to be low level, while sensible design is more of a big-picture thing. You cannot commit all your data and your code to live in PostgreSQL and one day migrate magically to Riak; or go back and forth as you need it. Even migrating between similar databases is an error-prone task, and JDBC and similar libraries that try to abstract the underlying database are just an excuse for managers. So changing databases is not an "optimization", and therefore it cannot be premature. <p> NoSQL stores usually allow you to use different schemas on the same table. Isn't it better to enforce a single schema? It depends; after all, you will have to make sure that your data have the correct format before writing to the store (or after reading from it), so having the database reject your data is not a substitute for thorough testing. Also, having a single schema goes directly against <a href="http://tech.moveinblue.com/post/22860006526/reversible-engineering-part-2-devops">reversible DevOps</a>, as it entails offline data migration. Not everyone can afford downtime to migrate data between schemas. <p> As to consistency, the advantage of NoSQL stores is that they allow you to choose the degree of consistency that you need. You can either read everything from many different "tables" or you can just store every piece of data multiple times. With relational databases you can also denormalize data, but they are usually less flexible as to how it is stored. (It is harder e.g. to store an array inside a table; you can just store the first n items.) If you need total consistency, then by all means go to a relational store since it will give you better guarantees. But consistency is again not a magic pixie dust you can sprinkle on your data; it has to be there from the start. <p> Same with transactions: if you need them, go to a transactional store. But first think about whether you need them, and if you do then design them properly. Do not just trust your store to do the right thing, because there are 100 ways to mess it up. <p> You can try to use PostgreSQL as a NoSQL store, but you will be swimming upstream for the rest of your career. How do you share the load, or replicate between nodes? How do you deal with consistency if you need it in a non-relational table? How do you optimize a single database for both consistency and lack of it? Mon, 24 Dec 2012 09:58:21 +0000 Informix and IBM https://lwn.net/Articles/530391/ https://lwn.net/Articles/530391/ markhb <blockquote>You're right about Ardent acquiring the Uni products, but in reality, Ardent took over Informix. 
Yes, I know Informix the company bought out Ardent, but six months later the Informix board was gone, replaced entirely by Ardent people. The rumour is that IBM bought Informix for the Informix database, only to discover that the company's primary product by then was U2.</blockquote> Another rumor I heard, from a consultant who knew a lot of people in IBM, was that when they bought Informix they did, in fact, plan to merge the IDS (or Universal Server) tech into DB2, only to find that the Informix stuff was so far ahead of where DB2 was that they couldn't make it happen. Fri, 21 Dec 2012 18:21:51 +0000 Cool new Free software https://lwn.net/Articles/530337/ https://lwn.net/Articles/530337/ Wol <div class="FormattedComment"> Following up to myself, let's take a quick look at that invoice, with ten items.<br> <p> Let's start with the invoice and delivery addresses. Are they attributes of the invoice, stored in the invoice record, or relations to a WORM table of locations? As far as Pick is concerned, it doesn't care, it can store a foreign key or the location itself. Okay, the same is true of relational, but depending on how relational physically stores the data, it may have an impact later on.<br> <p> Now the line items. Are they an attribute of the invoice, an attribute of some ledger, or an entity in their own right? I'm inclined to make them entities in their own right, not knowing enough about accounting off the top of my head to make the best call. I *could* make them an attribute of the invoice.<br> <p> Now to save it all. Assuming the addresses all exist on file, that's one write for the invoice record and ten writes for the ten line items (if the line items were invoice attributes, I could have written the lot in just ONE write). Eleven atomic writes, wrapped in a transaction.<br> <p> In relational, however, I have to add a row to the invoice table. Ten rows to the line item table. And update the line-item index on invoice. That's over and above the fact that I have to muddle data and metadata in the line item table - creating some random field I can sort on to return the line items in the correct order (in Pick, I simply store a *list* of line-item keys in the invoice record). So relational has the extra overhead of more "data" to store, and (unless it's willing to incur a massive hit on reading) the overhead of updating a whole bunch of indexes. The same eleven writes of data (with no option to reduce it to one) plus a bunch of indexes.<br> <p> Now, let's assume we come back a week later and want to print off the invoice. I'll ignore how we obtain the invoice number. In Pick, we have ONE read for the invoice record, TWO reads for the addresses, and TEN reads for the line items. By the way, a read is defined as a SINGLE disk seek instruction. Statistics tell me the engine is going to make one mistake, so I need to make 14 seeks.<br> <p> In relational, however, I guess I need to read the invoice table index to find out where to find the invoice. That's two seeks minimum. Then I need to read the two addresses. Another four seeks. Then the index on the line item table followed by the line items. That's eleven seeks, assuming the location is stored in that index, or twenty-one if it isn't. I make that 17 *minimum*, probably a lot more.<br> <p> Remember I said Pick optimises retrieving data from disk?<br> <p> What if I made a mistake and stored line items as an invoice attribute when I shouldn't? I end up with the equivalent of the relational line item table, clustered by invoice number. 
Given that relational has to guess how best to cluster data, chances are my arrangement is just as good :-)<br> <p> At the end of the day, as soon as we start arguing performance, I have a MASSIVE advantage over you. The relational model explicitly forbids you knowing the internal structure of the database, so that the engine can optimise it as best it sees fit. As an application programmer, I know *exactly* how Pick is storing its data at the disk level. There's a reason why Pick doesn't have a query optimiser - it's a fairly trivial exercise in logic to prove that disk access is so efficient (approx 97%) that any attempt to optimise it will cost more than it saves. Pick enforces primary keys. The primary key enables Pick to calculate the location of any item on disk. The Pick data structure pretty much enforces logically tightly coupled attributes to be physically tightly coupled on disk. The ability to store a LIST of foreign keys in a single atomic record eliminates the need for many indices (and because it's a LIST eliminates the need to muddle data and metadata).<br> <p> In Pick's worst-case scenario (provided the data has been normalised), it degrades to a weakly optimised relational scenario. (The enforcement of primary keys provides some indexing.) In Pick's typical scenario, any half-way complex query is going to leave relational in the dust. That P90 query I mentioned? I bet those Oracle consultants were adding indexes up the wazoo to try and improve performance. The Pick query was probably thrown together in five minutes, and because it was pretty much solely hunting down pre-known primary keys, could go straight to the data it wanted without needing to search for it on disk.<br> <p> If you want to know how Pick finds data so fast - <a rel="nofollow" href="http://en.wikipedia.org/wiki/Linear_hashing">http://en.wikipedia.org/wiki/Linear_hashing</a> - given a known primary key, it takes on average 1.05 requests to disk to find what you're looking for!<br> <p> Cheers,<br> Wol<br> </div> Fri, 21 Dec 2012 15:51:41 +0000 Cool new Free software https://lwn.net/Articles/530336/ https://lwn.net/Articles/530336/ Wol <div class="FormattedComment"> I think UniData is a bit of a red-headed stepchild in the Pick world. Certainly the reports I've seen seem to say that under the hood it doesn't do it like the other db's in the family.<br> <p> Yes, I suspect internationalisation may be a bit of a pain, but it has been done. I haven't used it, but I haven't used internationalisation on linux either (I guess it's there, but I'm not conscious of it).<br> <p> Limited depth hierarchies? In reality, how often do you blow off the end of Pick's ability? The tools aren't necessarily that good, but it handles between five and seven levels "out of the box". How many entities have attributes nested that deep?<br> <p> You're right about Ardent acquiring the Uni products, but in reality, Ardent took over Informix. Yes, I know Informix the company bought out Ardent, but six months later the Informix board was gone, replaced entirely by Ardent people. The rumour is that IBM bought Informix for the Informix database, only to discover that the company's primary product by then was U2.<br> <p> And as you say about Postgres, I don't know anything about it but I understood it could handle multivalue columns, that sort of thing. If you're going to be strictly relational, however, that's not allowed :-) Postgres is moving away from a pure relational db to a more NFNF model. Pick was there first ... 
:-)<br> <p> Cheers,<br> Wol<br> </div> Fri, 21 Dec 2012 13:23:11 +0000 Cool new Free software https://lwn.net/Articles/530334/ https://lwn.net/Articles/530334/ pboddie <div class="FormattedComment"> As I recall from having brushed up against UniData a few years ago, this class of database system works well for limited-depth hierarchical data because of the various things you've mentioned, but that doesn't necessarily mean that all of the advantages apply to other kinds of database structure.<br> <p> I think it's also pertinent to mention that PostgreSQL has been able to deal with things like multivalued columns for a long time and in an arguably more sane fashion than, say, UniData in various respects, such as in the storage representation which, as I recall with UniData, involved various "top-bit-set" characters as field boundaries that probably made internationalisation a pain.<br> <p> Certainly, this class of system works well for certain kinds of systems and there's undoubtedly a lot of people still sticking with them, as well as a lot who tried to migrate from them in a bad way, either causing lots of teething troubles and organisational stress with the new system or reinforcing prejudices about the capabilities of "those new-fangled client-server RDBMSs". That the migration exercises probably involved Oracle or some other proprietary solution, where only a single vendor can ease the pain, probably won't have helped.<br> <p> It's telling that UniData and UniVerse ended up ultimately with IBM after an acquisitions cascade that involved lots of slightly bigger fish eating other fish. I think it was Ardent acquiring the Uni-products, being acquired by Informix, being acquired by IBM. Unlike HP who would have killed the product line in order to add a few cents to the dividend for that quarter, IBM probably see the long-term value in those remaining customers.<br> </div> Fri, 21 Dec 2012 12:51:28 +0000 Cool new Free software https://lwn.net/Articles/530332/ https://lwn.net/Articles/530332/ Wol <div class="FormattedComment"> No I am NOT.<br> <p> What I am claiming is that where a relational database HAS to do MULTIPLE writes, a Pick database usually only has to do ONE!<br> <p> Who cares if the overhead PER WRITE is the same, if I'm doing half the writes, and that overhead is expensive, I'm going to trounce you for speed! Chances are, I'm doing a lot LESS than half the writes. That's the whole point of NFNF. (And as I keep saying, ACID has nothing to do with relational, and everything to do with reality, so Pick doesn't have to do it the same way as relational. It can if it wants, no reason why not.)<br> <p> (I also forgot to mention, because Pick is primary-key-driven, data retrieval usually involves a direct key access, not a search via an index for a row - more savings on disk access!)<br> <p> And chances are I'm doing a heck of a lot less i/o, because I have far less duplicate data all over the place, and I'm storing it much more compactly too. I was involved in porting an app from Pick to SQL-Server, so I've got direct comparisons to hand - dumping the data from the Pick datastore and loading into SQL-Server, the resulting SQL-Server database was MANY times the size of the Pick one. Four, five times? Maybe more. Oh, and I'm including in the Pick datastore all the data overhead we didn't transfer. And the Pick datastore by default (we didn't change it) runs at 80% full. 
I can't give you figures for SQL overhead because I don't know it.<br> <p> Cheers,<br> Wol<br> </div> Fri, 21 Dec 2012 12:09:57 +0000 Cool new Free software https://lwn.net/Articles/530267/ https://lwn.net/Articles/530267/ dlang <div class="FormattedComment"> <font class="QuotedText">&gt; If your relational engine can force the issue, so can my Pick engine.</font><br> <p> you are claiming that you are so much better than the relational databases because you don't do all the hard work that they do to be safe in the face of multiple writes.<br> <p> you can't then turn around and say "if they do it, I can do it"<br> <p> you could, but then you lose what you are claiming is such a great advantage.<br> </div> Fri, 21 Dec 2012 01:43:55 +0000 Cool new Free software https://lwn.net/Articles/530258/ https://lwn.net/Articles/530258/ andresfreund <div class="FormattedComment"> Wait what? "Heavily customized"? I now checked and I changed 3 performance critical parameters (and loads of logging/debugging ones, but those don't increase performance).<br> <p> 1) synchronous_commit = off. Is it ok to lose the newest (0.2 - 0.6s) transactions in a crash? Older transactions are still guaranteed to be safe. That's a choice *YOU* have to make, it really depends on the data you're storing (and can be toggled per session &amp; transaction). Obviously things are going to be slower if you require synchronous commits.<br> <p> 2) shared_buffers = 512MB. How much memory can be used to buffer reads/writes. Had I optimized I would probably have set it to 4-6GB.<br> <p> 3) checkpoint_segments = 40. How much disk can we use for the transaction log. Had I optimized for benchmarking/write-heavy use it would have been set up to 300.<br> <p> I don't know how those could be determined automatically. They depend on what the machine is used for.<br> <p> And I used plain INSERT &amp; SELECT, no COPY.<br> <p> </div> Fri, 21 Dec 2012 00:15:50 +0000 Cool new Free software https://lwn.net/Articles/530251/ https://lwn.net/Articles/530251/ man_ls So there is your "completely unoptimized" database: a heavily customized installation by an expert. Oh, and don't use INSERTs like you were told, just COPYs. And a schemaless specific datatype. And hope that performance is still good... At that point you might as well use a data store which at least has been designed with that scenario in mind, and which any idiot (e.g. me) can use to do thousands of writes per second -- out of the box. Thu, 20 Dec 2012 23:23:27 +0000 Cool new Free software https://lwn.net/Articles/530243/ https://lwn.net/Articles/530243/ Wol <div class="FormattedComment"> Following up to myself.<br> <p> ACID sits at the interface between the database and the OS.<br> <p> A transaction that is seen by the app as atomic is very UNlikely to be passed from a relational database to the OS as an atomic write.<br> <p> That same transaction is far more likely to be passed to the OS as an atomic write by Pick.<br> <p> Far less complicated. Far less overhead. Far easier for the user or programmer to understand.<br> <p> Cheers,<br> Wol<br> </div> Thu, 20 Dec 2012 22:39:59 +0000 Cool new Free software https://lwn.net/Articles/530241/ https://lwn.net/Articles/530241/ Wol <div class="FormattedComment"> To make the problem more explicit, how do you store a list in an rdbms without mixing data and metadata? IT CAN'T BE DONE, because you can't store a list in a set.<br> <p> So as soon as your data contains lists, an rdbms is inefficient. 
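<p> To make that concrete, here is a minimal sketch of the standard relational workaround (the table and column names are hypothetical; the SQL is PostgreSQL-flavoured). The list's identity is repeated on every row, and the ordering column is pure metadata stored as ordinary data:<br> <pre>
-- One row per element: the classic relational modelling of a list.
CREATE TABLE list_item (
    list_id integer NOT NULL,  -- the list's id, repeated for every element
    pos     integer NOT NULL,  -- ordering metadata stored as ordinary data
    item    text    NOT NULL,
    PRIMARY KEY (list_id, pos)
);

-- Reading the list back means re-sorting by the metadata column.
SELECT item FROM list_item WHERE list_id = 42 ORDER BY pos;
</pre>
(PostgreSQL's array columns, which nix mentions at the top of this page, sidestep the ordering column entirely - though a strict first-normal-form purist would disallow them.)<br> <p>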
Much data consists of lists, does it not?<br> <p> Cheers,<br> Wol<br> </div> Thu, 20 Dec 2012 22:26:06 +0000 a lot of familiarity with how the data is stored https://lwn.net/Articles/530240/ https://lwn.net/Articles/530240/ Wol <div class="FormattedComment"> :-)<br> <p> Which is EXPLICITLY FORBIDDEN by the relational definition!<br> <p> Exactly my point about Postgres taking a lot of ideas from the NoSQL crowd.<br> <p> In Pick, I know *exactly* how my data is stored, because I told the database how to do it. Which is why a Pick database can be a mess if it's not properly normalised :-( The programmer needs to normalise his FILEs properly or you can lose the benefits. I've seen too many badly normalised FILEs that are a pain to work around.<br> <p> Cheers,<br> Wol<br> </div> Thu, 20 Dec 2012 22:23:08 +0000 Cool new Free software https://lwn.net/Articles/530229/ https://lwn.net/Articles/530229/ Wol <div class="FormattedComment"> Falls apart how?<br> <p> EXACTLY THE SAME is true (or not true) of a relational database. If my Pick engine can't assume that a write has succeeded, nor can your relational engine. If your relational engine can force the issue, so can my Pick engine.<br> <p> ACID is nothing to do with relational. A relational app has to deal with the database returning a status of "unable to save data" - so does a Pick app.<br> <p> A relational database has to make sure that when it tells the app that the data has been saved to disk, it really has. A Pick database, likewise.<br> <p> And you completely miss my major point about speed. Let's say I am writing that invoice. First of all, when I read the client info, chances are it is a *single* disk access. In a relational database it is probably several reads scattered across multiple rows and tables. Then when I write the invoice, it is a SINGLE write to the invoice table. Done! Then I need to update the sales ledger and the client outstanding. I've already read the client info - that's cached - ONE write to the client table and the client outstanding is done. One read and write to the sales ledger and that is done.<br> <p> Obviously those three writes need to be wrapped in a transaction to enforce ACID, but that would be true of relational. The difference between Pick and relational is that in relational that could easily be twenty or thirty writes. That's a LOT of i/o. And if your ACID is doing a sync, to ensure it's all flushed to disk, that's a LOT of overhead my Pick database doesn't have. Now add in all your integrity checks - that make sure your invoice has a matching delivery address, invoice address, line items, etc. Those aren't needed in Pick because the invoice is one atom, not plastered across multiple tables ...<br> <p> Where Pick scores massively, as I keep trying to hammer home, is at the (*expensive*!!!) interface between the database engine and the disk: there is FAR LESS TRAFFIC. So Pick is a lot faster. As the Pick FAQ put it - "other databases optimise the easy task of finding data in ram. Pick optimises the difficult task of getting it into ram in the first place". Where relational typically passes *many* writes to the OS, Pick passes a single ATOMIC write.<br> <p> To drift the topic a bit, do you know why Linux tends to panic rather than fix things? Because, like a true engineer, Linus assumes that - when things go wrong - the code to handle any problems is likely to be buggy and untested, so it's far better to fail noisily and let the user handle the mess, than try to fix it automatically and fail. 
Yes, that's a judgement call, but a good one. Relational tends to try to enforce perfection. Getting that last ten percent can be VERY expensive. More than expensive enough to be totally counter-productive.<br> <p> If I was writing a new Pick app I would be more concerned about the integrity of the objects - such as the invoice and information provided by the customer - and that they were saved. And that's a single write to disk.<br> <p> Stitching the customer order into the accounts and shipping can come later. And if it screws up I can use transactions to roll back, or run an integrity check to complete the failed commit. (In actual fact, I used exactly this technique - for a different reason - in an accounting app I wrote many moons ago.)<br> <p> At the end of the day, Murphy WILL strike. And if you rely on ACID in your database to save you, it won't. If, instead, you rely on phased, replayable commits, you're laughing. My database will be vulnerable to a crash while the rep is on the phone to the customer. Once the rep has hit "save order" and that has succeeded, all I'd be afraid of is losing the disk - the order can be committed to fulfilment and accounts as and when.<br> <p> WHEN Murphy strikes and ACID fails, how easily could you run an integrity check on your database and fix it? In Pick, it's both EASY and FAST. Pretty much all the systems I've worked with do it as a matter of course as an overnight job.<br> <p> Cheers,<br> Wol<br> </div> Thu, 20 Dec 2012 22:16:21 +0000 Cool new Free software https://lwn.net/Articles/530226/ https://lwn.net/Articles/530226/ job <div class="FormattedComment"> What's interesting about it? It's even less interesting than Twisted or POE, which is at least ten years older.<br> <p> But I would say that's probably what people like about it. It's not interesting, it works and it's easy to understand. Perhaps that makes for a large ecosystem one day. V8 certainly gives great performance. That's a great combination.<br> <p> (Personally I think the callback spaghetti makes for hard-to-read code. Let's just say I wouldn't want to debug someone else's code. And this is from someone who actually likes Perl.)<br> </div> Thu, 20 Dec 2012 20:57:33 +0000 Cool new Free software https://lwn.net/Articles/530224/ https://lwn.net/Articles/530224/ nix <div class="FormattedComment"> NPTL was very well optimized, every bit polished until it was far more scalable than normal users would ever need, on the assumption that high-end users would push even those limits. A superb piece of work all around.<br> <p> </div> Thu, 20 Dec 2012 20:21:29 +0000 Cool new Free software https://lwn.net/Articles/530205/ https://lwn.net/Articles/530205/ raven667 <div class="FormattedComment"> <font class="QuotedText">&gt; Now, I know from experience that there are scenarios where RDBMS's don't work very well, but - sorry to be harsh here - I don't see that too many people making those 'RDBMS/ACID/whatever sucks' arguments here have actual experience with those kinds of workloads. Also note that most shops using fancy KV stores and such *also* use relational databases for parts of their data. It's simply a question of the right tool for the right job.</font><br> <p> One thing I see from people who do a lot of work with Postgres and understand it very well is that there is a lot of flexibility on the data structures used and methods to access those data structures that have been optimized by developers competent in the math over the course of decades. 
So whatever algorithms are used to drive these NoSQL databases and make them fast are probably already implemented as well or better in Postgres, but it does take a lot of familiarity with how the data is stored to make use of this fact. You also have the ability to make plugins to add new data types and storage methods for whatever use case you need.<br> <p> I guess, to distill the point: many of the techniques which make NoSQL databases fast can also be used with SQL databases if you know how, and you get the benefit of the decades of research and development that have gone into the traditional SQL engines.<br> </div> Thu, 20 Dec 2012 19:21:55 +0000 Cool new Free software https://lwn.net/Articles/530203/ https://lwn.net/Articles/530203/ dlang <div class="FormattedComment"> a single write is not guaranteed to complete or fail.<br> <p> In addition, the OS and disk may re-order your writes so that some parts of your file may be written to disk, but earlier parts of your file are not (i.e. gaps in the file on disk).<br> <p> since these two assumptions you are making are false, the rest of your discussion falls apart.<br> </div> Thu, 20 Dec 2012 19:05:21 +0000 Cool new Free software https://lwn.net/Articles/530195/ https://lwn.net/Articles/530195/ jch <div class="FormattedComment"> <font class="QuotedText">&gt; Most sane languages with a good RTS and green threads will be able to match the performance of NodeJS for anything involving IO.</font><br> <p> We've done some benchmarks comparing threads with (machine-generated) event-driven code. Unsurprisingly enough, the results indicate that well implemented user-space cooperative ("green") threads are just as fast as event-driven code, but use vastly more memory. Surprisingly enough, Linux' NPTL is only some 50% slower, which I find very impressive.<br> <p> See Figures 4 and 6 on pages 10 and 12 of <a rel="nofollow" href="http://www.pps.univ-paris-diderot.fr/~jch/research/cpc-2012.pdf">http://www.pps.univ-paris-diderot.fr/~jch/research/cpc-20...</a><br> <p> -- jch<br> <p> </div> Thu, 20 Dec 2012 18:37:12 +0000 Cool new Free software https://lwn.net/Articles/530165/ https://lwn.net/Articles/530165/ pboddie <div class="FormattedComment"> To get most of the way there, just choose trick #1 in that list. In any book about PostgreSQL performance, that trick should have a chapter by itself consisting of the words "Use COPY instead of INSERT" in very big letters, with some smaller print used to clarify stuff like how to do COPY-like operations instead of using the full-privilege COPY, and so on.<br> </div> Thu, 20 Dec 2012 16:41:24 +0000 Cool new Free software https://lwn.net/Articles/530145/ https://lwn.net/Articles/530145/ Wol <div class="FormattedComment"> Replying to everyone here, but ...<br> <p> First of all, what's wrong with an ACID NFNF database? ACID has ABSOLUTELY NOTHING to do with relational. Relational (both FNF and NFNF) is mathematical theory. ACID is engineering reality. If I execute my relational database program, my database will reflect that EXACTLY. Any errors are down to a buggy program. If I *run* my database program on a computer, however, reality may intervene and things will go wrong. ACID is meant to address the vagaries of the real world. 
One only has to follow what's going on with linux file systems to know that integrity and ACID are very important there!!!<br> <p> The thing is, because an atomic update in a relational database is a row (indeed, it may even be smaller than that), ACID is extremely important to a relational engine because there are so many points at which failure could corrupt the database. In a Pick database, an atomic update is an object - a complete view - so Pick can *safely* rely on the file system to a far greater extent.<br> <p> As for NFNF being for database gurus, there's a reason most Pick applications were written by *USERS* - it's actually a lot easier for users than relational FNF, while people who've been indoctrinated in FNF can't get their head round it :-) Plus, for FNF people, it lacks large chunks of what they consider necessary, BECAUSE FNF needs those things to get around engineering weaknesses.<br> <p> Things like integrity, for example. I divide integrity into "strong" and "weak". Attributes that refer to a single instance of an object are strongly related. By storing them in a single RECORD, Pick doesn't need integrity checking. But relational needs to store them across multiple tables and rows, so integrity checking is paramount. Weak integrity (the relationship between two objects) is important to both FNF and NFNF (but, because NFNF doesn't need strong integrity checking, it doesn't tend to have weak checking either).<br> <p> This is where transactions come in, and MODERN Pick databases have them. But many old applications don't. The nice thing with Pick is I can lock all the *objects* I want to update, then update them in a transaction. So (unless I overflow the lock table) I haven't had to lock an entire table to do a complicated transaction. I've offloaded integrity to the db engine, as you wanted, but because each *object* is an atom, the complexity at the db level is far less, and the chance of a transaction failure is far less, because I'm updating far fewer rows in far fewer tables, for the same change to the db.<br> <p> Picking up on andresfreund here - Pick does NOT have transactions across a single statement. Why? It doesn't need them! A single statement is always atomic, and ACID will be enforced by the OS. The db engine can ignore the problem. That's my point above, about Pick lacking a lot of "necessities" because the Pick model makes them unnecessary.<br> <p> As for "no experience of workloads", you did pick up on my comment about a P90? That took 5 mins to run its query (db == UniVerse). The replacement system - the twin Xeon - managed to run the query in 4.5 mins after six months' hard work tweaking and optimising (db == Oracle). I guessed (and gather I'm right) that the company in question was William Hill, one of the biggest bookies in the UK.<br> <p> Cheers,<br> Wol<br> </div> Thu, 20 Dec 2012 16:20:42 +0000 Cool new Free software https://lwn.net/Articles/530141/ https://lwn.net/Articles/530141/ andresfreund <div class="FormattedComment"> <font class="QuotedText">&gt; How? With an absolutely unoptimized default configuration, each commit involves a sync, which involves waiting for the disk.</font><br> <p> unoptimized != default. It was just the system-wide installation I had lying around, it sure has some adjustments, just not optimized for anything. 
I don't know off the top of my head which ones exactly, I can look it up though if you want.<br> <p> <font class="QuotedText">&gt; There's no way you can do more than a few hundred such transactions per second unless you have a battery-backed RAID array so you don't need to wait for on-the-average half a disk rotation to bring the data under the head</font><br> <p> Yes, I had synchronous_commit turned off (I do that on all my development systems), that's why I wrote that I had a 1/3 second window of data-loss. That's ok enough in many scenarios.<br> <p> You can, btw, get into the lower thousands on a single spindle these days (since 9.2) with synchronous commit as well, due to the way WAL flushes are batched/grouped across different sessions. Requires quite a bit of concurrency though.<br> <p> <font class="QuotedText">&gt; (and then possibly another avg half rotation for associated metadata)</font><br> <p> PG writes the WAL sequentially into pre-allocated files to avoid that...<br> <p> <font class="QuotedText">&gt; Doing 320,000 transactions per second with PostgreSQL is hard to imagine.</font><br> <p> Those were the read only ones. 320k writing transactions to logged (i.e. crash-safe) tables is probably impossible independent of the hardware right now due to the locking around the WAL.<br> <p> Sorry, should have given a bit more context, but it really just was 5 minutes of benchmarking without any seriousness to it.<br> </div> Thu, 20 Dec 2012 15:52:22 +0000 Cool new Free software https://lwn.net/Articles/530135/ https://lwn.net/Articles/530135/ nix <blockquote> on my workstation from 2009, with other stuff going on, in an absolutely unoptimized configuration, PG does 320k readonly transactions per sec and 140k trivial writing ones. </blockquote> How? With an absolutely unoptimized default configuration, each commit involves a sync, which involves waiting for the disk. There's no way you can do more than a few hundred such transactions per second unless you have a battery-backed RAID array so you don't need to wait for on-the-average half a disk rotation to bring the data under the head (and then possibly another avg half rotation for associated metadata). Doing 320,000 transactions per second with PostgreSQL is hard to imagine. <p> I am obviously missing something. (Again. It's not my week...) Thu, 20 Dec 2012 15:15:38 +0000 Cool new Free software https://lwn.net/Articles/530105/ https://lwn.net/Articles/530105/ paulj <div class="FormattedComment"> At least for me, I'm not at all an expert in how to do transactions. If I wanted to store data, and have it remain self-consistent with some schema, wouldn't it be easiest for me to start with using a DB that already provided consistency features? It could be that it will be fast enough for my purposes, no? Wouldn't it be premature optimisation for me to go with a system that provided more performance by not providing consistency? While a good developer who's been around databases and relational data storage a good while may know how to implement consistency themselves, what are the chances of arbitrary developers being able to better implement consistency themselves than the database developers?<br> <p> Unless of course the argument is that often the performance-impacting consistency guarantees are not required. But then, aren't ACID DBs tweakable? 
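(They are, at least going by andresfreund's description above: PostgreSQL lets you choose the durability/speed trade-off per session or even per transaction. A minimal sketch - the events table is hypothetical; a crash could lose this one commit, but cannot corrupt or reorder the ones before it:<br> <pre>
BEGIN;
-- Relax durability for this transaction only; everything committed
-- earlier keeps the usual crash-safety guarantees.
SET LOCAL synchronous_commit = off;
INSERT INTO events (payload) VALUES ('hypothetical row');
COMMIT;
</pre>
)<br> <p>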
You can make, e.g., PostGres play a lot more loose with data than it does by default, I gather, and gain a lot of performance.<br> <p> ?<br> </div> Thu, 20 Dec 2012 13:43:19 +0000 Cool new Free software https://lwn.net/Articles/530097/ https://lwn.net/Articles/530097/ njs <div class="FormattedComment"> There is no "the kernel" versus "the user process" when it comes to doing things -- there is just a CPU running some code, with different memory access rules. Presumably the way you would tell the kernel about this cool shadow and how to draw it is that you would provide some shadow drawing machine code, and then when the mouse pointer moved the kernel would set up some registers etc. (a "context") and then run this machine code. But now we have just described exactly what user processes are...<br> <p> When a user process blocks, it doesn't actually sit there as an entity taking up space. It gets converted into a callback attached to whatever event caused it to block.<br> </div> Thu, 20 Dec 2012 12:53:15 +0000 Cool new Free software https://lwn.net/Articles/530085/ https://lwn.net/Articles/530085/ andresfreund <div class="FormattedComment"> <font class="QuotedText">&gt; Transactions are tricky any way you look at them. Developers not thinking about integrity will end up messing up their transactions and leaving things in mid-air. Automatic transactions in a middleware layer (e.g. web services) help, but then those web services have to be designed to encompass atomic transactions. I have seen this situation far too often even in large banks, where system failures resulted in thousands of unfinished transactions that had to be repaired by hand.</font><br> <p> Transactions in the usual database system only persist across session end if they are started as two phase commit transactions. Which you only use if you *have* to because you need to coordinate critical changes between distributed systems where divergence is not an option (NB: 2PC != 2PL).<br> <p> <font class="QuotedText">&gt; Also, modern databases force everything to be in a transaction, which is horrible for performance. Why force developers to lock tables when sometimes there is no need to?</font><br> <p> Uhm. a) Most RDBMS's allow you to have transactions that automatically only span a single statement b) Why do transactions, unless used together with synchronous commits, imply being horribly slow? c) What does that have to do with locking tables? No sensible database forces you to exclusively lock a table to make modifications to the contents of said table. Most require some lock that prevents it from being ALTERed or DROPed, but that's something different.<br> <p> Just some trivial numbers towards b): on my workstation from 2009, with other stuff going on, in an absolutely unoptimized configuration, PG does 320k readonly transactions per sec and 140k trivial writing ones. With a data-loss window of 1/3s. Now, I'll freely admit that the latter number is only that high because I didn't create an index, but even after that it's still 90k.<br> <p> Now, I know from experience that there are scenarios where RDBMS's don't work very well, but - sorry to be harsh here - I don't see that too many people making those 'RDBMS/ACID/whatever sucks' arguments here have actual experience with those kinds of workloads. Also note that most shops using fancy KV stores and such *also* use relational databases for parts of their data. It's simply a question of the right tool for the right job.<br> <p> Disclaimer: I do work with/on/around PG, so I am certainly biased. 
But some other RDBMS's are certainly able to produce similar numbers.<br> <p> Disclaimer-2: Yes, those numbers are not realistic for any real-world workload.<br> </div> Thu, 20 Dec 2012 12:20:48 +0000 Change of licence https://lwn.net/Articles/530081/ https://lwn.net/Articles/530081/ coriordan Hmm. Thanks for the link. Thu, 20 Dec 2012 11:15:22 +0000 Cool new Free software https://lwn.net/Articles/530068/ https://lwn.net/Articles/530068/ man_ls The first part looks good to me. Allow me to answer the last part even if the question was not for me specifically. <blockquote type="cite"> Would it be fair to say that the NFNF way would be more suited to DB gurus, while anyone who hasn't spent a good bit of time working with and thinking about multi-update integrity issues might be better off using the ACID DB approach (at least, until update performance is actually proven to be an issue)? </blockquote> Transactions are tricky any way you look at them. Developers not thinking about integrity will end up messing up their transactions and leaving things in mid-air. Automatic transactions in a middleware layer (e.g. web services) help, but then those web services have to be designed to encompass atomic transactions. I have seen this situation far too often even in large banks, where system failures resulted in thousands of unfinished transactions that had to be repaired by hand. <p> Also, modern databases force everything to be in a transaction, which is horrible for performance. Why force developers to lock tables when sometimes there is no need to? <p> I almost prefer that transactions have to be done explicitly so that developers don't have the lazy option of letting the framework do them. On the other hand, NoSQL databases make transactions too hard to do properly. I miss some kind of system where transactions are not mandatory, have to be done explicitly, and can be activated easily. Thu, 20 Dec 2012 10:02:55 +0000 Cool new Free software https://lwn.net/Articles/530061/ https://lwn.net/Articles/530061/ paulj <div class="FormattedComment"> I'm curious. You describe the problem with FNF as being the complexity of ensuring consistency between multiple writes, with NFNF having the advantage of generally doing single writes. However, you then describe the problem of cases where you may still have to update multiple entities in one transaction even in NFNF (the invoice example), and say it's tricky. How do you deal with this?<br> <p> It seems to me that the fundamental performance/integrity trade-off and the attendant complexity is there in both cases, but in the FNF, ACID DB the DB handles this for me, while with the NFNF DB I have to handle that myself. So the ACID DB way may result in all transactions suffering some performance overhead, from ensuring integrity, but it will stay consistent (if I've done the schema right) and handle those issues for me. While the NFNF DB gives me fast, no-overhead writes, but I have to be aware of and handle the complex, tricky multi-update integrity issues myself. 
Is that a fair view?<br> <p> Would it be fair to say that the NFNF way would be more suited to DB gurus, while anyone who hasn't spent a good bit of time working with and thinking about multi-update integrity issues might be better off using the ACID DB approach (at least, until update performance is actually proven to be an issue)?<br> </div> Thu, 20 Dec 2012 09:51:07 +0000 Bad change of licence https://lwn.net/Articles/530026/ https://lwn.net/Articles/530026/ mlinksva <div class="FormattedComment"> The developer (Evan) makes exactly the tactically useful argument in <a href="https://identi.ca/conversation/97606977">https://identi.ca/conversation/97606977</a><br> <p> The proprietary software in question is presumably primarily Twitter and Facebook.<br> </div> Thu, 20 Dec 2012 04:40:47 +0000 Cool new Free software https://lwn.net/Articles/529994/ https://lwn.net/Articles/529994/ Wol <div class="FormattedComment"> I forgot to add. Look at this problem from the APPLICATION'S point of view. Design a (set of) relational tables to store a list of pizzas, and the toppings you need to create those pizzas.<br> <p> Oh - one of the requirements, for an easily maintainable application, is you MUST NOT MIX data and metadata in the same table. Or store the same data multiple times across several tables.<br> <p> Oh no! I've just specified an impossible requirement!<br> <p> The problem with mixing data and metadata is that the database has no way of knowing which is which. It's all crucial information that exists only in the programmer's head. BAD BAD BAD!<br> <p> But I can do that easily in Pick. Precisely because I can do things in n dimensions, not just 2. The only data I need to store across multiple tables is foreign keys. And even then, I can get away with far less duplication than relational.<br> <p> Cheers,<br> Wol<br> </div> Wed, 19 Dec 2012 23:37:45 +0000 Cool new Free software https://lwn.net/Articles/529984/ https://lwn.net/Articles/529984/ Wol <div class="FormattedComment"> Well, I've looked at ACID and a lot of it is "irrelevant". In quotes, because yes I understand the requirement, but NFNF is just *more* *efficient*.<br> <p> A lot of ACID and data integrity in FNF databases is needed to make sure a SERIES of updates don't get interrupted and corrupt the data. In an NFNF database, that is ONE update.<br> <p> Let's say I'm writing a new invoice to my accounting system. There are, say, ten line items, and a new delivery address.<br> <p> Firstly, an application written to a FNF database is likely to treat the whole thing as a single transaction, with a lot of ACID overhead as it creates a new address in the address table, ten new items in the line-item table, and a new entry in the invoice table, plus updating all the associated ledgers, customer record, etc.<br> <p> An application written to a Pick database, on the other hand, will probably create the new delivery address as a single, unprotected database update. What's the point of protecting it? So long as a *single* database write is guaranteed to complete or fail, you don't need fancy-shmancy ACID, and a delivery address is a stand-alone entry. Creating an invoice is a bit trickier, not because of integrity in the invoice, but because of the need to update multiple entities in one step.<br> <p> But again, so long as a *single* write is guaranteed to complete or fail, I don't need fancy-shmancy ACID to create the invoice. It's a single write to the invoice FILE. 
However, I do need to wrap it in a transaction because I need to update the associated ledgers.<br> <p> So, as I said before, the point is I need FAR FEWER DISK ACCESSES. Which makes a MAJOR difference to speed, whether reading OR writing. Plus, statistically, I'm far more likely to be reading than writing, and as I said my "intelligent read-ahead" will massively reduce the need for reads. The fact that I'm storing the equivalent of a relational view in a single database "atom" or RECORD (the equivalent of a relational row) massively reduces the need for writes.<br> <p> FNF does exist for a reason, but it's not the reason you think. It makes the maths solvable. Once you've solved it for FNF, you can then go to NFNF knowing that the MATHS is just the same, but the ENGINEERING is MUCH SIMPLER.<br> <p> If you want to work in FNF, my NFNF database can give you an FNF view with minimal effort. It probably takes less time to convert (in RAM) an NFNF record into an FNF view than it did to retrieve that record from disk. I can present you with an entire view, in the same time that you take to retrieve a single row from your relational database. (And update that view, in the same time it takes you to write a single row.)<br> <p> Two points to think about ...<br> <p> 1) Relational works with 2-dimensional tables. (done properly) NFNF works with n-dimensional tables. In maths, the generic always trumps the specific - n-dimensional trumps 2-dimensional.<br> <p> 2) FNF is maths. As Einstein said, "As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality."<br> <p> I've taken your maths, and applied engineering to it. If you want speed, I can guarantee you WON'T get it with a relational engine; my NFNF engine is a superset (ie it can give you *everything* you get from relational, and more besides, eg speed).<br> <p> Cheers,<br> Wol<br> </div> Wed, 19 Dec 2012 23:26:09 +0000 Cool new Free software https://lwn.net/Articles/529986/ https://lwn.net/Articles/529986/ man_ls I know this is just another load of armchair OS design, and that it will probably not work in real life, but perhaps it is worth a try for somebody. <p> Remember that this article refers to node.js. People have tried many things over the years in the web server arena, and around 2000 Apache was the clear market leader, with Microsoft a distant second. Apparently everything had been invented already. Then in 2002 a Russian guy decided to start development on a different model. Guess what: 10 years later it is <a href="http://news.netcraft.com/archives/2012/12/04/december-2012-web-server-survey.html#more-7480">the second web server</a> in active sites, ahead of Microsoft IIS. A year ago some crazy guys took Google's V8 engine and coded a web server around it, not in C or C++ but in... <i>JavaScript</i>. Preposterous. The result: the new web server everyone is talking about, capable of sustaining thousands of requests per second out of the box. 
Wed, 19 Dec 2012 23:03:34 +0000 Cool new Free software https://lwn.net/Articles/529975/ https://lwn.net/Articles/529975/ dlang <div class="FormattedComment"> Please try doing some GUI programming: your ideas sound good in the abstract, but they fail when you get down to trying to use them<br> <p> Taking your mouse example, some graphics cards do support hardware mouse cursors, where the card handles moving the mouse pointer.<br> <p> however, such support really doesn't help much and is fading from use today because nowadays people want to change the shape of the cursor depending on what you are doing, etc.<br> <p> you also have to think about how the program should know that it should look for an update to the mouse position. you don't want it spinning in a loop when nothing happens.<br> <p> you also don't just want the current position of the mouse, you want to know its path, and how fast it's moving (along with what buttons, if any, are pressed during any part of this time)<br> <p> There are pretty good reasons that systems do things the way they do. People have tried a LOT of things over the years, and what's in place may not be the perfect way to do things, but it has survived lots of competing ideas along the way.<br> </div> Wed, 19 Dec 2012 22:50:15 +0000 Cool new Free software https://lwn.net/Articles/529932/ https://lwn.net/Articles/529932/ man_ls Of course not, spawning a new process for every event is not a good idea. I did not explain myself clearly and got confused in the meantime. Allow me to reboot the thread. <p> The thing is that an event should only do what it is supposed to do, and nothing else. If the result of a mouse movement is that a counter is updated, then let the kernel do it, and not bother my own process; when my process needs to read the counter it will do that. If the mouse pointer is moved and I want to paint a cool shadow, then ideally I would let the kernel know and the kernel will paint the cool shadow. Many things work like that now; it would be great if it was the default mechanism to answer user events. Look at it as dtrace on steroids, if you want. <p> For a long task (such as a menu selection), by all means create a process and let it deal with the event. Otherwise, store the new value and let me query it if and when I need it. Taken to the extreme I don't even need to have running processes except for background tasks. Wed, 19 Dec 2012 20:46:14 +0000 Cool new Free software https://lwn.net/Articles/529915/ https://lwn.net/Articles/529915/ nix <div class="FormattedComment"> Oh, there certainly can be 10k or more user events per second. Even ignoring the extreme case of network cards (10GbE cards can generate so many packets that you have only a clock tick or so to process each byte), mice can easily generate a few thousand events per second, and eye trackers will probably be even worse. I don't really want my CPU pegged because I waved my mouse around the screen -- not unless it's really got work to do, that is. Processing the mouse events is not work. (Note that if you're running X on a uniprocessor system this probably involves lots of context switches between multiple distinct processes, but even that is many many times faster than process creation.)<br> <p> The softer version of fork() you refer to is called thread creation. It is faster -- I can generate a few tens of thousands of threads a second on this system -- but still not anywhere near as fast as context switching. 
You are still throwing performance away for, as far as I can tell, no benefit.<br> <p> Worse yet -- if you're receiving a stream of events from some input device, they are almost certainly related in some way, and you almost always want to process them in the same process, or at least in related processes. Kicking off a new process for each event seems likely to lead to cursing from developers as they have to marshal the things back together again before they can process them. End result: a slower system, annoyed developers, more complex and buggy code, and... uh... no benefit, as far as I can see. What benefit is all this supposed to bring? Is it just 'more processes -&gt; more isolation -&gt; always necessarily better'? 'cos I used to think like that, and it just isn't so.<br> <p> </div> Wed, 19 Dec 2012 19:48:30 +0000 Cool new Free software https://lwn.net/Articles/529916/ https://lwn.net/Articles/529916/ man_ls Way to go then :) Thanks for your patience. Wed, 19 Dec 2012 19:47:26 +0000