Is this SQL databases or No-SQL?
Is this SQL databases or No-SQL?
Posted Mar 28, 2014 19:33 UTC (Fri) by Wol (subscriber, #4433)In reply to: Is this SQL databases or No-SQL? by intgr
Parent article: A discussion between database and kernel developers
Urmmm. No.
What I mean by "keying" is sort-of what an RDBMS would call a primary key. In Pick, a primary key is mandatory. And that key contains all the information required for Pick to go straight to the place on disk where that row is stored. Pick files use linear hashing, so basically it's a modern hashed file. And unlike in a traditional hashed file, it only requires rewriting about three disk blocks to change the modulo. So it's FAST.
(So basically, the difference between a key and an index may be minor but it's crucial. An index tells you the key, while a key directly tells you where the data is. And again, Pick gives you rope. Screw up your primary keys and you screw up the hashing, which cripples performance. But in modern implementations that is HARD.)
> This is true, but you're only talking about OLTP performance, with the assumption that every access to an invoice needs to read all the invoice lines and vice versa. It's true that a system like Pick will be far more efficient RDBMSes in this scenario, but in the all the systems I work on, OLTP queries are not the bottleneck (unless you do crazy things like don't index your data properly). It's the more complex queries that kill you: find all customers that have item X on their invoice; give me a breakdown of the costs of all invoice lines by item type. That's where query optimizers really shine. All evidence I've seen suggests that Pick deals terribly with these.
Erm - no. Assuming I've understood what you want, it's a sequence of commands in Pick (and no, a Pick SELECT and a SQL SELECT are not equivalent).
SELECT INVOICES WHERE ITEM-TYPE EQ "X". If I've got an index on X, that's a single request to disk. That gives me a list of all invoices.
SELECT INVOICES SAVING CUST-ID. Okay, I have to go down my list of invoices but, assuming they're scattered randomly on disk that's 105 disk seeks per 100 invoices.
Okay, I can't remember the syntax for the next command, but it's something like
SELECT INVOICES WITH CUST-ID IN CURRENT SELECTLIST. If we have an index on CUST-ID, once again that's 105 seeks per 100 customers.
I've now got a list of invoices to report on ...
LIST INVOICES BY.EXP ITEM-TYPE TOTAL PRICE
Note that absolutely *nowhere* have I had to search for any information whatsoever. The user provided the initial "X", and every single data access from then on has been keyed - with a near perfect disk hit rate. I've had to throw away about 5% of my disk seeks because they failed to hit their target.
There are two places you might score on me. Sorting and collating the final report is one. But if you can't fit all the invoice details in ram I think I'm going to fly past you. I don't know the internals of the sort, but I suspect the systems I'm used to scan through each invoice in turn, creating an (invoice, line, price) tuple, and then sort that. They will then scan through that sorted list building up the report as they go. So they have to read each invoice at least twice, and once more for each extra relevant line.
But if it all fits in ram, the OS disk caching will help me, just as much as being able to store the entire report in ram will help you.
The other place you will score is indices. Your optimiser will spot it needs to scan INVOICES multiple times, and it will build indices on the fly if they don't exist. Pick assumes the user knows what s/he is doing, and gives you the rope to hang yourself.
Plus, you say "that's only relevant for OLTP" - erm no there too. Okay, it's only Monte Carlo probability, but in pretty much ALL scenarios the cost of retrieving that extra data is minimal, while the chance of it being required is relatively high. So much so that it's well worth it. If the system has been designed by a "practitioner in the art" (in this case an accountant), they will have subconsciously skewed the definition of an object such that this is pretty much inevitable.
As an aside, it was mentioned way back in the thread that PostgreSQL often requires gigabytes of temporary storage. I just cannot CONCEIVE of Pick needing that much, even if handling those sort of quantities of data!
> But there are real advantages to normalizing (with exceptions of course).
I couldn't agree more !!! But not in storing the data in First Normal Form. In fact, if a Pick developer did not normalise the data ON PAPER before designing his Pick schema, I would consider him negligent!
> Even in development, I frequently whip up multi-line queries to look for anomalies in the data. It often just blows me away how these queries complete in seconds, despite churning through gigabytes of data and without any advance planning to make them fast.
> In other words: you're worrying about optimizing something that's already more than fast enough. I worry about optimizing the things that are hard to optimize.
To quote from the Pick FAQ - "SQL optimises the easy task of finding things in memory, Pick optimises the hard task of getting them from disk into memory".
And to repeat my Xeon 800 war story - this Oracle system was designed to replace the old Pick system. It took Oracle Consultants some SIX MONTHS of hard work before it could run that query faster than the Pentium 90 system it was replacing ...
> That may be true for Oracle, but the page layout of PostgreSQL is documented. http://www.postgresql.org/docs/current/static/storage-pag... The difference from Pick is that the storage layout is not part of the API, but just an implementation detail. If the developers need to change it in a future version, for performance or other reasons, they can.
My point is that you're not supposed to know that information, not that you can't have insider knowledge. And if developers do need to change it, you have no recourse when all your finely tuned racing queries suddenly become bogged down in treacle. It's not their fault if your system is suddenly unusably slow and you have six months work (or more) retuning the system to get response times back to acceptable. Okay, that's unlikely to happen, but Oracle customers complain about it often enough. It just can't happen with Pick.
Cheers,
Wol
