Kuhn: Copyleft Won't Solve All Problems, Just Some of Them

Posted Mar 21, 2022 13:45 UTC (Mon) by excors (subscriber, #95769)
In reply to: Kuhn: Copyleft Won't Solve All Problems, Just Some of Them by Wol
Parent article: Kuhn: Copyleft Won't Solve All Problems, Just Some of Them

> My blinding flash of inspiration, thanks to nix, was that the second law of thermodynamics proves that a set has far more entropy than a matrix. Therefore Pick MUST, as a matter of Physics, be a much more information-dense environment.
>
> So when Physics says you can sack two thirds of your relational guys and replace your system with Pick, surely you should take it seriously?

I suspect people would take you more seriously if you tried to sound less like a total crank :-) . Working code and benchmarks and technical documentation would be much more convincing than reams of pseudoscientific nonsense.

Kuhn: Copyleft Won't Solve All Problems, Just Some of Them

Posted Mar 21, 2022 14:51 UTC (Mon) by Wol (subscriber, #4433) [Link] (4 responses)

I appreciate I tend to use Gedanken experiments much more than real ones, but my statement about the Second Law of Thermodynamics, Entropy, and comparing Sets to Matrices is a statement of Pure Maths.

You cannot benchmark a Mathematical Theorem.

I stated that in twenty years of Pick programming, I've never met a complex Pick query stretching over more than a couple of lines. But in six months of SQL programming they're pretty much all complex monsters. Either I'm a liar or that's pretty damning! Either way, it's evidence.

And it's very noticeable when discussing performance, I'm pretty eager to do a Gedanken benchmark experiment - yet the response from the relational guys is ... crickets ... (take my word on this, but real experiments show Gedanken is accurate to about 5%).

At the end of the day, if you're going to fight conventional wisdom, OF COURSE you're going to come over as a crank. It's inevitable.

The relational guys love to argue maths. I'm trying to counter with Maths. The Mathematics of Entropy says Pick is objectively a better model - no experiment required.

You can't counter a mathematical theorem with empirical facts - Newton's Laws Of Motion are mathematically totally solid. It's just that Reality said "Einstein did a better job".

Both the Pick and Relational models are theoretically sound - indeed if a Pickie doesn't use relational analysis he's an idiot. But the Second Law of Thermodynamics says that Pick is objectively more efficient. And all my career's experience bears that out in spades.

Please. Look at the MATHS. Look at the ENTROPY. Look at the evidence (yes you'll initially have to trust my claims - verify them when you get the chance). Then see what you think.

Cheers,
Wol

Kuhn: Copyleft Won't Solve All Problems, Just Some of Them

Posted Mar 21, 2022 15:47 UTC (Mon) by nix (subscriber, #2304) [Link] (3 responses)

> You cannot benchmark a Mathematical Theorem.

RDBMSes are not mathematical theorems. JIT-capable interpreters are not mathematical theorems, and it is perfectly possible and indeed routine to run experiments to determine the performance characteristics of specific programs on them.

This obviates the rest of what you wrote.

> I stated that in twenty years of Pick programming, I've never met a complex Pick query stretching over more than a couple of lines. But in six months of SQL programming they're pretty much all complex monsters. Either I'm a liar or that's pretty damning! Either way, it's evidence.

To me this seems just as likely to be evidence that implementing complex stuff in Pick is so hard and the model is so inexpressive that nobody even tries, so the maximum complexity of stuff written in Pick is bounded above at a very low level. Nobody's written word processors in assembly since WordStar, either (and indeed writing it in assembly is what eventually killed it). The maximum observed complexity of stuff written directly in raw assembly is fairly low, and that's not any sort of indication that assembly is somehow massively more expressive and capable than higher-level languages.

Kuhn: Copyleft Won't Solve All Problems, Just Some of Them

Posted Mar 21, 2022 18:40 UTC (Mon) by Wol (subscriber, #4433) [Link] (2 responses)

> > You cannot benchmark a Mathematical Theorem.

> RDBMSes are not mathematical theorems. JIT-capable interpreters are not mathematical theorems, and it is perfectly possible and indeed routine to run experiments to determine the performance characteristics of specific programs on them.

Then please. Did you see my Gedanken experiment where - was it mpr22 - gave me a typical query of his?

Please take that same query, and do the same Gedanken experiment.

> This obviates the rest of what you wrote.

I appreciate I come over as a bit of fanatic. As indeed I am.

But you said you were frustrated that Relational *should* be able to take advantage of opportunities to speed up, and yet it didn't. My Gedanken experiment shows clearly that there is *no* *room* available for Pick to speed up. I would love to see a similar experiment with a Relational database ...

Show me you don't have a closed mind. Take mpr22's query. Tell me how many requests a Relational database will pass to the underlying OS to get the data. Now compare how many I needed with Pick (a full index scan of renewal dates, cherry-picking the ledger by customer id, then one read per ledger entry for the selected customers). Beat that with Relational!

> > I stated that in twenty years of Pick programming, I've never met a complex Pick query stretching over more than a couple of lines. But in six months of SQL programming they're pretty much all complex monsters. Either I'm a liar or that's pretty damning! Either way, it's evidence.

> To me this seems just as likely to be evidence that implementing complex stuff in Pick is so hard and the model is so inexpressive that nobody even tries, so the maximum complexity of stuff written in Pick is bounded above at a very low level. Nobody's written word processors in assembly since WordStar, either (and indeed writing it in assembly is what eventually killed it). The maximum observed complexity of stuff written directly in raw assembly is fairly low, and that's not any sort of indication that assembly is somehow massively more expressive and capable than higher-level languages.

As I said before, converting the Pick data model to the Relational one is as simple as hiding the primary key, and restricting the dimensions to 1. It's so easy, that every modern Pick database has a SQL layer built on it as a matter of course. But practitioners would much rather use the underlying Pick features, than ask Maxwell's demon to put back all the information that converting from Pick to SQL lost. 90% of your complex SQL query is Maxwell's demon hard at work.

Nix. I don't want to get into an argument with you. I respect you too much. PLEASE do that Gedanken experiment on mpr22's query.

Cheers,
Wol

Kuhn: Copyleft Won't Solve All Problems, Just Some of Them

Posted Mar 21, 2022 19:42 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> Then please. Did you see my Gedanken experiment where - was it mpr22 - gave me a typical query of his?

Try examples from, e.g. here: https://www.databasestar.com/complex-sql-query-example/#f...

I've seen analytic queries with dozens of joins, but extracting these kinds of schemas is impractical.

Kuhn: Copyleft Won't Solve All Problems, Just Some of Them

Posted Mar 21, 2022 23:34 UTC (Mon) by Wol (subscriber, #4433) [Link]

Okay. Just taking the first example I saw, the book store. Let's quickly summarize the ERD.

Publisher is Publisher.

Book_author and Book_language are attributes of Book, so they would go in the same FILE (table).

Author is author.

In this case, I'd say Customer_address is an attribute of Customer rather than an entity in its own right, so both of them, together with address status, would go in the same FILE. I see here, the ERD disagrees with me. Likewise, I probably wouldn't break Country out into its own FILE.

Do I treat order_line as an entity (in which I case I would say it belongs in the sales ledger), or as an attribute of Order? I'll treat it as an attribute of Order, so along with shipping_method, it belongs in the Order FILE. Looking at order_history and order_status, I think they belong in the same FILE, too.

Note that if I've got a one->many relationship, I list all the sub-table fields in an ASSOCiation, and that tells Pick it is a "table within a table".

"Some parts look simple. Others may not be so simple". I'm thinking the same about Pick ... :-)

Note that I'm going to use the LIST command, not the SELECT. Under the hood, they're pretty much the same (much like argc[0] makes the same executable do different things in Unix). But the LIST command actually outputs the data like SQL SELECT, the PICK SELECT just collects primary keys behind the scenes.

LIST cust_order BY order_date order_date

LIST cust_order COUNT order_date BY order_date BREAK.ON order_date

It didn't cross my mind that date might be a timestamp, not a date, but let's create a calculated column that converts timestamp to date ... (The conversion code stands for Date, Day Month Year)
DATE = OCONV(order_date,"DDMY"). Rather than creating a new column, I could always just put that in the query with the EVAL keyword - I'll use both variants in the next query ...

LIST cust_order COUNT DATE BY EVAL('OCONV(order_date,"DDMY")) BREAK.ON DATE

I almost never used EVAL and that stuff so I can't remember the AS syntax, but it's probably the same ...

LIST cust_order COUNT DATE AS "Orders" BY DATE

The next one REALLY shows the difference between Pick and Relational ...

LIST cust_order COUNT DATE AS "Orders" COUNT book_id AS "Books" BY DATE BREAK.ON DATE

Here I would have to experiment slightly. I think "COUNT book_id" works, but it might have to be SUM EVAL( DCOUNT( book_id, @VM)). The DCOUNT stuff counts the books per order - I could put that as a calculated field in the dictionary, if I wanted.

And note that I am still only reading one row per cust_order - and not one per order plus one per book!

LIST cust_order COUNT DATE AS "Orders" COUNT book_id AS "Books" SUM price AS "Total Price" BY DATE BREAK.ON DATE

The next bit, the running total, makes me go OUCH! I really don't have a clue how to handle it in the query language, but it's pretty easy to do. Define a function RUN_AND_BREAK( var_to_sum, var_to_break) in BASIC. It just sums var_to_sum, and resets the count every time var_to_break changes. It stores all this in named common, so be careful!

LIST cust_order COUNT DATE AS "Orders" COUNT book_id AS "Books" SUM price AS "Total Price" EVAL(SUBR(*RUN_AND_BREAK(DCOUNT(book_id, @VM),OCONV(DATE,"DM")))) AS "Running Total" BY DATE BREAK.ON DATE

As before, I can just create a virtual column RUNNING.TOTAL and put all that complicated mess in it :-)

And I've just found the "stored procedure". That makes me happy. Pick doesn't have a LAG function, but now I've got SQL stored procedures, it makes me much happier writing PickBASIC functions. Unlike SQL however, I can make my custom LAG look back 7 days, not 7 rows ... But you'll counter, quite fairly, that I've written custom code ...

Now to explain ...

Unlike SQL, I'm just scanning one table! So I can't add an index to improve join performance :-) Unlike MySQL, however, I can create an index on almost any information in my FILE - I just need to create a virtual column and I can index it, so I could easily index month, for example, if I wanted - OCONV( order_date, "DY4M2").

The way I would optimise this, is to create an index on DATE. That way, when I order BY DATE, the LIST will start by scanning and sorting the index, then retrieving all the keys from the index in sorted order. So running the report is just doing a sequential scan of the cust_order FILE by pre-sorted id. So especially if I'm only reporting a small subset of a large file, the ONLY table I've had to scan in full is the date index. (And depending how that index is organised, most recent Picks use a btree I believe, and can just scan the keylist to retrieve a pointer to the index data.)

And if I had split order_line out into its own FILE, I would have stored all the keys in the cust_order FILE, so at no point would I have had to scan the cust_order FILE, I would have just declared price as a TRANS in the cust_order FILE and Pick would have taken care of all the details, retrieving the data directly with a single disk hit.

(Oh, and I didn't add that Pick, for the most part, is order-insensitive. "LIST cust_order BY DATE fields_to_list" or "LIST cust_order fields_to_list BY DATE" Pick couldn't care.)

Cheers,
Wol

Kuhn: Copyleft Won't Solve All Problems, Just Some of Them

Posted Mar 21, 2022 14:57 UTC (Mon) by Wol (subscriber, #4433) [Link]

Oh. I forgot. Two (well sort of) benchmarks.

I'm pretty certain the company was William Hill the Bookies. Six months of effort by consultants to get a complex query to run faster on Oracle / Twin Xeon 800 than the Pick / Pentium 90 system it was replacing ... ??????

And an Oracle / Cache shootout - Oracle struggled to meet the 100K insertiion target speed, Cache breezed through 250K pretty much from installation.

(Yes Cache isn't Pick. But the data model is similar.)

Cheers,
Wol