Kuhn: Copyleft Won't Solve All Problems, Just Some of Them
Posted Mar 21, 2022 19:42 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
In reply to: Kuhn: Copyleft Won't Solve All Problems, Just Some of Them by Wol
Parent article: Kuhn: Copyleft Won't Solve All Problems, Just Some of Them
Try examples from, e.g. here: https://www.databasestar.com/complex-sql-query-example/#f...
I've seen analytic queries with dozens of joins, but extracting these kinds of schemas is impractical.
Posted Mar 21, 2022 23:34 UTC (Mon) by Wol (subscriber, #4433)
Publisher is Publisher.
Book_author and Book_language are attributes of Book, so they would go in the same FILE (table).
Author is author.
In this case, I'd say Customer_address is an attribute of Customer rather than an entity in its own right, so both of them, together with address status, would go in the same FILE. I see the ERD here disagrees with me. Likewise, I probably wouldn't break Country out into its own FILE.
Do I treat order_line as an entity (in which case I would say it belongs in the sales ledger), or as an attribute of Order? I'll treat it as an attribute of Order, so along with shipping_method, it belongs in the Order FILE. Looking at order_history and order_status, I think they belong in the same FILE, too.
Note that if I've got a one->many relationship, I list all the sub-table fields in an ASSOCiation, and that tells Pick it is a "table within a table".
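For anyone who hasn't met a Pick dictionary, here's a rough sketch of what that might look like - assuming UniVerse-style D-type records, with field positions and the association name ORDER.LINES made up for illustration (other Picks lay their dictionaries out a little differently):

book_id:  D / 4 /     / Book  / 10L / M / ORDER.LINES
price:    D / 5 / MD2 / Price / 10R / M / ORDER.LINES

(Attributes 1-7 are type, field number, conversion, column heading, format, single/multi flag, association name.) The M in attribute 6 says the field is multivalued, and the shared name in attribute 7 is what tells Pick those values line up as one "table within a table".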
"Some parts look simple. Others may not be so simple". I'm thinking the same about Pick ... :-)
Note that I'm going to use the LIST command, not SELECT. Under the hood they're pretty much the same (much like argv[0] making the same executable do different things in Unix). But the LIST command actually outputs the data like SQL SELECT, while the Pick SELECT just collects primary keys behind the scenes.
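A rough sketch of the difference, with field names from the example schema and syntax written from memory: SELECT builds an active list of record keys, and the very next command then runs against just those keys instead of the whole FILE.

SELECT cust_order WITH price GT 100
LIST cust_order order_date price

The SELECT on its own prints nothing but a count of how many keys it picked up; the LIST then consumes that list.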
LIST cust_order BY order_date order_date
LIST cust_order COUNT order_date BY order_date BREAK.ON order_date
It didn't cross my mind that date might be a timestamp, not a date, but let's create a calculated column that converts timestamp to date ... (The conversion code stands for Date, Day Month Year.)
DATE = OCONV(order_date,"DDMY"). Rather than creating a new column, I could always just put that in the query with the EVAL keyword - I'll use both variants in the next query ...
LIST cust_order COUNT DATE BY EVAL('OCONV(order_date,"DDMY")') BREAK.ON DATE
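(For completeness, DATE as a real calculated column is itself just a dictionary record. A sketch, again assuming UniVerse-style layout from memory:

DATE:  I / OCONV(order_date,"DDMY") /  / Date / 12L / S

Attributes 1-6 are type, expression, conversion, column heading, format, single/multi flag.)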
I almost never used EVAL and that stuff so I can't remember the AS syntax, but it's probably the same ...
LIST cust_order COUNT DATE AS "Orders" BY DATE
The next one REALLY shows the difference between Pick and Relational ...
LIST cust_order COUNT DATE AS "Orders" COUNT book_id AS "Books" BY DATE BREAK.ON DATE
Here I would have to experiment slightly. I think "COUNT book_id" works, but it might have to be SUM EVAL( DCOUNT( book_id, @VM)). The DCOUNT stuff counts the books per order - I could put that as a calculated field in the dictionary, if I wanted.
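For anyone not used to multivalues, DCOUNT just counts delimited values - so if an order's book_id field holds three value-mark-separated ids, it returns 3. A trivial PickBASIC illustration (the ids are made up):

BOOKS = "101" : @VM : "205" : @VM : "309"  ;* three ids in one field
CRT DCOUNT(BOOKS, @VM)                     ;* prints 3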
And note that I am still only reading one row per cust_order - and not one per order plus one per book!
LIST cust_order COUNT DATE AS "Orders" COUNT book_id AS "Books" SUM price AS "Total Price" BY DATE BREAK.ON DATE
The next bit, the running total, makes me go OUCH! I really don't have a clue how to handle it in the query language, but it's pretty easy to do. Define a function RUN_AND_BREAK( var_to_sum, var_to_break) in BASIC. It just sums var_to_sum, and resets the sum every time var_to_break changes. It stores all this in named common, so be careful!
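Something along these lines - a minimal PickBASIC sketch, assuming the UniVerse convention that a subroutine called from the dictionary via SUBR() hands its result back in the first parameter (the names and the common block are mine):

SUBROUTINE RUN_AND_BREAK(RESULT, VAR_TO_SUM, VAR_TO_BREAK)
* State lives in named common, so it persists for the whole session - hence "be careful"
COMMON /RUNBRK/ LAST.BREAK, RUNNING.TOTAL
IF UNASSIGNED(LAST.BREAK) THEN
   LAST.BREAK = ''
   RUNNING.TOTAL = 0
END
IF VAR_TO_BREAK # LAST.BREAK THEN
   RUNNING.TOTAL = 0            ;* the break value changed, restart the sum
   LAST.BREAK = VAR_TO_BREAK
END
RUNNING.TOTAL = RUNNING.TOTAL + VAR_TO_SUM
RESULT = RUNNING.TOTAL
RETURN
END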
LIST cust_order COUNT DATE AS "Orders" COUNT book_id AS "Books" SUM price AS "Total Price" EVAL(SUBR(*RUN_AND_BREAK(DCOUNT(book_id, @VM),OCONV(DATE,"DM")))) AS "Running Total" BY DATE BREAK.ON DATE
As before, I can just create a virtual column RUNNING.TOTAL and put all that complicated mess in it :-)
And I've just found the "stored procedure". That makes me happy. Pick doesn't have a LAG function, but now I've got SQL stored procedures to point at, I'm much happier writing PickBASIC functions. Unlike SQL, however, I can make my custom LAG look back 7 days, not 7 rows ... But you'll counter, quite fairly, that I've written custom code ...
Now to explain ...
Unlike SQL, I'm just scanning one table! So I can't add an index to improve join performance :-) Unlike MySQL, however, I can create an index on almost any information in my FILE - I just need to create a virtual column and I can index it, so I could easily index month, for example, if I wanted - OCONV( order_date, "DY4M2").
The way I would optimise this is to create an index on DATE. That way, when I order BY DATE, the LIST will start by scanning and sorting the index, then retrieve all the keys from the index in sorted order. So running the report is just a sequential scan of the cust_order FILE by pre-sorted id, and especially if I'm only reporting a small subset of a large file, the ONLY thing I've had to scan in full is the date index. (And depending on how that index is organised - most recent Picks use a btree, I believe - it can just scan the keylist to retrieve a pointer to the index data.)
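On UniVerse-family systems the commands are along these lines (from memory; other Pick flavours spell it differently):

CREATE.INDEX cust_order DATE
BUILD.INDEX cust_order DATE

CREATE.INDEX sets the index up against the dictionary definition - an I-descriptor like DATE is fine - and BUILD.INDEX populates it; after that it's maintained automatically as records are written.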
And if I had split order_line out into its own FILE, I would have stored all its keys in the cust_order FILE, so at no point would I have had to scan the order_line FILE: I would have just declared price as a TRANS in the cust_order FILE, and Pick would have taken care of all the details, retrieving the data directly with a single disk hit.
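A sketch of what that TRANS might look like as an I-descriptor expression in the cust_order dictionary - assuming a hypothetical order_line FILE, its keys held multivalued in a cust_order field I'll call order_line_ids, and price sitting in field 3 of order_line:

TRANS("order_line", order_line_ids, 3, "X")

The "X" just says to return an empty value if a key doesn't exist; SUM price in the LIST then works over the multivalues TRANS hands back, exactly as before.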
(Oh, and I didn't add that Pick, for the most part, is order-insensitive: "LIST cust_order BY DATE fields_to_list" or "LIST cust_order fields_to_list BY DATE" - Pick couldn't care less.)
Cheers,
Wol
