LWN: Comments on ""Big data" features coming in PostgreSQL 9.5"

abbreviated keys for BYTEA?

petergeoghegan — Fri, 05 Feb 2016 11:17:23 +0000

This exists now: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;...

"Big data" features coming in PostgreSQL 9.5

ron.dunn — Sun, 16 Aug 2015 00:24:22 +0000

Ajilius is a data warehouse automation product that builds star schemas for PostgreSQL and EnterpriseDB. We've seen great performance gains, processing large dimensions, from the indexing and sorting enhancements. The OLAP functions are nice, but possibly too late as their value has been subsumed by the in-memory engines of BI products like Qlik and PowerBI.

"Big data" features coming in PostgreSQL 9.5

nix — Sun, 09 Aug 2015 12:20:31 +0000

I think the abbreviation is helpful but perhaps not the primary reason why strxfrm() works here but not for sort(1). The biggest problem with strxfrm() is that the blobs are so much larger than the input that the dcache hit simply ruins builds and comparisons using it: you blow the dcache building the blobs and then blow it again using them. Hence the cutover point from cache-dominated to memory-dominated comes much earlier when you're using strxfrm() blobs than when you're using straight strcoll(), which is a killer for things like sort(1) which are massively affected by cache, and repeatedly so, and tries to avoid hitting the disk.

Index building... is of a different order. The memory hierarchy is less important, because everything is hitting the disk anyway, and because you abbreviate the blobs their horrendous size is reduced to something well below one cacheline for every blob.

(This has been a genuine guess: no numbers or experiments were conducted and no strxfrm() blobs were harmed in the making of this post. But I have tried to use strxfrm() before, myself, long ago, and come to similar conclusions back then, though it was the Solaris strxfrm() bugs that really killed that idea.)

"Big data" features coming in PostgreSQL 9.5

petergeoghegan — Sat, 08 Aug 2015 07:24:38 +0000

strxfrm() blobs are usually a little over 3 times as large as the original strings with glibc. As I go into in my blog post about abbreviated keys, this is because there are multiple "levels" (at least 3) in the blob, demarcated by sentinel bytes, whose order relative to each other relates to how heavily some aspect of a code point should be weighed. For example, with Latin scripts, primary alphabetical ordering is represented at the primary level (case and punctuation are represented at subsequent levels). You get a far higher concentration of entropy in the first 8 bytes than (say) the last 8 bytes. Sometimes that concentration is much higher than you'd expect, since (for example) whitespace and diacritics are also not represented at the primary weight level. Also, by just comparing the first 8 bytes, a significant amount of pointer chasing is avoided (other indirection is avoided too).

The idea that strxfrm() is not generally useful seems dubious, even leaving aside a technique like abbreviated keys. It's in the C standard. And Certainly, ICU offers something that's similar but very much more advanced. The glibc docs strongly suggest using strxfrm() where the space overhead is acceptable during a sort of a non-trivial number of strings.

"Big data" features coming in PostgreSQL 9.5

kleptog — Sat, 08 Aug 2015 06:01:58 +0000

I think it also has to do with the fact that in PostgreSQL the abbreviated keys approach allows the removal of a lot of indirect function calls from tight loops. In specialised code that knows beforehand that it's sorting text data replacing strcoll() with strxfrm() might not be that much of a win. But in the general PostgreSQL sorting code there is always an extra layer of indirection (for example TEXT values internally are not null-terminated, so you have to compensate for that).

"Big data" features coming in PostgreSQL 9.5

jberkus — Fri, 07 Aug 2015 21:20:04 +0000

I think the difference is the abbreviated keys approach.

abbreviated keys for BYTEA?

jberkus — Fri, 07 Aug 2015 21:17:41 +0000

Yes, they are related. Although if you're just trying to get an exact match on a checksum, you might consider using GIN indexes instead. For a simple scalar, GIN indexes in Postgres behave a lot like hash indexes.

Anyway, I encourage you to bring up the idea of BYTEA on a PostgreSQL mailing list. Nobody's opposed to extending abbreviated keys further, we just ran out of time for 9.5.

abbreviated keys for BYTEA?

zack — Fri, 07 Aug 2015 18:58:10 +0000

> I guess I don't understand why you'd care that much about abbreviated key sorting on checksums? It's not like you'd care about soriting them. I guess just because of index build time?

Index (and specifically b-tree) build time and maintenance are my main concerns, yes.

But I was also under the impression that abbreviated keys are relevant also for (b-tree) index lookups and uniqueness constraint verifications, due to the comparisons needed to get down to the actual indexes values, no matter how shallow the index is. Maybe that's a wrong impression of mine?

"Big data" features coming in PostgreSQL 9.5

nix — Fri, 07 Aug 2015 17:12:39 +0000

Hm. The glibc folks, last I heard, were mostly under the impression that strxfrm() was pretty much useless (as was I, to be honest). Now that a use has been found, someone should drop a note to the libc-alpha list to inform them that it is not as useless as all that!

(e.g. Paul Eggert discussing a failed attempt to use it in GNU sort in <https://sourceware.org/ml/libc-alpha/2014-11/msg00673.html>.)

I wonder why it was too slow for GNU sort while simultaneously being fast enough for PostgreSQL? It's not like it's sped up noticeably in the intervening years.

Tablesample with proportion really constant time?

smurf — Fri, 07 Aug 2015 15:29:59 +0000

AFAIU the random sampling itself (i.e. deciding which entries to take) would take const time. Retrieving the actual data, obviously not.

Tablesample with proportion really constant time?

epa — Fri, 07 Aug 2015 09:32:19 +0000

Thanks. I just meant that if you want constant time, you need to specify something other than the '0.001' proportion in the reviewer's example. It would have to be tablesample (1000 rows) or something similar, I'm not sure of the exact syntax.

"Big data" features coming in PostgreSQL 9.5

xtifr — Thu, 06 Aug 2015 22:52:44 +0000

Er, I meant to say, "faster sorting" sounds like a feature for everyone, not, "faster indexing". In case it wasn't obvious. Had indexes on the brain.

abbreviated keys for BYTEA?

jberkus — Thu, 06 Aug 2015 19:23:16 +0000

I guess I don't understand why you'd care that much about abbreviated key sorting on checksums? It's not like you'd care about soriting them. I guess just because of index build time?

If so, well, "patches welcome". Probably what you'd want to do with BYTEA is compare the first 8 bytes, sort, then compare the full values to break ties.

On the other hand, you could just store the checksums as NUMERICs, if you are fine with converting to hex and back.

Tablesample with proportion really constant time?

jberkus — Thu, 06 Aug 2015 18:47:25 +0000

That's right.

And it's not precisely constant time; it will take longer to pull 100 rows out of a billion row table than a million row table. However, the increase in time will be incremental (and small) instead of a multiple of the original request, since we're just looking up data by pageID.

For example, I tested SYSTEM between returning 100 rows from a 100000 row table vs. a million row table. Regardless of which table I used, the difference in request time was below significance thresholds. However, with BERNOULLI, the thousands table took around 5ms, whereas the millions table took around 14ms.

"Big data" features coming in PostgreSQL 9.5

jberkus — Thu, 06 Aug 2015 18:35:10 +0000

I'm not.

However, LWN has a lot of Australian readers, and while I was writing this article, a bunch of my friends were at PyCon.AU and DjangoCon.AU. Hence the use of Australia as the 2nd set of cities.

FWIW, the data was generated using skewed random functions, so it's inherently meaningless.

abbreviated keys for BYTEA?

zack — Thu, 06 Aug 2015 13:19:47 +0000

> Andrew's patch is specific to numeric.

Thanks for your answer!

> Do you regularly sort/index large amount of bytea values?

I'm storing lots of checksums (of various kinds: sha1, sha256 for now), in the order of a few billion entries.

I haven't yet firmly chosen the postgres datatype to do that.

On the one hand, I'm inclined to implement a custom data type right away. It is my understanding that while doing that one can plug into the new sort support facilities that give the benefits of abbreviated keys (right? :-)).

On the other hand, if an appropriate built-in data type (such as BYTEA or its variants) have already support for abbreviated keys, that would be a good incentive to start with it, and migrate to a custom data type only later.

Tablesample with proportion really constant time?

dskoll — Thu, 06 Aug 2015 11:50:13 +0000

I think it means that if you pick M rows out of N with N much bigger than M, the time is dependent on M only and not on N. At least, that's how I read it.

It's not constant wrt to the percentage of rows, of course, becuase in that case M depends on N.

abbreviated keys for BYTEA?

andresfreund — Thu, 06 Aug 2015 09:53:11 +0000

Andrew's patch is specific to numeric.

I guess you could devise something for bytea as well, but it'd have to look a bit different. Actually it'd be much closer to the abbreviated key logic for text than to numeric. Just without having to care about locales. With numeric you have to care about NaN and such.

Do you regularly sort/index large amount of bytea values?

Tablesample with proportion really constant time?

epa — Thu, 06 Aug 2015 09:34:26 +0000

Which will return around 0.001%, or 100, of the rows in the table. In the query above, SYSTEM is the name of the chosen sampling algorithm. The SYSTEM algorithm chooses a set of pseudo-random data pages, and then returns all rows on those pages, and has the advantage in running in constant time regardless of the size of the table.

Surely not! 0.001% of a million-row table must take longer to fetch than 0.001% of a ten-row table. It would run in constant time if you specified a fixed number of rows, but not for a proportion of the total size.

abbreviated keys for BYTEA?

zack — Thu, 06 Aug 2015 07:56:40 +0000

From the text (and the patch diff, but I'm not fluent in postgres internals) is not clear to me whether Gierth's improvement means that BYTEA (byte arrays) values will also benefit from abbreviated keys.
Does any fellow LWN reader know?

"Big data" features coming in PostgreSQL 9.5

JdGordy — Thu, 06 Aug 2015 05:51:09 +0000

scolled past the article as I'm not really interested in postgres.. saw the tables and that Melbourne was behind Brisbane (which is pretty unlikely given the relative size of the cities) and wanted to make sure my location was correct incase I was registered as **shudder** sydney or something.... and then...

"Note: the above subscription data is fictitious and not related to LWN's actual subscription data." :)

Is Josh an aussie expat?

"Big data" features coming in PostgreSQL 9.5

kleptog — Wed, 05 Aug 2015 20:56:12 +0000

> BRIN indexes are niche! It's just a fairly common and very important niche. :)

Yes! One thing is saw in the initial discussions but I'm not seeing mentioned anywhere is that the theory behind BRIN indexes should be able to be applied to support Bloom filters. And they are seriously useful, especially for high cardinality columns which sort really badly. Btrees don't do nicely on those kind of columns. I'm hoping that they haven't been forgotten.

"Big data" features coming in PostgreSQL 9.5

corbet — Wed, 05 Aug 2015 20:28:40 +0000

Yes and yes.

"Big data" features coming in PostgreSQL 9.5

post-factum — Wed, 05 Aug 2015 20:19:59 +0000

Does LWN use PostgreSQL or that were purely fictitious examples?

"Big data" features coming in PostgreSQL 9.5

xtifr — Wed, 05 Aug 2015 20:13:19 +0000

BRIN indexes are niche! It's just a fairly common and very important niche. :)

Faster indexing and, to a lesser extent, BRIN both sound like features that will seriously benefit most users, not just Big Data.

PostgreSQL used to have the reputation of being the slow-but-reliable FOSS dbms. The reliable part may have taken a minor ding with the recent multixact problems, but I think they can get over that. I think the slow part is already much less true than it used to be, but I still approve of any and all speedups they can add. If they became the faster-and-more-reliable FOSS dbms, I wouldn't mind one bit!

(I really hope the planned Parallel Table Scan thing works out well, and provides a decent performance boost as well.)