LWN: Comments on ""Big data" features coming in PostgreSQL 9.5" https://lwn.net/Articles/653411/ This is a special feed containing comments posted to the individual LWN article titled ""Big data" features coming in PostgreSQL 9.5". en-us Sat, 11 Oct 2025 04:33:24 +0000 Sat, 11 Oct 2025 04:33:24 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net abbreviated keys for BYTEA? https://lwn.net/Articles/674679/ https://lwn.net/Articles/674679/ petergeoghegan <div class="FormattedComment"> This exists now: <a href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b47b4dbf683f13e6ef09fa0d93aa6e84f3d00819">http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;...</a><br> </div> Fri, 05 Feb 2016 11:17:23 +0000 "Big data" features coming in PostgreSQL 9.5 https://lwn.net/Articles/654640/ https://lwn.net/Articles/654640/ ron.dunn <div class="FormattedComment"> Ajilius is a data warehouse automation product that builds star schemas for PostgreSQL and EnterpriseDB. We've seen great performance gains, processing large dimensions, from the indexing and sorting enhancements. The OLAP functions are nice, but possibly too late as their value has been subsumed by the in-memory engines of BI products like Qlik and PowerBI.<br> </div> Sun, 16 Aug 2015 00:24:22 +0000 "Big data" features coming in PostgreSQL 9.5 https://lwn.net/Articles/653971/ https://lwn.net/Articles/653971/ nix <div class="FormattedComment"> I think the abbreviation is helpful but perhaps not the primary reason why strxfrm() works here but not for sort(1). The biggest problem with strxfrm() is that the blobs are so much larger than the input that the dcache hit simply ruins builds and comparisons using it: you blow the dcache building the blobs and then blow it again using them. Hence the cutover point from cache-dominated to memory-dominated comes much earlier when you're using strxfrm() blobs than when you're using straight strcoll(), which is a killer for things like sort(1) which are massively affected by cache, and repeatedly so, and tries to avoid hitting the disk.<br> <p> Index building... is of a different order. The memory hierarchy is less important, because everything is hitting the disk anyway, and because you abbreviate the blobs their horrendous size is reduced to something well below one cacheline for every blob.<br> <p> (This has been a genuine guess: no numbers or experiments were conducted and no strxfrm() blobs were harmed in the making of this post. But I have tried to use strxfrm() before, myself, long ago, and come to similar conclusions back then, though it was the Solaris strxfrm() bugs that really killed that idea.)<br> </div> Sun, 09 Aug 2015 12:20:31 +0000 "Big data" features coming in PostgreSQL 9.5 https://lwn.net/Articles/653911/ https://lwn.net/Articles/653911/ petergeoghegan <div class="FormattedComment"> strxfrm() blobs are usually a little over 3 times as large as the original strings with glibc. As I go into in my blog post about abbreviated keys, this is because there are multiple "levels" (at least 3) in the blob, demarcated by sentinel bytes, whose order relative to each other relates to how heavily some aspect of a code point should be weighed. For example, with Latin scripts, primary alphabetical ordering is represented at the primary level (case and punctuation are represented at subsequent levels). You get a far higher concentration of entropy in the first 8 bytes than (say) the last 8 bytes. Sometimes that concentration is much higher than you'd expect, since (for example) whitespace and diacritics are also not represented at the primary weight level. Also, by just comparing the first 8 bytes, a significant amount of pointer chasing is avoided (other indirection is avoided too).<br> <p> The idea that strxfrm() is not generally useful seems dubious, even leaving aside a technique like abbreviated keys. It's in the C standard. And Certainly, ICU offers something that's similar but very much more advanced. The glibc docs strongly suggest using strxfrm() where the space overhead is acceptable during a sort of a non-trivial number of strings.<br> </div> Sat, 08 Aug 2015 07:24:38 +0000 "Big data" features coming in PostgreSQL 9.5 https://lwn.net/Articles/653908/ https://lwn.net/Articles/653908/ kleptog <div class="FormattedComment"> I think it also has to do with the fact that in PostgreSQL the abbreviated keys approach allows the removal of a lot of indirect function calls from tight loops. In specialised code that knows beforehand that it's sorting text data replacing strcoll() with strxfrm() might not be that much of a win. But in the general PostgreSQL sorting code there is always an extra layer of indirection (for example TEXT values internally are not null-terminated, so you have to compensate for that).<br> </div> Sat, 08 Aug 2015 06:01:58 +0000 "Big data" features coming in PostgreSQL 9.5 https://lwn.net/Articles/653897/ https://lwn.net/Articles/653897/ jberkus <div class="FormattedComment"> I think the difference is the abbreviated keys approach.<br> </div> Fri, 07 Aug 2015 21:20:04 +0000 abbreviated keys for BYTEA? https://lwn.net/Articles/653895/ https://lwn.net/Articles/653895/ jberkus <div class="FormattedComment"> Yes, they are related. Although if you're just trying to get an exact match on a checksum, you might consider using GIN indexes instead. For a simple scalar, GIN indexes in Postgres behave a lot like hash indexes.<br> <p> Anyway, I encourage you to bring up the idea of BYTEA on a PostgreSQL mailing list. Nobody's opposed to extending abbreviated keys further, we just ran out of time for 9.5.<br> </div> Fri, 07 Aug 2015 21:17:41 +0000 abbreviated keys for BYTEA? https://lwn.net/Articles/653888/ https://lwn.net/Articles/653888/ zack <div class="FormattedComment"> <font class="QuotedText">&gt; I guess I don't understand why you'd care that much about abbreviated key sorting on checksums? It's not like you'd care about soriting them. I guess just because of index build time?</font><br> <p> Index (and specifically b-tree) build time and maintenance are my main concerns, yes.<br> <p> But I was also under the impression that abbreviated keys are relevant also for (b-tree) index lookups and uniqueness constraint verifications, due to the comparisons needed to get down to the actual indexes values, no matter how shallow the index is. Maybe that's a wrong impression of mine?<br> </div> Fri, 07 Aug 2015 18:58:10 +0000 "Big data" features coming in PostgreSQL 9.5 https://lwn.net/Articles/653882/ https://lwn.net/Articles/653882/ nix <div class="FormattedComment"> Hm. The glibc folks, last I heard, were mostly under the impression that strxfrm() was pretty much useless (as was I, to be honest). Now that a use has been found, someone should drop a note to the libc-alpha list to inform them that it is not as useless as all that!<br> <p> (e.g. Paul Eggert discussing a failed attempt to use it in GNU sort in &lt;<a href="https://sourceware.org/ml/libc-alpha/2014-11/msg00673.html">https://sourceware.org/ml/libc-alpha/2014-11/msg00673.html</a>&gt;.)<br> <p> I wonder why it was too slow for GNU sort while simultaneously being fast enough for PostgreSQL? It's not like it's sped up noticeably in the intervening years.<br> <p> </div> Fri, 07 Aug 2015 17:12:39 +0000 Tablesample with proportion really constant time? https://lwn.net/Articles/653840/ https://lwn.net/Articles/653840/ smurf <div class="FormattedComment"> AFAIU the random sampling itself (i.e. deciding which entries to take) would take const time. Retrieving the actual data, obviously not.<br> </div> Fri, 07 Aug 2015 15:29:59 +0000 Tablesample with proportion really constant time? https://lwn.net/Articles/653818/ https://lwn.net/Articles/653818/ epa Thanks. I just meant that if you want constant time, you need to specify something other than the '0.001' proportion in the reviewer's example. It would have to be <code>tablesample (1000 rows)</code> or something similar, I'm not sure of the exact syntax. Fri, 07 Aug 2015 09:32:19 +0000 "Big data" features coming in PostgreSQL 9.5 https://lwn.net/Articles/653788/ https://lwn.net/Articles/653788/ xtifr Er, I meant to say, "faster <em>sorting</em>" sounds like a feature for everyone, not, "faster indexing". In case it wasn't obvious. Had indexes on the brain. Thu, 06 Aug 2015 22:52:44 +0000 abbreviated keys for BYTEA? https://lwn.net/Articles/653777/ https://lwn.net/Articles/653777/ jberkus <div class="FormattedComment"> I guess I don't understand why you'd care that much about abbreviated key sorting on checksums? It's not like you'd care about soriting them. I guess just because of index build time?<br> <p> If so, well, "patches welcome". Probably what you'd want to do with BYTEA is compare the first 8 bytes, sort, then compare the full values to break ties.<br> <p> On the other hand, you could just store the checksums as NUMERICs, if you are fine with converting to hex and back.<br> </div> Thu, 06 Aug 2015 19:23:16 +0000 Tablesample with proportion really constant time? https://lwn.net/Articles/653775/ https://lwn.net/Articles/653775/ jberkus <div class="FormattedComment"> That's right.<br> <p> And it's not precisely constant time; it will take longer to pull 100 rows out of a billion row table than a million row table. However, the increase in time will be incremental (and small) instead of a multiple of the original request, since we're just looking up data by pageID.<br> <p> For example, I tested SYSTEM between returning 100 rows from a 100000 row table vs. a million row table. Regardless of which table I used, the difference in request time was below significance thresholds. However, with BERNOULLI, the thousands table took around 5ms, whereas the millions table took around 14ms.<br> </div> Thu, 06 Aug 2015 18:47:25 +0000 "Big data" features coming in PostgreSQL 9.5 https://lwn.net/Articles/653774/ https://lwn.net/Articles/653774/ jberkus <div class="FormattedComment"> I'm not.<br> <p> However, LWN has a lot of Australian readers, and while I was writing this article, a bunch of my friends were at PyCon.AU and DjangoCon.AU. Hence the use of Australia as the 2nd set of cities.<br> <p> FWIW, the data was generated using skewed random functions, so it's inherently meaningless.<br> </div> Thu, 06 Aug 2015 18:35:10 +0000 abbreviated keys for BYTEA? https://lwn.net/Articles/653721/ https://lwn.net/Articles/653721/ zack <div class="FormattedComment"> <font class="QuotedText">&gt; Andrew's patch is specific to numeric.</font><br> <p> Thanks for your answer!<br> <p> <font class="QuotedText">&gt; Do you regularly sort/index large amount of bytea values?</font><br> <p> I'm storing lots of checksums (of various kinds: sha1, sha256 for now), in the order of a few billion entries.<br> <p> I haven't yet firmly chosen the postgres datatype to do that.<br> <p> On the one hand, I'm inclined to implement a custom data type right away. It is my understanding that while doing that one can plug into the new sort support facilities that give the benefits of abbreviated keys (right? :-)).<br> <p> On the other hand, if an appropriate built-in data type (such as BYTEA or its variants) have already support for abbreviated keys, that would be a good incentive to start with it, and migrate to a custom data type only later.<br> </div> Thu, 06 Aug 2015 13:19:47 +0000 Tablesample with proportion really constant time? https://lwn.net/Articles/653702/ https://lwn.net/Articles/653702/ dskoll <p>I think it means that if you pick M rows out of N with N much bigger than M, the time is dependent on M only and not on N. At least, that's how I read it. <p>It's not constant wrt to the <i>percentage</i> of rows, of course, becuase in that case M depends on N. Thu, 06 Aug 2015 11:50:13 +0000 abbreviated keys for BYTEA? https://lwn.net/Articles/653695/ https://lwn.net/Articles/653695/ andresfreund <div class="FormattedComment"> Andrew's patch is specific to numeric.<br> <p> I guess you could devise something for bytea as well, but it'd have to look a bit different. Actually it'd be much closer to the abbreviated key logic for text than to numeric. Just without having to care about locales. With numeric you have to care about NaN and such.<br> <p> Do you regularly sort/index large amount of bytea values?<br> </div> Thu, 06 Aug 2015 09:53:11 +0000 Tablesample with proportion really constant time? https://lwn.net/Articles/653692/ https://lwn.net/Articles/653692/ epa <blockquote>Which will return around 0.001%, or 100, of the rows in the table. In the query above, SYSTEM is the name of the chosen sampling algorithm. The SYSTEM algorithm chooses a set of pseudo-random data pages, and then returns all rows on those pages, and has the advantage in running in constant time regardless of the size of the table.</blockquote> Surely not! 0.001% of a million-row table must take longer to fetch than 0.001% of a ten-row table. It would run in constant time if you specified a fixed number of rows, but not for a proportion of the total size. Thu, 06 Aug 2015 09:34:26 +0000 abbreviated keys for BYTEA? https://lwn.net/Articles/653688/ https://lwn.net/Articles/653688/ zack <div class="FormattedComment"> From the text (and the patch diff, but I'm not fluent in postgres internals) is not clear to me whether Gierth's improvement means that BYTEA (byte arrays) values will also benefit from abbreviated keys.<br> Does any fellow LWN reader know?<br> </div> Thu, 06 Aug 2015 07:56:40 +0000 "Big data" features coming in PostgreSQL 9.5 https://lwn.net/Articles/653682/ https://lwn.net/Articles/653682/ JdGordy <div class="FormattedComment"> scolled past the article as I'm not really interested in postgres.. saw the tables and that Melbourne was behind Brisbane (which is pretty unlikely given the relative size of the cities) and wanted to make sure my location was correct incase I was registered as **shudder** sydney or something.... and then...<br> <p> "Note: the above subscription data is fictitious and not related to LWN's actual subscription data." :)<br> <p> Is Josh an aussie expat?<br> </div> Thu, 06 Aug 2015 05:51:09 +0000 "Big data" features coming in PostgreSQL 9.5 https://lwn.net/Articles/653652/ https://lwn.net/Articles/653652/ kleptog <div class="FormattedComment"> <font class="QuotedText">&gt; BRIN indexes are niche! It's just a fairly common and very important niche. :)</font><br> <p> Yes! One thing is saw in the initial discussions but I'm not seeing mentioned anywhere is that the theory behind BRIN indexes should be able to be applied to support Bloom filters. And they are seriously useful, especially for high cardinality columns which sort really badly. Btrees don't do nicely on those kind of columns. I'm hoping that they haven't been forgotten.<br> <p> </div> Wed, 05 Aug 2015 20:56:12 +0000 "Big data" features coming in PostgreSQL 9.5 https://lwn.net/Articles/653646/ https://lwn.net/Articles/653646/ corbet Yes and yes. Wed, 05 Aug 2015 20:28:40 +0000 "Big data" features coming in PostgreSQL 9.5 https://lwn.net/Articles/653642/ https://lwn.net/Articles/653642/ post-factum <div class="FormattedComment"> Does LWN use PostgreSQL or that were purely fictitious examples? <br> </div> Wed, 05 Aug 2015 20:19:59 +0000 "Big data" features coming in PostgreSQL 9.5 https://lwn.net/Articles/653637/ https://lwn.net/Articles/653637/ xtifr <p>BRIN indexes <em>are</em> niche! It's just a fairly common and very important niche. :) </p><p> Faster indexing and, to a lesser extent, BRIN both sound like features that will seriously benefit most users, not just Big Data. </p><p> PostgreSQL used to have the reputation of being the slow-but-reliable FOSS dbms. The reliable part may have taken a minor ding with the recent multixact problems, but I think they can get over that. I think the slow part is already much less true than it used to be, but I still approve of any and all speedups they can add. If they became the faster-<em>and</em>-more-reliable FOSS dbms, I wouldn't mind one bit! </p><p> (I really hope the planned Parallel Table Scan thing works out well, and provides a decent performance boost as well.) </p> Wed, 05 Aug 2015 20:13:19 +0000