LWN.net Logo

Why json

Why json

Posted Sep 11, 2012 0:48 UTC (Tue) by aristedes (guest, #35729)
Parent article: PostgreSQL 9.2 released

I don't really understand the use of json support. Is it:

1. Getting normal SQL columns in and out in json format instead of normal SELECT output? Wrapping SQL output is something every database wrapper/ORM already does for you.

2. Storing json blobs of data in a dynamic way without having to predefine which columns (keys) are available and still being able to search with an index on the data attached to those keys. This would be very cool, but I don't understand how it would work when the data structure isn't defined in advance.

3. Just storing a json blob of data without any new indexing of that data at all. Basically nothing more than a BLOB with a new name.

Why on earth would I want to push down my application logic into javascript within the database? What is the use-case of this approach rather than a standard MVC approach where the model (ActiveRecord, Hibernate, Cayenne, etc) contains the relevant database entity logic.


(Log in to post comments)

Why json

Posted Sep 11, 2012 1:06 UTC (Tue) by JohnLenz (subscriber, #42089) [Link]

I suspect, based on their selling it as a NoSQL type feature, that it is to be able to have javascript code on the client talk directly to the database and avoid the server altogether. You can see this already happening with for example the CouchDB hosting at https://cloudant.com/ and many others.

At first glance, accessing the database directly from the javascript sounds horrible. But with the modern javascript frameworks like dojo or extjs, these have MVC frameworks built in. For example, dojo has dojo stores plus dijit views. ExtJS has their extensive modeling. Defining the model and having a full ORM in javascript means makes sense for these frameworks since then you can bind grids and other widgets to the model, automatically have widgets respond to changes in the data, etc. But having an ORM in javascript and a second ORM in the server, while it might make sense some of the time, is a bunch of extra work if the ORM in the server does nothing but pass objects to the client.

I don't know if PostgreSQL allows clients direct access quite like CouchDB, where queries are HTTP requests which return JSON bodies, but even if the server just passes the PostreSQL response to the client it is easier to have PostgreSQL return the json directly.

Why json

Posted Sep 11, 2012 4:37 UTC (Tue) by aristedes (guest, #35729) [Link]

That makes sense to some extent, but I'm still unclear what real-world environment would exist where there is no server application to manage sessions, authorisation, and enforce data validation. And of course the scary "let's give all our users the complete source code of the application so they can modify it with Firebug to write anything they want to the live database."

Perhaps I am just used to thinking of a database as a big interchangeable repository of data and not an integral part of the application code. Even stored procedures are something I avoid.

I did however discover that you can create functional indexes on json data like this:

CREATE INDEX x_in_json ON mytable (jmember(jsonfield,'x'));

so I can see the point of being able to store certain types of data in this way and still have indexed access to it.

Why json

Posted Sep 11, 2012 13:35 UTC (Tue) by beagnach (guest, #32987) [Link]

>> I'm still unclear what real-world environment would exist where there is
>> no server application to manage sessions, authorisation, and enforce
>> data validation.

there's a new generation of client-side JavaScript frameworks - http://backbonejs.org/ is what I'm most familiar with. The client side interface has become so rich that it now needs a proper ORM.

This is _not_ to say we do away with all the sever-side infrastructure you mention (though a few sites seem to go most of the way) - in general there seems to be a hybrid approach where a subset of date is mapped client side for manipulation by the rich interface there.

Of course you want to avoid duplication of models between client and server-side, and of course you still need to validate data - the trend is to reduce the server-side representation, not eliminate it completely.

Why json

Posted Sep 13, 2012 19:17 UTC (Thu) by jberkus (subscriber, #55561) [Link]

> Perhaps I am just used to thinking of a database as a big interchangeable
> repository of data and not an integral part of the application code. Even > stored procedures are something I avoid.

Well, the feature is certainly less compelling if that's your orientation towards database systems. However, a modern DBMS is a powerful, feature-rich parallel execution engine, and some users like to harness that capability. If you don't want to use it, you don't have to.

The main benefit to this is that some users can eliminate the need to use a document database alongside PostgreSQL in mixed environments. This simplifies administration and provisioning.

Why json

Posted Sep 11, 2012 8:18 UTC (Tue) by robert_s (subscriber, #42402) [Link]

"I don't know if PostgreSQL allows clients direct access quite like CouchDB, where queries are HTTP requests which return JSON bodies"

It doesn't.

Why json

Posted Sep 13, 2012 19:00 UTC (Thu) by jberkus (subscriber, #55561) [Link]

It doesn't *yet*. We plan to develop more JSON functionality in the future.

My personal ambition is "CouchGres", which would be CouchDB-over-Postgres. ;-)

Why json

Posted Sep 11, 2012 1:06 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

I believe that it's #2, you feed the database a JSON string and then later you can query on fields and subtrees of that blob without having to define the structure ahead of time.

It's less efficient than if you do plan things out and store things in the traditional way, but it may still be a lot faster than pulling the data all the way to the client, parsing it, and then using a part of it in a follow-up query.

Why json

Posted Sep 11, 2012 4:56 UTC (Tue) by sciurus (subscriber, #58832) [Link]

The core type is basically your number 3, but with validation.

"The json data type can be used to store JSON (JavaScript Object Notation) data, as specified in RFC 4627. Such data can also be stored as text, but the json data type has the advantage of checking that each stored value is a valid JSON value."

http://www.postgresql.org/docs/devel/static/datatype-json...

Why json

Posted Sep 11, 2012 9:53 UTC (Tue) by intgr (subscriber, #39733) [Link]

It's actually multiple things.
> 1. Getting normal SQL columns in and out in json format instead of normal SELECT output?

For one, every complex type can now be represented in JSON and transparently combined with other JSON fragments. Previously, the textual representation of complex types -- especially nested types -- was quite complicated and error-prone to parse by third party code.

# select row_to_json(row('some text', 123.4, null, row(1,2,3), array[1,2,3,4,5], json '{"a":1, "b":{}}' ));
                                                row_to_json                                                
-----------------------------------------------------------------------------------------------------------
 {"f1":"some text","f2":123.4,"f3":null,"f4":{"f1":1,"f2":2,"f3":3},"f5":[1,2,3,4,5],"f6":{"a":1, "b":{}}}
> 3. Just storing a json blob of data without any new indexing of that data at all.

Even this alone is pretty useful because you have a guarantee that data stored in a "json" field is actually valid JSON

> 2. Storing json blobs of data in a dynamic way without having to predefine which columns (keys) are available and still being able to search with an index on the data attached to those keys

Not quite there yet. It's already possible and useful to index certain json keys when combined with functional indexes (as shown by aristedes).

It's also possible to use an hstore-like GIN/GiST indexing strategy to index arbitrary keys; it's not implemented yet, but only a matter of time before someone writes such an extension.

Why json

Posted Sep 12, 2012 9:17 UTC (Wed) by ringerc (subscriber, #3071) [Link]

Right now it's #3 with validation, plus some functions to turn result sets or arrays into structured JSON for easy consumption in applications. That's the main benefit at this point.

However, future versions are extremely likely to add enhancements like:

  • Indexing, like hstore currently has
  • Operators or functions to merge json objects, extract values, get key lists, etc like the hstore and xml support have
  • More features for constructing and consuming JSON in the DB

Because Pg is so extensible, much of that is likely to be available as backports into 9.2.

Why json

Posted Sep 12, 2012 9:32 UTC (Wed) by andresfreund (subscriber, #69562) [Link]

> Indexing, like hstore currently has
Its easier for hstore because hstore isn't nested and not individually typed (everything is a string).
An efficient index over arbitrary nesting and several types isn't exactly easy and I am not that sure that will come that soon. An opclass for not-nested json objects or such would be relatively easy though.

Why json

Posted Sep 12, 2012 10:29 UTC (Wed) by yodermk (subscriber, #3803) [Link]

MongoDB has something very much like that.

Why json

Posted Sep 12, 2012 10:42 UTC (Wed) by ringerc (subscriber, #3071) [Link]

Once there's a json lookup operator (say implementing the [] operator for json to get the value of a key, or a path lookup function kind of like xpath provides for xml) then a functional index might be reasonably practical, at least if the result is a scalar.

Right now the lack of the concept of a "scalar json value" (due to an IMO over-eager interpretation of the JSON "spec") is frustrating all sorts of things, including proper json constructor functions.

More general indexing might be done through GiST, but I imagine it'd be a pretty complicated job involving extracting paths-to-values.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds