
MySQL AB to counter Oracle buy of Innobase (InfoWorld)

InfoWorld reports that MySQL is looking for alternatives to the InnoDB engine, now that Innobase is owned by Oracle. "The first question asked of MySQL AB co-founder David Axmark was about how the Oracle deal would affect MySQL's database software. Axmark said the storage engine is 'pluggable,' meaning other storage engines can be substituted instead. He said the code for InnoDB is under the GPL (General Public License), so 'the code is always out there. It will always be out there.'" The article completely misses the effect on MySQL's business model, though.


Excellent Chess Mr Ellison

Posted Nov 23, 2005 17:17 UTC (Wed) by b7j0c (guest, #27559) [Link] (7 responses)

MySQL AB will waste two years replacing InnoDB, giving Oracle more time at the top of the feature list. Assuming they grow it in-house, the new engine will have to go through the same alpha-beta-rc0,rc1...rcN cycle with at least one public release before users will trust their data to it.

Well played Larry.

Excellent Chess Mr Ellison

Posted Nov 23, 2005 17:31 UTC (Wed) by arcticwolf (guest, #8341) [Link] (6 responses)

Why do they have to replace InnoDB at all? The current version is GPL'ed, and in theory at least, there is nothing that'd keep MySQL AB from fixing bugs etc. themselves. They might have to hire someone with a good knowledge of the InnoDB internals, but chances are they have people like that already, and even if they don't, it's not *that* big a problem.

So, what is the problem?

Excellent Chess Mr Ellison

Posted Nov 23, 2005 17:58 UTC (Wed) by allesfresser (guest, #216) [Link] (5 responses)

It seems to me that MySQL has a dual license for InnoDB from their upstream provider (now Oracle)--a GPL license which they share with us, and a commercial license which they sublicense to their paying customers. It is this commercial license which is their problem at the moment: since Oracle could possibly rescind it, their customers would be left without a commercial option. Those people that have bought the MySQL engine to use in proprietary products (like Adobe, which uses it in their "Version Cue" product, part of their Creative Suite, to manage all the version data) would not be able to use it any longer without switching to the GPL version and providing source code to all their customers (and how I would chuckle if Adobe were to be forced to do that...but dream on, I guess.)

If I've been mistaken in my analysis, somebody please feel free to correct me...

Excellent Chess Mr Ellison

Posted Nov 25, 2005 19:12 UTC (Fri) by jayorke (guest, #10685) [Link] (1 responses)

Why wouldn't they go to the pay-for-support model rather than the pay-for-software model, like all the Linux distros out there? In addition, since the DB engine is pluggable, couldn't they still sell apps that use it as long as they don't sell the InnoDB software itself? I don't see how this is an obstacle for MySQL, since they can fork the GPL'd InnoDB project if they don't like where Oracle is taking it anyway, as long as their fork is also GPL'd. Am I wrong?

Excellent Chess Mr Ellison

Posted Nov 25, 2005 19:43 UTC (Fri) by dlang (guest, #313) [Link]

The problem is that they make a large amount of their money by charging companies for permission to use MySQL in non-GPL programs. So if they lose access to InnoDB, then their paying customers can no longer use it, and those customers don't want to do the GPL thing. As a result those customers will either stop paying MySQL and go GPL (possibly paying a lesser amount for support) or stop paying MySQL and go with a different database vendor.

Excellent Chess Mr Ellison

Posted Nov 25, 2005 19:20 UTC (Fri) by jayorke (guest, #10685) [Link] (2 responses)

"However, if Oracle holds patents or licenses for the underlying technology such as algorithms or file structures, "then that could get quite interesting," he said."

I don't get that comment either. If a company comes up with some algorithms, patents them, and then GPLs the code using those algorithms... wouldn't that make the patent unenforceable as long as it is used in other GPL'd software?

Excellent Chess Mr Ellison

Posted Nov 25, 2005 19:39 UTC (Fri) by dlang (guest, #313) [Link]

Unfortunately no: you can selectively enforce patents and set different licensing terms for different people.

Saying that you can use a patent freely with GPL'd software, but have to pay $$$ to use the same patent in other software, is perfectly legal.

"Intellectual Property"... not!

Posted Nov 25, 2005 21:01 UTC (Fri) by allesfresser (guest, #216) [Link]

No. Patents and copyrights are completely separate--this is one particular confusion that "intellectual property" as a vague collection term tends to promote: that patents, copyrights, trademarks and trade secrets are all pretty much equivalent kinds of things, and one cascades to the others. They're not, and they don't, unless some contract specifies it. The GPL is a copyright license--permission to use, modify and redistribute the specific licensed work.

Patents (if any) cover the various algorithms that are used in the copyright-licensed software, and can be enforced at will, with no regard at all to the license given for the work that expresses (tangibly) the patented (abstract) algorithm.

I know

Posted Nov 23, 2005 17:30 UTC (Wed) by ncm (guest, #165) [Link] (49 responses)

They should just use PostgreSQL underneath. It's BSD-licensed, which is compatible with MySQL AB's business model.

:-)/2

I know

Posted Nov 23, 2005 18:11 UTC (Wed) by einstein (subscriber, #2052) [Link] (24 responses)

unfortunately, customers would immediately notice the huge performance hit...

I know

Posted Nov 23, 2005 19:04 UTC (Wed) by stumbles (guest, #8796) [Link] (14 responses)

Performance hit? PostgreSQL? Prove it.

I know

Posted Nov 23, 2005 20:36 UTC (Wed) by nix (subscriber, #2304) [Link]

You probably can... with a five-year-old copy of PostgreSQL.

I know

Posted Nov 24, 2005 0:47 UTC (Thu) by flewellyn (subscriber, #5047) [Link] (12 responses)

To be fair...PostgreSQL is slower than MySQL in the simple case: one database user, a few tables,
simple SELECT queries hitting the database at high volume. This is what you'd find in a basic
"user accounts" DB for a website or something. In those cases, Postgres has some overhead
which MySQL does not.

Of course, in the more complex cases, such as multiple DB users, multiple databases, complex
queries involving subselects or joins on multiple tables, or anything that needs proper ACID
compliance, PostgreSQL wins big on speed and correctness. From my perspective as a database
admin, MySQL's really kind of the "Access" of free databases, with PostgreSQL being more in the
"Oracle" class.

I know

Posted Nov 24, 2005 2:01 UTC (Thu) by dlang (guest, #313) [Link] (3 responses)

True, although I wonder how big the difference would be.

MySQL took the approach: make it fast, then make it right.

PostgreSQL took the approach: make it right, then make it fast.

Postgres has done a LOT of stuff in the last few releases to speed things up, and as such even the simple cases may not be as obvious a win for MySQL as they used to be. There was a post three weeks ago, when 8.1 was released, about an application that sped up from 52 min to 8 min when moving from 8.0.4 to 8.1. Now, nobody is saying that everyone will get speedups like that, and this was an app designed to run on MySQL, so it's not taking advantage of the higher-end functionality that MySQL doesn't offer. But the same smarts that speed up that stuff can speed up the simple stuff as well. (See http://lwn.net/Articles/159062/ for the post.)

I know

Posted Nov 24, 2005 2:45 UTC (Thu) by flewellyn (subscriber, #5047) [Link] (2 responses)

Yes, but more important is the fundamental software architectural problem: making it right first, and then speeding up later, is ALWAYS easier than making it fast initially, but only later working to make it right.

Consider integers: you can handle them properly from the get-go with some multiprecision code
akin to GNU MP, or you can let them overflow like C ints will. The first case may initially be
slower, but you can tune and optimize the MP code. The second case will still be a bit faster,
sometimes, but leads to the very incorrect situation where A + B is NOT always greater than A or
B. This is...completely unacceptable in a database, and yet (last I checked) MySQL still had this as
the default behavior for integer types.
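The overflow case described above can be sketched in a few lines. Python integers are arbitrary-precision (much like a GMP-backed type), so the C-style 32-bit wraparound has to be simulated explicitly; the function and values here are purely illustrative and not MySQL's actual implementation, which may clamp or error depending on version and settings.

```python
# Sketch: C-style 32-bit wraparound vs. arbitrary-precision addition.
# Python ints never overflow (they behave like the GNU MP case), so
# the C int behavior is simulated by masking to 32 bits.

def add_int32(a, b):
    """Add with signed 32-bit wraparound, as a C int would."""
    result = (a + b) & 0xFFFFFFFF
    # Reinterpret the low 32 bits as a signed value.
    return result - 0x100000000 if result >= 0x80000000 else result

a = b = 2_000_000_000

print(add_int32(a, b))  # -294967296: A + B is now less than A or B
print(a + b)            # 4000000000: the arbitrary-precision answer
```

Whatever a given database actually does on overflow (wrap, saturate, or raise an error), the point stands: silent wraparound breaks the invariant that A + B is greater than A and B for positive values.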

I know

Posted Nov 24, 2005 15:09 UTC (Thu) by khim (subscriber, #9252) [Link] (1 responses)

"Yes, but more important is the fundamental software architectural problem: making it right first, and then speeding up later, is ALWAYS easier than making it fast initially, but only later working to make it right."

Hmm... Maybe, but "real life" contradicts this: in every case where "we'll do it right, then make it fast" vs. "we'll do it fast, then make it more-or-less right" was tried, the "fast first" version won. Lisp vs. C, HURD vs. Linux, etc. Maybe after another 100 years pass the situation will be different, but 30 years was not enough for the first approach, for example...

"The second case will still be a bit faster, sometimes, but leads to the very incorrect situation where A + B is NOT always greater than A or B."

This situation is always possible, GMP or no GMP. What about A and B equal to 10^(10^10)? Or (if that's not enough) 10^(10^(10^10))? At some point you'll get an incorrect result or a crash, since a computer is a finite system.

"This is...completely unacceptable in a database, and yet (last I checked) MySQL still had this as the default behavior for integer types."

If you think it's "unacceptable in a database" then you cannot use any database. I think it's perfectly acceptable as long as I know which values will produce a correct result and which will not (there always exist values for which the result is incorrect, after all). Sometimes the range offered by MySQL is insufficient, but more often than not it's not a problem at all.

I know

Posted Nov 25, 2005 4:59 UTC (Fri) by zblaxell (subscriber, #26385) [Link]

"At some point you'll get incorrect result or crash since computer is finite system."

Uhhh, no, at some point you get an "integer out of range" error, and the transaction aborts (and all evidence of your mathematical explorations is purged from the database). A + B (where A > 0 and B > 0) is always greater than A or B in this particular finite system. Escape from the system itself is not permitted. ;-)

I know

Posted Nov 24, 2005 5:14 UTC (Thu) by stumbles (guest, #8796) [Link] (7 responses)

Actually I've heard something to that effect before. I'm no database expert but it
sounds like one of those oft repeated truisms. Does anyone know of a test
scenario between the two, running on the same machine, same compiled
optimizations, build tree, blah blah?

Do people really select a database just for performance?

Posted Nov 24, 2005 9:17 UTC (Thu) by hippy (subscriber, #1488) [Link] (6 responses)

All this talk of comparative performance got me thinking, do people really
select a database on performance alone?

We make use of both MySQL (GPL version) and PostgreSQL in different
projects. Performance has never been the primary selection criterion.
Generally performance is "good enough" from both contenders and selection
comes down to other factors.

The quick and dirty rule of thumb we follow is: if it is mostly simple
SELECT queries and very little UPDATES/INSERTS then go with MySQL, but if
it is complex queries and/or lots of UPDATES/INSERTS then look to
PostgreSQL. The decision is often influenced by the bindings available for
the implementation language, or more often by the skills of the developers
in the use of those language bindings. So if the language is Python we
will err towards PostgreSQL whereas with Perl we would favour MySQL.

There usually is not enough time to fully evaluate the DB against the requirements for the system, so a quick decision must be made without all the facts. What criteria do others use to make these decisions?

Richard

Do people really select a database just for performance?

Posted Nov 24, 2005 12:37 UTC (Thu) by nicku (subscriber, #777) [Link] (1 responses)

"The decision is often influenced by the bindings available for the implementation language, or more often by the skills of the developers in the use of those language bindings. So if the language is Python we will err towards PostgreSQL whereas with Perl we would favour MySQL."

Sorry, this went right over my head; Perl DBI makes it equally easy to use either PostgreSQL or MySQL. So were the developers making the mistake of not using DBI?

Did you even read the text you quoted?

Posted Nov 25, 2005 9:21 UTC (Fri) by alextingle (guest, #20593) [Link]

"The decision is often influenced ... more often by the skills of the developers in the use of those language bindings."

Do people really select a database just for performance?

Posted Nov 25, 2005 5:50 UTC (Fri) by zblaxell (subscriber, #26385) [Link] (3 responses)

A database server that imposes so much overhead that it's noticeably slower than a filesystem-based implementation in Perl is doing something wrong. If I couldn't make a database run about as fast as a prototype doing the equivalent operations written in Perl (taking WAL and fsync into account), that means that the database is introducing significant overhead somewhere, or I need to look at my query to see what's wrong with it.

I wouldn't ordinarily use performance to choose one DBMS over another, but I would use a relatively low performance standard to reject potential DBMS candidates from my short list entirely.

Do people really select a database just for performance?

Posted Nov 26, 2005 0:55 UTC (Sat) by job (guest, #670) [Link] (2 responses)

That's a strange comment. Have you actually tried that? Parsing SQL isn't free; I would expect an fopen and read to be finished long before the query even reaches the actual database layer from the parser/optimizer. It would be a very bad file system that was slower than a full-blown RDBMS.

Do people really select a database just for performance?

Posted Nov 26, 2005 1:13 UTC (Sat) by dlang (guest, #313) [Link]

You are both right and wrong; it all depends on how much parsing overhead the database costs vs. how much searching through the data it saves.

Finding the piece of data you need in a flat file can get costly as the file gets large; finding the right file on the filesystem can get costly if there are lots of files in the directory (the definitions of costly, large and lots vary from filesystem to filesystem).

A database engine eats up some RAM and CPU time to save on disk I/O time; your access patterns, data, and data structures all affect how smart the tradeoff is.

Do people really select a database just for performance?

Posted Nov 27, 2005 2:49 UTC (Sun) by zblaxell (subscriber, #26385) [Link]

I do the comparison quite often.

Just now, I get 5.3ms for a C program doing open()/fstat()/malloc()/read()/close() to read a one-line text file into dynamically allocated RAM and do nothing with it, compared to 4.3ms for a somewhat more complex SQL query returning the same data from a table, except it's a row with all the field values broken into columns of various data types. I didn't bother with a prepared query, so the SQL execution time includes parsing, permission table checks, and so on. The execution time on repeated runs varies by as much as 3ms for both, so sometimes the filesystem is faster and sometimes the RDBMS is faster.

There are a number of qualifications on the test:

* The comparison is between a SQL RDBMS and a storage layer written in Perl on top of a filesystem. Both are interpreted languages with loads of CPU and data type conversion overhead, and lots of algorithmic tricks to speed up useful special cases (including caching and precompilation).

* The data in the filesystem case is kept in human-readable form, so that it can be easily manipulated by a wide variety of existing tools. This matches the wide variety of SQL-based data manipulating tools as well (SQL is a bit of a data manipulating tool in itself).

* The subset of transaction support, network transparency, and disaster recovery required by the application must be implemented in the Perl/filesystem layer, and must also be present in the RDBMS. The filesystem-based implementation must call fsync() and/or fdatasync() to ensure updates are stable on disk just like the RDBMS does. If the application needs to roll back part-way through modifying the data, then the filesystem-based implementation must support that too. If the application didn't need data integrity, and didn't need other RDBMS features, I wouldn't be using *any* RDBMS, and the comparison would be meaningless.

* Pushing a procedure into the RDBMS for execution (e.g. to rearrange a large recursive structure) is permitted. If the RDBMS can't support that, it's unlikely to win this race in cases where a lot of round-trips are required.

* Small and constant-factor differences don't matter. I don't care if the RDBMS requires 10ms for every query and runs 10% slower than filesystem--as long as those numbers stay constant. I can fix this kind of problem by waiting a few months, then buying a 10% faster machine. If RDBMS performance relative to a filesystem implementation of the same schema degrades faster than the data size grows, there's a problem in the RDBMS (assuming both implementations are correct).

Why is the competition fair? Why do RDBMSes win, and why do filesystems win?

* Filesystems are bad at storing billions of named objects of a few dozen bytes each (especially if they round up the size to 4K). RDBMS systems are bad at storing a few dozen named objects of gigabytes each.

* Filesystems share a lot of algorithms and infrastructure with RDBMS systems. Both have RAM caches and do I/O in blocks to disks. The number of disk page I/O operations (which is where most of the real execution time is tied up) is similar. An RDBMS is in some sense a specialized filesystem that runs in user-space. Since filesystems vary in performance by as much as an order of magnitude depending on workload, it follows that RDBMS will vary as well, and some RDBMS can be as fast or faster than some filesystems.

* Some filesystems *are* bad. If the filesystem in question is NFS, at least one network round-trip is required for every file, and the RDBMS can win just by having query processing on the server side combined with streaming data over TCP. Reading 100,000 small text files over NFS is much slower than reading the equivalent size of row data from an SQL query.
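For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of the flat-file-vs-SQL comparison described above. It uses sqlite3 as a stand-in for the client/server RDBMS used in the comment (so absolute numbers will differ, and there is no network round-trip or permission check); the file, table, and column names are invented for illustration.

```python
# Micro-benchmark sketch: one small record read from a text file vs.
# the same record returned by an SQL query. Reported times are per call.
import os
import sqlite3
import tempfile
import timeit

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "record.txt")
with open(path, "w") as f:
    f.write("42,hello,3.14\n")

db = sqlite3.connect(os.path.join(tmp, "bench.db"))
db.execute("CREATE TABLE t (id INTEGER, name TEXT, val REAL)")
db.execute("INSERT INTO t VALUES (42, 'hello', 3.14)")
db.commit()

def read_file():
    # open()/read()/close(): roughly the C program's syscall sequence
    with open(path) as f:
        return f.read()

def read_db():
    # Unprepared query, so parse/plan time is included, as in the comment
    return db.execute("SELECT id, name, val FROM t WHERE id = 42").fetchone()

n = 10_000
print("file:", timeit.timeit(read_file, number=n) / n)
print("sql: ", timeit.timeit(read_db, number=n) / n)
```

As the comment's qualifications note, results from a toy like this swing with caching and machine load; the interesting question is how the two curves diverge as the data grows, not which one wins a single run.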

Do your customers notice when you lose their data?

Posted Nov 24, 2005 7:33 UTC (Thu) by davidw (guest, #947) [Link] (3 responses)

The more I look at Mysql pre-5 with MyISAM tables, the more incredulous I am that people would actually put important data into it. It's not a bad system for forking out a lot of data quickly, but it's not a "database". It's good to see that more recent versions are gaining database features, but there are a *lot* of people out there running on those crufty old systems. I was ranting about this some yesterday, too:

http://journal.dedasys.com/articles/2005/11/23/mysql-ranting

Do your customers notice when you lose their data?

Posted Nov 24, 2005 15:20 UTC (Thu) by khim (subscriber, #9252) [Link] (2 responses)

"The more I look at Mysql pre-5 with MyISAM tables, the more incredulous I am that people would actually put important data into it. It's not a bad system for forking out a lot of data quickly, but it's not a 'database'."

100% true. On the other hand, in 90% of cases people need exactly what MySQL (even an old one) offers; they do not need a "real database" at all. Think about C/C++: it'll not even warn you if you overflow an array! While a "real language" (like Ada) will do it for you. Now: how many programs written in C/C++ are you using? How many programs written in Ada?

Sorry, but for real-life problems MySQL is quite good, unless you expect it to have features it does not offer. About your rant: why did you execute a potentially wrong request on a real-life database without debugging it first? You expected to have rollback, and it's not there. So what? It's your problem, not MySQL's: sometimes rollback is actually needed, yes, but in 99% of the cases I've seen, rollback is used to cover programmer laziness, not any real need.

Do your customers notice when you lose their data?

Posted Nov 24, 2005 20:23 UTC (Thu) by davidw (guest, #947) [Link]

I respect the fact that not everyone needs a real database and its accompanying features, but I think your numbers are pulled out of the air. Most of the clients I've seen with Mysql would have been better off with Postgresql in the end, because they dealt with things that involved some money. That's likely to happen sooner or later if you're tying your business into a system.

Do your customers notice when you lose their data?

Posted Dec 1, 2005 13:38 UTC (Thu) by nix (subscriber, #2304) [Link]

In my experience rollback is used to cover `oh shit we assumed {blah} was available and now it isn't.'

I suppose you could call this `laziness', but I'd prefer to call it `the tool making it unnecessary to bother checking for all the piddly little things yourself'; the tool does it for you.

Databases are supposed to be a labour-saving tool, y'know... using a database that doesn't save you work is silly.

(That's why I like the user-defined datatypes in PostgreSQL so much, too.)

I know

Posted Nov 24, 2005 8:07 UTC (Thu) by hofhansl (guest, #21652) [Link] (4 responses)

"unfortunately, customers would immediately notice the huge performance hit..."

This seems to be a very persistent misconception. MySQL may be faster if all you do is run count(*) queries *and* use the MyISAM engine (for count(*), a transactional system either has to lock all rows changed/added/deleted by other transactions, or do a full table scan to find the correct count for the current transaction). So benchmarks that only do that will find that MySQL is faster.

MySQL also has faster connection startup time (PostgreSQL has to start a new process, MySQL starts a new thread), so if you open a new database connection for every query, MySQL will have a constant-time advantage (then again, if you do that you do not deserve good performance to begin with).

For *all* other queries, simple selects, updates, concurrent updates, "big" joins, "small" joins, etc, etc, etc, PostgreSQL is faster or the two are equal. Try it yourself.

(I tested MySQL 4.1.x and MySQL 5.0 against PostgreSQL 8.0 on Linux - in fact PostgreSQL 8.1 is even faster. I make no claim that my tests are scientific, but my application requires lots of different types of queries)

I know

Posted Nov 24, 2005 10:28 UTC (Thu) by dlang (guest, #313) [Link]

I seem to remember seeing something about an optimization to count(*) in the lead-up to 8.1, so it may have drastically improved there (along with all the other improvements); it may be worth another round of testing.

I know

Posted Nov 25, 2005 4:44 UTC (Fri) by zblaxell (subscriber, #26385) [Link] (1 responses)

MySQL is also faster when you have zillions of clients asking a server very similar simple queries when no writes have occurred, e.g. 'select some_column from some_table where id = some_value'. There is a shared hash cache of $query_string => $result in the server to do this. Simple select-one-row-from-table-matching-indexed-column queries might be slightly faster too, if you could measure per-query CPU overhead and client-server latency separately.

MySQL is also faster for things like sequential scans of MyISAM tables with fixed-width rows of non-NULL columns...but only because the MyISAM format special-cases fixed-width rows and uses fewer bytes per row than PostgreSQL, so your disk can push more rows per second through the CPU. Doing actual work with the data (e.g. sum(), average()) is slower in MySQL.

Of course there are also the count(*), min() and max() special cases that are often pointed out (often with the PostgreSQL workarounds).

*Every* other workload I've ever given a MySQL server is the same speed or slower than PostgreSQL. For a while I had to support both servers in parallel, until I started keeping an up to date sign on my whiteboard that said "It has been [94] days since a MySQL query from the regression test suite ran faster than the equivalent PostgreSQL query" which drove home to the powers that be that we needed to stop wasting time on MySQL support. Well, that, and most of the regression tests were MySQL-specific, like datestamps with invalid field values.

It would be really nice if PostgreSQL could be told "I don't *need* transactions on this one table, just put a big multi-reader-single-writer lock on it, and reverse the WAL log for the transaction on ROLLBACK." It's rather silly to have an overhead of ~24 bytes per row for MVCC (and counting! The row header has been growing slowly for years...) if you know in advance you're never going to make an UPDATE on some table that you're not going to COMMIT. If I could magically copy features from MySQL to PostgreSQL, that would be the first. The second would be enumeration types, just because I'm too damn lazy to join on "char" columns and INSERT with subqueries all the time. ;-)

I know

Posted Nov 27, 2005 5:37 UTC (Sun) by tgl1 (guest, #34120) [Link]

"The row header has been growing slowly for years"?

Just for grins, I dug through the CVS history at
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/incl...
and the way it stacks up is:
As the code was released from Berkeley: 49 bytes + alignment overhead
PG 6.3 (after removal of "time travel"): 35 bytes + alignment
PG 6.5 (don't store duplicate copy of tuple length): 31 bytes + alignment
PG 7.3 (remove OID from required fields, overlay two transaction fields): 23 bytes + alignment + OID if you want it
PG 8.0 (addition of subtransactions required backing out the overlay optimization): 27 bytes + alignment + OID if you want it

And that's where it stands today, although a proposal has been posted for a way to get back
down to 23 bytes at the cost of more state kept inside individual transactions.

I wouldn't bother to call you on this, except I think it's a fine illustration of the actual
development trends in Postgres: we've been gradually improving the performance over time.

I know

Posted Dec 1, 2005 13:44 UTC (Thu) by nix (subscriber, #2304) [Link]

Er, PostgreSQL doesn't need to lock anything or do full table scans for *any* operation involving only SELECTs, nor has it for as long as MVCC has existed. That's the point of MVCC: that because physical rows are in effect immutable entities (because any update creates a new physical row), the transaction doing the selects can proceed in complete disregard of the existence or otherwise of any other transactions, *even at the database-implementation level*. Not even the RDBMS backend handling those SELECTs needs to care about concurrent updates.

MVCC is definitely one of those ideas which had me kicking myself and wishing I'd thought of it first.
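The visibility rule described above can be illustrated with a toy version chain, loosely borrowing PostgreSQL's xmin/xmax field names. The real implementation (commit-status checks, snapshots of in-progress transactions, vacuuming of dead versions) is far more involved; this is only a sketch of the core idea that updates append new physical rows rather than mutating old ones.

```python
# Toy MVCC illustration: each row version records the transaction that
# created it (xmin) and the one that superseded it (xmax). A reader
# holding snapshot S sees the version with xmin <= S and xmax > S (or
# no xmax yet), so SELECTs never block on concurrent writers.

class MVCCTable:
    def __init__(self):
        self.versions = []   # list of (xmin, xmax, value) tuples
        self.next_txid = 1

    def begin(self):
        """Start a transaction; its id doubles as its snapshot here."""
        txid = self.next_txid
        self.next_txid += 1
        return txid

    def update(self, txid, value):
        # Mark the currently live version as superseded by txid...
        for i, (xmin, xmax, v) in enumerate(self.versions):
            if xmax is None:
                self.versions[i] = (xmin, txid, v)
        # ...and append a brand-new physical row version.
        self.versions.append((txid, None, value))

    def read(self, snapshot):
        # A version is visible iff it was created at or before the
        # snapshot and not yet superseded as of the snapshot.
        for xmin, xmax, value in self.versions:
            if xmin <= snapshot and (xmax is None or xmax > snapshot):
                return value
        return None
```

A reader that began before a later update keeps seeing the old version, while the updater sees its own new one: exactly the "physical rows are in effect immutable" behavior the comment describes.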

Not with their business model

Posted Nov 23, 2005 19:17 UTC (Wed) by man_ls (guest, #15091) [Link] (23 responses)

How would their current business model work, though? Right now, either you use it for free under the GPL, or pay. If you redistribute it under the GPL, you have to publish all your code, but if you pay you don't have to.

If they gave you the option of a license under a BSD-like license, then nobody would ever pay! You would be able to use it freely in closed projects.

Not with their business model

Posted Nov 23, 2005 19:43 UTC (Wed) by smurf (subscriber, #17840) [Link] (11 responses)

You forget that the rest of MySQL's code is dual-licensed too (GPL+proprietary). So, you might be able to clone Postgres for use in your proprietary product, but not the MySQL code using it.

Looking at the heap of disparate storage engines MySQL carries, linking into PostgreSQL (and hacking it to understand all the nice SQL additions and compatibility workarounds MySQL came up with) may not be *that* difficult to do...

Interesting

Posted Nov 23, 2005 19:56 UTC (Wed) by man_ls (guest, #15091) [Link] (3 responses)

Ah, true. Is it possible to link GPL code with BSD-like code, or does it depend on the exact details of the license? I read that the Apache Software License was incompatible, even the new 2.0 license. And that is a BSD-like license, or at least used to be.

Interesting

Posted Nov 23, 2005 20:18 UTC (Wed) by smurf (subscriber, #17840) [Link] (2 responses)

It's not possible in general.

But if you hold copyright to one part of the equation, you can do whatever you like, as long as you adhere to the tenets of the "other" license.

Whether the PostgreSQL people would be happy about such a move is anybody's guess, but their license (disclaimer: I haven't read it, I only heard that it's BSDish) should allow it.

I also don't know whether the result would be GPL compatible; I tend to leave these issues for people with more free time than me (or different priorities) to ponder.

Interesting

Posted Nov 23, 2005 20:51 UTC (Wed) by dlang (guest, #313) [Link]

The PostgreSQL folks are suggesting it; it is definitely allowed by the license.

In part they are getting a large laugh out of the thought that their code could end up driving MySQL (and warming up responses to the accusation that PostgreSQL is so much slower than MySQL, when MySQL would then be using the PostgreSQL MVCC engine :-)

BSD not BSDish

Posted Nov 23, 2005 22:26 UTC (Wed) by GreyWizard (guest, #1026) [Link]

PostgreSQL is not offered under a BSDish license, it is offered under the BSD license, minus the advertising clause. This is GPL compatible.

Not with their business model

Posted Nov 24, 2005 7:05 UTC (Thu) by kleptog (subscriber, #1183) [Link] (6 responses)

"Looking at the heap of disparate storage engines MySQL carries, linking into PostgreSQL (and hacking it to understand all the nice SQL additions and compatibility workarounds MySQL came up with) may not be *that* difficult to do..."

You can't bolt on transactions. Just getting MySQL to read and write PostgreSQL tables would be easy, but transactions are not just a feature, they're a way of life.

Things like strict write order for Write-Ahead Logging, so you know your data is safe even if someone pulls the power out. Transaction and subtransaction logs to make sure you only see the data you're supposed to. The hierarchical memory allocator so that locks and memory are all freed correctly on transaction failure. Once you've sorted that out, you've basically got PostgreSQL with a MySQL shim over the top. If someone wanted that, they could download the PostgreSQL source code and change the grammar.
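The "strict write order" point can be shown with a toy log. The rule is simply that the log record must be forced to stable storage (fsync) *before* the data file is touched, so a crash between the two steps can always be repaired by replaying the log. The class, file layout, and record format here are invented for illustration, not PostgreSQL's actual WAL format.

```python
# Minimal write-ahead-logging sketch: log first, fsync, then modify data.
import json
import os

class TinyWAL:
    def __init__(self, log_path, data_path):
        self.log = open(log_path, "a+")
        self.data_path = data_path

    def write(self, key, value):
        # 1. Append the intent to the log and force it to disk FIRST.
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2. Only now is it safe to touch the data file; a crash here
        #    is recoverable because the log already has the record.
        with open(self.data_path, "a") as data:
            data.write(f"{key}={value}\n")

    def replay(self):
        # After a crash, re-apply every logged record.
        self.log.seek(0)
        return [json.loads(line) for line in self.log]
```

A real WAL also carries before-images or sequence numbers so that replay is idempotent and rollback is possible; this sketch captures only the ordering discipline the comment refers to.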

The reason some features that people want are slow in coming is because they're hard to do fast and right. Just because MySQL solves it by locking the entire table doesn't make it a suitable solution for a data warehouse.

PostgreSQL has solid, powerful features and keeps getting faster every release. I'd prefer that to MySQL's it's-fast-but-the-features-are-still-coming.

Not with their business model

Posted Nov 24, 2005 7:35 UTC (Thu) by smurf (subscriber, #17840) [Link] (3 responses)

MySQL has had transactions and proper ACID for a while now, _because_ they integrated the InnoDB engine really well.

My experience is: MySQL gives you lots of choices, and they listen to people. My experience with the PostgreSQL people has been "You want that nonstandard SQL syntax? It's not standard. Go away.". *Shrug*

You want non-standard SQL syntax?

Posted Nov 25, 2005 9:34 UTC (Fri) by alextingle (guest, #20593) [Link] (2 responses)

Why? Why not just fix your code?

When Microsoft give their customers non-standard extensions, everyone starts accusing them of trying to embrace and extend. Now you're criticising the Postgres people for refusing to do the same!

You want non-standard SQL syntax?

Posted Nov 25, 2005 10:15 UTC (Fri) by smurf (subscriber, #17840) [Link] (1 responses)

Closed-source clients use nonstandard SQL. I can't fix the closed-source clients. I can get the database to understand the stuff.

MySQL has implemented numerous Microsoft and Oracle extensions in its SQL parser, so I can rip out MSSQL and replace it with something sensible. The PostgreSQL people have not, so I can't (unless I want to hack the PG parser with something that'll not be accepted upstream).

You want non-standard SQL syntax?

Posted Nov 25, 2005 11:17 UTC (Fri) by kleptog (subscriber, #1183) [Link]

If that's the only problem, you need to look on pgfoundry for packages that implement various parts of other databases' non-standard extensions. PostgreSQL is flexible: it allows you to create new operators and types, so you can match other databases without changing a single line of core code.

Oracle: http://pgfoundry.org/projects/orafce/
MSSQL: http://pgfoundry.org/projects/mssqlsupport/
MySQL: http://pgfoundry.org/projects/mysqlcompat/

There are tools which will autoconvert your entire database schema to standard SQL so it can be loaded into PostgreSQL.
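The flavour of such conversion tools can be sketched as a trivial query rewriter in Python (purely illustrative, with a made-up rewrite table; the real pgfoundry packages mostly work the other way around, defining compatibility functions and operators inside PostgreSQL itself):

```python
import re

# A few MySQL-isms and their standard/PostgreSQL-friendly spellings.
# Purely illustrative; real converters handle far more, and far more carefully.
REWRITES = [
    (re.compile(r"\bIFNULL\s*\(", re.I), "COALESCE("),  # IFNULL(a,b) -> COALESCE(a,b)
    (re.compile(r"\bRLIKE\b", re.I), "~"),              # MySQL regex match -> PG's ~ operator
    (re.compile(r"`([^`]+)`"), r'"\1"'),                # backticks -> standard identifier quoting
]

def mysql_to_standard(sql: str) -> str:
    """Apply each textual rewrite in turn to a MySQL-flavoured query."""
    for pattern, replacement in REWRITES:
        sql = pattern.sub(replacement, sql)
    return sql
```

Textual rewriting like this is fragile (it ignores string literals and comments), which is exactly why defining the foreign functions inside the target database is the more robust approach.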

If you're really desperate, there's a company whose business is maintaining a version of PostgreSQL that looks like Oracle, so you don't have to change your stored procedures. That this doesn't exist for other databases suggests to me that the demand isn't there...

Not with their business model

Posted Nov 27, 2005 18:31 UTC (Sun) by dps (guest, #5725) [Link]

When it comes to features, you sometimes find PostgreSQL is lacking where MySQL is not. In particular, MySQL has had column-level access controls for a long time, and I want this feature to limit the access granted to the database for many identities. Being able to tie an identity to a particular originating host is nice too.

Non-updatable views are not good enough if you want the ability to select on multiple columns and update on only one. Incidentally, column-level privileges are a feature of ANSI SQL, so I cannot be accused of wanting non-standard SQL on that front.

There is also one case of taking the MAXIMUM and MINIMUM of a pair of columns in the matched data which is, or at least was, not supported by PostgreSQL.

Do not get me wrong: PostgreSQL is without doubt a good database server. My code will readily trade the features it does not need for speed in the features it does.

Not with their business model

Posted Dec 1, 2005 13:52 UTC (Thu) by nix (subscriber, #2304) [Link]

Personally, I want primary keys and triggers to be inherited properly. Right now they're not, which makes table inheritance mostly useless for most of the things I'd like to use it for...

... this might yet suck me into PostgreSQL development, if I can find the time.

MySQL is still GPL

Posted Nov 23, 2005 19:45 UTC (Wed) by ncm (guest, #165) [Link] (10 responses)

Ah, but the MySQL front-end and the interface code to PG's storage engine (and whatever else they used) would still be GPL. The danger, of course, is that users would take the step of bypassing MySQL and cutting MySQL AB out entirely. However, they can already do that now -- that is, port to PostgreSQL and ditch MySQL AB -- so it's not a big danger. Anybody doing business with MySQL AB has already made their choice about whether they want to pay for support.

Perhaps the real danger is when users discover MySQL with a PG back-end is faster than what they had with the Innobase back-end, and that without the MySQL front-end it is faster yet.

MySQL is still GPL

Posted Nov 23, 2005 20:29 UTC (Wed) by ringlord (guest, #6309) [Link] (9 responses)

To use the words of a previous poster, why? Prove it.

MySQL is still GPL

Posted Nov 23, 2005 23:38 UTC (Wed) by ncm (guest, #165) [Link] (8 responses)

Prove that customers' exposure to PostgreSQL would be a danger to MySQL AB?

Time will tell.

MySQL is still GPL

Posted Nov 24, 2005 7:02 UTC (Thu) by man_ls (guest, #15091) [Link] (7 responses)

No, prove that PostgreSQL is faster as a backend than InnoDB; it would be enough to prove that PostgreSQL is faster than MySQL.

MySQL is still GPL

Posted Nov 24, 2005 8:22 UTC (Thu) by Wol (subscriber, #4433) [Link] (6 responses)

I've come across the odd comment that PostgreSQL's back end is more MultiValue than Relational. And given that MultiValue absolutely smokes Relational for access speed, it seems that PostgreSQL could indeed be fast.

A little comparison :-) When a company ported from UniVerse (a MultiValue database) to Oracle, the consultants proudly announced to management that, after six months of work, they had finally managed to squeeze a 10% performance improvement out of their Oracle query over the old UniVerse query (this was a job that took 5 minutes). Their pride lasted just long enough for the UniVerse admin to point out that the Oracle query was running on a twin-Xeon-800. The system they were so proud of beating was running on a Pentium 90.

Cheers,
Wol

MySQL is still GPL

Posted Nov 24, 2005 10:03 UTC (Thu) by man_ls (guest, #15091) [Link] (4 responses)

This is where the lawyer shouts "anecdotal evidence!".

MySQL is still GPL

Posted Nov 24, 2005 14:31 UTC (Thu) by Wol (subscriber, #4433) [Link] (3 responses)

But consider the following ...

Relational theory strictly separates the db engine implementation from the theory, and the theory from the app running over it. Why do relational databases have things like optimisers, query caches, and all that crap? It's to get over the disconnect caused by relational's insistence on separating the logical from the physical.

MV doesn't do that. Which makes it dead easy for me to prove that my MV app can't be optimised - there's no room for improvement within the db engine. Bear in mind that it's always disk speed that's the killer (unless you have the entire database in RAM...), and an MV engine typically takes about 1.05 disk reads from the moment the app asks for the data to the moment the data is passed back. Where's the room for improvement in that?

(There's more than that, but that would be really digging into db engine system programming stuff :-)

Cheers,
Wol

Relational vs Multivalue

Posted Nov 24, 2005 14:42 UTC (Thu) by man_ls (guest, #15091) [Link] (2 responses)

I can see two problems with your reasoning.

First, you tell us that in MultiValue there is almost no room for improvement. OK, I believe it (though it would be better if you posted some references); still, you don't show that relational databases must be much worse.

Second, PostgreSQL is (as its name implies) relational. Without using non-standard SQL, I don't see how it could be faster than MySQL based only on theoretical principles.

Relational vs Multivalue

Posted Nov 25, 2005 0:51 UTC (Fri) by huffd (guest, #10382) [Link] (1 responses)

Second, PostgreSQL is (as its name implies) relational. Without using non-standard SQL, I don't see how it could be faster than MySQL based only on theoretical principles.

That is kind of interesting, since MySQL has been posting speed comparisons against the top competitors for years. Nobody has come forward to refute their findings. Typically the competitor's response is "fine, but you don't have feature x, y, or z", never mind that 90% of the people don't need those features.

MySQL has certainly taken a load of lip from the other vendors, who still haven't published their own or independent tests. Why is that?

Lastly, the day of reckoning is approaching for MySQL AB, because none of the features that should have been there all along are available WITHOUT the InnoDB engine. Talk is cheap coming from MySQL when they say it's a "pluggable database"; without InnoDB you get NONE of the features that they've spent the last 3 years getting right... is that the fat lady warming up in the green room??

I can just make out the lettering on her sweatshirt through her guttural reverberations: I see a P, a G, something-SQL...

Relational vs Multivalue

Posted Nov 25, 2005 6:12 UTC (Fri) by dlang (guest, #313) [Link]

To be fair to both sides, the MySQL benchmarks haven't been refuted because they don't cheat on the benchmarks (not on the results, anyway; you could argue differently about their selection of the benchmarks themselves).

On the other hand, there have been numerous other benchmarks, built around more complex queries, that have shown other databases to be faster (even if licenses prevent identification of some of the vendors, version numbers like '8i' have given strong hints :-)

The problem is that both MySQL and PostgreSQL have their strengths, and the benchmarks done by each side are designed to emphasise their own strengths.

If someone were to propose a set of benchmarks that both sides could agree on, I'm sure people could be found to run them on different hardware.

The benchmarks that I have seen that show MySQL to be the speed winner are almost always several years old, and none that I have seen have used the table types that allow for the full range of features. Meanwhile, I have been using Postgres and seen its speed increase drastically over the last several releases, so even if MySQL was faster a couple of years ago, it's an open question as to which is faster now.

I work in security, and one of the things we see all the time is vendors bragging about the speed of their equipment (especially with packet-filter-based firewalls). What we find is that the advertised speeds come from such stripped-down configs and special conditions that when you actually turn on all the features they offer and hit them with realistic (let alone worst-case) traffic patterns, they frequently end up being slower than the proxy-based firewalls that they are declaring obsolete and comparing their speed to.

I have some equipment that I will be setting up databases on in the next couple of months, and I could probably sneak in some time to run benchmarks; so if someone can define some good benchmarks, I'm willing to try to run them. I've got dual-Opteron machines, some with 8G RAM and 2x15k rpm smallish SCSI drives, some with 16G RAM and 16x7200 rpm large SATA drives + 2x15k rpm smallish SCSI drives.

Give me benchmarks to run and people to contact to help tune the databases and systems (I can do a reasonable job for Linux and Postgres, but have never run MySQL).

The machines were purchased for data warehouse use (storing and analysing logs), so I definitely want some tests that work that end of things (they would help justify the use of the machines), but I would also like to run some tests tilted towards the other end of things (OLTP and web serving) as well. I would like to see some tests that are small enough to fit entirely in RAM, as well as other tests that have to hit the disks extensively (possibly by artificially limiting the available RAM). If possible I will do both 32-bit and 64-bit tests (and ideally include a 64-bit kernel with 32-bit userspace).

The base system I would prefer to use is Debian Sarge, but I would definitely compile the databases from source.

MySQL is still GPL

Posted Nov 25, 2005 4:53 UTC (Fri) by jamesh (guest, #1159) [Link]

Postgres implements Multiversion concurrency control. This is essentially a method of providing concurrent transactions with snapshot views of the database, and a way of checking for conflicts on commit.

This is quite different from what databases like UniVerse do, which is essentially to provide a hierarchical view of the data, hanging sub-tables off individual records in a parent table.
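The snapshot idea behind MVCC can be sketched as a toy in Python (invented names; PostgreSQL's real tuple visibility rules are far more involved): each row version records the transaction that created it (xmin) and the one that deleted it (xmax), and a transaction sees only versions created by transactions that had committed before its snapshot was taken, plus its own writes.

```python
class ToyMVCC:
    """Toy multiversion store: each row version carries xmin (creating
    transaction) and xmax (deleting transaction); a snapshot sees only
    versions from transactions committed before it was taken."""

    def __init__(self):
        self.versions = {}   # key -> list of [value, xmin, xmax]
        self.next_xid = 1
        self.committed = set()

    def begin(self):
        xid = self.next_xid
        self.next_xid += 1
        # The snapshot is the set of transactions committed at start time.
        return xid, set(self.committed)

    def write(self, xid, key, value):
        # Mark the current committed version as deleted by us, then
        # append a fresh version; old readers still see the old one.
        for v in self.versions.get(key, []):
            if v[2] is None and v[1] in self.committed:
                v[2] = xid
        self.versions.setdefault(key, []).append([value, xid, None])

    def read(self, xid, snapshot, key):
        # Newest-first scan for a version visible to this snapshot.
        for value, xmin, xmax in reversed(self.versions.get(key, [])):
            created_visible = xmin == xid or xmin in snapshot
            deleted_visible = xmax is not None and (xmax == xid or xmax in snapshot)
            if created_visible and not deleted_visible:
                return value
        return None

    def commit(self, xid):
        self.committed.add(xid)
```

Readers never block writers in this scheme: an old transaction keeps seeing the version that was current when its snapshot was taken, which is the property the comment above describes. Conflict checking at commit and cleanup of dead versions (vacuuming) are left out of the toy.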


Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds