Development

Alternatives to SQL Databases

April 14, 2009

This article was contributed by Ian Ward

Traditional SQL databases with "ACID" properties (Atomicity, Consistency, Isolation and Durability) give strong guarantees about what happens when data is stored and retrieved. These guarantees make life easier for application developers, freeing them from thinking about exactly how the data is stored and indexed, or even which database is running. However, these guarantees come at a cost.

Bob Ippolito presented a talk titled "Drop ACID and think about data" at PyCon 2009, which gave an overview of a number of non-traditional databases. These alternatives compromise one or more of the ACID properties and expose the particulars of the data store's implementation in exchange for improved performance or scalability. Each also has its own limitations. This article will look at the more mature open source options Ippolito mentioned.

A number of companies have developed their own in-house data stores, including Amazon's Dynamo and Google's Bigtable. While none of the open source options are exactly like Dynamo or Bigtable, there are a number of high-performance, reliable and scalable options available.

Alternative Database Language Support
* means language support is pending
Cassandra: C++, C#, Java, Perl, Python, PHP, Erlang, Ruby
Memcached: C/C++, C#, Java, Perl, Python, PHP, Ruby, Lua, OCaml, Common LISP
Tokyo Cabinet: C/C++, Java, Perl, Ruby, Lua
Redis: C/C++, Java*, Perl, Python, PHP, Erlang, Ruby, Lua, Tcl
CouchDB: C#, Java, Perl, Python, PHP, Erlang, Ruby, Haskell, JavaScript, Common LISP
MongoDB: C++, Java, Python, PHP, Erlang*, Ruby

Cassandra

Cassandra is a data store written in Java that was open-sourced by Facebook and is now part of the Apache Incubator. Cassandra was originally designed to solve Facebook's inbox search problem: email reverse indexes were growing much faster than Facebook's databases could keep up with, and the company needed an affordable way to continue to grow.

Cassandra is designed to scale inexpensively with commodity networking equipment and servers, possibly in multiple data centers. Scalability and high availability are achieved by automatically "sharding" and replicating data across the servers and data centers.

A single Cassandra instance stores a single table, and each row is accessed with a key string. Every row of this table can have its own structure, storing a huge number of (key, value, time-stamp) tuples and/or nested columns. This makes Cassandra much more flexible than a simple key-value store, but not as general as a document database.
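
The row layout described above can be pictured with a small in-memory model. This is only a sketch and not Cassandra's API: the names table, put and get are invented here, and the rule that the newest time-stamp wins is an assumption about how the time-stamp in each tuple is used.

```python
# Hypothetical model of one table whose rows each have their own columns.
# table: row key (string) -> {column name -> (value, timestamp)}
table = {}

def put(row_key, column, value, timestamp):
    """Store a (column, value, timestamp) tuple; a newer timestamp replaces an older one."""
    row = table.setdefault(row_key, {})
    current = row.get(column)
    if current is None or timestamp >= current[1]:
        row[column] = (value, timestamp)

def get(row_key, column):
    """Return the current value of one column in one row."""
    value, _timestamp = table[row_key][column]
    return value

put("alice", "email", "alice@example.com", timestamp=1)
put("alice", "city", "Ottawa", timestamp=1)
put("bob", "phone", "555-0100", timestamp=1)    # rows need not share columns
put("alice", "city", "Toronto", timestamp=2)    # newer write replaces the old value
put("alice", "city", "Montreal", timestamp=0)   # stale write is ignored
```

Each row here is just a dictionary, which is why two rows can carry entirely different columns without any schema change.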

Although in heavy use by Facebook, Cassandra is early in development and still lacks some polish and documentation.

Memcached

Perhaps the simplest key-value store is Memcached. Memcached is widely used to speed up web applications by caching dynamic content. Part or all of the web pages are served from the cache instead of generating them at each request. Unlike in-process or shared memory caches, Memcached listens on a network socket and can be shared by many servers. Memcached may also be run on multiple servers: client libraries spread the keys across those servers and transparently fall back to servers that are still available when one goes down.
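
One way a client could spread keys across several cache servers and fall back when one is down is sketched below. The function pick_server, the server names and the is_alive callback are all hypothetical, and the simple hash-modulo scheme is an assumption made for brevity; real client libraries may use other schemes such as consistent hashing.

```python
import hashlib

def pick_server(key, servers, is_alive):
    """Map a key to a server by hashing, skipping servers that are down."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:4], "big") % len(servers)
    # Walk the server list from the hashed position until a live one is found.
    for offset in range(len(servers)):
        candidate = servers[(start + offset) % len(servers)]
        if is_alive(candidate):
            return candidate
    raise RuntimeError("no cache servers available")

servers = ["cache1:11211", "cache2:11211", "cache3:11211"]  # made-up addresses
down = set()

primary = pick_server("page:/index", servers, lambda s: s not in down)
down.add(primary)  # simulate the primary server failing
fallback = pick_server("page:/index", servers, lambda s: s not in down)
```

Because the mapping depends only on the key and the server list, every web server running the same client code finds a given key on the same cache server.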

Memcached keys and values are always strings. In addition to storing, retrieving and deleting values, it supports atomically appending or prepending string data to stored values, and atomic addition to or subtraction from 64-bit integer values stored as decimal strings.
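
The value of atomic operations on string-encoded counters can be illustrated with a toy in-process store. TinyStore is a hypothetical class written for this example, not a Memcached client; a lock stands in for the server applying each operation as a single step, and wrapping at 2**64 is a modeling choice to keep counters within 64 bits.

```python
import threading

class TinyStore:
    """Toy string-to-string store with atomic append and increment."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def append(self, key, suffix):
        # Read-modify-write happens under the lock, so no update is lost.
        with self._lock:
            self._data[key] = self._data[key] + suffix

    def incr(self, key, delta=1):
        # The counter is stored as a decimal string and parsed on each update.
        with self._lock:
            new_value = (int(self._data[key]) + delta) % 2**64
            self._data[key] = str(new_value)
            return new_value

def hammer(store, key, times):
    for _ in range(times):
        store.incr(key)

store = TinyStore()
store.set("hits", "0")
store.set("log", "start")
workers = [threading.Thread(target=hammer, args=(store, "hits", 1000)) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
store.append("log", ",done")
```

If clients instead fetched the value, added one and stored it back, concurrent updates could overwrite each other; doing the arithmetic inside the store avoids that.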

Memcached's data store has a fixed size and resides entirely in memory. Data may be stored with an expiration time. Memcached will throw out data when the cache is full or when the data reaches its expiration time.
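
A fixed-size cache that discards entries when full or expired might look like the following sketch. BoundedCache is a hypothetical class, not Memcached code; it assumes a limit counted in items rather than bytes, least-recently-used eviction, and an injectable clock so the behavior can be shown without waiting.

```python
from collections import OrderedDict

class BoundedCache:
    """Toy cache with a maximum item count and optional per-item expiry."""

    def __init__(self, max_items, clock):
        self._items = OrderedDict()  # key -> (value, expires_at or None)
        self._max_items = max_items
        self._clock = clock

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._items.pop(key, None)
        self._items[key] = (value, expires_at)
        # Evict the least recently used entries once the cache is over capacity.
        while len(self._items) > self._max_items:
            self._items.popitem(last=False)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # expired entries are dropped on access
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

now = [0]  # fake clock, in seconds
cache = BoundedCache(max_items=2, clock=lambda: now[0])
cache.set("a", "1", ttl=10)
cache.set("b", "2")
cache.get("a")        # "a" becomes the most recently used entry
cache.set("c", "3")   # cache is full, so "b" is evicted
now[0] = 11           # advance the clock past the expiry of "a"
```

An application using such a cache must always be prepared for a miss, since any entry can disappear at any time.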

Tokyo Cabinet

For a key-value data store that won't throw out data, Tokyo Cabinet is a good choice. Like Berkeley DB, it uses either a hash table, a B+ tree or an array of fixed-length records to store data on disk, but Tokyo Cabinet performs better and is thread safe. Tokyo Cabinet also promises to never corrupt data even in a "catastrophic situation". Tokyo Cabinet is actively maintained, and data stored is not limited by system RAM.

Tokyo Cabinet clients are separated into readers and writers. When a writer is accessing the database, all other clients are blocked. This will result in poor performance for write-heavy workloads.

Tokyo Cabinet supports appending data to values stored. When using a B+ tree layout Tokyo Cabinet provides a cursor object to efficiently move forward and backward through the keys. B+ tree pages may also be compressed on disk with zlib or bzip2. Compressing data not only saves disk space, but can also increase performance on I/O-bound systems.
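
The space saving from compressing a page of similar records can be illustrated with Python's standard zlib module. The page contents below are invented for the example, and this shows only the compression step, not Tokyo Cabinet's page format or API.

```python
import zlib

# A made-up "page" of sorted, similar-looking records, as a B+ tree leaf might hold.
page = b"".join(b"user:%04d=active;" % i for i in range(500))

compressed = zlib.compress(page)
restored = zlib.decompress(compressed)

# Fewer bytes have to be read from or written to disk for the same records.
ratio = len(compressed) / len(page)
```

On an I/O-bound system the CPU time spent compressing can be cheaper than the disk transfers it avoids, which is where the performance gain comes from.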

Redis

Redis is a disk-backed, in-memory key-value store with a number of additional features. Redis supports master-slave replication for redundancy, but not sharding, so all data must fit in a single system's RAM. Redis values may be binary strings, lists or sets. Redis provides atomic addition to/subtraction from integer values stored as decimal strings and push/pop/replacement of values in lists. The intersection of set values stored may also be calculated.

Redis can asynchronously save the database on request by forking the server process and writing out data in the background. The last successful save time may be queried to check when the changes have made it to disk. This design allows for good performance with the ability to save data when it makes sense for the particular application, but the application is responsible for making sure data is properly saved.
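
The fork-and-save technique can be sketched in a few lines on a Unix-like system. This is not Redis code: background_save, the JSON file format and the file name are assumptions made for the example. The child process sees the data as it was at the moment of the fork, so the parent can keep accepting writes while the snapshot is written.

```python
import json
import os
import tempfile
import time

def background_save(data, path):
    """Fork a child that writes a snapshot of data as it was at fork time."""
    pid = os.fork()
    if pid == 0:
        # Child process: works on its own copy of the parent's memory.
        status = 1
        try:
            tmp_path = path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(data, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, path)  # rename so readers never see a partial file
            status = 0
        finally:
            os._exit(status)  # leave without running the parent's cleanup handlers
    return pid

db = {"counter": "1"}
snapshot_path = os.path.join(tempfile.mkdtemp(), "dump.json")

child = background_save(db, snapshot_path)
db["counter"] = "2"  # the parent keeps serving writes during the save

_, wait_status = os.waitpid(child, 0)
last_save = time.time() if wait_status == 0 else None  # time of last successful save

with open(snapshot_path) as f:
    snapshot = json.load(f)
```

Anything written after the fork, such as the second counter value here, is not on disk until the next save, which is the trade-off the application has to manage.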

When using any key-value store as a cache, care must be taken to invalidate values when the data changes or inconsistency will be introduced. Choosing a memory-only cache will be faster once it is populated, but there is a cost associated with filling an empty cache on restart. Key-value stores are ideal for storing data that is not deeply nested and does not require complicated locking.

CouchDB

Document databases are designed to store large blocks of semi-structured data. The data is not restricted to a particular schema, so new versions of data can be stored alongside old versions without the need for migrations. Documents can be very large and deeply nested.

CouchDB is a JSON-based document database written in Erlang. CouchDB gives access to the database over HTTP with a RESTful API. Views of the database may be created on demand using JavaScript to collect and filter document contents and are updated as documents change. Indexes are not maintained outside of views, so there is a start-up cost associated with constructing a new view.

CouchDB documents are stored with a sequence number and are never overwritten; this way, partial writes will never result in data corruption. Readers are never blocked by writers and will always see a consistent snapshot of the database while reading data. The data is periodically compacted by writing out a new data file and deleting the original once it is no longer being accessed.
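
The combination of sequence numbers, append-only writes and compaction can be modeled in memory as follows. AppendOnlyStore and its methods are hypothetical names for this sketch; it does not reflect CouchDB's file format or HTTP API.

```python
class AppendOnlyStore:
    """Toy document store in which every write appends a new entry."""

    def __init__(self):
        self._log = []   # entries of (seq, doc_id, doc), never modified in place
        self._seq = 0

    def put(self, doc_id, doc):
        self._seq += 1
        self._log.append((self._seq, doc_id, dict(doc)))
        return self._seq

    def snapshot(self):
        """Return the sequence number that marks a consistent point in time."""
        return self._seq

    def get(self, doc_id, at_seq=None):
        """Return the newest version of a document at or before at_seq."""
        limit = self._seq if at_seq is None else at_seq
        for seq, current_id, doc in reversed(self._log):
            if seq <= limit and current_id == doc_id:
                return doc
        return None

    def compact(self):
        """Build a new log holding only the latest version of each document."""
        latest = {}
        for entry in self._log:
            latest[entry[1]] = entry
        self._log = sorted(latest.values(), key=lambda entry: entry[0])

store = AppendOnlyStore()
store.put("note", {"text": "draft"})
snap = store.snapshot()                # a reader starts here
store.put("note", {"text": "final"})   # a writer adds a new version
seen_by_reader = store.get("note", at_seq=snap)
seen_by_new_reader = store.get("note")
store.compact()
```

The reader that started at the earlier sequence number keeps seeing the draft, while compaction later discards versions that no new reader would ask for.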

CouchDB uses peer-based asynchronous replication. Documents may be updated on any peer, allowing for good write throughput. Conflicts will occur when two clients update the same document, and multiple conflicting documents may coexist in the database. A deterministic method is used to decide which document will be treated as the latest version. This lets CouchDB leave conflict resolution to the application. Once a conflict is resolved the new version is stored in the database as usual.
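
What matters about the deterministic method is that every peer picks the same winner without talking to the others. The rule below (highest generation number, ties broken by comparing revision hashes) is an assumption chosen for illustration, and pick_winner is a hypothetical function, not CouchDB's implementation.

```python
def pick_winner(revisions):
    """Choose one revision from a list of (generation, rev_hash, doc) tuples.

    The result depends only on the set of revisions, not on their order.
    """
    return max(revisions, key=lambda rev: (rev[0], rev[1]))

# Two clients updated the same document on different peers.
from_peer_a = (2, "a41f", {"title": "Alternatives to SQL"})
from_peer_b = (2, "c07b", {"title": "Alternatives to SQL databases"})

# Each peer learns about the revisions in a different order.
winner_on_a = pick_winner([from_peer_a, from_peer_b])
winner_on_b = pick_winner([from_peer_b, from_peer_a])
```

The losing revision is still available, so the application can later merge the two versions and store the result as a new revision.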

MongoDB

MongoDB is a document database written in C++. MongoDB uses a binary-encoded JSON format that shrinks the data size and allows for faster searching and indexing. Large binary data, such as video files, can also be stored more efficiently in this format. Data is updated in place and MongoDB will automatically run a repair procedure on the database in the event of an unclean shutdown.

MongoDB documents may be nested or include references to other documents. References will be replaced with the value of the referenced document when queried. MongoDB supports persistent single or compound key indexes. Indexes are implemented as B-Trees and queries will automatically take advantage of all indexes available. Queries may include common conditional operators, membership testing and values in embedded documents.
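
Queries with conditional operators, membership tests and values in embedded documents can be pictured with a toy matcher. The matches and lookup functions are hypothetical and scan documents one by one without any index; the operator names $gt, $lt and $in and the dotted-path notation follow MongoDB's query syntax, but this is not MongoDB's implementation.

```python
def lookup(doc, dotted_path):
    """Follow a path such as "address.city" into nested dictionaries."""
    value = doc
    for part in dotted_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value

OPERATORS = {
    "$gt": lambda actual, expected: actual is not None and actual > expected,
    "$lt": lambda actual, expected: actual is not None and actual < expected,
    "$in": lambda actual, expected: actual in expected,
}

def matches(doc, query):
    """Return True if the document satisfies every condition in the query."""
    for path, condition in query.items():
        actual = lookup(doc, path)
        if isinstance(condition, dict):
            # A dict of operators, for example {"$gt": 25}.
            if not all(OPERATORS[op](actual, arg) for op, arg in condition.items()):
                return False
        elif actual != condition:
            # A plain value means an equality test.
            return False
    return True

people = [
    {"name": "Ann", "age": 31, "address": {"city": "Ottawa"}},
    {"name": "Raj", "age": 24, "address": {"city": "Toronto"}},
    {"name": "Lea", "age": 40, "address": {"city": "Halifax"}},
]
query = {"age": {"$gt": 25}, "address.city": {"$in": ["Ottawa", "Toronto"]}}
found = [person["name"] for person in people if matches(person, query)]
```

A real database would consult an index on age or address.city where one exists instead of testing every document.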

MongoDB has auto-sharding support, splitting documents across many servers so that data stored is not limited by the capacity of a single server. MongoDB also supports asynchronous replication for high availability.

Choosing a Data Store

The best data store for an application depends in large part on how deeply nested the values stored will be. If the application needs to only store strings and integers then a simple key-value store like Memcached, Tokyo Cabinet or Redis would be best. If the values can be represented as lists and sets of simple values then Tokyo Cabinet, Redis or Cassandra would be good options. If the application needs nested lists and hashes then choose Cassandra, CouchDB or MongoDB. Finally, if the values contain deeply nested data then only a document database like CouchDB or MongoDB will do.
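
The guidance above amounts to a lookup from the shape of the stored values to a short list of candidates. The table below restates it as data; the category names and the candidate_stores function are labels invented for this sketch.

```python
# Candidate data stores by how deeply nested the stored values are.
CANDIDATES = {
    "strings_and_integers": ["Memcached", "Tokyo Cabinet", "Redis"],
    "lists_and_sets": ["Tokyo Cabinet", "Redis", "Cassandra"],
    "nested_lists_and_hashes": ["Cassandra", "CouchDB", "MongoDB"],
    "deeply_nested": ["CouchDB", "MongoDB"],
}

def candidate_stores(value_shape):
    """Return the data stores suggested for a given value shape."""
    return CANDIDATES[value_shape]

shortlist = candidate_stores("lists_and_sets")
```

The shortlist is only a starting point; the remaining candidates still need to be compared against the application's actual usage patterns.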

Once a data store has been chosen and the application optimized for it, switching to a completely different API will not be easy. It is worth investing time evaluating the remaining options by writing code to simulate the application's usage patterns before making a choice.

System Applications

Audio Projects

PulseAudio 0.9.15 released

Version 0.9.15 of the PulseAudio sound server has been announced; Bluetooth support is now complete. See the change log for details. Also, version 0.9.8 of PulseAudio Volume Control has been announced.

Rivendell 1.4.0 released

Version 1.4.0 of Rivendell has been announced; several new capabilities have been added. "Rivendell is a full-featured radio automation system targeted for use in professional broadcast environments. It is available under the GNU General Public License."

Database Software

PostgreSQL 8.4 Beta

The beta release of PostgreSQL 8.4 is out; the PostgreSQL developers are looking for testers to help find the remaining bugs. There are a lot of new features in this release; see the announcement for the list of the most important additions.

PostgreSQL Weekly News

The April 12, 2009 edition of the PostgreSQL Weekly News is online with the latest PostgreSQL DBMS articles and resources.

SQLite release 3.6.13 announced

Version 3.6.13 of the SQLite DBMS has been announced; several bugs have been fixed.

Device Drivers

libshcodecs 0.9.5 released

Version 0.9.5 of libshcodecs has been announced. "libshcodecs is a library for controlling SH-Mobile hardware codecs. The [SH-Mobile] processor series includes a hardware video processing unit that supports MPEG-4 and H.264 encoding and decoding. libshcodecs is available under the terms of the GNU LGPL."

Embedded Systems

BusyBox 1.14.0 and 1.13.4 released

Stable version 1.13.4 and unstable version 1.14.0 of BusyBox have been announced; many changes have been made. "Most of growth is in hush. The rest shrank a bit."

Filesystem Utilities

Announcing Tahoe-LAFS version 1.4

Version 1.4 of Tahoe-LAFS has been announced. "The allmydata.org team is pleased to announce the release of version 1.4.1 of "Tahoe", the Lightweight-Authorization Filesystem. This is the first release of Tahoe-LAFS which was created solely as a labor of love by volunteers -- it is no longer funded by allmydata.com (see [1] for details). Tahoe-LAFS is a secure, decentralized, fault-tolerant cloud storage system. All of the source code is publicly available under Free Software, Open Source licences."

LDAP Software

LDAP Account Manager: 2.6.0 released (SourceForge)

Version 2.6.0 of LDAP Account Manager has been announced. "PHP based tool for managing various account types (Unix, Samba, Kolab, ...) in an LDAP directory. This release adds support for the management of NIS netgroups and EDU person accounts. The LAM admin users can now be searched in LDAP instead of providing a static list. LAM Pro allows you to adopt it to your corporate design."

python-ldap 2.3.7 released

Version 2.3.7 of python-ldap has been announced; it includes numerous bug fixes. "python-ldap provides an object-oriented API to access LDAP directory servers from Python programs. It mainly wraps the OpenLDAP 2.x libs for that purpose. Additionally it contains modules for other LDAP-related stuff (e.g. processing LDIF, LDAPURLs and LDAPv3 schema)."

Security

announcing ClamAV 0.95.1

Version 0.95.1 of the ClamAV virus scanner has been announced. "0.95.1 is a bugfix release, please see the ChangeLog at http://freshmeat.net/urls/7065abfc92b936d016260efff5f9c67f for details."

Web Site Development

GreasySpoon: release 0.5.4 available (SourceForge)

Version 0.5.4 of GreasySpoon has been announced. "GreasySpoon is an scripting solution allowing to easily manipulate HTTP traffic: access control, content filtering, cross-domain management, mashups, etc. GreasySpoon relies on ICAP protocol and is designed to work with any ICAP compatible proxy like Squid. Version 0.5.4 fixes several bugs and includes some optimisations."

Midgard2 9.03.0 RC released

Version 9.03.0 RC of the Midgard2 web content management system has been announced. "The Midgard Project has released first Release Candidate of Midgard2 9.03 "Vinland" - the new generation of the Midgard content repository."

mod_wsgi 2.4 announced

Version 2.4 of mod_wsgi is available. "The mod_wsgi package consists of an Apache web server module designed and implemented specifically for hosting Python based web applications that support the WSGI interface specification. Examples of major Python web frameworks and applications which are known to work in conjunction with mod_wsgi include CherryPy, Django, MoinMoin, Pylons, Trac, TurboGears, Werkzeug and Zope. Version 2.4 of mod_wsgi is principally a bug fix update. It fixes memory leaks, configuration corruption, truncation of request/response data and other minor issues. A small number of other minor improvements have also been made."

Pyjamas 0.5p1 released

Version 0.5p1 of Pyjamas has been announced. "Pyjamas 0.5p1 is a bug-fix release. Pyjamas is a Web Widget Framework, written in python, that is compiled to javascript using its stand-alone python-to-javascript compiler. It began as a port of GWT, to python."

Desktop Applications

Audio Applications

Amarok 2.1 Beta 1 'Nuliajuk' released (KDEDot)

KDE.News covers the first beta release of the Amarok 2.1 music player. "Featuring one of the longest ChangeLogs in Amarok history, this beta release showcases what is possible when building on the strong technical foundations that have been laid with Amarok 2.0. Nearly all parts of Amarok have received attention, and while not a final release, it is already very usable and quite stable."

CAD

pythonOCC 0.2 released

Version 0.2 of pythonOCC has been announced. "We're really proud to announce this new release, which is really a huge step far from the previous one and closer to our main goal: provide an easy-to-use, deploy, maintain and industrial quality python CAD package. Here are the main changes: 'wo' stands for 'Whole OpenCascade': almost the whole OCC API (about 90%) is now covered by the wrapper. Pleat read this file to lists available modules, *memory leaks were fixed *pythonOCC comes now with a complete set of sample scripts *the licence was moved to the GNU General Public License v3 *many bugfixes and improvements."

Desktop Environments

GNOME 2.26.1 released

Stable version 2.26.1 of the GNOME desktop environment has been announced. "This is the first update to GNOME 2.26. It contains the usual mixture of bug fixes, translations updates and documentation improvements that are the hallmark of stable GNOME releases, thanks to our wonderful team of GNOME contributors! The next stable version of GNOME will be GNOME 2.26.2, which is due on May 20. Meanwhile, the GNOME community is actively working on the development branch of GNOME that will become GNOME 2.28 in late September 2009."

GNOME Software Announcements

The following new GNOME software has been announced this week: You can find more new GNOME software releases at gnomefiles.org.

KDE Software Announcements

The following new KDE software has been announced this week: You can find more new KDE software releases at kde-apps.org.

Xorg Software Announcements

The following new Xorg software has been announced this week: More information can be found on the X.Org Foundation wiki.

Electronics

GNU Radio 3.2 release candidate 2 is available

Release candidate 2 of GNU Radio 3.2, a software-defined radio platform, has been announced. "This release contains all the features, fixes, and bugs of the GNU Radio development trunk as of 4/14/09. Our goal is for this to be the last release candidate before the formal 3.2 release."

Graphics

JessyInk 1.0 released

Version 1.0 of JessyInk has been announced; this is the first stable release. "JessyInk is a JavaScript that can be incorporated into an Inkscape SVG image containing several layers. Each layer will be converted into one slide of a presentation. Current features include: slide transitions, effects, an index sheet, a master slide and auto-texts like slide title, slide number and number of slides."

Interoperability

Wine 1.1.19 announced

Version 1.1.19 of Wine has been announced. Changes include: "Support for Visual C++ project files in winemaker. Improvements to the Esound driver. Many Direct3D code cleanups. Fixes to OLE clipboard handling. OpenBSD compilation fixed. Various bug fixes."

Medical Applications

Updated open source releases within OpenVista project (LinuxMedNews)

LinuxMedNews covers the latest release of OpenVista. "Medsphere today announced the release of open source code for recently developed components of the comprehensive OpenVista electronic health record (EHR) solution. Medsphere’s open source release, now available for download at www.medsphere.org, includes the OpenVista Interface Domain (OVID), the OpenVista Clinical Information System (CIS) 1.0RC1, and OpenVista Server 1.5SP1."

Multimedia

Oggz 0.9.9 released

Version 0.9.9 of Oggz, a library and tool set for working with Ogg files, has been announced. "Features: 0.9.9 adds Dirac support, security fixes, improved low-memory behaviour, and a new 'oggz' wrapper tool with bash completion."

Music Applications

FluidSynth 1.0.9 released

Version 1.0.9 of FluidSynth has been announced. "On behalf of the FluidSynth team I'm very happy to announce the release of version 1.0.9 - "A Sound Future". The nickname for this release was chosen based on the recent development interest in the project and the addition of new team members. This last development cycle is the result of contributions from many individuals. The future of FluidSynth is looking bright and we are working to create a more regular release schedule and plan for innovation and improvement."

gdigi 0.1.7 released

Version 0.1.7 of gdigi has been announced. "Control your Digitech effect pedal under Linux! gdigi is tool aimed to provide X-Edit functionality to Linux users". See the change log for more information on this release.

Office Applications

KeepNote 0.5.2 announced

Version 0.5.2 of KeepNote has been announced; many new features have been added. "KeepNote is a simple cross-platform note taking program implemented in Python. I have been using it for my research and class notes, but it should be applicable to many note taking situations."

Science

EAS3 version 1.6.7 released (SourceForge)

Version 1.6.7 of EAS3, the Eingabe Ausgabe System, has been announced. "EAS3 is a toolkit for writing and manipulating IEEE binary floating point data. It provides libraries and a comand line interface for data manipulation. The file format stores 5-dim. arrays plus additional data. It is already used for num. flow solvers. In version 1.6.7 "Faxe", literal constants are defined with _rk. Additionally a minor bug in the interpolation module is corrected."

OBO-Edit: 2.0 is available (SourceForge)

Version 2.0 of OBO-Edit has been announced; it includes a number of new capabilities. "OBO-Edit is an open source, platform independent application for viewing and editing ontologies developed by the Berkeley Bioinformatics and Open Source Projects Group at Lawrence Berkeley National Lab."

Web Browsers

SeaMonkey 1.1.16 security release

Version 1.1.16 of SeaMonkey has been announced. "Today, the SeaMonkey project released a new version of its all-in-one internet suite. SeaMonkey 1.1.16 closes a few critical security vulnerabilities found in previous versions. With that, SeaMonkey offers the same level of security as its sibling Firefox 3, which has issued updates for the same problems recently as well."

Languages and Tools

C

GCC 4.4.0 Status Report

The April 14, 2009 edition of the GCC 4.4.0 Status Report has been published. "Release Candidate 1 has been released today. The branch remains open under the usual release branch rules; it is open for regression and documentation fixes only, but please be very conservative at this point in deciding what changes are needed before the 4.4.0 release and what can wait until after that release."

IDEs

Pydev 1.4.5 released

Version 1.4.5 of Pydev, an Eclipse IDE plugin for Python and Jython, has been announced; numerous improvements have been made.

Libraries

FreeImage: 3.12.0 released (SourceForge)

Version 3.12.0 of FreeImage has been announced; it includes several new capabilities. "FreeImage is an Open Source library project for developers who would like to support popular graphics image formats like PNG, BMP, JPEG, TIFF and others as needed by today's multimedia applications. FreeImage is easy to use, fast, multithreading safe, compatible with all 32-bit versions of Windows, and cross-platform (works both with Linux and Mac OS X)."

Version Control

GIT 1.6.2.3 released

Version 1.6.2.3 of the GIT distributed version control system has been announced. "The latest maintenance release GIT 1.6.2.3 is available at the usual places".

Page editor: Forrest Cook


Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds