VDC: the Virtual Data Center

The Virtual Data Center (VDC) is an operational, open-source digital library that enables the sharing of quantitative research data. The project acknowledgments list a large number of authors and contributors working at the Harvard-MIT Data Center. The project is funded by the National Science Foundation's Digital Libraries Initiative.

The project description goes into more detail about what VDC can be used for:

VDC provides a complete open-source, digital library system for the management, dissemination, exchange, and citation of virtual collections of quantitative data. The VDC functionality provides everything necessary to maintain and disseminate an individual collection of research studies, including facilities for the storage, archiving, cataloging, translation, and dissemination of each collection. On-line analysis is provided, powered by the R Statistical environment. The system provides extensive support for distributed and federated collections, including location-independent naming of objects, distributed authentication and access control, federated metadata harvesting, remote repository caching, and distributed "virtual" collections of remote objects.

Uses of VDC include:

  • Study preparation for format conversion of data.
  • Study management for data archiving and cataloging.
  • Interoperability with data in a number of standard research formats.
  • Dissemination of data including downloading, format conversion, and subset generation.
  • On-line analysis for generating statistics and graphics.
  • Distribution and federation for making the data available widely.
  • Replication for creating and managing persistent dataset identifiers.

VDC is being used by a number of fairly high-profile projects, including a social science data archive at the Harvard-MIT Data Center; TheDataWeb, a collaboration between the U.S. Census Bureau and the Centers for Disease Control; Harvard University's Library Digital Initiative; and the Henry A. Murray Research Center. You can take an online test drive of VDC at the HMDC VDC Server Virtual Data Center Site; a large collection of research papers is also available.

The final version 1.0 of the Virtual Data Center (VDC) was released this week. "Release 1.0 provides all core features and contains no known bugs. Supported standards and protocols and formats include: DDI, Dublin Core, and MARC for metadata; R, SPSS, SAS, ASCII, and STATA for data; OAI and Z39.50 for queries; UNFs and Handles for naming/citation."
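
To give a rough idea of what an OAI query looks like, here is a minimal Python sketch that builds an OAI-PMH `ListRecords` request asking for Dublin Core (`oai_dc`) metadata. It is only an illustration of the protocol's request format: the endpoint address and the helper names are hypothetical, nothing here is taken from the VDC code base, and the sketch does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; not a real VDC address.
BASE_URL = "http://vdc.example.org/oai"


def build_oai_request(base_url, verb, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request; oai_dc is unqualified Dublin Core."""
    args = {"metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting, limited to one set of records.
        args["set"] = set_spec
    return build_oai_request(base_url, "ListRecords", **args)


if __name__ == "__main__":
    print(list_records_url(BASE_URL))
```

A harvester would fetch the resulting URL over HTTP and parse the XML response; whether a given VDC installation exposes its OAI interface at a path like this one is an assumption.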

For further reading, the VDC Documentation page contains a number of papers and other reference material about the project.

The code is available for download here; packages are currently available for Red Hat Linux 9, Red Hat Advanced Server 3, and Fedora Core 1. Packages for SUSE are on the to-do list. Digging through the source code repository for VDC reveals a large collection of Perl code, shell scripts, and R code. The project Design Overview white paper (PDF) is a good starting point for more detailed information on the project's architecture. VDC has been released under version 2 of the GNU General Public License (GPL).


VDC: the Virtual Data Center

Posted Oct 1, 2004 12:07 UTC (Fri) by jello (subscriber, #6083)

That's Gnu _General_ Public License

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds