The Virtual Data Center (VDC)
is an operational, open-source digital library that
enables the sharing of quantitative research data.
Its developers include a long list of authors and contributors
working at the Harvard-MIT Data Center. The project is being
funded by the National Science Foundation's Digital Libraries Initiative.
The project description goes into more depth on what
VDC can be used for:
VDC provides a complete open-source, digital library system for the management, dissemination, exchange, and citation of virtual collections of quantitative data. The VDC functionality provides everything necessary to maintain and disseminate an individual collection of research studies, including facilities for the storage, archiving, cataloging, translation, and dissemination of each collection. On-line analysis is provided, powered by the R statistical environment. The system provides extensive support for distributed and federated collections, including location-independent naming of objects, distributed authentication and access control, federated metadata harvesting, remote repository caching, and distributed virtual collections of remote objects.
Uses of VDC include:
- Study preparation for format conversion of data.
- Study management for data archiving and cataloging.
- Interoperability with data in a number of standard research formats.
- Dissemination of data including downloading, format conversion, and subset generation.
- On-line analysis for generating statistics and graphics.
- Distribution and federation for making the data available widely.
- Replication for creating and managing persistent dataset identifiers.
VDC is being used by a number of fairly high-profile
projects, including a social science data archive at the Harvard-MIT Data Center;
TheDataWeb, a collaboration between
the U.S. Census Bureau and the Centers for Disease Control;
Harvard University's Library Digital Initiative; and
the Henry A. Murray Research Center. You can take an online test drive
of VDC at the
HMDC VDC Server. On the Virtual Data Center site,
a large collection of research papers is available.
The final version 1.0 of the Virtual Data Center (VDC)
was released this week.
"Release 1.0 provides all core features and contains no known bugs. Supported standards and protocols and formats include: DDI, Dublin Core, and MARC for metadata; R, SPSS, SAS, ASCII, and STATA for data; OAI and Z39.50 for queries; UNF's and Handle's for naming/citation."
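To give a feel for what OAI-based harvesting of Dublin Core metadata looks like in practice, here is a minimal Python sketch. It is not taken from the VDC code base: the base URL `http://vdc.example.org/oai`, the function names, and the sample record are all hypothetical, and the only things assumed from the OAI-PMH specification are the `ListRecords` verb, the `oai_dc` metadata prefix, and the standard XML namespaces. The response is parsed from an embedded sample string, so no network access is needed.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Dublin Core element namespace, as used in OAI-PMH oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository.

    base_url is a hypothetical repository endpoint; a real server
    would publish its own.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return the text of every dc:title element in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


# Hand-written sample response (illustrative only, not real VDC output).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:study-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example Survey Study</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://vdc.example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also need to fetch the URL, follow resumption tokens for large result sets, and handle error responses; those steps are left out here to keep the sketch short.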
For further reading, the VDC
Documentation page contains a number of papers and other
reference material about the project.
The code is available for download;
packages are currently available for Red Hat Linux 9, Red Hat
Advanced Server 3, and Fedora Core 1.
Packages for SUSE are on the to-do list.
Digging through the source code repository for VDC reveals a
large collection of Perl code, shell scripts, and R code.
The white paper (PDF) is a good starting point for more detailed
information on the project's architecture.
VDC has been released under version 2 of the GNU General Public License (GPL).