Case Study 6

The Chronopolis digital network: the economics of long-term preservation

David Minor and Ardys Kozbial

Abstract.

The case study describes the experience of the Chronopolis Digital Network after the initial funding was exhausted and considers the various funding models that are most likely to enable the activity to achieve sustainability in the longer term. A layered funding approach is seen as the most appropriate way forward.

Keywords

Chronopolis Digital Network

economics

funding models

preservation

sustainability

About Chronopolis – digital preservation across space and time

Spanning academic institutions and disciplines, the Chronopolis digital preservation network provides services for the long-term preservation and curation of digital materials. Because of the ephemeral nature of digital information, it is critical to organise and preserve the digital assets that represent society’s intellectual capital – the core seeds of knowledge that are the basis of future research and education.

The San Diego Supercomputer Center (SDSC) and University of California San Diego (UCSD) Libraries, with the National Center for Atmospheric Research (NCAR) and the University of Maryland Institute for Advanced Computer Studies (UMIACS), have created Chronopolis to address these issues. Originally funded by the Library of Congress, the Chronopolis digital preservation network has the capacity to preserve hundreds of terabytes (TB) of digital materials – data of any type or size, with minimal requirements on the data provider. The project leverages high-speed networks, mass-scale storage capabilities and the expertise of its partners to provide a geographically distributed, heterogeneous, highly redundant archive system. Features of the project include:

• three geographically distributed copies of all data;

• curatorial audit reporting;

• development of best practices for data packaging and sharing.

‘Chronopolis is part of a new breed of distributed digital preservation programs,’ says Brian E.C. Schottlaender, a Principal Investigator for Chronopolis and the Audrey Geisel University Librarian at UCSD. ‘We are using a virtual organizational structure in order to assemble the best expertise and framework to provide data longevity, durability and access well into the next century.’1

Chronopolis in depth

Chronopolis is a digital preservation data grid framework developed by the San Diego Supercomputer Center at UCSD, the UC San Diego Libraries and their partners at the National Center for Atmospheric Research (NCAR) in Colorado and the University of Maryland’s Institute for Advanced Computer Studies (UMIACS). The Chronopolis network provides cross-domain collection management for long-term preservation. Using existing high-speed educational and research networks and mass-scale storage infrastructure investments, the network leverages the data storage capabilities at SDSC, NCAR and UMIACS to provide a preservation data grid that emphasises heterogeneous and highly redundant data storage systems. Each Chronopolis member organisation operates a grid node containing at least 250 TB of storage capacity for digital collections. For reference, just one terabyte of information would use up all the paper made from about 50,000 trees. The Chronopolis data grid provides a minimum of three geographically distributed copies of its data collections, while enabling curatorial audit reporting and access for preservation clients. The key underlying technology for managing data within Chronopolis is the Integrated Rule-Oriented Data System (iRODS), a preservation middleware package that allows for robust management of data.2 The Chronopolis partnership is also developing best practices for the worldwide preservation community for data packaging and transmission among heterogeneous digital archive systems.

Chronopolis has concentrated on housing a wide range of content that is not tied to a single community. Currently, there are five significant collections housed in Chronopolis:

1. Data from the North Carolina Geospatial Data Archiving Project, a joint project of the North Carolina State University Libraries and the North Carolina Center for Geographic Information and Analysis. It is focused on the collection and preservation of digital geospatial data resources from state and local government agencies in North Carolina.

2. Scripps Institution of Oceanography at UC San Diego (SIO) has one of the largest academic research fleets in the world, with four research vessels and the research platform FLIP. Since 1907, Scripps oceanographic vessels have played a critical role in the exploration of our planet, conducting important research in all the world’s oceans. SIO houses several decades of data from its cruises.

3. The California Digital Library (CDL) is storing content from its ‘Web-at-Risk’ collections. Web-at-Risk is a multi-year effort led by CDL to develop tools that enable librarians and archivists to capture, curate, preserve and provide access to web-based government and political information. The primary focus of the collection housed in Chronopolis is state and local government information, but it may also include web documents from federal and international government as well as nonprofit sources.

4. The UCSD Libraries Digital Library has recently added its complete holdings to Chronopolis and plans to continue to add more content as it becomes available. This content is a full backup of its digital library holdings, representing many decades of important cultural artifacts.

5. The UCSD Research Cyberinfrastructure Curation Program3 will be adding more than 150 TB of curation research datasets into Chronopolis. These are data that have been determined to be scientifically important for the intellectual future of the University. Chronopolis will form the preservation environment for this programme.

Initial funding

As has been noted, Chronopolis began as a grant-funded programme under the auspices of the Library of Congress’s National Digital Information Infrastructure and Preservation Program (NDIIPP).4 This initial funding, which totalled more than $2 million over approximately three years, provided the seed funding that allowed Chronopolis to purchase and install the hardware infrastructure, install and configure the software and middleware components, ingest data from a variety of providers, and maintain and update all components as necessary. This funding also provided the necessary staffing for all of the above work, including management of the system. As with all NDIIPP projects, this work was intended to benefit NDIIPP and its partners. Thus the initial collections housed in Chronopolis came from NDIIPP partners. The approximate breakdown of responsibility among the Chronopolis partners was:

• SDSC provided project management, hardware and software installation and maintenance, and financial management of the system: 5 full-time equivalent (FTE).

• NCAR provided hardware and software installation and maintenance, and aided in the development of user-facing services, including a web portal and ingest tools: 1 FTE.

• UMIACS provided hardware and software installation and maintenance, and developed two of the core tools used in the Chronopolis network for moving and monitoring data: 1 FTE.

Funding: the next generation

It was always explicit during the NDIIPP funding period that one day Chronopolis would need to stand on its own: to look for its own funding and customers. NDIIPP funding for Chronopolis began to wind down in the first half of 2011. The Chronopolis partners had already begun the process of looking for new sources of funding, as this transition had been expected. The Chronopolis management team identified several possible sources:

• work with paying customers – that is, customers who would pay an annual fee, likely per terabyte, that would cover the costs of maintaining Chronopolis;

• work with researchers or research groups to prepare them to enter Chronopolis into their current and future research programmes;

• continue to look for grant funding, from a variety of sources;

• seek long-term institutional support from at least one of the Chronopolis home institutions.

Work with paying customers

SDSC, the financial management arm of the Chronopolis team, worked for several months to develop a recharge model. This model, based on a per terabyte per year charge, was designed to offset the costs for the basic maintenance of the Chronopolis infrastructure. This financial model has many components, including:

• FTE necessary at the three Chronopolis partner sites;

• costs to continue maintenance of current hardware infrastructure as well as costs to upgrade it on a reasonable time schedule;

• costs to cover adding infrastructure in a timely manner if the capacity of the system demands it;

• costs added, if necessary, for ‘overhead’, which is required in the university setting.
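The arithmetic behind a per-terabyte, per-year recharge can be illustrated with a minimal sketch. It is not the actual SDSC model: every field name and every figure below is a hypothetical placeholder, and the sketch assumes that overhead is applied as a simple percentage on top of direct costs and that the total is spread evenly over the terabytes under paid management.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RechargeInputs:
    """Annual cost inputs for a recharge rate (all names and values hypothetical)."""
    staff_fte: float             # total FTE needed across the partner sites
    cost_per_fte: float          # assumed fully loaded annual cost of one FTE
    hardware_maintenance: float  # annual maintenance of the current hardware
    hardware_refresh: float      # annual set-aside for scheduled upgrades
    expansion_reserve: float     # annual set-aside for adding capacity on demand
    overhead_rate: float         # institutional overhead as a fraction, e.g. 0.25
    billable_tb: float           # terabytes expected to be under paid management


def annual_cost(inputs: RechargeInputs) -> float:
    """Sum the direct annual costs, then apply overhead on top."""
    direct = (
        inputs.staff_fte * inputs.cost_per_fte
        + inputs.hardware_maintenance
        + inputs.hardware_refresh
        + inputs.expansion_reserve
    )
    return direct * (1 + inputs.overhead_rate)


def rate_per_tb_year(inputs: RechargeInputs) -> float:
    """Spread the annual cost evenly over the billable terabytes."""
    if inputs.billable_tb <= 0:
        raise ValueError("billable_tb must be positive")
    return annual_cost(inputs) / inputs.billable_tb


# Illustrative figures only; none of these numbers come from Chronopolis.
example = RechargeInputs(
    staff_fte=2.0,
    cost_per_fte=100_000,
    hardware_maintenance=50_000,
    hardware_refresh=40_000,
    expansion_reserve=10_000,
    overhead_rate=0.25,
    billable_tb=250,
)

if __name__ == "__main__":
    print(f"Annual cost to recover: {annual_cost(example):,.0f}")
    print(f"Recharge per TB per year: {rate_per_tb_year(example):,.2f}")
```

With these made-up inputs the direct costs come to 300,000, overhead raises the total to 375,000, and spreading that over 250 TB gives a charge of 1,500 per terabyte per year. The sketch also makes one sensitivity visible: because the costs are largely fixed, the rate falls as the number of billable terabytes grows.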

Work with researchers or research groups to prepare them to enter Chronopolis into their current and future research programmes

Another closely related customer opportunity comes in the form of individual researchers, or research groups, writing Chronopolis into their grants and long-term data management plans as their designated preservation environment. The need for this kind of service has become particularly relevant because several large funding agencies in the United States, including the National Institutes of Health and the National Science Foundation, are requiring that these kinds of services be addressed in the narrative of all grants. Using the recharge model described above, Chronopolis can provide a ready-made infrastructure that researchers would not have to develop or maintain on their own.

Continue to look for grant funding, from a variety of sources

In order to keep the recharge amount at the most reasonable level, it was decided that it would not include funding for research within the Chronopolis team. This research has always been one of the core pieces of the enterprise. In order to fund it, Chronopolis will be investigating and applying for its own grant funding from a variety of sources. This will provide a layered approach to funding in the future. Activities appropriate for grant funding would include further enhancement of the digital object monitoring capabilities, further enhancement of high-speed research networks and examination of new data management tools. Grants of this type would allow the production system to continue running on stable software with a stable workflow. At the same time, the Chronopolis team would be able to experiment and research without compromising current data that are under Chronopolis care.

Seek long-term institutional support from at least one of the Chronopolis home institutions

Finally, and perhaps most importantly, all of the above funding scenarios are short term; that is, they provide the funding needed to maintain Chronopolis for finite amounts of time. They do not address the central question of long-term sustainability. In order to address this issue, Chronopolis has sought the support of an institution which itself is committed to long-term existence. In the spring of 2011, Chronopolis achieved this level of funding when it was chosen as the preservation service for UCSD’s new Research Cyberinfrastructure (RCI) Data Curation service. This service is funded from central campus funds, because it is designated as a core service that the campus needs to supply in order to protect and enhance its intellectual capital. The RCI programme offers UC San Diego researchers computing, network and staffing resources to create, manage and share data. Campus researchers are encouraged to use the campus’s RCI in addressing federal sponsors’ existing and new data management requirements.

Funding: a layered approach

Given the funding sources outlined above, Chronopolis now has a layered approach to funding, which can be represented roughly as in Figure CS6.1. Note that while the sizes of the pyramid sections in the figure are indicative of scale, they are not themselves to scale.

Figure CS6.1 Chronopolis – layered approach to funding

This layered approach represents Chronopolis’s long-term economic strategy. It is important to note that the specific layers listed do not represent specific institutions, organisations or customers. Nor are they exclusive: it should be the case that no one institution provides the base funding; instead, a more sustainable operation requires a mix of institutions dedicated to maintaining digital preservation.

Lessons learned

The Chronopolis team has learned several lessons in the half-decade of development and service.

• Begin work on planning for sustainability as soon as possible. There are many practical concerns regarding sustainable funding that may not be immediately apparent. For example, for Chronopolis, the proof of concept and start-up phases were well within the mission of all of its partners. However, once Chronopolis began seeking paying data providers, it became more like a vendor, an entity that provides a service for a price. Is this service provider role included in any of the partners’ missions? Fortunately for Chronopolis, providing service at a recharge rate is in SDSC’s mission. This may not be true for many educational institutions.

• Begin work on service level agreements (SLAs) and memoranda of understanding (MOUs) early. Executing written partnership agreements, such as MOUs, among institutions can take significant amounts of time, especially when the legal departments of the institutions must get involved. SLAs, such as those that Chronopolis has with its data providers, raise questions of responsibility and payment. Which entity takes responsibility for the data? Which partner entity can receive recharge payments? Don’t wait until the concept has been proved to begin asking these questions.

In sum, the Chronopolis digital preservation system was founded with a strong belief that technology could be harnessed to provide long-term archiving solutions. This belief has been demonstrated to be true. The economic sustainability of the project, however, requires at least as much attention as the technical side, and it raises many more open questions, both now and for the future. Chronopolis has sought to answer these questions using a layered approach to funding sources.
