Case Study 10

EZID: a digital library data management service

Joan Starr

Abstract.

To pursue a cost recovery approach for its EZID service, the California Digital Library (CDL) developed a pricing plan based on an annual subscription fee-for-service. This case study describes the four-part process: service definition, cost identification, market study and creation of the pricing model.

Keywords

California Digital Library

digital content

EZID

persistent identifiers

pricing plan

sustainability

Introducing EZID

EZID (‘easy-eye-dee’) is a service offered by the California Digital Library¹ (CDL) to simplify the process of obtaining and managing long-term (or ‘persistent’) identifiers for digital content. An identifier is an alphanumeric combination assigned to an object, and if the assignment is managed and the object is made available over time, the identifier becomes a highly reliable way of keeping track of the object. EZID makes creating and managing identifiers easy. EZID has both a user interface² (UI) and an application programming interface³ (API), which means that it can fit seamlessly into an automated data management workflow. CDL introduced EZID to the University of California (UC) campuses in the fall of 2010 and opened the service to outside organisations a few months later.

EZID supports two globally unique, persistent identifier schemes: Digital Object Identifiers (DOIs) and Archival Resource Keys (ARKs), and will add other schemes over time, including academic domain-specific schemes such as Life Science Resource Names (LSRNs).⁴ DOIs are identifiers originating from the publishing world and are in widespread use for journal articles. CDL is able to offer DOIs by virtue of being a founding member of DataCite,⁵ an international consortium established to provide easier access to scientific research data on the Internet. ARKs are identifiers originating from the library, archive and museum community. ARKs have certain features distinct from DOIs that make them an attractive option for unpublished materials as well as datasets that researchers want to cite at an extremely granular level.⁶ Researchers may want to take advantage of both identifier schemes at different points in the lifecycle of the dataset, and EZID makes this possible.
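The ‘suffix pass-through’ behaviour described in note 6 can be illustrated with a short sketch. The Python below is not the code of EZID or of the ARK resolver; it is a minimal illustration, assuming a simple in-memory registry and a hypothetical `resolve` function, and the identifier and location shown are invented for the example. It walks back through an identifier one path segment at a time until it finds a registered prefix, then appends the unmatched remainder to the registered location.

```python
from typing import Dict, Optional


def resolve(identifier: str, registry: Dict[str, str]) -> Optional[str]:
    """Map an identifier to a location, passing any unregistered suffix through.

    Illustrative only: `registry` stands in for whatever store a real
    resolver uses, and the matching rule is an assumption for this sketch.
    """
    candidate = identifier
    while True:
        if candidate in registry:
            # Everything after the registered prefix is carried over unchanged.
            suffix = identifier[len(candidate):]
            return registry[candidate] + suffix
        if "/" not in candidate:
            return None  # no registered prefix was found
        # Drop the last path segment and try the shorter prefix.
        candidate = candidate.rsplit("/", 1)[0]


# Hypothetical registration: object A is mapped to location L.
registry = {"ark:/99999/A": "http://example.org/L"}

print(resolve("ark:/99999/A", registry))      # the registered object itself
print(resolve("ark:/99999/A/B/C", registry))  # a component that was never registered
```

Only the top-level object needs an entry in the registry; the 10,000 components in the example from note 6 would resolve without being registered individually.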

The CDL and DataCite missions

As suggested above, DataCite’s purpose is to facilitate dataset citation. By promoting the use of DOIs for datasets, DataCite hopes to extend to datasets the kind of recognition and exposure that scholarly journal articles have achieved. There are direct benefits to researchers from this exposure. They can look for increased citations⁷ and a positive impact on tenure and promotion processes. Moreover, there is evidence showing that data sharing enables new discoveries⁸ and greatly increases the number of publications tied to research data.⁹

Beyond its participation in the DataCite mission, CDL has, with EZID, a separate and specific calling to serve the UC libraries, which are facing two new pressures in today’s climate. First, the concept of the scholarly record is expanding to include datasets, which means that librarians are required to gain new skills and engage with clients at a new level. Second, US funding agencies and major philanthropic organisations are issuing mandates for data management plans,¹⁰ so university research offices look for a campus entity to meet this need. It makes sense for libraries to fill this role because it is an extension of their historic charge as stewards of institutional assets. Data management tools such as EZID are part of a portfolio of services that research libraries can now provide to their patrons.

Development of the EZID pricing plan

To sustain a data management service portfolio at CDL, the organisation has pursued a cost recovery strategy. For EZID, this has taken the form of an annual subscription fee-for-service. Development of the fee model took several months and involved four basic steps: definition of the service offering, identification of costs, study of the market and, finally, creation of the pricing model itself.

Service definition was the first task, because it meant looking at the system as a product in a market or markets and not simply as a library service. The desired outcome was an explicit statement of EZID’s features and benefits, with special focus on any unique offerings. The reader has seen a version of EZID’s service definition at the beginning of this study. A brief recapitulation aimed at the researcher client is as follows:

EZID is a service that makes it simple for researchers and others to obtain and manage long-term identifiers (both DOIs and ARKs) for their digital content. With EZID, you can assign identifiers to anything: scientific datasets, technical reports, audio files, and digital photographs, for example. EZID helps you to take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation. EZID is available to individuals, groups and institutions, and can be a valuable tool for data management throughout the research life cycle.

The next step, capturing cost information, began with a scoping statement. It was decided that the EZID revenue model should cover operational expenses, as distinct from development costs.¹¹ This meant collecting estimates of salaries and benefits, supplies and expenses, and technological infrastructure costs at the expected levels for full operation. The category of ‘supplies and expenses’ included project supplies, telephone charges, general liability, employee practices liability and DataCite membership.

The last step prior to working on the pricing itself was to understand the market for the service or product. There were two aspects to this: identifying the target clientele and understanding the alternatives they may have to meet their needs. The primary EZID clients were (and are) the UC campus libraries and the researchers they serve. CDL has multi-channelled communications with the campuses, from representation on the Council of University Librarians¹² and numerous committees to status updates via newsletters and listservs. In connection with data management issues and services including EZID, CDL maintains a network of identified data management liaison contacts at each campus for information exchange and service support.

In addition to the UC clients, fellow partners in DataONE,¹³ an international network supported by the US National Science Foundation, were also considered for early adoption, and one such partner, Dryad,¹⁴ became the earliest user. There was additional interest in the possibility of offering the service even more broadly over time. Not only does CDL have other network-level relationships as a collaboration-based organisation, but also a broad client base would increase the likelihood of recovering costs.

The potential library clients do have an alternative source for DOIs: CrossRef,¹⁵ a publisher-run organisation which has been offering DOI services since 2000, mostly for scholarly publications. In the last eleven years, CrossRef has built a large client base from among its member publishers and can afford to offer inexpensive accounts to libraries. Originally, CrossRef did not issue DOIs for datasets but has recently moved into this market. Clearly, EZID could not compete with CrossRef on price, so the response would have to be feature-based, in that EZID offers more than DOIs. The inclusion of ARKs and other identifier schemes introduces an opportunity to support the full lifecycle of research.

Libraries and other cultural organisations also have an alternative for ARKs, although it is neither an off-the-shelf nor a hosted option. Any cultural memory organisation can become a registered ARK assignment authority¹⁶ and then establish and operate a server¹⁷ to generate and support the ARKs themselves. This is a group of tasks that requires a certain level of technical resource capacity. The EZID response to this option would be to emphasise the technical complexity and responsibility of managing a full identifier service.

The conclusions drawn from the market analysis were multiple:

• The primary client targets are the UC campus libraries and the researchers they serve.

• A secondary target group is comprised of the partners in networks of which CDL is a member.

• There is value in offering EZID more broadly. Because EZID accounts are available to non-profit organisations, for-profit commercial businesses and governmental agencies, there are clearly potential clients that do not have viable alternatives for creating and maintaining persistent identifiers.

When all the pieces were in place for the development of the pricing model itself, it was time to consider if the expertise was available in-house. The team concluded that outside assistance would be very helpful and contacted the UC Office of the President, Chief Financial Officer’s Division. Lisa Baird, Associate Director, Strategic Initiatives, took the service and market definitions and the cost information and created a working pro forma.¹⁸ It showed five prospective years of cash flow with a particular, yet configurable, pricing scheme. Various other inputs, including staffing and expense costs, were also configurable. This made for an extremely flexible tool that could be used as a scenario planner and price model tester.
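The kind of scenario planning described above can be sketched in a few lines of Python. The example is not the pro forma that was actually built; it is a minimal sketch assuming a flat annual subscription fee and fixed yearly costs, using the three cost categories named earlier in this study. The class and function names and every figure are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Scenario:
    """Configurable inputs for one pricing scenario (all values hypothetical)."""
    subscribers_per_year: List[int]  # expected paying accounts in each year
    annual_fee: float                # flat subscription fee per account
    salaries_and_benefits: float     # yearly staffing cost
    supplies_and_expenses: float     # yearly supplies, liability, memberships
    infrastructure: float            # yearly technology infrastructure cost


def pro_forma(s: Scenario) -> List[Dict[str, float]]:
    """Return one row per year with revenue, costs, net and cumulative cash flow."""
    yearly_costs = s.salaries_and_benefits + s.supplies_and_expenses + s.infrastructure
    rows = []
    cumulative = 0.0
    for year, accounts in enumerate(s.subscribers_per_year, start=1):
        revenue = accounts * s.annual_fee
        net = revenue - yearly_costs
        cumulative += net
        rows.append({"year": year, "revenue": revenue, "costs": yearly_costs,
                     "net": net, "cumulative": cumulative})
    return rows


# Invented five-year scenario; none of these numbers come from the EZID plan.
scenario = Scenario(
    subscribers_per_year=[10, 20, 30, 40, 50],
    annual_fee=1000.0,
    salaries_and_benefits=20000.0,
    supplies_and_expenses=3000.0,
    infrastructure=2000.0,
)

for row in pro_forma(scenario):
    print(row)
```

Changing the fee or the subscriber forecast and re-running the function shows in which year, if any, cumulative cash flow turns positive under that scenario.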

During the following three months, two changes to the environment occurred which caused the price model tester to be used repeatedly. First, the original pricing plan had been to charge on a per-identifier basis, consistent with the fees imposed by the International DOI Federation (IDF)¹⁹ and following CrossRef’s model. There would be volume-based bands with price breaks at certain levels. When the IDF announced, in the late fall of 2010, an intended policy change rejecting per-identifier fees, CDL management felt that it would be inappropriate to base charges on a non-existent price pressure. So, an entirely new approach had to be imagined and then tested using the pro forma mechanism. The new model was keyed to organisation size and level of research, based on the notion that greater effort is required to support more active research endeavours. The plan also built in a consortial discount arrangement. Fellow DataCite partner Purdue University²⁰ and CDL worked out parallel pricing categories using a very simplified version of the Carnegie Classifications for educational institutions.²¹

The second change was the addition to DataCite of a third member in the US. As noted, the original founding members included two US institutions: CDL and Purdue University Libraries. In 2011, the Office of Scientific and Technical Information (OSTI),²² US Department of Energy, also joined. It was clear that a mutual understanding was needed for operational concerns. Some of the joint agreements have already been described. In addition, a relatively common challenge arises when a prospective client does not make an initial contact directly to one of the three partners, but instead contacts the DataCite central organisation first. It was decided that the following factors would help determine which partner takes the initiative: geographic location, existence of prior relationship with the prospective client and client type. (CDL is the only DataCite partner that works with commercial clients.)

Early experiences

Prior to gaining approval of the EZID business proposal, CDL invited organisations to try EZID free of charge, with the understanding that a payment plan would be imposed at some point in the future. UC researchers, UC campus libraries, several non-profit entities, as well as a few government research teams and one commercial venture took advantage of this opportunity. When the approval came for EZID to implement the fee-for-service model, about half of the groups and organisations that had done free trials moved into paying accounts within the first four months. CDL’s relationships with some of the others evolved. For example, in connection with the individual UC researchers, the emphasis has gone from individual accounts into working with the campus libraries as EZID service providers. A number of UC campus libraries have now purchased university-wide subscriptions so they can provide identifier services to their entire research community.

To date, EZID has acquired attention and new clients beyond the UC campuses without actual marketing, although effort has gone into outreach in several other forms, including webinars, conference speaking engagements, social networking and word of mouth. The team also gets referrals from the DataCite main office in Hanover, Germany, because the DataCite website attracts responses. These may be directed to CDL based on the factors mentioned above. A marketing campaign of some kind is a highly desirable next step, pending resource availability.

Lessons learned

EZID’s path toward cost recovery was motivated by the exigencies of budgetary constraints and it may be that these constraints were not only tangible but also attitudinal. With repeated messages related to budgetary reductions being given at the organisation level, it was relatively difficult for individuals to think completely freely about how to implement the business plan. Thus, when selecting a methodology for handling the financial exchanges between CDL and its clients, options that might have represented real expenditures were not explored. This led to a reliance on the existing library licensing apparatus and the in-house billing system. These have without doubt functioned adequately, and they work well for CDL’s services that involve asset or content storage. They may be more complex than necessary, however, for a more ‘lightweight’ product like EZID. Adopting the readily available method was easy and fast, and from that perspective quite beneficial to the start-up needs of the project. Still, there are times when the idea of potential clients clicking on an online licensing agreement and then paying with PayPal sounds very appealing.

Six months into the implementation of the pricing model, it is clear that the carefully crafted subscription categories are a better fit for some client segments than others. Medium and large educational institutions and governmental agencies seem to find the sizing to be appropriate. Small schools and small non-profit organisations are more difficult to serve with the EZID cost model as it currently stands. These entities may be in a difficult position, because they often do not have adequate resources to run their own identifier service. Perhaps at some point in the future, if the cost recovery strategy has been highly successful, it will be possible to revisit the rate structure in such a way as to better accommodate these smaller institutions. A further observation is that there really is no single ‘academic community’, but rather an array of academic communities with different constraints and opportunities. Some of these communities are geographic and some are academic domain specific. This suggests that some flexibility and an attitude embracing continuous learning are important assets for libraries and librarians implementing a cost recovery solution.

Looking ahead

The EZID team looks ahead to future developments that fall into three categories: feature extensions, infrastructure improvements and service delivery. As a user-facing service, EZID has a development roadmap for enhancements aimed at improving the services offered to clients. The general themes underway at the time of this writing include a redesigned user interface with support for activity reporting and browsing, identifier services that increase support for the research lifecycle such as tombstone pages,²³ and the addition of other identifier schemes. Provision of support for domain-specific schemes will allow researchers to work with the identifier scheme(s) they prefer for the majority of the research cycle, only shifting to a DOI upon submission for publication, for example.

Behind the scenes, the EZID infrastructure is being strengthened. As the service matures, clients expect to see a professional-level business continuity plan in place. Therefore a primary goal at present is to establish a robust network for replication of the services underpinning the ARK service as well as the EZID management layer. The DOI service is already built upon an internationally distributed infrastructure.

Lastly, an important direction for the financial sustainability of EZID is to extend the service delivery model. Software systems providers can generate revenue in more than one way. The hosted solution model may be the primary revenue channel for EZID, but it was always the intention of the team to offer a second channel loosely based on the Red Hat²⁴ idea. In other words, for an additional annual charge, this ‘super’ client would get a branded version of EZID and the ability to sign up clients of their own (along with related administrative abilities). Certain arrangements and requirements have to be met by such a client, to be sure, but this mechanism represents an opportunity for strengthening existing partnerships and creating new bonds where none existed before.

Running a library service while recovering its costs can be a daunting enterprise. But being budget driven can provide some real advantages. EZID’s economic imperative has been a strong teacher about marketing, teaching the team to talk about library services in new ways. Equally, it has given priority to the formal improvements in procedures, documentation and infrastructure that are necessary for business continuity under any circumstances. It has also opened doors to new kinds of partnerships with new kinds of partners. EZID is in the process of tracking costs and income so that, over the next couple of years, it can determine whether or not the service can indeed pay for itself, but, meanwhile, the process of trying has already generated returns.


1. The CDL was founded by the University of California in 1997. It is part of the Office of the President and serves all ten campuses of the university. For more information, see: http://www.cdlib.org/.

2. http://n2t.net/ezid

3. An application programming interface allows two software programs to communicate with one another. For EZID’s API documentation, see: http://n2t.net/ezid/doc/apidoc.html.

4. For more information about LSRNs, see: http://lsrn.org/.

5. For more information about DataCite, see: http://datacite.org/.

6. ARKs can be deleted, unlike DOIs. In addition, the ARK server has a feature called ‘suffix pass-through’. Assume that a large object, A, has 10,000 small components arranged in a hierarchy, as in A/B/C. It is possible to get an ARK for the large object, registering its location with EZID, and have the rest of the hierarchy pass through. For example, if A is mapped to the location L, then component A/B/C will automatically be mapped to L/B/C. The DOI resolver does not do this.

7. H.A. Piwowar, R.S. Day and D.B. Fridsma (2007) ‘Sharing detailed research data is associated with increased citation rate’, PLOS ONE, 2 (3): e308. Online at: http://www.plosone.org/article/info:doi/10.1371/journal.pone.0000308.

8. G. Kolata (2010) ‘Sharing of data leads to progress on Alzheimer’s’, New York Times, 13 August.

9. A.M. Pienta, G.C. Alter and J.A. Lyle (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. Paper presented at ‘The Organisation, Economics and Policy of Scientific Research’ workshop, Torino, Italy, April. Online at: http://hdl.handle.net/2027.42/78307.

10. For example, see NSF (2010) ‘Scientists seeking NSF funding will soon be required to submit data management plans’. Online at: http://www.nsf.gov/news/news_summ.jsp?cntn_id=116928.

11. Development costs have been partially defrayed by generous support from the Gordon and Betty Moore Foundation (see: http://www.moore.org/).

12. The Council of University Librarians is composed of the University Librarians from each of the ten campuses and the Executive Director of CDL. For more information, see: http://libraries.universityofcalifornia.edu/about-uc-libraries.

13. DataONE’s aim is ‘to ensure the preservation and access to multi-scale, multi-discipline, and multi-national science data’ (see: https://www.dataone.org/).

14. Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences (see: http://www.datadryad.org).

15. For information about CrossRef, see: http://www.crossref.org/.

16. This process is called getting a Name Assignment Authority Number, or NAAN. For more information about NAANs, see the CDL Curation wiki entry at: https://wiki.ucop.edu/display/Curation/NAANs.

17. The ARK server is known as NOID, which stands for Nice Opaque IDentifier minter and resolver. For more information, see https://wiki.ucop.edu/display/Curation/NOID.

18. ‘Pro forma describes a presentation of data, typically financial statements, where the data reflect the world on an "as if" basis.’ See: http://economics.about.com/od/economicsglossary/g/proforma.htm.

19. For information about the IDF, see: http://www.doi.org/.

20. For more information about the Purdue University Libraries, see: http://www.lib.purdue.edu/.

21. For a description of the Carnegie Classifications, see: http://classifications.carnegiefoundation.org/descriptions/basic.php. The CDL size breakdown for commercial clients is based on the Bloomsbury Business Library – Business & Management Dictionary categories for commercial firms.

22. For more information about OSTI, see: http://www.osti.gov/. At the time of writing, it is understood that OSTI will be providing services to certain US government agencies.

23. A tombstone page is a web page returned for a resource no longer found at its target location of record. The tombstone may provide ‘last known’ metadata, including the original owner.

24. For a discussion of the merits of this approach, see: N. Munga, T. Fogwill and Q. Williams (2009) ‘The adoption of open source software in business models: a Red Hat and IBM case study’, SAICSIT Conference 2009, pp. 112–21. Online at: http://www.informatik.uni-trier.de/~ley/db/conf/saicsit/saicsit2009.html#MungaFW09.
