Borealis, the Canadian Dataverse Repository, is a digital research data repository available to researchers at participating Canadian universities and research organizations, with infrastructure hosted by the University of Toronto Libraries (UTL) in Ontario, Canada. The Borealis Preservation Plan outlines the objectives, roles and responsibilities, strategies, and actions for preserving the digital files uploaded by users and stored in the repository. Borealis uses the open-source Dataverse software, developed and maintained by the Institute for Quantitative Social Science (IQSS) at Harvard University and community members from around the world. The Preservation Plan complements the strategies, policies and procedures for curation and preservation that Participating Institutions may also have in place for datasets in institutional collections within Borealis.
Archivematica: an open source, standards-based processing tool for creating well-formed packages for preservation storage. Archivematica performs signature-based file format identification, validation and characterization functions; can normalize copies of files to preservation and access formats; and creates preservation metadata files using the METS and PREMIS standards.
BagIt: a set of formatting conventions that guide creating checksums for, and verifying the fixity of, collections of files. Files contained in a BagIt-formatted directory (commonly called a “bag”) include a manifest of checksums that can be used to ensure that the contents of the directory have retained fixity after transfer or in storage.
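As an illustration of how a bag's manifest supports fixity verification, the following is a minimal sketch in Python using only the standard library. It assumes a bag whose payload manifest is named `manifest-sha256.txt` and whose lines have the form `<checksum> <path>`; the function name `verify_bag_manifest` is hypothetical and is not part of any BagIt tool. A complete validation would also check the `bagit.txt` declaration, tag manifests, and percent-encoded path characters, which this sketch does not do.

```python
import hashlib
from pathlib import Path


def verify_bag_manifest(bag_dir, algorithm="sha256"):
    """Return the payload paths whose checksum is missing or mismatched.

    Hypothetical helper: reads manifest-<algorithm>.txt in bag_dir and
    recomputes the checksum of every file the manifest lists.
    """
    bag_dir = Path(bag_dir)
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Each manifest line is "<checksum> <relative path>".
        expected, relative_path = line.split(maxsplit=1)
        target = bag_dir / relative_path
        if not target.is_file():
            failures.append(relative_path)  # listed file is missing
            continue
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual != expected.lower():
            failures.append(relative_path)  # contents have changed
    return failures
```

An empty list indicates that every file listed in the manifest is present and matches its recorded checksum; any returned path identifies a file that should be restored from a good copy.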
Bit-level preservation: one type of digital preservation strategy. This strategy is focused on ensuring that files retain fixity in storage and that files are stored in multiple locations to protect against accidental loss or corruption. Bit-level preservation does not guarantee any form of future usability/accessibility based on the contents or format of the files in question.
Checksum: a fixed-length numeric or alphanumeric string produced by running a checksum-generating algorithm against a file; for practical purposes, it is unique to the file's contents. When the contents of the file are altered in any way, the checksum value changes, indicating that the file no longer has fixity and should therefore be replaced from a good copy. Checksum algorithms include MD5, SHA-1 and SHA-256.
Dataset: a container for a group of related files. For example, a dataset can include the original source data, code, and/or documentation related to a single study or publication. A dataset must also include metadata added by the user to describe the files, including a title, author(s), description and subject.
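The behaviour described in this definition can be sketched in Python with the standard `hashlib` module. The helper name `file_checksum`, the sample file name, and the chunk size are illustrative assumptions, not part of Borealis or Dataverse.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks.

    Hypothetical helper; algorithm may be "md5", "sha1" or "sha256".
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.csv"  # illustrative file name
        sample.write_bytes(b"id,value\n1,42\n")
        before = file_checksum(sample)
        sample.write_bytes(b"id,value\n1,43\n")  # alter a single character
        after = file_checksum(sample)
        print("checksum changed:", before != after)
```

Reading the file in chunks keeps memory use constant, which matters for large research data files.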
Dataverse: the open-source research data repository software application with which the Borealis repository is hosted and operated. Dataverse is developed by the Institute for Quantitative Social Science (IQSS) at Harvard University.
Digital preservation: “the series of managed activities necessary to ensure continued access to digital materials for as long as necessary” (DPC Glossary). Digital preservation activities can include active and ongoing monitoring of files and formats, regular fixity checks, and refreshing of storage media.
Fixity: the property of a digital file having remained unaltered. Fixity is established by computing a checksum and is verified later by recomputing the checksum and comparing it against the recorded value. Fixity information can help establish the integrity of files via evidence that files have remained physically unchanged over time.
Ontario Library Research Cloud (OLRC): a five-node community cloud storage network maintained by Scholars Portal, which Borealis uses as part of its operations. The OLRC uses the OpenStack Swift software to connect five storage nodes located at the University of Toronto, the University of Guelph, the University of Ottawa, York University, and Queen’s University. All data stored in the OLRC is replicated across three of the five nodes for reliability and integrity. If one of these copies becomes unreadable, the system creates a new copy from the two remaining good copies. The OLRC service also includes access to the DuraCloud software for advanced preservation management of packages stored in the OLRC. Additional information about the security of the OLRC is contained in Technology Infrastructure and Security Information.
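The principle of restoring a damaged copy from the remaining good copies can be illustrated with a short Python sketch. This is a simplified model under stated assumptions, not the mechanism OpenStack Swift or the OLRC actually uses: replicas are held in memory as bytes, node names are hypothetical, and the good copy is chosen by majority agreement of SHA-256 checksums.

```python
import hashlib
from collections import Counter


def repair_replicas(replicas):
    """Overwrite replicas that disagree with the majority checksum.

    `replicas` maps a hypothetical node name to the bytes stored there.
    Returns the list of node names that were repaired.
    """
    digests = {
        node: hashlib.sha256(data).hexdigest()
        for node, data in replicas.items()
    }
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        # Without a strict majority there is no trustworthy copy to use.
        raise ValueError("no majority among replicas; cannot pick a good copy")
    good_node = next(n for n, d in digests.items() if d == majority_digest)
    repaired = []
    for node, digest in digests.items():
        if digest != majority_digest:
            replicas[node] = replicas[good_node]  # restore from a good copy
            repaired.append(node)
    return repaired
```

For example, with three replicas of which one has been corrupted, the function returns the name of the corrupted node and leaves all three replicas identical.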
Permafrost: a hosted digital preservation service offered by Scholars Portal to members of the Ontario Council of University Libraries (OCUL). Permafrost pairs Archivematica with the OLRC to provide access to a technical infrastructure, support, and training to enable OCUL members to actively process digital objects for long-term preservation and access.
The objectives of UTL’s preservation activities for the Borealis repository are as follows:
Participating Institutions: responsible for administering the use of Borealis at their institution. Institutions subscribe to Borealis via consortial agreements and are allocated storage space and administrative rights for staff to manage their institutional collection within the Borealis repository. Institutions are responsible for oversight of the data uploaded to their institutional collection by setting collections policies and deposit guidelines, administering users and user rights, and handling takedown and copyright decisions. Institutions may also validate data deposits for quality and completeness via curation activities, including determining preferred file formats for deposit or supporting depositors with advice on file format conversions. Institutions for which Borealis collection deposits comprise a part of their institutional collections may also design and implement additional preservation policies and procedures for their collection or for selected sub-collections or datasets within that collection.
University of Toronto Libraries: responsible for the technical maintenance and administration of the Borealis repository software and service. UTL ensures the Dataverse application is functional, secure, and updated. UTL also maintains the connected storage infrastructure for datasets, liaises with designated contacts at Participating Institutions, and makes available guides and training to Participating Institutions. UTL maintains no oversight over the quality, completeness, or format of files uploaded by users but will assist in identifying and remediating fixity issues in collaboration with Participating Institutions as they arise.
Description: The first level of preservation combines two broad sets of activities: bit-level preservation via regular independent fixity checking and safe storage in the OLRC, and maintaining the preservation-supporting features that are part of the Dataverse application. As technical service provider, UTL is not directly responsible for validating the contents or quality of user-uploaded files. This level of preservation addresses Objectives 1, 2 and 3: that user-uploaded files are safe from loss and that minimum level preservation functions are run as a necessary precursor to additional preservation strategies.
Scope: All data files deposited by registered users to Borealis. This includes files associated with draft and restricted datasets and different versions of files uploaded by users. It does not include files generated by the Dataverse application itself, such as derivatives, thumbnails and citation metadata files.
Term: UTL will maintain Level 1 preservation activities for as long as an institution is a subscriber to the Borealis service. As designated in the Access and Service agreement signed between the University of Toronto and Participating Institutions, UTL commits to maintaining data deposits for 6 months after termination of the agreement. However, UTL will support any processes for dataset export as required by subscribers.
Description: This level of preservation is intended for Participating Institutions who require advanced preservation processing and/or the export of independent preservation packages for inclusion in institutional collections and storage in additional preservation environments. Advanced preservation functions may be conducted when Borealis is paired with the Archivematica workflow tool for preservation processing. Archivematica can create independent preservation packages of datasets in any Dataverse repository, and its workflow includes additional functions such as signature-based file format identification, file format validation, characterization and normalization. Independent packages created by Archivematica would then be sent to a preservation storage location of choice. Alternatively, Institutions may opt to create and accept exports of packages from any Dataverse application in BagIt format. Additional information on these features and functionality is described below.
Scope: Participating Institutions are responsible for determining which datasets are eligible for additional processing and export. Administrators, curators or other designates at Participating Institutions may select the complete contents of their institutional collections or a subset as guided by internal appraisal and selection criteria.
Thank you to the former Dataverse North Policy Working Group for creating an initial policy framework for Borealis which informed the structure and approach for this document. The Alliance RDM Preservation Expert Group’s report Preservation for Dataverse in Canada: Recommendations provided key requirements for the preservation strategies outlined above. Additional sources of inspiration were the Texas Digital Library Digital Preservation Policy and the Harvard Dataverse Preservation Policy.