Borealis, the Canadian Dataverse Repository, is a digital research data repository available to researchers at participating Canadian universities and research organizations with infrastructure hosted by the University of Toronto Libraries (UTL) in Ontario, Canada. UTL commits to maintaining an information technology (IT) environment that appropriately protects the availability, privacy, confidentiality, and integrity of all content and personal information. The Borealis Technology Infrastructure and Security Information document outlines general information, technical infrastructure, application security, and storage and backup details.
Any member of the public is able to search, view, and download unrestricted data without a user account.
Users who wish to view and download restricted data must have an account.
Users who wish to create collections and/or datasets and upload data and metadata must have a user account affiliated with a participating institution.
Users without an institutional affiliation can create a standard account on the service. Users with an institutional affiliation can create a standard account, or create an institutional account if their institution has registered for the Research & Scholarship Entity Profile through the Canadian Access Federation (CAF), an identity management service for Canadian research institutions run by CANARIE. Institutional accounts use the Shibboleth single sign-on (SSO) login architecture.
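The access rules above can be sketched as a small decision function. This is an illustrative model only, not the Dataverse permission system: the `User` class and both function names are hypothetical, and the sketch treats an account as a necessary condition for restricted data without modelling any dataset-level permissions that may also apply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    """Hypothetical account record; an anonymous visitor is represented by None."""
    affiliated_with_participant: bool = False


def meets_download_requirement(user: Optional[User], restricted: bool) -> bool:
    # Unrestricted data is open to any member of the public.
    # Restricted data requires at least a user account (assumption: any
    # further per-dataset permission checks are outside this sketch).
    return (not restricted) or (user is not None)


def may_deposit(user: Optional[User]) -> bool:
    # Creating collections/datasets and uploading data requires an account
    # affiliated with a participating institution.
    return user is not None and user.affiliated_with_participant
```

For example, `meets_download_requirement(None, restricted=False)` is true for an anonymous visitor, while `may_deposit(User())` is false for an account without a participating-institution affiliation.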
UTL has agreements with regional university library consortia to provide Borealis as a service to over 50 participating Canadian post-secondary institutions.
Participating institutions are:
UTL will continue to provide all services outlined in each participating institution’s service level agreement up to and including six (6) months after the termination of the agreement.
The open-source web analytics platform Matomo has been installed to track and analyze traffic. A detailed, IP-restricted dashboard is available to Borealis staff and provides real-time and longitudinal information about traffic to and across the service provider’s websites. Matomo analytics data are stored on local servers and are not shared with any third party.
In addition to the dashboard, Matomo generates a monthly analytics report, which includes detailed summaries on visits to the main webpage, plus visits to the Metrics page, the Data Curation tool, and the Data Explorer tool.
In addition to the user analytics provided by Matomo, a metrics report can be viewed by any user. The metrics report includes the number of downloads per month, the number of datasets within the top 15 collections, the size of the top 15 collections, the distribution of file types uploaded, and the distribution of the subject categories used to describe datasets. These metrics can be viewed for the entire service or by institutional collection, and can be downloaded as a spreadsheet for further analysis.
For more information on how personal information is collected, stored, and deleted, please refer to the Privacy Statement.
All metadata associated with published collections and datasets are harvestable by other digital repositories and global search engines, as per the Terms of Use. The Dataverse platform supports open APIs and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which allows published collections and datasets to be harvested by other digital repositories and systems for the purpose of global data discovery.
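As a rough illustration of what an OAI-PMH harvest involves, the sketch below builds a `ListIdentifiers` request URL and extracts record identifiers from a response. The verb, the `metadataPrefix` parameter, the `oai_dc` prefix, and the XML namespace come from the OAI-PMH 2.0 specification; the base URL, the sample response, and the function names are placeholders assumed for this example, not details of the Borealis installation.

```python
from typing import List
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_identifiers_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListIdentifiers request for a given OAI-PMH endpoint."""
    query = urlencode({"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_identifiers(xml_text: str) -> List[str]:
    """Return the record identifiers found in a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Made-up response body, shaped like a ListIdentifiers reply (not real data).
SAMPLE_RESPONSE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListIdentifiers>"
    "<header><identifier>doi:10.0000/EXAMPLE1</identifier>"
    "<datestamp>2024-01-01T00:00:00Z</datestamp></header>"
    "<header><identifier>doi:10.0000/EXAMPLE2</identifier>"
    "<datestamp>2024-01-02T00:00:00Z</datestamp></header>"
    "</ListIdentifiers>"
    "</OAI-PMH>"
)

if __name__ == "__main__":
    # Placeholder endpoint; substitute the repository's actual OAI-PMH base URL.
    print(build_list_identifiers_url("https://repository.example.org/oai"))
    print(parse_identifiers(SAMPLE_RESPONSE))
```

A real harvester would also fetch the URL over HTTPS and follow the `resumptionToken` element to page through large result sets; both are omitted here to keep the sketch self-contained.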
Borealis metadata is regularly crawled and published by the Federated Research Data Repository (FRDR), Google Dataset Search, DataCite, Web of Science Data Citation Index, Mendeley Data, and the Harvard Dataverse.
Borealis also works with the Shared Access Research Ecosystem (SHARE) to integrate public datasets into open web discovery services. The SHARE notification system is a higher education-based initiative that strengthens the effort to identify, discover, and track research output.
The Dataverse software for Borealis is hosted by UTL using a suite of technologies including:
The Dataverse software is supported and developed by the Institute for Quantitative Social Science (IQSS) at Harvard University. A dedicated team supports the continuous development of the application, alongside support from community developers, experts in data curation and data preservation, and users.
New versions of the Dataverse software are released on an ongoing basis. Development is guided by a strategic roadmap that incorporates feedback from community members.
The current version of the Dataverse software for Borealis can be found in the bottom right-hand corner of every page on the platform. Specific information about the current version of Dataverse can be found on GitHub.
Borealis employs a variety of widely used community standards for metadata export:
Additional standards for application functionality and data access/deposit employed:
All Dataverse installations are guided by the instructions in the Securing your installation and Network ports sections of the Dataverse installation guide, among others dealing with the security of the application. These pages include documentation on securing Solr and API endpoints, forcing HTTPS, and using proxies, all to ensure the application is adequately secured from external threats. Borealis staff promptly act on Dataverse security advisory notices sent by IQSS.
The Borealis installation of the Dataverse software is located on servers at the University of Toronto. Data centres at the University of Toronto follow both the Policy on Information Technology and the Policy on Information Security and the Protection of Digital Assets. All digital assets at the University of Toronto are required to follow the Information Security Standard, which provides a set of baseline controls and minimum standards for information security at the University. These standards are endorsed by the University’s Information Security Council and are aligned with the National Institute of Standards and Technology (NIST) 800-171 for the protection of data. These standards also include an Incident Security Response Plan.
The Information Security and Enterprise Architecture department at the University of Toronto, as per the policy for digital assets, has also developed a procedure for reporting an information security incident or event and a set of guidelines for the U of T community to mitigate risks associated with information security. These guidelines include recommendations and requirements for the protection of data centres at the University of Toronto.
Borealis has both remote and local authentication methods enabled.
To communicate security-related issues regarding Borealis, notify the Service Provider.
The Ontario Library Research Cloud (OLRC) is a collaboration between Ontario’s university libraries, through the Ontario Council of University Libraries (OCUL), to build a high-capacity, geographically-distributed storage and computing network using proven and scalable open-source cloud technologies.
More information about OLRC hardware and software can be found in the OLRC User Guide. At any time, all data contained within Borealis is stored on at least three (3) of the five (5) OLRC nodes to ensure continuous access and efficient recovery of data after technical issues, natural disasters, or other damaging events.
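The "at least three of five nodes" guarantee can be expressed as a simple check. The sketch below is a toy model, not OLRC software: the short node labels are shorthand for the five host universities, and the function names are hypothetical.

```python
from typing import Iterable, Set

# Shorthand labels for the five OLRC host sites (labels are illustrative).
OLRC_NODES: Set[str] = {"York", "Guelph", "Queen's", "Ottawa", "Toronto"}
MIN_COPIES = 3  # minimum number of nodes that must hold a copy


def meets_replication_target(nodes_holding_copy: Iterable[str]) -> bool:
    """True when an object is stored on at least MIN_COPIES distinct OLRC nodes."""
    return len(set(nodes_holding_copy) & OLRC_NODES) >= MIN_COPIES


def surviving_copies(nodes_holding_copy: Iterable[str], failed_nodes: Iterable[str]) -> int:
    """Count the copies still reachable after the given nodes become unavailable."""
    return len((set(nodes_holding_copy) & OLRC_NODES) - set(failed_nodes))
```

Under this model, an object held on three nodes still has one reachable copy if two of those nodes are unavailable at the same time, which is the property that supports continuous access and recovery.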
Data stored in the OLRC can only be accessed via specific, designated IP addresses. Only systems administration staff have direct access to the Borealis data stored in the OLRC.
The OLRC uses ORION (the Ontario higher-education research network) and GTANet (the research, education, health, and public sector community network in the Greater Toronto Area) to connect the five (5) storage nodes via a virtual private network. Access to the OLRC is controlled by proxy servers located at the University of Toronto, via ORION. All proxy server connections use SSL, are authenticated, and are restricted to authorized IP addresses.
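Restricting connections to authorized IP addresses is commonly implemented as an allowlist check. The following is a minimal sketch using Python's standard `ipaddress` module; the address ranges are reserved documentation ranges chosen for illustration, and nothing here reflects the actual OLRC proxy configuration.

```python
import ipaddress

# Illustrative allowlist built from reserved documentation ranges (RFC 5737).
# These are NOT the real OLRC or Borealis addresses.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/28"),     # covers 192.0.2.0 to 192.0.2.15
    ipaddress.ip_network("198.51.100.7/32"),  # a single designated host
]


def is_authorized(client_ip: str) -> bool:
    """Return True when the client address falls inside an allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)
```

In practice a check like this would sit alongside the authentication and encrypted transport described above rather than replace them, since a source address alone is a weak credential.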
Ownership of data in the OLRC follows the same Terms of Use as Borealis. In other words, while data published via Borealis is stored on the OLRC, ownership remains with the user(s) who posted it as per the licensing terms they provided.
The OLRC has five (5) data storage nodes at York University, the University of Guelph, Queen’s University, the University of Ottawa, and the University of Toronto. Each of these universities has set up an OLRC node within one of its existing institutional data centres. Security associated with each data centre is based on best practices and IT policies created and enforced by the host institution. Each data centre also has redundant power and cooling systems to prevent data loss or damage due to power supply issues.
At a minimum, each data centre is accessible only to qualified and approved institutional IT support personnel, via a secure keycard. Each data centre has also implemented standard security measures, such as firewalls that limit inbound and outbound traffic to specific ports and to/from specific domains. All data stored on the OLRC, including all data from Borealis, is encrypted at rest.
The data stored on the OLRC is contained within a private VLAN that connects the configured nodes (i.e., the current five (5) institutions). The VLAN is operated by ORION, and only ORION can add or remove access to the private network, at the direction of system administrators.
System administrators keep all OLRC software and operating systems up to date and regularly refresh hardware. They receive regular alerts regarding security threats, and critical security patches are applied as soon as possible. All software updates are tested in a development environment before being deployed in production.
All data stored in Borealis is synced to a Network File System (NFS) disk nightly. From that location, all data is sent to the Tivoli Storage Manager (TSM) tape storage at the University of Toronto Data Centre. The TSM backup policy stipulates:
For information about data preservation strategies and activities, please see the Borealis Preservation Plan.