NIH requires that scientific data be deposited in a data repository and strongly encourages the use of established repositories that have certain core characteristics that help make data findable, accessible, interoperable, and reusable (FAIR).
In its "Selecting a Data Repository" and "Data Sharing Approaches" guidance, NIH outlines how researchers should choose a repository in the following general scenarios.
NIH Funding Opportunity Announcements (FOAs) and/or the policies of NIH Institutes, Centers, and Offices (ICOs) may specify a particular repository for preserving and sharing data (e.g. "Specifically, for human data, the NICHD encourages the use of the Data and Specimen Hub (DASH), a centralized resource for researchers to store and access de-identified data from studies funded by NICHD...").
In these cases, the specified repositories should be used for depositing project data.
In other cases, neither the FOA nor the applicable ICO policies specify a particular repository for preserving and sharing data. Data sharing language in these FOAs may simply refer researchers to the NIH Data Management and Sharing Policy (e.g. "Consistent with the NIH Policy for Data Management and Sharing, when data management and sharing is applicable to the award, recipients will be required to adhere to the Data Management and Sharing requirements as outlined in the NIH Grants Policy Statement.").
In these cases, researchers must choose among several possible repositories. The guidelines and tools below can support the selection process. You may also contact JHU Data Services for help.
If possible, researchers should select an established repository that houses data similar to their own in discipline, data type, or both. Choosing a repository on this basis supports effective data discovery and reuse.
For example, a researcher might select the Gene Expression Omnibus as the repository for sharing mouse gene expression data. This repository is well-established, contains 1.5 million+ mouse gene expression studies, and is experienced with housing and describing gene expression files.
If no discipline-specific or data-type-specific repository exists for the kinds of scientific data that your project will generate, then you will need to select a generalist repository. Generalist repositories house data regardless of type, format, content, or subject matter.
The NIH Generalist Repository Ecosystem Initiative (GREI) has a webinar series that introduces generalist repositories, describes best practices for sharing data through them, and may help researchers choose between repositories.
Note that most generalist repositories do not provide controlled access, so data deposited in them must be fully de-identified before it is made publicly accessible.
The following list comes from NIH's Generalist Repositories page. Some key features of the repositories are described below.
A more in-depth summary of repository features is available in the Generalist Repository Comparison Chart produced by the NIH Workshop on the Role of Generalist Repositories to Enhance Data Discoverability and Reuse, held 11-12 February 2020.
Discipline-specific and generalist data repositories may offer controlled- or restricted-access provisions suitable for protecting human participant data. Such repositories have both the security measures and the administrative processes in place to deliver data only to approved requesters. This contrasts with "Open Access" repositories, which can be accessed publicly, often with no user registration.
In controlled-access scenarios, researchers must apply for data access. Access is generally governed by a data use agreement (DUA) and additional licensing requirements, such as approval for any secondary reuse. For sensitive datasets, such as those containing extensive protected health information (PHI) or direct personally identifying information (PII), some repositories ask requesters to provide IRB approval from their institutions. Requesters may download data directly or, for more sensitive data, access it remotely through a secure data enclave.
Many controlled-access data repositories still require depositors to de-identify data by removing enough PHI/PII to meet, at a minimum, the HIPAA Safe Harbor de-identification requirements.
This non-exhaustive list of controlled-access repositories is derived from keyword searches of the NIH-Supported Scientific Data Repositories tool and the lists compiled by the journal Scientific Data.
You may consult the data submission guidelines of other repositories to determine whether they have controlled-access provisions, or contact JHU Data Services for assistance in locating an appropriate controlled-access repository.
Established data repositories may also offer restricted access management features designed to safeguard human participant data, including de-identified human data. These features may include:
NIH's "Selecting a Data Repository" guidance provides additional important considerations when choosing a data repository.