
NIH Data Management and Sharing Policy

This guide supports researchers' efforts to comply with the NIH Data Management and Sharing Policy. It offers step-by-step guidance on how to plan for data sharing and write a data management and sharing plan to meet NIH requirements.

NIH Guidance on Data Repositories

NIH requires that scientific data be deposited in a data repository and strongly encourages the use of established repositories with core characteristics that help make data findable, accessible, interoperable, and reusable (FAIR).

In its "Selecting a Data Repository" and "Data Sharing Approaches" guidance, NIH outlines how researchers should choose a repository in the following general scenarios.

The sections below provide guidance, tools, and repository lists for each scenario.

When NIH Specifies a Repository

NIH Guidance

NIH Funding Opportunity Announcements (FOAs) and/or the policies of NIH Institutes, Centers, and Offices (ICOs) may specify a particular repository for preserving and sharing data (e.g. "Specifically, for human data, the NICHD encourages the use of the Data and Specimen Hub (DASH), a centralized resource for researchers to store and access de-identified data from studies funded by NICHD...").

In these cases, researchers should deposit project data in the specified repository.

Tools for Connecting to Information About NIH-Specified Repositories

When You Must Choose a Specialized Repository

NIH Guidance

Sometimes NIH Funding Opportunity Announcements (FOAs) and/or the policies of NIH Institutes, Centers, and Offices (ICOs) don't specify a particular repository for preserving and sharing data. Data sharing language in these FOAs may simply refer researchers generally to the NIH Data Management and Sharing Policy (e.g. "Consistent with the NIH Policy for Data Management and Sharing, when data management and sharing is applicable to the award, recipients will be required to adhere to the Data Management and Sharing requirements as outlined in the NIH Grants Policy Statement.") 

In these cases, researchers must select a repository from among a number of possible options. The guidelines and tools below can support the selection process. You may also contact JHU Data Services for help.

Selecting a Discipline or Data-Type Specific Repository

If possible, researchers should select an established repository that houses data similar to their own with regard to discipline, data type, or both. Using this criterion supports effective data discovery and reuse.

For example, a researcher might select the Gene Expression Omnibus as the repository for sharing mouse gene expression data. This repository is well established, contains more than 1.5 million mouse gene expression studies, and has long experience housing and describing gene expression files.

Tools for Finding Discipline or Data-Type Specific Repositories
Tools Designed or Recommended by NIH
Other Tools

When You Must Choose a Generalist Repository

NIH Guidance

Sometimes NIH Funding Opportunity Announcements (FOAs) and/or the policies of NIH Institutes, Centers, and Offices (ICOs) don't specify a particular repository for preserving and sharing data. Data sharing language in these FOAs may simply refer researchers generally to the NIH Data Management and Sharing Policy (e.g. "Consistent with the NIH Policy for Data Management and Sharing, when data management and sharing is applicable to the award, recipients will be required to adhere to the Data Management and Sharing requirements as outlined in the NIH Grants Policy Statement.") 

In these cases, researchers must select a repository from among a number of possible options.

The guidelines and tools below can support the selection process. You may also contact JHU Data Services for help.

Selecting a Generalist Repository

If no discipline- or data-type-specific repository exists for the kinds of scientific data your project will generate, you will need to select a generalist repository. Generalist repositories house data regardless of type, format, content, or subject matter.

The NIH Generalist Repository Ecosystem Initiative (GREI) has a webinar series that introduces generalist repositories, describes best practices for sharing data through them, and may help researchers choose between repositories.

Note that most generalist repositories do not provide controlled access, so data deposited in them must be fully de-identified for public access.

NIH's Listing of Generalist Repositories

The following list comes from NIH's Generalist Repositories page. Some key features of the repositories are described below.

A more in-depth summary of repository features is captured in the Generalist Repository Comparison Chart, produced by the NIH Workshop on the Role of Generalist Repositories to Enhance Data Discoverability and Reuse, held 11-12 February 2020.

A JHU-Specific Generalist Repository

When to Choose a Controlled-Access Repository

How Controlled-Access Works

Discipline-specific and generalist data repositories may offer controlled- or restricted-access provisions suitable for protecting human participant data. Such repositories have both the security infrastructure and the administrative processes needed to deliver data only to approved requesters. By contrast, open-access repositories can be accessed publicly, often without user registration.

In controlled-access scenarios, researchers must apply for data access. Access is generally governed by a data use agreement (DUA) and additional licensing requirements, such as approval for any secondary reuse. For sensitive datasets, such as those containing extensive protected health information (PHI) or direct personally identifiable information (PII), some repositories ask requesters to provide IRB approval from their institutions. Requesters may download data directly or, for more sensitive data, access it remotely through a secure data enclave.

Many controlled-access data repositories still require depositors to de-identify data, removing enough PHI/PII to meet, at minimum, the HIPAA Safe Harbor de-identification requirements.

NIH and JHU Guidance
NIH Guidance
JHM Data Trust Guidance
JHU Data Services Guidance
  • Often the deciding factor for choosing a controlled-access repository is the anticipated feasibility of fully de-identifying data for public access.
  • Full anonymization of large or complex datasets to meet HIPAA's "expert determination" standard requires planning and budgeting. Costs may include the services of a specialized statistician or a data de-identification provider.
  • Data use agreements and policies may prohibit public release of certain data, such as licensed secondary data, data collected under waivers of consent, or data derived from medical records.
  • When preparing data for controlled-access repositories, plan to de-identify direct PHI/PII to meet HIPAA Safe Harbor criteria, or to meet HIPAA Limited Data Set criteria if the repository allows full dates and certain other PHI. De-identification to these HIPAA levels is often feasible for study teams without specialized expertise.
Examples of Controlled-Access Repositories

This non-exhaustive list of controlled-access repositories is derived from keyword searches of the NIH-Supported Scientific Data Repositories tool and the lists compiled by the journal Scientific Data.

You may consult the data submission guidelines of other repositories to determine if they have controlled-access provisions or contact JHU Data Services for assistance in locating an appropriate controlled-access repository.

Elements to Look for in a Data Repository

General Characteristics of Established Data Repositories
  • Long-term, stable data preservation
  • Indexing and searching features to make data easy to find
  • Aggregation of similar data in one place to facilitate data recombination and reuse
  • A stable location to which researchers can refer those wishing to use their data
Characteristics of Established Data Repositories Related to Controlled/Restricted Access Data

Established data repositories may also offer restricted access management features designed to safeguard human participant data, including de-identified human data. These features may include:

  • Procedures to align access to participant consent
  • Systems for communicating and enforcing data use restrictions
  • Processes for reviewing data access requests
  • Protocols for addressing terms-of-use violations

NIH's "Selecting a Data Repository" guidance provides additional important considerations when choosing a data repository.