
Finding Datasets for Secondary Analysis

Recommended Data Repositories

The following data repositories are recommended to help you begin your search for secondary use data. If you need help finding data for your research topic, the Data Informationist at Welch Medical Library may be able to help. To learn more, navigate to this website.

Data repositories generally fall into two categories: open access and controlled access (sometimes referred to as restricted access).

  • Open Access: These repositories are freely available online, allowing you to download data directly from the repository, in some cases after agreeing to a data use agreement.
  • Controlled Access: These repositories typically store sensitive data (e.g., patient information or data related to rare medical conditions). Unlike open access data, controlled access data requires approval before use. Before gaining access to the data, researchers typically must:
    • Submit an application that includes a statement of intended use
    • Provide a copy of their CV (listing their institutional credentials)
    • Complete training on ethical handling of the data

Below are some examples of repositories to get you started on your search for data that can be used for secondary analysis.

Medicine

Use the following platforms to find repositories and secondary use data in a particular subject area (e.g., clinical research, neuroscience, sequence biology).

NIH-Supported Data Sharing Resource

If you’re looking for data within a particular subject area (e.g., clinical research, imaging, neuroscience, sequence biology, behavioral and social sciences), this resource allows you to search for repositories that store both human and non-human data.

ClinicalTrials.gov

This resource allows you to search for clinical research data by condition/disease, intervention/treatment, location, and study status, among many other filters. To learn how to find studies with results, please click here.
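If you prefer to script your search, the filters described above can be expressed as a query URL. The following is a minimal sketch only: the base URL and the parameter names (`query.cond`, `query.intr`, `query.locn`, `filter.overallStatus`, `pageSize`) are assumptions based on version 2 of the ClinicalTrials.gov API and should be checked against the current API documentation. The helper `build_study_search_url` is a hypothetical name used for illustration. The code only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Assumed base URL of the ClinicalTrials.gov API (version 2); verify before use.
BASE_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_study_search_url(condition=None, intervention=None, location=None,
                           status=None, page_size=10):
    """Return a search URL for the filters named in the text above.

    Parameter names are assumptions about the v2 API, not confirmed here.
    """
    params = {
        "query.cond": condition,          # condition/disease
        "query.intr": intervention,       # intervention/treatment
        "query.locn": location,           # location
        "filter.overallStatus": status,   # study status, e.g. "COMPLETED"
        "pageSize": page_size,            # number of studies per page
    }
    # Drop filters the caller did not set so the URL stays minimal.
    params = {key: value for key, value in params.items() if value is not None}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_study_search_url(condition="asthma", status="COMPLETED")
print(url)
```

You could paste the printed URL into a browser or pass it to an HTTP client of your choice to retrieve results. Review the site's terms of use before downloading data in bulk.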

Synapse

If you’re looking for biomedical data, Synapse is a repository that allows you to access or query data using its API. You may also use its standard search interface to locate datasets that may be used for secondary research.

Healthcare Cost and Utilization Project (HCUP)

HCUP provides access to large-scale hospital data, including (but not limited to) inpatient stays and emergency department visits. This is a controlled-access resource, so researchers should be familiar with the access process described under “Controlled Access” on this page. For more details, click here.

Vivli

If you're looking for patient-level data from clinical trials, this is a platform that provides access to anonymized datasets. As a controlled-access repository, it requires researchers to apply for access and pay a fee before they may access the data.

Public Health & Nursing

Generalist data repositories can be useful for finding data on public health and nursing research topics. Use the following platforms to begin your search.

Harvard Dataverse

This is an open access data repository that houses data across a wide range of subject areas, including (but not limited to) social sciences, arts & humanities, medicine/health/life sciences, law, and business/management.

Inter-University Consortium for Political and Social Research (ICPSR)

This repository houses both open and controlled access data across the social and behavioral sciences.

Dryad

This is a general-purpose data repository that accepts data in any format and from any research field, with a particular focus on data from the biological sciences.

Dimensions

This is an academic database, similar to others you would use to search for academic publications. The difference is that Dimensions allows you to search for datasets that are connected to publications. It also offers analytical views (e.g., visualizations of geographic trends on a research topic and research impact) that you may download.

Statista: The Statistics Portal

This is a platform that provides statistics and visualizations across topics, including (but not limited to) business, finance, and politics. The data is presented in aggregate form; the platform does not provide access to data from primary research studies.