Finding Datasets for Secondary Analysis

About This Guide

Welcome to the Finding Secondary Datasets for Research Guide.

Secondary data analysis is the analysis of existing data, often originally collected for purposes other than research. Its advantages are economy and breadth. Much of this data is available for public use, and its scope is often beyond what small research teams, particularly student researchers, could collect themselves.

Examine a secondary dataset carefully, however, to confirm that the way its variables are defined and coded allows for the analysis you intend. Also appraise the quality of the data by reading its documentation, including the data collection procedures, the handling of missing data, and the codebook.


If you need a statistic, such as the total number of hospital beds in the United States from 2005 to 2015, consult the Finding Health Statistics guide.

First Steps for Finding Datasets

Ask Important Questions

  • What was the original purpose for which the data was collected?
  • What kind of data is it, and how was it collected?
  • What cleaning and/or recoding procedures have been applied to the data?
Define the question

Example:

  • How does the experience of racism affect an individual's health?
Specify the population

Examples:

  • Children, adults, or all ages
  • National or smaller areas
  • Race or ethnicity
  • Recency of data or range of data
Specify the variables to include in analysis

Examples:

  • race
  • age
  • gender
  • income
  • education level
Specify the type of data most appropriate to answer the question

Examples:

  • national survey
  • examination of claims
  • phone interviews
Create a list of datasets that include the information needed

This is an iterative process. You may have to revise your research question based on the data available.