Unless otherwise noted, these guides are licensed under a Creative Commons Attribution-NonCommercial 2.0 Generic License attributable to the Welch Medical Library, Johns Hopkins University. Image and Icon Attributions: Icons8, licensed under CC BY-ND 4.0; WebFont Medical Icons Project, designed by Hablamos Juntos; Font Awesome 4.7
This guide is designed to help you find both statistics and datasets. Some of the resources listed are restricted to the Johns Hopkins domain and require you to log in with your JHED ID and password; however, many data sources from government agencies, think tanks, non-profits, and similar organizations are open access. This guide is not exhaustive, but it is designed to give you a solid starting point for finding data sources for health and medical research.
A dataset is data presented in a manipulable format for analysis. A dataset could be a table, a text file, or a set of images, and it may be spread across multiple files. The term "raw data" is often used synonymously with "dataset," though "raw" typically signifies that substantial clean-up may be needed prior to analysis. Some datasets are readily available in analytic formats, or in formats that are easily imported into analytic tools such as R, Stata, and Tableau.
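To make "manipulable format" concrete, here is a minimal sketch in Python. The records, column names, and helper functions (load_rows, clean_rows) are all hypothetical and invented for illustration; they do not come from any source listed in this guide. The sketch loads a small record-level table, does a simple clean-up step (dropping a record with a missing measurement), and then derives a frequency count and a mean from what remains.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Hypothetical record-level data, invented purely for illustration.
RAW_CSV = """child_id,city,blood_lead_ug_dl
1,City A,2.1
2,City A,5.4
3,City B,1.8
4,City B,
5,City A,3.0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Drop records with a missing measurement and convert the rest to float."""
    cleaned = []
    for row in rows:
        value = row["blood_lead_ug_dl"].strip()
        if value:
            cleaned.append({**row, "blood_lead_ug_dl": float(value)})
    return cleaned


rows = clean_rows(load_rows(RAW_CSV))

# Aggregates derived from the cleaned records: a frequency table and a mean.
records_per_city = Counter(row["city"] for row in rows)
mean_level = mean(row["blood_lead_ug_dl"] for row in rows)

print(dict(records_per_city))
print(round(mean_level, 3))
```

The individual rows are the dataset; the counts and the mean printed at the end are summaries computed from it. In practice the same steps would more likely be done in an analytic tool such as R or Stata.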
Statistics are generated from data. They are already presented in aggregate form, such as a table of frequencies, means, or rates, or visualized as a chart or graph. Statistics are readily available pieces of information from which a conclusion may be drawn. Below is a graph from the Big Cities Health Coalition on children's blood lead levels in 2013.