Finding Variants Using dbSNP

This guide introduces the dbSNP database from NCBI and provides a workflow for using it to find variants.

Step 1: Identify Core Concepts

Before running a search in dbSNP, think carefully about your information need, since it shapes how effective your search will be. It can help to write your information need down, both to keep it front of mind and to use it when building your search strategy.

For example, by writing down

"I want to look for variants within human genes in the SUMO pathway. These variants must have functional consequences (i.e., change an amino acid in the protein coded by a gene),"

you can clearly see two main ideas or concepts that you should include in your strategy for searching dbSNP. Again, it may be helpful to write these concepts down, both to help you optimize your current search and to record what you've done for future reference.

The concepts for this information need can be written as

  • Concept 1: Human Genes in the SUMO Pathway

  • Concept 2: Variants with Functional Consequences

The next step in the searching process is to generate search terms corresponding to these concepts.

Step 2: Generate Search Terms

Human Genes in the SUMO Pathway

The resources PubChem and HGNC can be used to generate search terms for the human SUMO pathway genes.

PubChem and the SUMO Pathway

While perhaps best known as a clearinghouse of knowledge about chemicals, PubChem is also a trusted resource for finding information on biological pathways. PubChem aggregates data from multiple pathway sources, including BioCyc and Reactome. A number of SUMO pathway records are found in PubChem. These records include descriptions of the genes involved. 

HGNC and Approved Symbols for Human Genes

When generating search terms for gene names in dbSNP, a good practice is to use approved symbols. Approved symbols represent the most up-to-date ways of expressing gene names, so they help ensure that your search will be both comprehensive and efficient. Approved symbols are also shorter and less complex (e.g., SAE1) than corresponding approved gene names (e.g., SUMO1 activating enzyme subunit 1).

The gene symbols found in PubChem can be confirmed as being approved symbols by using HGNC, a resource for approved human gene nomenclature developed by the HUGO Gene Nomenclature Committee.

Variants with Functional Consequences

Like gene names, elements of biological sequence, including variant types, can be described in a standardized way. The standardized approach for describing sequence features and attributes, called sequence ontology, includes uniformly applied terminology and logically organized relationships.

You can use the Sequence Ontology Browser to find universally accepted terms for sequence features like variants. You can run keyword searches in the browser, navigate through its hierarchically arranged structure, or both, to find individual sequence features or categories of sequence features. For example, searching with "start" will lead you to the "start_lost" variant record. Alternatively, you can find other variants with functional consequences by navigating the browser's hierarchical structure to the "protein_altering_variant" category.

dbSNP uses sequence ontology terms in its description of variants with functional consequences, so using these terms can make your search more focused and efficient.

Once you have identified the correct term for variants with potential consequences for the function of translated proteins, you will want to use the "Function Class" field tag in dbSNP (see "Step 3: Use Field Tags" below).

Step 3: Use Field Tags

Field tags can now be applied to the search terms generated for each concept as a way to build an efficient and focused search. See the table below for the search terms, their corresponding field tags, and the corresponding search string for each concept.

| Concept | Search Terms | Field Tags | Concept Search String |
|---|---|---|---|
| Human Genes in the SUMO Pathway | SAE1 OR UBA2 OR SUMO1 OR SUMO2 OR SUMO3 OR SENP1 OR SENP2 OR SENP3 OR ... | [Gene Name] | SAE1[Gene Name] OR UBA2[Gene Name] OR SUMO1[Gene Name] OR SUMO2[Gene Name] OR ... |
| Variants with Functional Consequences | "coding sequence variant" OR "frameshift variant" OR "inframe deletion" OR "inframe indel" OR "inframe insertion" OR "initiator codon variant" OR ... | [Function Class] | "coding sequence variant"[Function Class] OR "frameshift variant"[Function Class] OR "inframe deletion"[Function Class] OR ... |

Note: An efficient way to incorporate the functionally relevant variant types you want is to use the "Show index list" feature and select multiple variant types by holding down the "Ctrl" key while clicking. For more about using the "Show index list" feature in this guide, see the "Tool 6: dbSNP's Index List Feature" section of the "Six Core Tools for Searching" box.

Note: For more about field tags in this guide, see the "Tool 2: Field Tags" section of the "Six Core Tools for Searching" box.
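
If you prefer to assemble the concept search strings from Step 3 in a script rather than by hand, the pattern can be sketched in a few lines of Python. This is only an illustration: the function name `tag_terms` is made up for this example, and the lists contain just the first few terms from the table, so you would extend them with the full set of terms for your own search.

```python
def tag_terms(terms, field):
    """Join search terms with OR, attaching the same field tag to each."""
    tagged = []
    for term in terms:
        # Quote multi-word terms so they are searched as a phrase.
        text = f'"{term}"' if " " in term else term
        tagged.append(f"{text}[{field}]")
    return " OR ".join(tagged)


# Only the first few terms from the table are listed here (illustrative).
genes = ["SAE1", "UBA2", "SUMO1", "SUMO2"]
function_classes = [
    "coding sequence variant",
    "frameshift variant",
    "inframe deletion",
]

gene_query = tag_terms(genes, "Gene Name")
function_query = tag_terms(function_classes, "Function Class")

print(gene_query)
print(function_query)
```

Building the strings this way reduces the risk of a missing field tag or a misplaced quotation mark when the list of terms is long.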

Step 4: Use the Advanced Search Builder

Now that you have the proper search strings for each of your concepts, you can enter them in dbSNP's Advanced Search Builder.

A good practice is to run the search for each of your concepts one at a time by using the "Add to history" feature. As illustrated in the screenshot below, this enables you to check for errors: an unexpectedly low number of results may indicate a misspelling or other error. As expected, we retrieve large numbers of results for the 20 genes in our human SUMO pathway concept (search #1) and many more for our variants with functional consequences concept (search #2).

By using the "Add" feature, we can combine the individual search strings (searches #1 and #2) to determine where they intersect using the AND Boolean operator. This is our final result set (search #3). Clicking on the "Items found" number (13,143) will show us the details of these results and allow us to work with them.  


Screenshot of dbSNP's advanced search with the following highlighted: the "Add to history" feature, the "Add" feature, and the "Items found" link


Note: When pasting your search strings into the search builder, you can leave the default on "All Fields," since you are using field tags on your search terms. Also note that when using the "Add" feature, the Boolean AND is the default for combining search concepts.

Note: For more about the Advanced Search Builder in this guide, see the "Tool 5: dbSNP's Advanced Search Builder" section of the "Six Core Tools for Searching" box.
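
The same AND combination can also be expressed outside the web interface. The sketch below combines two concept strings and builds a request URL for NCBI's E-utilities ESearch service with `db=snp`. Treat it as a starting point under assumptions: the function names are invented for this example, the shortened queries are placeholders, and it assumes that E-utilities accepts the same field tags as the Advanced Search Builder, which you should confirm against your web results.

```python
import json
import urllib.parse
import urllib.request

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def combine_with_and(*concept_queries):
    """Wrap each concept string in parentheses and join them with AND."""
    return " AND ".join(f"({q})" for q in concept_queries)


def build_esearch_url(query, retmax=0):
    """Build an ESearch URL for the dbSNP database (db=snp)."""
    params = {"db": "snp", "term": query, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urllib.parse.urlencode(params)}"


def count_results(query):
    """Return the number of matching dbSNP records (needs network access)."""
    with urllib.request.urlopen(build_esearch_url(query)) as response:
        payload = json.load(response)
    return int(payload["esearchresult"]["count"])


# Shortened placeholder queries; substitute your full concept strings.
gene_query = "SAE1[Gene Name] OR UBA2[Gene Name]"
function_query = (
    '"frameshift variant"[Function Class] OR "inframe deletion"[Function Class]'
)

final_query = combine_with_and(gene_query, function_query)
print(final_query)
print(build_esearch_url(final_query))
```

The parentheses around each concept keep its OR terms grouped before the AND is applied, which mirrors what combining searches #1 and #2 does in the search history. Calling `count_results(final_query)` for each concept separately gives the same kind of error check as comparing the "Items found" numbers.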

Step 5: Work with Search Results

After generating a set of results in dbSNP, you can work with individual records or with the result set as a whole.

Working with Individual Records

Opening an individual record from your set of search results allows you to explore the details of a particular variant, including alleles, allele frequency, clinical significance, and associated publications. These options are highlighted in the following screenshot.


Screenshot of an individual dbSNP record with the following highlighted: alleles, frequency, clinical significance, and publications


Working with Result Sets as a Whole

In addition to working with individual search records, you can work with all of your search results at once by exporting them as a single file. In the screenshot shown below, the variant details for all 13,143 search results are being downloaded as a file in XML format by using dbSNP's "Send to" feature.

This XML file can be imported into Excel for further analysis.


Screenshot of the dbSNP results page with the "Send to" feature highlighted and "File" and "XML" selected 
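
As an alternative to opening the XML file directly in Excel, you can flatten it into a CSV file with a short script. The example below is a generic sketch only: the sample XML and its element names (`Results`, `Record`, `Id`, `Gene`, `Function`) are hypothetical and do not reflect dbSNP's actual export schema, so inspect your downloaded file and pass the real record element name to `records_to_rows`.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; a real dbSNP export uses its own element names.
SAMPLE_XML = """<Results>
  <Record><Id>example-1</Id><Gene>SAE1</Gene><Function>frameshift variant</Function></Record>
  <Record><Id>example-2</Id><Gene>UBA2</Gene><Function>inframe deletion</Function></Record>
</Results>"""


def records_to_rows(xml_text, record_tag):
    """Turn each <record_tag> element into a dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        rows.append({child.tag: (child.text or "").strip() for child in record})
    return rows


def rows_to_csv(rows):
    """Write the rows to CSV text, using the union of all keys as the header."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = records_to_rows(SAMPLE_XML, "Record")
print(rows_to_csv(rows))
```

This sketch only captures the text of each record's direct child elements. Nested elements and attributes would need additional handling, and if the file declares an XML namespace, the tag names seen by `ElementTree` will include it as a `{namespace}` prefix.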

Welch Medical Library Support

If you have questions about searching dbSNP, other NCBI databases, EMBL-EBI databases, or web-based bioinformatics databases from other sources, please contact:

Rob Wright, Basic Science Informationist, rwrigh32@jhmi.edu.