The step-by-step search example in the companion section of this guide (see "A Search Example in Five Steps") demonstrates an approach to building targeted, multi-component searches in dbSNP.
This approach begins by representing your research question as a set of major concepts. Each concept is then translated into formatted search terms, and those terms are combined into a query using dbSNP's Advanced Search Builder.
Elements of the Advanced Search Builder include (1) a search fields list, (2) the "show index list" feature, and (3) built-in Boolean operators. These elements can improve your search efficiency in dbSNP by helping you incorporate multiple focusing concepts at the same time.
Thinking of your research question as a set of related main ideas, or concepts, is a helpful way to begin the search process. When your search strategy expresses the essence of your question, you ensure that every result contains these essential components.
For example, consider the following research question:
What single nucleotide variants from a set of cell cycle pathway genes linked to 5-fluorouracil‑based chemotherapy resistance meet the following criteria?
- Have an allele frequency of at least 0.05
- Are 3 prime UTR, 5 prime UTR, or missense variants
- Are based on 1000 Genomes or gnomAD data
The question clearly calls for human genomic variants, and our choice of database addresses this requirement: by searching dbSNP, we are assured that our results will be human variants. However, because dbSNP also includes variants that involve more than one nucleotide, such as small insertions and deletions, we need to specify single nucleotide variants (SNVs) in our search.
The following concepts in our search strategy ensure that SNVs and the other critical elements of the question are represented:
Concept 1: Single Nucleotide Variants
Concept 2: Cell Cycle Pathway Genes Linked to 5-Fluorouracil‑Based Chemotherapy Resistance (Derived from Experimental Data)
Concept 3: Allele Frequency
Concept 4: Variant Type
Concept 5: Source of Variant Data
Note: This research question is taken from Askari, M., Mirzaei, E., Navapour, L. et al. Integrative bioinformatics analysis: Unraveling variant signatures and single-nucleotide polymorphism markers associated with 5-FU-based chemotherapy resistance in colorectal cancer patients. J Gastrointest Canc 55, 1607–1619 (2024). https://doi.org/10.1007/s12029-024-01102-x
Knowing how dbSNP records are structured, and which of their elements are searchable, can help you generate search concepts. dbSNP records have over 25 types of elements, or fields, most of which are directly searchable. Examples include fields for the reference SNP ID (i.e., rs number), base position, gene name, and clinical significance.
Field tags, which are field names enclosed in square brackets, can be added to search terms in a dbSNP query to search these fields. Two commonly used field tags, [SNP Class] and [Function Class], relate directly to the example search concepts above. Formatted with field tags, the search terms for concepts 1 and 4 would be written as:
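A minimal Python sketch of field-tag formatting follows. The helper name `tag` is hypothetical, and the index values `snv`, `3 prime utr variant`, `5 prime utr variant`, and `missense variant` are assumptions; confirm the exact spellings with the "Show index list" feature before relying on them. Multi-word values are wrapped in quotation marks to keep each phrase together.

```python
def tag(term: str, field: str) -> str:
    """Attach a field tag to a search term; multi-word terms are quoted (hypothetical helper)."""
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{field}]"


# Concept 1: single nucleotide variants (index value "snv" is an assumption)
concept_1 = tag("snv", "SNP Class")

# Concept 4: variant types (index spellings are assumptions; check them with "Show index list")
variant_types = ["3 prime utr variant", "5 prime utr variant", "missense variant"]
concept_4_terms = [tag(v, "Function Class") for v in variant_types]

print(concept_1)
for term in concept_4_terms:
    print(term)
```

Running the sketch prints `snv[SNP Class]` followed by one quoted, [Function Class]-tagged term per variant type.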
Searching with field tags helps you avoid both off-target results and empty result sets. For example, applying the [GMAF] field tag to the search term 0.05 (i.e., 0.05[GMAF]) retrieves variants with a global minor allele frequency of 0.05 (11,232 results in dbSNP on 10/31/2024). However, the same term without the field tag (i.e., 0.05) finds no results.
dbSNP provides a list of field names, field tags, descriptions, and search examples.
Applying field tags is one way of formatting search terms. Such formatting tells dbSNP how to handle the search terms you are using. In the case of field tags, the instruction is to search for a term only in the specified field (e.g., HDAC1[Gene Name] searches for HDAC1 only in the gene name part of the record).
Other search term formatting can broaden your search by retrieving variant forms of a term. Adding an asterisk (*) to the end of a term finds all terms that begin with that root. This is called truncation, because the variant forms are generated from a shortened root of the term. Truncation can save time and pick up relevant terms you might not otherwise have thought of.
For example, to find variants for both the human cyclin E1 and cyclin E2 genes, you can use the single term CCNE*[Gene Name] instead of the two terms (CCNE1[Gene Name] OR CCNE2[Gene Name]).
Note: Use truncation with caution. A quick check of your search results will show whether the truncation is inadvertently adding off-target terms to your search.
Note: You can't use quotation marks with truncation in dbSNP.
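The idea behind truncation can be illustrated with a small Python sketch that filters a list of index terms by prefix. The function name `truncate_match` and the term list are hypothetical; the list is a made-up sample, not dbSNP's actual gene-name index, and dbSNP's own matching may differ in detail (for example, in case handling).

```python
def truncate_match(pattern: str, index_terms: list[str]) -> list[str]:
    """Return index terms matching a trailing-asterisk pattern such as 'CCNE*'."""
    if not pattern.endswith("*"):
        # No truncation: require an exact, case-insensitive match
        return [t for t in index_terms if t.lower() == pattern.lower()]
    root = pattern[:-1].lower()
    # Truncation: keep every term that starts with the root
    return [t for t in index_terms if t.lower().startswith(root)]


# Hypothetical sample of gene-name index entries
sample_index = ["CCNA2", "CCNB1", "CCND1", "CCNE1", "CCNE2", "CDK2", "HDAC1"]

print(truncate_match("CCNE*", sample_index))  # ['CCNE1', 'CCNE2']
print(truncate_match("CCN*", sample_index))   # a shorter root pulls in more genes
```

The second call shows why truncation calls for caution: shortening the root from CCNE to CCN also matches CCNA2, CCNB1, and CCND1 in this sample.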
The Boolean operator OR is typically used to connect similar elements within a search concept. For example, the cell cycle pathway gene concept contains twenty-three gene names. The first five of these gene names are joined by OR in a search string as follows:
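A minimal Python sketch of building an OR-joined concept follows. The helper `or_concept` is hypothetical, and the gene symbols are illustrative only (CCNE1, CCNE2, and HDAC1 appear elsewhere in this guide); they are not the study's gene list, which you would substitute in.

```python
def or_concept(terms: list[str], field: str) -> str:
    """Tag each term with the field, join the terms with OR, and parenthesize the group."""
    tagged = [f"{t}[{field}]" for t in terms]
    return "(" + " OR ".join(tagged) + ")"


# Illustrative gene symbols only; replace them with the genes from your own concept
genes = ["CCNE1", "CCNE2", "HDAC1"]
gene_concept = or_concept(genes, "Gene Name")
print(gene_concept)
# (CCNE1[Gene Name] OR CCNE2[Gene Name] OR HDAC1[Gene Name])
```

The parentheses keep the OR group together as a single unit when the concept is later combined with other concepts.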
The AND operator is typically used to connect two or more concepts to find where they intersect. To find only single nucleotide variants for the first five cell cycle genes mentioned above, you would combine the single nucleotide variant concept with the cell cycle gene concept using AND as follows:
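Combining concepts with AND can be sketched the same way in Python. The helper `and_concepts` is hypothetical, the `snv` index value is an assumption, and the gene group uses illustrative symbols rather than the study's genes. Each OR group is kept in parentheses so that it is treated as one unit inside the larger query.

```python
def and_concepts(*concepts: str) -> str:
    """Combine already-formatted concept strings with AND."""
    return " AND ".join(concepts)


# Concept 1: SNVs (the "snv" index value is an assumption)
snv_concept = "snv[SNP Class]"
# Gene concept: illustrative genes only, already OR-joined and parenthesized
gene_concept = "(CCNE1[Gene Name] OR CCNE2[Gene Name] OR HDAC1[Gene Name])"

query = and_concepts(snv_concept, gene_concept)
print(query)
# snv[SNP Class] AND (CCNE1[Gene Name] OR CCNE2[Gene Name] OR HDAC1[Gene Name])
```

Further concepts, such as a parenthesized group of [Function Class] terms, can be passed as additional arguments to narrow the results.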
dbSNP's Advanced Search Builder makes it easy to build and document complex queries through the following features:
Spaces for constructing multi-synonym search concepts and combining them with Boolean operators in complex queries
A drop-down listing of search fields that automatically attaches the appropriate field tag to your search term
An index list that shows possible terms
A downloadable search history
When you are unsure of the scope or spelling of a search term, you can type part of it into an Advanced Search box, select the appropriate field tag, and then click the "Show index list" link for an alphabetical list of options. For example, if you type "path," choose "Clinical Significance" from the search field drop-down list, and click "Show index list," the list shows multiple entries, starting with "pathogenic" and "pathogenic likely pathogenic."
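The behavior of an alphabetical index list can be approximated in Python with the standard `bisect` module: find where the typed text would fall in a sorted list and show the entries from that point on. The function name `show_index_list` and the sample entries are hypothetical (only "pathogenic likely pathogenic" comes from the example above), and the sketch assumes the list simply starts at the alphabetical position of the typed text.

```python
import bisect


def show_index_list(typed: str, index_terms: list[str], n: int = 5) -> list[str]:
    """Return up to n entries of a sorted index, starting where `typed` falls alphabetically."""
    terms = sorted(index_terms)
    start = bisect.bisect_left(terms, typed.lower())
    return terms[start:start + n]


# Hypothetical sample of Clinical Significance index entries
sample = [
    "benign",
    "likely benign",
    "likely pathogenic",
    "pathogenic",
    "pathogenic likely pathogenic",
    "uncertain significance",
]

print(show_index_list("path", sample, n=3))
# ['pathogenic', 'pathogenic likely pathogenic', 'uncertain significance']
```

As in this sketch, an alphabetical list can continue past the entries that actually begin with the typed text, so scan the list rather than assuming every entry matches.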
You can also use the index list feature to check which entries are available for a search field. For example, the index list for the "Function Class" field shows many options for variant types (e.g., "2kb upstream variant," "3 prime utr variant," "5 prime utr variant," "500b downstream variant," "coding sequence variant," etc.) (see below).