Finding Gene Attributes Using NCBI Gene

This guide introduces the NCBI Gene database and provides a workflow for using it to find gene attributes.

Building Multi-Component Searches

The multi-step search example in the companion section of this guide (see "A Search Example in Five Steps") demonstrates an approach for building targeted, multi-component searches in Gene.

This approach involves first representing your research question as a series of major concepts. These concepts are then translated into formatted search terms and incorporated into a search query using Gene's Advanced Search Builder.

Elements of the Advanced Search Builder include (1) a search fields list, (2) the "show index list" feature, and (3) built-in Boolean operators. These elements can improve your search efficiency in Gene by helping you incorporate multiple focusing concepts at the same time.

Six Core Tools for Searching

Tool 1: A Concepts-Based Approach

Thinking of your research question as a set of related main ideas or concepts is a helpful way to begin the searching process. By expressing the essence of your question in your search strategy, you ensure that every search result has these minimum essential components.

For example, consider the following research question:

"What genes on human chromosome 17 are associated with the spectrum of disorders known as Charcot-Marie-Tooth disease?"

It is clear that the desired results are genes. However, this fact need not be reflected in the search strategy, because it is addressed by our choice of database: by searching Gene, we are assured that our results will be genes. The other desired attributes of our results are represented by the following concepts in our search strategy.

  • Concept 1: Human

  • Concept 2: Chromosome 17

  • Concept 3: Charcot-Marie-Tooth disease

Tool 2: Field Tags

Knowing how Gene records are structured, and which elements of that structure are searchable, can help you generate search concepts. Gene records have over 30 types of elements or fields, most of which are directly searchable. Fields in Gene records include those for gene name, base position, exon count, and an expansive category called properties.

Field tags, represented as field names in brackets, can be used in a Gene query to search these fields. Two commonly used field tags, [Organism] and [Chromosome], directly relate to the example search concepts above. Search terms for concepts 1 and 2 that are formatted with field tags would be written as "Human"[Organism] and "17"[Chromosome], respectively. The concept 3 search term would use the [Disease/Phenotype] field tag and be written as "Charcot-Marie-Tooth"[Disease/Phenotype]. See the "Tool 6" section below for why the disease name is shortened here.
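The formatting described above can be sketched in a few lines of Python. This is only an illustration of how the three example search terms are written; the helper name tag is hypothetical and is not part of Gene or any NCBI tool.

```python
def tag(term: str, field: str) -> str:
    """Return a quoted search term followed by a bracketed field tag.

    Hypothetical helper for illustration; it only builds a string.
    """
    return f'"{term}"[{field}]'


# One formatted search term per concept from the example research question.
concepts = [
    tag("Human", "Organism"),                         # Concept 1
    tag("17", "Chromosome"),                          # Concept 2
    tag("Charcot-Marie-Tooth", "Disease/Phenotype"),  # Concept 3
]

for term in concepts:
    print(term)
```

Each printed line is a term that could be pasted into a Gene search box as written.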

Field tag searching helps to avoid retrieving off-target results. For example, the use of the [Organism] field tag for the search term "mouse" avoids retrieving human genes whose records mention that they have mouse orthologs.

The Gene Help Manual provides a list of Gene field tags and their definitions.

Tool 3: Search Term Formatting
Field Tags

Applying field tags is one way of formatting search terms. Such formatting provides Gene with specific instructions on how it should handle the search terms you are using. In the case of field tags, the instructions are to search for terms only in the fields specified (e.g., "Charcot-Marie-Tooth"[Disease/Phenotype] only searches for "Charcot-Marie-Tooth" in the "phenotypes" part of the record).

Truncation

Additional search term formatting can broaden your search by searching for term variants. Placing an asterisk (*) at the end of a term will find all variants that begin with that term. This process is called truncation because the variants are generated from the shortened root of the term. Truncation can both save time and incorporate relevant terms you might not otherwise have thought of.

For example, if you are interested in finding the members of the human golgin gene family whose gene names start with GOLGA and GOLGB, you could use the formatted search term GOLG*[Gene Name]. Otherwise, you would need to include each individual gene name in a long search string as follows: "GOLGA2"[Gene Name] OR "GOLGA4"[Gene Name] OR "GOLGB1"[Gene Name] OR "GOLGA3"[Gene Name] OR "GOLGA5"[Gene Name]...

Note: You should use truncation with caution. Doing a quick check of search results will let you know whether the truncation you've performed is inadvertently adding off-target terms to your search.

Quotation Marks

It's generally a good idea when using databases from NCBI to use quotation marks around search terms to ensure that they are searched as written. This is particularly true for phrases.

Note: You can't use quotation marks with truncation in Gene.
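The quoting and truncation rules above can be combined in a small sketch. The function format_term and the list known_names are hypothetical and exist only for illustration. The prefix check runs locally against names you already know and does not reproduce how Gene itself expands a truncated term, so it is no substitute for checking your actual search results.

```python
def format_term(term: str, field: str) -> str:
    """Format a search term with a field tag.

    Truncated terms (ending in *) are left unquoted, because quotation
    marks cannot be combined with truncation in Gene. All other terms
    are quoted so that they are searched as written.
    """
    if term.endswith("*"):
        return f"{term}[{field}]"
    return f'"{term}"[{field}]'


print(format_term("GOLG*", "Gene Name"))
print(format_term("GORAB", "Gene Name"))

# Local preview: which gene names you already know share the truncated root?
known_names = ["GOLGA2", "GOLGA4", "GOLGB1", "GOLGA3", "GOLGA5", "GORAB"]
root = "GOLG"
caught = [name for name in known_names if name.startswith(root)]
missed = [name for name in known_names if not name.startswith(root)]

print("Covered by the truncated term:", caught)
print("Needs its own search term:", missed)
```

Names that fall outside the truncated root, such as GORAB here, have to be added to the search as separate terms.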

Tool 4: Boolean Operators
OR

The OR Boolean operator is typically used to connect the synonyms in a search concept. Sticking with the golgin example, the following is a search string for the human golgin gene family concept: GOLG*[Gene Name] OR "GORAB"[Gene Name] OR "LOC102724117"[Gene Name].

AND

The AND operator is typically used to connect two or more concepts to find where they intersect. Suppose that the aim of the search is to find non-coding RNA (ncRNA) golgin genes. In this case, a second concept related to sequence type could be combined with the concept related to the human golgin gene family using AND as follows: (GOLG*[Gene Name] OR "GORAB"[Gene Name] OR "LOC102724117"[Gene Name]) AND ("genetype ncrna"[Properties]).
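As a minimal sketch, the same OR-then-AND pattern can be assembled programmatically: synonyms within a concept are joined with OR and wrapped in parentheses, and the resulting concepts are joined with AND. The helper names any_of and all_of are hypothetical; the search terms are the ones used in the golgin example.

```python
def any_of(terms: list[str]) -> str:
    """Join the synonyms of one concept with OR, wrapped in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(concepts: list[str]) -> str:
    """Join separate concepts with AND to find where they intersect."""
    return " AND ".join(concepts)


# Concept A: the human golgin gene family (synonyms joined with OR).
golgin_family = any_of([
    'GOLG*[Gene Name]',
    '"GORAB"[Gene Name]',
    '"LOC102724117"[Gene Name]',
])

# Concept B: non-coding RNA genes.
ncrna = any_of(['"genetype ncrna"[Properties]'])

query = all_of([golgin_family, ncrna])
print(query)
```

Keeping each concept inside its own parentheses ensures that the OR-joined synonyms are grouped before the AND is applied.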

Tool 5: Gene's Advanced Search Builder

Gene's Advanced Search Builder makes it easy to build and document complex queries through the following features:

  • Spaces for constructing multi-synonym search concepts and combining them with Boolean operators in complex queries 

  • A drop-down listing of search fields that automatically attaches the appropriate field tag to your search term

  • An index list that shows possible terms

  • A downloadable search history


NCBI Gene advanced search with these features circled: search field drop-down, index list, and search history download


Tool 6: Gene's Index List Feature

When unsure of the scope or spelling of a search term, you can type part of it into an advanced search box, select the appropriate field tag, and then click the "Show index list" link for an alphabetical list of options. For example, by typing "Charcot-Marie-Tooth", choosing "Disease/Phenotype" from the search field drop-down, and clicking on "Show index list", you can see that relevant diseases include those called "charcot marie tooth disease" and "charcot marie tooth neuropathy". By using "Charcot-Marie-Tooth" as your search term, you will find both ways of referring to this disease (see below).


NCBI Gene advanced search showing use of the index list feature

 

For additional searching tips, see the Gene Help Manual.