Before running a search in Gene, think carefully about your information need, because a clearly defined need makes for a more effective search. You may even want to write your information need down, both to keep it front of mind and to use it for building your search strategy.
For example, by writing down "I have a list of **differentially expressed genes** from the **rat** and I need to know which of them have **RNA-binding** functions," you can clearly see a number of main ideas or concepts (indicated in bold) that you should include in your strategy for searching Gene. Again, it may be helpful to write these concepts down, both to help you optimize your current search and to record what you've done for future reference.
The concepts for this information need can be written as
Concept 1: Differentially expressed genes
Concept 2: Rat
Concept 3: RNA-binding
The next step in the searching process is to generate search terms corresponding to these concepts.
When generating search terms for gene names in Gene, a good practice is to use approved symbols. Approved symbols represent the most up-to-date ways of expressing gene names, so they help ensure that your search will be both comprehensive and efficient. Approved symbols are also shorter and less complex (e.g., Aacs) than corresponding approved gene names (e.g., acetoacetyl-CoA synthetase).
Approved gene symbols can be found by searching internationally recognized resources for organism-focused genetic, genomic, phenotypic, and other biological data. The following table includes a sample of these resources.
Organism | Approved Gene Symbol Resource |
---|---|
Drosophila | |
Human | |
Mouse | |
Rat | Rat Genome Database (RGD) |
Yeast | |
Zebrafish | |
Since the organism of interest in this search is the rat, the Rat Genome Database (RGD) is an excellent source for approved gene symbols. In this case, the list of differentially expressed genes comes from an RNA sequencing (RNA-seq) experiment. RGD can be used to obtain gene symbols from gene names or to verify that the gene symbols generated by the sequencing technology are the approved gene symbols.
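The verification step described above can be sketched in a few lines of Python. This is only an illustration, not part of RGD or Gene: the function name `check_symbols` is hypothetical, the `approved` list stands in for symbols you have already looked up or downloaded from RGD, and `Xyz1` is a made-up symbol used to show what an unmatched entry looks like.

```python
def check_symbols(candidates, approved):
    """Split candidate symbols into approved ones and ones needing manual review."""
    approved_set = set(approved)
    # Map lower-cased approved symbols back to their official spelling so that
    # a case-only mismatch (e.g. "ACLY" vs "Acly") can be suggested as a fix.
    by_lower = {s.lower(): s for s in approved_set}

    ok, review = [], {}
    for symbol in candidates:
        cleaned = symbol.strip()  # drop stray whitespace from spreadsheet exports
        if cleaned in approved_set:
            ok.append(cleaned)
        else:
            # Value is the suggested approved spelling, or None if nothing matches.
            review[cleaned] = by_lower.get(cleaned.lower())
    return ok, review


# Hypothetical data: `approved` would come from an RGD lookup in practice.
approved = ["Aacs", "Abcd4", "Acaca", "Acly"]
candidates = ["Aacs", "ACLY", "Abcd4 ", "Xyz1"]

ok, review = check_symbols(candidates, approved)
print(ok)      # ['Aacs', 'Abcd4']
print(review)  # {'ACLY': 'Acly', 'Xyz1': None}
```

Symbols that land in `review` with a suggestion probably differ only in capitalization. Those mapped to `None` need to be looked up by gene name.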
Rat genes can be retrieved either with a filter on the search results page or by applying the "organism" field tag (see "Step 3: Use Field Tags" below).
Like gene names, elements of gene function can be described using standardized terminology. This terminology, called gene ontology, describes the roles genes play in molecular functions and biological processes and the cellular components in which these roles play out. RNA-binding is an example of both a molecular function and a gene ontology term.
The Gene database uses gene ontology terms in its description of genes, so using these terms can make your search more focused and efficient. As with gene names, it's important to use the correct gene ontology term for the molecular function, biological process, or cellular component you are interested in.
You can use any of the approved gene symbol resources in the table above to find gene ontology terms. You can also find terms using the AmiGO 2 ontology search from the Gene Ontology resource.
Once you have identified the correct gene ontology term, you will want to use the "Gene Ontology" field tag (see "Step 3: Use Field Tags" below).
Field tags can now be applied to the search terms generated for each concept as a way to build an efficient and focused search. See the table below for the search terms, their corresponding field tags, and the corresponding search string for each concept.
Concept | Search Terms | Field Tags | Concept Search String |
---|---|---|---|
Differentially expressed genes | "Aacs" OR "Abcd4" OR "Acaca" OR "Acly" OR "Acsl1" OR "Adgre5" OR "Adk" OR "Ak2" OR... | [Gene Name] | "Aacs"[Gene Name] OR "Abcd4"[Gene Name] OR "Acaca"[Gene Name] OR "Acly"[Gene Name] OR... |
Rat | "Rat" | [Organism] | "Rat"[Organism] |
RNA-binding | "RNA-binding" | [Gene Ontology] | "RNA-binding"[Gene Ontology] |
Note: The organism field tag works with both common and scientific names.
Note: For more about field tags in this guide, see the "Tool 2: Field Tags" section of the "Six Core Tools for Searching" box.
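For a long gene list, typing a field tag after every symbol by hand is error-prone. The sketch below shows one way the concept search strings in the table could be generated instead. The helper name `tag_terms` is hypothetical, and only the first four symbols from the example list are used.

```python
def tag_terms(terms, field):
    """Quote each term, attach the field tag, and join the results with OR."""
    return " OR ".join(f'"{term}"[{field}]' for term in terms)


# First four symbols from the example list; the real list is longer.
gene_symbols = ["Aacs", "Abcd4", "Acaca", "Acly"]

concept_strings = {
    "genes": tag_terms(gene_symbols, "Gene Name"),
    "organism": tag_terms(["Rat"], "Organism"),
    "function": tag_terms(["RNA-binding"], "Gene Ontology"),
}

for concept, search_string in concept_strings.items():
    print(f"{concept}: {search_string}")
```

Each printed string can be pasted into Gene as its own search, which keeps the one-concept-at-a-time workflow described in the next step.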
Now that you have the proper search strings for each of your concepts, you can enter them in Gene's Advanced Search Builder.
A good practice is to run the searches for each of your concepts one at a time by using the "Add to history" feature. As illustrated in the screenshot below, this enables you to check for errors. An unexpectedly low number of results may indicate a misspelling or other error. As expected, we retrieve very large numbers of results for our "rat" and "RNA-binding" concepts (searches #2 and #3) and many fewer for our differentially expressed genes concept (search #1).
By using the "Add" feature, we can combine the individual search strings (searches #1-3) to determine where they intersect using the AND Boolean operator. This is our final result set (search #4). Clicking on the "Items found" number (1) will show us the details of this result.
Note: When pasting your search strings into the search builder, you can leave the default on "All Fields," since you are using field tags on your search terms. Also note that when using the "Add" feature, the Boolean AND is the default for combining search concepts.
Note: For more about the Advanced Search Builder in this guide, see the "Tool 5: Gene's Advanced Search Builder" section of the "Six Core Tools for Searching" box.
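The same AND combination can also be written out as a single search string. The sketch below is an optional alternative to the Advanced Search Builder, not a replacement for it. It assumes you want to run the search through NCBI's E-utilities `esearch` endpoint, which this guide does not otherwise cover. The function names are hypothetical, the gene list is shortened to two symbols, and the code only builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def combine_with_and(concept_strings):
    """Wrap each concept string in parentheses and intersect them with AND."""
    return " AND ".join(f"({s})" for s in concept_strings)


def esearch_url(term, retmax=100):
    """Build (but do not send) an esearch request against the Gene database."""
    params = {"db": "gene", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# Shortened concept strings; the real gene list is longer.
concepts = [
    '"Aacs"[Gene Name] OR "Abcd4"[Gene Name]',
    '"Rat"[Organism]',
    '"RNA-binding"[Gene Ontology]',
]

query = combine_with_and(concepts)
print(query)
print(esearch_url(query))
```

The parentheses matter here. Without them, the OR list of gene symbols and the AND terms would run together in one string, and the search might not group them the way the separate history searches do.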
After generating a set of results in Gene, you can work with individual records or with the result set as a whole.
Opening an individual record from your set of search results allows you to view gene ontology details, as well as an array of other gene attributes, including genomic context, expression data, associated biological pathways, and sequences. You can quickly navigate to these attributes using the table of contents sidebar. There is also the option to download associated datasets. See the screenshot below.
In addition to working with individual records from your search, you can work with all of your search results at once by exporting them as a single file. In the screenshot shown below, the records of 40 search results are being downloaded as a file in XML format by using Gene's "Send to" feature.
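If you already have the Gene IDs for your result set, the records can also be requested as XML through NCBI's E-utilities `efetch` endpoint instead of the "Send to" menu. The sketch below only builds the request URL. The function name `efetch_url` is hypothetical, and the IDs are placeholders rather than the results of the search in this guide.

```python
from urllib.parse import urlencode

# NCBI E-utilities record retrieval endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(gene_ids):
    """Build (but do not send) an efetch request for Gene records in XML."""
    params = {
        "db": "gene",
        "id": ",".join(str(gene_id) for gene_id in gene_ids),
        "retmode": "xml",
    }
    return f"{EFETCH}?{urlencode(params)}"


# Placeholder IDs for illustration only; substitute the IDs from your own search.
placeholder_ids = [1, 2, 3]
url = efetch_url(placeholder_ids)
print(url)
```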
If you have questions about searching Gene, other NCBI databases, EMBL-EBI databases, or web-based bioinformatics databases from other sources, please contact:
Rob Wright, Basic Science Informationist, rwrigh32@jhmi.edu.