GetGeneList: A Function to Filter and Save Genomic Features from NCBI




Description

GetGeneList accesses the NCBI database for the specified species through its secure FTP site, downloads feature information for the specified genome build, and then filters and saves that information for future use. After this function has run, no further access to NCBI or the internet is required. The function is not limited to genes: it can also be used for other genomic features, such as RNA and UTR, that are available for the species specified by the user.


Usage

GetGeneList(Species, build, featuretype = c("GENE", "PSEUDO"),
            savefiles = FALSE, destfile)



Arguments

Species: The species to use, given by its scientific name. It must be supplied in quotation marks, and the genus and species can be separated by either a space or an underscore (e.g., "Bos taurus" or "Bos_taurus").


build: The species' current genome build to use in the analysis. It must be supplied in quotation marks as the full build name (e.g., "BUILD.6.1" or "ANNOTATION_RELEASE.104"). The functions in this package may not be compatible with earlier genome builds. To see if a current build is available for a particular species, go to:


featuretype: The feature type(s) to collect. The default includes genes and pseudogenes, but the user can specify more or fewer. Each feature type must be in quotation marks, and multiple feature types must be supplied as a character vector (e.g., c("GENE","RNA")). Options: any combination of GENE, PSEUDO, RNA, CDS, and UTR.


savefiles: Default is FALSE. If set to TRUE, both the original feature list downloaded from the NCBI database and the filtered feature list produced by the function are saved as text files. Options: must be either TRUE or FALSE.


destfile: The path to the folder in which files will be saved, specified in quotation marks (e.g., "C:/Temp/").
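Selecting feature types amounts to a set-membership test on each row's feature type. The sketch below illustrates this on a small toy table; the column names feature_name and feature_type are illustrative assumptions, and this is not the package's own code.

```r
# Allowed values, as listed for the featuretype argument
valid_types <- c("GENE", "PSEUDO", "RNA", "CDS", "UTR")

# Multiple types go in a character vector built with c();
# list("GENE", "PSEUDO") would fail the is.character() check below
featuretype <- c("GENE", "PSEUDO")
stopifnot(is.character(featuretype), all(featuretype %in% valid_types))

# Toy feature table (column names are assumptions for illustration)
features <- data.frame(
  feature_name = c("GeneA", "GeneA", "PseudoB", "GeneC"),
  feature_type = c("GENE", "RNA", "PSEUDO", "CDS"),
  stringsAsFactors = FALSE
)

# Keep only rows whose feature_type is one of the requested types
kept <- features[features$feature_type %in% featuretype, ]
kept$feature_name
```

With the default c("GENE", "PSEUDO"), only the GENE and PSEUDO rows survive. Adding "RNA" to the vector would also keep the RNA row.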


Details

When this function runs, the user is prompted after the download finishes to choose the primary genome build to use (if multiple builds are present) and the primary feature to focus on where information is duplicated. While the function is running, do not press "Enter" prematurely, because doing so will cause the function to run incorrectly.
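The two prompts correspond to two filtering steps: keeping one build and then keeping one row per feature. The non-interactive sketch below illustrates one plausible reading of those steps. The column names (feature_id, feature_type, group_label) and the "Primary"/"Alternate" labels are hypothetical, and fixed values stand in for the user's answers. GetGeneList's actual logic may differ.

```r
# Toy table: GeneID:1 appears under two builds, and GeneID:2 appears
# twice within one build under different feature types
features <- data.frame(
  feature_id   = c("GeneID:1", "GeneID:1", "GeneID:2", "GeneID:2"),
  feature_type = c("GENE", "GENE", "RNA", "GENE"),
  group_label  = c("Primary", "Alternate", "Primary", "Primary"),
  stringsAsFactors = FALSE
)

# Step 1: list the builds present and pick one
# (a fixed choice stands in for the interactive prompt)
builds <- unique(features$group_label)
primary_build <- builds[1]
one_build <- features[features$group_label == primary_build, ]

# Step 2: put rows of the chosen primary feature type first
# (FALSE sorts before TRUE, and order() is stable),
# then keep the first row for each feature_id
primary_type <- "GENE"
one_build <- one_build[order(one_build$feature_type != primary_type), ]
deduped <- one_build[!duplicated(one_build$feature_id), ]
deduped
```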

If savefiles = TRUE, both the original file from NCBI and the filtered file are saved in the destfile location. Once the function has run, the user can either use the returned information immediately or load it later from the saved file. In either case, the filtered feature list can be combined with marker data to run the MapMarkers function (see separate documentation), which is also part of this package.
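The save-now, reuse-later workflow can be sketched with base R's write.table() and read.table(). This example assumes a tab-delimited text file with a header row. The file name filtered_features.txt and the column names are hypothetical, and the files GetGeneList actually writes may be named and formatted differently.

```r
# Toy filtered feature list (column names are illustrative)
filtered <- data.frame(
  feature_name = c("GeneA", "PseudoB"),
  feature_type = c("GENE", "PSEUDO"),
  chr_start    = c(1000L, 5000L),
  stringsAsFactors = FALSE
)

# A temporary folder stands in for destfile; the file name is hypothetical
destfile <- tempdir()
path <- file.path(destfile, "filtered_features.txt")

# Save as tab-delimited text with a header row
write.table(filtered, file = path, sep = "\t", quote = FALSE,
            row.names = FALSE)

# In a later session: reload the list without contacting NCBI
reloaded <- read.table(path, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)
str(reloaded)
```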

The returned feature list contains 15 columns, based on the current NCBI file structure. The column headings and their descriptions are given in the "Value" section.


Value

Descriptions of the columns in the feature list returned by the GetGeneList function:


The NCBI taxonomy ID of the species.


The chromosome on which the feature is located in the specified species. This can include mitochondrial DNA, where applicable.


The start position of the feature on the chromosome.


The stop position of the feature on the chromosome.


The orientation of the feature on the chromosome (can be + or -).


The contig containing the feature, that is, the set of overlapping DNA fragments that together represent a contiguous region of sequence.


The start position of the feature on the contig specified.


The stop position of the feature on the contig specified.


The orientation of the feature on the contig specified (can be + or -).


The NCBI official abbreviation of the feature name.


The feature ID on the NCBI database.


The type of feature, which can be GENE, PSEUDO, RNA, CDS, or UTR.


The designated group label on the NCBI database.


The build in which the feature information is found.


The evidence code or information, if given.
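The start, stop, and orientation columns are what make the list useful for positional work. The sketch below uses them on a few toy rows and assumes 1-based inclusive coordinates. The column names (chr_start, chr_stop, chr_orient, and so on) and the 2 kb window are illustrative assumptions, and this is not the method MapMarkers uses.

```r
# Toy rows using a few of the columns described above
genes <- data.frame(
  feature_name = c("GeneA", "GeneB", "GeneC"),
  chromosome   = c("1", "1", "2"),
  chr_start    = c(1000L, 8000L, 300L),
  chr_stop     = c(4000L, 9500L, 700L),
  chr_orient   = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Feature length in bases (start and stop both inclusive)
genes$length <- genes$chr_stop - genes$chr_start + 1L

# The 5' end depends on orientation: start on "+", stop on "-"
genes$five_prime <- ifelse(genes$chr_orient == "+",
                           genes$chr_start, genes$chr_stop)

# Distance from a position to each chromosome 1 feature
# (0 if the position lies inside the feature)
pos  <- 6500L
chr1 <- genes[genes$chromosome == "1", ]
dist <- pmax(chr1$chr_start - pos, pos - chr1$chr_stop, 0L)

# Features within a hypothetical 2 kb window of the position
chr1$feature_name[dist <= 2000L]
```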


Note

For issues or problems with this function, please contact Lauren Hanna at [email protected]


Author(s)

Lauren L. Hulsman Hanna and David G. Riley


References

Hulsman Hanna, L. L., and D. G. Riley. 2014. Mapping genomic markers to closest feature using the R package Map2NCBI. Livest. Sci. 162:59-65.

See Also

Function: MapMarkers


Examples

# Example 1: Run the following example and, when prompted,
# choose [1], [n], and [1] to filter the build and feature
# information. This example is interactive and requires
# user input. Note that pressing "Enter" prematurely
# can cause the function to not run properly.
## Not run: 
GeneList <- GetGeneList("Bos taurus", build = "BUILD.6.1",
                        savefiles = TRUE, destfile = path.expand("~/"))

## End(Not run)

Map2NCBI documentation built on May 29, 2017, 3:46 p.m.