get_GIs: Get GenInfo Identifier numbers

View source: R/downloads.R

get_GIs    R Documentation

Get GenInfo Identifier numbers

Description

Retrieves NCBI sequence identifiers (GIs) for a given organism name or taxon identifier.

Usage

get_GIs(
  org.name,
  db,
  n.start = 1,
  n.stop = NULL,
  step = 99999,
  return.vector = TRUE,
  check.result = FALSE,
  term = NULL,
  temp.dir = NULL,
  delete.temp = FALSE,
  verbose = TRUE
)

get_GIs_fix(
  gis.list,
  org.name,
  db,
  n.start = 1,
  n.stop = NULL,
  step = 99999,
  term = NULL,
  temp.dir = NULL,
  delete.temp = FALSE,
  verbose = TRUE
)

Arguments

org.name

character; scientific name or taxon identifier (written as "txid0000") of the organism/taxon.

db

character; NCBI database for search. See entrez_dbs() for possible values.

n.start

integer; index of the first GI to download (numbering starts with 1). Default is 1.

n.stop

integer; index of the last GI to download. Default is NULL, which retrieves all available GIs.

step

integer; number of GIs downloaded in one block (the block size). Default is 99999.

return.vector

logical; whether to return GI numbers as a single character vector (TRUE, the default) or as a list of character vectors, one per block (FALSE).

check.result

logical; whether to check that the download completed correctly and reload missing blocks. Default is FALSE.

term

character; custom search query to use instead of the default organism query (see Details).

temp.dir

character; path to the directory for temporary downloaded files (Windows only).

delete.temp

logical; whether to delete the temporary downloaded files (Windows only; the directory itself is not deleted).

verbose

logical; whether to show messages.

gis.list

list of previously downloaded GI vectors (one vector per block), e.g. as returned by get_GIs() with return.vector = FALSE.

Details

This function sends a query to an NCBI database and returns the sequence identifiers that match it. By default the query is the organism, so the function returns GI numbers for all sequences associated with the requested organism. For example, if org.name = "Homo sapiens", the function downloads GI numbers for all sequences matching the query "Homo sapiens[Organism]". For any other query, use the term parameter.
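A minimal sketch of the default query string, assuming org.name is simply suffixed with the [Organism] field tag as in the example above. The helper build_organism_query is hypothetical and not part of disprose; how a non-NULL term is handled is not shown. In rentrez, a string like this is what would be passed as the term argument of entrez_search().

```r
# Hypothetical helper (not part of disprose): builds the default organism
# query described above from a scientific name or "txid" identifier.
build_organism_query <- function(org.name) {
  stopifnot(is.character(org.name), length(org.name) == 1)
  paste0(org.name, "[Organism]")
}

build_organism_query("Homo sapiens")  # "Homo sapiens[Organism]"
build_organism_query("txid9606")      # "txid9606[Organism]"
```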

The function downloads GI numbers piecemeal, in blocks. The size of each block is defined by the step parameter. This is useful if the download is interrupted for any reason: later, only the missing blocks need to be reloaded instead of the entire data set. By default, all available GI numbers are downloaded, but you may also choose the start and end positions by specifying the n.start and n.stop parameters. Numbering starts with 1, not 0. At the end, the resulting list of blocks (a list of character vectors) is unlisted into one character vector. You may prevent this by setting return.vector = FALSE. Regardless of the return.vector setting, the list of blocks is returned if the download was compromised.
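A minimal sketch of how the range n.start..n.stop might be split into blocks of size step. The helper block_bounds is hypothetical, and the mapping onto NCBI ESearch paging parameters (0-based retstart, retmax) is an assumption about how such a block could be requested, not a description of disprose internals. When n.stop is NULL, the total number of matching records would first have to be known, for example from the hit count of a search.

```r
# Hypothetical helper (not part of disprose): splits the 1-based range
# n.start..n.stop into consecutive blocks of at most `step` GIs.
block_bounds <- function(n.start, n.stop, step) {
  stopifnot(n.start >= 1, n.stop >= n.start, step >= 1)
  starts <- seq(n.start, n.stop, by = step)
  stops  <- pmin(starts + step - 1, n.stop)  # last block may be shorter
  data.frame(
    start    = starts,
    stop     = stops,
    retstart = starts - 1,         # ESearch offsets are 0-based
    retmax   = stops - starts + 1  # number of GIs requested in the block
  )
}

block_bounds(n.start = 1, n.stop = 10, step = 4)
#   start stop retstart retmax
# 1     1    4        0      4
# 2     5    8        4      4
# 3     9   10        8      2
```

With the Examples settings below (n.start = 1, n.stop = 3, step = 1) the same logic yields three one-GI blocks.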

If the download was corrupted, you may use the get_GIs_fix() function to reload the missing blocks. Pass the corrupted list of blocks in the gis.list parameter. You may also check and reload data while get_GIs() is running by setting check.result = TRUE.
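The documentation does not say how get_GIs_fix() detects corrupted blocks. The sketch below assumes that a failed block shows up as an empty or NULL element of gis.list and simply reports which blocks would need reloading; the helper find_missing_blocks and the toy GI values are hypothetical. Blocks that are present but incomplete would additionally need a size check.

```r
# Hypothetical helper (not part of disprose): indices of blocks in a list
# of GI vectors that are NULL or empty, i.e. candidates for reloading.
find_missing_blocks <- function(gis.list) {
  which(lengths(gis.list) == 0L)  # lengths(NULL) and lengths(character(0)) are 0
}

# Toy list of three blocks in which the second block failed to download
gis.list <- list(c("1", "2"), character(0), c("5", "6"))
find_missing_blocks(gis.list)  # 2
```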

The function checks the user's OS type. On Windows, temporary files are created while downloading, so the temp.dir and delete.temp parameters should be set. This works around the "routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version" error by using curl instead of RCurl, but it slows the function down. If the temp.dir directory does not exist, it is created and is not removed afterwards (only the temporary files are deleted if delete.temp = TRUE).
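A minimal sketch of the directory and file lifecycle described above, using base R only. The helper names, the file naming scheme and the placeholder file contents are hypothetical, and the curl-based download itself is not reproduced; .Platform$OS.type is the base R way to detect Windows.

```r
# Hypothetical helpers (not part of disprose) illustrating the Windows-only
# temp.dir / delete.temp behaviour described above.
on_windows <- function() .Platform$OS.type == "windows"

write_and_clean_blocks <- function(temp.dir, n.blocks, delete.temp = FALSE) {
  # Create temp.dir if it is missing; the directory itself is never removed
  if (!dir.exists(temp.dir)) dir.create(temp.dir, recursive = TRUE)
  files <- file.path(temp.dir, sprintf("block_%d.txt", seq_len(n.blocks)))
  # Stand-in for the per-block files a curl-based download would write
  for (f in files) writeLines("placeholder", f)
  # With delete.temp = TRUE only the files are deleted, not the directory
  if (delete.temp) file.remove(files)
  invisible(files)
}

tmp <- file.path(tempdir(), "gis_blocks")
files <- write_and_clean_blocks(tmp, n.blocks = 3, delete.temp = TRUE)
dir.exists(tmp)          # TRUE: directory kept
any(file.exists(files))  # FALSE: block files removed
```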

While running, the functions turn scientific notation off and then back on.
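The documentation does not say why scientific notation is switched off; a plausible reason (an assumption) is that R would otherwise format a large index such as 100000 as "1e+05". A minimal sketch of temporarily discouraging scientific notation with options(scipen = ...) and restoring the caller's setting via on.exit(); the helper format_fixed is hypothetical.

```r
# Hypothetical helper (not part of disprose): formats numbers with scientific
# notation discouraged, then restores the caller's scipen option.
format_fixed <- function(x) {
  old <- options(scipen = 999)       # options() returns the previous value
  on.exit(options(old), add = TRUE)  # restore it even if format() fails
  format(x)
}

format(100000)        # "1e+05" with the default scipen = 0
format_fixed(100000)  # "100000"
```

Restoring the previous value, rather than resetting scipen to 0, leaves any user-defined setting intact.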

Value

get_GIs() returns a character vector of GI numbers. If return.vector = FALSE or data are missing, a list of character vectors is returned.

get_GIs_fix() returns a list of character vectors.
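Because get_GIs() can return either shape, calling code may want to normalise the result. A minimal sketch, assuming an incomplete download is signalled by empty blocks in the list; the helper as_gi_vector is hypothetical and not part of disprose.

```r
# Hypothetical helper (not part of disprose): turns either return shape of
# get_GIs() into one character vector, refusing if any block is empty.
as_gi_vector <- function(res) {
  if (!is.list(res)) return(res)  # already a single character vector
  empty <- lengths(res) == 0L
  if (any(empty)) {
    stop("blocks ", paste(which(empty), collapse = ", "),
         " are empty; consider reloading them with get_GIs_fix()")
  }
  unlist(res, use.names = FALSE)  # concatenate blocks in order
}

as_gi_vector(list(c("1", "2"), "3"))  # "1" "2" "3"
```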

Functions

  • get_GIs: Retrieves NCBI sequence identifiers (GIs) for a given organism name or taxon identifier.

  • get_GIs_fix: Checks the downloads and tries to retrieve the compromised data.

Author(s)

Elena N. Filatova

Examples

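library(disprose)
# Retrieve the first three human (txid9606) nucleotide GIs, one GI per block,
# and keep the result as a list of blocks
gi.list <- get_GIs(org.name = "txid9606", db = "nucleotide",
                   n.start = 1, n.stop = 3, step = 1,
                   return.vector = FALSE, check.result = TRUE,
                   temp.dir = tempdir(), delete.temp = TRUE)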


disprose documentation built on March 19, 2022, 2:15 a.m.
