query: To get a list of sequence names from an ACNUC data base...


query    R Documentation

To get a list of sequence names from an ACNUC data base located on the web

Description

This is a major command of the package. It executes all sequence retrievals, using any selection criteria the database allows. The sequences come from an ACNUC database located on the web and are transferred through a socket connection. The command produces the list of all sequence names that fit the required criteria. The sequence names are objects of class SeqAcnucWeb.

Usage

query(listname, query, socket = autosocket(),
invisible = TRUE, verbose = FALSE, virtual = FALSE)

Arguments

listname

The name of the list, as a quoted character string.

query

A quoted character string containing the request, written with the syntax given in the Details section.

socket

an object of class sockconn connecting to a remote ACNUC database (default is a socket to the last opened database).

invisible

if FALSE, the result is returned visibly.

verbose

if TRUE, verbose mode is on

virtual

if TRUE, no attempt is made to retrieve the information about all the elements of the list. In this case, the req component of the list is set to NA.

Details

The query language defines several selection criteria and operations between lists of elements matching criteria. It mainly creates lists of sequences, but also lists of species (or, more generally, taxa) and of keywords. See https://doua.prabi.fr/databases/acnuc/cfonctions.html#QUERYLANGUAGE for the latest description of the query language.

Selection criteria (no space before the = sign) are:

SP=taxon

seqs attached to taxon or any other below in tree; @ wildcard possible

TID=id

seqs attached to the given numerical NCBI taxon id

K=keyword

seqs attached to keyword or any other below in tree; @ wildcard possible

T=type

seqs of specified type

J=journalname

seqs published in journal specified using defined journal code

R=refcode

seqs from reference specified such as in jcode/volume/page (e.g., JMB/13/5432)

AU=name

seqs from references having specified author (only last name, no initial)

AC=accessionno

seqs attached to specified accession number

N=seqname

seqs of given name (ID or LOCUS); @ wildcard possible

Y=year

seqs published in specified year; > and < can be used instead of =

O=organelle

seqs from specified organelle named following defined code (e.g., chloroplast)

M=molecule

seqs from specified molecule as named in ID or LOCUS annotation records

ST=status

seqs from specified data class (EMBL) or review level (UniProt)

F=filename

seqs whose names are in the given file, one name per line (unimplemented; use clfcd instead)

FA=filename

seqs attached to the accession numbers in the given file, one number per line (unimplemented; use clfcd instead)

FK=filename

produces the list of keywords named in the given file, one keyword per line (unimplemented; use clfcd instead)

FS=filename

produces the list of species named in the given file, one species per line (unimplemented; use clfcd instead)

listname

the named list that must have been previously constructed
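Each criterion above is a plain FIELD=value string. As a minimal sketch, the hypothetical helper `acnuc_criterion()` below (not part of seqinr) assembles one such string while enforcing the "no space before the = sign" rule. The field codes are copied from the list above, leaving out the unimplemented F, FA, FK and FS; the `op` argument assumes that only Y accepts `>` or `<`.

```r
# Hypothetical helper, NOT part of seqinr: builds one elementary
# ACNUC selection criterion such as "SP=Borrelia burgdorferi".
# F, FA, FK and FS are left out because they are unimplemented.
acnuc_fields <- c("SP", "TID", "K", "T", "J", "R", "AU", "AC", "N",
                  "Y", "O", "M", "ST")

acnuc_criterion <- function(field, value, op = "=") {
  field <- toupper(trimws(field))  # the query language is case-insensitive
  if (!field %in% acnuc_fields) {
    stop("unknown selection criterion: ", field)
  }
  # Assumption: only the year criterion Y accepts > or < instead of =
  if (op != "=" && !(field == "Y" && op %in% c("<", ">"))) {
    stop("operator '", op, "' is not allowed for ", field)
  }
  # No blank may appear before the comparison sign
  paste0(field, op, trimws(value))
}

acnuc_criterion("sp", "Borrelia burgdorferi")  # "SP=Borrelia burgdorferi"
acnuc_criterion("y", "1995", op = ">")         # "Y>1995"
```

The string returned by the first call could then be passed as the query argument, e.g. query("bb", acnuc_criterion("sp", "Borrelia burgdorferi")).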

Operators (always followed and preceded by blanks or parentheses) are:

AND

intersection of the 2 list operands

OR

union of the 2 list operands

NOT

complementation of the single list operand

PAR

compute the list of parent seqs of members of the single list operand

SUB

add subsequences of members of the single list operand

PS

project to species: list of species attached to member sequences of the operand list

PK

project to keywords: list of keywords attached to member sequences of the operand list

UN

unproject: list of seqs attached to members of the species or keywords list operand

SD

compute the list of species placed in the tree below the members of the species list operand

KD

compute the list of keywords placed in the tree below the members of the keywords list operand
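Operators combine criteria, or previously built lists, into one query string. This sketch uses two hypothetical helpers, `acnuc_binary()` and `acnuc_unary()` (not part of seqinr). They wrap every operand in parentheses, so each operator is always surrounded by blanks or parentheses. The sketch assumes, as the rule above suggests, that parentheses may be used to group operands.

```r
# Hypothetical helpers, NOT part of seqinr: join operands with an ACNUC
# operator, wrapping each operand in parentheses.
acnuc_binary <- function(op, lhs, rhs) {
  op <- toupper(op)
  stopifnot(op %in% c("AND", "OR"))  # the two binary operators
  paste0("(", lhs, ") ", op, " (", rhs, ")")
}

acnuc_unary <- function(op, operand) {
  op <- toupper(op)
  # operators that take a single list operand
  stopifnot(op %in% c("NOT", "PAR", "SUB", "PS", "PK", "UN", "SD", "KD"))
  paste0(op, " (", operand, ")")
}

# Intersection of two criteria
q1 <- acnuc_binary("and", "SP=Borrelia burgdorferi", "T=CDS")
q1  # "(SP=Borrelia burgdorferi) AND (T=CDS)"

# Species attached to the members of a previously built list named bb
q2 <- acnuc_unary("ps", "bb")
q2  # "PS (bb)"
```

Nesting the helpers expresses longer queries, e.g. acnuc_binary("AND", "SP=Homo sapiens", acnuc_unary("NOT", "T=CDS")).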

The query language is case-insensitive. Three operators (AND, OR, NOT) can be ambiguous because they can also occur within valid criterion values. Such ambiguities can be resolved by enclosing elementary selection criteria in escaped double quotes.
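In an R string literal, each double quote surrounding a criterion must be escaped with a backslash. The hypothetical helper `acnuc_quote()` below (not part of seqinr) adds the quotes. The values are only illustrative: "Or" stands for any author name that coincides with an operator word.

```r
# Hypothetical helper, NOT part of seqinr: wraps one elementary criterion
# in double quotes so that AND, OR or NOT inside its value is not read
# as an operator.
acnuc_quote <- function(criterion) {
  paste0("\"", criterion, "\"")
}

q <- paste(acnuc_quote("AU=Or"), "AND", acnuc_quote("SP=Homo sapiens"))
q            # printed by R as "\"AU=Or\" AND \"SP=Homo sapiens\""
writeLines(q)  # the query text itself: "AU=Or" AND "SP=Homo sapiens"
```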

Value

The result is assigned directly to an object named listname in the user workspace. It is an object of class qaw, a list with the following 6 components:

call

the original call

name

the ACNUC list name

nelem

the number of elements (for instance sequences) in the ACNUC list

typelist

the type of the elements of the list. Could be SQ for a list of sequence names, KW for a list of keywords, SP for a list of species names.

req

a list of sequence names that fit the required criteria, or NA when the function is called with virtual = TRUE

socket

the socket connection that was used
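The components above can be read with `$`. As a sketch, the hypothetical function `describe_qaw()` (not part of seqinr) summarises a qaw object and detects the virtual case, where req is NA. It is exercised here on a hand-built stand-in list that has only the documented component names and placeholder values, not real query output. After a real call it would be used as describe_qaw(bb).

```r
# Hypothetical helper, NOT part of seqinr: one-line summary of a qaw list.
describe_qaw <- function(x) {
  types <- c(SQ = "sequences", KW = "keywords", SP = "species")
  what <- if (x$typelist %in% names(types)) types[[x$typelist]] else x$typelist
  # With virtual = TRUE the req component is a single NA
  virtual <- length(x$req) == 1 && is.na(x$req)
  sprintf("list %s: %d %s%s", x$name, as.integer(x$nelem), what,
          if (virtual) " (virtual, names not retrieved)" else "")
}

# Stand-in with the documented component names; placeholder values only
mock <- list(call = NULL, name = "bb", nelem = 0L, typelist = "SQ",
             req = NA, socket = NULL)
describe_qaw(mock)  # "list bb: 0 sequences (virtual, names not retrieved)"
```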

Note

Most of the documentation was imported from ACNUC help files written by Manolo Gouy.

Author(s)

J.R. Lobry, D. Charif

References

Gouy, M., Milleret, F., Mugnier, C., Jacobzone, M., Gautier, C. (1984) ACNUC: a nucleic acid sequence data base and analysis system. Nucl. Acids Res., 12:121-127.
Gouy, M., Gautier, C., Attimonelli, M., Lanave, C., Di Paola, G. (1985) ACNUC - a portable retrieval system for nucleic acid sequence databases: logical and physical designs and usage. Comput. Appl. Biosci., 3:167-172.
Gouy, M., Gautier, C., Milleret, F. (1985) System analysis and nucleic acid sequence banks. Biochimie, 67:433-436.

citation("seqinr")

See Also

choosebank, getSequence, getName, crelistfromclientdata

Examples

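## Not run: 
# Needs an internet connection
choosebank("genbank")
bb <- query("bb", "sp=Borrelia burgdorferi")
# To get the names of the first 4 sequences:
sapply(bb$req[1:4], getName)
# To get the first 4 sequences, each as a single string:
sapply(bb$req[1:4], getSequence, as.string = TRUE)
# With virtual = TRUE only the list size is retrieved; bbv$req is NA:
bbv <- query("bbv", "sp=Borrelia burgdorferi", virtual = TRUE)
bbv$nelem
# Close the connection to the remote database
closebank()

## End(Not run)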

seqinr documentation built on May 29, 2024, 6:36 a.m.