View source: R/parse.INSDSeq.R
parse.INSDSeq | R Documentation |
Parses a raw XML file downloaded from a gene database such as NCBI GenBank, containing sequence data and its accompanying metadata, i.e. the NCBI accession number and collection information about the sample from which each sequence came. Parsed sequences and metadata are returned in a matrix.
parse.INSDSeq(xml_file, do = NA, includeSeqs = F, cores = 1, parse.specimens = T, qualsToUse = c("specimen_voucher", "country", "collection_date", "lat_lon", "note", "collected_by", "isolate", "pop_variant"))
xml_file |
The raw XML file to be parsed, already read into R as an object using WRITE MORE HERE. |
do |
WRITE MORE HERE |
includeSeqs |
An optional logical value indicating whether sequences should be written into the output (TRUE) or ignored during parsing (FALSE). |
cores |
Optional; the number of cores to use. Multithreading is supported only on Linux and macOS builds. |
parse.specimens |
An optional logical value indicating whether the specimen field associated with each sequence should be parsed and written to output. |
qualsToUse |
A vector of the categories of metadata to be parsed from the XML data and included in the output. |
The default value for do is NA.
The default value for includeSeqs is FALSE.
The default value for cores is 1.
The default value for parse.specimens is TRUE.
The default value for qualsToUse is c('specimen_voucher', 'country', 'collection_date', 'lat_lon', 'note', 'collected_by', 'isolate', 'pop_variant').
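The platform restriction on the cores argument is typical of fork-based parallelism in R. The sketch below assumes, without confirmation from the source, that the function relies on something like parallel::mclapply; pick_cores is a hypothetical helper, not part of this package. It shows one way to choose a core count that is safe on any platform before passing it on.

```r
library(parallel)

# Hypothetical helper (not part of the package): clamp a requested core
# count to a value the current platform can use with forking.
pick_cores <- function(requested = 1) {
  if (.Platform$OS.type == "windows") {
    return(1L)  # mclapply() cannot fork on Windows
  }
  available <- detectCores()
  if (is.na(available)) available <- 1L  # detectCores() may return NA
  as.integer(max(1, min(requested, available)))
}

# Toy workload standing in for per-entry parsing.
squares <- mclapply(1:4, function(i) i^2, mc.cores = pick_cores(2))
```

A value chosen this way could then be supplied as the cores argument, so the same script would run unchanged on Windows with a single core.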
A matrix containing all parsed information, where each row contains the information associated with a single gene database entry. Columns are the categories of information parsed from the data.
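The shape of the return value can be illustrated with a minimal base R sketch. The records, entry names, and values below are invented placeholders, not real database entries, and the code only mimics the layout described above (one row per entry, one column per qualifier); it is not the function's actual implementation.

```r
# Hypothetical records: each element stands for one database entry and
# holds whichever qualifiers that entry happens to carry.
records <- list(
  entry1 = c(country = "Mexico", collection_date = "2001", isolate = "A1"),
  entry2 = c(country = "USA", specimen_voucher = "Voucher 12")
)

# A subset of the qualifier names listed for qualsToUse.
quals <- c("specimen_voucher", "country", "collection_date", "isolate")

# One row per entry, one column per qualifier; qualifiers an entry lacks
# are left as NA.
out <- t(vapply(records,
                function(rec) unname(rec[quals]),
                character(length(quals))))
colnames(out) <- quals
```

Because a matrix holds a single type, every cell here is a character string, and missing qualifiers appear as NA.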
Andrew Hipp and Kasey Pham
parse.specimen, WRITE MORE HERE: MULTICORE SUPPORT