View source: R/parse.INSDSeq.R
parse.INSDSeq        R Documentation
Parse a raw XML file downloaded from a gene database such as NCBI GenBank, containing sequence data and its accompanying metadata (i.e., the NCBI accession number and collection information about the sample from which each sequence came). Parsed sequences and metadata are returned as a matrix.
parse.INSDSeq(xml_file, do = NA, includeSeqs = F, cores = 1, parse.specimens = T, qualsToUse = c("specimen_voucher", "country", "collection_date", "lat_lon", "note", "collected_by", "isolate", "pop_variant"))
xml_file
The raw XML file to be parsed, already read into R as an object using WRITE MORE HERE.
do
WRITE MORE HERE
includeSeqs
An optional logical value indicating whether sequences should be included in the output (TRUE) or skipped during parsing (FALSE).
cores
Optional; the number of cores to use. Multithreading is supported only on Linux and macOS.
parse.specimens
An optional logical value indicating whether the specimen field associated with each sequence should be parsed and included in the output.
qualsToUse
A character vector naming the metadata categories to parse from the XML data and include in the output.
The default value for do is NA.
The default value for includeSeqs is FALSE.
The default value for cores is 1.
The default value for parse.specimens is TRUE.
The default value for qualsToUse is c('specimen_voucher', 'country', 'collection_date', 'lat_lon', 'note', 'collected_by', 'isolate', 'pop_variant').
A matrix containing all parsed information, with one row per gene database entry and one column per category of information parsed from the data.
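To illustrate the shape of this matrix, here is a minimal base-R sketch that parses a tiny INSDSeq-style XML string into one row per entry and one column per requested qualifier, with an optional sequence column mirroring includeSeqs. It assumes the input is held as a single character string and uses element names from NCBI's INSDSeq XML format; `parse_insdseq_sketch`, `extract_tag`, `first_tag`, and the toy records are hypothetical and are not the package's implementation. Regular expressions are used only to keep the example dependency-free; real INSDSeq files are better handled with an XML parser.

```r
# Contents of every <tag>...</tag> element in x (lazy match, dot matches newlines)
extract_tag <- function(x, tag) {
  pattern <- paste0("(?s)<", tag, ">(.*?)</", tag, ">")
  hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
  sub(pattern, "\\1", hits, perl = TRUE)
}

# First match of a tag, or NA when the element is absent
first_tag <- function(x, tag) {
  hits <- extract_tag(x, tag)
  if (length(hits) == 0) NA_character_ else hits[1]
}

# Hypothetical parser: one row per <INSDSeq> entry, one column per qualifier
parse_insdseq_sketch <- function(xml, quals = c("country", "collection_date"),
                                 include_seqs = FALSE) {
  records <- extract_tag(xml, "INSDSeq")
  rows <- lapply(records, function(rec) {
    blocks <- extract_tag(rec, "INSDQualifier")
    q_names <- vapply(blocks, first_tag, character(1),
                      tag = "INSDQualifier_name", USE.NAMES = FALSE)
    q_values <- vapply(blocks, first_tag, character(1),
                       tag = "INSDQualifier_value", USE.NAMES = FALSE)
    # First value of each requested qualifier; NA where the entry lacks it
    vals <- q_values[match(quals, q_names)]
    names(vals) <- quals
    row <- c(accession = first_tag(rec, "INSDSeq_primary-accession"), vals)
    if (include_seqs) row <- c(row, sequence = first_tag(rec, "INSDSeq_sequence"))
    row
  })
  do.call(rbind, rows)
}

# Two made-up entries, each carrying a different qualifier
toy <- paste0(
  "<INSDSet>",
  "<INSDSeq><INSDSeq_primary-accession>XX000001</INSDSeq_primary-accession>",
  "<INSDSeq_feature-table><INSDFeature><INSDFeature_quals>",
  "<INSDQualifier><INSDQualifier_name>country</INSDQualifier_name>",
  "<INSDQualifier_value>USA</INSDQualifier_value></INSDQualifier>",
  "</INSDFeature_quals></INSDFeature></INSDSeq_feature-table>",
  "<INSDSeq_sequence>acgt</INSDSeq_sequence></INSDSeq>",
  "<INSDSeq><INSDSeq_primary-accession>XX000002</INSDSeq_primary-accession>",
  "<INSDSeq_feature-table><INSDFeature><INSDFeature_quals>",
  "<INSDQualifier><INSDQualifier_name>collection_date</INSDQualifier_name>",
  "<INSDQualifier_value>2010</INSDQualifier_value></INSDQualifier>",
  "</INSDFeature_quals></INSDFeature></INSDSeq_feature-table>",
  "<INSDSeq_sequence>ggcc</INSDSeq_sequence></INSDSeq>",
  "</INSDSet>"
)

m <- parse_insdseq_sketch(toy, include_seqs = TRUE)
print(m)
```

Qualifiers an entry does not carry come back as NA, so every row has the same columns and the result stays a rectangular matrix.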
Andrew Hipp and Kasey Pham
parse.specimen, WRITE MORE HERE: MULTICORE SUPPORT