parse.INSDSeq: Parse Raw Gene Database Data from XML File

parse.INSDSeq R Documentation

Parse Raw Gene Database Data from XML File

Description

Parses a raw XML file downloaded from a gene database such as NCBI GenBank. The file contains sequence data and its accompanying metadata, i.e., the NCBI accession number and collection information about the sample from which each sequence came. Parsed sequences and metadata are returned in a matrix.

Usage

parse.INSDSeq(xml_file, do = NA, includeSeqs = F, cores = 1, parse.specimens = T, qualsToUse = c("specimen_voucher", "country", "collection_date", "lat_lon", "note", "collected_by", "isolate", "pop_variant"))

Arguments

xml_file

The raw XML file to be parsed, already read into R as an object using WRITE MORE HERE.

do

WRITE MORE HERE

includeSeqs

An optional logical value indicating whether sequences should be written into the output (TRUE) or ignored during parsing (FALSE).

cores

An optional integer giving the number of cores to use. Multithreading is supported only on Linux and macOS builds.

parse.specimens

An optional logical value indicating whether the specimen field associated with each sequence should be parsed and written to output.

qualsToUse

A vector of the categories of metadata to be parsed from the XML data and included in the output.

Details

The default value for do is NA. The default value for includeSeqs is FALSE. The default value for cores is 1. The default value for parse.specimens is TRUE. The default for qualsToUse is c('specimen_voucher', 'country', 'collection_date', 'lat_lon', 'note', 'collected_by', 'isolate', 'pop_variant').
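
Because multithreading is supported only on Linux and macOS, a script that runs on several platforms may want to choose the value passed to cores defensively. The following is a minimal sketch, not part of the package: choose_cores is a hypothetical helper, and it assumes that checking .Platform$OS.type is an adequate way to tell Linux and macOS apart from Windows.

```r
# Hypothetical helper (not part of the package): pick a value for `cores`
# that falls back to 1 where multithreading is not supported.
choose_cores <- function(requested = 2L) {
  # Assumption: "unix" covers Linux and macOS; Windows reports "windows".
  if (.Platform$OS.type != "unix") {
    return(1L)
  }
  # detectCores() can return NA when the count cannot be determined.
  available <- parallel::detectCores()
  if (is.na(available)) {
    return(1L)
  }
  # Never ask for more cores than the machine has, and never fewer than 1.
  as.integer(max(1L, min(requested, available)))
}

n_cores <- choose_cores(4L)
```

The resulting n_cores could then be supplied as the cores argument; on Windows it is always 1, which matches the function's default.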

Value

A matrix containing all parsed information, where each row contains the information associated with a single gene database entry. Columns are the categories of information parsed from the data.
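
As an illustration of working with a result of this shape, the sketch below builds a small stand-in matrix and filters it. Every value is a made-up placeholder, and the column names are an assumption: they are taken from the default qualsToUse entries, and the real output may name or order its columns differently.

```r
# Toy stand-in for the returned matrix: one row per database entry,
# one column per parsed category. All values are made-up placeholders.
parsed <- matrix(
  c("voucher-A", "CountryX", "2001-01-01",
    "voucher-B", NA,         "2002-02-02"),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("entry1", "entry2"),
    c("specimen_voucher", "country", "collection_date")  # assumed names
  )
)

# A data frame is often more convenient for column-wise filtering.
parsed_df <- as.data.frame(parsed, stringsAsFactors = FALSE)

# Keep only the entries that report a country.
with_country <- parsed_df[!is.na(parsed_df$country), , drop = FALSE]
```

Using drop = FALSE keeps the result a data frame even when a single row remains.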

Author(s)

Andrew Hipp and Kasey Pham

See Also

parse.specimen, WRITE MORE HERE: MULTICORE SUPPORT


andrew-hipp/morton documentation built on April 7, 2024, 12:15 p.m.