mineSeq: Read and download DNA sequences from GenBank

View source: R/mineSeq.R

mineSeq R Documentation

Read and download DNA sequences from GenBank

Description

An ape-based function that connects to the GenBank database, reads nucleotide sequences by accession number, and writes them to a FASTA-format file.

Usage

mineSeq(inputdf = NULL,
        gb.colnames = NULL,
        as.character = FALSE,
        verbose = TRUE,
        save = TRUE,
        dir = "RESULTS_mineSeq",
        filename = "GenBanK_seqs")

Arguments

inputdf

A dataframe object containing the taxon names in a 'Species' column, the voucher information in a 'Voucher' column, and the GenBank accessions for each gene in separate columns named after the corresponding gene. If the columns 'Species' and 'Voucher' are not provided in the dataframe, the function will use the taxonomy of the retrieved sequences as originally available in GenBank.

gb.colnames

A vector with column names within the inputdf dataframe corresponding to each gene, where the GenBank accession numbers are listed.

as.character

Logical controlling whether to return the sequences as character vectors (TRUE) or as an object of class "DNAbin" (FALSE, the default).

verbose

Logical; if FALSE, the messages reporting each step of the GenBank search are not printed in full to the console.

save

Logical; if TRUE, the mined DNA sequences will be saved to disk as a FASTA-format file.

dir

Path to the directory where the FASTA-format file of mined DNA sequences will be saved, provided that the argument save is set to TRUE. By default, a directory named RESULTS_mineSeq is created and the sequences are saved within a subfolder named after the current date.

filename

Name of the output file to be saved. The default file name is GenBanK_seqs.

Value

A list of DNA sequences made of vectors of class 'DNAbin', or of single characters (if as.character = TRUE) with two attributes (species and description).

Author(s)

Domingos Cardoso

Examples

## Not run: 
library(catGenes)

data(GenBank_accessions)

mineSeq(inputdf = GenBank_accessions,
        gb.colnames = c("ETS", "ITS", "matK", "petBpetD", "trnTF", "Xdh"),
        as.character = FALSE,
        verbose = TRUE,
        save = TRUE,
        dir = "RESULTS_mineSeq",
        filename = "GenBanK_seqs")

## End(Not run)
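The layout expected for inputdf can also be sketched by hand. The snippet below is only an illustration: the taxon names, voucher labels, and accession strings are hypothetical placeholders (not real GenBank records), and deriving gb.colnames with setdiff() is a convenience assumed here rather than something the package requires.

```r
# Hypothetical input table: taxon names, vouchers, and accession strings
# are placeholders for illustration, not real GenBank records.
accessions <- data.frame(
  Species = c("Genus_alpha", "Genus_beta"),
  Voucher = c("Collector 101", "Collector 202"),
  ITS     = c("XX000001", "XX000002"),
  matK    = c("XX000003", "XX000004"),
  stringsAsFactors = FALSE
)

# Assumption: every column other than 'Species' and 'Voucher' is a gene column.
gene_cols <- setdiff(names(accessions), c("Species", "Voucher"))
print(gene_cols)

# With real accession numbers in place, the call would look like:
# mineSeq(inputdf = accessions, gb.colnames = gene_cols, save = FALSE)
```

The mineSeq call is left commented out because it needs a network connection and valid accession numbers.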


domingoscardoso/catGenes documentation built on March 14, 2024, 9:21 p.m.