assemble_gene_set: Assemble a list of genes from annotated GenBank files

assemble_gene_set    R Documentation

Assemble a list of genes from annotated GenBank files

Description

Genes are sometimes repeated within a genome, such as chloroplast genes in the inverted repeat. If 'drop_dups' is set to TRUE, genes with duplicate copies are excluded from the results (with a warning). This is useful for assembling chloroplast gene matrices of single-copy genes.

Usage

assemble_gene_set(accessions, genes, parallel = FALSE,
  drop_dups = TRUE)

Arguments

accessions

Vector of GenBank accession numbers of (partial) genomes that include the genes of interest

genes

Vector of gene names to assemble

parallel

Logical; should future_map be used to fetch genes in parallel?

drop_dups

Logical; should genes with duplicate copies be excluded from the results?

Details

When running in parallel ('parallel' is set to TRUE), it may be necessary to first set the parallel backend using plan (from the future package); otherwise the code will still run sequentially.

Value

List. Each item in the list is a gene, which contains a list of sequences of class DNAbin.

Examples

## Not run: 
# KP136830 is the GenBank accession no. for the Cystopteris protrusa plastome
# https://www.ncbi.nlm.nih.gov/nuccore/KP136830

# KY427346 is the GenBank accession no. for the Diplazium striatum plastome
# https://www.ncbi.nlm.nih.gov/nuccore/KY427346

# Assemble a list of DNA sequences for three genes from these two species.
# Note that psbA is duplicated since it is in the Inverted Repeat.

assemble_gene_set(
  c("KP136830", "KY427346"),
  c("accD", "atpA", "psbA", "not_a_proper_gene_name")
  )

assemble_gene_set(
  c("KP136830", "KY427346"),
  c("accD", "atpA", "psbA", "not_a_proper_gene_name"),
  drop_dups = FALSE
  )
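
# A minimal sketch of fetching in parallel. As noted in Details, a parallel
# backend may need to be set with plan() first. This assumes plan() refers to
# future::plan() and that the 'future' package is installed; the multisession
# backend and the reset to sequential afterwards are illustrative choices,
# not requirements stated by this page.

future::plan(future::multisession)

assemble_gene_set(
  c("KP136830", "KY427346"),
  c("accD", "atpA", "psbA"),
  parallel = TRUE
  )

# Restore sequential processing once done
future::plan(future::sequential)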

## End(Not run)


joelnitta/gbfetch documentation built on March 2, 2024, 7:03 p.m.