metascope_id: Identify which genomes are represented in a sample

View source: R/MetaScope_ID.R

metascope_id R Documentation

Identify which genomes are represented in a sample

Description

This function will read in a .bam or .rds file, annotate the taxonomy and genome names, reduce the mapping ambiguity using a mixture model, and output a .csv file with the results. Currently, it assumes that the genome library/.bam files use NCBI accession names for reference names (rnames in .bam file).

Usage

metascope_id(
  input_file,
  input_type = "bam",
  aligner = "subread",
  NCBI_key = NULL,
  out_file = paste(tools::file_path_sans_ext(input_file), ".metascope_id.csv", sep =
    ""),
  EMconv = 1/10000,
  EMmaxIts = 25,
  num_species_plot = NULL
)

Arguments

input_file

Path to the .bam or .rds file to be analyzed.

input_type

Extension of file input. Should be either "bam" or "rds". Default is "bam".

aligner

The aligner that was used to create the .bam file. Default is "subread", but can also be set to "bowtie" or "other".

NCBI_key

(character) NCBI Entrez API key. Optional. See taxize::use_entrez(). Because this function makes a high number of requests to NCBI, it is less prone to errors if you supply an NCBI key. You may pass the key as a string or set it as ENTREZ_KEY in .Renviron.
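As a minimal sketch of the two options described above, the key can be set for the current session with Sys.setenv() (a session-only stand-in for a line in .Renviron) and then read back to pass explicitly. The key value below is a placeholder, not a real key, and the metascope_id() call is left commented out because it needs a real input file.

```r
# Placeholder value (assumption): replace with your own NCBI Entrez API key.
Sys.setenv(ENTREZ_KEY = "your_ncbi_api_key")

# Read the key back from the environment so it can be passed explicitly.
key <- Sys.getenv("ENTREZ_KEY")

# Pass the key as a string through the NCBI_key argument:
# metascope_id(input_file = bamPath, NCBI_key = key)
```

To make the setting persist across sessions, the same ENTREZ_KEY name would go in .Renviron, as noted above.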

out_file

The name of the .csv output file. Defaults to the input_file path with its file extension removed, plus ".metascope_id.csv".
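The default can be reproduced with the same expression that appears in the Usage section. The input path below is hypothetical and serves only to illustrate the naming; note that only the last extension is stripped and the directory part is kept.

```r
# Hypothetical input path, used only to illustrate the default naming.
input_file <- "results/sample1.filtered.bam"

# Same expression as the out_file default in the Usage section.
out_file <- paste(tools::file_path_sans_ext(input_file),
                  ".metascope_id.csv", sep = "")

print(out_file)
# "results/sample1.filtered.metascope_id.csv"
```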

EMconv

The convergence parameter of the EM algorithm. Default set at 1/10000.

EMmaxIts

The maximum number of EM iterations, regardless of whether the convergence criterion (EMconv) has been met. Default is 25. If set to 0, the algorithm skips the EM step and summarizes the .bam file 'as is'.
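To illustrate how a convergence threshold and an iteration cap interact, here is a toy EM loop for a simple mixture model over read-to-genome assignments. This is a sketch under stated assumptions, not MetaScope's implementation: the function em_reassign(), the hits matrix, and the stopping rule (largest change in estimated proportions) are all hypothetical. The conv and max_its defaults merely mirror the EMconv and EMmaxIts values in the Usage section.

```r
# Toy read-by-genome indicator matrix (hypothetical data):
# reads 1-2 hit only genome A, read 3 hits only B, reads 4-5 hit both.
hits <- matrix(c(1, 0,
                 1, 0,
                 0, 1,
                 1, 1,
                 1, 1),
               ncol = 2, byrow = TRUE,
               dimnames = list(paste0("read", 1:5), c("A", "B")))

# Hypothetical helper: estimates genome proportions by EM.
em_reassign <- function(hits, conv = 1/10000, max_its = 25) {
  # Start from uniform genome proportions.
  pi_hat <- rep(1 / ncol(hits), ncol(hits))
  names(pi_hat) <- colnames(hits)
  its <- 0
  while (its < max_its) {
    # E-step: share each read among its candidate genomes,
    # weighted by the current proportions.
    w <- sweep(hits, 2, pi_hat, `*`)
    w <- w / rowSums(w)
    # M-step: new proportions are the mean weight per genome.
    pi_new <- colMeans(w)
    its <- its + 1
    change <- max(abs(pi_new - pi_hat))
    pi_hat <- pi_new
    # Stop early once the update is smaller than the threshold.
    if (change < conv) break
  }
  list(proportions = pi_hat, iterations = its)
}

fit <- em_reassign(hits)
print(fit$proportions)
print(fit$iterations)
```

In this toy setting the ambiguous reads are gradually shifted toward genome A, which has more uniquely mapped reads, and the loop stops as soon as either the change falls below conv or max_its iterations have run.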

num_species_plot

The number of genome coverage plots to be saved. Default is NULL, which saves coverage plots for the ten most highly abundant species.

Value

This function writes a .csv file containing annotated read counts for the genomes with mapped reads. The function's return value is the name of the output .csv file.
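Because the return value is a file name, the results can be loaded with read.csv(). The sketch below assumes the MetaScope package is installed and that NCBI is reachable; writing to a temporary file through out_file is an illustrative choice, and no particular column layout of the .csv is assumed.

```r
# Write the results to a temporary file instead of next to the input.
out_csv <- tempfile(fileext = ".csv")

# Only run when MetaScope is available (assumption: package installed).
if (requireNamespace("MetaScope", quietly = TRUE)) {
  bamPath <- system.file("extdata", "subread_target.filtered.bam",
                         package = "MetaScope")
  # metascope_id() returns the name of the .csv file it wrote.
  csv_name <- MetaScope::metascope_id(input_file = bamPath,
                                      aligner = "subread",
                                      out_file = out_csv,
                                      num_species_plot = 0)
  # Load the annotated read counts and inspect the first rows.
  results <- read.csv(csv_name)
  print(head(results))
}
```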

Examples

#### Align reads to reference library and then apply metascope_id()

## Assuming filtered bam files already exist

## Subread aligned bam file

## Create object with path to filtered subread bam file
bamPath <- system.file("extdata","subread_target.filtered.bam",
                       package = "MetaScope")

## Run metascope id with the aligner option set to subread
metascope_id(input_file = bamPath, aligner = "subread",
             num_species_plot = 0)

## Bowtie aligned bam file

## Create object with path to filtered bowtie bam file
bamPath <- system.file("extdata","bowtie_target.filtered.bam",
                       package = "MetaScope")

## Run metascope id with the aligner option set to bowtie
metascope_id(input_file = bamPath, aligner = "bowtie",
             num_species_plot = 0)

## Different or unknown aligned bam file

## Create object with path to unknown origin bam file
bamPath <- system.file("extdata","subread_target.filtered.bam",
                       package = "MetaScope")

## Run metascope id with the aligner option set to other
metascope_id(input_file = bamPath, aligner = "other",
             num_species_plot = 0)


compbiomed/MetaScope documentation built on Aug. 9, 2022, 10:41 a.m.