metascope_id | R Documentation |
This function will read in a .bam or .csv.gz file, annotate the taxonomy and genome names, reduce the mapping ambiguity using a mixture model, and output a .csv file with the results. Currently, it assumes that the genome library/.bam files use NCBI accession names for reference names (rnames in .bam file).
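The mixture-model step can be pictured with a toy expectation-maximization (EM) loop. The sketch below is not MetaScope's implementation. It assumes a simple read-by-genome 0/1 alignment matrix, and the names `em_reassign`, `aligned`, `conv`, and `maxits` are hypothetical. `conv` and `maxits` only mirror the roles and defaults of `convEM` and `maxitsEM`.

```r
# Toy EM reassignment of ambiguously mapped reads (illustrative only,
# not the algorithm as implemented in MetaScope).
em_reassign <- function(aligned, conv = 1 / 10000, maxits = 25) {
  # Every read must align to at least one genome.
  stopifnot(is.matrix(aligned), all(rowSums(aligned) > 0))
  n_genomes <- ncol(aligned)
  pi_hat <- rep(1 / n_genomes, n_genomes)  # start from uniform proportions
  iterations <- 0
  repeat {
    iterations <- iterations + 1
    # E-step: share each read among the genomes it hit, weighted by pi_hat.
    weighted <- sweep(aligned, 2, pi_hat, `*`)
    gamma <- weighted / rowSums(weighted)
    # M-step: the new proportions are the mean responsibilities.
    pi_new <- colMeans(gamma)
    delta <- max(abs(pi_new - pi_hat))
    pi_hat <- pi_new
    # Stop on convergence or when the iteration cap is reached.
    if (delta < conv || iterations >= maxits) break
  }
  list(proportions = pi_hat,
       best_hit = max.col(gamma, ties.method = "first"),
       iterations = iterations,
       converged = delta < conv)
}

# Five reads, three genomes; read 4 maps to both genomeA and genomeB.
aligned <- rbind(
  c(1, 0, 0),
  c(1, 0, 0),
  c(1, 0, 0),
  c(1, 1, 0),
  c(0, 0, 1)
)
colnames(aligned) <- c("genomeA", "genomeB", "genomeC")
fit <- em_reassign(aligned)
fit$best_hit[4]  # index of the genome that receives the ambiguous read
```

In this toy input, genomeA already holds three unique reads, so the ambiguous read is pulled toward it as the estimated proportions are updated. A smaller `conv` or a larger `maxits` lets the loop run longer before it stops.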
metascope_id(
  input_file,
  input_type = "csv.gz",
  aligner = "bowtie2",
  db = "ncbi",
  db_feature_table = NULL,
  NCBI_key = NULL,
  out_dir = dirname(input_file),
  tmp_dir = dirname(input_file),
  convEM = 1/10000,
  maxitsEM = 25,
  update_bam = FALSE,
  num_species_plot = NULL,
  blast_fastas = FALSE,
  num_genomes = 100,
  num_reads = 50,
  quiet = TRUE
)
input_file |
The .bam or .csv.gz file of sample reads to be identified. |
input_type |
Extension of file input. Should be either "bam" or "csv.gz". Default is "csv.gz". |
aligner |
The aligner which was used to create the bam file. Default is "bowtie2" but can also be set to "subread" or "other". |
db |
Currently accepts one of |
db_feature_table |
If |
NCBI_key |
(character) NCBI Entrez API key. optional. See taxize::use_entrez(). Due to the high number of requests made to NCBI, this function will be less prone to errors if you obtain an NCBI key. You may enter the string as an input or set it as ENTREZ_KEY in .Renviron. |
out_dir |
The directory to which the .csv output file will be written.
Defaults to dirname(input_file). |
tmp_dir |
Path to a directory to which bam and updated bam files can be saved. Required. |
convEM |
The convergence parameter of the EM algorithm. Default set at 1/10000. |
maxitsEM |
The maximum number of EM iterations, regardless of whether
convEM has fallen below the threshold. Default set at 25. |
update_bam |
Whether to update the BAM file with new read assignments.
Default is FALSE. |
num_species_plot |
The number of genome coverage plots to be saved.
Default is NULL. |
blast_fastas |
Logical, whether or not to output fasta files for MetaBlast.
Default is FALSE. |
num_genomes |
Number of genomes to output fasta files for MetaBlast.
Default is 100. |
num_reads |
Number of reads per genome per fasta file for MetaBlast.
Default is 50. |
quiet |
Turns off most messages. Default is TRUE. |
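For the `NCBI_key` argument, it can help to confirm that a key is visible to the R session before starting a long run. The sketch below uses only base R. The helper name `has_entrez_key` is hypothetical, and `"your-key-here"` is a placeholder, not a real key. To set the key permanently, the equivalent `.Renviron` entry would be a line of the form `ENTREZ_KEY=your-key-here`.

```r
# Report whether an Entrez API key is visible to the current R session.
# (Hypothetical helper, not part of MetaScope or taxize.)
has_entrez_key <- function() {
  nzchar(Sys.getenv("ENTREZ_KEY"))
}

# Session-only alternative to editing .Renviron; replace the placeholder
# with your own key before uncommenting.
# Sys.setenv(ENTREZ_KEY = "your-key-here")

if (!has_entrez_key()) {
  message("ENTREZ_KEY is not set; consider passing NCBI_key to metascope_id().")
}
```

If the variable is set, `NCBI_key` can be left at its default of `NULL`. Otherwise the key can be passed directly as a string.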
This function writes a .csv file with annotated read counts for the genomes with mapped reads, and returns the name of that output .csv file. Depending on the parameters specified, it can also output an updated BAM file and fasta files for downstream use with MetaBLAST.
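Because the return value is a file name, the results can be loaded straight back into R. The following is a minimal sketch using base R only. `read_id_results` is a hypothetical helper, and the stand-in file and its column names (`genome`, `reads`) are invented so that the snippet runs without MetaScope. They are not the actual columns of the output file.

```r
# Read the table written by metascope_id(), given the file name it returned.
# (Hypothetical helper for illustration.)
read_id_results <- function(csv_path) {
  stopifnot(file.exists(csv_path))
  utils::read.csv(csv_path, stringsAsFactors = FALSE, check.names = FALSE)
}

# Stand-in file so the sketch is runnable; the columns are made up.
demo_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(genome = c("genomeA", "genomeB"), reads = c(40L, 2L)),
          demo_csv, row.names = FALSE)

results <- read_id_results(demo_csv)
str(results)
unlink(demo_csv)
```

In real use, the path passed to `read_id_results()` would be the value returned by `metascope_id()`.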
#### Align reads to reference library and then apply metascope_id()
## Assuming filtered bam files already exist
## Create temporary directory
file_temp <- tempfile()
dir.create(file_temp)
#### Subread aligned .csv.gz file
## Create object with path to filtered subread csv.gz file
filt_file <- "subread_target.filtered.csv.gz"
bamPath <- system.file("extdata", filt_file, package = "MetaScope")
file.copy(bamPath, file_temp)
## Run metascope id with the aligner option set to subread
metascope_id(input_file = file.path(file_temp, filt_file),
             aligner = "subread", num_species_plot = 0,
             input_type = "csv.gz")
#### Bowtie 2 aligned .csv.gz file
## Create object with path to filtered bowtie2 csv.gz file
bowtie_file <- "bowtie_target.filtered.csv.gz"
bamPath <- system.file("extdata", bowtie_file, package = "MetaScope")
file.copy(bamPath, file_temp)
## Run metascope id with the aligner option set to bowtie2
metascope_id(file.path(file_temp, bowtie_file), aligner = "bowtie2",
             num_species_plot = 0, input_type = "csv.gz")
## Remove temporary directory
unlink(file_temp, recursive = TRUE)