detect: Detect a reference sequence
In carlopacioni/amplicR: An R package to process amplicon data

detect

R Documentation

Detect a reference sequence

Description

detect takes either the output of data.proc, or loads it from an .rda file, and compares the sequences with each reference sequence, reporting the number of mismatches computed with srdistance from the package ShortRead.

Usage

detect(data = NULL, rda.in = NULL, dir.out = NULL, ref_seqs)

Arguments

`data`	The output from `data.proc`
`rda.in`	The fully qualified (i.e. including the path) name of the .rda file where the output from `data.proc` is saved
`dir.out`	The path where the results are saved. If NULL and data is also NULL, the directory where the .rda file is located is used. If no file path is provided, an interactive window is used to select the folder
`ref_seqs`	A named character vector with the reference sequence(s)
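
As a minimal sketch of what a named character vector for `ref_seqs` might look like with more than one reference, the snippet below builds one in base R. The sequences and the names `refA` and `refB` are short placeholders invented for illustration, not real references.

```r
# Hypothetical placeholder sequences, for illustration only
ref_seqs <- c(refA = "ACGTACGTAC",
              refB = "ACGTTCGTAC")

# Every element should carry a name, since the names identify
# each reference sequence in the results
stopifnot(is.character(ref_seqs), !is.null(names(ref_seqs)))
names(ref_seqs)
```

Naming the elements inline with `c(name = value)` is equivalent to assigning `names()` afterwards.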

Details

The output from data.proc can be passed with data. If no data is passed to detect, then it will load the .rda file passed with rda.in. If rda.in=NULL, then an interactive window will open to select the location of the file.

If both dir.out=NULL and rda.in=NULL, an interactive window asks for the path where the results are saved. If the .rda file path is provided, the folder where the .rda file is located is used as the output folder.
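
The output-folder rules described here can be sketched as a small helper. This is only an illustration of the documented behaviour, not the package's code: `resolve_dir_out` is a hypothetical name, the file path is made up, and the interactive prompt is replaced by returning `NA`. The sketch also ignores the `data` argument.

```r
# Hypothetical helper mirroring the documented rules; not part of amplicR
resolve_dir_out <- function(rda.in = NULL, dir.out = NULL) {
  # An explicit output path is used as given
  if (!is.null(dir.out)) return(dir.out)
  # Otherwise fall back to the folder containing the .rda file
  if (!is.null(rda.in)) return(dirname(rda.in))
  # Neither is available: detect would open an interactive window here
  NA_character_
}

# Made-up path, for illustration only
resolve_dir_out(rda.in = "/path/to/data.proc.rda")
```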

A summary of the number of sequences found and the minimum number of mismatches within each sample is returned as a matrix, one for each reference sequence, with the same layout as the sequence table. There will be as many tables as the length of the character vector passed with ref_seqs. These results are also written to disk as text files, along with the alignments of the provided sequences with the reference sequence (in the folder "Final_alns"). The alignments are built using PairwiseAlignments from the package Biostrings.

Lastly, a detect_table is returned (and written to disk) in which each row is a sequence and each reference sequence is a column holding the number of mismatches. The first column ("nSeq_tot") is the total number of reads for each sequence.
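
To give a feel for the mismatch counts, the sketch below computes edit distances between a few sequences and one reference. It uses `utils::adist` from base R as a stand-in, on the assumption that an edit distance is a reasonable analogue of what srdistance reports; detect itself relies on ShortRead. All sequences and names are invented placeholders.

```r
# Hypothetical placeholder data, for illustration only
ref  <- c(refA = "ACGTACGTAC")
seqs <- c(seq1 = "ACGTACGTAC",  # identical to the reference
          seq2 = "ACGTTCGTAC",  # one substitution
          seq3 = "ACGTACGTA")   # one deletion

# Rows are sequences, columns are reference sequences
d <- utils::adist(seqs, ref)
dimnames(d) <- list(names(seqs), names(ref))

# Smallest distance observed for each reference sequence
min_mismatch <- apply(d, 2, min)
d
```

The matrix `d` has the same orientation as the detect_table described above: one row per sequence, one column per reference.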

Value

A list with four elements:

$detect_results A list with a result table, for each reference sequence, with the minimum mismatch count
$alns A list with an alignment, for each reference sequence as elements, of the sequences in the sequence table with the reference sequence
$detect_table A data.table with the sequence IDs as rows and a column with total sequence abundance. All the other columns are reference sequences. Values are the minimum number of differences with the reference sequence
$call The function call

These results are also written to text files.

Examples

# Select the directory where the example data are stored
example.data <- system.file("extdata", "HTJ", package="amplicR")
# Select a temporary directory where to store the outputs
out <- tempdir()
# Process raw data
HTJ.test <- data.proc(example.data, out, bp=140)
# Reference Mycobacterium avium subspecies paratuberculosis sequence
HTJ <- "CTGCGCGCCGGCGATGACATCGCAGTCGAGCTGCGCATCCTGACCAGCCGACGTTCCGATCTGGTGGCTGATCGGACCCGGGCGATCGAACCGAATGCGCGCCCAGCTGCTGGAATACTTTCGGCGCTGGAACGCGCCTT"
# Naming the reference sequence
names(HTJ) <- "HTJ" 

# Use 'detect' to verify the presence of Mycobacterium avium subspecies
# paratuberculosis
det <- detect(HTJ.test, dir.out=out, ref_seqs=HTJ)

# Clean up the temp directory
unlink(out, recursive=TRUE)

carlopacioni/amplicR documentation built on Aug. 19, 2023, 7:59 p.m.