RMSobject (R Documentation)
Description
Constructs an RMS object with information about a set of genomes.
Usage
RMSobject(
genome.tbl,
frg.dir,
vsearch.exe = "vsearch",
identity = 0.99,
min.length = 30,
max.length = 500,
verbose = TRUE,
threads = 1,
tmp.dir = "tmp"
)
Arguments
genome.tbl: A table (data.frame or tibble) with genome information, see below.
frg.dir: Path to the folder with the fragment fasta files.
vsearch.exe: Character string with the command that invokes VSEARCH, see below.
identity: Sequence identity threshold for clustering fragments (a value between 0.0 and 1.0).
min.length: Minimum fragment length (integer).
max.length: Maximum fragment length (integer).
verbose: Logical; turns text output during processing on or off.
threads: Number of threads to be used by VSEARCH.
tmp.dir: Name of the folder for temporary output; it is created if it does not already exist.
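The min.length and max.length arguments suggest a length filter on the fragments. The R sketch below shows one plausible reading of that filter; the helper filter_by_length() is hypothetical, and the inclusive bounds are an assumption, since this page does not state whether RMSobject keeps fragments exactly at the limits.

```r
# Hypothetical helper illustrating min.length/max.length: keep only fragments
# whose length lies within [min.length, max.length]. Inclusive bounds are an
# assumption, not documented behaviour of RMSobject.
filter_by_length <- function(seqs, min.length = 30, max.length = 500) {
  len <- nchar(seqs)
  seqs[len >= min.length & len <= max.length]
}

# Made-up fragments of length 20, 120 and 800
frags <- c(short = strrep("A", 20), ok = strrep("C", 120), long = strrep("G", 800))
names(filter_by_length(frags))
# [1] "ok"
```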
Details
The genome.tbl has one row for each genome to include in the RMS database.
It must have a column named genome_file containing the names of the fasta files
with the RMS fragments from each genome. Use getRMSfragments
to create these fasta files, ensuring the fasta headers follow the pattern
<genome.ID>_RMSx, where <genome.ID> is some text unique to each genome and x is an integer.
The genome.tbl may contain other columns as well, but genome_file is required.
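As an illustration of these requirements, the R sketch below builds a small genome.tbl and checks fasta headers against the <genome.ID>_RMSx pattern. The file names, the extra species column and the helpers valid_rms_header() and rms_genome_id() are hypothetical; treating everything after the first whitespace in a header as an ignorable description is also an assumption.

```r
# Hypothetical genome table: one row per genome, with the required genome_file
# column (made-up file names) and an optional extra column.
genome.tbl <- data.frame(
  genome_file = c("genomeA_RMS.fasta", "genomeB_RMS.fasta"),
  species     = c("Species A", "Species B")
)
stopifnot("genome_file" %in% colnames(genome.tbl))

# Hypothetical check: the header (up to the first whitespace, an assumption)
# must be some text, then "_RMS", then an integer.
valid_rms_header <- function(headers) {
  id <- sub("[[:space:]].*$", "", headers)
  grepl("^.+_RMS[0-9]+$", id)
}

# Hypothetical helper: strip the "_RMSx" suffix to recover <genome.ID>
rms_genome_id <- function(headers) {
  sub("_RMS[0-9]+$", "", sub("[[:space:]].*$", "", headers))
}

headers <- c("genomeA_RMS1", "genomeA_RMS2", "genomeA_frag3")
valid_rms_header(headers)
# [1]  TRUE  TRUE FALSE
```

Because the leading `.+` is greedy, a genome.ID that itself contains underscores is still matched as a whole, as long as the header ends in `_RMS` followed by digits.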
The vsearch.exe argument is the exact command used to invoke the VSEARCH software. This is normally just "vsearch",
but if you run VSEARCH from a Singularity container (or any other container) it may be something like
"srun singularity exec <container_name> vsearch".
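A multi-word command works here because such a string can serve as a prefix that is pasted in front of the VSEARCH arguments before being passed to the shell. The R sketch below illustrates that idea with a hypothetical helper, vsearch_cluster_cmd(). The options --cluster_fast, --id, --centroids and --threads are real VSEARCH options, but whether RMSobject builds this particular command internally is an assumption, and the container image name vsearch.sif is made up.

```r
# Hypothetical helper: combine a vsearch.exe prefix with clustering arguments.
# The VSEARCH options are real; the exact command RMSobject runs is not
# documented here, so this is only an illustration.
vsearch_cluster_cmd <- function(vsearch.exe, in.file, out.file,
                                identity = 0.99, threads = 1) {
  paste(vsearch.exe,
        "--cluster_fast", in.file,
        "--id", identity,
        "--centroids", out.file,
        "--threads", threads)
}

vsearch_cluster_cmd("vsearch", "all.fasta", "centroids.fasta")
# [1] "vsearch --cluster_fast all.fasta --id 0.99 --centroids centroids.fasta --threads 1"

# With a container prefix ("vsearch.sif" is a made-up image name)
cmd <- vsearch_cluster_cmd("srun singularity exec vsearch.sif vsearch",
                           "all.fasta", "centroids.fasta", threads = 4)
# The string would then be run with system(cmd); it is not executed here.
```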
Value
A list with the following objects: Cluster.tbl, Cpn.mat and Genome.tbl.
The Cluster.tbl is a tibble with data about all fragment clusters.
Its columns describe each cluster and include the centroid Sequence and its Header,
so the table can be written directly to a fasta file using writeFasta.
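Since the Header and Sequence columns are all a fasta file needs, the R sketch below writes a toy cluster table to fasta using only base R. The helper write_centroids() and the toy table are hypothetical; with a real RMS object, writeFasta as described above is the intended route.

```r
# Hypothetical base-R helper: write the Header/Sequence columns of a cluster
# table as a fasta file (one header line, one sequence line per cluster).
write_centroids <- function(cluster.tbl, out.file) {
  lines <- paste0(">", cluster.tbl$Header, "\n", cluster.tbl$Sequence)
  writeLines(lines, out.file)
  invisible(out.file)
}

# Made-up stand-in for Cluster.tbl with the two relevant columns
toy.tbl <- data.frame(Header   = c("cluster_1", "cluster_2"),
                      Sequence = c("ACGTACGT", "TTGACCA"))
f <- tempfile(fileext = ".fasta")
write_centroids(toy.tbl, f)
readLines(f)  # four lines: header, sequence, header, sequence
```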
The Cpn.mat is the copy number matrix, implemented as a sparse dgeMatrix from the
Matrix package. It has one row for each fragment cluster and one column
for each genome. This is the central data structure for deconvolving the genome
content from read-count data, see rmscols.
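To make the role of Cpn.mat more concrete, the R sketch below builds a toy copy number matrix (plain base-R matrix, made-up values) and, purely as an illustration, treats read counts per cluster as the matrix times a vector of genome abundances. The abundances are then recovered with ordinary least squares via qr.solve(). This linear model is an assumption chosen for illustration; it is not necessarily the method rmscols uses.

```r
# Toy copy number matrix: 4 fragment clusters (rows) x 2 genomes (columns).
# Entry [i, j] is the number of copies of cluster i in genome j (made up).
cpn <- matrix(c(1, 1, 0, 2,
                0, 1, 1, 0),
              nrow = 4,
              dimnames = list(paste0("cluster_", 1:4), c("genomeA", "genomeB")))

# Assumed model for illustration: reads per cluster = cpn %*% abundance
abundance <- c(genomeA = 10, genomeB = 3)
reads <- drop(cpn %*% abundance)
# reads: cluster_1 = 10, cluster_2 = 13, cluster_3 = 3, cluster_4 = 20

# Recover the abundances by ordinary least squares (not rmscols' method)
est <- qr.solve(cpn, reads)
round(est, 6)  # close to 10 and 3
```

With more clusters than genomes the system is overdetermined, which is why least squares rather than an exact inverse is used in this sketch.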
The Genome.tbl is a copy of the argument genome.tbl, but with columns
N_cluster and N_unique added, containing the number of clusters in each genome
and the number of fragment clusters unique to each genome.
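One natural way to read these two columns in terms of the copy number matrix is sketched below in R: N_cluster counts the clusters with a non-zero copy number in a genome, and N_unique counts those clusters that occur in no other genome. This interpretation and the toy matrix are assumptions for illustration, not a description of how RMSobject computes the columns.

```r
# Toy copy number matrix: 4 clusters (rows) x 2 genomes (columns), made up
cpn <- matrix(c(1, 1, 0, 2,
                0, 1, 1, 0),
              nrow = 4,
              dimnames = list(paste0("cluster_", 1:4), c("genomeA", "genomeB")))

present <- cpn > 0                        # is cluster i found in genome j?
N_cluster <- colSums(present)             # clusters per genome
in_one_genome <- rowSums(present) == 1    # clusters found in exactly one genome
N_unique <- colSums(present & in_one_genome)  # unique clusters per genome

data.frame(genome = colnames(cpn), N_cluster = N_cluster,
           N_unique = N_unique, row.names = NULL)
# genomeA: N_cluster 3, N_unique 2; genomeB: N_cluster 2, N_unique 1
```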
Author(s)
Lars Snipen.
See Also
getRMSfragments.