RMSobject: Constructing an RMS object

In larssnip/microRMS: Processing and analysis of RMS data

View source: R/dbase.R

RMSobject

R Documentation

Constructing an RMS object

Description

Constructs an RMS object with information about a set of genomes.

Usage

RMSobject(
  genome.tbl,
  frg.dir,
  vsearch.exe = "vsearch",
  identity = 0.99,
  min.length = 30,
  max.length = 500,
  verbose = TRUE,
  threads = 1,
  tmp.dir = "tmp"
)

Arguments

`genome.tbl`	A table (data.frame or tibble) with genome information, see below.
`frg.dir`	Path to folder with fragment fasta files.
`vsearch.exe`	Text with the VSEARCH executable command.
`identity`	The sequence identity for clustering fragments (0.0-1.0).
`min.length`	Minimum fragment length (integer).
`max.length`	Maximum fragment length (integer).
`verbose`	Turn on/off output text during processing (logical).
`threads`	Number of threads to be used by `vsearch` (integer).
`tmp.dir`	Name of folder for temporary output; it is created if it does not already exist.

Details

The genome.tbl has one row for each genome to include in the RMS database. It must contain a column named genome_file, holding the names of the fasta files with the RMS fragments from each genome. Use getRMSfragments to create these fasta files, so that the fasta headers follow the pattern <genome.ID>_RMSx, where <genome.ID> is some text unique to each genome and x is an integer. The genome.tbl may contain other columns as well, but only genome_file is required.
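As a minimal sketch of the requirements above, the following base R code builds a small genome.tbl and checks a few fasta headers against the <genome.ID>_RMSx pattern. The genome IDs, file names and the helper is_rms_header are made up for illustration and are not part of microRMS.

```r
# Toy genome table; IDs and file names are hypothetical.
genome.tbl <- data.frame(
  genome_id   = c("GCF_0001", "GCF_0002"),
  genome_file = c("GCF_0001.fasta", "GCF_0002.fasta"),
  stringsAsFactors = FALSE
)

# The only column that is required is genome_file.
stopifnot("genome_file" %in% colnames(genome.tbl))

# Hypothetical helper: TRUE for headers of the form <genome.ID>_RMS<integer>.
# A leading ">" is stripped before matching.
is_rms_header <- function(headers) {
  tokens <- sub("^>", "", headers)
  grepl("^.+_RMS[0-9]+$", tokens)
}

headers <- c(">GCF_0001_RMS1", ">GCF_0001_RMS27", ">GCF_0001_fragment3")
header_ok <- is_rms_header(headers)
print(header_ok)
```

The third header lacks the _RMSx suffix and is flagged, which is the kind of file that would need to be regenerated before building the RMS object.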

The vsearch.exe is the exact command used to invoke the VSEARCH software. This is normally just "vsearch", but if you run VSEARCH in a Singularity container (or any other container) it may be something like "srun singularity exec <container_name> vsearch".
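The two ways of setting vsearch.exe could look roughly like this. It is a sketch only: the container name is left as the placeholder from the text, the folder name "fragments" is an assumption, and the RMSobject call is commented out because it needs real fragment files and a working VSEARCH installation.

```r
# Plain installation: VSEARCH is expected to be on the PATH.
vsearch.exe <- "vsearch"

# Containerised alternative, following the pattern in the text.
# <container_name> is a placeholder and must be replaced by a real image.
# vsearch.exe <- "srun singularity exec <container_name> vsearch"

# Sys.which() resolves a bare program name only, so this check is
# meaningful for the plain case, not for a multi-word container command.
vsearch_found <- unname(nzchar(Sys.which(vsearch.exe)))
if (!vsearch_found) {
  message("No 'vsearch' executable found on the PATH.")
}

# With a genome.tbl and a folder of fragment fasta files in place
# (the folder name "fragments" is hypothetical), the call would be:
# rms.obj <- RMSobject(genome.tbl, frg.dir = "fragments",
#                      vsearch.exe = vsearch.exe, threads = 1)
```

Checking for the executable up front is a convenience; the command string itself is passed on unchanged.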

Value

A list with the following objects: Cluster.tbl, Cpn.mat and Genome.tbl.

The Cluster.tbl is a tibble with data about all fragment clusters. It contains columns with data about each cluster, including the centroid Sequence and its Header, making it possible to write the table to a fasta file using writeFasta.

The Cpn.mat is the copy number matrix, implemented as a sparse matrix from the Matrix package. It has one row for each fragment cluster and one column for each genome. This is the central data structure for de-convolving the genome content from read-count data, see rmscols.

The Genome.tbl is a copy of the argument genome.tbl, but with the columns N_cluster and N_unique added, containing the number of fragment clusters in each genome and the number of fragment clusters unique to each genome.
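To illustrate how the copy number matrix relates to the two added columns, here is a toy example using the Matrix package. It assumes that N_cluster counts the clusters with a non-zero copy number in a genome, and that a cluster is unique to a genome when it is found in that genome only. This is one reading of the description above, not necessarily how RMSobject computes the columns, and the cluster and genome names are made up.

```r
library(Matrix)

# Toy copy number matrix: 4 fragment clusters (rows) x 3 genomes (columns).
Cpn.mat <- sparseMatrix(
  i = c(1, 1, 2, 3, 4, 4),
  j = c(1, 2, 1, 3, 2, 3),
  x = c(1, 2, 1, 1, 1, 3),
  dims = c(4, 3),
  dimnames = list(paste0("C", 1:4), c("genomeA", "genomeB", "genomeC"))
)

# Presence/absence of each cluster in each genome.
present <- Cpn.mat > 0

# Number of clusters found in each genome.
N_cluster <- colSums(present)

# Clusters found in exactly one genome, counted per genome.
unique_rows <- unname(which(rowSums(present) == 1))
N_unique <- colSums(present[unique_rows, , drop = FALSE])

print(N_cluster)
print(N_unique)
```

In this toy matrix every genome holds two clusters, while only genomeA and genomeC have a cluster that no other genome shares.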

Author(s)

Lars Snipen.
