msc.seqs: Retrieve sequences

msc.seqs    R Documentation

Retrieve sequences

Description

The msc.seqs function retrieves the DNA sequence of one or more Minicircle Sequence Classes (MSCs), together with all their hit sequences, from a FASTA file and the corresponding UC file. It is useful for extracting and analyzing specific MSCs and their associated hit sequences.

Usage

msc.seqs(fastafile, ucfile, clustnumbers, writeDNA = TRUE)

Arguments

fastafile

the name of the FASTA file containing all minicircle sequences.

ucfile

the name of the UC file.

clustnumbers

a character vector containing the cluster numbers (in the format "C0", "C1", etc.) of the MSCs to retrieve. Each cluster number identifies one MSC; its sequence and all associated hit sequences are extracted using the FASTA file and the UC file.

writeDNA

a logical value, TRUE by default. When TRUE, the extracted sequences are written to separate FASTA files in the current directory.

Value

a table that summarizes the number of hit sequences found in each MSC, the MSC names, and the samples where the MSCs are present. This table provides an overview of the extracted sequences and their distribution across samples.

one FASTA file per MSC with all its hit sequences, written when writeDNA = TRUE. These FASTA files can be used for downstream analyses or sequence comparisons.
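
As a rough illustration of how cluster numbers relate to hit sequences, the sketch below groups toy UC records by cluster using base R only. It is not the implementation used by msc.seqs. It assumes the UC file follows the 10-column, tab-separated layout written by USEARCH/VSEARCH and that a label such as "C0" corresponds to cluster number 0; the sequence names and sequences are invented for the example.

```r
# Toy UC records (assumed layout: type, cluster, length, identity, strand,
# two unused fields, alignment, query label, target label).
uc_lines <- c(
  "S\t0\t12\t*\t*\t*\t*\t*\tsampleA_contig1\t*",
  "H\t0\t12\t98.5\t+\t0\t0\t12M\tsampleB_contig3\tsampleA_contig1",
  "S\t1\t10\t*\t*\t*\t*\t*\tsampleA_contig2\t*",
  "H\t0\t12\t97.9\t+\t0\t0\t12M\tsampleC_contig7\tsampleA_contig1",
  "C\t0\t3\t*\t*\t*\t*\t*\tsampleA_contig1\t*",
  "C\t1\t1\t*\t*\t*\t*\t*\tsampleA_contig2\t*"
)

uc <- read.delim(
  text = uc_lines, header = FALSE, stringsAsFactors = FALSE,
  col.names = c("type", "cluster", "length", "identity", "strand",
                "unused1", "unused2", "alignment", "query", "target")
)

# Keep seed (S) and hit (H) records; C records only summarize each cluster.
members <- uc[uc$type %in% c("S", "H"), ]

# Build labels in the "C0", "C1", ... format (assumed mapping, see above).
members$msc <- paste0("C", members$cluster)

# Sequence names grouped by MSC, and the number of sequences per MSC.
by_msc <- split(members$query, members$msc)
hits_per_msc <- lengths(by_msc)

# Toy stand-in for the FASTA content: a named character vector.
fasta <- c(
  sampleA_contig1 = "ACGTACGTACGT",
  sampleB_contig3 = "ACGTACGAACGT",
  sampleA_contig2 = "TTGACCATGA",
  sampleC_contig7 = "ACGTACGTACGA"
)

# Pull out every sequence that belongs to the requested MSC.
clustnumbers <- "C0"
selected <- fasta[by_msc[[clustnumbers]]]
print(hits_per_msc)
print(names(selected))
```

Splitting on the cluster label keeps the seed sequence and its hits together, which mirrors the "MSC together with all its hit sequences" grouping that the function returns.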

Examples

data(matrices)  # cluster matrices used below (assumed to ship with rKOMICS as a dataset)
data(exData)

### select a subset of MSCs
Lpe <- which(exData$species == "L. peruviana")
specific <- msc.subset(matrices[[7]], subset = Lpe)

### run function
seq <- msc.seqs(fastafile = system.file("extdata", "all.minicircles.circ.fasta", package="rKOMICS"),
                ucfile = system.file("extdata", exData$ucs, package="rKOMICS")[7], 
                clustnumbers = specific$clustnumbers, writeDNA = FALSE)


rKOMICS documentation built on July 9, 2023, 7:46 p.m.