preprocess: Filtering of minicircle sequences


View source: R/preprocess.R

Description

Assembling minicircle sequences with KOMICS generates one fasta file per sample. The preprocess function filters these minicircle sequences by sequence length (the size of minicircular kDNA is species-specific and variable) and by circularization success. When writeDNA = TRUE (the default), the filtered sequences are written to individual fasta files in the current working directory.
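The filtering idea can be sketched in base R. This is only an illustration, not the package's implementation: the helper filter_minicircles is hypothetical, the headers are made up, and it assumes that circularized sequences carry the word "circularized" in their fasta header and that the length bounds are inclusive.

```r
# Toy minicircle records stored as a named character vector (header -> sequence).
# Headers and sequences are invented for illustration; real KOMICS output may differ.
seqs <- c(
  "sampleA_contig1_circularized" = strrep("A", 900),
  "sampleA_contig2"              = strrep("C", 950),
  "sampleA_contig3_circularized" = strrep("G", 300),
  "sampleA_contig4_circularized" = strrep("T", 2000)
)

# Hypothetical helper mirroring the circ, min and max arguments of preprocess().
filter_minicircles <- function(seqs, circ = TRUE, min = 500, max = 1500) {
  len  <- nchar(seqs)
  # Keep sequences whose length lies within [min, max] (inclusive bounds assumed).
  keep <- len >= min & len <= max
  if (circ) {
    # Assumption: successful circularization is flagged in the header.
    keep <- keep & grepl("circularized", names(seqs), fixed = TRUE)
  }
  seqs[keep]
}

filtered <- filter_minicircles(seqs)
names(filtered)
```

With these toy records, only the first contig passes both filters; calling filter_minicircles(seqs, circ = FALSE) would also keep the non-circularized contig of length 950.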

Usage

preprocess(files, groups, circ = TRUE, min = 500, max = 1500, writeDNA = TRUE)

Arguments

files

a character vector containing the fasta file names in the format sampleA.minicircles.fasta, sampleB.minicircles.fasta, ... (output of KOMICS).

groups

a factor specifying the group (e.g. species) to which each sample belongs. It should have the same length as files.

circ

a logical parameter. By default (TRUE), non-circularized minicircle sequences are excluded. To keep non-circularized sequences as well, set the parameter to FALSE.

min

the minimum minicircle sequence length. The default is 500.

max

the maximum minicircle sequence length. The default is 1500.

writeDNA

a logical parameter. By default (TRUE), filtered minicircle sequences are written in fasta format to the current working directory. Set to FALSE if you are only interested in the other output values, such as the plot and summary.
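As a rough sketch of how the files and groups inputs might be prepared, the snippet below builds both vectors and checks that they line up. The file names and group labels are placeholders, and deriving sample names by stripping the ".minicircles.fasta" suffix is an assumption about the naming scheme, not documented behaviour of preprocess.

```r
# Placeholder file names following the documented sampleX.minicircles.fasta pattern.
files <- c("sampleA.minicircles.fasta",
           "sampleB.minicircles.fasta",
           "sampleC.minicircles.fasta")

# One group label per file, in the same order as 'files' (labels are hypothetical).
groups <- factor(c("species1", "species1", "species2"))

# Basic sanity checks before calling preprocess().
stopifnot(is.character(files),
          is.factor(groups),
          length(files) == length(groups))

# Assumed convention: the sample name is the file name without its suffix.
sample_names <- sub("\\.minicircles\\.fasta$", "", basename(files))

# Number of samples per group.
table(groups)
```

Checking the lengths up front makes a mismatch between files and groups fail early, before any fasta file is read.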

Value

samples

the sample names (based on the input files).

N_MC

a table containing the sample name, the group it belongs to, and the number of minicircle sequences (N_MC) before and after filtering.

plot

a barplot visualizing the number of minicircle sequences per sample before and after filtering.

summary

the total number of minicircle sequences before and after filtering.
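To show how the N_MC table relates to the summary, here is a small sketch using a made-up table. The column name beforefiltering appears in the examples; the column name afterfiltering, the overall table layout, and all counts are assumptions for illustration only.

```r
# Toy stand-in for pre$N_MC; all values are invented.
n_mc <- data.frame(
  sample          = c("sampleA", "sampleB", "sampleC"),
  group           = factor(c("species1", "species1", "species2")),
  beforefiltering = c(120L, 95L, 140L),
  afterfiltering  = c(80L, 60L, 101L)   # assumed column name
)

# Total number of minicircle sequences before and after filtering.
totals <- colSums(n_mc[, c("beforefiltering", "afterfiltering")])

# Percentage of sequences retained across all samples.
retained <- round(100 * totals[["afterfiltering"]] / totals[["beforefiltering"]], 1)

totals
retained
```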

Examples

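library(rKOMICS)  # provides preprocess() and the exData example data
library(ggplot2)  # needed for labs() when altering the plot
data(exData)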

### setwd("")

### run function
table(exData$species)
pre <- preprocess(files = system.file("extdata", exData$fastafiles, package="rKOMICS"),
                  groups = exData$species,
                  circ = TRUE, min = 500, max = 1200, writeDNA = FALSE)

pre$summary 

### visualize results
barplot(pre$N_MC[,"beforefiltering"], 
        names.arg = pre$N_MC[,1], las=2, cex.names=0.4)

### alter plot
pre$plot + labs(caption = paste0('N of MC sequences before and after filtering, ', Sys.Date()))

rKOMICS documentation built on July 21, 2021, 5:07 p.m.