cdhit: A function to execute CD-HIT

View source: R/cdhit.R

cdhitR Documentation

A function to execute CD-HIT

Description

This function executes a ubuntu docker that cluster minION sequences using CD-HIT

Usage

cdhit(
  group = c("sudo", "docker"),
  scratch.folder,
  data.folder,
  identity.threshold = 0.9,
  memory.limit = 30000,
  threads = 0,
  word.length = 7
)

Arguments

group

a character string. Two options: sudo or docker, depending to which group the user belongs

scratch.folder

a character string indicating the path of the scratch folder

data.folder

a character string indicating the folder where input data are located and where output will be written

identity.threshold

sequence identity threshold, default 0.9, this is the default cd-hit's global sequence identity calculated as: number of identical bases in alignment divided by the full length of the shorter sequence

memory.limit

memory limit in MB for the program, default 30000. 0 for unlimitted

threads

number of threads, default 0; with 0, all CPUs will be used

word.length

7 for thresholds between 0.88 and 0.9 for other option see user manual cdhit

Value

Returns two files: a fasta file of representative sequences and a text file of list of clusters

Author(s)

Raffaele A Calogero, raffaele.calogero [at] unito [dot] it, University of Torino. Italy

Examples

## Not run: 
    #running fastq2fasta
    cdhit(group="docker", scratch.folder="/data/scratch", data.folder=getwd(), identity.threshold=0.90, memory.limit=8000, threads=0, word.length=7)

## End(Not run)


kendomaniac/docker4seq documentation built on Sept. 3, 2024, 6:42 p.m.