Collapse: Collapse reads into haplotypes and frequencies

Description

Collapse summarizes aligned reads into haplotypes with their frequencies. Recollapse updates a collapsed set after a manipulation that may have produced duplicate haplotypes.
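The idea behind collapsing can be sketched in base R on plain character strings. This is an illustration only, not the QSutils implementation: the toy reads are made up, and sorting by decreasing count is an assumption based on the example on this page, which treats the first haplotype as the most abundant.

```r
# Illustrative base-R sketch (not the QSutils code): collapse a character
# vector of aligned reads into distinct haplotypes with their counts.
reads <- c("ACGT", "ACGT", "ACGA", "ACGT", "ACGA", "TCGT")  # toy data

counts <- table(reads)                 # number of reads per distinct sequence
n      <- as.vector(counts)            # plain integer counts
ord    <- order(n, decreasing = TRUE)  # most abundant first (assumed ordering)

collapsed <- list(nr    = n[ord],              # haplotype counts
                  hseqs = names(counts)[ord])  # haplotype sequences
collapsed
```

Collapse itself works on DNAStringSet or AAStringSet objects rather than character vectors.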

Usage

Collapse(seqs)
Recollapse(seqs, nr)

Arguments

seqs

DNAStringSet or AAStringSet object with the sequences to collapse.

nr

Vector with the haplotype counts, one per sequence in seqs. Used by Recollapse only.

Details

Use Recollapse when a manipulation, such as correcting gaps and Ns, may have made some haplotypes identical. It removes the duplicate sequences and updates the haplotype counts accordingly.
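A minimal base-R sketch of this step follows. It assumes that the counts of sequences that have become identical are added together and that the result is ordered by decreasing count. Both are assumptions, and the code is not the QSutils implementation. The sequences and counts are made up.

```r
# Illustrative base-R sketch (not the QSutils code): merge haplotypes that
# became identical after a manipulation and add up their counts.
hseqs <- c("ACGT", "ACGA", "ACGT", "TCGT")  # third entry now equals the first
nr    <- c(50, 30, 15, 5)                   # counts before the manipulation

summed <- tapply(nr, hseqs, sum)            # total count per distinct sequence
ord    <- order(as.vector(summed), decreasing = TRUE)

recollapsed <- list(nr    = as.vector(summed)[ord],
                    hseqs = names(summed)[ord])
recollapsed
```

The total number of reads is unchanged by this operation; only the number of distinct haplotypes decreases.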

Value

Collapse and Recollapse return a list with two elements.

nr

Vector of the haplotype counts.

hseqs

DNAStringSet or AAStringSet with the haplotype sequences. In the example output on this page, Recollapse returns this element under the name seqs.
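As a usage sketch, the counts in nr can be turned into relative frequencies. The list below is a hypothetical stand-in with the documented structure: hseqs is a plain character vector here rather than a DNAStringSet, and the values are made up.

```r
# Hypothetical result with the documented two-element structure.
res <- list(nr    = c(65, 30, 5),
            hseqs = c("ACGT", "ACGA", "TCGT"))

freq <- res$nr / sum(res$nr)   # relative frequency of each haplotype

# One row per haplotype: sequence, read count and percentage.
summary_tbl <- data.frame(haplotype = res$hseqs,
                          reads     = res$nr,
                          percent   = round(100 * freq, 1))
summary_tbl
```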

Author(s)

Mercedes Guerrero-Murillo and Josep Gregori

References

Gregori J, Esteban JI, Cubero M, Garcia-Cehic D, Perales C, Casillas R, Alvarez-Tejado M, Rodríguez-Frías F, Guardia J, Domingo E, Quer J. Ultra-deep pyrosequencing (UDPS) data treatment to study amplicon HCV minor variants. PLoS One. 2013 Dec 31;8(12):e83361. doi: 10.1371/journal.pone.0083361. eCollection 2013. PubMed PMID: 24391758; PubMed Central PMCID: PMC3877031.

Ramírez C, Gregori J, Buti M, Tabernero D, Camós S, Casillas R, Quer J, Esteban R, Homs M, Rodriguez-Frías F. A comparative study of ultra-deep pyrosequencing and cloning to quantitatively analyze the viral quasispecies using hepatitis B virus infection as a model. Antiviral Res. 2013 May;98(2):273-83. doi: 10.1016/j.antiviral.2013.03.007. Epub 2013 Mar 20. PubMed PMID: 23523552.

Examples

# Load the raw reads.
library(QSutils)
filepath <- system.file("extdata", "Toy.GapsAndNs.fna", package = "QSutils")
reads <- readDNAStringSet(filepath)

# Collapse these reads into haplotypes.
lstCollapsed <- Collapse(reads)

# Correct gaps and Ns in all haplotypes except the most abundant one,
# which is passed as the reference sequence.
lstCorrected <- CorrectGapsAndNs(lstCollapsed$hseqs[2:length(lstCollapsed$hseqs)],
                                 lstCollapsed$hseqs[[1]])

# Add the most abundant haplotype back.
lstCorrected <- c(lstCollapsed$hseqs[1], lstCorrected)
lstCorrected

# Recollapse the corrected haplotypes.
lstRecollapsed <- Recollapse(lstCorrected, lstCollapsed$nr)
lstRecollapsed

Example output

Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

Loading required package: IRanges
Loading required package: XVector

Attaching package: 'Biostrings'

The following object is masked from 'package:base':

    strsplit

DNAStringSet object of length 34:
     width seq                                              names               
 [1]    50 TGACGCGCACAGAGTGCTGCTAA...TGGGTTACCCCGTCGTGGTCGC 1
 [2]    50 TGACGCGCACAGAGTGCTGCTAA...TGGGTTACCCCGTCGTGGTCGC 2
 [3]    50 TGACGCGCACAGAGTGCTGCTAA...TGGGTTACCCCGTCGTGGTCGC 3
 [4]    50 TGACGCGCACAGAGTGCTGCTAA...TGGGTTACCCCGTCGTGGTCGC 4
 [5]    50 TGACGCGCACAGAGTGCTGCTAA...TGGGTTACCCCGTCGTGGTCGC 5
 ...   ... ...
[30]    50 TGACGCGCACAGAGTGCTGCTAA...TGGGTTACCCCGTCGTGGTCGC 30
[31]    50 TGACGCGCACAGAGTGCTGCTAA...TGGGTTACCCCGTCGTGGTCGC 31
[32]    50 TGACGCGCACAGAGTGCTGCTAA...TGGGTTACCCCGTCGTGGTCGC 32
[33]    50 TGACGCGCACAGAGTGCTGCTAA...TGGGTTACCCCGTCGTGGTCGC 33
[34]    50 TGACGCGCACAGAGTGCTGCTAA...TGGGTTACCCCGTCGTGGTCGC 34
$nr
[1] 100

$seqs
DNAStringSet object of length 1:
    width seq                                               names               
[1]    50 TGACGCGCACAGAGTGCTGCTAA...CTGGGTTACCCCGTCGTGGTCGC 1

QSutils documentation built on Nov. 8, 2020, 7:42 p.m.