PacBioCmpH5-class: Class "PacBioCmpH5"

Description

The PacBioCmpH5 (pronounced PacBio Comp H5) class represents a cmp.h5 file, which stores alignments of reads to reference sequences. Pulse information and quality values can also be stored in the file.

Objects from the Class

Objects can be created by calls of the form PacBioCmpH5(fileName).

Slots

AlnIndex:

Object of class "data.frame". The alignment index represents each alignment with a single row. The row gives the position, quality, and strand of the alignment, along with the information needed to access the read and reference bases.

AlnGroup:

Object of class "data.frame" representing the alignment groups in the file. Alignment groups partition the reads into structured categories, e.g., by machine or movie. Typically, an alignment group holds the reads coming from the same movie.

RefGroup:

Object of class "data.frame" representing the reference sequences with alignments in the file. This object contains information about which reads map to which references as well as some information about the reference sequence itself.

MovieInfo:

Object of class "data.frame" representing information about the movies used during the alignment.

RefInfo:

Object of class "data.frame" representing information about the references used during the alignment. This data.frame will contain all references which were used in the alignment process, whereas the RefGroup contains only those references which had one or more reads with an alignment.

isSorted:

Object of class "logical"

version:

Object of class "stringOrNull"

fileName:

Object of class "character"

ePtr:

Object of class "externalptr" that points to the H5File.
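
The relationship between the data.frame slots described above can be sketched with plain data frames. This is only an illustration in base R, not output from pbh5: the column names (ID, refName, Length, tStart, tEnd, strand) and all values are hypothetical stand-ins, and a real file may use different columns.

```r
## Hypothetical stand-ins for the RefInfo and AlnIndex slots; column names
## and values are invented for illustration only.
refInfo <- data.frame(ID = 1:3,
                      refName = c("ref000001", "ref000002", "ref000003"),
                      Length = c(1000L, 500L, 250L),
                      stringsAsFactors = FALSE)

## one row per alignment: which reference it hit, where, and on which strand.
alnIndex <- data.frame(refName = c("ref000001", "ref000001", "ref000003"),
                       tStart = c(10L, 400L, 5L),
                       tEnd = c(210L, 900L, 120L),
                       strand = c(0L, 1L, 0L),
                       stringsAsFactors = FALSE)

## a RefGroup-like table keeps only references with one or more alignments.
refGroup <- refInfo[refInfo$refName %in% alnIndex$refName, ]

## references that were used in the alignment but received no reads.
unaligned <- setdiff(refInfo$refName, refGroup$refName)

## per-reference alignment counts, a typical summary of an alignment index.
alnCounts <- table(alnIndex$refName)
```

In this sketch the second reference appears in the RefInfo-like table but not in the RefGroup-like table, which mirrors the distinction drawn between the two slots.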

Extends

Class "PacBioDataFile", directly. Class "H5File", by class "PacBioDataFile", distance 2. Class "H5Obj", by class "PacBioDataFile", distance 3. Class "H5ObjOrNull", by class "PacBioDataFile", distance 4.

Methods

head

signature(x = "PacBioCmpH5"): ...

nrow

signature(x = "PacBioCmpH5"): ...

show

signature(object = "PacBioCmpH5"): ...

Examples

## open a handle to a cmp.h5 file.
cmpH5 <- PacBioCmpH5(system.file("h5_files", "aligned_reads.cmp.h5", package = "pbh5"))

## print a short description of the data.
show(cmpH5)

## get the contents of the entire file.
contents <- listH5Contents(cmpH5)

## contents is a list with an element for each object (dataset, group)
names(contents)

## the alignment index is the core of the file. The alignment index
## contains information about each alignment. This information can be
## used to compute summary statistics on the data.
head(alnIndex(cmpH5))

## direct access to the alignment index is usually not necessary, as
## a large number of functions are available for accessing data
## associated with an alignment, e.g.,
plot(density(getAccuracy(cmpH5)))

## plot density of read length
plot(density(getReadLength(cmpH5)), log = 'x')

##
## coverage plots
##
## coverage plots take a reference sequence name. These are sanitized
## strings representing the reference sequence. These will typically
## be chromosomes, but may be otherwise.
##
cvg <- getCoverageInRange(cmpH5,  1)

## the length of the coverage vector equals the length of the reference sequence.
stopifnot(length(cvg) ==  refGroup(cmpH5)$Length)

## summarize coverage vector.
summary(cvg)

## plot coverage vector
plot(cvg, type = 'l', col = 'gray')
lines(supsmu(1:length(cvg), cvg, span = .01), col = 'red')

## retrieve all of the alignments
alns <- getAlignments(cmpH5)
head(alns[[1]])

## tabulate read bases against reference bases to show the proportions
## of matches, mismatches, insertions, and deletions.
a <- do.call(rbind, alns)
mosaicplot(prop.table(table(read = a[,1], reference = a[,2])))

## Some of the most useful functionality is via the
## getAlignmentsWithFeatures function.
aAndF <- getAlignmentsWithFeatures(cmpH5, idx = 1, fxs = list(IPD = getIPD), collapse = TRUE)
head(aAndF)
boxplot(IPD ~ reference, data = aAndF, log = 'y', ylim = c(.05, 10))
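
To make concrete what a coverage vector holds, the following base R sketch computes per-position coverage from a handful of alignment intervals. It is not how pbh5 computes coverage; the variable names (refLength, tStart, tEnd) and all values are hypothetical, and the intervals are assumed to be 1-based and inclusive.

```r
## Hypothetical alignment intervals on a reference of length 20
## (1-based, inclusive start and end); values are invented.
refLength <- 20L
tStart <- c(1L, 5L, 5L, 15L)
tEnd   <- c(10L, 12L, 8L, 20L)

## difference array: +1 where an alignment starts, -1 just past its end.
delta <- numeric(refLength + 1L)
for (i in seq_along(tStart)) {
  delta[tStart[i]] <- delta[tStart[i]] + 1
  delta[tEnd[i] + 1L] <- delta[tEnd[i] + 1L] - 1
}

## the running sum gives the number of alignments covering each position.
cvg <- cumsum(delta)[seq_len(refLength)]

## as in the example above, the vector has one entry per reference position.
stopifnot(length(cvg) == refLength)
summary(cvg)
```

The resulting vector can be summarized or plotted in the same way as the output of getCoverageInRange in the example above.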

PacificBiosciences/R-pbh5 documentation built on May 7, 2019, 11:54 p.m.