Description Details Constructors Methods Accessors Subsetting Author(s) References See Also Examples
S4 class for storing genotypes efficiently as column-oriented sparse matrices along with variant info
This class stores genotypes as a column-oriented sparse numeric
matrix, where rows correspond to samples and columns correspond to
variants. This is accomplished by extending the
dgCMatrix
class from which this class inherits
all slots. Information about variants is stored in an additional slot
named variantInfo
. This slot must be of class
VariantInfo
and have exactly as many elements as the
genotype matrix has columns. The variantInfo
slot has a
dedicated metadata column named “MAF” that contains the minor
allele frequencies (MAFs) of the variants. For convenience,
accessor functions variantInfo
and MAF
are available (see
below).
Objects of this class should only be created and manipulated by the constructors and accessors described below, as only these methods ensure the integrity of the created objects. Direct modification of object slots is strongly discouraged!
See help pages genotypeMatrix
and
readGenotypeMatrix
.
signature(object="GenotypeMatrix")
:
displays the matrix dimensions (i.e. the number of samples and variants)
along with some basic statistics of the minor allele frequency
(MAF).
signature(object="GenotypeMatrix")
:
returns variant information as a VariantInfo
object.
signature(object="GenotypeMatrix")
:
returns a numeric vector with the minor allele frequencies (MAFs).
Row and column names can be set and get as usual for matrix-like
objects with rownames
and
colnames
, respectively.
When setting the column names of a GenotypeMatrix
object, both the names of the variant info (slot variantInfo
) and the
column names of the matrix are set.
In the following code snippets, x
is a
GenotypeMatrix
object.
x[i, ]
: returns a
GenotypeMatrix
object that only
contains the samples selected by the index vector i
x[, j]
: returns a
GenotypeMatrix
object that only contains the variants selected by the index
vector j
x[i, j]
: returns a
GenotypeMatrix
object that only contains the samples selected by the index vector
i
and the variants selected by the index vector j
None of these subsetting functions support a drop
argument.
As soon as a drop
argument is supplied, no matter whether
TRUE
or FALSE
, all variant information is stripped off and a
dgCMatrix
object is returned.
By default, MAFs
are not altered by subsetting samples. However, if the optional
argument recomputeMAF
is set to TRUE
(the default is
FALSE
), MAFs are recomputed for the resulting
subsetted genotype matrix as described in
genotypeMatrix
. The ploidy for computing MAFs can
be controlled by the optional ploidy
argument (the default is 2).
Ulrich Bodenhofer bodenhofer@bioinf.jku.at
http://www.bioinf.jku.at/software/podkat
dgCMatrix
, VariantInfo
,
genotypeMatrix
, readGenotypeMatrix
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | ## create a toy example
A <- matrix(rbinom(50, 2, prob=0.2), 5, 10)
sA <- as(A, "dgCMatrix")
pos <- sort(sample(1:10000, ncol(A)))
seqname <- "chr1"
## variant with 'GRanges' object
gr <- GRanges(seqnames=seqname, ranges=IRanges(start=pos, width=1))
gtm <- genotypeMatrix(A, gr)
gtm
as.matrix(gtm)
variantInfo(gtm)
MAF(gtm)
## variant with 'pos' and 'seqnames' object
genotypeMatrix(sA, pos, seqname)
## variant with 'seqname:pos' strings passed through 'pos' argument
spos <- paste(seqname, pos, sep=":")
spos
genotypeMatrix(sA, spos)
## read data from VCF file using 'readVcf()' from the 'VariantAnnotation'
## package
if (require(VariantAnnotation))
{
vcfFile <- system.file("examples/example1.vcf.gz", package="podkat")
sp <- ScanVcfParam(info=NA, genome="GT", fixed=c("ALT", "FILTER"))
vcf <- readVcf(vcfFile, genome="hgA", param=sp)
rowRanges(vcf)
## call constructor for 'VCF' object
gtm <- genotypeMatrix(vcf)
gtm
variantInfo(gtm)
## alternatively, extract information from 'VCF' object and use
## variant with character matrix and 'GRanges' positions
## note that, in 'VCF' objects, rows correspond to variants and
## columns correspond to samples, therefore, we have to transpose the
## genotype
gt <- t(geno(vcf)$GT)
gt[1:5, 1:5]
gr <- rowRanges(vcf)
gtm <- genotypeMatrix(gt, gr)
as.matrix(gtm[1:20, 1:5, recomputeMAF=TRUE])
}
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.