getGenotype: Get genotype data

getGenotype R Documentation

Get genotype data

Description

Get a matrix of genotype values or allele dosages from a GDS object.

Usage

## S4 method for signature 'SeqVarGDSClass'
getGenotype(gdsobj, use.names=TRUE, parallel=FALSE)
## S4 method for signature 'SeqVarGDSClass'
getGenotypeAlleles(gdsobj, use.names=TRUE, sort=FALSE, parallel=FALSE)
## S4 method for signature 'SeqVarGDSClass'
refDosage(gdsobj, use.names=TRUE, ...)
## S4 method for signature 'SeqVarGDSClass'
altDosage(gdsobj, use.names=TRUE, sparse=FALSE, parallel=FALSE, ...)
## S4 method for signature 'SeqVarGDSClass'
expandedAltDosage(gdsobj, use.names=TRUE, sparse=FALSE, parallel=FALSE)
## S4 method for signature 'SeqVarGDSClass,numeric'
alleleDosage(gdsobj, n=0, use.names=TRUE, parallel=FALSE)
## S4 method for signature 'SeqVarGDSClass,list'
alleleDosage(gdsobj, n, use.names=TRUE, parallel=FALSE)

Arguments

gdsobj

A SeqVarGDSClass object with VCF data.

use.names

A logical indicating whether to assign sample and variant IDs as dimnames of the resulting matrix.

parallel

Logical, numeric, or other value to control parallel processing; see seqParallel for details.

sort

Logical for whether to sort alleles lexicographically ("G/T" instead of "T/G").

sparse

Logical for whether to return the alternate allele dosage as a sparse matrix using the Matrix package. In most cases, setting sparse=TRUE will dramatically reduce the size of the returned object.

n

An integer, vector, or list indicating which allele(s) to return dosage for. n=0 is the reference allele, n=1 is the first alternate allele, and so on.

...

Arguments to pass to seqBlockApply, e.g. bsize to set the block size.

Details

In getGenotype, genotypes are coded as in the VCF file, where "0/0" is homozygous reference, "0/1" is heterozygous for the first alternate allele, "0/2" is heterozygous for the second alternate allele, etc. Separators are "/" for unphased and "|" for phased. In getGenotypeAlleles, if sort=TRUE, all returned genotypes will be unphased. Missing genotypes are coded as NA. Only haploid or diploid genotypes (the first two alleles at a given site) are returned.
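The coding described above can be unpacked with a few lines of base R. This is only an illustrative sketch on hand-written genotype strings, not on output from a GDS file; the vector geno and the derived objects are hypothetical names chosen for the example.

```r
# Hypothetical genotype strings in the coding that getGenotype uses
geno <- c("0/0", "0/1", "1|2", NA, "1")

# Split each genotype on either separator ("/" or "|")
alleles <- strsplit(geno, "[/|]")

# Phased genotypes use "|"; unphased genotypes use "/"
phased <- grepl("|", geno, fixed = TRUE)
phased[is.na(geno)] <- NA

# Ploidy is the number of allele codes (NA for missing genotypes)
ploidy <- lengths(alleles)
ploidy[is.na(geno)] <- NA
```

Here the third genotype is phased and diploid, the fourth is missing, and the fifth is haploid.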

If the argument n to alleleDosage is a single integer, the same allele is counted for all variants. If n is a vector whose length equals the number of variants in the current filter, a different allele is counted for each variant. If n is a list, more than one allele can be counted for each variant. For example, if n[[1]]=c(1,3), genotypes "0/1" and "0/3" will each have a dosage of 1 and genotype "1/3" will have a dosage of 2.
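The counting rule for a list element such as c(1,3) can be reproduced in base R. The sketch below is not how alleleDosage is implemented; count_alleles is a hypothetical helper that works on plain genotype strings for a single variant.

```r
# count_alleles is a hypothetical helper, not a SeqVarTools function.
# It counts copies of any allele in `counted` within one genotype string.
count_alleles <- function(genotype, counted) {
  if (is.na(genotype)) return(NA_integer_)
  alleles <- as.integer(strsplit(genotype, "[/|]")[[1]])
  sum(alleles %in% counted)
}

# Hypothetical genotypes at one variant with three alternate alleles
geno <- c("0/0", "0/1", "0/3", "1/3", "2/2")

# Count alleles 1 and 3 together, as with n[[1]] = c(1, 3)
dosage <- vapply(geno, count_alleles, integer(1), counted = c(1, 3))
```

With these inputs, "0/1" and "0/3" each get a dosage of 1 and "1/3" gets a dosage of 2, matching the example in the text.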

Value

getGenotype and getGenotypeAlleles return a character matrix with dimensions [sample,variant] containing haploid or diploid genotypes.

getGenotype returns alleles as "0", "1", "2", etc. indicating reference and alternate alleles.

getGenotypeAlleles returns alleles as "A", "C", "G", "T". sort=TRUE sorts lexicographically, which may be useful for comparing genotypes with data generated using a different reference sequence.
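To illustrate the difference between the two codings and the effect of sorting, the following base R sketch translates numeric genotype codes into bases for a single variant. The helper to_alleles, its sort_alleles argument, and the REF/ALT values are assumptions made for the example; they are not part of the package.

```r
# Hypothetical REF and ALT alleles for one variant
ref_alt <- c("T", "G")
geno <- c("0/1", "1|0", "1/1")

# to_alleles is a hypothetical helper, not a SeqVarTools function
to_alleles <- function(g, alleles, sort_alleles = FALSE) {
  codes <- as.integer(strsplit(g, "[/|]")[[1]])
  bases <- alleles[codes + 1]  # allele code 0 is the first element
  # Sorted genotypes are returned unphased, as described in Details
  if (sort_alleles) return(paste(sort(bases), collapse = "/"))
  sep <- if (grepl("|", g, fixed = TRUE)) "|" else "/"
  paste(bases, collapse = sep)
}

unsorted <- vapply(geno, to_alleles, character(1), alleles = ref_alt)
sorted <- vapply(geno, to_alleles, character(1), alleles = ref_alt,
                 sort_alleles = TRUE)
```

After sorting, "T/G" and "G|T" both become "G/T", so the same heterozygote compares equal regardless of which allele was labeled as the reference.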

refDosage returns an integer matrix with the dosage of the reference allele: 2 for two copies of the reference allele ("0/0"), 1 for one copy of the reference allele, and 0 for two alternate alleles.

altDosage returns an integer matrix with the dosage of any alternate allele: 2 for two alternate alleles ("1/1", "1/2", etc.), 1 for one alternate allele, and 0 for no alternate allele (homozygous reference).
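The relationship between reference and alternate dosage can be sketched in base R on a small hand-written genotype matrix. This assumes diploid genotypes and is not the package's implementation; ref_copies and the sample and variant labels are hypothetical.

```r
# Hypothetical [sample, variant] genotype matrix, filled column by column
geno <- matrix(c("0/0", "0/1", "1/1",
                 "0/2", "1/2", NA),
               nrow = 3,
               dimnames = list(c("s1", "s2", "s3"), c("v1", "v2")))

# ref_copies is a hypothetical helper: number of "0" codes in one genotype
ref_copies <- function(g) {
  if (is.na(g)) return(NA_integer_)
  sum(strsplit(g, "[/|]")[[1]] == "0")
}

ref <- apply(geno, c(1, 2), ref_copies)

# For diploid genotypes, every non-reference copy is an alternate copy
alt <- 2L - ref
```

Note that "1/2" has an alternate dosage of 2 even though the two alternate alleles differ, and that a missing genotype stays NA in both matrices.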

expandedAltDosage returns an integer matrix with the dosage of each alternate allele as a separate column. A variant with 2 possible alternate alleles will have 2 columns of output, etc.
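The expanded layout can be illustrated for a single variant with two alternate alleles. The base R sketch below uses hypothetical sample names and column labels; the column names that expandedAltDosage itself assigns are not shown in this page and are not reproduced here.

```r
# Hypothetical genotypes at one variant with two alternate alleles
geno <- c(s1 = "0/1", s2 = "1/2", s3 = "2/2", s4 = "0/0")
n_alt <- 2  # assumed number of alternate alleles at this variant

allele_codes <- strsplit(geno, "[/|]")

# One column per alternate allele, counting copies of that allele only
expanded <- sapply(seq_len(n_alt), function(a) {
  vapply(allele_codes, function(x) sum(x == as.character(a)), integer(1))
})
colnames(expanded) <- paste0("alt", seq_len(n_alt))
```

Sample s2 ("1/2") contributes a dosage of 1 to each column, whereas altDosage would report a single value of 2 for the same genotype.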

alleleDosage with an integer argument returns an integer matrix with the dosage of the specified allele only: 2 for two copies of the allele ("0/0" if n=0, "1/1" if n=1, etc.), 1 for one copy of the specified allele, and 0 for no copies of the allele.

alleleDosage with a list argument returns a list of sample x allele matrices with the dosage of each specified allele for each variant.

Author(s)

Stephanie Gogarten

See Also

SeqVarGDSClass, applyMethod, seqGetData, seqSetFilter, alleleFrequency

Examples

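library(SeqVarTools)

gds <- seqOpen(seqExampleFileName("gds"))
# Restrict the filter to five variants and ten samples
seqSetFilter(gds, variant.sel=1323:1327, sample.sel=1:10)
# Number of alleles at each selected variant
nAlleles(gds)
getGenotype(gds)
getGenotypeAlleles(gds)
refDosage(gds)
altDosage(gds)
expandedAltDosage(gds)
# Same allele counted for all variants
alleleDosage(gds, n=0)
alleleDosage(gds, n=1)
# A different allele counted for each variant
alleleDosage(gds, n=c(0,1,0,1,0))
# One or more alleles counted for each variant
alleleDosage(gds, n=list(0,c(0,1),0,c(0,1),1))
seqClose(gds)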

smgogarten/SeqVarTools documentation built on Sept. 15, 2024, 1:08 p.m.