VcfStack-class: VcfStack and RangedVcfStack Objects

VcfStackR Documentation

VcfStack and RangedVcfStack Objects

Description

The VcfStack class is a vector of related VCF files, for instance each file representing a separate chromosome. The class helps manage these files as a group. The RangedVcfStack class extends VcfStack by associating genomic ranges of interest to the collection of VCF files.

Constructor

VcfStack(files=NULL, seqinfo=NULL, colData=NULL, index=TRUE, check=TRUE) Creates a VcfStack object.

files

A VcfFilelist object. If a VcfFile or character vector is given a VcfFileList will be coerced. The character vector should be files paths pointing to VCF files. The character vector must be named, with names correspond to seqnames in each VCF file.

seqinfo

A Seqinfo object describing the levels genome and circularity of each sequence.

colData

An optional DataFrame describing each sample in the VcfStack. When present, row names must correspond to sample names in the VCF file.

index

A logical indicating if the vcf index files should be created.

check

A logical indicating if the check across samples should be performed

RangedVcfStack(vs=NULL, rowRanges=NULL) Creates a RangedVcfStack object.

vs

A VcfStack object.

rowRanges

An optional GRanges object associating the genomic ranges of interest to the collection of VCF files. The seqnames of rowRanges are a subset of seqnames(vs). If missing, a default is created from the seqinfo object of the provided VcfStack.

Accessors

In the code below, x is a VcfStack or RangedVcfStack object.

dim(x)

Get the number of files and samples in the VcfStack object.

colnames(x, do.NULL=TRUE, prefix="col")

Get the sample names in the VcfStack.

rownames(x), do.NULL=TRUE, prefix="row")

Get the names of the files in VcfStack.

dimnames(x))

Get the names of samples and the names of files in VcfStack.

files(x, ...), files(x, ..., check=TRUE) <- value

Get or set the files on x. value can be a named character() of file paths or a VcfFileList. The return value will be a VcfFileList.

seqinfo(x), seqinfo(x, new2old = NULL, pruning.mode = c("error", "coarse", "fine", "tidy")) <- value

Get or set the seqinfo on x. See seqinfo<- for details on new2old and pruning.mode.

seqlevelsStyle(x) <- value

Set the seqlevels according to the supplied style. File names and rowRanges will also be updated if applicable. See seqlevelsStyle<- for more details.

colData(x), colData(x, ...) <- value

Get or set the colData on x. value is a DataFrame.

rowRanges(x), rowRanges(x, ...) <- value

Get or set the rowRanges on x. x has to be a RangedVcfStack object. value is a GRanges.

Methods

In the code below, x is a VcfStack or RangedVcfStack object. i is a GRanges object, character() vector of seqnames, numeric() vector, logical() vector, or can be missing. For a RangedVcfStack object, assay and readVcfStack will use the associated rowRanges object for i.

vcfFields(x)

Returns a CharacterList of all available VCF fields, with names of fixed, info, geno and samples indicating the four categories. Each element is a character() vector of available VCF field names within each category.

assay(x, i, ..., BPPARAM=bpparam())

Get matrix of genotype calls from the VCF files. See genotypeToSnpMatrix. Argument i specifies which files to read. BPPARAM is the argument to the bpmapply.

readVcfStack(x, i, j=colnames(x), param=ScanVcfParam())

Get content of VCF files in the VcfStack. i indicates which files to read. j can be missing or a character() vector of sample names (see samples) present in the VCF files. param is a ScanVcfParam object. If param is used i and j are ignored.

show(object)

Display abbreviated information about VcfStack or RangedVcfStack object.

Subsetting

In the code below, x is a VcfStack or RangedVcfStack object.

x[i, j]

Get elements from ranges i and samples j as a VcfStack or RangedVcfStack object. Note: for a RangedVcfStack, the rowRanges object will also be subset.

i can be missing, a character() vector of seqnames, numeric() vector of indexes, logical() or GRanges object. When i is a GRanges object, seqnames(i) is then used to subset the files in the VcfStack.

j can be missing, a character() vector of sample names, a numeric(), logical() vector.

Helpers

getVCFPath(vs, chrtok)

Deprecated. Use files(vs)[chrtok] instead.

paths1kg(chrtoks)

Translate seqnames chrtoks to 1000 genomes genotype VCF urls.

Author(s)

Lori Shepherd mailto:Lori.Shepherd@RoswellPark.org and Martin Morgan mailto:Martin.Morgan@RoswellPark.org

See Also

VcfFile, VcfFileList.

Examples

## ---------------------------------------------------------------------
## CONSTRUCTION
## ---------------------------------------------------------------------
## point to VCF files and add names corresponding to the sequence
## present in the file
extdata <- system.file(package="GenomicFiles", "extdata")
files <- dir(extdata, pattern="^CEUtrio.*bgz$", full=TRUE)
names(files) <- sub(".*_([0-9XY]+).*", "\\1", basename(files))

## input data.frame describing the length of each sequence, coerce to
## 'Seqinfo' object
seqinfo <- as(readRDS(file.path(extdata, "seqinfo.rds")), "Seqinfo")

stack <- VcfStack(files, seqinfo)
stack

## Use seqinfo from VCF files instead of explict value
stack2 <- VcfStack(files)

rstack <- RangedVcfStack(stack)
gr <- GRanges(c("7:1-159138000", "X:1-155270560"))
rstack2 <- RangedVcfStack(stack, gr)
rstack2

## ---------------------------------------------------------------------
## ACCESSORS
## ---------------------------------------------------------------------
dim(stack)
colnames(stack)
rownames(stack)
dimnames(stack)
head(files(stack))
seqinfo(stack)
colData(stack)

## ---------------------------------------------------------------------
## METHODS
## ---------------------------------------------------------------------
readVcfStack(stack, i=GRanges("20:862167-62858306"))
i <- GRanges(c("20:862167-62858306", "7:1-159138000"))
readVcfStack(stack, i=i, j="NA12891")

head(assay(stack, gr))
head(assay(rstack2))

seqlevels(stack2)
rownames(stack2)
seqlevelsStyle(stack2)
seqlevelsStyle(stack2) <- "UCSC"
seqlevelsStyle(stack2)
seqlevels(stack2)
rownames(stack2)
vcfFields(stack2)

## ---------------------------------------------------------------------
## SUBSETTING
## ---------------------------------------------------------------------
## select rows 4, 5, 6 and samples 1, 2
stack[4:6, 1:2]
## select rownames "7", "11" and sample "NA12891"
stack[c("7", "11"), "NA12891"]
stack[c("7", "11", "X"), 2:3]
## subset with GRanges
stack[GRanges("20:862167-62858306")]

rstack2[]
rstack2[,1]

## ---------------------------------------------------------------------
## HELPERS
## ---------------------------------------------------------------------
paths1kg(1:3)

Bioconductor/GenomicFiles documentation built on May 2, 2024, 4:22 p.m.