R/read-pgx.R

Defines functions readPGx

Documented in readPGx

#' Read in PGx Data from VCF File
#'
#' This function is a wrapper around \link[VariantAnnotation]{readVcf} that reads
#' only the PGx locations into a PGx object. The user must provide an indexed VCF
#' file. The path to the indexed VCF file gets passed to \link[Rsamtools]{TabixFile}
#' which is then used as the file argument to \code{readVcf}.
#'
#' @details The PGx locations used to subset the input VCF file are derived from
#' PharmVar v5.1.14 VCF files available at \url{https://www.pharmvar.org/download}.
#' The "create-allele-defintions.R" file included in the data-raw directory of
#' this package describes the steps used to reproduce the construction of the
#' GRanges object used in this function. These GRanges objects are accessible
#' via the \code{pgxReferenceRanges} function. Likewise, haplotype definitions
#' for each gene were derived from PharmVar v5.1.14 haplotypes.tsv files. The
#' haplotype definitons for all genes are accessible via \code{pgxHaplotypeRanges}.
#' @param file A phased and normalized VCF file. This file must be indexed (.tbi)
#' @param gene The PGx gene to subset from VCF file. See \code{\link{pgxGenes}} for available genes.
#' @param genome The genome build. Currently only GRCh38 is supported.
#' @export
#' @return Object of class PGx
readPGx <- function(file, gene, genome = "GRCh38") {
  gene_list <- pgx::pgxGenes()
  genomes <- c("GRCh38")
  stopifnot("Multple genes supplied to function" = length(gene) == 1)
  stopifnot("Gene not in gene list" = gene %in% gene_list)
  stopifnot("Genome not genome list" = genome %in% genomes)

  ref <- pgx:::reference_ranges[[gene]]
  tab <- Rsamtools::TabixFile(file)
  vcf <- VariantAnnotation::readVcf(tab, genome = genome, param = ref)

  # Construct the PGx object
  matches <- GenomicRanges::intersect(ref, SummarizedExperiment::rowRanges(vcf))
  missings <- GenomicRanges::setdiff(ref,  SummarizedExperiment::rowRanges(vcf))
  haplotypes <- pgx::haplotypes_gr[pgx::haplotypes_gr$Gene == gene, ]

  PGx(vcf, gene = gene, referencePositions = ref,
      missingPositions = missings, matchingPositions = matches,
      referenceHaplotypes = haplotypes)
}
coriell-research/pgx documentation built on June 4, 2022, 11:08 a.m.