GGBase -- infrastructure for GGtools, genetics of gene expression

NOTA BENE: IF STARTING ANEW, USE gQTLBase/gQTLstats

This package was published at the dawn of eQTL analysis and uses somewhat idiosyncratic data structures. The gQTL* packages are more up to date.

Introduction

The GGBase package defines infrastructure for analysis of data on the genetics of gene expression. This document is primarily of concern to developers; for information on conducting analyses in genetics of expression, please see the vignette for the GGtools package.

Primary class structure, and associated methods

The \texttt{smlSet} class (the name refers to a "SNP matrix list") is an integrative container for expression plus genotype data. Genotypes are held as instances of the \texttt{SnpMatrix} class, which is defined in Clayton's \textit{snpStats} package.

library(GGBase)
getClass("smlSet")
showMethods(class="smlSet", where="package:GGBase")

Genotype data are stored in a list in the \texttt{smlEnv} environment to diminish copying as functions are called on the \texttt{smlSet} instance.
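
The motivation for environment-based storage can be illustrated with a minimal base R sketch. It does not use GGBase: the names \texttt{geno\_list}, \texttt{geno\_env}, \texttt{add\_flag} and \texttt{add\_flag\_list} are hypothetical and only stand in for the \texttt{smlEnv} idea. The point shown is that an environment is shared by reference between caller and function, whereas a modified list is a local copy.

```r
# Hypothetical stand-in for genotype storage; this is NOT the GGBase implementation.
geno_list <- list(chr20 = matrix(as.raw(1:3), nrow = 3, ncol = 4))

# An environment is passed by reference: a function that receives it
# works on the same object as the caller.
geno_env <- new.env()
assign("smList", geno_list, envir = geno_env)

add_flag <- function(env) {
  # Modifies the caller's environment in place.
  assign("touched", TRUE, envir = env)
  invisible(NULL)
}
add_flag(geno_env)
touched_in_env <- exists("touched", envir = geno_env, inherits = FALSE)

# A list has copy-on-modify semantics: the change below stays inside the function.
add_flag_list <- function(x) {
  x$touched <- TRUE
  invisible(NULL)
}
add_flag_list(geno_list)
touched_in_list <- !is.null(geno_list$touched)
```

After running the sketch, \texttt{touched\_in\_env} is TRUE and \texttt{touched\_in\_list} is FALSE, which is the behavior that makes an environment attractive for holding large genotype lists.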

Example data structure

Expression data were published by the Wellcome Trust GENEVAR project in 2007. Genotype data are from HapMap phase II.

if ("GGtools" %in% installed.packages()[,1]) {
 library(GGtools)
 s20 = getSS("GGtools", "20")
 s20
}

Visualizing a specific gene-SNP relationship

The SNP rs6060535 was reported as an eQTL for CPNE1 by Cheung et al. in a 2005 Nature paper.

if (exists("s20")) {
 plot_EvG(genesym("CPNE1"), rsid("rs6060535"), s20)
} else plot(1) # pdf must exist....

Genotype representations

The \texttt{SnpMatrix} class of the \textit{snpStats} package is used to represent genotypes. Imputed genotypes and their uncertainties can be represented in this scheme, but the example does not depict this.

if (exists("s20")) {
 # raw bytes
 as(smList(s20)[[1]], "matrix")[1:5,1:5]
 # generic calls
 as(smList(s20)[[1]], "character")[1:5,1:5]
 # risk allele (alphabetically later nucleotide) counts
 as(smList(s20)[[1]], "numeric")[1:5,1:5]
}
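
The relationship between the three views can be sketched in base R without \textit{snpStats}. The sketch assumes the byte convention for certain (non-imputed) calls in which 00 marks a missing call and 01, 02, 03 mark the three genotypes; this coding, the variable names, and the use of NA for missing calls are assumptions made for illustration and should be checked against the \textit{snpStats} documentation.

```r
# Assumed raw byte codes for five calls: 00 = missing, 01/02/03 = the three genotypes.
raw_codes <- as.raw(c(1, 2, 3, 0, 2))

# Convert to integers and mark the missing code explicitly, because
# indexing with 0 would silently drop the element.
codes <- as.integer(raw_codes)
codes[codes == 0L] <- NA

# Generic calls: allele labels A and B carry no nucleotide information.
calls <- c("A/A", "A/B", "B/B")[codes]

# Counts of the B allele: 0, 1 or 2 copies.
counts <- codes - 1L
```

Here \texttt{calls} and \texttt{counts} are two readings of the same bytes, which is why a single compact raw matrix can serve all three coercions in the chunk above.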

Reducing memory footprint of integrative data structures

When millions of genotypes are recorded, it can be cumbersome to hold them all in memory simultaneously, and it is seldom scientifically relevant to do so. A packaging protocol has therefore been established, together with the \texttt{getSS} function, to allow chromosome-at-a-time loading of genotype data alongside expression data.

To deploy the packaging protocol, use the \texttt{externalize} function on a "one-time" full \texttt{smlSet} representation of the data. Alternatively, mimic the behavior of this function by creating a new package folder structure and populating inst/parts with rda files representing a partition (usually by chromosome) of the genotype \texttt{SnpMatrix} instances.
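
The manual route might look roughly like the following base R sketch. It writes to a temporary directory, uses small raw matrices in place of real \texttt{SnpMatrix} instances, and names each file after its chromosome. The package name \texttt{myExptPkg}, the object name \texttt{part}, and the file naming are hypothetical; the names that \texttt{getSS} actually expects should be taken from the source of \texttt{externalize}.

```r
# Hypothetical package root, created under a temporary directory.
pkg_root <- file.path(tempdir(), "myExptPkg")
parts_dir <- file.path(pkg_root, "inst", "parts")
dir.create(parts_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-ins for per-chromosome genotype data; real use would hold SnpMatrix instances.
geno_by_chr <- list(
  "20" = matrix(as.raw(c(1, 2, 3, 1)), nrow = 2),
  "21" = matrix(as.raw(c(3, 3, 2, 1)), nrow = 2)
)

# Write one rda file per chromosome so that each part can be loaded on its own.
for (chr in names(geno_by_chr)) {
  part <- geno_by_chr[[chr]]
  save(part, file = file.path(parts_dir, paste0(chr, ".rda")))
}

written <- sort(list.files(parts_dir))
```

Keeping one file per chromosome means that only the requested part has to be read into memory at analysis time.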



GGBase documentation built on Nov. 8, 2020, 5:45 p.m.