varblup: Convert BLUP of individuals into BLUP of variants

ghap.varblupR Documentation

Convert BLUP of individuals into BLUP of variants

Description

Given genomic estimated breeding values (GEBVs), compute Best Linear Unbiased Predictor (BLUP) solutions for variant effects.

Usage

ghap.varblup(object, gebv, covmat, type = 1,
             only.active.variants = TRUE,
             weights = NULL, tol = 1e-12,
             vcp = NULL, errormat = NULL, 
             errorname = "", nlambda = 1000,
             ncores = 1, verbose = TRUE)

Arguments

object

A valid GHap object (phase, haplo or plink).

gebv

A named vector of genomic estimated breeding values.

covmat

An additive genomic relationship matrix, such as obtained with type=1 or type=2 in the ghap.kinship function.

type

A numeric value indicating the type of relationship matrix (see details in the ghap.kinship function).

only.active.variants

A logical value specifying whether only active variants should be included in the calculations (default = TRUE).

weights

A numeric vector providing variant-specific weights.

tol

A numeric value specifying the scalar to add to the diagonal of the relationship matrix it is not inversible (default = 1e-10).

vcp

A numeric value for the variance in GEBVs.

errormat

A square error matrix for GEBVs. This matrix can be obtained with argument extras = "LHSi" in the ghap.lmm function. If provided, calculation of standard errors and test statistics for the variants is activated.

errorname

The name used for the random effect representing GEBVs in the ghap.lmm function. If the error matrix was imported from somewhere else, this argument can be ignored provided that the names in the error matrix match the ones in the relationship matrix.

nlambda

A numeric value for the number of variants to be used in the estimation of the inflation factor (default = 1000).

ncores

A numeric value specifying the number of cores to be used in parallel computations (default = 1).

verbose

A logical value specifying whether log messages should be printed (default = TRUE).

Details

The function uses the equation:

\mathbf{\hat{a}} = q\mathbf{DM}^T\mathbf{K}^{-1}\mathbf{\hat{u}}

where \mathbf{M} is the n x m matrix of genotypes, where n is the number of individuals and m is the number of variants (i.e, markers or HapAlleles), \mathbf{D} = diag(d_i) with d_i being the weight of variant i (default d_i = 1), q is the inverse weighted sum of variances in the columns of \mathbf{M}, \mathbf{K} is the additive genomic relationship matrix and \hat{u} is the vector of GEBVs.

Value

The function returns a data frame with results from the genome-wide conversion of BLUP of individuals into BLUP of variants. If a GHap.haplo object is used, the first columns of the data frame will be:

CHR

Chromosome name.

BLOCK

Block alias.

BP1

Block start position.

BP2

Block end position.

For GHap.phase and GHap.plink objects, the first columns will be:

CHR

Chromosome name.

MARKER

Block start position.

BP

Block end position.

The remaining columns of the data frame will be equal for any class of GHap objects:

ALLELE

Identity of the counted (A1 or haplotype) allele.

FREQ

Frequency of the allele.

SCORE

Estimated BLUP of the allele.

VAR

Variance in allele-specific breeding values.

pVAR

Proportion of variance explained by the allele.

CENTER

Average genotype (meaningful only for predictions with ghap.profile).

SCALE

A constant set to 1 (meaningful only for predictions with ghap.profile).

If an error matrix for GEBVs is provided through the 'errormat' argument, the following additional columns are included in the data frame:

SE

Standard error for the BLUP of the allele.

CHISQ.EXP

Expected values for the test statistics.

CHISQ.OBS

Observed value for the test statistics.

CHISQ.GC

Test statistics scaled by the inflation factor (Genomic Control). Inflation is computed through regression of observed quantiles onto expected quantiles. In order to avoid overestimation by variants rejecting the null hypothesis, a random sample of variants (with size controled via the nlambda argument) is taken within three standard deviations from the mean of the distribution of test statistics.

LOGP

log10(1/P) or -log10(P) for the BLUP of the allele.

LOGP.GC

log10(1/P) or -log10(P) for the BLUP of the allele (scaled by the inflation factor).

Author(s)

Yuri Tani Utsunomiya <ytutsunomiya@gmail.com>

References

I. Stranden and D.J. Garrick. Technical note: derivation of equivalent computing algorithms for genomic predictions and reliabilities of animal merit. J Dairy Sci. 2009. 92:2971-2975.

J.L.G. Duarte et al. Rapid screening for phenotype-genotype associations by linear transformations of genomic evaluations. BMC Bioinformatics. 2014, 15:246.

Examples


# #### DO NOT RUN IF NOT NECESSARY ###
# 
# # Copy plink data in the current working directory
# exfiles <- ghap.makefile(dataset = "example",
#                          format = "plink",
#                          verbose = TRUE)
# file.copy(from = exfiles, to = "./")
# 
# # Copy metadata in the current working directory
# exfiles <- ghap.makefile(dataset = "example",
#                          format = "meta",
#                          verbose = TRUE)
# file.copy(from = exfiles, to = "./")
# 
# # Load plink data
# plink <- ghap.loadplink("example")
# 
# # Load phenotype and pedigree data
# df <- read.table(file = "example.phenotypes", header=T)
# 
# ### RUN ###
# 
# # Subset individuals from the pure1 population
# pure1 <- plink$id[which(plink$pop == "Pure1")]
# plink <- ghap.subset(object = plink, ids = pure1, variants = plink$marker)
# 
# # Subset markers with MAF > 0.05
# freq <- ghap.freq(plink)
# mkr <- names(freq)[which(freq > 0.05)]
# plink <- ghap.subset(object = plink, ids = pure1, variants = mkr)
# 
# # Compute genomic relationship matrix
# # Induce sparsity to help with matrix inversion
# K <- ghap.kinship(plink, sparsity = 0.01)
# 
# # Fit mixed model
# df$rep <- df$id
# model <- ghap.lmm(formula = pheno ~ 1 + (1|id) + (1|rep),
#                   data = df,
#                   covmat = list(id = K, rep = NULL),
#                   extras = "LHSi")
# refblup <- model$random$id$Estimate
# names(refblup) <- rownames(model$random$id)
# 
# # Convert blup of individuals into blup of variants
# mkrblup <- ghap.varblup(object = plink, gebv = refblup,
#                         covmat = K, vcp = model$vcp$Estimate[1],
#                         errormat = model$extras$LHSi, errorname = "id")
# 
# # Build GEBVs from variant effects and compare predictions
# gebv <- ghap.profile(object = plink, score = mkrblup)
# plot(gebv$SCORE, refblup); abline(0,1)
# 
# # Compare variant solutions with regular GWAS
# gwas <- ghap.assoc(object = plink,
#                    formula = pheno ~ 1 + (1|id) + (1|rep),
#                    data = df,
#                    covmat = list(id = K, rep = NULL))
# ghap.manhattan(data = gwas, chr = "CHR", bp = "BP", y = "LOGP")
# ghap.manhattan(data = mkrblup, chr = "CHR", bp = "BP", y = "LOGP")
# plot(mkrblup$LOGP, gwas$LOGP); abline(0,1)


GHap documentation built on July 2, 2022, 1:07 a.m.