README.md

GUSrelate

gplv3+ Hits

Genotyping Uncertainty with Sequencing data and RELATEdness.

An R package for constructing genomic relationship matrices (GRMs) for autopolyploid species using high-throughput sequencing data.

Installation:

The easiest way to install GUSrelate in R is using the devtools package.

install.packages("devtools")
devtools::install_github("tpbilton/GUSbase")
devtools::install_github("tpbilton/GUSrelate")

Note: Some of the functions are coded in C and therefore an appropriate C compiler is needed for the package to work. For windows OS, Rtools (https://cran.r-project.org/bin/windows/Rtools/) provides a compiler.

Example:

We give a simple example to illustrate how to use GUSrelate using the simulated dataset found in the GUSbase package. There are two ways to load data into GUSrelate:

  1. From a VCF file: The data can be loaded and an RA object created using the following code
library(GUSrelate)
ra <- VCFtoRA(simDS()$vcf)
radata <- readRA(ra)
  1. From count matrices of reference and alternate allele counts: This requires loading the data into R and creating a RA object as follows:
library(GUSrelate)
## matrices of simulated data saved in Rdata file within package
datloc = system.file("extdata", "simdata.Rdata", package = "GUSbase")
load(datloc)

## Check that the objects ref, alt, chrom, pos and indID have been loaded.
ls()

## Create RA object
radata <- makeRA(ref=ref, alt=alt, chrom=chrom, pos=pos, indID=indID)

The next step is to create a metadata file containing ploidy information of each sample (in case the ploidy level is not consistent in the population) and any other phenotype/grouping information that might be used later to explore the GRM. An example file is provide within the GUSbase package and the file location can be obtained using

(sim_metaInfo = simDS()$meta)

The next step is to create an GRM object.

grm <- makeGRM(RAobj=radata, samfile=sim_metaInfo, 
               filter=list(MAF=NULL, MISS=NULL, BIN=100, MAXDEPTH=500))

The arguments of the makeGRM function are:

An additional filtering criteria based on Hardy Weinberg Disequilibrium (HWE) can be performed but requires performing the HWE test for each SNP as follows:

grm$HWEtest(nThreads = 3)

Comments on the $HWEtest function:

A GRM can be computed using the $computeGRM function as follows:

grm$computeGRM(name = "GRM_VR", method="VanRaden", ep=0, snpsubset=NULL, 
               filter=list(MAF=NULL, MISS=NULL, PVALUE=0.01))

The arguments of the $computeGRM function are:

Another GRM can be constructed with different parameters using the makeGRM function. For example:

grm$computeGRM(name = "VR_filt", method="VanRaden", ep=0, snpsubset=NULL,
               filter=list(MAF=0.05, MISS=0.4, PVALUE=0.01))

A principal component analysis (PCA) plot of the GRM can be produced using the function $plotPCA:

grm$PCA(name = "VR_filt", colour=NULL, shape=NULL) 

The arguments of the $PCA function are: name: A character string giving the name of the GRM. colour: Column name of the file supplied to the samfile argument to use a grouping variable for colour on the PCA plot. * shape: Column name of the file supplied to the samfile argument to use a grouping variable for shape of points on the PCA plot.

Lastly, the GRM created can be extracted from the GRM object using

grm$extractGRM(name = "VR_filt", IDvar=NULL)

or exported to a file using the function

grm$writeGRM(name = "VR_filt", filename = "VR_filt.csv", IDvar=NULL)

where the arguments are:

Note:

Although not described in Bilton et al. (2024), GUSrelate allows input of SNP and individual specific error values if available. For example:

## Generate matrix of error parameters (random)
## Note: would not recommend this in practice. Use either information in the VCF 
## or some other software to estimate the error parameter.
nSnps = grm$extractVar("nSnps")$nSnps
nInd = grm$extractVar("nInd")$nInd
ep_mat = matrix(plogis(rnorm(n = nSnps*nInd, mean = -10)), nrow=nInd, ncol=nSnps)

## Using the matrix of SNP and individual specific error parameters
grm$computeGRM(name = "VR_filt", method="VanRaden", ep=ep_mat, snpsubset=NULL,
               filter=list(MAF=0.05, MISS=0.4, PVALUE=0.01))

Future development:

This package is still under development and additional methods and functions for analyzing GRMs is intended to be added at a later date.

Citation:

To cite this R package:

Bilton, T.P., Sharma, S.K., Schofield, M.R., Black, M.A., Jacobs, J.M.E., Bryan, G.J. \& Dodds, K.G. (2024). Construction of relatedness matrices in autopolyploid populations using low depth high-throughput sequencing data. Theoretical and Applied Genetics. 137:64. doi: 10.1007/s00122-024-04568-2

Funding:

The initial development of this package was partially funded by the Ministry of Business, Innovation and Employment via its funding of the “Genomics for Production & Security in a Biological Economy” programme (Contract ID C10X1306).



tpbilton/GUSrelate documentation built on Feb. 20, 2025, 4:35 p.m.