Internals: Internal functions
In alenxav/NAM: Nested Association Mapping

Internal functions under optimization, complementary statistics, and loops written in C++ to speed up gwas, gibbs and wgr.

Some of the functions available for users include:

01) Import_data(file,type=c('GBS','HapMap','VCF')): Imports genotypic data into the NAM format, returning a list with a genotypic matrix gen coded as 012 and a vector chr with the number of markers per chromosome. Three file types are currently supported: GBS text, HapMap and VCF.

02) markov(gen,chr): Imputation method based on a forward Markov model for SNP data coded as 012. We recommend removing non-segregating markers before using this function.

03) LD(gen): Computes the linkage disequilibrium in terms of r2 for SNP data coded as 012. Missing data is not allowed.

04) PedMat(ped): Builds a kinship matrix from a pedigree. Calling PedMat() without arguments shows the expected input format.

05) PedMat2(ped,gen=NULL,IgnoreInbr=FALSE,PureLines=FALSE): Builds a kinship matrix from genomic data and a pedigree. Useful when not all individuals are genotyped. Row names of gen must indicate the genotype IDs.

06) Gdist(gen, method = 1): Computes the genetic distance among individuals. Six methods are available: 1) Nei distance; 2) Edwards distance; 3) Reynolds distance; 4) Rogers distance; 5) Prevosti's distance; 6) modified Rogers distance.

07) covar(sp=NULL,rho=3.5,type=1,dist=2.5): Builds a spatial kernel from field plot information. Calling covar() without arguments shows the expected input format. The parameter rho determines the decay of the relationship among neighboring plots, type defines whether the kernel is exponential (1), Gaussian (2) or intermediate between the two, and dist sets the distance ratio between range neighbors and row neighbors.

08) eigX(gen,fam): Computes the input for the EIG argument of the function gwas2.

09) G2A_Kernels(gen): Computes a list of orthogonal kernels containing additive, dominance and first-order epistatic effects, following the G2A model of ZB Zeng et al. 2005. These kernels can be used to describe genetic architecture through variance components; for that purpose, we recommend the packages varComp and BGLR.

10) NNsrc(sp=NULL,rho=1,dist=3): Using the same field data input required by the function covar, this function provides a list of nearest neighbor plots for each entry.

11) NNcov(NN,y): Uses the output of NNsrc to generate a numeric vector by averaging the observed values of y over the neighboring plots. This is useful for generating field covariates that control micro-environmental variance without kriging.

12) emXX(y,gen,...): Fits whole-genome regressions using the expectation-maximization algorithm instead of MCMC. Currently available methods include BayesA (emBA), BayesB (emBB), BayesC (emBC), BayesD (emBD), BLASSO (emBL), FLM (emDE), Elastic-Net (emEN), maximum likelihood (emML) and ridge regression (emRR). A cross-validation option is also available (emCV).

13) CNT(X): Centers each column (parameter) of matrix X.

14) IMP(X): Imputes missing values in matrix X with the mean of the corresponding column.

15) GAU(X): Creates a Gaussian kernel from matrix X.

16) GRM(X, Code012=FALSE): Creates a genomic relationship matrix as a linear kernel from matrix X. If genotypes are coded as 012 and Code012=TRUE, the kinship is the one proposed by VanRaden (2008); otherwise the outcome is an additive G2A kernel.

17) MSX(X): Computes the cross-product of each column of X with itself and the sum of the variances of the columns of X.

18) NOR(y,X,cxx,xx,maxit=50,tol=10e-6): Solves a ridge regression using GSRU (Gauss-Seidel with residual update), where y is the response variable, X is the matrix of parameters, cxx and xx are the outputs of MSX, and maxit and tol are the convergence criteria.

19) SPC(y,blk,row,col,rN=3,cN=1): Computes a spatial covariate similar to what could be obtained with NNsrc and NNcov, but in a single step. It is often faster than NNsrc/NNcov.

20) SPM(blk,row,col,rN=3,cN=1): Computes a spatial matrix that captures nearest neighbors, to be used as the design matrix of random effects. Its least-squares solution gives the same result as SPC.

21) BRR2(y,X1,X2,it=1500,bi=500,df=5,R2=0.5): A simple C++ implementation of Bayesian ridge regression that accommodates two random effects.

22) emML2(y,X1,X2,D1=NULL,D2=NULL): A simple C++ implementation of emML that accommodates two random effects.

23) press(y,K,MaxIt=10): Solves a PRESS-regularized GBLUP. K can be provided either as a matrix or as the output of the function eigen. MaxIt is the maximum number of iterations for updating missing values (if any) when H*y does not converge.

24) emGWA(y,gen): A vanilla GWAS algorithm written in C++ (very simple, but very efficient). It fits a snpBLUP via EM-REML-based GSRU, then runs an additional round checking the likelihood of treating each marker as a fixed rather than a random effect, thus avoiding double fitting. It returns the marker p-values, snpBLUP marker effects for genomic prediction, least-squares marker effects from the GWAS, variance components, heritability, and GEBVs (fitted values).

25) BCpi(y,X,it=3000,bi=500,df=5,R2=0.5): A vanilla C++ implementation of BayesCpi for GWAS or genomic prediction (GWP). It returns the marker p-values (as minus the log probability of the marker being excluded), marker effects for genomic prediction, the probability of each marker being included, variance components, heritability, and GEBVs (fitted values).

26) mrr(Y,X)/mkr(Y,K)/mrrV2(Y,X)/mrr2X(Y,X1,X2)/mkr2X(Y,K1,K2): C++ implementations of multivariate regression.
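To show what mean imputation (IMP) and column centering (CNT) do to a genotype matrix, here is a minimal base-R sketch; it is not the package code. The helper names impute_mean and center_cols are hypothetical, and we assume that "centers" means subtracting each column mean.

```r
# Hypothetical helpers illustrating the behavior described for IMP() and CNT().

# Replace NA entries of each column with that column's mean (NAs ignored).
impute_mean <- function(X) {
  mu <- colMeans(X, na.rm = TRUE)
  idx <- which(is.na(X), arr.ind = TRUE)   # row/column positions of missing cells
  X[idx] <- mu[idx[, 2]]                   # fill each cell with its column mean
  X
}

# Subtract each column mean so that every column has mean zero (assumed meaning of "center").
center_cols <- function(X) {
  sweep(X, 2, colMeans(X), "-")
}

# Toy genotype matrix coded 0/1/2 with two missing calls (filled column by column).
gen_toy <- matrix(c(0, 1, 2, NA,
                    2, 2, NA, 0,
                    1, 0, 1, 1), nrow = 4)
gen_imp <- impute_mean(gen_toy)
gen_cnt <- center_cols(gen_imp)
colMeans(gen_cnt)
```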
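The VanRaden (2008) kinship mentioned under GRM can be written as G = ZZ' / (2 * sum(p * (1 - p))), where Z is the 012 matrix with each marker centered by twice its allele frequency p. The base-R sketch below computes it, together with a Gaussian kernel exp(-d^2 / theta) on Euclidean distances between individuals. The function names are hypothetical, and the bandwidth theta (the mean squared distance) is our assumption; GAU may scale distances differently.

```r
# VanRaden (2008) genomic relationship matrix from 0/1/2 genotypes (no missing data).
vanraden_grm <- function(gen) {
  p <- colMeans(gen) / 2                    # frequency of the counted allele
  Z <- sweep(gen, 2, 2 * p, "-")            # center each marker by 2p
  tcrossprod(Z) / (2 * sum(p * (1 - p)))    # ZZ' scaled by 2 * sum(p(1 - p))
}

# Gaussian kernel exp(-d^2 / theta). Using the mean squared off-diagonal
# distance as theta is an assumption, not necessarily what GAU() does.
gaussian_kernel <- function(X) {
  D2 <- as.matrix(dist(X))^2
  theta <- mean(D2[upper.tri(D2)])
  exp(-D2 / theta)
}

set.seed(1)
gen_toy <- matrix(rbinom(5 * 20, 2, 0.4), nrow = 5)   # 5 individuals x 20 SNPs
G <- vanraden_grm(gen_toy)
K <- gaussian_kernel(gen_toy)
round(G, 2)
round(K, 2)
```

Because every marker is centered, the rows of G sum to zero, and the Gaussian kernel always has ones on its diagonal.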
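GSRU, as used by NOR, visits one coefficient at a time, solves its normal equation given the current residuals, and then updates the residuals, so the p x p system X'X + lambda*I is never formed. The base-R sketch below follows the MSX description (cxx holds the per-column cross-products x_j'x_j; xx is returned but not needed here). As an assumption, it uses a fixed shrinkage parameter lambda (the ratio of residual to marker variance) rather than whatever variance-component updates NOR performs. msx_sketch and gsru_ridge are hypothetical names.

```r
# Hypothetical helper mirroring the description of MSX(): per-column
# cross-products x_j'x_j and the sum of the column variances.
msx_sketch <- function(X) {
  list(cxx = colSums(X^2), xx = sum(apply(X, 2, var)))
}

# Gauss-Seidel with residual updating for ridge regression with a FIXED
# lambda (assumption; NOR() may estimate the shrinkage from the data).
gsru_ridge <- function(y, X, cxx, lambda, maxit = 1000, tol = 1e-10) {
  b <- numeric(ncol(X))
  e <- y                                     # residuals when all b = 0
  for (it in seq_len(maxit)) {
    b_prev <- b
    for (j in seq_len(ncol(X))) {
      b_old <- b[j]
      # j-th normal equation: (x_j'x_j + lambda) b_j = x_j'e + x_j'x_j b_old
      b[j] <- (sum(X[, j] * e) + cxx[j] * b_old) / (cxx[j] + lambda)
      e <- e - X[, j] * (b[j] - b_old)       # keep residuals in sync with b
    }
    if (max(abs(b - b_prev)) < tol) break
  }
  b
}

set.seed(2)
n <- 50; p <- 10
X <- scale(matrix(rbinom(n * p, 2, 0.5), n, p), scale = FALSE)  # centered markers
y <- as.vector(X %*% rnorm(p) + rnorm(n))
y <- y - mean(y)                             # centering removes the intercept
s <- msx_sketch(X)
b_gs <- gsru_ridge(y, X, s$cxx, lambda = 5)
b_cf <- solve(crossprod(X) + 5 * diag(p), crossprod(X, y))        # closed form
max(abs(b_gs - b_cf))
```

The printed difference should be negligible: for any lambda > 0 the ridge system is positive definite, so the Gauss-Seidel sweeps converge to the closed-form solution.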
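The spatial covariate produced by SPC (or by NNsrc followed by NNcov) is an average of the phenotypes of neighboring plots. Below is a base-R sketch under the assumption that rN and cN are the number of rows and columns searched on each side of a plot within the same block, excluding the plot itself; spatial_cov is a hypothetical name, and NAM's exact neighbor definition may differ.

```r
# Hypothetical nearest-neighbor covariate: for each plot, average y over the
# plots in the same block within rN rows and cN columns (assumed semantics).
spatial_cov <- function(y, blk, row, col, rN = 3, cN = 1) {
  out <- rep(NA_real_, length(y))
  for (i in seq_along(y)) {
    nb <- which(blk == blk[i] &
                abs(row - row[i]) <= rN &
                abs(col - col[i]) <= cN &
                seq_along(y) != i &        # exclude the plot itself
                !is.na(y))                 # skip missing phenotypes
    if (length(nb) > 0) out[i] <- mean(y[nb])
  }
  out
}

# Toy field: one block with 4 rows x 3 columns; y increases plot by plot.
field <- expand.grid(row = 1:4, col = 1:3)
field$blk <- 1
field$y <- seq_len(nrow(field))
field$cov <- with(field, spatial_cov(y, blk, row, col, rN = 1, cN = 0))
field
```

With rN = 1 and cN = 0, each plot is averaged with the plots directly above and below it in the same column, so the first column (y = 1, 2, 3, 4) receives covariates 2, 2, 3 and 3.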
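press is described as a PRESS-regularized GBLUP. Our reading, which is an assumption, is that the shrinkage parameter lambda of the smoother H = K (K + lambda*I)^-1 is chosen to minimize PRESS = sum_i ((y_i - yhat_i) / (1 - H_ii))^2, which equals the leave-one-out prediction error without refitting the model n times. The sketch evaluates PRESS over a grid of lambda values using the output of eigen. press_gblup is a hypothetical name, the intercept is handled only by centering y, and the MaxIt handling of missing values is not reproduced.

```r
# Hypothetical sketch: PRESS (leave-one-out) criterion for a GBLUP-type
# smoother H = K (K + lambda I)^{-1}, built from eigen(K). Not the press() code.
press_gblup <- function(y, eK, lambda) {
  U <- eK$vectors
  shrink <- eK$values / (eK$values + lambda)   # eigenvalues of H
  H <- U %*% (shrink * t(U))                   # U diag(shrink) U'
  e <- y - as.vector(H %*% y)                  # ordinary residuals
  sum((e / (1 - diag(H)))^2)                   # sum of squared LOO residuals
}

set.seed(3)
n <- 40
M <- matrix(rbinom(n * 100, 2, 0.5), n, 100)   # 40 individuals x 100 SNPs
Z <- scale(M, scale = FALSE)
K <- tcrossprod(Z) / ncol(Z)                   # linear genomic kernel
y <- as.vector(Z %*% rnorm(100, sd = 0.1) + rnorm(n))
y <- y - mean(y)                               # intercept removed by centering
eK <- eigen(K, symmetric = TRUE)

# Choose lambda by minimizing PRESS over a log-spaced grid
grid <- 10^seq(-2, 2, by = 0.25)
press_values <- sapply(grid, function(l) press_gblup(y, eK, l))
best_lambda <- grid[which.min(press_values)]
best_lambda
```

Working with eigen(K) means each new lambda only rescales the eigenvalues, so scanning the grid costs one decomposition in total.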

Alencar Xavier

## Not run: 


# Forward Markov imputation of genotypes
data(tpod)
fast.impute = markov(gen,chr)

# Wright's A matrix (without arguments, PedMat() shows the input format)
PedMat()

# Pairwise LD
ld = LD(gen[,1:10])
heatmap(ld)

# Spatial correlation (kernel-based); without arguments, covar() shows the input format
covar()

# Spatial correlation (NN-based)
NNsrc()

# Genetic distance
round(Gdist(gen[1:10,],method=1),2)

# PCs of a NAM kinship
eG = eigX(gen,fam)
plot(eG[[2]],col=fam)

# Polygenic kinship matrices
Ks = G2A_Kernels(gen)
ls(Ks)

# Genomic regression fitted via EM
h = emBA(y,gen)
plot(h$b,pch=20)

# GBLUP (through the eigendecomposition of G) and RRBLUP
g = GRM(gen)
eg = eigen(g)
gblup = emML(y=y, gen=eg$vectors,D=eg$values)
rrblup = emML(y=y, gen=gen)
plot(gblup$hat,rrblup$hat,xlab = 'gblup',ylab='rrblup')

# Vanilla GWAS
gwa = emGWA(y,gen)
plot(gwa$PVAL,pch=20)

## End(Not run)
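The PedMat example above builds Wright's numerator relationship matrix. The classic tabular method for this matrix is sketched below in base R. The input layout (integer ids with 0 for unknown parents, parents listed before their offspring) is an assumption and is not the format PedMat expects; tabular_A is a hypothetical name.

```r
# Tabular method for Wright's numerator relationship matrix A.
# ped: data.frame with columns id, sire, dam (integers 1..n, 0 = unknown),
# sorted so that parents precede offspring (assumed layout, not PedMat's).
tabular_A <- function(ped) {
  n <- nrow(ped)
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    s <- ped$sire[i]; d <- ped$dam[i]
    # Diagonal: 1 + F_i, where F_i is half the relationship between the parents
    A[i, i] <- 1 + if (s > 0 && d > 0) 0.5 * A[s, d] else 0
    for (j in seq_len(i - 1)) {
      # Off-diagonal: average relationship of j with the known parents of i
      a_js <- if (s > 0) A[j, s] else 0
      a_jd <- if (d > 0) A[j, d] else 0
      A[i, j] <- A[j, i] <- 0.5 * (a_js + a_jd)
    }
  }
  A
}

# Two founders (1, 2), their offspring 3, and 4 from mating 1 with 3.
ped <- data.frame(id = 1:4, sire = c(0, 0, 1, 1), dam = c(0, 0, 2, 3))
A <- tabular_A(ped)
A   # A[4, 4] is 1.25: individual 4 comes from a parent-offspring mating
```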
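LD returns pairwise r2. A common estimator for genotypes coded 012 is the squared Pearson correlation between marker columns, sketched below; whether LD uses exactly this estimator is an assumption, and ld_r2 is a hypothetical name. As with LD, missing data are not allowed, and non-segregating markers should be removed first because their correlation is undefined.

```r
# Pairwise LD as squared Pearson correlations between 012 marker columns
# (a common estimator; assumed, not taken from the LD() source).
ld_r2 <- function(gen) {
  if (anyNA(gen)) stop("missing genotypes are not allowed")
  cor(gen)^2
}

set.seed(4)
g1 <- rbinom(100, 2, 0.3)
gen_toy <- cbind(m1 = g1,                  # reference marker
                 m2 = g1,                  # perfect copy: r2 = 1
                 m3 = 2 - g1,              # opposite allele coding: r2 = 1
                 m4 = rbinom(100, 2, 0.5)) # independently simulated marker
r2 <- ld_r2(gen_toy)
round(r2, 2)
```

Squaring the correlation makes r2 insensitive to which allele is counted, which is why m3 is in complete LD with m1.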
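Two of the Gdist options have simple closed forms when each individual's 012 genotype is read as a within-individual allele frequency p = x/2 at a biallelic locus. The Rogers distance is then the mean over loci of |p_x - p_y|, and the modified Rogers distance is sqrt(mean over loci of (p_x - p_y)^2). The sketch below uses these textbook definitions; Gdist may scale or define them differently, and rogers_dist is a hypothetical name.

```r
# Textbook Rogers and modified Rogers distances between individuals, treating
# x/2 as the allele frequency of each individual at a biallelic locus.
# Gdist() may use a different scaling; this is only an illustration.
rogers_dist <- function(gen) {
  P <- gen / 2                                   # within-individual allele frequency
  L <- ncol(P)                                   # number of loci
  list(
    rogers = as.matrix(dist(P, method = "manhattan")) / L,          # mean |p_x - p_y|
    modified_rogers = as.matrix(dist(P, method = "euclidean")) / sqrt(L)
  )
}

gen_toy <- rbind(a = c(0, 0, 2, 2),
                 b = c(2, 0, 2, 0),
                 c = c(0, 0, 2, 2))
d <- rogers_dist(gen_toy)
round(d$rogers, 3)            # a vs b: 0.5
round(d$modified_rogers, 3)   # a vs b: sqrt(2)/2
```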
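The last example compares GBLUP and RRBLUP fitted values. For a fixed shrinkage parameter lambda and a linear kernel G = XX', the two are algebraically identical, since X (X'X + lambda*I)^-1 X'y = XX' (XX' + lambda*I)^-1 y. The base-R sketch below checks this with lambda held fixed; emML estimates the variance components instead, and a kernel proportional to XX' (such as a rescaled GRM of the same centered X) only changes the value of lambda.

```r
# With G = XX', ridge regression on markers (RR-BLUP style) and kernel
# regression on G (GBLUP style) give identical fitted values for a fixed lambda.
set.seed(5)
n <- 30; p <- 200
X <- scale(matrix(rbinom(n * p, 2, 0.5), n, p), scale = FALSE)  # centered markers
y <- as.vector(X %*% rnorm(p, sd = 0.05) + rnorm(n))
y <- y - mean(y)
lambda <- 10                                  # fixed, not estimated (assumption)

# Marker solution: p x p system
b <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
fit_rr <- as.vector(X %*% b)

# Kernel solution via the eigendecomposition of G: G (G + lambda I)^{-1} y
G <- tcrossprod(X)
eg <- eigen(G, symmetric = TRUE)
shrink <- eg$values / (eg$values + lambda)
fit_gb <- as.vector(eg$vectors %*% (shrink * crossprod(eg$vectors, y)))

max(abs(fit_rr - fit_gb))
```

When p is much larger than n, the kernel route solves an n x n problem instead of a p x p one, which is the usual reason to prefer GBLUP for prediction.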