ancsvm: SVM-based predictions of haplotype ancestry
ghap.ancsvm

SVM-based predictions of haplotype ancestry

Description

This function uses Support Vector Machines (SVM) to predict ancestry of haplotype alleles in test samples.

Usage

 ghap.ancsvm(object, blocks, test = NULL, train = NULL,
             cost = 1, gamma = NULL, tune = FALSE,
             only.active.samples = TRUE, only.active.markers = TRUE,
             ncores = 1, verbose = TRUE)

Arguments

`object`	A GHap.phase object.
`blocks`	A data frame containing block boundaries, such as those supplied by the `ghap.blockgen` function.
`test`	Character vector of individuals to test.
`train`	Character vector of individuals to use as reference samples.
`cost`	A numeric value specifying the C constant of the regularization term in the Lagrange formulation (default = 1).
`gamma`	A numeric value specifying the gamma parameter of the RBF kernel (default = 1/blocksize).
`tune`	A logical value specifying whether a grid search over the cost and gamma parameters should be performed (default = FALSE).
`only.active.samples`	A logical value specifying whether only active samples should be included in predictions (default = TRUE).
`only.active.markers`	A logical value specifying whether only active markers should be used for predictions (default = TRUE).
`ncores`	A numeric value specifying the number of cores to be used in parallel computing (default = 1).
`verbose`	A logical value specifying whether log messages should be printed (default = TRUE).

Details

This function predicts haplotype allele ancestry using Support Vector Machines (SVM) with a Gaussian Radial Basis Function (RBF) kernel. Two parameters control the model: the C constant of the regularization term in the Lagrange formulation (argument `cost`, default = 1) and the gamma parameter of the RBF kernel (argument `gamma`, default = 1/blocksize). Both have defaults, so the user only needs to set them to override those values.
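For reference, the two parameters can be read against the textbook form of a soft-margin SVM with an RBF kernel. The expressions below are the standard formulation as used by libsvm and e1071 (the package cited in the References); that `ghap.ancsvm` follows exactly this parameterization is an assumption, not something this page states. The symbols x_i (a haplotype coded as a numeric vector), y_i (a two-class label), w, b, xi_i and phi are notation introduced here for illustration.

```latex
% RBF kernel between two haplotype vectors
K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\gamma \,\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}\right), \qquad \gamma > 0

% Soft-margin objective for the two-class case
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0
```

A larger gamma makes the kernel decay faster with distance, so each training haplotype influences a narrower neighbourhood. A larger C penalizes training errors more heavily, which fits the reference samples more tightly at the risk of overfitting.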

Value

If run with tune = FALSE, the function returns a data frame with the following columns:

`BLOCK`	Block alias.
`CHR`	Chromosome name.
`BP1`	Block start position.
`BP2`	Block end position.
`POP`	Original population label.
`ID`	Individual name.
`HAP1`	Predicted ancestry of haplotype 1.
`HAP2`	Predicted ancestry of haplotype 2.
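As a rough illustration of how a table with these columns might be summarized, the sketch below counts, for each individual, the fraction of block-level haplotype calls assigned to each ancestry. The data frame is a mock built by hand: its values, the labels `Pure1` and `Pure2`, and the names `calls` and `props` are invented for this example and are not output of `ghap.ancsvm`. The summary is unweighted and ignores block length and overlap between sliding blocks, so it is only a quick check.

```r
# Mock result with the documented columns; all values are invented.
hapadmix <- data.frame(
  BLOCK = c("B1", "B1", "B2", "B2"),
  CHR   = c("1", "1", "1", "1"),
  BP1   = c(1, 1, 5000001, 5000001),
  BP2   = c(5000000, 5000000, 10000000, 10000000),
  POP   = "Cross",
  ID    = c("ind1", "ind2", "ind1", "ind2"),
  HAP1  = c("Pure1", "Pure2", "Pure1", "Pure1"),
  HAP2  = c("Pure2", "Pure2", "Pure1", "Pure2"),
  stringsAsFactors = FALSE
)

# Stack both haplotypes so that each row is a single haplotype call
calls <- data.frame(
  ID       = rep(hapadmix$ID, times = 2),
  ANCESTRY = c(hapadmix$HAP1, hapadmix$HAP2),
  stringsAsFactors = FALSE
)

# Per-individual fraction of calls assigned to each ancestry (rows sum to 1)
props <- prop.table(table(calls$ID, calls$ANCESTRY), margin = 1)
print(props)
```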

If run with tune = TRUE, the function returns a data frame with the following columns:

`cost`	The candidate value of the C constant.
`gamma`	The candidate value of the gamma parameter.
`accuracy`	The percentage of correctly assigned ancestries.
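One plausible way to use the tuning table is to pick the row with the highest accuracy and reuse its parameters in a second run. The sketch below works on a mock data frame with the three documented columns; the numbers and the name `best` are invented for illustration.

```r
# Mock tuning table with the documented columns; all values are invented.
tunesvm <- data.frame(
  cost     = c(1, 1, 1, 10, 10, 10),
  gamma    = c(0.001, 0.01, 0.1, 0.001, 0.01, 0.1),
  accuracy = c(91.2, 96.5, 88.0, 92.4, 96.1, 85.3)
)

# Row with the highest accuracy (which.max returns the first row on ties)
best <- tunesvm[which.max(tunesvm$accuracy), ]
print(best)
```

The selected values could then be supplied as `cost = best$cost` and `gamma = best$gamma` in a call with tune = FALSE.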

Author(s)

Yuri Tani Utsunomiya <ytutsunomiya@gmail.com>

References

R. J. Haasl et al. Genetic ancestry inference using support vector machines, and the active emergence of a unique American population. Eur J Hum Genet. 2013. 21(5):554-62.

D. Meyer et al. e1071: Misc Functions of the Department of Statistics, Probability Theory Group (e1071). TU Wien. 2019. R package version 1.7-0.1. http://cran.r-project.org/web/packages/e1071/index.html.

Examples


# #### DO NOT RUN IF NOT NECESSARY ###
# 
# # Copy phase data to the current working directory
# exfiles <- ghap.makefile(dataset = "example",
#                          format = "phase",
#                          verbose = TRUE)
# file.copy(from = exfiles, to = "./")
# 
# # Load phase data
# 
# phase <- ghap.loadphase("example")
# 
# ### RUN ###
# 
# # Calculate marker density
# mrkdist <- diff(phase$bp)
# mrkdist <- mrkdist[which(mrkdist > 0)]
# density <- mean(mrkdist)
# 
# # Generate blocks for admixture events up to g = 10 generations in the past
# # Assuming mean block size in Morgans of 1/(2*g)
# # Approximating 1 Morgan ~ 100 Mbp
# g <- 10
# window <- (100e+6)/(2*g)
# window <- ceiling(window/density)
# step <- ceiling(window/4)
# blocks <- ghap.blockgen(phase, windowsize = window,
#                         slide = step, unit = "marker")
# 
# # Tune supervised analysis
# train <- unique(phase$id[which(phase$pop != "Cross")])
# ranblocks <- sample(x = 1:nrow(blocks), size = 5, replace = FALSE)
# tunesvm <- ghap.ancsvm(object = phase, blocks = blocks[ranblocks,],
#                        train = train, gamma = 1/window*c(0.1,1,10),
#                        tune = TRUE)
# 
# # Supervised analysis with default parameters
# hapadmix <- ghap.ancsvm(object = phase, blocks = blocks,
#                         train = train)
# anctracks <- ghap.ancsmooth(object = phase, admix = hapadmix)
# ghap.ancplot(ancsmooth = anctracks)

GHap documentation built on July 2, 2022, 1:07 a.m.