ancsvm: SVM-based predictions of haplotype ancestry

ghap.ancsvm    R Documentation

SVM-based predictions of haplotype ancestry

Description

This function uses Support Vector Machines (SVM) to predict ancestry of haplotype alleles in test samples.

Usage

 ghap.ancsvm(object, blocks, test = NULL, train = NULL,
             cost = 1, gamma = NULL, tune = FALSE,
             only.active.samples = TRUE, only.active.markers = TRUE,
             ncores = 1, verbose = TRUE)

Arguments

object

A GHap.phase object.

blocks

A data frame containing block boundaries, such as supplied by the ghap.blockgen function.

test

Character vector of individuals to test.

train

Character vector of individuals to use as reference samples.

cost

A numeric value specifying the C constant of the regularization term in the Lagrange formulation.

gamma

A numeric value specifying the gamma parameter of the RBF kernel. If NULL (the default), 1/blocksize is used.

tune

A logical value specifying whether a grid search is to be performed over the cost and gamma parameters (default = FALSE).

only.active.samples

A logical value specifying whether only active samples should be included in predictions (default = TRUE).

only.active.markers

A logical value specifying whether only active markers should be used for predictions (default = TRUE).

ncores

A numeric value specifying the number of cores to be used in parallel computing (default = 1).

verbose

A logical value specifying whether log messages should be printed (default = TRUE).

Details

This function predicts haplotype allele ancestry using Support Vector Machines (SVM) with a Gaussian Radial Basis Function (RBF) kernel. Two parameters control the fit: the C constant of the regularization term in the Lagrange formulation (cost, default 1) and the gamma parameter of the RBF kernel (gamma, default 1/blocksize). Setting tune = TRUE performs a grid search over candidate values of these parameters.
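
To make the role of gamma concrete, the sketch below evaluates the Gaussian RBF kernel, K(x, y) = exp(-gamma * ||x - y||^2), on two toy haplotypes in base R. The helper rbf_kernel and the 0/1 allele vectors are illustrative assumptions and not part of GHap; blocksize is assumed here to mean the number of markers in the block.

```r
# Gaussian RBF kernel between two numeric vectors (hypothetical helper)
rbf_kernel <- function(x, y, gamma) {
  exp(-gamma * sum((x - y)^2))
}

# Two toy haplotypes coded as 0/1 alleles over a block of 8 markers
hap_a <- c(0, 1, 1, 0, 0, 1, 0, 1)
hap_b <- c(0, 1, 0, 0, 1, 1, 0, 1)

# Default-style gamma: one over the number of markers in the block
gamma_default <- 1 / length(hap_a)

k_same <- rbf_kernel(hap_a, hap_a, gamma_default)  # identical haplotypes
k_diff <- rbf_kernel(hap_a, hap_b, gamma_default)  # two mismatching alleles
```

With 0/1 coding, the squared distance equals the number of mismatching alleles, so a larger gamma makes the similarity between two haplotypes decay faster with each mismatch.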

Value

If run with tune = FALSE, the function returns a data frame with the following columns:

BLOCK

Block alias.

CHR

Chromosome name.

BP1

Block start position.

BP2

Block end position.

POP

Original population label.

ID

Individual name.

HAP1

Predicted ancestry of haplotype 1.

HAP2

Predicted ancestry of haplotype 2.
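
As a rough illustration of how this output might be summarized, the sketch below builds a mock data frame with the columns listed above and tabulates, per individual, the proportion of block-level haplotype calls assigned to each ancestry. All values, including the ancestry labels "Pure1" and "Pure2" and the individual names, are made up for the example.

```r
# Mock output with the documented columns (all values are made up)
hapadmix <- data.frame(
  BLOCK = c("B1", "B2", "B3", "B1", "B2", "B3"),
  CHR   = "1",
  BP1   = c(1, 5000001, 10000001, 1, 5000001, 10000001),
  BP2   = c(5000000, 10000000, 15000000, 5000000, 10000000, 15000000),
  POP   = "Cross",
  ID    = rep(c("ind1", "ind2"), each = 3),
  HAP1  = c("Pure1", "Pure1", "Pure2", "Pure2", "Pure2", "Pure2"),
  HAP2  = c("Pure1", "Pure2", "Pure2", "Pure1", "Pure2", "Pure2"),
  stringsAsFactors = FALSE
)

# Stack both haplotypes so each row is one haplotype call per block
calls <- data.frame(
  ID  = rep(hapadmix$ID, times = 2),
  ANC = c(hapadmix$HAP1, hapadmix$HAP2)
)

# Proportion of block-level calls assigned to each ancestry, per individual
anc_prop <- prop.table(table(calls$ID, calls$ANC), margin = 1)
```

When blocks overlap (sliding windows), these proportions count some markers more than once, so they are only a quick check; ghap.ancsmooth is the function listed under See Also for post-processing these predictions.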

If tune = TRUE, the function returns a data frame with the following columns:

cost

The candidate value of the C constant.

gamma

The candidate value of the gamma parameter.

accuracy

The percentage of correctly assigned ancestries.
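
One possible way to use the tuning table is to pick the row with the highest accuracy. The sketch below does this on a mock data frame with the three columns listed above; the candidate values and accuracies are made up, and the use of expand.grid to lay out the candidates is an assumption about the table's shape, not a statement about GHap internals.

```r
# Mock tuning table with the documented columns (accuracy values are made up)
tunesvm <- expand.grid(cost = c(1, 10), gamma = c(0.001, 0.01, 0.1))
tunesvm$accuracy <- c(91.2, 93.5, 95.1, 96.4, 90.3, 92.0)

# Row with the highest percentage of correctly assigned ancestries
best <- tunesvm[which.max(tunesvm$accuracy), ]
```

The selected best$cost and best$gamma could then be passed as the cost and gamma arguments of a subsequent call with tune = FALSE.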

Author(s)

Yuri Tani Utsunomiya <ytutsunomiya@gmail.com>

References

R. J. Haasl et al. Genetic ancestry inference using support vector machines, and the active emergence of a unique American population. Eur J Hum Genet. 2013. 21(5):554-62.

D. Meyer et al. e1071: Misc Functions of the Department of Statistics, Probability Theory Group (e1071). TU Wien. 2019. R package version 1.7-0.1. http://cran.r-project.org/web/packages/e1071/index.html.

See Also

svm, ghap.ancsmooth, ghap.ancplot, ghap.ancmark

Examples


# #### DO NOT RUN IF NOT NECESSARY ###
# 
# # Copy phase data in the current working directory
# exfiles <- ghap.makefile(dataset = "example",
#                          format = "phase",
#                          verbose = TRUE)
# file.copy(from = exfiles, to = "./")
# 
# # Load phase data
# 
# phase <- ghap.loadphase("example")
# 
# ### RUN ###
# 
# # Calculate marker density
# mrkdist <- diff(phase$bp)
# mrkdist <- mrkdist[which(mrkdist > 0)]
# density <- mean(mrkdist)
# 
# # Generate blocks for admixture events up to g = 10 generations in the past
# # Assuming mean block size in Morgans of 1/(2*g)
# # Approximating 1 Morgan ~ 100 Mbp
# g <- 10
# window <- (100e+6)/(2*g)
# window <- ceiling(window/density)
# step <- ceiling(window/4)
# blocks <- ghap.blockgen(phase, windowsize = window,
#                         slide = step, unit = "marker")
# 
# # Tune supervised analysis
# train <- unique(phase$id[which(phase$pop != "Cross")])
# ranblocks <- sample(x = 1:nrow(blocks), size = 5, replace = FALSE)
# tunesvm <- ghap.ancsvm(object = phase, blocks = blocks[ranblocks,],
#                        train = train, gamma = 1/window*c(0.1,1,10),
#                        tune = TRUE)
# 
# # Supervised analysis with default parameters
# hapadmix <- ghap.ancsvm(object = phase, blocks = blocks,
#                         train = train)
# anctracks <- ghap.ancsmooth(object = phase, admix = hapadmix)
# ghap.ancplot(ancsmooth = anctracks)
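
The block-size arithmetic in the example above can be checked by hand. The sketch below repeats it with an assumed mean marker spacing of 50 kbp, a made-up value standing in for the quantity the example computes from phase$bp and calls density; the variable names window_bp and gamma_default are introduced here for illustration only.

```r
# Assumed mean spacing between adjacent markers, in bp (hypothetical value)
density <- 50e3

g <- 10                                  # generations since admixture
window_bp <- 100e6 / (2 * g)             # expected block length: 5 Mbp
window <- ceiling(window_bp / density)   # block size in markers
step <- ceiling(window / 4)              # slide by a quarter of the window
gamma_default <- 1 / window              # default-style gamma for these blocks
```

Under these assumptions each block spans 100 markers and slides by 25, so the tuning grid gamma = 1/window*c(0.1,1,10) in the example would cover 0.001, 0.01 and 0.1.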


GHap documentation built on July 2, 2022, 1:07 a.m.