LR.PV.model: Logistic regression-based Prism Vote model


View source: R/PredictionModel.R

Description

A function that builds a logistic regression prediction model on the training dataset and makes predictions on the test dataset within the Prism Vote framework.

Usage

LR.PV.model(train.data.list, test.data.list, test.prob.to.stratum)

Arguments

train.data.list

[list] A list containing the training data [data.frame] of each stratum. Each stratum's data frame has rows corresponding to individuals and columns corresponding to features, and must contain a column named Y giving the case/control phenotype (0 = unaffected (control), 1 = affected (case)).

test.data.list

[list] A list containing the test data [data.frame]. It must contain a column named Y providing the case/control phenotype (0 = unaffected (control), 1 = affected (case)).

test.prob.to.stratum

[list] Output of stratification. A list providing the probability that the test samples belong to each stratum.
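
As a rough illustration of the argument shapes described above, the following sketch builds toy inputs in base R. All values are made up, the feature names snp1 and snp2 and the helper make.stratum are hypothetical, and the layout of test.prob.to.stratum (one numeric vector per stratum, one entry per test sample) is an assumption rather than something this page specifies.

```r
# Toy inputs illustrating the documented shapes (all values are made up).
set.seed(1)
n.train <- 20   # individuals per training stratum
n.test  <- 5    # test individuals

# Hypothetical helper: a data frame of two genotype-like features plus Y.
make.stratum <- function(n) {
  data.frame(snp1 = sample(0:2, n, replace = TRUE),
             snp2 = sample(0:2, n, replace = TRUE),
             Y    = sample(0:1, n, replace = TRUE))
}

# One training data frame per stratum, each with a 0/1 column named Y.
train.data.list <- list(make.stratum(n.train), make.stratum(n.train))

# One test data frame per stratum (here the same test samples are reused).
test.shared    <- make.stratum(n.test)
test.data.list <- list(test.shared, test.shared)

# ASSUMED layout: one probability vector per stratum, one entry per test sample.
p1 <- runif(n.test)
test.prob.to.stratum <- list(p1, 1 - p1)
```

In practice these objects would come from the stratification and feature selection steps shown in the Examples section rather than from simulated data.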

Value

LR.PV.model returns a list containing the predicted probability from each stratum and the aggregated prediction.

pred.stratum

A list of the predicted probabilities from each stratum.

pred.agg

A vector of the aggregated predictions.
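
To make the two returned components concrete, here is a minimal sketch of one way per-stratum logistic regression predictions could be combined. It is not the package source: it assumes the aggregation is a membership-weighted average of the per-stratum predictions, and the simulated data, the helper make.stratum, and the names train.list, weights, and sketch are all hypothetical.

```r
# Hypothetical sketch, NOT the package implementation.
# Assumption: aggregated prediction = sum over strata of
#   (probability the sample belongs to the stratum) * (stratum model's prediction).
set.seed(42)

# Simulated stratum data with two continuous features and a binary outcome Y.
make.stratum <- function(n) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  y  <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
  data.frame(x1 = x1, x2 = x2, Y = y)
}

train.list <- list(make.stratum(100), make.stratum(100))
test.data  <- make.stratum(10)

# Made-up stratum membership probabilities for the 10 test samples.
w1      <- runif(10)
weights <- list(w1, 1 - w1)

# Fit one logistic regression per stratum and predict on the test samples.
pred.stratum <- lapply(train.list, function(d) {
  fit <- glm(Y ~ ., data = d, family = binomial())
  as.numeric(predict(fit, newdata = test.data, type = "response"))
})

# Weight each stratum's predictions and sum them element-wise.
pred.agg <- Reduce(`+`, Map(`*`, pred.stratum, weights))

sketch <- list(pred.stratum = pred.stratum, pred.agg = pred.agg)
```

Because the weights for each test sample sum to 1 in this sketch, every aggregated value stays between 0 and 1 and can be read as a probability.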

Examples

input.dir <- system.file("data", package="pv")
output.dir <- system.file("data", package="pv")
path2plink <- '/path/to/plink'
## Not run: 
stratum.count <- 2
covar.number <-  c(2, 3)

stratification.result <- stratification(input.dir = input.dir,
                                        output.dir = input.dir,
                                        train.genotype = "train",
                                        test.genotype = "test",
                                        stratum.count = stratum.count,
                                        PCA.separate = FALSE,
                                        PCs.count = 10,
                                        plink.path = path2plink,
                                        verbose = TRUE)

feature.selection.result <- list()
for (i in 1:stratum.count) {
  feature.selection.result[[i]] <- feature.selection(input.dir = input.dir,
                                                     output.dir = output.dir,
                                                     genotype = paste0("train.stratum.", i),
                                                     phenotype = paste0("train.stratum.", i, ".phenotype.txt"),
                                                     covar.number = covar.number,
                                                     plink.path = path2plink,
                                                     topK = 10,
                                                     verbose = TRUE)
}

train.genotype.path <- file.path(input.dir, "train.raw")
train.genotype.path <- gsub('\\\\', '/', train.genotype.path)
train.data <- data.table::fread(train.genotype.path, data.table = FALSE)[, -c(1,2,3,4,5,6)]
colnames(train.data) <- unlist(purrr::map(colnames(train.data),function(x) {substr(x,1, nchar(x)-2)}))
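# NOTE: train.phenotype.path is not defined in this example; set it to the
# path of the training phenotype file before running.
train.pheno <- data.table::fread(train.phenotype.path, data.table = FALSE)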

test.genotype.path <- file.path(input.dir, "test.raw")
test.genotype.path <- gsub('\\\\', '/', test.genotype.path)
test.data <- data.table::fread(test.genotype.path, data.table = FALSE)[, -c(1,2,3,4,5,6)]
colnames(test.data) <- unlist(purrr::map(colnames(test.data),function(x) {substr(x,1, nchar(x)-2)}))
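# NOTE: test.phenotype.path is not defined in this example; set it to the
# path of the test phenotype file before running.
test.pheno <- data.table::fread(test.phenotype.path, data.table = FALSE)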

train.data.list <- list()
test.data.list <- list()
for (i in 1:stratum.count) {
  train.data.stratum <- train.data[stratification.result$train.stratum.index[[i]],
  feature.selection.result[[i]]$index, drop = FALSE]
  train.data.stratum$Y <- train.pheno[stratification.result$train.stratum.index[[i]], 3]-1

  test.data.stratum <- test.data[, feature.selection.result[[i]]$index, drop = FALSE]
  test.data.stratum$Y <- test.pheno[, 3]-1
  # Append the covariate columns when covariates are specified
  if (!is.null(covar.number)) {
    train.data.stratum <- cbind(train.data.stratum,
                                train.pheno[stratification.result$train.stratum.index[[i]],
                                            covar.number + 2, drop = FALSE])
    test.data.stratum <- cbind(test.data.stratum,
                               test.pheno[, covar.number + 2, drop = FALSE])
  }

  train.data.list[[i]] <- train.data.stratum
  test.data.list[[i]] <- test.data.stratum
}

LR.PV.pred <- LR.PV.model(train.data.list, test.data.list, stratification.result$test.prob.to.stratum)

## End(Not run)

abnerzyx/pv documentation built on Feb. 27, 2022, 12:06 a.m.