PV: Prism Vote


View source: R/PrismVote.R

Description

Performs the Prism Vote method to build a prediction model on the training dataset and make predictions on the test dataset.

Usage

PV(
  input.dir,
  output.dir,
  train.genotype,
  train.phenotype,
  test.genotype,
  test.phenotype,
  covar.number.PV = NULL,
  covar.number.LR = NULL,
  PCA.separate = FALSE,
  PCs.count = 10,
  stratum.count = 2,
  plink.path = NULL,
  P.value = NULL,
  topK = 10,
  candidate.SNPs = NULL,
  CS = FALSE,
  verbose = TRUE
)

Arguments

input.dir

[character] The full absolute path to the directory containing the training and test datasets. If input.dir is missing, the current working directory obtained by getwd() is used.

output.dir

[character] The full absolute path of the directory to which the results are written. If output.dir is missing, the current working directory obtained by getwd() is used.

train.genotype

[character] The prefix of PLINK binary files (bed/bim/fam) of the training dataset.

train.phenotype

[character] A space- or tab-delimited file specifying an alternate phenotype of the training dataset for the logistic regression analysis, passed to PLINK with the "--pheno" flag. This file must have a header row. The first two columns must be "FID" and "IID", column 3 holds the case/control phenotype (1 = unaffected (control), 2 = affected (case)), and any remaining columns hold covariates. See the PLINK 1.9 documentation for details (https://www.cog-genomics.org/plink/1.9/).
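
As an illustration of the layout described above, the following sketch writes a small tab-delimited phenotype file with base R. The sample IDs, the column names PHENO, AGE, and SEX, and every value are made up for illustration; only the "FID"/"IID" header and the 1/2 phenotype coding come from the description above.

```r
# Toy phenotype table; every value here is invented for illustration.
pheno <- data.frame(
  FID   = c("FAM1", "FAM2", "FAM3", "FAM4"),
  IID   = c("IND1", "IND2", "IND3", "IND4"),
  PHENO = c(1, 2, 2, 1),      # 1 = control, 2 = case
  AGE   = c(54, 61, 47, 58),  # hypothetical covariate
  SEX   = c(1, 2, 2, 1),      # hypothetical covariate
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with a header row and no row names or quotes.
pheno.file <- file.path(tempdir(), "train.phenotypes.txt")
write.table(pheno, file = pheno.file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back and check the layout before handing it to PV().
check <- read.table(pheno.file, header = TRUE, stringsAsFactors = FALSE)
stopifnot(identical(names(check)[1:2], c("FID", "IID")),
          all(check$PHENO %in% c(1, 2)))
```

The same layout applies to the test phenotype file.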

test.genotype

[character] The prefix of PLINK binary files (bed/bim/fam) of the test dataset.

test.phenotype

[character] A space- or tab-delimited file specifying an alternate phenotype of the test dataset, passed to PLINK with the "--pheno" flag. This file must have a header row. The first two columns must be "FID" and "IID", column 3 holds the case/control phenotype (1 = unaffected (control), 2 = affected (case)), and any remaining columns hold covariates. See the PLINK 1.9 documentation for details (https://www.cog-genomics.org/plink/1.9/).

covar.number.PV

[vector] Used in the PV model to specify a subset of column numbers of covariates to load from the phenotype file using the "--covar-number" flag in PLINK (via plink.lr). If NULL (the default), the logistic regression model is fitted without covariate adjustment.

covar.number.LR

[vector] Used in the LR and PRS models to specify a subset of column numbers of covariates to load from the phenotype file using the "--covar-number" flag in PLINK (via plink.lr). If NULL (the default), the logistic regression model is fitted without covariate adjustment.

PCA.separate

[logical] If TRUE, the principal components are calculated from the training dataset and the test dataset is then projected onto those principal components. If FALSE, the principal components are calculated from the combined training and test data. The default value is FALSE.
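
The difference between the two settings can be sketched with base R. This is only an analogy: how the package computes the components internally is not shown here, and the sketch below uses prcomp() on simulated numeric matrices rather than genotype data. The matrices train and test and the choice of two components are assumptions made for illustration.

```r
set.seed(1)
# Simulated stand-ins for genotype matrices (rows = samples, columns = SNPs).
snp.names <- paste0("snp", 1:5)
train <- matrix(rnorm(20 * 5), nrow = 20, dimnames = list(NULL, snp.names))
test  <- matrix(rnorm(8 * 5),  nrow = 8,  dimnames = list(NULL, snp.names))

# Analogue of PCA.separate = TRUE: fit on training data only,
# then project the test samples onto the training components.
pca.train <- prcomp(train, center = TRUE, scale. = FALSE)
test.pcs.separate <- predict(pca.train, newdata = test)[, 1:2]

# Analogue of PCA.separate = FALSE: fit on the combined data and
# read off the rows that belong to the test samples.
pca.all <- prcomp(rbind(train, test), center = TRUE, scale. = FALSE)
test.rows <- nrow(train) + seq_len(nrow(test))
test.pcs.combined <- pca.all$x[test.rows, 1:2]
```

With the separate approach the test samples do not influence the component axes, which keeps the test data out of model fitting; the combined approach places both datasets in a single shared coordinate system.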

PCs.count

[numeric] The number of top principal components to extract. The default value is 10.

stratum.count

[numeric] The number of strata. By default, the sample size of each stratum is N/stratum.count, where N is the total sample size. The default value is 2.

plink.path

[character] The full absolute path to the PLINK executable file. On Windows this is path/to/plink.exe; on Unix-like operating systems it is path/to/plink. If plink.path is NULL, the directory containing PLINK should be added to the system PATH environment variable.
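
Before calling PV(), it can help to confirm that the executable is reachable. The helper below is a hypothetical sketch, not part of the package: resolve_plink is an invented name, and it assumes the executable is called "plink" when looked up on the PATH.

```r
# Hypothetical helper (not part of the pv package): return a usable path
# to the PLINK executable, or stop with an informative error.
resolve_plink <- function(plink.path = NULL) {
  # With no explicit path, search the PATH for an executable named "plink".
  exe <- if (is.null(plink.path)) unname(Sys.which("plink")) else plink.path
  if (!nzchar(exe) || !file.exists(exe)) {
    stop("PLINK executable not found; set plink.path or add PLINK to PATH.")
  }
  exe
}

# Returns the resolved path, or NA if PLINK cannot be found on this machine.
plink.exe <- tryCatch(resolve_plink(), error = function(e) NA_character_)
```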

P.value

[double] The genome-wide significance P-value threshold used to select the significant SNPs for building a prediction model. The default value is NULL, in which case topK or candidate.SNPs is used. This value is ignored when candidate.SNPs is not NULL. The P-value of each SNP is calculated from logistic regression analysis using PLINK 1.9 (via plink.lr).

topK

[numeric] The number of top-ranked significant SNPs used to build a prediction model. For a fair comparison, the number of top-ranked SNPs from the entire sample (for the LR and PRS models) equals the size of the union of the SNPs selected from each stratum in PV. The default value is 10. This value is ignored when P.value or candidate.SNPs is not NULL.

candidate.SNPs

[vector] A character vector of SNP names specifying the candidate SNPs used to build a prediction model. When given, P.value and topK are ignored. The default value is NULL. The names should match the names of SNPs in the provided PLINK binary files.
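
The precedence among candidate.SNPs, P.value, and topK described above can be sketched as follows. The function select_snps and the toy association table assoc (with columns SNP and P) are hypothetical and are not the package's implementation; the sketch also ignores the per-stratum selection performed in PV, and the strict "<" comparison against the threshold is an assumption.

```r
# Hypothetical sketch of the selection precedence; not the package's code.
select_snps <- function(assoc, P.value = NULL, topK = 10,
                        candidate.SNPs = NULL) {
  # 1. candidate.SNPs overrides both P.value and topK.
  if (!is.null(candidate.SNPs)) {
    return(intersect(candidate.SNPs, assoc$SNP))
  }
  # 2. Otherwise a P-value threshold overrides topK.
  if (!is.null(P.value)) {
    return(assoc$SNP[assoc$P < P.value])
  }
  # 3. Otherwise take the topK SNPs with the smallest P-values.
  head(assoc$SNP[order(assoc$P)], topK)
}

# Toy association results; SNP names and P-values are invented.
assoc <- data.frame(SNP = c("rs1", "rs2", "rs3", "rs4"),
                    P   = c(0.2, 1e-9, 0.03, 1e-6),
                    stringsAsFactors = FALSE)

by.topK      <- select_snps(assoc, topK = 2)
by.threshold <- select_snps(assoc, P.value = 5e-8)
by.candidate <- select_snps(assoc, P.value = 5e-8,
                            candidate.SNPs = c("rs3", "rs1"))
```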

CS

[logical] If TRUE, the softmax of cosine similarity is used to calculate the probability that a sample belongs to each stratum. If FALSE, the probability is instead derived from the squared distance of a subject to a cluster center, which empirically follows a chi-squared distribution. The default value is FALSE.
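
A minimal sketch of the CS = TRUE idea, assuming each sample and each stratum center is represented by a vector of principal component scores. The function cosine_softmax, the two-dimensional centers, and the example sample are hypothetical; the package's actual computation may differ (for example, in any scaling applied before the softmax).

```r
# Hypothetical sketch: softmax over cosine similarities between one sample
# and each stratum center (rows of `centers`).
cosine_softmax <- function(x, centers) {
  sims <- apply(centers, 1, function(ctr) {
    sum(x * ctr) / (sqrt(sum(x^2)) * sqrt(sum(ctr^2)))
  })
  w <- exp(sims - max(sims))  # subtract the maximum for numerical stability
  w / sum(w)
}

# Invented stratum centers and sample in a two-component space.
centers <- rbind(stratum1 = c(1, 0), stratum2 = c(0, 1))
p <- cosine_softmax(c(2, 0.5), centers)
```

The result is one membership probability per stratum, summing to 1, with the larger value assigned to the stratum whose center points in the direction most similar to the sample.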

verbose

[logical] If TRUE, the PLINK log, error, and warning information is printed to standard output. The default value is TRUE.

Value

PV returns a list with the following components:

stratification.result

The output of stratification

feature.selection.result

The output of feature.selection

LR.result

The results of the logistic regression model. A list with i) predict, the output of LR.model. ii) performance, a list containing the AUC and accuracy (Acc) value.

LR.PV.result

The results of the logistic regression model under the Prism Vote framework. A list with i) predict, the output of LR.PV.model. ii) performance, a list containing the AUC and accuracy (Acc) value.

PRS.result

The results of the logistic regression model based on the polygenic risk score. A list with i) predict, the output of LR.model. ii) performance, a list containing the AUC and accuracy (Acc) value.

PRS.PV.result

The results of the logistic regression model based on the polygenic risk score under the Prism Vote framework. A list with i) predict, the output of LR.PV.model. ii) performance, a list containing the AUC and accuracy (Acc) value.

See Also

PCA, stratification, LR.model, PV.model, PRS, feature.selection

Examples

input.dir <- system.file("extdata", package="pv")
output.dir <- system.file("extdata", package="pv")
path2plink <- '/path/to/plink'
## Not run: 
pv.result <- PV(
  input.dir = input.dir,
  output.dir = output.dir,
  train.genotype = "train",
  train.phenotype = "train.phenotypes.txt",
  test.genotype = "test",
  test.phenotype = "test.phenotypes.txt",
  covar.number.PV = c(2, 3),
  covar.number.LR = c(2, 3),
  PCA.separate = FALSE,
  PCs.count = 10,
  stratum.count = 2,
  plink.path = path2plink,
  P.value = NULL,
  topK = 10,
  candidate.SNPs = NULL,
  CS = FALSE,
  verbose = TRUE
)

## End(Not run)

abnerzyx/pv documentation built on Feb. 27, 2022, 12:06 a.m.