PV: Prism Vote
In abnerzyx/pv: Prism Vote: A stratified statistical framework for individual disease risk prediction

Description Usage Arguments Value See Also Examples

View source: R/PrismVote.R

Performs the Prism Vote method: builds a prediction model on the training dataset and makes predictions on the test dataset.

PV(
  input.dir,
  output.dir,
  train.genotype,
  train.phenotype,
  test.genotype,
  test.phenotype,
  covar.number.PV = NULL,
  covar.number.LR = NULL,
  PCA.separate = FALSE,
  PCs.count = 10,
  stratum.count = 2,
  plink.path = NULL,
  P.value = NULL,
  topK = 10,
  candidate.SNPs = NULL,
  CS = FALSE,
  verbose = TRUE
)

`input.dir`	[character] The full absolute path to the directory containing the training and test datasets. If `input.dir` is missing, the current working directory obtained by `getwd()` is used.
`output.dir`	[character] The full absolute path to the directory where the results will be written. If `output.dir` is missing, the current working directory obtained by `getwd()` is used.
`train.genotype`	[character] The prefix of PLINK binary files (bed/bim/fam) of the training dataset.
`train.phenotype`	[character] A space- or tab-delimited phenotype file for the training dataset, passed to the logistic regression analysis via the "`--pheno`" flag in PLINK. The file must have a header row. The first and second columns must be "FID" and "IID", column 3 holds the case/control phenotype (1 = unaffected (control), 2 = affected (case)), and the remaining columns hold covariates. See the PLINK 1.9 documentation for details (https://www.cog-genomics.org/plink/1.9/).
`test.genotype`	[character] The prefix of PLINK binary files (bed/bim/fam) of the test dataset.
`test.phenotype`	[character] A space- or tab-delimited phenotype file for the test dataset, passed to the logistic regression analysis via the "`--pheno`" flag in PLINK. The file must have a header row. The first and second columns must be "FID" and "IID", column 3 holds the case/control phenotype (1 = unaffected (control), 2 = affected (case)), and the remaining columns hold covariates. See the PLINK 1.9 documentation for details (https://www.cog-genomics.org/plink/1.9/).
`covar.number.PV`	[vector] Used in the PV model to specify the column numbers of the covariates to load from the `phenotype` file via the "`--covar-number`" flag in PLINK (through `plink.lr`). If NULL (the default), the logistic regression model is fitted without covariate adjustment.
`covar.number.LR`	[vector] Used in the LR and PRS models to specify the column numbers of the covariates to load from the `phenotype` file via the "`--covar-number`" flag in PLINK (through `plink.lr`). If NULL (the default), the logistic regression model is fitted without covariate adjustment.
`PCA.separate`	[logical] If TRUE, the principal components are calculated from the training dataset and the test dataset is then projected onto them. If FALSE, the principal components are calculated from the combined training and test datasets. The default value is FALSE.
`PCs.count`	[numeric] To specify the number of top principal components that should be extracted. The default value is 10.
`stratum.count`	[numeric] The number of strata. By default, each stratum contains N/stratum.count samples, where N is the sample size. The default value is 2.
`plink.path`	[character] The full absolute path to the PLINK executable. On Windows this is path/to/plink.exe; on Unix-like operating systems it is path/to/plink. If `plink.path` is NULL, PLINK must be available through the system PATH environment variable.
`P.value`	[double] The genome-wide significance P-value threshold used to select the significant SNPs for building the prediction model. The default value is NULL, in which case `topK` or `candidate.SNPs` is used. This value is ignored when `candidate.SNPs` is not NULL. The P-value of each SNP is calculated from logistic regression analysis using PLINK 1.9 (via `plink.lr`).
`topK`	[numeric] The number of top-ranked significant SNPs used to build the prediction model. For a fair comparison, the number of top-ranked SNPs taken from the entire sample (for the LR and PRS models) equals the size of the union of the SNPs selected from each stratum in PV. The default value is 10. This value is ignored when `P.value` or `candidate.SNPs` is not NULL.
`candidate.SNPs`	[vector] A character vector of SNP names specifying the candidate SNPs used to build the prediction model. When supplied, `P.value` and `topK` are ignored. The names should match the SNP names in the provided PLINK binary files. The default value is NULL.
`CS`	[logical] If TRUE, the softmax of the cosine similarity is used to calculate the probability that a sample belongs to each stratum. If FALSE, the probability is based on the squared distance of a subject to a cluster center, which empirically follows a chi-squared distribution. The default value is FALSE.
`verbose`	[logical] If TRUE, the PLINK log, error, and warning information are printed to standard output. The default value is TRUE.
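
The `CS = TRUE` option refers to a softmax of cosine similarity. The following is a minimal sketch of that general technique in base R, not the package's internal code. The function name `cosine_softmax` and the inputs `pcs` (principal-component scores, one row per sample) and `centers` (one row per stratum center) are assumptions made for illustration only.

```r
# Illustrative only: a generic softmax-of-cosine-similarity computation.
# pcs:     n x p matrix of principal-component scores (one row per sample)
# centers: k x p matrix of stratum centers (one row per stratum)
# Rows with zero length would produce NaN and are not handled here.
cosine_softmax <- function(pcs, centers) {
  # Scale every row to unit length so the cross-product gives the cosine.
  unit_rows <- function(m) m / sqrt(rowSums(m^2))
  sim <- unit_rows(pcs) %*% t(unit_rows(centers))  # n x k cosine similarities
  # Softmax across strata; the row maximum is subtracted for numerical stability.
  ex <- exp(sim - apply(sim, 1, max))
  ex / rowSums(ex)
}

# Toy inputs (made up): three samples, two strata, two components.
pcs <- rbind(c(1, 0), c(0, 1), c(1, 1))
centers <- rbind(c(1, 0), c(0, 1))
prob <- cosine_softmax(pcs, centers)
```

Each row of `prob` sums to 1, and a sample that lies at the same angle from both centers receives equal probability for both strata.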

PV returns a list with the following components:

`stratification.result`	The output of `stratification`
`feature.selection.result`	The output of `feature.selection`
`LR.result`	The results of the logistic regression model. A list with i) predict, the output of `LR.model`, and ii) performance, a list containing the AUC and accuracy (Acc) values.
`LR.PV.result`	The results of the logistic regression model under the Prism Vote framework. A list with i) predict, the output of `LR.PV.model`, and ii) performance, a list containing the AUC and accuracy (Acc) values.
`PRS.result`	The results of the logistic regression model based on the polygenic risk score. A list with i) predict, the output of `LR.model`, and ii) performance, a list containing the AUC and accuracy (Acc) values.
`PRS.PV.result`	The results of the logistic regression model based on the polygenic risk score under the Prism Vote framework. A list with i) predict, the output of `LR.PV.model`, and ii) performance, a list containing the AUC and accuracy (Acc) values.
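
Because the four model results share the same layout, their performance values can be collected into one table. The sketch below uses a mock object with NA placeholders instead of real output. The element names `performance`, `AUC` and `Acc` are assumed from the description above and may differ in the actual return value. `performance_table` and `mock_model` are hypothetical helpers.

```r
# Hypothetical helper: gather AUC and accuracy from the four model results.
# Assumes each result is a list with a `performance` element holding
# `AUC` and `Acc`; these names are taken from the Value description.
performance_table <- function(res) {
  models <- c("LR.result", "LR.PV.result", "PRS.result", "PRS.PV.result")
  get_metric <- function(metric) {
    vapply(models, function(m) res[[m]]$performance[[metric]],
           numeric(1), USE.NAMES = FALSE)
  }
  data.frame(model = models, AUC = get_metric("AUC"), Acc = get_metric("Acc"))
}

# Mock object mirroring the documented layout; NA marks placeholder values.
mock_model <- function() {
  list(predict = NULL, performance = list(AUC = NA_real_, Acc = NA_real_))
}
mock.result <- list(
  LR.result = mock_model(), LR.PV.result = mock_model(),
  PRS.result = mock_model(), PRS.PV.result = mock_model()
)
perf <- performance_table(mock.result)
```

With a real return value from `PV`, the same call would be `performance_table(pv.result)`, provided the element names match the assumptions stated above.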

PCA, stratification, LR.model, PV.model, PRS, feature.selection

input.dir <- system.file("extdata", package="pv")
output.dir <- system.file("extdata", package="pv")
path2plink <- '/path/to/plink'
## Not run:
pv.result <- PV(input.dir = input.dir,
                output.dir = output.dir,
                train.genotype = "train",
                train.phenotype = "train.phenotypes.txt",
                test.genotype = "test",
                test.phenotype = "test.phenotypes.txt",
                covar.number.PV = c(2,3),
                covar.number.LR = c(2,3),
                PCA.separate = FALSE,
                PCs.count = 10,
                stratum.count = 2,
                plink.path = path2plink,
                P.value = NULL,
                topK = 10,
                CS = FALSE,
                candidate.SNPs = NULL,
                verbose = TRUE)

## End(Not run)
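
Before calling `PV` on your own data, it can help to confirm that the inputs follow the layout described under Arguments. The sketch below writes a toy phenotype file and checks it using base R only. The sample IDs, the covariate names (`age`, `sex`) and the helpers `check_phenotype_file` and `check_plink_prefix` are made up for illustration and are not part of the package.

```r
# Toy phenotype file in the layout described for `train.phenotype`:
# FID, IID, phenotype (1 = control, 2 = case), then covariates.
pheno <- data.frame(
  FID = c("F1", "F2", "F3", "F4"),
  IID = c("I1", "I2", "I3", "I4"),
  phenotype = c(1, 2, 2, 1),
  age = c(54, 61, 47, 58),
  sex = c(1, 2, 2, 1)
)
pheno.file <- file.path(tempdir(), "toy.phenotypes.txt")
write.table(pheno, pheno.file, sep = "\t", quote = FALSE, row.names = FALSE)

# Hypothetical helper: check the header and the 1/2 phenotype coding.
check_phenotype_file <- function(path) {
  # The default separator accepts both spaces and tabs.
  x <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
  stopifnot(
    ncol(x) >= 3,
    identical(names(x)[1:2], c("FID", "IID")),
    all(x[[3]] %in% c(1, 2))
  )
  invisible(x)
}

# Hypothetical helper: report which of prefix.bed/.bim/.fam exist in `dir`.
check_plink_prefix <- function(dir, prefix) {
  files <- file.path(dir, paste0(prefix, c(".bed", ".bim", ".fam")))
  ok <- file.exists(files)
  names(ok) <- basename(files)
  ok
}

checked <- check_phenotype_file(pheno.file)
# TRUE if an executable named "plink" is found on the PATH
# (relevant when plink.path = NULL).
plink.on.path <- nzchar(Sys.which("plink"))
```

For example, `check_plink_prefix(input.dir, "train")` returns a named logical vector with one entry per expected file, so a missing `.bim` or `.fam` file is visible before PLINK is invoked.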

abnerzyx/pv documentation built on Feb. 27, 2022, 12:06 a.m.
