View source: R/Stratification.R
A function to stratify samples into several strata using the top principal components.
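The stratification idea can be illustrated with base R. This is a minimal sketch under stated assumptions, not the package's implementation. It assumes the strata are found by k-means clustering of the training samples on the top principal components (suggested by the references to cluster centers in the arguments and return values), uses simulated genotype matrices instead of PLINK files, and introduces the hypothetical helpers stratify_by_pcs() and simulate_genotypes().

```r
# Minimal sketch only: k-means on top principal components is an assumption,
# not the package's confirmed implementation. Uses base R (stats) only.
set.seed(1)

# Hypothetical helper. train and test are numeric matrices (samples x variants).
# separate = TRUE mimics PCA.separate = TRUE: PCs are fitted on the training
# data and the test data are projected onto them.
stratify_by_pcs <- function(train, test, stratum.count = 2, PCs.count = 10,
                            separate = FALSE) {
  keep <- seq_len(PCs.count)
  if (separate) {
    pca <- prcomp(train, center = TRUE)
    train.pcs <- pca$x[, keep, drop = FALSE]
    test.pcs <- predict(pca, newdata = test)[, keep, drop = FALSE]
  } else {
    pca <- prcomp(rbind(train, test), center = TRUE)
    train.pcs <- pca$x[seq_len(nrow(train)), keep, drop = FALSE]
    test.pcs <- pca$x[nrow(train) + seq_len(nrow(test)), keep, drop = FALSE]
  }
  # Cluster the training samples in PC space; each cluster is one stratum.
  km <- kmeans(train.pcs, centers = stratum.count, nstart = 10)
  list(train.stratum = km$cluster,
       stratum.center = km$centers,
       train.pcs = train.pcs,
       test.pcs = test.pcs)
}

# Hypothetical helper: simulated 0/1/2 genotype dosages (illustrative data only).
simulate_genotypes <- function(n, p, freq) {
  matrix(rbinom(n * p, size = 2, prob = freq), nrow = n, ncol = p)
}

# Two populations with different allele frequencies
train <- rbind(simulate_genotypes(50, 50, 0.3), simulate_genotypes(50, 50, 0.6))
test  <- rbind(simulate_genotypes(20, 50, 0.3), simulate_genotypes(20, 50, 0.6))

res <- stratify_by_pcs(train, test, stratum.count = 2, PCs.count = 10)
res.sep <- stratify_by_pcs(train, test, stratum.count = 2, PCs.count = 10,
                           separate = TRUE)
table(res$train.stratum)
```

With separate = TRUE the principal axes are fitted on the training samples only, so the test samples do not influence them; with separate = FALSE both sets contribute to the axes.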
stratification(
  input.dir,
  output.dir,
  train.genotype,
  test.genotype,
  stratum.count = 2,
  PCA.separate = FALSE,
  PCs.count = 10,
  plink.path = NULL,
  CS = FALSE,
  verbose = TRUE
)
input.dir
[character] The full absolute path to the directory containing the training and test datasets. If
output.dir
[character] The full absolute path to the directory where the results will be written. If
train.genotype
[character] The prefix of the PLINK binary files (bed/bim/fam) of the training dataset.
test.genotype
[character] The prefix of the PLINK binary files (bed/bim/fam) of the test dataset.
stratum.count
[numeric] The number of strata. By default, the sample size of each stratum is N/stratum.count, where N is the total sample size. The default value is 2.
PCA.separate
[logical] If TRUE, the principal components are calculated from the training dataset, and the test dataset is then projected onto those principal components. If FALSE, the principal components are calculated from the combined training and test data. The default value is FALSE.
PCs.count
[numeric] The number of top principal components to extract. The default value is 10.
plink.path
[character] The full absolute path to the PLINK executable: path/to/plink.exe on Windows, or path/to/plink on Unix-like operating systems. If
CS
[logical] If TRUE, the probability that a sample belongs to each stratum is calculated as the softmax of its cosine similarity to the stratum centers. If FALSE, the probability is calculated on the basis that the squared distance of a sample to a cluster center empirically follows a chi-squared distribution. The default value is FALSE.
verbose
[logical] If TRUE, PLINK log, error, and warning messages are printed to standard output. The default value is TRUE.
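The two probability options selected by CS can be sketched as follows. Both helpers, cosine_softmax() and chisq_prob(), are hypothetical. In the chi-squared case, the variance scaling sigma2 and the use of the number of PCs as the degrees of freedom are assumptions, and the package may compute these quantities differently; the output names prob.to.g and prob.to.stratum only mirror the returned components.

```r
# Minimal sketch of the two options selected by CS (assumed formulas).

# CS = TRUE idea: softmax over cosine similarities to each stratum center.
cosine_softmax <- function(pcs, centers) {
  unit_rows <- function(m) m / sqrt(rowSums(m^2))   # scale rows to length 1
  sim <- unit_rows(pcs) %*% t(unit_rows(centers))    # samples x strata
  e <- exp(sim - apply(sim, 1, max))                 # subtract row max for stability
  e / rowSums(e)                                     # each row sums to 1
}

# CS = FALSE idea: treat the scaled squared distance to each center as
# chi-squared with df = number of PCs and take the upper-tail probability.
chisq_prob <- function(pcs, centers, sigma2 = 1) {
  d2 <- sapply(seq_len(nrow(centers)), function(k)
    rowSums(sweep(pcs, 2, centers[k, ])^2) / sigma2)
  d2 <- matrix(d2, nrow = nrow(pcs))                 # keep samples x strata shape
  p.g <- matrix(pchisq(d2, df = ncol(pcs), lower.tail = FALSE),
                nrow = nrow(pcs))
  # Normalise across strata (a row whose tail probabilities all underflow
  # to 0 would give NaN here).
  list(prob.to.g = p.g, prob.to.stratum = p.g / rowSums(p.g))
}

pcs <- rbind(c(1, 0), c(0, 2), c(-1, -1))  # three samples, two PCs
centers <- rbind(c(1, 0), c(0, 1))          # two stratum centers
cs <- cosine_softmax(pcs, centers)
cq <- chisq_prob(pcs, centers)
```

The softmax depends only on the direction of a sample relative to the origin of the PC space, whereas the chi-squared version depends on its actual distance to each center; the third sample above is equidistant from both centers and receives probability 0.5 under both.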
stratification returns a list containing the following components:
train.stratum
A vector containing the stratum number to which each training sample belongs.
train.stratum.index
A list containing the indices of the training samples belonging to each stratum.
stratum.center
A vector containing the center of each stratum.
train.distance.to.center
A list containing the distance between the training samples and the center of each stratum.
test.distance.to.center
A list containing the distance between the test samples and the center of each stratum.
train.prob.to.g
A list containing the probability of the training samples having variable g under the hypothesis that the sample belongs to each stratum.
test.prob.to.g
A list containing the probability of the test samples having variable g under the hypothesis that the sample belongs to each stratum.
train.prob.to.stratum
A list containing the probability that the training samples belong to each stratum.
test.prob.to.stratum
A list containing the probability that the test samples belong to each stratum.
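A common follow-up is to turn the per-stratum probabilities into a single stratum label per sample. The following is a minimal sketch, assuming that test.prob.to.stratum holds either one numeric vector per stratum or a samples-by-strata matrix (the exact layout is not specified above); assign_stratum() and the mock values are hypothetical.

```r
# Hypothetical helper: most probable stratum for each sample.
assign_stratum <- function(prob.to.stratum) {
  prob <- if (is.matrix(prob.to.stratum)) {
    prob.to.stratum
  } else {
    do.call(cbind, prob.to.stratum)   # one column per stratum
  }
  max.col(prob, ties.method = "first")  # ties go to the lower stratum number
}

# Mock values standing in for stratification(...)$test.prob.to.stratum
mock <- list(c(0.9, 0.2, 0.5), c(0.1, 0.8, 0.5))
test.stratum <- assign_stratum(mock)  # 1 2 1

# Sample indices per stratum, analogous in shape to train.stratum.index
test.stratum.index <- split(seq_along(test.stratum), test.stratum)
```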
input.dir <- system.file("extdata", package="pv")
output.dir <- system.file("extdata", package="pv")
# Replace with the actual path to the PLINK executable
path2plink <- '/path/to/plink'
## Not run:
stratification.result <- stratification(input.dir = input.dir,
                                        output.dir = output.dir,
                                        train.genotype = "train",
                                        test.genotype = "test",
                                        stratum.count = 2,
                                        PCA.separate = FALSE,
                                        PCs.count = 10,
                                        plink.path = path2plink,
                                        CS = FALSE,
                                        verbose = TRUE)
## End(Not run)
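Because the example depends on an external PLINK executable and on bed/bim/fam files for each prefix, it can help to check these inputs first. This is a minimal sketch with a hypothetical helper, check_plink_inputs(), using only base R (Sys.which, file.exists). Searching the PATH for an executable named plink when plink.path is NULL is an assumption made for the sketch, not the package's documented behavior.

```r
# Hypothetical pre-flight check before calling stratification().
check_plink_inputs <- function(input.dir, prefixes, plink.path = NULL) {
  if (is.null(plink.path)) {
    plink.path <- unname(Sys.which("plink"))  # "" when plink is not on PATH
  }
  # Each PLINK binary fileset consists of <prefix>.bed, .bim and .fam
  files <- file.path(input.dir,
                     as.vector(outer(prefixes, c(".bed", ".bim", ".fam"), paste0)))
  list(plink.found = nzchar(plink.path) && file.exists(plink.path),
       missing.files = files[!file.exists(files)])
}

# Illustrative run on a temporary directory holding only the training files
d <- tempfile("plinkdata")
dir.create(d)
invisible(file.create(file.path(d, c("train.bed", "train.bim", "train.fam"))))
chk <- check_plink_inputs(d, c("train", "test"), plink.path = "/path/to/plink")
chk$missing.files
```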