PCA: Principal component analysis

View source: R/PCA.R

Description

Runs principal component analysis (PCA) on the input dataset using PLINK 1.9.

Usage

PCA(
  input.dir,
  output.dir,
  train.genotype,
  test.genotype,
  PCA.separate = FALSE,
  PCs.count = 10,
  plink.path = NULL,
  verbose = TRUE
)

Arguments

input.dir

[character] The absolute path to the directory containing the training and test datasets. If input.dir is missing, the current working directory returned by getwd() is used.

output.dir

[character] The absolute path to the directory where the results will be written. If output.dir is missing, the current working directory returned by getwd() is used.

train.genotype

[character] The prefix of PLINK binary files (bed/bim/fam) of the training dataset.

test.genotype

[character] The prefix of PLINK binary files (bed/bim/fam) of the test dataset.

PCA.separate

[logical] If TRUE, the principal components are calculated from the training dataset only, and the test dataset is then projected onto those principal components. If FALSE, the principal components are calculated from the combined training and test datasets. The default value is FALSE.

PCs.count

[numeric] The number of top principal components to extract. The default value is 10.

plink.path

[character] The absolute path to the PLINK executable file: path/to/plink.exe on Windows, or path/to/plink on Unix-like operating systems. If plink.path is NULL, the directory containing the PLINK executable should be added to the system PATH environment variable.

verbose

[logical] If TRUE, the PLINK log, error, and warning messages are printed to standard output. The default value is TRUE.
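
The difference between the two PCA.separate modes can be sketched in base R. This is only an illustration on simulated genotype counts using prcomp(); it is not the PLINK computation that PCA() runs, and the matrix sizes and object names below are assumptions made for the example.

```r
# Illustration only: base R prcomp() on simulated data, not the PLINK computation.
set.seed(1)
n.snps <- 50
n.train <- 40
n.test <- 10
# Simulated genotype counts (0/1/2), samples in rows and SNPs in columns.
train <- matrix(rbinom(n.train * n.snps, size = 2, prob = 0.3), nrow = n.train)
test <- matrix(rbinom(n.test * n.snps, size = 2, prob = 0.3), nrow = n.test)
PCs.count <- 5

# Analogue of PCA.separate = TRUE:
# fit the components on the training samples, then project the test samples.
fit.train <- prcomp(train, center = TRUE, scale. = FALSE)
train.pcs.sep <- fit.train$x[, 1:PCs.count]
test.pcs.sep <- predict(fit.train, newdata = test)[, 1:PCs.count]

# Analogue of PCA.separate = FALSE:
# fit the components on the combined samples, then split the scores again.
fit.all <- prcomp(rbind(train, test), center = TRUE, scale. = FALSE)
train.pcs.comb <- fit.all$x[1:n.train, 1:PCs.count]
test.pcs.comb <- fit.all$x[(n.train + 1):(n.train + n.test), 1:PCs.count]
```

In the first variant the test samples have no influence on the component axes, which is usually the point of projecting; in the second variant both sets contribute to the axes.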

Value

PCA returns a list containing the eigenvalues and the eigenvectors of the training and test datasets:

eigenvalue

A vector containing the top eigenvalues; its length is determined by PCs.count.

train.eigenvector

A data frame containing the eigenvectors of the training dataset.

test.eigenvector

A data frame containing the eigenvectors of the test dataset.
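
A minimal sketch of working with the returned list, using a hand-made mock in place of a real PCA() result. The three element names come from the description above; the column layout of the data frames (FID, IID, PC1, PC2) and all values are assumptions for illustration and may not match the actual output.

```r
# Mock result with the three documented elements. The columns FID, IID, PC1
# and PC2 are an assumed layout, and the numbers are made up.
pca.result <- list(
  eigenvalue = c(4.2, 2.1, 1.3),
  train.eigenvector = data.frame(
    FID = paste0("F", 1:4), IID = paste0("I", 1:4),
    PC1 = c(0.10, -0.20, 0.30, -0.10),
    PC2 = c(0.05, 0.02, -0.04, 0.01)
  ),
  test.eigenvector = data.frame(
    FID = paste0("F", 5:6), IID = paste0("I", 5:6),
    PC1 = c(0.00, 0.15),
    PC2 = c(-0.03, 0.02)
  )
)

# Share of each eigenvalue among the extracted components only. This is not
# the share of total variance, because lower components are not returned.
eig.share <- pca.result$eigenvalue / sum(pca.result$eigenvalue)

# Stack both sets into one data frame with a label column, e.g. for plotting
# or for use as covariates downstream.
pcs <- rbind(
  data.frame(set = "train", pca.result$train.eigenvector),
  data.frame(set = "test", pca.result$test.eigenvector)
)
```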

Examples

input.dir <- system.file("extdata", package = "pv")
output.dir <- system.file("extdata", package = "pv")
path2plink <- "/path/to/plink"
## Not run:
pca.result <- PCA(
  input.dir = input.dir,
  output.dir = output.dir,
  train.genotype = "train",
  test.genotype = "test",
  PCA.separate = FALSE,
  PCs.count = 10,
  plink.path = path2plink,
  verbose = TRUE
)

## End(Not run)
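
Before running the example it can help to confirm that a PLINK executable is actually reachable. The helper below is a hypothetical sketch and not part of the pv package; it assumes the executable is named plink when looked up on the PATH, and it does not check that the file found is PLINK 1.9.

```r
# Hypothetical helper (not part of pv): return a usable path to PLINK or stop.
resolve.plink <- function(plink.path = NULL) {
  if (is.null(plink.path)) {
    # Look the executable up on the system PATH.
    # Sys.which() returns "" when nothing is found.
    plink.path <- unname(Sys.which("plink"))
  }
  if (!nzchar(plink.path) || !file.exists(plink.path)) {
    stop("PLINK executable not found; supply plink.path or add PLINK to PATH")
  }
  normalizePath(plink.path)
}

# Usage: fall back to NA instead of failing when PLINK is not installed.
path2plink <- tryCatch(resolve.plink(), error = function(e) NA_character_)
```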

abnerzyx/pv documentation built on Feb. 27, 2022, 12:06 a.m.