# Principal Component Analysis for outlier detection

### Description

`pcadapt` performs principal component analysis and computes p-values to test for outliers. The test for outliers is based on the correlations between genetic variation and the first `K` principal components. `pcadapt` also handles Pool-seq data, for which the statistical analysis is performed on the genetic marker frequencies. Returns an object of class `pcadapt`.

### Usage


### Arguments

| Argument | Description |
| --- | --- |
| `input` | a character string specifying the name of the file to be processed with … |
| `K` | an integer specifying the number of principal components to retain. |
| `method` | a character string specifying the method to be used to compute the p-values. Three statistics are currently available, … |
| `data.type` | a character string specifying the type of data being read, either a … |
| `min.maf` | a value between … |
| `ploidy` | an integer specifying the ploidy of the individuals. |
| `output.filename` | a character string specifying the names of the files created by … |
| `clean.files` | a logical value indicating whether the auxiliary files should be deleted or not. |
| `transpose` | deprecated argument. |
| `cover.matrix` | a matrix specifying the average coverage per genetic marker and per population. |

### Details

First, a principal component analysis is performed on the scaled and centered genotype data. To account for missing data, the correlation matrix between individuals is computed using only the markers available for each pair of individuals. Depending on the specified `method`, different test statistics can be used.

- `mahalanobis` (default): the robust Mahalanobis distance is computed for each genetic marker using a robust estimate of both mean and covariance matrix between the `K` vectors of z-scores.
- `communality`: the communality statistic measures the proportion of variance explained by the first `K` PCs.
- `componentwise`: returns a matrix of z-scores.
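The three statistics can be illustrated with a minimal base R sketch on simulated genotypes. This is not the package's internal code. The variable names, the use of per-marker linear regressions to obtain z-scores, the use of the regression R-squared as the communality, and the choice of the MCD estimator from `MASS::cov.rob` as the robust estimate are all assumptions made for illustration.

```r
set.seed(1)

# Simulated genotypes: n individuals (rows) x m markers (columns), coded 0/1/2.
n <- 40; m <- 200; K <- 2
G <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
G[sample(length(G), 50)] <- NA               # sprinkle some missing genotypes
G <- G[, apply(G, 2, sd, na.rm = TRUE) > 0]  # drop monomorphic markers

# Center and scale each marker; scale() ignores NAs when computing the moments.
X <- scale(G)

# Correlation between individuals, using only markers observed in both of them.
C <- cor(t(X), use = "pairwise.complete.obs")
scores <- eigen(C, symmetric = TRUE)$vectors[, seq_len(K), drop = FALSE]

# Regress every marker on the K score vectors (assumed way to get z-scores).
z <- matrix(NA_real_, nrow = ncol(X), ncol = K)
communality <- numeric(ncol(X))
for (j in seq_len(ncol(X))) {
  fit <- summary(lm(X[, j] ~ scores))
  z[j, ] <- fit$coefficients[-1, "t value"]  # componentwise: one z-score per PC
  communality[j] <- fit$r.squared            # share of variance explained by K PCs
}

# Robust center and covariance of the z-scores (MCD), then squared distances.
rob <- MASS::cov.rob(z, method = "mcd")
maha <- mahalanobis(z, center = rob$center, cov = rob$cov)
```

Here `z` plays the role of the `componentwise` output, `communality` of the `communality` statistic, and `maha` of the `mahalanobis` statistic. Note that `stats::mahalanobis` returns squared distances.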

To compute p-values, test statistics (`stat`) are divided by a genomic inflation factor (`gif`) when `method="mahalanobis"`. When `method="communality"`, the test statistic is first multiplied by `K` and divided by the percentage of variance explained by the first `K` PCs before accounting for the genomic inflation factor. When using `method="mahalanobis"` or `"communality"`, the scaled statistics (`chi2_stat`) should follow a chi-squared distribution with `K` degrees of freedom. When using `method="componentwise"`, the squared z-scores should follow a chi-squared distribution with `1` degree of freedom. For Pool-seq data, `pcadapt` provides p-values based on the Mahalanobis distance for each SNP.
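The rescaling step can be sketched in base R as follows. The helper `pvalues_from_stat` is hypothetical, and the definition of the genomic inflation factor used here (median of the statistics divided by the median of the reference chi-squared distribution) is an assumption, since this page does not spell it out. The simulated z-scores and the sum of squares used as a distance-type statistic are stand-ins, not package output.

```r
set.seed(42)
m <- 1000; K <- 3

# Stand-in z-scores: normal draws with sd > 1 to mimic inflated statistics.
z <- matrix(rnorm(m * K, sd = 1.2), nrow = m, ncol = K)

# Rescale a statistic by its genomic inflation factor and convert to p-values.
pvalues_from_stat <- function(stat, df) {
  gif <- median(stat, na.rm = TRUE) / qchisq(0.5, df = df)  # assumed definition
  chi2_stat <- stat / gif
  list(gif = gif, chi2_stat = chi2_stat,
       pvalues = pchisq(chi2_stat, df = df, lower.tail = FALSE))
}

# Distance-type statistic (sum of squares as a simple stand-in): K df.
res_dist <- pvalues_from_stat(rowSums(z^2), df = K)

# Componentwise: squared z-scores of each PC, 1 df each.
res_comp <- lapply(seq_len(K), function(k) pvalues_from_stat(z[, k]^2, df = 1))
```

After rescaling, the median of `chi2_stat` matches the median of the reference distribution by construction, which is what dividing by `gif` is meant to achieve.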

### Value

The returned value `x` is an object of class `pcadapt`.