Description Usage Arguments Details Value Author(s) References See Also Examples

`pcair`

is used to perform a Principal Components Analysis using genome-wide SNP data for the detection of population structure in a sample. Unlike a standard PCA, PC-AiR accounts for sample relatedness (known or cryptic) to provide accurate ancestry inference that is not confounded by family structure.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 | ```
## S4 method for signature 'gds.class'
pcair(gdsobj, kinobj, divobj,
kin.thresh = 2^(-11/2), div.thresh = -2^(-11/2),
unrel.set = NULL,
sample.include = NULL, snp.include = NULL,
num.cores = 1L, verbose = TRUE, ...)
## S4 method for signature 'SNPGDSFileClass'
pcair(gdsobj, ...)
## S4 method for signature 'GdsGenotypeReader'
pcair(gdsobj, ...)
## S4 method for signature 'GenotypeData'
pcair(gdsobj, ...)
## S4 method for signature 'SeqVarGDSClass'
pcair(gdsobj, ...)
``` |

`gdsobj` |
An object providing a connection to a GDS file. |

`kinobj` |
A symmetric matrix of pairwise kinship coefficients for every pair of individuals in the sample: upper and lower triangles must both be filled; diagonals should be self-kinship or set to a non-missing constant value. This matrix is used for partitioning the sample into the 'unrelated' and 'related' subsets. See 'Details' for how this interacts with |

`divobj` |
A symmetric matrix of pairwise ancestry divergence measures for every pair of individuals in the sample: upper and lower triangles must both be filled; diagonals should be set to a non-missing constant value. This matrix is used for partitioning the sample into the 'unrelated' and 'related' subsets. See 'Details' for how this interacts with |

`kin.thresh` |
Threshold value on |

`div.thresh` |
Threshold value on |

`unrel.set` |
An optional vector of IDs for identifying individuals that are forced into the unrelated subset. See 'Details' for how this interacts with |

`sample.include` |
An optional vector of IDs for selecting samples to consider for either set. |

`snp.include` |
An optional vector of snp or variant IDs to use in the analysis. |

`num.cores` |
The number of cores to use. |

`verbose` |
Logical indicator of whether updates from the function should be printed to the console; the default is TRUE. |

`...` |
Additional arguments to pass to |

The basic premise of PC-AiR is to partition the entire sample of individuals into an ancestry representative 'unrelated subset' and a 'related set', perform standard PCA on the 'unrelated subset', and predict PC values for the 'related subset'.

We recommend using software that accounts for population structure to estimate pairwise kinship coefficients to be used in `kinobj`

. Any pair of individuals with a pairwise kinship greater than `kin.thresh`

will be declared 'related.' Kinship coefficient estimates from the KING-robust software are typically used as measures of ancestry divergence in `divobj`

. Any pair of individuals with a pairwise divergence measure less than `div.thresh`

will be declared ancestrally 'divergent'. Typically, `kin.thresh`

and `div.thresh`

are set to be the amount of error around 0 expected in the estimate for a pair of truly unrelated individuals.

`kinobj`

and `divobj`

may be identical.

There are multiple ways to partition the sample into an ancestry representative 'unrelated subset' and a 'related subset'. If `unrel.set = NULL`

, then the PC-AiR algorithm is used to find an 'optimal' partition (see 'References' for a paper describing the algorithm). If `unrel.set`

is specified, then all individuals with IDs in `unrel.set`

are forced in the 'unrelated subset' and the PC-AiR algorithm is used to partition the rest of the sample; this is especially useful for including reference samples of known ancestry in the 'unrelated subset'.

An object of class '`pcair`

'. A list including:

`vectors` |
A matrix of principal components; each column is a principal component. Sample IDs are provided as rownames. The number of PCs returned can be adjusted by supplying the |

`values` |
A vector of eigenvalues matching the principal components. These values are determined from the standard PCA run on the 'unrelated subset'. |

`rels` |
A vector of IDs for individuals in the 'related subset'. |

`unrels` |
A vector of IDs for individuals in the 'unrelated subset'. |

`kin.thresh` |
The threshold value used for declaring each pair of individuals as related or unrelated. |

`div.thresh` |
The threshold value used for determining if each pair of individuals is ancestrally divergent. |

`sample.id` |
A vector of IDs for the samples used in the analysis. |

`nsamp` |
The total number of samples in the analysis. |

`nsnps` |
The total number of SNPs used in the analysis. |

`varprop` |
The variance proportion for each principal component. |

`call` |
The function call passed to |

`method` |
A character string. Either "PC-AiR" or "Standard PCA" identifying which method was used for computing principal components. "Standard PCA" is used if no relatives were identified in the sample. |

Matthew P. Conomos

Conomos M.P., Miller M., & Thornton T. (2015). Robust Inference of Population Structure for Ancestry Prediction and Correction of Stratification in the Presence of Relatedness. Genetic Epidemiology, 39(4), 276-293.

Manichaikul, A., Mychaleckyj, J.C., Rich, S.S., Daly, K., Sale, M., & Chen, W.M. (2010). Robust relationship inference in genome-wide association studies. Bioinformatics, 26(22), 2867-2873.

`pcairPartition`

for a description of the function used by `pcair`

that can be used to partition the sample into 'unrelated' and 'related' subsets without performing PCA.
`plot.pcair`

for plotting.
`kingToMatrix`

for creating a matrix of pairwise kinship coefficient estimates from KING output text files that can be used for `kinobj`

or `divobj`

.
`GWASTools`

for a description of the package containing the following functions: `GenotypeData`

for a description of creating a `GenotypeData`

class object for storing sample and SNP genotype data, `MatrixGenotypeReader`

for a description of reading in genotype data stored as a matrix, and `GdsGenotypeReader`

for a description of reading in genotype data stored as a GDS file. Also see `snpgdsBED2GDS`

in the `SNPRelate`

package for a description of converting binary PLINK files to GDS. The generic functions `summary`

and `print`

.

1 2 3 4 5 6 7 8 9 10 | ```
# file path to GDS file
gdsfile <- system.file("extdata", "HapMap_ASW_MXL_geno.gds", package="GENESIS")
# read in GDS data
HapMap_geno <- gdsfmt::openfn.gds(gdsfile)
# load saved matrix of KING-robust estimates
data("HapMap_ASW_MXL_KINGmat")
# run PC-AiR
mypcair <- pcair(HapMap_geno, kinobj = HapMap_ASW_MXL_KINGmat,
divobj = HapMap_ASW_MXL_KINGmat)
gdsfmt::closefn.gds(HapMap_geno)
``` |

smgogarten/GENESIS documentation built on Nov. 16, 2018, 9:25 p.m.

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.