aberrant: Outlier identification


Description

Outlier identification based on two summary statistics.

Usage

aberrant(x, lambda, niter = 10000, burnin = 100, prior_df = NULL, prior_scale = NULL, hyper_prior_mean = NULL, hyper_prior_var = NULL, hyper_prior_df = NULL, hyper_prior_scale = NULL, alpha = NULL, beta = NULL, standardize = TRUE, verbose = TRUE, uncorr = FALSE)

Arguments

x

Vector, matrix or numeric data frame, with the two summary statistics in columns.

lambda

Ratio of the standard deviations of outlying and normal individuals.

niter

Number of samples to be generated by the Gibbs sampler. Defaults to 10000.

burnin

Number of burn-in samples. Defaults to 100.

prior_df

Degree(s) of freedom of the distribution that describes prior information on the covariance matrix of the summary statistics for the normal individuals. If uncorr=FALSE, a single value giving the degrees of freedom of an Inverse-Wishart distribution. If uncorr=TRUE, a vector of length 2 giving the degrees of freedom of the 2 Scaled Inverse Chi-Square priors.

prior_scale

Scale matrix of the distribution that describes prior information on the covariance matrix of the summary statistics for the normal individuals. If uncorr=FALSE, scale matrix of an Inverse-Wishart distribution. If uncorr=TRUE, diagonal scale matrix describing scale parameters of the 2 Scaled Inverse Chi-Square priors.

hyper_prior_mean

Means of the normal priors for the mean hyper-parameters. Vector of length 2.

hyper_prior_var

Variances of the normal priors for the mean hyper-parameters. Vector of length 2.

hyper_prior_df

Degrees of freedom of the Scaled Inverse Chi-Square priors for the variance hyper-parameters. Vector of length 2.

hyper_prior_scale

Scale parameters of the Scaled Inverse Chi-Square priors for the variance hyper-parameters. Vector of length 2.

alpha

First shape parameter of the Beta distribution describing prior information on the probability that an individual is an outlier.

beta

Second shape parameter of the Beta distribution describing prior information on the probability that an individual is an outlier.

standardize

Logical indicating whether the summary statistics should be standardized. Defaults to TRUE.

verbose

Logical. If TRUE, verbose output is printed during the Gibbs sampling. Defaults to TRUE.

uncorr

Logical. If TRUE, the two summary statistics are treated as independent. Defaults to FALSE.
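
To make the meaning of lambda concrete, the following base-R sketch simulates two summary statistics in which the outlying individuals have a standard deviation lambda times that of the normal individuals. It is an illustration only and not code from the package: the names n_normal, n_outlier, sd_normal and x_sim are assumptions, and the shared mean and independent columns are simplifications.

```r
set.seed(1)

lambda    <- 30    # ratio of standard deviations (outlying / normal)
n_normal  <- 900   # illustrative number of normal individuals
n_outlier <- 500   # illustrative number of outlying individuals
sd_normal <- 1     # standard deviation of the normal individuals

# Two independent summary statistics per individual, same mean in both groups
normal  <- matrix(rnorm(2 * n_normal,  mean = 0, sd = sd_normal), ncol = 2)
outlier <- matrix(rnorm(2 * n_outlier, mean = 0, sd = lambda * sd_normal), ncol = 2)

# Combined data in the two-column layout expected for the x argument
x_sim <- rbind(normal, outlier)

# Empirical ratio of standard deviations, per column; should be close to lambda
sd_ratio <- apply(outlier, 2, sd) / apply(normal, 2, sd)
print(sd_ratio)
```

A larger lambda describes outliers that are spread much more widely than the normal individuals.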

Details

Prior parameters 'prior_df', 'prior_scale', 'hyper_prior_mean', 'hyper_prior_var', 'hyper_prior_df' and 'hyper_prior_scale' must be all NULL or all numeric. Prior parameters 'alpha' and 'beta' must be both NULL or both numeric. If prior parameters are not specified, then uninformative priors are used.
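
The all-or-none rule above can be sketched as a small check. The helper all_null_or_all_numeric is hypothetical and not part of the package, and the prior values are arbitrary placeholders chosen only to have the documented shapes for uncorr=FALSE.

```r
# Hypothetical helper (not in the package): TRUE when the arguments are
# either all NULL or all numeric, mirroring the rule stated in Details.
all_null_or_all_numeric <- function(...) {
  args    <- list(...)
  is_null <- vapply(args, is.null, logical(1))
  is_num  <- vapply(args, is.numeric, logical(1))
  all(is_null) || all(is_num)
}

# Arbitrary illustrative values with the documented shapes (uncorr = FALSE)
priors_corr <- list(
  prior_df          = 3,          # single Inverse-Wishart degree of freedom
  prior_scale       = diag(2),    # 2 x 2 scale matrix
  hyper_prior_mean  = c(0, 0),    # length 2
  hyper_prior_var   = c(10, 10),  # length 2
  hyper_prior_df    = c(3, 3),    # length 2
  hyper_prior_scale = c(1, 1)     # length 2
)

ok_all_set <- do.call(all_null_or_all_numeric, priors_corr)  # all numeric
ok_none    <- all_null_or_all_numeric(NULL, NULL, NULL)      # all NULL
ok_mixed   <- all_null_or_all_numeric(3, NULL)               # mixed: rule violated
```

Assuming the package is loaded, a complete prior set like priors_corr could then be passed along with the data, for example with do.call(aberrant, c(list(x = x, lambda = 30), priors_corr)).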

Value

x

Initial data matrix with the 2 summary statistics.

group

Vector indicating whether an individual is an outlier (=1) or not (=0). Individuals are in the same order as in the initial data matrix x.

posterior

Vector indicating the posterior probability for an individual to be an outlier.

lambda

Ratio of the standard deviations of outlying and normal individuals used.

post_mean

Posterior mean of the summary statistics for the normal individuals.

post_var

Posterior covariance matrix of the summary statistics for the normal individuals.

standardize

Logical indicating if summary statistics were standardized.

inlier

Indices of normal individuals.

outlier

Indices of outlying individuals.
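
As a sketch of how the returned components relate to each other, the snippet below uses a mock list in place of a real aberrant() result. The values are invented, and the assumption that the outlier and inlier indices correspond to group equal to 1 and 0 is inferred from the descriptions above rather than checked against the package source.

```r
# Mock object standing in for the list returned by aberrant(); values invented
res <- list(
  group     = c(0, 0, 1, 0, 1),
  posterior = c(0.01, 0.03, 0.98, 0.10, 0.91)
)

# Indices recovered from the 0/1 group indicator
outlier_idx <- which(res$group == 1)
inlier_idx  <- which(res$group == 0)

# Individuals ranked from most to least likely to be an outlier
ranking <- order(res$posterior, decreasing = TRUE)
```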

Author(s)

Celine Bellenguez and Chris CA Spencer

References

Celine Bellenguez, Amy Strange, Colin Freeman, Wellcome Trust Case Control Consortium 2, Chris CA Spencer. A robust clustering algorithm for identifying problematic samples in genome-wide association studies. Bioinformatics.

Examples

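library(mvtnorm)  # provides rmvt()
# Simulate 1000 individuals from a bivariate t distribution with 3 degrees of freedom
x <- rmvt(1000, sigma = matrix(c(1, 0.5, 0.5, 1), 2, 2), df = 3)
aberrant(x, lambda = 30, alpha = 1, beta = 20)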
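
The example call uses alpha = 1 and beta = 20. As a rough guide to what such a choice implies, the base-R sketch below computes the mean and the 95% quantile of a Beta(1, 20) distribution for the outlier probability. The variable names are illustrative, and this only summarises the prior; it says nothing about the posterior the sampler produces.

```r
alpha      <- 1
beta_shape <- 20   # named beta_shape to avoid masking base::beta()

# Prior mean of the outlier probability: alpha / (alpha + beta)
prior_mean <- alpha / (alpha + beta_shape)

# Value below which the outlier probability lies with 95% prior probability
prior_q95 <- qbeta(0.95, alpha, beta_shape)

print(c(prior_mean = prior_mean, prior_q95 = prior_q95))
```

With these shape parameters the prior mean is 1/21, so the prior expects roughly 5% of individuals to be outliers.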

carbocation/aberrant documentation built on May 15, 2020, 6:04 p.m.