ANEVADOT_test: ANEVA Dosage Outlier Test
In PejLab/ASEtestTools: ANEVA Dosage Outlier Test

Description Usage Arguments Value Examples

View source: R/ANEVA_DOT.R

This test is designed to detect if ASE data reveals sufficient allelic imbalance to induce an outlier in total gene expression.

1
2
3

ANEVADOT_test(ASEdat, output_columns = c("refCount", "altCount"),
  eh1 = "refCount", eh2 = "altCount", Eg_std, r0 = NULL, p0 = NULL,
  FDR = 0.05, coverage = 10, plot = TRUE)

`ASEdat`	Dataframe containing columns with reference and alternative count data.
`output_columns`	Vector containing any ASEdat column name strings which the user wishes to duplicate in the output.
`eh1`	String containing the column name of the reference count data.
`eh2`	String with the column name of the alternative count data.
`Eg_std`	Vector containing genetic standard deviation, the square root of genetic variation in regulation in general population in natural log scale (numeric value). Variance values from GTEx are available on Github.com/PejLab. For additional utilities and starter code to match Ensembl IDs with population variance estimates from GTEx v7, please see vgdat package. Square root transformation must be applied to GTEx variance estimates before running this test. P-values will not be generated for records with missing or infinite standard deviations. Eg_std vector must be in one-to-one correspondence with ASE count data, and must be ordered correctly.
`r0`	The ratio of the eh1 allele (i.e., eh1/(eh1+eh2)) in the absence of any regulatory difference (reference bias due to alignment). The simplest way to get such an estimate would be to get the median ratio between eh1 and eh2 across the entire library (i.e., eh1/(eh1+eh2)). This is done automatically if no user-specified r0 value is detected. Can be a single value across entire library, or a SNP-wise vector.
`p0`	An average noise rate p(R->A) or p(A->R), i.e., the probability of seeing an allele due to noise when it is essentially not there (for GTEx v7: LAMP = 0.0003). Can be a single value across entire library, or a SNP-wise vector.
`FDR`	A numeric value between 0 and 1 indicating the desired false discovery rate. Default FDR is 0.05.
`coverage`	A numeric value such that if total allelic count is less than this value, p-values will not be generated for that record. Default value is 10.
`plot`	Logical T/F indicating whether plots should be generated for the test data.

Data frame containing the user-specified output_columns, as well as unadjusted and adjusted p-values for detection of potential dosage outlier, testing H0:There does not exist a significantly abnormal allelic imbalance for this SNP, vs. H1: Allelic ratio significantly deviates from normal population. P-values are adjusted using Benjamini-Hoschberg method. P-values are not generated for records with missing or infinite standard deviations.

# Define the output columns
output_columns <- c("GENE_ID", "TISSUE_ID",  "REF_COUNT", "ALT_COUNT", "TOTAL_COUNT", "NULL_RATIO")

# re organize the tables by:
# 1: Selecting only genes that have Vg scores available
# 2: Reordering ASE data and Vg scores so they align
tiss <- "MSCLSK" # The data comes from a skeletal muscle sample
covered_genes <- intersect(Vg_GTEx_v7$IDs, sample_ASE$GENE_ID)
covered_gene_Vgs <- Vg_GTEx_v7[match(covered_genes, Vg_GTEx_v7$IDs), tiss]
covered_gene_ASE_data <- sample_ASE[match(covered_genes, sample_ASE$GENE_ID),]

# Take the square root of the Vg scores to the get the Standard Deviation (SDg)
covered_gene_SDgs <- sqrt(covered_gene_Vgs)

# Run ANEVA-DOT
ANEVADOT_scores <- ANEVADOT_test(covered_gene_ASE_data, output_columns = output_columns,
                                 eh1 = "REF_COUNT", eh2 = "ALT_COUNT", coverage = 10,
                                 r0 = covered_gene_ASE_data$NULL_RATIO,
                                 Eg_std = covered_gene_SDgs, plot = TRUE)