check_P: Checking GWAS p-values

check_P    R Documentation

Checking GWAS p-values

Description

A simple test to check if the reported p-values in a GWAS results file match the other statistics. This function calculates an expected p-value (from the effect size and standard error) and then correlates it with the actual, reported p-value.

Usage

check_P(dataset, HQ_subset,
        plot_correlation = FALSE, plot_if_threshold = FALSE,
        threshold_r = 0.99,
        save_name = "dataset", save_dir = getwd(),
        header_translations,
        use_log = FALSE, dataN = nrow(dataset), ...)

Arguments

dataset

table with at least three columns: p-value, effect size and standard error.

HQ_subset

an optional logical or numeric vector indicating the rows in dataset that contain high quality SNPs.

plot_correlation

logical; should a scatterplot of the reported vs. calculated p-values be made? If TRUE, the plot is saved as a .png file.

plot_if_threshold

logical; if TRUE, the scatterplot is only saved when the correlation between reported and calculated p-values is lower than threshold_r.

threshold_r

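numeric; the correlation threshold for the scatterplot. It only has an effect when plot_if_threshold is TRUE: the plot is then saved only if the correlation is lower than this value.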

save_name

character string; the filename, without extension, for the scatterplot.

save_dir

character string; the directory where the output files are saved. Note that R uses forward slash (/) where Windows uses backslash (\).

header_translations

translation table for column names. See translate_header for more information. If the argument is left empty, dataset is assumed to use the standard column names used by QC_GWAS.

use_log, dataN

arguments used by QC_GWAS; redundant when check_P is used separately.

...

arguments passed to plot.

Details

check_P calculates the expected p-value from a chi-square distribution with 1 degree of freedom, using the squared ratio of effect size to standard error, (effect size / standard error)^2, as the test statistic.

In a typical GWAS dataset, the expected and observed p-values should correlate perfectly. If they do not, either a column has been misidentified or the wrong values were used when generating the dataset.
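The calculation described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's own code: the data are simulated, the variable names (beta, se, p_reported, p_expected, p_bad) are hypothetical, and the correlation is taken on the raw p-value scale, which may differ from what check_P does internally.

```r
# Simulated, illustrative data only (not taken from any real GWAS)
set.seed(1)
n    <- 1000
se   <- runif(n, min = 0.05, max = 0.2)   # hypothetical standard errors
beta <- rnorm(n, mean = 0, sd = se)       # hypothetical effect sizes

# "Reported" p-values, as a two-sided Wald test would produce them
p_reported <- 2 * pnorm(-abs(beta / se))

# Expected p-values: upper tail of a chi-square distribution (1 df)
# evaluated at the squared ratio of effect size to standard error
p_expected <- pchisq((beta / se)^2, df = 1, lower.tail = FALSE)

# With consistent statistics the two sets of p-values agree
r_ok <- cor(p_reported, p_expected)

# Mimic a misidentified column by shuffling the standard errors
p_bad <- pchisq((beta / sample(se))^2, df = 1, lower.tail = FALSE)
r_bad <- cor(p_reported, p_bad)

print(c(consistent = r_ok, shuffled_se = r_bad))
```

In this sketch the correlation for the consistent data is essentially 1, while the shuffled standard errors give a lower value, which is the kind of discrepancy the check is meant to flag.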

Value

The correlation between expected and reported p-values.

Examples

  data("gwa_sample")

  selected_SNPs <- HQ_filter(data = gwa_sample,
                             FRQ_val = 0.05,
                             cal_val = 0.95,
                             filter_NA = FALSE)
  # To calculate a correlation between predicted and actual p-values:
  check_P(gwa_sample, HQ_subset = selected_SNPs,
          plot_correlation = FALSE)

  # To plot the correlation:
  ## Not run: 
  check_P(gwa_sample, HQ_subset = selected_SNPs,
          plot_correlation = TRUE, plot_if_threshold = FALSE,
          save_name = "sample")
  ## End(Not run)

QCGWAS documentation built on May 30, 2022, 5:05 p.m.