View source: R/script_v12-3_package.R
EWAS_series | R Documentation |
This function runs a QC (via the function EWAS_QC) over multiple files
and generates additional graphs comparing the results of these files.
EWAS_series(EWAS_files, output_files, map, N, header_translations, save_final_dataset = TRUE, gzip_final_dataset = TRUE, high_quality_plots = FALSE, N_plot_beta = 500000L, ...)
EWAS_files |
a character vector containing the filenames of the EWAS results to be QC'ed. |
output_files |
a character vector containing the filenames of the output
files. Do not add an extension; |
map |
a data frame with chromosome and position values of the CpGs
in |
N |
a data frame containing the filenames (as listed in the
|
header_translations |
a translation table for the column names of the EWAS files,
or the name of a file containing the same. See
|
save_final_dataset,
gzip_final_dataset,
high_quality_plots |
logical values. See |
N_plot_beta |
integer specifying how many beta values per file should be used in the effect-size comparison plot. Set this to a value larger than the number of markers in the datasets to include all markers. |
... |
arguments passed to |
QCEWAS includes a Quick-Start guide in the doc folder of the library.
This guide explains how to run a QC and how to interpret the results.
The start-up message when loading QCEWAS indicates where it can be
found on your computer.
In brief, EWAS_series works by calling EWAS_QC for every filename given
in EWAS_files. After all files have been processed, it generates two
additional graphs: a precision plot (provided N was specified) and an
effect-size (beta) distribution plot. The former plots precision
(1 / median standard error) against the square root of the sample size
of each results file. Normally, one expects to see a roughly positive
correlation (i.e. the cohorts ought to cluster around the diagonal from
the lower left to the upper right). Outliers indicate that the outlying
cohort(s) have far higher or lower uncertainty in their estimates than
can be expected from their sample size. This could indicate a different
method, a different measure (check the effect-size distribution plot)
or possibly over- or undersignificance of their estimates (check the
QQ plot and lambda value).
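The relationship behind the precision plot can be sketched with simulated data. The example below is only an illustration in base R, not the code QCEWAS uses internally: the cohort sizes, the simulated standard errors and all variable names are assumptions made for this sketch.

```r
# Hypothetical illustration of the precision plot; not QCEWAS internals.
set.seed(1)

# Assumed sample sizes of four imaginary cohorts
cohort_n <- c(100, 400, 900, 1600)

# Simulate 1000 per-marker standard errors per cohort.
# The errors are assumed to shrink proportionally to 1 / sqrt(N).
se_list <- lapply(cohort_n, function(n) {
  abs(rnorm(1000, mean = 1 / sqrt(n), sd = 0.1 / sqrt(n)))
})

# Precision as described in the text: 1 / median standard error
precision <- sapply(se_list, function(se) 1 / median(se, na.rm = TRUE))
sqrt_n <- sqrt(cohort_n)

# Cohorts are labelled by number, mirroring how the plots identify files
plot(sqrt_n, precision, type = "n",
     xlab = "sqrt(sample size)", ylab = "1 / median(SE)")
text(sqrt_n, precision, labels = seq_along(cohort_n))
```

In this simulation all four cohorts fall close to a straight line. A cohort that reported its effect sizes on a different scale, or used a different method, would sit clearly above or below that line.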
The effect-size distribution plot allows comparison of the effect-size scale of different files. One expects the distribution to become somewhat narrower as sample size increases. However, large differences in scale suggest that the files used different units for their measurements.
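A numeric counterpart to this visual comparison can be sketched as follows. This is a rough illustration in base R under stated assumptions: the simulated effect sizes, the use of the interquartile range as the measure of scale, and the tenfold threshold are all choices made for this example and are not part of QCEWAS.

```r
# Hypothetical illustration; the data and the threshold are assumptions.
set.seed(2)

# Simulated effect sizes for three imaginary results files.
# file3 is assumed to be reported on a scale 100 times larger
# (for example percentages instead of proportions).
betas <- list(
  file1 = rnorm(5000, sd = 0.010),
  file2 = rnorm(5000, sd = 0.008),
  file3 = rnorm(5000, sd = 1.000)
)

# Interquartile range as a robust measure of the effect-size scale
spread <- sapply(betas, IQR)

# Compare each file with the median spread across files
ratio_to_median <- spread / median(spread)

# Flag files whose scale differs by more than an (arbitrary) factor of 10
suspicious <- names(ratio_to_median)[ratio_to_median > 10 |
                                     ratio_to_median < 0.1]
print(suspicious)
```

Modest differences in spread, such as between file1 and file2 here, are consistent with differences in sample size. A difference of several orders of magnitude is more likely to reflect a difference in units.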
As of version 1.2-0, the effect-size distribution plot shows a random
(rather than proportional) selection of effect sizes from the cohort.
As a consequence, rerunning the QC over a dataset may result in a
slightly different distribution plot in each run. This is only a
cosmetic issue (as the default sample size is sufficiently large to
include the majority of a normally-sized EWAS dataset) and can be
averted entirely by changing the N_plot_beta argument to a value
exceeding the number of markers in the dataset(s).
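The effect of the sampling limit can be sketched in base R. The helper pick_betas below is hypothetical and written only for this illustration; it is not a QCEWAS function and does not claim to reproduce how the package selects values.

```r
# Hypothetical helper, for illustration only; not part of QCEWAS.
# Returns all non-missing values if there are at most n_max of them,
# otherwise a random subset of size n_max.
pick_betas <- function(beta, n_max) {
  beta <- beta[!is.na(beta)]
  if (length(beta) <= n_max) {
    return(beta)
  }
  beta[sample.int(length(beta), size = n_max)]
}

n_plot_beta <- 500000L  # same value as the documented default

set.seed(3)
beta_small <- rnorm(1000)    # fewer markers than the limit
beta_large <- rnorm(600000)  # more markers than the limit

# Below the limit, every value is kept, so repeated runs are identical
length(pick_betas(beta_small, n_plot_beta))

# Above the limit, each call draws a different random subset
draw1 <- pick_betas(beta_large, n_plot_beta)
draw2 <- pick_betas(beta_large, n_plot_beta)
identical(draw1, draw2)
```

This is why a limit larger than the number of markers removes the run-to-run variation: no sampling takes place.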
Both plots use numbers rather than names to identify files.
The full filenames and corresponding numbers are listed in the
EWAS_QC_legend.txt file that is generated after EWAS_series
completes.
The main outputs of EWAS_series are the cleaned results files, logs and
graphs. The function also invisibly returns a data frame (also saved as
EWAS_QC_legend.txt), listing the input file names, file numbers,
whether they passed a complete QC (note that this merely indicates that
the QC was completed, not that there were no problems), the standard
error and, if specified, the sample size.
EWAS_QC
# For use in this example, the 4 sample files in the
# extdata folder of the QCEWAS library will be copied
# to your current R working directory. Running the QC
# generates several files in your working directory:
# consult the Quick-Start Guide for more information
# on how to interpret these.

## Not run:
file.copy(from = file.path(system.file("extdata", package = "QCEWAS"),
                           "sample_map.txt.gz"),
          to = getwd(), overwrite = FALSE, recursive = FALSE)
file.copy(from = file.path(system.file("extdata", package = "QCEWAS"),
                           "sample1.txt.gz"),
          to = getwd(), overwrite = FALSE, recursive = FALSE)
file.copy(from = file.path(system.file("extdata", package = "QCEWAS"),
                           "sample2.txt.gz"),
          to = getwd(), overwrite = FALSE, recursive = FALSE)
file.copy(from = file.path(system.file("extdata", package = "QCEWAS"),
                           "translation_table.txt"),
          to = getwd(), overwrite = FALSE, recursive = FALSE)

sample_list <- c("sample1.txt.gz", "sample2.txt.gz")
sample_N <- data.frame(file = sample_list,
                       N = c(77, 79),
                       stringsAsFactors = FALSE)

QC_results <- EWAS_series(EWAS_files = sample_list,
                          output_files = c("sample_output1", "sample_output2"),
                          map = "sample_map.txt.gz",
                          N = sample_N,
                          header_translations = "translation_table.txt",
                          save_final_dataset = FALSE,
                          threshold_outliers = c(-20, 20),
                          exclude_outliers = FALSE,
                          exclude_X = TRUE,
                          exclude_Y = FALSE)
## End(Not run)