EWAS_QC | R Documentation |
The main function of the QCEWAS package. EWAS_QC accepts a single
EWAS results file, runs a thorough quality check (QC), optionally
applies various filters, and generates QQ, Volcano and Manhattan
plots. The function EWAS_series can be used to process multiple
results files sequentially.
EWAS_QC(data, map, outputname, header_translations,
        threshold_outliers = c(NA, NA),
        markers_to_exclude,
        exclude_outliers = FALSE,
        exclude_X = FALSE, exclude_Y = FALSE,
        save_final_dataset = TRUE, gzip_final_dataset = TRUE,
        header_final_dataset = "standard",
        high_quality_plots = FALSE,
        return_beta = FALSE, N_return_beta = 500000L, ...)
data |
a data frame with EWAS results, or the name of a file
containing the same. The table must include the columns
|
map |
a data frame with chromosome and position values of the
probes, or the name of a file containing the same. This
argument is optional: if no map is specified,
|
outputname |
a character string specifying the intended filename for the
output. This includes not only the cleaned results file and
the log, but also any graphs created. Do not include an
extension; |
header_translations |
a translation table for the column names of the input file,
or the name of a file containing the same. This argument is
optional: if not specified, |
threshold_outliers |
a numeric vector of length two. This defines which effect
sizes will be treated as outliers. The first value specifies
the lower limit (i.e. markers with effect sizes below this
value are considered outliers), the second the upper limit.
The check for low or high outliers is skipped if the
respective value is set to |
markers_to_exclude |
Either a vector or data frame containing a list of CpG IDs
that need to be excluded before starting the QC (in case of
a data frame only the first column will be processed), or
the name of a file containing the same. This argument is
optional: if not specified, no exclusions are made. Note
that when a single value (a vector of length 1) is
passed to this argument, |
exclude_outliers |
a logical value determining how outliers are treated. If
|
exclude_X, exclude_Y |
logical values determining whether markers on the X and Y
chromosomes, respectively, are excluded from the final dataset.
This requires providing a map to |
save_final_dataset |
logical determining whether the cleaned dataset will be saved. |
gzip_final_dataset |
logical determining whether the saved dataset will be compressed in the .gz format. |
header_final_dataset |
either a character vector or a table determining the header
names used in the final dataset, or the name of a file
containing the same. If |
high_quality_plots |
logical. Setting this to TRUE will save the graphs as high-resolution tiff images. |
return_beta, N_return_beta |
arguments used by |
... |
arguments passed to |
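To make the two-value form of threshold_outliers concrete, the following base R sketch flags effect sizes outside a lower and an upper limit. The helper flag_outliers is hypothetical and not part of QCEWAS. It assumes, as the default c(NA, NA) in the usage suggests, that an NA limit switches off the check on that side.

```r
# Hypothetical helper, not part of QCEWAS: flags effect sizes that fall
# outside c(lower, upper). Assumption: an NA limit disables that side.
flag_outliers <- function(beta, threshold_outliers = c(NA, NA)) {
  stopifnot(is.numeric(beta), length(threshold_outliers) == 2L)
  lower <- threshold_outliers[1]
  upper <- threshold_outliers[2]
  # Missing effect sizes are never flagged as outliers in this sketch.
  is_low <- if (is.na(lower)) {
    rep(FALSE, length(beta))
  } else {
    !is.na(beta) & beta < lower
  }
  is_high <- if (is.na(upper)) {
    rep(FALSE, length(beta))
  } else {
    !is.na(beta) & beta > upper
  }
  is_low | is_high
}

beta <- c(-25, -3, 0.4, 12, 31, NA)
flag_outliers(beta, c(-20, 20))  # both limits checked
flag_outliers(beta, c(NA, 20))   # low-outlier check skipped
```

Whether flagged markers are only counted or also removed would then depend on exclude_outliers.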
QCEWAS includes a Quick-Start guide in the doc folder of the
library. This guide explains how to run a QC and how to interpret
the results. The start-up message when loading QCEWAS indicates
where it can be found on your computer. In brief, the QC consists
of the following 5 stages:
Checking data integrity:
The values inside the EWAS results are tested for validity.
If impossible p-values, effect sizes, etc. are encountered,
EWAS_QC generates a warning in the R console and sets them to NA.
Filter for outliers and sex-chromosomes (optional)
Counts the number of outlying markers, as well as chromosome
X and Y markers, and deletes them if specified. The markers
named in markers_to_exclude are removed here as well.
Generating QC plots
Histograms of the beta and standard error distributions are plotted.
The p-values are checked by correlating and plotting them against p-values calculated from the effect size and standard error.
A QQ plot is generated to test for over/undersignificance.
A Manhattan plot is generated to see where the signals (if any) are located.
A Volcano plot is generated to check the distribution of effect sizes vs. p values.
Creating a QC log
The log contains notes about any problems encountered during the QC, as well as several tables describing the data.
Saving the cleaned dataset (optional)
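The p-value check and the QQ-plot inflation test in stage 3 can be illustrated with a short base R sketch. The helpers expected_p and lambda_gc are hypothetical names, not QCEWAS functions. The sketch assumes a two-sided normal (Wald) test for the expected p-values and the conventional genomic inflation factor for lambda; the package may compute either differently.

```r
# Illustrative sketch in base R; these helpers are hypothetical and are
# not the QCEWAS implementation.

# Two-sided p-value from a Wald z-statistic (assumes a normal
# approximation; the package may use a different test).
expected_p <- function(beta, se) {
  2 * pnorm(abs(beta / se), lower.tail = FALSE)
}

# Genomic inflation factor: median observed 1-df chi-square statistic
# divided by the median expected under the null.
lambda_gc <- function(p) {
  chisq <- qchisq(p, df = 1, lower.tail = FALSE)
  median(chisq, na.rm = TRUE) / qchisq(0.5, df = 1)
}

set.seed(1)
n    <- 10000
se   <- runif(n, 0.5, 1.5)
beta <- rnorm(n, mean = 0, sd = se)  # simulated null results
p    <- expected_p(beta, se)         # stands in for the reported p-values

cor(p, expected_p(beta, se))  # correlation of reported vs. expected p
lambda_gc(p)                  # inflation factor of the reported p-values
```

In real data, a correlation well below 1 suggests that the reported p-values do not match the reported effect sizes and standard errors, and a lambda far from 1 points to over- or undersignificance.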
The main outputs of EWAS_QC are the cleaned results file, the log
file and the QC graphs. However, the function also returns a list
with 9 elements:
data_input |
the file name of the input file, if loaded from a file. If not, this will be an empty character string. |
file |
the filename of the cleaned results file. |
QC_success |
logical, indicates whether |
lambda |
the lambda value of reported p-values in the cleaned dataset. |
p_cor |
the correlation between reported and expected (based on effect size and standard error) p values. |
N |
a named integer vector reporting how many markers
were in the original dataset, how many had missing values,
how many were on chromosomes X and Y, how many were outliers,
how many were removed and how many are in the final, cleaned
dataset. Has no relation to the |
SE_median |
a numeric value: the median of the standard errors in the cleaned dataset. |
mean_methylation |
a |
effect_size |
if |
The function will issue a warning if it encounters p-values
< 1e-300, as this is close to the smallest number that R can
process correctly. Various functions in the QCEWAS package will
set these values to 1e-300 to ensure proper handling.
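As a rough illustration of why a floor is useful, the base R sketch below raises very small p-values to 1e-300 before taking -log10, as one would for a QQ or Manhattan plot. The helper floor_p is hypothetical and only mimics the behaviour described above; it is not the package's own code.

```r
# Hypothetical helper, not part of QCEWAS: raise p-values below a floor
# to the floor itself, leaving missing values untouched.
floor_p <- function(p, floor = 1e-300) {
  ifelse(!is.na(p) & p < floor, floor, p)
}

p <- c(0.03, 1e-320, 0, NA)
-log10(p)           # a p-value of 0 becomes Inf and cannot be plotted
-log10(floor_p(p))  # floored values stay finite
```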
See EWAS_series for running a QC over multiple files.
See EWAS_plots and P_correlation for carrying out specific steps of the QC.
# For use in this example, the 2 sample files in the
# extdata folder of the QCEWAS library will be copied
# to your current R working directory. Running the QC
# generates 7 new files in your working directory:
# a cleaned, post-QC dataset, a log file, and 5 graphs.
# Consult the Quick-Start guide for more information on
# how to interpret these.

## Not run:
file.copy(from = file.path(system.file("extdata", package = "QCEWAS"),
                           "sample_map.txt.gz"),
          to = getwd(), overwrite = FALSE, recursive = FALSE)
file.copy(from = file.path(system.file("extdata", package = "QCEWAS"),
                           "sample1.txt.gz"),
          to = getwd(), overwrite = FALSE, recursive = FALSE)

QC_results <- EWAS_QC(data = "sample1.txt.gz",
                      map = "sample_map.txt.gz",
                      outputname = "sample_output",
                      threshold_outliers = c(-20, 20),
                      exclude_outliers = FALSE,
                      exclude_X = TRUE, exclude_Y = FALSE,
                      save_final_dataset = TRUE,
                      gzip_final_dataset = FALSE)
## End(Not run)