combine_snptest proceeds by chromosome: it summarizes and
combines all snptest output files for a chromosome and writes the
resulting data frame to a file in ‘outdir’. The chromosome-specific
output file is named according to ‘template’ with the "<CHROMOSOME>"
placeholder replaced by the actual chromosome.
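The naming rule can be illustrated with a minimal sketch in base R. The template string and chromosome labels below are hypothetical; only the "<CHROMOSOME>" placeholder is taken from this page, and the actual implementation in combine_snptest may differ.

```r
# Hypothetical template and chromosome labels, for illustration only.
template <- "results_chr<CHROMOSOME>.txt"
chromosomes <- c("1", "2", "X")

# fixed = TRUE treats the placeholder as a literal string, not as a regex.
outfiles <- vapply(chromosomes,
                   function(chr) sub("<CHROMOSOME>", chr, template, fixed = TRUE),
                   character(1), USE.NAMES = FALSE)
print(outfiles)
```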
indir | Directory with snptest output files.
outdir | Directory where summarized and combined snptest output should go.
ncore | Number of cores to use in parallel.
old2new | Character vector used for renaming, selecting, and reordering columns from snptest output (see Details).
select | If TRUE, ‘old2new’ will be used to select and reorder a subset of columns for snptest output.
pattern | Regex pattern used to match snptest output files in ‘indir’.
template | String serving as template for naming output files. Must contain the substring "<CHROMOSOME>" which will be replaced by the actual chromosome.
chr_chunk | Extended regular expression with two parenthesized subexpressions matching chromosome and chunk number in snptest output files in ‘indir’.
gzip | If TRUE, chromosome-specific output files will be compressed before being written to disk.
hook | Function accepting and returning a data frame (see Details).
overwrite | If TRUE, pre-existing chromosome output files will be overwritten. The default is to skip chromosomes for which output files already exist.
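To make the role of ‘chr_chunk’ concrete, here is a sketch of how an extended regular expression with two parenthesized subexpressions can pull chromosome and chunk number out of file names. The file names and the regex are assumptions made up for this illustration; the defaults used by combine_snptest are not shown on this page and may look different.

```r
# Hypothetical file names and regex; the real defaults may differ.
files <- c("cohort_chr1_chunk01.out",
           "cohort_chr1_chunk02.out",
           "cohort_chrX_chunk01.out")
chr_chunk <- "chr([0-9XY]+)_chunk([0-9]+)"

# regexec() reports the full match plus one entry per parenthesized subexpression.
m <- regmatches(files, regexec(chr_chunk, files))
parsed <- data.frame(file  = files,
                     chr   = vapply(m, `[`, character(1), 2),
                     chunk = as.integer(vapply(m, `[`, character(1), 3)),
                     stringsAsFactors = FALSE)

# Group the files by chromosome, mirroring the per-chromosome processing.
by_chr <- split(parsed$file, parsed$chr)
print(by_chr)
```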
The ‘old2new’ argument, if non-NULL, serves up to three purposes.
Named elements of ‘old2new’ will be used to rename columns from the snptest output files. An element of ‘old2new’ with name ‘old_name’ and value ‘new_name’ will rename the ‘old_name’ column in a snptest output file to ‘new_name’.
The values of ‘old2new’ determine which columns of the snptest output files will be included in the returned data frame.
The order of the values in ‘old2new’ determines the order of columns in the returned data frame.
Note that elements of ‘old2new’ that are only used to select and reorder columns need not be named. In other words, ‘old2new’ may contain a mixture of named and unnamed elements. In fact, if the ‘names’ attribute of ‘old2new’ is NULL and ‘select = TRUE’, ‘old2new’ will not be used for renaming but only for selecting and reordering columns.
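The three purposes can be sketched in base R as follows. The helper `apply_old2new` is hypothetical and written only to illustrate the semantics described above; it is not the code that combine_snptest uses internally.

```r
# Hypothetical helper illustrating the rename/select/reorder semantics.
apply_old2new <- function(d, old2new, select = TRUE) {
  old <- names(old2new)
  if (!is.null(old)) {
    named <- nzchar(old)                  # unnamed elements carry the name ""
    idx <- match(old[named], names(d))
    stopifnot(!anyNA(idx))                # every old name must exist in d
    names(d)[idx] <- unname(old2new[named])
  }
  if (select) {
    # Values decide which columns are kept and in which order.
    d <- d[, unname(old2new), drop = FALSE]
  }
  d
}

d <- data.frame(rsid = c("rs1", "rs2"), position = c(100L, 200L),
                info = c(0.9, 0.8), imputed = c(1L, 0L),
                stringsAsFactors = FALSE)
out <- apply_old2new(d, c(position = "pos", rsid = "snp", "imputed"))
print(names(out))
```

With ‘select = FALSE’ the same helper would only rename and keep all columns in their original order.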
In addition to the columns that the snptest program creates in its output files there will be the following columns:
‘freq_alleleB’ gives the frequency of the allele in the ‘alleleB’ column. Since in snptest this is the effect allele, ‘freq_alleleB’ is the effect allele frequency. It is computed as (‘all_AB’ + 2 * ‘all_BB’) / (2 * (‘all_AA’ + ‘all_AB’ + ‘all_BB’)).
‘callrate’ gives the fraction of non-missing genotypes. It is computed as 1 - (‘all_NULL’ / (‘all_AA’ + ‘all_AB’ + ‘all_BB’ + ‘all_NULL’)).
‘imputed’ is 1 for snps that were imputed and otherwise 0.
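The two formulas above can be checked on toy genotype counts. The numbers below are invented for illustration; only the column names and the formulas come from this page.

```r
# Toy genotype counts (made up); column names follow the formulas above.
d <- data.frame(all_AA   = c(50, 10),
                all_AB   = c(40, 20),
                all_BB   = c(10, 60),
                all_NULL = c(0, 10))

n_called <- d$all_AA + d$all_AB + d$all_BB

# Each AB carrier contributes one B allele, each BB carrier two.
d$freq_alleleB <- (d$all_AB + 2 * d$all_BB) / (2 * n_called)

# Fraction of samples with a non-missing genotype.
d$callrate <- 1 - d$all_NULL / (n_called + d$all_NULL)

print(d)
```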
If ‘old2new’ is NULL all columns from the snptest output files will be
included in the chromosome-specific output files of combine_snptest,
including the above-mentioned extra columns.
Sometimes being able to rename, select, and reorder columns via
‘old2new’ and ‘select’ doesn't quite cut it. If you need unlimited
flexibility, you can supply a function via the ‘hook’ argument. The
function must take and return a data frame. It will be applied to a
data frame representing the summarized contents of a single (!)
snptest output file after columns have been renamed, selected, and
reordered according to ‘old2new’ and ‘select’. Make sure that your
‘hook’ function does indeed return a data frame. In particular, use
‘drop = FALSE’ when subsetting with ‘[’.
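The following sketch shows why ‘drop = FALSE’ matters in a ‘hook’ function. Both functions and the toy data are hypothetical; they only demonstrate base R subsetting behavior.

```r
d <- data.frame(snp  = c("rs1", "rs2", "rs3"),
                pval = c(0.01, NA, 0.2),
                stringsAsFactors = FALSE)

# Selecting a single column without drop = FALSE collapses to a vector.
hook_bad <- function(d) d[!is.na(d$pval), "pval"]

# With drop = FALSE the result stays a data frame, as a hook requires.
hook_ok <- function(d) d[!is.na(d$pval), "pval", drop = FALSE]

print(is.data.frame(hook_bad(d)))  # FALSE
print(is.data.frame(hook_ok(d)))   # TRUE
```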
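None. combine_snptest is called for its side effect of writing the chromosome-specific output files to ‘outdir’.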
library(genFun)
## Not run:
# Define how to rename, select, and reorder columns.
#            OLD NAME                 NEW NAME
old2new <- c(rsid                   = "snp",
             chromosome             = "chr",
             position               = "pos",
                                      "imputed",
                                      "callrate",
             freq_alleleB           = "coding_freq",
             frequentist_add_beta_1 = "beta",
             frequentist_add_se_1   = "se",
             frequentist_add_pvalue = "pval")

# Custom `hook' function.
drop_snps_without_pvalue <- function(d)
{
    d[!is.na(d$pval), , drop = FALSE]
}

combine_snptest(indir     = "path/to/snptest_output_files",
                outdir    = "dir/where/output/should_go",
                ncore     = 6L,
                old2new   = old2new,
                select    = TRUE,
                template  = "cohort_consortium_date_chr<CHROMOSOME>.txt.gz",
                gzip      = TRUE,
                hook      = drop_snps_without_pvalue)
## End(Not run)