prepareComputePlan: Return a suitable compute plan for a genome-wide association...


View source: R/model.R

Description

Lifecycle: maturing

Instead of using OpenMx's default model processing sequence (i.e., omxDefaultComputePlan), it is more efficient and convenient to assemble a compute plan tailored for a genome-wide association study. This function returns a compute plan that loads SNP data into the given model, fits the model, outputs the results to out, and repeats this procedure for all SNPs.

Usage

prepareComputePlan(
  model,
  snpData,
  out = "out.log",
  ...,
  SNP = NULL,
  startFrom = 1L,
  rowFilter = NULL
)

Arguments

model

an MxModel model, specified using RAM or LISREL notation. The model argument is designed to take the output from e.g. buildOneFac (or the other prebuilt GW-SEM functions), but advanced users can specify their own arbitrary OpenMx Model or use Onyx to draw their path diagrams.

snpData

a path to a file containing GWAS data. The data can be in a variety of forms, such as standard PLINK format (bed/bim/fam), PLINK2 format (pgen/pvar/psam), Oxford format (bgen/sample), or CSV format (CSV is much slower because non-binary files are not compressed).

out

a file name or path where the output from the analysis will be saved. The default is "out.log", which saves the file in the working directory. Users should choose the output file name carefully so that output from different analyses or chromosomes does not overwrite existing files.

...

Not used. Forces remaining arguments to be specified by name.

SNP

a numerical range that specifies which SNPs from the snpData file are to be evaluated. This argument can be used to evaluate a subset of SNPs for model testing, e.g. 1:10 will run the first 10 SNPs to make sure that the model functions the way the user intends, that the files exist, and that the paths are correct. This option is also useful for evaluating a range of SNPs that is smaller than the complete file. For example, users may wish to run several discrete batches of analyses for chromosome 1 by running 1:10000, 10001:20000, etc. This saves users from constructing numerous SNP files for each chromosome. The default value of the SNP argument is NULL, which will run all SNPs in the file.

startFrom

a numerical value indicating which SNP is the first SNP to be analyzed. The function will then run every SNP from the specified SNP to the end of the GWAS data file. This is very useful if the analysis stops for some reason (e.g. the cluster is restarted for maintenance), because you can resume from the last SNP that was analyzed. Note that you will want to give the output file (specified in out) a new name so that you don't overwrite the existing results.

rowFilter

optional named list of logical vectors to indicate which rows to skip when loading the SNP column

Details

You can request a specific list of SNPs using the SNP argument. The numbers provided in SNP refer to offsets in the snpData file. For example, SNP=c(100,200) will process the 100th and 200th SNP. The first SNP in the snpData file is at offset 1. When SNP is omitted then all available SNPs are processed.
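Because SNP takes plain offsets, a long file can be cut into consecutive batches with a few lines of base R. The following is only a sketch: makeSnpBatches is a hypothetical helper (not part of gwsem), and the total of 25000 SNPs and the output file names are made-up values for illustration.

```r
# Hypothetical helper (not part of gwsem): split the offsets 1..nSNP into
# consecutive batches of at most batchSize SNPs each.
makeSnpBatches <- function(nSNP, batchSize) {
  starts <- seq(1L, nSNP, by = batchSize)
  ends <- pmin(starts + batchSize - 1L, nSNP)
  Map(function(from, to) from:to, starts, ends)
}

# Assumed values for illustration only
batches <- makeSnpBatches(nSNP = 25000L, batchSize = 10000L)

# One output file per batch so that results are never overwritten
outFiles <- sprintf("chr1-batch%02d.log", seq_along(batches))

# Show the first and last offset of every batch
vapply(batches, function(b) paste(range(b), collapse = ":"), character(1))
```

Each element of batches could then be passed as the SNP argument, paired with the matching element of outFiles as out.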

The suffix of the snpData file name determines the format in which the SNP data are read from disk. Suffixes ‘pgen’, ‘bed’, and ‘bgen’ are supported. Per-SNP descriptions are found in different places depending on the suffix. For ‘bgen’, both the SNP data and description are built into the same file. In the case of ‘pgen’, an associated file with suffix ‘pvar’ is expected to exist (see the spec for details). In the case of ‘bed’, an associated ‘bim’ file is expected to exist (see the spec for details). The chromosome, base-pair coordinate, and variant ID are added to each line of out.

The code to implement method='pgen' is based on plink 2.0 alpha. plink's ‘bed’ file format is supported in addition to ‘pgen’. Data are coerced appropriately depending on the type of the destination column. For a numeric column, data are recorded as the values NA, 0, 1, or 2. An ordinal column must have exactly 3 levels.

For method='bgen', the file path+".bgi" must also exist. If not available, generate this index file with the bgenix tool.
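Since each format expects a companion file, it can help to check for it before starting a long run. The sketch below uses base R only. companionFile and checkSnpFiles are hypothetical helpers (not part of gwsem), and it assumes that the ‘pvar’ and ‘bim’ files share the base name of the data file.

```r
# Hypothetical helper (not part of gwsem): return the companion file that
# is expected to accompany a given SNP data file.
companionFile <- function(snpData) {
  ext <- tolower(tools::file_ext(snpData))
  stem <- tools::file_path_sans_ext(snpData)
  switch(ext,
    pgen = paste0(stem, ".pvar"),   # assumption: same base name as the pgen file
    bed  = paste0(stem, ".bim"),    # assumption: same base name as the bed file
    bgen = paste0(snpData, ".bgi"), # index is the bgen path plus ".bgi"
    character(0))                   # no companion file known for other suffixes
}

# Hypothetical helper: stop early if the data file or its companion is absent.
checkSnpFiles <- function(snpData) {
  needed <- c(snpData, companionFile(snpData))
  absent <- needed[!file.exists(needed)]
  if (length(absent) > 0) {
    stop("missing file(s): ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

companionFile("data/chr1.bgen")
```

Calling checkSnpFiles on the snpData path before building the compute plan would surface a missing index or variant file immediately rather than partway through an analysis.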

For ‘bgen’ and ‘pgen’ formats, the numeric column can be populated with a dosage (sum of probabilities multiplied by genotypes) if these data are available.

A compute plan does not do anything by itself. You'll need to combine the compute plan with a model (such as returned by buildOneFac) to perform a GWAS.

Value

The given model with an appropriate compute plan.

See Also

GWAS

Examples

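library(gwsem)

# Simulate an ordinal phenotype with three levels
pheno <- data.frame(anxiety = cut(rnorm(500), c(-Inf, -.5, .5, Inf),
                                  ordered_result = TRUE))
m1 <- buildItem(pheno, 'anxiety')
# Locate the example data shipped with the package
dir <- system.file("extdata", package = "gwsem")
m1 <- prepareComputePlan(m1, file.path(dir, "example.bgen"))
m1$compute  # inspect the resulting compute plan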
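
A variation on the example above restricts the plan to a few SNPs and writes to a separate output file, which is a cheap way to test a model before a full run. This is a sketch under assumptions: it assumes that example.bgen contains at least ten SNPs, the output file name is made up, and the block is skipped when gwsem is not installed.

```r
# Made-up output location for the test run
testOut <- file.path(tempdir(), "test-first10.log")

if (requireNamespace("gwsem", quietly = TRUE)) {
  library(gwsem)
  pheno <- data.frame(anxiety = cut(rnorm(500), c(-Inf, -.5, .5, Inf),
                                    ordered_result = TRUE))
  m2 <- buildItem(pheno, "anxiety")
  dir <- system.file("extdata", package = "gwsem")
  # Assumption: the example file holds at least ten SNPs
  m2 <- prepareComputePlan(m2, file.path(dir, "example.bgen"),
                           out = testOut, SNP = 1:10)
  print(m2$compute)
}
```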

gwsem documentation built on Jan. 18, 2022, 1:09 a.m.