GxE.scan.partition: Creates GxE.scan job files for a computing cluster

View source: R/wga_stream.GxE.R

Description

Creates job files for running GxE.scan on a parallel processing system.

Usage

GxE.scan.partition(snp.list, pheno.list, op=NULL)

Arguments

snp.list

See snp.list and details below. No default.

pheno.list

See pheno.list. No default.

op

See details for this list of options. The default is NULL.

Details

This function creates the files needed to run a GWAS scan on a computing cluster. The user must know how to submit jobs on their particular cluster; on many clusters, the command for submitting a job is "qsub". The scan is partitioned into smaller jobs by setting either snp.list$start.vec and snp.list$stop.vec or snp.list$include.snps, so that each job processes an equal number of SNPs.

Three types of files are created in the output directory (see option out.dir). The first type is the R program file, which contains R statements defining snp.list, pheno.list and op for the GxE.scan function. These files have the ".R" file extension. The second type is the job file, which calls the R program file. These files are named
paste(op$out.dir, "job_", op$id.str, 1:op$n.jobs, sep="")
The third type is a single file containing the names of all the job files. This file has the prefix "Rjobs_".

This function automatically sets the name of the output file created by GxE.scan to a file in the op$out.dir directory with the prefix "GxEout_".
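The naming rule and the equal-sized partition described above can be illustrated with a short sketch in base R. The option values below (out.dir, id.str) are hypothetical, and the start/stop computation is only one way to split 50 SNPs evenly; it is not taken from the package's internal code.

```r
# Hypothetical option values, chosen only for illustration.
# out.dir is assumed to end in a path separator because the
# naming expression pastes the pieces together with sep="".
op <- list(out.dir = "/tmp/gxe/", id.str = "run1_", n.jobs = 5)

# Job file names, built with the expression quoted in the Details text.
job.files <- paste(op$out.dir, "job_", op$id.str, 1:op$n.jobs, sep = "")

# Illustrative equal partition of 50 SNPs across the jobs
# (a sketch, not the package's own implementation).
nsnps     <- 50
per.job   <- ceiling(nsnps / op$n.jobs)
start.vec <- seq(1, nsnps, by = per.job)
stop.vec  <- pmin(start.vec + per.job - 1, nsnps)

print(job.files)
print(cbind(start.vec, stop.vec))
```

With these values each of the five jobs covers a block of 10 consecutive SNPs.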

Options list op: Below are the names for the options list op. All names have default values if they are not specified.

snp.list
The objects start.vec and stop.vec in snp.list are set automatically, so the user does not need to set them. In general, partitioning the genotype data into many files is more efficient in terms of memory usage and speed. Accordingly, when calling this function, snp.list$file can be set either to a single file or to a character vector of the partitioned files. In the latter case, the number of jobs to create (op$n.jobs) must be greater than or equal to the number of partitioned files. An object in snp.list that is unique to the GxE.scan.partition function is nsnps.vec: each element of snp.list$nsnps.vec is the number of SNPs in the corresponding file of snp.list$file. If nsnps.vec is not specified and snp.list$file contains more than one file, then each job will process an entire file in snp.list$file.

When the genotype data must be transformed and is contained in a single file, snp.list$include.snps should also be set. This will create a separate list of SNPs for each job to process.
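As a sketch of the multi-file case described above, the following sets up snp.list with several partitioned genotype files and checks the stated constraints before the lists would be passed to GxE.scan.partition. The file names and SNP counts are hypothetical placeholders.

```r
# Hypothetical partitioned genotype files and their SNP counts.
snp.list <- list(format = "tped")
snp.list$file <- c("geno_part1.tped.gz",
                   "geno_part2.tped.gz",
                   "geno_part3.tped.gz")
snp.list$nsnps.vec <- c(20, 20, 10)  # one entry per file in snp.list$file

op <- list(n.jobs = 5)

# Constraints stated in the text above:
# one SNP count per file, and at least as many jobs as files.
stopifnot(length(snp.list$nsnps.vec) == length(snp.list$file))
stopifnot(op$n.jobs >= length(snp.list$file))
```

Running these checks first catches a mismatched nsnps.vec or a too-small n.jobs before any job files are written.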

Value

The name of the file containing names of the job files to be submitted. See details.
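One possible way to use the returned file name is sketched below. It assumes the file holds one bare job file name per line, which should be checked by inspecting the actual file first; the stand-in file and its contents are hypothetical, and "qsub" is only the submission command mentioned in Details, which may differ on a given cluster.

```r
# Stand-in for the file name returned by GxE.scan.partition (hypothetical).
jobs.file <- tempfile(pattern = "Rjobs_")
writeLines(c("/tmp/gxe/job_run1_1", "/tmp/gxe/job_run1_2"), jobs.file)

# Read the entries; one job file name per line is an assumption.
job.files <- readLines(jobs.file)

# Build, but do not run, the submission commands.
cmds <- paste("qsub", job.files)
print(cmds)
```

Printing the commands rather than executing them leaves the actual submission step to the user, in line with the note that job submission is cluster-specific.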

See Also

GxE.scan, GxE.scan.combine

Examples

 # Define the list for the genotype data. There are 50 SNPs in the TPED file. 
 snp.list <- list(nsnps.vec=50, format="tped")
 snp.list$file <- system.file("sampleData", "geno_data.tped.gz", package="CGEN")
 snp.list$subject.list <- system.file("sampleData", "geno_data.tfam", package="CGEN")
 
 # Define pheno.list
 pheno.list <- list(id.var=c("Family", "Subject"), delimiter="\t", header=1,
                    response.var="CaseControl")
 pheno.list$file <- system.file("sampleData", "pheno.txt", package="CGEN")
 pheno.list$main.vars <- ~Gender + Exposure
 pheno.list$int.vars <- ~Exposure
 pheno.list$strata.var <- "Study"

 # Define the list of options. 
 # Specifying n.jobs=5 will let each job process 10 SNPs.
 op <- list(n.jobs=5, GxE.scan.op=list(model=1))

 # GxE.scan.partition(snp.list, pheno.list, op=op)

CGEN documentation built on April 28, 2020, 8:08 p.m.