Run.Model: Run Model


View source: R/Run.model.R

Description

Measures the statistical association between nearby regulatory variants and the level of expression at a heterozygous coding polymorphism, controlling for factors such as sex and population. It fits a generalized linear model and applies permutations to the data to provide a robust p-value.

Usage

Run.Model(
  inputObj,
  output_prefix,
  task = 1,
  totalTasks = 1,
  minInd = 10,
  numPerms = 1e+05,
  TSSwindow = 5e+05,
  pval_threshold = 5e-05,
  other_all = FALSE,
  seed = 10
)

Arguments

inputObj

The RData file output by the first function, Gen.input.

output_prefix

The prefix of the output file name.

task

This analysis is very computationally burdensome. To speed it up, the analysis can be split into tasks that run concurrently on multiple nodes. Specify which task to run. Defaults to 1.

totalTasks

Set the total number of tasks to split the analysis into. Defaults to 1.

minInd

Specify the minimum number of individuals in which an ASE site must be found in order to be included in the analysis. Defaults to 10.

numPerms

Select how many permutations to run. Along with splitting the process into simultaneous tasks, this is the biggest factor in determining how long the analysis will take. However, up to a point, more permutations generally give more precise and accurate results; for example, if set to 100, the minimum p-value that can be reached as a result of permutations is 0.01. Defaults to 100,000.

TSSwindow

The distance, either side of the transcript start site, over which nearby variants will be selected. Defaults to 500kb.

pval_threshold

There is a theoretical minimum p-value for each particular combination of reference and alternative alleles for a given set of individuals for a given nearby variant of an ASE site. This parameter sets the upper limit on that minimum. Defaults to 0.00005, in which case the model will not be run if it is not possible to reach a p-value as low as 0.00005, even theoretically.

other_all

Specify whether or not to account for the allele on the haplotype to which the expressed site does not belong. Defaults to FALSE.

seed

Specify the number that the seed should be set to. The seed is the starting point used in the generation of a sequence of random numbers. Defaults to 10.
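
The task and totalTasks arguments are meant for running the analysis as concurrent jobs. The following is a minimal sketch of a driver script for a scheduler job array, not part of the package itself. It assumes a SLURM job array (which sets the SLURM_ARRAY_TASK_ID environment variable), a hypothetical total of 100 tasks, and placeholder file names; it only calls Run.Model if that function has been loaded.

```r
# Hypothetical driver script: one R process per scheduler array task.
# SLURM_ARRAY_TASK_ID is set by SLURM job arrays; fall back to 1 otherwise.
total_tasks <- 100L  # assumed number of array tasks
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))
stopifnot(!is.na(task_id), task_id >= 1L, task_id <= total_tasks)

# Arguments for this task; the file names are placeholders.
run_args <- list(
  inputObj = "input_file.RData",
  output_prefix = "output_prefix",
  task = task_id,
  totalTasks = total_tasks
)

# Only call Run.Model when the package providing it has been loaded.
if (exists("Run.Model", mode = "function")) {
  do.call(Run.Model, run_args)
} else {
  message("Run.Model not found; would run task ", task_id, " of ", total_tasks)
}
```

Passing the arguments by name avoids depending on their position in the function signature.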

Examples

# Run model with task set to 10, for 100,000 permutations, a transcript start site window of 500kb and a theoretical p-value threshold of 0.00005 (the defaults)
Run.Model('input_file.RData',
        output_prefix = 'output_prefix', task = 10)

# Run model with task set to 10, for 10,000 permutations, a transcript start site window of 500kb and a theoretical p-value threshold of 0.00005
Run.Model('input_file.RData',
        output_prefix = 'output_prefix', task = 10,
        numPerms = 10000, pval_threshold = 0.00005)

# Run model with task set to 2, for 10,000 permutations, a transcript start site window of 1Mb and a theoretical p-value threshold of 0.00005
Run.Model('input_file.RData',
        output_prefix = 'output_prefix', task = 2,
        numPerms = 10000, TSSwindow = 1000000, pval_threshold = 0.00005)

NB: The smallest p-value attainable as a result of running permutations is 1/numPerms. Hence, there is no advantage to setting pval_threshold below this number.
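
The relationship in the note above can be checked before starting a run. This is a small illustrative sketch; the helper names min_perm_pvalue and threshold_below_floor are hypothetical and are not functions of the package.

```r
# Smallest p-value reachable from permutations, as stated in the note: 1/numPerms.
min_perm_pvalue <- function(numPerms) 1 / numPerms

# Hypothetical helper: TRUE when the threshold lies below the permutation floor,
# i.e. when lowering the threshold further brings no advantage.
threshold_below_floor <- function(pval_threshold, numPerms) {
  pval_threshold < min_perm_pvalue(numPerms)
}

min_perm_pvalue(100)                 # 0.01
threshold_below_floor(5e-05, 1e+05)  # FALSE: the floor is 1e-05
threshold_below_floor(5e-05, 10000)  # TRUE: the floor is 1e-04
```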

AClement1990/hap-eQTL documentation built on Jan. 8, 2021, 12:41 a.m.