randomization: Runs the randomization step for Single and multiple...


Description

The randomization step is run. This is followed by the p-value calculation if the mode is SAE. If the mode is not SAE, MAE is assumed and the z-scores are generated. The default binwidth is 2 (see the bins parameter below).

The following output files are generated for randomization and saved in the current directory:

<outFilePrefix>_obsRandomVals.rda
<outFilePrefix>_DistributionMean<colname_rankSNPann>.pdf : saves the distribution of the mean observed and random values
<outFilePrefix>_DistrMeanSNPsPerBin.pdf

If the mode is SAE, the following are also generated:

<outFilePrefix>_obsRandomValsZscorePval.rda : contains the z-scores for the observed and randomly selected values, and the pnorm p-value
<outFilePrefix>_DistributionPvalZscore.pdf

Otherwise, the following are also generated:

<outFilePrefix>_zscoreNmlDistr.rda
<outFilePrefix>_zscoreDistr.pdf
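The description does not spell out how the random selection is done. The sketch below shows one plausible reading, in which each iteration draws as many non-significant loci from every bin as there are significant loci in that bin and records the mean score. It is an illustration under that assumption, not the package's code; the data, the `bin` column and the helper `one_random_mean` are all made up.

```r
# Hypothetical sketch of bin-matched random selection (not ERASE internals).
set.seed(42)

# Made-up inputs: a score to average and a precomputed bin per locus.
sig <- data.frame(score = rnorm(30, mean = 3, sd = 1),
                  bin   = sample(1:3, 30, replace = TRUE))
non <- data.frame(score = rnorm(600, mean = 2.5, sd = 1),
                  bin   = sample(1:3, 600, replace = TRUE))

# Number of significant loci in each bin (named integer vector).
sig_counts <- lengths(split(sig$score, sig$bin))

# One iteration: sample the same number of non-significant loci per bin.
one_random_mean <- function() {
  picked <- unlist(lapply(names(sig_counts), function(b) {
    pool <- non$score[non$bin == as.integer(b)]
    sample(pool, size = sig_counts[[b]])
  }))
  mean(picked)
}

n_iter <- 200  # kept small here; the documented default is 10000
random_means <- replicate(n_iter, one_random_mean())
obs_mean <- mean(sig$score)
```

Matching the draw to the per-bin counts keeps the distribution of the binning variable in each random set the same as in the significant set, which is the purpose the colname_chk4distr argument describes.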

Usage

randomization(df_sigASE_SNPann, df_nonASE_SNPann, colname_rankSNPann,
  colname_chk4distr, outFilePrefix, nIterations = 10000,
  binwidth = eraseBinwidth, mode = "SAE", seedValue = NULL,
  bins = NULL)
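A minimal sketch of preparing inputs for a call follows. The data frames hold made-up values and only carry the columns named under Arguments; the helper `make_toy`, the column names "neglog10pval" and "averageReads" as used here, and the package name in `library(ERASE)` are assumptions for illustration. The call itself is left as a not-run example because it writes output files to the current directory.

```r
# Toy inputs with made-up values; make_toy is a hypothetical helper.
set.seed(1)
make_toy <- function(n, mean_score) {
  data.frame(
    cmp.col      = paste0("rs", sample.int(1e6, n)),   # locus identifier
    neglog10pval = rexp(n, rate = 1 / mean_score),     # transformed SNP score
    averageReads = round(runif(n, min = 1, max = 300)),
    stringsAsFactors = FALSE
  )
}
df_sig <- make_toy(50, 1.5)    # stands in for df_sigASE_SNPann
df_non <- make_toy(2000, 1.0)  # stands in for df_nonASE_SNPann

## Not run:
# library(ERASE)  # assumed package name
# randomization(df_sig, df_non,
#               colname_rankSNPann = "neglog10pval",
#               colname_chk4distr  = "averageReads",
#               outFilePrefix      = "toy",
#               nIterations = 1000, mode = "SAE", seedValue = 1)
## End(Not run)
```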

Arguments

df_sigASE_SNPann

Data frame containing the significant ASEs that overlap with the SNP annotation. It must have the column "cmp.col" (containing e.g. "rsid" or "chr_pos" values used to find the intersection with the SNP annotation dataset) and the columns passed via the parameters colname_rankSNPann and colname_chk4distr.

df_nonASE_SNPann

Data frame containing the non-significant ASEs that overlap with the SNP annotation. It must have the column "cmp.col" and the columns passed via the parameters colname_rankSNPann and colname_chk4distr.

colname_rankSNPann

Name of the column in the above two data frames that holds the transformed SNP score, used to rank the SNPs and calculate the mean. For example, a GWAS p-value can be transformed into a column "neglog10pval" containing -log10(p).

colname_chk4distr

Name of the column used to check that the distribution of the randomly selected values (in df_nonASE_SNPann) is the same as that in df_sigASE_SNPann, e.g. "averageReads".

outFilePrefix

The names of all the output files generated will be assigned this prefix

nIterations

number of random selection iterations. Default value of 10000

binwidth

Binwidth to be used. Default = 2.

mode

Default value of 'SAE', which calculates the p-value. Otherwise MAE is assumed and the values are transformed into z-scores.

seedValue

The seed value to be set. If NULL then no seed is set.

bins

Bins to which loci are assigned based on their value in colname_chk4distr. If not provided (value NULL), the code uses a binwidth of 2 (variable 'eraseBinwidth') for colname_chk4distr values < 200 (variable 'eraseBinwidthEnd') and places the remaining loci in a single last bin. The defaults can be changed by assigning new values to the variables 'eraseBinwidth' and 'eraseBinwidthEnd'.
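The default binning described above can be sketched with base R's `cut()`. This is an illustration only: the values in `avg_reads` are made up, and whether the package treats bin boundaries as left-closed (as assumed here with `right = FALSE`) is not stated in this page.

```r
# Hypothetical sketch of the default binning (not ERASE internals).
eraseBinwidth    <- 2    # default bin width named above
eraseBinwidthEnd <- 200  # values from here upward share one last bin

avg_reads <- c(1, 3.5, 10, 199, 200, 750)  # made-up colname_chk4distr values

# Breaks 0, 2, 4, ..., 200, then one open-ended last bin.
breaks <- c(seq(0, eraseBinwidthEnd, by = eraseBinwidth), Inf)

# Assumption: bins are left-closed, i.e. [0,2), [2,4), ..., [200, Inf).
bin_id <- cut(avg_reads, breaks = breaks, right = FALSE, labels = FALSE)
```

With these breaks, every value below 200 falls in a bin of width 2 and every value of 200 or more receives the same bin index.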

Value

the pnorm p-value
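As a rough sketch of how a pnorm p-value can be derived from an observed mean and a set of random means: the numbers below are simulated, and the use of the upper tail is an assumption, since this page does not say which tail the function reports.

```r
# Hypothetical sketch: z-score of the observed mean against the random means.
set.seed(1)
obs_mean     <- 3.2                                # made-up observed mean
random_means <- rnorm(1000, mean = 2.5, sd = 0.3)  # made-up random means

z_obs <- (obs_mean - mean(random_means)) / sd(random_means)

# Assumption: enrichment is tested in the upper tail.
p_val <- pnorm(z_obs, lower.tail = FALSE)
```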


karishdsa/ERASE documentation built on May 9, 2019, 2:55 p.m.