seqFitNullGLMM_SPA: Fit the null model with GRM

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/saige_main.r

Description

Fit the null model in the mixed model framework with genetic relationship matrix (GRM).

Usage

1
2
3
4
5
6
seqFitNullGLMM_SPA(formula, data, gdsfile, trait.type=c("binary", "quantitative"),
    sample.col="sample.id", maf=0.005, missing.rate=0.01, max.num.snp=1000000L,
    variant.id=NULL, inv.norm=TRUE, X.transform=TRUE, tol=0.02, maxiter=20L,
    nrun=30L, tolPCG=1e-5, maxiterPCG=500L, num.marker=30L, tau.init=c(0,0),
    traceCVcutoff=0.0025, ratioCVcutoff=0.001, geno.sparse=TRUE, num.thread=1L,
    model.savefn="", seed=200L, fork.loading=FALSE, verbose=TRUE)

Arguments

formula

an object of class formula (or one that can be coerced to that class), e.g., y ~ x1 + x2, see lm

data

a data frame for the formulas

gdsfile

a SeqArray GDS filename, or a GDS object

trait.type

"binary" for binary outcomes, "quantitative" for continuous outcomes

sample.col

the column name of sample IDs corresponding to the GDS file

maf

minor allele frequency for imported genotypes (checking >= maf), if variant.id=NULL; NaN for no filter

missing.rate

threshold of missing rate (checking <= missing.rate), if variant.id=NULL; NaN for no filter

max.num.snp

the maximum number of SNPs used, or -1 for no limit

variant.id

a list of variant IDs, used to construct GRM

inv.norm

if TRUE, perform inverse normal transformation on residuals for quantitative outcomes, see the reference [Sofer, 2019]

X.transform

if TRUE, perform QR decomposition on the design matrix

tol

overall tolerance for model fitting

maxiter

the maximum number of iterations for model fitting

nrun

the number of random vectors in the trace estimation

tolPCG

tolerance of PCG iterations

maxiterPCG

the maximum number of PCG iterations

num.marker

the number of SNPs used to calculate the variance ratio

tau.init

a 2-length numeric vector, the initial values for variance components, tau; for binary traits, the first element is always be set to 1. if tau.init is not specified, the second element will be 0.5 for binary traits

traceCVcutoff

the threshold for coefficient of variation (CV) for the trace estimator, and the number of runs for trace estimation will be increased until the CV is below the threshold

ratioCVcutoff

the threshold for coefficient of variation (CV) for estimating the variance ratio, and the number of randomly selected markers will be increased until the CV is below the threshold

geno.sparse

if TRUE, store the sparse structure for genotypes; otherwise, save genotypes in a 2-bit dense matrix; see details

num.thread

the number of threads

model.savefn

the filename of model output, R data file '.rda', '.RData', or '.rds'

seed

an integer as a seed for random numbers

fork.loading

load genotypes via forking or not; forking processes in Unix can reduce loading time of genotypes, but may double the memory usage; not applicable on Windows

verbose

if TRUE, show information

Details

Utilizing the sparse structure of genotypes could significantly improve the computational efficiency of model fitting, but it also increases the memory usage. For more details of SAIGE algorithm, please refer to the SAIGE paper [Zhou et al. 2018] (see the reference section).

Value

Returns a list with the following components:

coefficients

the beta coefficients for fixed effects;

tau

a numeric vector of variance components 'Sigma_E' and 'Sigma_G';

linear.predictors

the linear fit on link scale;

fitted.values

fitted values from objects returned by modeling functions using glm.fit;

residuals

residuals;

cov

covariance matrix of beta coefficients;

converged

whether the model is fitted or not;

obj.noK

internal use, returned object from the SPAtest package;

var.ratio

a data.frame with columns 'id' (variant.id), 'maf' (minor allele frequency), 'mac' (minor allele count), 'var1' (the variance of score statistic), 'var2' (a variance estimate without accounting for estimated random effects) and 'ratio' (var1/var2, estimated variance ratio for variance approximation);

trait.type

either "binary" or "quantitative";

sample.id

the sample IDs used in the model fitting;

variant.id

the variant IDs used in the model fitting.

Author(s)

Xiuwen Zheng

References

Zhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, Wolford BN, LeFaive J, VandeHaar P, Gagliano SA, Gifford A, Bastarache LA, Wei WQ, Denny JC, Lin M, Hveem K, Kang HM, Abecasis GR, Willer CJ, Lee S. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet (2018). Sep;50(9):1335-1341.

T Sofer, X Zheng, SM Gogarten, CA Laurie, etc. A fully adjusted two-stage procedure for rank-normalization in genetic association studies. 2019. Genetic Epidemiology 43(3), 263-275

See Also

seqAssocGLMM_SPA

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
# open a GDS file
fn <- system.file("extdata", "grm1k_10k_snp.gds", package="SAIGEgds")
gdsfile <- seqOpen(fn)

# load phenotype
phenofn <- system.file("extdata", "pheno.txt.gz", package="SAIGEgds")
pheno <- read.table(phenofn, header=TRUE, as.is=TRUE)
head(pheno)

# fit the null model
glmm <- seqFitNullGLMM_SPA(y ~ x1 + x2, pheno, gdsfile, trait.type="binary")
glmm

# close the GDS file
seqClose(gdsfile)

Example output

Loading required package: gdsfmt
Loading required package: SeqArray
Loading required package: Rcpp
  sample.id y     yy      x1 x2
1        s1 0 4.5542  1.5118  1
2        s2 0 3.7941  0.3898  1
3        s3 0 5.0411 -0.6212  1
4        s4 0 5.6394 -2.2147  1
5        s5 0 4.2134  1.1249  1
6        s6 0 4.6145 -0.0449  1
SAIGE association analysis:
Fri Jun  4 16:06:42 2021
Filtering variants:

[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed, 1s
# of selected variants: 9,976
Fit the null model: y ~ x1 + x2 + var(GRM)
    # of samples: 1,000
    # of variants: 9,976
    using 1 thread
Transform on the design matrix with QR decomposition:
    new formula: y ~ x0 + x1 + x2 - 1
Start loading SNP genotypes:

[..................................................]  0%, ETC: ---    
[==================================================] 100%, completed, 0s
    using 6.6M (sparse matrix)
Binary outcome: y
       y Number Proportion
       0    902      0.902
       1     98      0.098
Initial fixed-effect coefficients:
          x0         x1         x2
    2.520514 -0.7666948 -0.4557928
Initial variance component estimates, tau:
    Sigma_E: 1, Sigma_G: 0.499412
Iteration 1:
    tau: (1, 0.4994116)
    fixed coeff: (2.520514, -0.7666948, -0.4557928)
Iteration 2:
    tau: (1, 0.3287896)
    fixed coeff: (2.521231, -0.776603, -0.4592503)
Iteration 3:
    tau: (1, 0.2817812)
    fixed coeff: (2.525954, -0.7738757, -0.4579659)
Iteration 4:
    tau: (1, 0.3211452)
    fixed coeff: (2.525719, -0.7730823, -0.4577413)
Iteration 5:
    tau: (1, 0.3361534)
    fixed coeff: (2.527166, -0.7739766, -0.4579633)
Final tau: (1, 0.3322063)
    fixed coeff: (2.527666, -0.774237, -0.4580237)
Calculate the average ratio of variances:
Fri Jun  4 16:06:49 2021
     1, maf: 0.0370, mac: 74,	ratio: 0.9325 (var1: 0.0792, var2: 0.0849)
     2, maf: 0.0645, mac: 129,	ratio: 0.9459 (var1: 0.0723, var2: 0.0764)
     3, maf: 0.4390, mac: 878,	ratio: 0.9398 (var1: 0.0422, var2: 0.0449)
     4, maf: 0.0115, mac: 23,	ratio: 0.9202 (var1: 0.1, var2: 0.109)
     5, maf: 0.0135, mac: 27,	ratio: 0.9439 (var1: 0.0823, var2: 0.0872)
     6, maf: 0.0505, mac: 101,	ratio: 0.9391 (var1: 0.0716, var2: 0.0763)
     7, maf: 0.0425, mac: 85,	ratio: 0.9417 (var1: 0.073, var2: 0.0775)
     8, maf: 0.2290, mac: 458,	ratio: 0.9421 (var1: 0.0563, var2: 0.0598)
     9, maf: 0.0270, mac: 54,	ratio: 0.9381 (var1: 0.0761, var2: 0.0812)
    10, maf: 0.0205, mac: 41,	ratio: 0.9390 (var1: 0.0824, var2: 0.0878)
    11, maf: 0.1560, mac: 312,	ratio: 0.9384 (var1: 0.0666, var2: 0.071)
    12, maf: 0.0285, mac: 57,	ratio: 0.9343 (var1: 0.0803, var2: 0.0859)
    13, maf: 0.4110, mac: 822,	ratio: 0.9376 (var1: 0.0458, var2: 0.0488)
    14, maf: 0.4530, mac: 906,	ratio: 0.9421 (var1: 0.0398, var2: 0.0422)
    15, maf: 0.0930, mac: 186,	ratio: 0.9396 (var1: 0.0715, var2: 0.0761)
    16, maf: 0.0220, mac: 44,	ratio: 0.9387 (var1: 0.0668, var2: 0.0712)
    17, maf: 0.1655, mac: 331,	ratio: 0.9410 (var1: 0.0641, var2: 0.0681)
    18, maf: 0.4520, mac: 904,	ratio: 0.9375 (var1: 0.0438, var2: 0.0467)
    19, maf: 0.0105, mac: 21,	ratio: 0.9406 (var1: 0.0821, var2: 0.0873)
    20, maf: 0.0350, mac: 70,	ratio: 0.9332 (var1: 0.0833, var2: 0.0893)
    21, maf: 0.0235, mac: 47,	ratio: 0.9377 (var1: 0.0744, var2: 0.0793)
    22, maf: 0.3730, mac: 746,	ratio: 0.9406 (var1: 0.0474, var2: 0.0504)
    23, maf: 0.0130, mac: 26,	ratio: 0.9530 (var1: 0.0629, var2: 0.066)
    24, maf: 0.0345, mac: 69,	ratio: 0.9459 (var1: 0.0684, var2: 0.0724)
    25, maf: 0.1905, mac: 381,	ratio: 0.9370 (var1: 0.0628, var2: 0.0671)
    26, maf: 0.4080, mac: 816,	ratio: 0.9392 (var1: 0.0422, var2: 0.0449)
    27, maf: 0.0105, mac: 21,	ratio: 0.9338 (var1: 0.0902, var2: 0.0966)
    28, maf: 0.1350, mac: 270,	ratio: 0.9422 (var1: 0.0624, var2: 0.0663)
    29, maf: 0.1600, mac: 320,	ratio: 0.9393 (var1: 0.0648, var2: 0.069)
    30, maf: 0.0335, mac: 67,	ratio: 0.9396 (var1: 0.0738, var2: 0.0785)
    ratio avg. is 0.9391186, sd: 0.005418568
Fri Jun  4 16:06:49 2021
Done.
List of 12
 $ coefficients     : Named num [1:3] -2.98 0.752 0.916
  ..- attr(*, "names")= chr [1:3] "(Intercept)" "x1" "x2"
 $ tau              : Named num [1:2] 1 0.332
  ..- attr(*, "names")= chr [1:2] "Sigma_E" "Sigma_G"
 $ linear.predictors: num [1:1000] -1.04 -1.85 -2.66 -3.88 -1.26 ...
 $ fitted.values    : num [1:1000] 0.2605 0.1356 0.0652 0.0202 0.2204 ...
 $ residuals        : num [1:1000] -0.2605 -0.1356 -0.0652 -0.0202 -0.2204 ...
 $ cov              : num [1:3, 1:3] 0.01799 -0.00764 -0.00497 -0.00764 0.014 ...
 $ converged        : logi TRUE
 $ obj.noK          :List of 7
  ..$ y       : Named num [1:1000] 0 0 0 0 0 0 0 0 0 0 ...
  .. ..- attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...
  ..$ mu      : Named num [1:1000] 0.2823 0.1458 0.0744 0.024 0.2278 ...
  .. ..- attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...
  ..$ res     : Named num [1:1000] -0.2823 -0.1458 -0.0744 -0.024 -0.2278 ...
  .. ..- attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...
  ..$ V       : Named num [1:1000] 0.2026 0.1245 0.0689 0.0234 0.1759 ...
  .. ..- attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...
  ..$ X1      : num [1:1000, 1:3] -1 -1 -1 -1 -1 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:1000] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:3] "x0" "x1" "x2"
  .. ..- attr(*, "assign")= int [1:3] 1 2 3
  ..$ XV      : num [1:3, 1:1000] -0.2026 -0.3058 -0.1914 -0.1245 -0.0503 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:3] "x0" "x1" "x2"
  .. .. ..$ : chr [1:1000] "1" "2" "3" "4" ...
  .. ..- attr(*, "assign")= int [1:3] 1 2 3
  ..$ XXVX_inv: num [1:1000, 1:3] -0.0019 -0.01009 -0.01746 -0.02909 -0.00472 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:1000] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:3] "x0" "x1" "x2"
  ..- attr(*, "class")= chr "SA_NULL"
 $ var.ratio        :'data.frame':	30 obs. of  6 variables:
  ..$ id   : int [1:30] 904 987 1008 1408 1566 2013 2221 3767 3809 3981 ...
  ..$ maf  : num [1:30] 0.452 0.135 0.453 0.1655 0.0505 ...
  ..$ mac  : num [1:30] 904 270 906 331 101 44 74 54 21 70 ...
  ..$ var1 : num [1:30] 0.0438 0.0624 0.0398 0.0641 0.0716 ...
  ..$ var2 : num [1:30] 0.0467 0.0663 0.0422 0.0681 0.0763 ...
  ..$ ratio: num [1:30] 0.937 0.942 0.942 0.941 0.939 ...
 $ trait.type       : chr "binary"
 $ sample.id        : chr [1:1000] "s1" "s2" "s3" "s4" ...
 $ variant.id       : int [1:9976] 1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, "class")= chr "ClassSAIGE_NullModel"

SAIGEgds documentation built on Nov. 8, 2020, 7:46 p.m.