View source: R/SAIGE_SPATest.R
SPAGMMATtest | R Documentation |
Run single-variant, gene-based, or region-based score tests with saddlepoint approximation (SPA) based on the linear/logistic mixed model.
SPAGMMATtest(
  bgenFile = "",
  bgenFileIndex = "",
  vcfFile = "",
  vcfFileIndex = "",
  vcfField = "DS",
  savFile = "",
  savFileIndex = "",
  sampleFile = "",
  idstoExcludeFile = "",
  idstoIncludeFile = "",
  rangestoExcludeFile = "",
  rangestoIncludeFile = "",
  chrom = "",
  start = 1,
  end = 2.5e+08,
  IsDropMissingDosages = FALSE,
  minMAC = 0.5,
  minMAF = 0,
  maxMAFforGroupTest = 0.5,
  minInfo = 0,
  GMMATmodelFile = "",
  varianceRatioFile = "",
  SPAcutoff = 2,
  SAIGEOutputFile = "",
  numLinesOutput = 10000,
  IsSparse = TRUE,
  IsOutputAFinCaseCtrl = FALSE,
  IsOutputHetHomCountsinCaseCtrl = FALSE,
  IsOutputNinCaseCtrl = FALSE,
  IsOutputlogPforSingle = FALSE,
  LOCO = TRUE,
  condition = "",
  sparseSigmaFile = "",
  groupFile = "",
  kernel = "linear.weighted",
  method = "optimal.adj",
  weights.beta.rare = c(1, 25),
  weights.beta.common = c(1, 25),
  weightMAFcutoff = 0.01,
  weightsIncludeinGroupFile = FALSE,
  weights_for_G2_cond = NULL,
  r.corr = 0,
  IsSingleVarinGroupTest = TRUE,
  cateVarRatioMinMACVecExclude = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 20.5),
  cateVarRatioMaxMACVecInclude = c(1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 20.5),
  dosageZerodCutoff = 0.2,
  IsOutputPvalueNAinGroupTestforBinary = FALSE,
  IsAccountforCasecontrolImbalanceinGroupTest = TRUE,
  IsOutputBETASEinBurdenTest = FALSE,
  IsOutputMAFinCaseCtrlinGroupTest = FALSE,
  X_PARregion = "60001-2699520,154931044-155270560",
  is_rewrite_XnonPAR_forMales = FALSE,
  sampleFile_male = "",
  method_to_CollapseUltraRare = "absence_or_presence",
  MACCutoff_to_CollapseUltraRare = 10,
  DosageCutoff_for_UltraRarePresence = 0.5
)
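A minimal sketch of a single-variant call with vcf input follows. Every path is a placeholder chosen for illustration (none ships with SAIGE), and the minMAC value is arbitrary; only the argument names come from the usage above. The call is guarded so that it runs only when SAIGE is installed and the input files exist.

```r
# All paths below are placeholders, not files shipped with SAIGE.
step2_args <- list(
  vcfFile           = "genotypes.chr1.vcf.gz",
  vcfFileIndex      = "genotypes.chr1.vcf.gz.tbi",
  vcfField          = "DS",
  chrom             = "1",
  minMAC            = 20,
  GMMATmodelFile    = "step1_output.rda",
  varianceRatioFile = "step1_output.varianceRatio.txt",
  SAIGEOutputFile   = "step2_chr1_results.txt",
  LOCO              = TRUE
)

# Run only when SAIGE is installed and the input files actually exist.
inputs <- unlist(step2_args[c("vcfFile", "vcfFileIndex",
                              "GMMATmodelFile", "varianceRatioFile")])
if (requireNamespace("SAIGE", quietly = TRUE) && all(file.exists(inputs))) {
  do.call(SAIGE::SPAGMMATtest, step2_args)
}
```

Arguments that are left out keep the defaults listed in the usage.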
bgenFile |
character. Path to the bgen file. Currently, version 1.2 with 8-bit compression is supported.
bgenFileIndex |
character. Path to the .bgi file (index of the bgen file) |
vcfFile |
character. Path to vcf file |
vcfFileIndex |
character. Path to the tabix index (".tbi") of the vcf file, created with "tabix -p vcf file.vcf.gz"
vcfField |
character. genotype field in vcf file to use. "DS" for dosages or "GT" for genotypes. By default, "DS". |
savFile |
character. Path to sav file |
savFileIndex |
character. Path to the index (.s1r) of the sav file
sampleFile |
character. Path to the file that contains one column for IDs of samples in the bgen file with NO header |
idstoExcludeFile |
character. Path to the file containing variant ids to be excluded from the bgen file. The file does not have a header and each line is for a marker ID. |
idstoIncludeFile |
character. Path to the file containing variant ids to be included from the bgen file. The file does not have a header and each line is for a marker ID. |
rangestoExcludeFile |
character. Path to the file containing genome regions to be excluded from the bgen file. The file contains three columns for chromosome, start, and end respectively with no header |
rangestoIncludeFile |
character. Path to the file containing genome regions to be included from the bgen file. The file contains three columns for chromosome, start, and end respectively with no header |
chrom |
character. Chromosome to include from the vcf file. Required for vcf input. Note: the string must exactly match the chromosome string in the vcf/sav file; for example, "1" does not match "chr1". If LOCO is specified, providing chrom saves computation cost.
start |
numeric. start genome position to include from vcf file. By default, 1 |
end |
numeric. end genome position to include from vcf file. By default, 250000000 |
IsDropMissingDosages |
logical. whether to drop missing dosages (TRUE) or to mean impute missing dosages (FALSE). By default, FALSE. This option only works for bgen, vcf, and sav input. |
minMAC |
numeric. Minimum minor allele count of markers to test. By default, 0.5. When both minMAC and minMAF are set, the stricter of the two thresholds is applied.
minMAF |
numeric. Minimum minor allele frequency of markers to test. By default, 0. When both minMAC and minMAF are set, the stricter of the two thresholds is applied.
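As a rough illustration of how the two thresholds interact, the sketch below converts minMAC into an equivalent MAF for a given sample size and keeps the stricter value. The helper name effective_min_maf and the conversion MAC = 2 * N * MAF (diploid genotypes) are assumptions made for this example, not code taken from SAIGE.

```r
# Hypothetical helper: combine minMAC and minMAF into one MAF threshold.
# Assumes diploid genotypes, so MAC = 2 * n_samples * MAF.
effective_min_maf <- function(min_mac, min_maf, n_samples) {
  maf_from_mac <- min_mac / (2 * n_samples)
  max(maf_from_mac, min_maf)
}

# With 10,000 samples, minMAC = 0.5 corresponds to a MAF of 2.5e-05,
# so a minMAF of 0.001 is the stricter filter.
effective_min_maf(min_mac = 0.5, min_maf = 0.001, n_samples = 10000)

# With the defaults (minMAC = 0.5, minMAF = 0), minMAC decides.
effective_min_maf(min_mac = 0.5, min_maf = 0, n_samples = 10000)
```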
maxMAFforGroupTest |
numeric. Maximum minor allele frequency of markers to test in group test. By default 0.5. |
minInfo |
numeric. Minimum imputation info of markers to test. By default, 0. This option only works for bgen, vcf, and sav input |
GMMATmodelFile |
character. Path to the input file containing the glmm model, which is output from the previous step. The file is read with load()
varianceRatioFile |
character. Path to the input file containing the variance ratio, which is output from the previous step |
SPAcutoff |
numeric. Cutoff, in standard deviations, that decides when the SPA test is used. By default, 2 (the SPA test is used when the p value is < 0.05 under the normal approximation)
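The link between the cutoff of 2 and the p value of 0.05 can be checked directly, assuming the cutoff is measured in standard deviations of the score statistic and the normal approximation is two-sided. The variable names are illustrative only.

```r
# Two-sided normal tail probability beyond `spa_cutoff` standard deviations.
spa_cutoff <- 2
p_at_cutoff <- 2 * pnorm(-spa_cutoff)
p_at_cutoff  # about 0.0455
```

A larger cutoff restricts the SPA test to smaller normal-approximation p values.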
SAIGEOutputFile |
character. Path to the output file containing assoc test results |
numLinesOutput |
numeric. Number of markers to be output each time. By default, 10000 |
IsSparse |
logical. Whether to exploit the sparsity of the genotype vector for less frequent variants to speed up the SPA tests or not for dichotomous traits. By default, TRUE |
IsOutputAFinCaseCtrl |
logical. Whether to output allele frequency in cases and controls. By default, FALSE |
IsOutputHetHomCountsinCaseCtrl |
logical. Whether to output heterozygous and homozygous counts in cases and controls. By default, FALSE. If TRUE, the columns "homN_Allele2_cases", "hetN_Allele2_cases", "homN_Allele2_ctrls", "hetN_Allele2_ctrls" will be output.
IsOutputNinCaseCtrl |
logical. Whether to output sample sizes in cases and controls. By default, FALSE |
IsOutputlogPforSingle |
logical. Whether to output log(Pvalue) for single-variant assoc tests. By default, FALSE. If TRUE, the log(Pvalue) instead of original P values will be output |
LOCO |
logical. Whether to apply the leave-one-chromosome-out option. By default, TRUE |
condition |
character. For conditional analysis. Genetic marker ids (chr:pos_ref/alt for sav/vcf dosage input, marker id for bgen input) separated by commas, e.g. chr3:101651171_C/T,chr3:101651186_G/A. Note that conditional analysis is currently available only for bgen, vcf, and sav input.
sparseSigmaFile |
character. Path to the file containing the sparseSigma from step 1. The suffix of this file is ".mtx". |
groupFile |
character. Path to the file containing the group information for gene-based tests. Each line describes one gene/set of variants. The first element is the gene/set name. The rest of the line lists the variant ids included in this gene/set. For vcf/sav, the genetic marker ids are in the format chr:pos_ref/alt. For bgen, the genetic marker ids should match the ids in the bgen file. The elements in each line are separated by tabs.
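The group file layout described above can be produced with a few lines of base R. The gene names and marker ids below are invented for illustration; only the layout (set name first, then tab-separated marker ids in chr:pos_ref/alt form) follows the description.

```r
# Hypothetical gene sets; marker ids use the chr:pos_ref/alt format (vcf/sav input).
gene_sets <- list(
  GENE1 = c("chr1:1001_A/G", "chr1:1050_C/T", "chr1:1102_G/A"),
  GENE2 = c("chr1:2001_T/C", "chr1:2040_G/C")
)

# One line per set: set name first, then the marker ids, all tab-separated.
group_lines <- vapply(
  names(gene_sets),
  function(set_name) paste(c(set_name, gene_sets[[set_name]]), collapse = "\t"),
  character(1)
)

group_path <- tempfile(fileext = ".txt")
writeLines(group_lines, group_path)
readLines(group_path)
```

The resulting path is what would be passed as groupFile.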
kernel |
character. For gene-based test. By default, "linear.weighted". More options can be seen in the SKAT library |
method |
character. method for gene-based test p-values. By default, "optimal.adj". More options can be seen in the SKAT library |
weights.beta.rare |
vector of numeric. Parameters of the beta distribution used to weight genetic markers with MAF <= weightMAFcutoff in gene-based tests. By default, c(1, 25). More options can be seen in the SKAT library
weights.beta.common |
vector of numeric. Parameters of the beta distribution used to weight genetic markers with MAF > weightMAFcutoff in gene-based tests. By default, c(1, 25). More options can be seen in the SKAT library. NOTE: this argument is not fully developed; currently, weights.beta.common is equal to weights.beta.rare
weightMAFcutoff |
numeric. Between 0 and 0.5. See document above for weights.beta.rare and weights.beta.common. By default, 0.01 |
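To see what the default c(1, 25) does, the sketch below evaluates the beta density at a few minor allele frequencies, following the SKAT convention of using dbeta(MAF, a1, a2) as the marker weight. The MAF values are made up, and whether SAIGE applies exactly this formula internally is an assumption here.

```r
# Example minor allele frequencies (made up for illustration).
maf <- c(0.0005, 0.005, 0.01, 0.05)
weights_beta <- c(1, 25)

# Beta density at each MAF; rarer variants receive larger weights.
marker_weights <- dbeta(maf, weights_beta[1], weights_beta[2])
round(marker_weights, 2)
```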
weightsIncludeinGroupFile |
logical. Whether to specify customized weights for markers in gene- or region-based tests. If TRUE, weights are included in the group file. For vcf/sav, the genetic marker ids and weights are in the format chr:pos_ref/alt;weight. For bgen, the genetic marker ids should match the ids in the bgen file, e.g. SNPID;weight. The elements in each line are separated by tabs. By default, FALSE
weights_for_G2_cond |
vector of float. Weights for the conditioning markers in gene- or region-based tests. The length equals the number of conditioning markers; values are delimited by commas. By default, NULL
r.corr |
numeric. Between 0 and 1. Parameter for gene-based tests. By default, 0. More options can be seen in the SKAT library
IsSingleVarinGroupTest |
logical. Whether to perform single-variant assoc tests for genetic markers included in the gene-based tests. By default, TRUE
cateVarRatioMinMACVecExclude |
vector of float. Lower bounds (excluded) of the MAC categories. The length equals the number of MAC categories for variance ratio estimation. By default, c(0.5,1.5,2.5,3.5,4.5,5.5,10.5,20.5). If groupFile="", only one variance ratio corresponding to MAC >= 20 is used
cateVarRatioMaxMACVecInclude |
vector of float. Upper bounds (included) of the MAC categories. The length equals the number of MAC categories for variance ratio estimation minus 1. By default, c(1.5,2.5,3.5,4.5,5.5,10.5,20.5). If groupFile="", only one variance ratio corresponding to MAC >= 20 is used
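The two default vectors can be read as paired interval bounds, with the last category open-ended. The sketch below assigns a MAC to its category under that reading; the helper mac_category and the pairing rule (lower bound excluded, upper bound included, last category unbounded) are assumptions inferred from the argument names, not SAIGE code.

```r
cate_min_exclude <- c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 20.5)
cate_max_include <- c(1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 20.5)

# Hypothetical helper: index of the category with lower < MAC <= upper.
# The last category has no upper bound, so Inf is appended.
mac_category <- function(mac) {
  upper <- c(cate_max_include, Inf)
  vapply(mac, function(m) {
    hit <- which(m > cate_min_exclude & m <= upper)
    if (length(hit) == 0) NA_integer_ else hit[1]
  }, integer(1))
}

mac_category(c(1, 3, 8, 20, 150))
# returns 1 3 6 7 8
```

Each category index would select the matching entry of the variance ratio file.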
dosageZerodCutoff |
numeric. In gene- or region-based tests, for each variant with MAC <= 10, dosages <= dosageZerodCutoff will be set to 0. By default, 0.2.
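A small sketch of this rule for one variant is shown below. The helper zero_small_dosages is hypothetical, and it assumes that MAC is computed from the dosages before any value is zeroed; the dosage vector is made up.

```r
# Hypothetical helper: zero out small dosages for low-MAC variants.
zero_small_dosages <- function(dosage, cutoff = 0.2, mac_limit = 10) {
  # Minor allele count, taken from the unmodified dosages (assumption).
  mac <- min(sum(dosage), 2 * length(dosage) - sum(dosage))
  if (mac <= mac_limit) dosage[dosage <= cutoff] <- 0
  dosage
}

dosage <- c(0.05, 0.2, 0.9, 0, 1.1, 0.15)
zero_small_dosages(dosage)
# returns 0.0 0.0 0.9 0.0 1.1 0.0
```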
IsOutputPvalueNAinGroupTestforBinary |
logical. In gene- or region-based tests for binary traits, if IsOutputPvalueNAinGroupTestforBinary is TRUE, p-values without accounting for case-control imbalance will be output. By default, FALSE
IsAccountforCasecontrolImbalanceinGroupTest |
logical. In gene- or region-based tests for binary traits, if IsAccountforCasecontrolImbalanceinGroupTest is TRUE, p-values after accounting for case-control imbalance will be output. By default, TRUE
IsOutputBETASEinBurdenTest |
logical. Output effect size (BETA and SE) for burden tests. By default, FALSE |
IsOutputMAFinCaseCtrlinGroupTest |
logical. Whether to output minor allele frequency in cases and controls in set-based tests. By default, FALSE
X_PARregion |
character. Ranges of the pseudoautosomal (PAR) regions on chromosome X, separated by commas and each in the format start-end. By default, '60001-2699520,154931044-155270560' (UCSC build hg19 coordinates). Males carry two X alleles in the PAR regions, so PAR regions are treated the same as autosomes. In the NON-PAR regions (outside the specified PAR regions on chromosome X), males carry only one X allele. If is_rewrite_XnonPAR_forMales=TRUE, genotypes/dosages of all variants in the NON-PAR regions on chromosome X will be multiplied by 2 for males.
is_rewrite_XnonPAR_forMales |
logical. Whether to rewrite genotypes or dosages of variants in the NON-PAR regions on chromosome X for males (multiply by 2). By default, FALSE. Note: only use is_rewrite_XnonPAR_forMales=TRUE when the specified VCF or bgen file contains only variants on chromosome X. When is_rewrite_XnonPAR_forMales=TRUE, the program does not check the chromosome value and assumes all variants are on chromosome X
sampleFile_male |
character. Path to the file containing one column for IDs of MALE samples in the bgen or vcf file with NO header. Order does not matter |
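The effect of the three chromosome X arguments can be sketched in base R. The helpers parse_par_regions and in_par, the positions, and the dosages are all hypothetical; the region string is the default from the usage, and the doubling rule follows the description of is_rewrite_XnonPAR_forMales.

```r
# Parse "start-end,start-end" into a two-column matrix of PAR ranges.
parse_par_regions <- function(x) {
  ranges <- strsplit(strsplit(x, ",")[[1]], "-")
  matrix(as.numeric(unlist(ranges)), ncol = 2, byrow = TRUE,
         dimnames = list(NULL, c("start", "end")))
}

# TRUE for positions that fall inside any PAR range.
in_par <- function(pos, par) {
  vapply(pos, function(p) any(p >= par[, "start"] & p <= par[, "end"]),
         logical(1))
}

par <- parse_par_regions("60001-2699520,154931044-155270560")
pos <- c(100000, 5000000, 155000000)   # made-up chromosome X positions
dosage_male <- c(1, 1, 1)              # one male sample, three variants

# Double the dosage only outside the PAR regions.
non_par <- !in_par(pos, par)
dosage_male[non_par] <- 2 * dosage_male[non_par]
dosage_male
# returns 1 2 1
```

Only the samples listed in sampleFile_male would be rewritten in this way.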
method_to_CollapseUltraRare |
character. Method to collapse the ultra-rare variants in set-based association tests. This argument can be 'absence_or_presence', 'sum_geno', or ''. absence_or_presence: in the resulting collapsed marker, an individual with DosageCutoff_for_UltraRarePresence <= dosage < 1+DosageCutoff_for_UltraRarePresence for any ultra-rare variant has 1 in the genotype vector, an individual with dosage >= 1+DosageCutoff_for_UltraRarePresence for any ultra-rare variant has 2, and all others have 0. sum_geno: ultra-rare variants with MAC <= MACCutoff_to_CollapseUltraRare are collapsed so that the resulting marker's genotype equals the weighted sum of the genotypes of all ultra-rare variants. NOTE: the sum_geno option is currently NOT active. By default, "absence_or_presence".
MACCutoff_to_CollapseUltraRare |
numeric. MAC cutoff to collapse the ultra-rare variants (<= MACCutoff_to_CollapseUltraRare) in set-based association tests. By default, 10.
DosageCutoff_for_UltraRarePresence |
numeric. Dosage cutoff to determine whether the ultra-rare variants are absent or present in the samples. Dosage >= DosageCutoff_for_UltraRarePresence indicates that the variant is present in the sample. 0 < DosageCutoff_for_UltraRarePresence <= 2. By default, 0.5.
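The 'absence_or_presence' rule can be illustrated with a small dosage matrix. The function collapse_absence_or_presence and the dosages are hypothetical; the sketch assumes that a sample's largest dosage across the ultra-rare variants decides its collapsed genotype, so a value of 2 takes precedence over 1.

```r
# Rows are samples, columns are ultra-rare variants (made-up dosages).
dosages <- rbind(
  s1 = c(0.0, 0.1, 0.0),
  s2 = c(0.6, 0.0, 0.0),
  s3 = c(0.0, 1.7, 0.2),
  s4 = c(0.9, 0.0, 1.6)
)

collapse_absence_or_presence <- function(dosages, cutoff = 0.5) {
  max_dosage <- apply(dosages, 1, max)
  collapsed <- integer(nrow(dosages))        # default 0: absent
  collapsed[max_dosage >= cutoff] <- 1L      # cutoff <= dosage < 1 + cutoff
  collapsed[max_dosage >= 1 + cutoff] <- 2L  # dosage >= 1 + cutoff
  names(collapsed) <- rownames(dosages)
  collapsed
}

collapse_absence_or_presence(dosages)
# s1 s2 s3 s4
#  0  1  2  2
```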
SAIGEOutputFile