Description Usage Arguments Details Value References See Also Examples
This function provides a pipeline for the plink data set based on the LSKAT function. Except the plink data set(BED, BIM and FAM), SNP set table is same as SKAT package.
1 2 3 4 5 6 7 8 9 10 |
file.plink.bed |
File name, PLINK bed file |
file.plink.bim |
File name, PLINK bim file |
file.plink.fam |
File name, PLINK fam file |
file.phe.long |
File name, indicating phenotype file in CSV format with row name (individual ID) and header information (Measured Index). Each individual is encoded to one row data which has individual ID as row name and multiple observed values followed by row name. |
file.phe.cov |
File name, indicating covariate file in CSV format with row name (individual ID) and header information (Covariate Index). Each individual is encoded to one row data which has individual ID as row name and covariate values followed by row name. |
file.phe.time |
File name, indicating measured times in CSV format with row name (individual ID) and header information (Measured Time). Each individual is encoded to one row data which has individual ID as row name and covariate values followed by row name. |
file.gene.set |
File name, indicating gene set table which has two columns, 1st column is gene name, 2nd column is variant name and no header. |
gene.set |
numeric or string. indicating the gene name or gene index in the gene set table. |
options |
String, see the details. |
verbose |
Logical variable, indicating whether some debug information can be outputted. |
The gene set file is same as the SetID file in SKAT packa which is white-space (space or tab) seperated file with 2 columns: SetID (1st column) and SNP_ID (2nd column), Please keep in mind that there should be no header!
The phenotype file, covariate file and measured time file are CSV files with header and row name(indicating the individual id), these 3 files should have same individual order. Please note that the columns in measured time file are corresponding to same columns in the phenotype file.
The options
list has some important feature to tune the association test. The default values are defined in the package as the follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
Numeric, a value of MAF cutoff for the rare SNPs. Only SNPs that have MAFs smaller than this are considered as rare SNP. The default criterion of rare SNP is calculated by the formula 1/√{2*sample}
Numeric, indicating which order of time is included as extra covariates. If this value is 2, time and time square are considered as extra covariates.
Number, indicating the number of maximum iteration is applied to MLE alogrithm.
a numeric vector of parameters of beta weights for common variants (default=c(0.5,0.5)).
a numeric vector of parameters of beta weights for rare variants (default=c(1,25)).
Logical, indicating whether C/C++ functions are used to compute LSKAT.
Logical, indicating the computaional process output much more information than normal mode.
String, indicating the method of SNP imputation, the default model uses the mean of each variant to replace the missing SNP data. Two optional values: 'mean' or 'random'.
Logical, indicating whether the intercept is considered in the NULL model.
String, indicating PLINK command path, for the large PLINK data, the package will load small plink data set extracted by the plink command rather than the whole data set.
String, Three models can be selected, "joint", "Common.Only", "rare.Only".
String, indicating the estimate method for NULL model, two options, REML or ML, default is REML.
A list object with 3 items is returned by this function:
par |
The parameters of this function call. |
mle |
The estimation of NULL Model returned by |
result |
Matrix containing the test result of LSKAT and L-burden for all genes, the sub-items are described as follows. |
The structure: of result
item:
index |
Number, gene index |
gene.name |
String, gene name |
chr |
Number, chromosome |
min.pos |
Number, minimum position of all variants |
snp.total |
Number, the total number of variants |
snp.rare |
Number, the number of rare variants. |
q.lskat |
Numeric, the statistical test of LSKAT |
p.lskat |
Numeric, the p-value of LSKAT |
q.burden |
Numeric, the statistical test of L-Burden test |
p.burden |
Numeric, the p-value of L-Burden test |
Wang Z., Xu K., Zhang X., Wu X., and Wang Z., (2016) Longitudinal SNP-set association analysis of quantitative phenotypes. Genetic Epidemiology.
longskat_gene_test
and longskat_est_model
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | ## data simulation for the test of type 1 error
p0 <- longskat_gene_simulate( plink.format=T, file.plink.prefix="tmp-gene-plink",
power.test=F, n.gene=20);
## test all genes in the PLINK data set
r.lskat0 <- longskat_gene_plink(p0$file.plink.bed, p0$file.plink.bim, p0$file.plink.fam,
p0$file.phe.long, p0$file.phe.cov, NULL,
p0$file.gene.set, gene.set=c(5:10), verbose=T );
## print the results
r.lskat0;
## data simulation for the power test
p0 <- longskat_gene_simulate( plink.format=T, file.plink.prefix="tmp-gene-plink",
power.test=T, n.gene=20);
## test all genes in the PLINK data set
r.lskat1<-longskat_snp_plink( p0$file.plink.bed, p0$file.plink.bim, p0$file.plink.fam,
p0$file.phe.long, p0$file.phe.cov, file.gene.set=p0$file.gene.set,
snp.set=c(1:199),
options=list(time.cov=1, g.maxiter=6, n.cpu=2, test.type="Rare.only" ));
## print the results
r.lskat1;
#draw the Manhattan figure for the result
plot( r.lskat1, pdf.file="tmp-plink-lskat1.pdf", title="TEST PLINK 1", bonferroni=T);
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.