GGI allows the search for Gene-Gene Interactions by testing all possible pairs of genes in a set of genes.
Y | numeric, integer, character or factor vector with exactly two different values.
snpX | SnpMatrix object. Must have a number of rows equal to the length of Y.
genes.length | (optional) a numeric vector.
genes.info | (optional) a data frame.
method | a string matching one of the following: PCA, CCA, KCCA, CLD, PLSPM, GBIGM, minP, GATES, tTS or tProd. Only one method can be specified.
... | Other optional arguments to be passed to the functions associated with the chosen method. See the help pages of the elementary methods for details.
This function is a wrapper for all Gene-Gene Interaction analysis methods and drives the overall analysis: it splits the dataset into gene matrices and starts the elementary analysis for each pair of genes.
SNPs from the same gene are assumed to be ordered along the chromosome.
See selectSnps.
If genes.length is provided, it contains the number of SNPs of each gene. For example, if genes.length is the vector c(20, 35, 15), then gene 1 is interpreted as the first 20 columns/SNPs of snpX, gene 2 as the following 35 columns/SNPs, and so on. Each declared gene is considered contiguous with the one before and after it. genes.length can be named if you want the returned matrix to have dimensions named after those names. If no names are given, generic names are generated following the pattern Gene.n (n being the gene's index).
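The column layout implied by genes.length can be sketched in base R. This is only an illustration of the rule described above, not the package's internal code; the objects first, last and gene.columns are hypothetical names introduced here.

```r
# Gene sizes taken from the example in the text.
genes.length <- c(20, 35, 15)

# Generic names following the Gene.n pattern when none are supplied.
if (is.null(names(genes.length))) {
  names(genes.length) <- paste0("Gene.", seq_along(genes.length))
}

# Last and first column index of each gene, assuming genes are contiguous.
last  <- cumsum(genes.length)
first <- last - genes.length + 1

# One row per gene: which columns of snpX it would cover.
gene.columns <- data.frame(gene = names(genes.length),
                           first = first, last = last,
                           row.names = NULL)
print(gene.columns)
```

Under these assumptions, the three genes cover columns 1 to 20, 21 to 55 and 56 to 70 of snpX.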
The following methods are available to perform the interaction test for a single pair of genes:
Principal Components Analysis method (PCA) PCA.test - PCA is performed on both genes and the resulting principal components are used to fit two logistic regression models: one with an interaction term between the two genes and one without. The interaction between the two genes is then tested using a likelihood ratio test between the two logistic regression models (see Li et al., 2009).
Canonical Correlation Analysis (CCA) CCA.test - The maximum canonical correlation between the two genes is computed for each group (cases and controls). The difference between the two Fisher-transformed values is used to test for interaction between genes (see Peng et al., 2010).
Kernel Canonical Correlation Analysis (KCCA) KCCA.test - This method is similar to the CCA method, except that the canonical correlations are computed using a kernel method (see Yuan et al., 2012 and Larson et al., 2013).
Composite Linkage Disequilibrium (CLD) CLD.test - CLD is based on the difference of the covariance matrices between the two genes computed for cases and controls. The covariance is estimated via the Composite Linkage Disequilibrium and a method based on Nagao normalized Quadratic Distance is used to compute the test statistic (see Rajapakse et al., 2012).
Partial Least Square Path Modeling (PLSPM) PLSPM.test - A network of statistical relations between latent and manifest variables is built. The difference between the path coefficients is used to compute the test statistic (see Zhang et al., 2013).
Gene-Based Information Gain Method (GBIGM) GBIGM.test - Entropies and Information Gain Ratio are used to compute a measure of the co-association between two genes (see Li et al., 2015).
Minimum p-value test (minP) minP.test - Given two genes, G1 with m1 SNPs and G2 with m2 SNPs, all SNP-SNP interactions are first tested using a logistic regression model, thus generating a set of m1*m2 p-values. The significance of the minimum p-value is evaluated using a multivariate normal distribution that accounts for the covariance between the test statistics at the SNP level (see Emily, 2016).
Gene Association Test using Extended Simes procedure (GATES) gates.test - Given two genes, G1 with m1 SNPs and G2 with m2 SNPs, all SNP-SNP interactions are first tested using a logistic regression model, thus generating a set of m1*m2 p-values. P-values are then corrected for multiple testing using an extension of the Simes procedure that takes into account the correlation between the test statistics via the number of effective tests (see Li et al., 2011).
Truncated Tail Strength test (tTS) tTS.test - Given two genes, G1 with m1 SNPs and G2 with m2 SNPs, all SNP-SNP interactions are first tested using a logistic regression model, thus generating a set of m1*m2 p-values. All p-values below a user-defined threshold are weighted and summed to provide the tTS test statistic (see Jiang et al., 2011).
Truncated p-value Product test (tProd) tProd.test - Similar to tTS but with a different transformation of the p-values (see Zaykin et al., 2002).
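The first step shared by minP, GATES, tTS and tProd (one logistic regression per SNP pair, giving m1*m2 p-values) can be sketched in base R on simulated data. This is a minimal illustration and not the package's implementation: the genotypes, the helper pair.pvalue and the threshold tau are assumptions made for the example, and the significance evaluation specific to each method is not reproduced.

```r
set.seed(42)
n  <- 300                                        # individuals (simulated)
m1 <- 3                                          # SNPs in gene G1
m2 <- 4                                          # SNPs in gene G2
G1 <- matrix(rbinom(n * m1, 2, 0.3), nrow = n)   # genotypes coded 0/1/2
G2 <- matrix(rbinom(n * m2, 2, 0.4), nrow = n)
Y  <- rbinom(n, 1, 0.5)                          # case-control status

# Hypothetical helper: p-value of the interaction term for one SNP pair.
pair.pvalue <- function(x1, x2, y) {
  fit <- glm(y ~ x1 * x2, family = binomial)
  coef(summary(fit))["x1:x2", "Pr(>|z|)"]
}

# m1 x m2 matrix of SNP-SNP interaction p-values.
pvals <- outer(seq_len(m1), seq_len(m2),
               Vectorize(function(i, j) pair.pvalue(G1[, i], G2[, j], Y)))

# Raw minimum p-value (the quantity minP then assesses for significance).
min.p <- min(pvals)

# Raw truncated product in the sense of Zaykin et al. (2002):
# product of the p-values not exceeding tau (tau = 0.05 is an arbitrary choice).
tau <- 0.05
trunc.prod <- prod(pvals[pvals <= tau])

print(dim(pvals))
```

The raw statistics above are not p-values at the gene level; each method adds its own step to account for the correlation between the SNP-level tests.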
Missing values are not allowed, and passing an incomplete SnpMatrix object as an argument will result in an error. Imputation can be performed prior to the analysis with the imputeSnpMatrix function.
A list with class "GGInetwork" containing the following components:
statistic |
a symmetric |
p.value |
a symmetric |
df |
(Only for |
method |
The method used to perform the Gene-Gene interaction test. |
parameter |
A list of the parameters used to perform the Gene-Gene Interaction test. |
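Assuming the p.value component is a symmetric matrix whose dimensions are named after the genes, the gene pairs below a chosen threshold can be listed as in the sketch below. The matrix is a hand-made mock-up, not package output, and the names p.value, idx and sig.pairs as well as the 0.05 threshold are hypothetical.

```r
# Mock symmetric p-value matrix (illustrative values only).
genes <- c("Gene.1", "Gene.2", "Gene.3")
p.value <- matrix(c(NA,   0.01, 0.40,
                    0.01, NA,   0.03,
                    0.40, 0.03, NA),
                  nrow = 3, dimnames = list(genes, genes))

# Each pair appears twice in a symmetric matrix: keep the upper triangle only.
idx <- which(upper.tri(p.value) & p.value < 0.05, arr.ind = TRUE)

# One row per gene pair below the threshold.
sig.pairs <- data.frame(gene1   = rownames(p.value)[idx[, "row"]],
                        gene2   = colnames(p.value)[idx[, "col"]],
                        p.value = p.value[idx])
print(sig.pairs)
```

Restricting the search to the upper triangle avoids reporting each pair twice and ignores whatever is stored on the diagonal.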
M. Emily. AGGrEGATOr: A Gene-based GEne-Gene interActTiOn test for case-control association studies. Statistical Applications in Genetics and Molecular Biology, 15(2): 151-171, 2016.
J. Li et al. Identification of gene-gene interaction using principal components. BMC Proceedings, 3(Suppl. 7): S78, 2009.
Q. Peng, J. Zhao and F. Xue. A gene-based method for detecting gene-gene co-association in a case-control study. European Journal of Human Genetics, 18(5): 582-587, 2010.
Z. Yuan et al. Detection for gene-gene co-association via kernel canonical correlation analysis. BMC Genetics, 13: 83, 2012.
N.B. Larson et al. A kernel regression approach to gene-gene interaction detection for case-control studies. Genetic Epidemiology, 37: 695-703, 2013.
I. Rajapakse, M.D. Perlman, P.J. Martin, J.A. Hansen and C. Kooperberg. Multivariate detection of gene-gene interactions. Genetic Epidemiology, 36(6): 622-630, 2012.
X. Zhang et al. A PLSPM-based test statistic for detecting gene-gene co-association in genome-wide association study with case-control design. PLoS ONE, 8(4): e62129, 2013.
J. Li et al. A gene-based information gain method for detecting gene-gene interactions in case-control studies. European Journal of Human Genetics, 23: 1566-1572, 2015.
M.X. Li et al. GATES: A Rapid and Powerful Gene-Based Association Test Using Extended Simes Procedure. American Journal of Human Genetics, 88(3): 283-293, 2011.
B. Jiang, X. Zhang, Y. Zuo and G. Kang. A powerful truncated tail strength method for testing multiple null hypotheses in one dataset. Journal of Theoretical Biology, 277: 67-73, 2011.
D.V. Zaykin, L.A. Zhivotovsky, P.H. Westfall and B.S. Weir. Truncated product method for combining P-values. Genetic Epidemiology, 22: 170-185, 2002.
PCA.test, CCA.test, KCCA.test, CLD.test, PLSPM.test, GBIGM.test, plot.GGInetwork, minP.test, gates.test, tTS.test, tProd.test, imputeSnpMatrix
## Not run:
## Dataset is included in the package
ped <- system.file("extdata/example.ped", package="GeneGeneInteR")
info <- system.file("extdata/example.info", package="GeneGeneInteR")
posi <- system.file("extdata/example.txt", package="GeneGeneInteR")
## Importation of the genotypes
data.imported <- importFile(file=ped, snps=info, pos=posi, pos.sep="\t")
## Filtering of the data: SNPs with MAF < 0.05, p.value for HWE < 1e-3 or
## call.rate < 0.9 are removed.
data.scour <- snpMatrixScour(snpX=data.imported$snpX, genes.info=data.imported$genes.info,
                             min.maf=0.05, min.eq=1e-3, call.rate=0.9)
## Imputation of the missing genotypes
data.imputed <- imputeSnpMatrix(data.scour$snpX, genes.info = data.scour$genes.info)
## End(Not run)
## Equivalent loading of the genotypes
load(system.file("extdata/dataImputed.Rdata", package="GeneGeneInteR"))
## Importation of the phenotype
resp <- system.file("extdata/response.txt", package="GeneGeneInteR")
Y <- read.csv(resp, header=FALSE)
## estimation of the interaction between the 17 genes with the CLD method -- can take a few minutes
## Not run:
GGI.res <- GGI(Y=Y, snpX=data.imputed$snpX, genes.info=data.imputed$genes.info,method="CLD")
## End(Not run)
## estimation of the interaction between 12 among the 17 genes with the PCA method
## Selection of 12 genes among 17
dta <- selectSnps(data.imputed$snpX, data.imputed$genes.info, c("bub3","CDSN","Gc","GLRX",
"PADI1","PADI2","PADI4","PADI6","PRKD3","PSORS1C1","SERPINA1","SORBS1"))
GGI.res <- GGI(Y=Y, snpX=dta$snpX, genes.info=dta$genes.info,method="PCA")