KW.hit.express | R Documentation |
The function uses the Kruskal-Wallis test to evaluate the association between lesion groups and the expression level of the same corresponding gene.
KW.hit.express(alex.data, gene.annotation, min.grp.size = NULL)
alex.data |
Output of the alex.prep.lsn.expr function. It is a list of three data tables: "row.mtch", "alex.expr" with expression data, and "alex.lsn" with lesion data. Rows of the "alex.expr" and "alex.lsn" matrices are ordered by gene Ensembl ID and columns are ordered by patient ID. |
gene.annotation |
Gene annotation data, either provided by the user or retrieved from the Ensembl BioMart database using the get.ensembl.annotation function included in the GRIN2.0 library. The data.frame should have four columns: "gene", the Ensembl ID of the annotated gene; "chrom", the chromosome on which the gene is located; "loc.start", the gene start position; and "loc.end", the gene end position. |
min.grp.size |
Minimum number of subjects a lesion group must have to be included in the KW test. The test is run for a gene only if at least two groups have a number of patients > min.grp.size. |
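As a rough illustration of the shape expected for gene.annotation, the sketch below builds a small data.frame with the four columns described above and checks that they are present. The gene IDs and coordinates are made-up placeholders, and the column check is only an illustration; it is not claimed to be part of GRIN2.0.

```r
# Hypothetical annotation table; IDs and coordinates are placeholders
toy.annotation <- data.frame(
  gene      = c("ENSG_TOY_0001", "ENSG_TOY_0002"),
  chrom     = c("1", "17"),
  loc.start = c(1000, 50000),
  loc.end   = c(2500, 62000),
  stringsAsFactors = FALSE
)

# Columns the argument description says gene.annotation should contain
required.cols <- c("gene", "chrom", "loc.start", "loc.end")
missing.cols  <- setdiff(required.cols, colnames(toy.annotation))
if (length(missing.cols) > 0) {
  stop("gene.annotation is missing: ", paste(missing.cols, collapse = ", "))
}
```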
The function uses the Ensembl IDs in each row of the row.mtch table and runs the Kruskal-Wallis test for association between the lesion groups of the gene in the "hit.row" column and the expression level of the gene in the "expr.row" column. The IDs in the two columns should be the same when the KW test is used to evaluate the association between lesion groups and the expression level of the same corresponding gene. If a patient is affected by multiple types of lesions in the same gene (for example, gain and mutation), the entry is denoted "multiple"; patients without any type of lesion are coded as "none".
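The grouping and test described here can be sketched for a single gene with base R's kruskal.test. This is a minimal sketch on simulated data, not the package's internal code: toy.lesions, toy.expr and lsn.group are hypothetical names, and the group filter follows the "> min.grp.size" wording of the argument description.

```r
set.seed(1)

# Hypothetical lesion calls for one gene in 30 patients (not package data);
# character(0) means the patient has no lesion in this gene
patients <- paste0("pt", 1:30)
toy.lesions <- c(
  rep(list("gain"), 10),
  rep(list("mutation"), 8),
  rep(list(c("gain", "mutation")), 2),
  rep(list(character(0)), 10)
)
names(toy.lesions) <- patients

# One label per patient: "none", the single lesion type, or "multiple"
lsn.group <- vapply(toy.lesions, function(x) {
  types <- unique(x)
  if (length(types) == 0) "none" else if (length(types) == 1) types else "multiple"
}, character(1))

# Simulated expression values for the same gene and patients
toy.expr <- setNames(rnorm(30, mean = 5), patients)

# Keep only groups with more than min.grp.size patients
min.grp.size <- 5
grp.n     <- table(lsn.group)
keep.grps <- names(grp.n)[grp.n > min.grp.size]
keep      <- lsn.group %in% keep.grps

# Run the test only if at least two groups remain
if (length(keep.grps) >= 2) {
  kw   <- kruskal.test(x = toy.expr[keep], g = factor(lsn.group[keep]))
  p.KW <- kw$p.value
} else {
  p.KW <- NA_real_
}
```

In this toy data the "multiple" group has only two patients, so it is dropped before the test and the comparison is made among "gain", "mutation" and "none".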
A data table with multiple columns that include:
gene |
ensembl ID of the gene of interest. |
gene.name |
Gene name of the gene of interest. |
p.KW |
Kruskal-Wallis test p-value. |
q.KW |
Kruskal-Wallis test FDR adjusted q-value. |
_n.subjects |
Multiple columns with number of subjects with each type of lesion affecting the gene, number of subjects without any lesion and number of subjects with multiple types of lesions. |
_mean |
Multiple columns with mean expression level of the gene in subjects with each type of lesion, mean expression in subjects without any lesion and mean expression in subjects with multiple types of lesions. |
_median |
Multiple columns with median expression of the gene in subjects with each type of lesion, median expression in subjects without any lesion and median expression in subjects with multiple types of lesions. |
_sd |
Multiple columns with standard deviation of the expression level of the gene in subjects with each type of lesion, standard deviation in subjects without any lesion and standard deviation in subjects with multiple types of lesions. |
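To make the per-group summary columns and the q-value concrete, the following sketch computes them with base R on made-up numbers. The values and variable names are hypothetical, and Benjamini-Hochberg via p.adjust is used only as a stand-in: this page does not state which FDR procedure produces q.KW, so the package's q-values may differ.

```r
# Hypothetical expression values and lesion groups for one gene
toy.expr  <- c(4.1, 5.0, 4.6, 6.2, 6.8, 7.1, 5.5, 5.9)
lsn.group <- c("none", "none", "none", "gain", "gain", "gain",
               "mutation", "mutation")

# One value per lesion group, mirroring the _n.subjects, _mean,
# _median and _sd columns
n.subjects <- tapply(toy.expr, lsn.group, length)
grp.mean   <- tapply(toy.expr, lsn.group, mean)
grp.median <- tapply(toy.expr, lsn.group, median)
grp.sd     <- tapply(toy.expr, lsn.group, sd)

# Hypothetical KW p-values for five genes, adjusted across genes
# (BH is an assumption, not necessarily the package's method)
toy.p <- c(0.001, 0.02, 0.04, 0.30, 0.75)
toy.q <- p.adjust(toy.p, method = "BH")
```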
Abdelrahman Elsayed abdelrahman.elsayed@stjude.org and Stanley Pounds stanley.pounds@stjude.org
Myles Hollander and Douglas A. Wolfe (1973), Nonparametric Statistical Methods. New York: John Wiley & Sons. Pages 115–120.
Cao, X., Elsayed, A. H., & Pounds, S. B. (2023). Statistical Methods Inspired by Challenges in Pediatric Cancer Multi-omics.
alex.prep.lsn.expr()
data(expr.data)
data(lesion.data)
data(hg19.gene.annotation)
# prepare expression, lesion data and return the set of genes with both types of data available
# ordered by gene IDs in rows and patient IDs in columns:
alex.data=alex.prep.lsn.expr(expr.data, lesion.data,
                             hg19.gene.annotation, min.expr=1, min.pts.lsn=5)
# run Kruskal-Wallis test for association between lesion groups and expression level of the
# same corresponding gene:
alex.kw.results=KW.hit.express(alex.data, hg19.gene.annotation, min.grp.size=5)