library(cmlmanuscript)
data("deg_seset_targetaml")
dim(deg.seset)

This document describes analysis for the manuscript "Consensus Machine Learning for Gene Target Selection in Pediatric AML Risk" and utilizes the corresponding package cmlmanuscript.

Data Summaries and Pre-filtering Samples with Risk Group Available

Dimensions of dataset

dim(deg.seset)

[1] 1984 145

Risk group variable

table(deg.seset$Risk.group)

We defined binary risk group from risk group as follows.

deg.seset$deg.risk <- ifelse(deg.seset$Risk.group=="Low", 0,
                             ifelse(deg.seset$Risk.group %in% c("Standard","High"),1,"NA"))
table(deg.seset$deg.risk)

message("table of risk group x binarized risk group")
table(deg.seset$deg.risk, deg.seset$Risk.group)

Risk group filter

We filtered samples according to available risk group status.

degfilt.se <- deg.seset[,which(deg.seset$deg.risk %in% c(0,1))] # subset on deg risk group available
message("dim of filtered se object")
dim(degfilt.se)

Post-filter data summary

Next, we checked for confounding from demographic variables (age and sex) among binary risk group. This ensured age and sex do not confound the binary risk group variable.

# summarize gender and age at first diagnosis
message("table of gender x binarized risk")
table(degfilt.se$Gender,degfilt.se$deg.risk)

message("chisq test of gender x binarized risk")
chisq.test(table(degfilt.se$Gender,degfilt.se$deg.risk)) # p-value = 0.8044, gender evenly dist

degfilt.se$binom.age <- ifelse(degfilt.se$Age.at.Diagnosis.in.Days >= median(degfilt.se$Age.at.Diagnosis.in.Days), "old" ,"young")
message("table of binarized age-at-diag x binarized risk")
table(degfilt.se$binom.age,degfilt.se$deg.risk)

message("chisq results of binarized age-at-diag x binarized risk")
chisq.test(table(degfilt.se$binom.age,degfilt.se$deg.risk))

```

Ablation tests

We performed ablation tests to assess the impact of exclusion bias. Feature exclusion bias reflects the fact that feature assessment, and ultimate exlcusion or exclusion from penalized regression, is biased by inclusion or exclusion of other features. We showed this bias through ablation of penalized regression runs, using three penalization schema (lasso, ridge regression, and elastic net).

Lasso

Ridge regression

Elastic net



metamaden/cmlmanuscript documentation built on Dec. 12, 2019, 7:53 a.m.