RR_featureTally: Computes positive and negative calls upon changing stringency...


View source: R/RR_featureTally.R

Description

Computes positive and negative calls in feature-selected networks as feature selection stringency changes (binary networks only).

Usage

RR_featureTally(
  netmat,
  phenoDF,
  TT_STATUS,
  predClass,
  pScore,
  outDir = tempdir(),
  enrichLabels = TRUE,
  enrichedNets,
  maxScore = 30L,
  verbose = FALSE
)

Arguments

netmat

(matrix) output of countPatientsInNet. Should contain all patients in the dataset that overlap one or more networks.

phenoDF

(data.frame) patient ID and STATUS

TT_STATUS

(list) output of splitTestTrain_partition; should be the same as that used for cross-validation.

predClass

(char) class to be predicted

pScore

(list of data.frames) 10-fold cross-validation scores, one entry for each resampling of the data. Each data.frame has two columns: 1) pathway name, 2) pathway score.

outDir

(char) path to the directory where results should be written

enrichLabels

(logical) was network label enrichment used?

enrichedNets

(list of chars) networks passing network label enrichment

maxScore

(integer) maximum achievable pathway score over the N-way resampling

verbose

(logical) print messages
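The pScore argument and the cumulativeFeatScores output are related: the cumulative pathway score can be pictured as a per-pathway sum over the pScore data.frames. The sketch below assumes exactly that. It treats the cumulative score as a sum, and a pathway absent from a resampling contributes nothing. The helper cumulative_scores and the toy input are hypothetical and are not part of netDx.

The same assumption explains maxScore. If each resampling contributes at most 10 points (one per cross-validation fold), N resamplings give a maximum of 10N. Under that assumption, maxScore = 30L would correspond to three resamplings.

```r
# Hypothetical helper (not part of netDx): sums each pathway's score
# across resamplings. Assumes each pScore element is a data.frame whose
# first column is the pathway name and second column is its score.
cumulative_scores <- function(pScore) {
  long <- do.call(rbind, lapply(pScore, function(df) {
    data.frame(PATHWAY = as.character(df[[1]]),
               SCORE = as.numeric(df[[2]]),
               stringsAsFactors = FALSE)
  }))
  # Pathways absent from a resampling simply add nothing to their sum.
  agg <- aggregate(SCORE ~ PATHWAY, data = long, FUN = sum)
  # Highest cumulative score first; ties broken alphabetically.
  agg[order(-agg$SCORE, agg$PATHWAY), , drop = FALSE]
}

# Toy input: three resamplings (made-up values, not netDx data).
toy_pScore <- list(
  data.frame(PATHWAY = c("pathA", "pathB"), SCORE = c(10, 4)),
  data.frame(PATHWAY = c("pathA", "pathB"), SCORE = c(9, 6)),
  data.frame(PATHWAY = c("pathA", "pathC"), SCORE = c(10, 2))
)
cumulative_scores(toy_pScore)
# pathA 29, pathB 10, pathC 2
```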

Details

This function computes predictor performance for binary networks, where + and - calls are based on patient membership (or lack thereof) in feature-selected networks. For example, with networks based on CNV occurrence in cellular pathways, a + call is based on a patient's membership in feature-selected networks. The function takes the output of a feature selection exercise and computes the number and fraction of positive and negative calls at each level of feature selection stringency. Its output can then be used to compute performance measures such as the ROC curve or the precision-recall curve.
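The core tally can be sketched in a few lines of base R. This is a minimal sketch under stated assumptions, not the netDx implementation:

- tally_calls and its arguments are hypothetical.
- netmat is taken to be a binary patient-by-network matrix, and scores a named vector of cumulative network scores.
- A patient is called + when it belongs to at least one network whose score meets the cutoff.

The sketch also omits the TT_STATUS, per-resampling and label-enrichment handling that the arguments of RR_featureTally imply.

```r
# Hypothetical sketch of tallying +/- calls per score cutoff.
#   netmat: binary patient x network matrix (rownames = patient IDs)
#   status: named character vector, patient ID -> class label
#   scores: named numeric vector, network name -> cumulative score
tally_calls <- function(netmat, status, scores, predClass, maxScore) {
  isPred <- status[rownames(netmat)] == predClass
  rows <- lapply(seq_len(maxScore), function(cutoff) {
    # Networks whose cumulative score meets this cutoff.
    sel <- intersect(names(scores)[scores >= cutoff], colnames(netmat))
    # Assumed rule: a patient is "+" if it belongs to any selected network.
    member <- if (length(sel) > 0) {
      rowSums(netmat[, sel, drop = FALSE]) > 0
    } else {
      rep(FALSE, nrow(netmat))
    }
    pred_ol <- sum(member & isPred)
    other_ol <- sum(member & !isPred)
    data.frame(score = cutoff, numPathways = length(sel),
               pred_tot = sum(isPred), pred_ol = pred_ol,
               pred_pct = 100 * pred_ol / sum(isPred),
               other_tot = sum(!isPred), other_ol = other_ol,
               other_pct = 100 * other_ol / sum(!isPred))
  })
  res <- do.call(rbind, rows)
  # Ratio of percentages; Inf (or NaN) when other_pct is 0.
  res$rr <- res$pred_pct / res$other_pct
  res
}

# Toy data (made up): two "case" and two "control" patients, three networks.
netmat <- matrix(c(1, 0, 1,
                   0, 1, 0,
                   0, 0, 1,
                   0, 0, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("P1", "P2", "P3", "P4"),
                                 c("pathA", "pathB", "pathC")))
status <- c(P1 = "case", P2 = "case", P3 = "control", P4 = "control")
scores <- c(pathA = 3, pathB = 2, pathC = 1)
tally_calls(netmat, status, scores, predClass = "case", maxScore = 3)
```

With this toy data, raising the cutoff from 1 to 3 leaves fewer selected networks. pred_pct falls from 100 to 50 and other_pct falls from 50 to 0. rr is Inf whenever other_pct is 0.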

Value

(list) with four elements:

1) cumulativeFeatScores: pathway name and cumulative score over N-way data resampling.

2) performance_denAllNets: positive and negative calls at each score cutoff, with columns:
  score: network score cutoff
  numPathways: number of networks at the cutoff
  pred_tot: total +, ground truth
  pred_ol: + calls
  pred_pct: + calls as percent of total
  other_tot: total -, ground truth
  other_ol: - calls
  other_pct: - calls as percent of total
  rr: ratio of pred_pct and other_pct
  pred_pct_min: minimum pred_pct across all resamplings
  pred_pct_max: maximum pred_pct across all resamplings
  other_pct_min: minimum other_pct across all resamplings
  other_pct_max: maximum other_pct across all resamplings

3) performance_denEnrichedNets: positive and negative calls at each cutoff when the label enrichment option is used; same format as performance_denAllNets. However, the denominator here is limited to patients present in networks that pass label enrichment.

4) resamplingPerformance: breakdown of performance for each resampling at each cutoff. This is a list of length 2, one element for allNets and one for enrichedNets. Each element is a matrix with (resamp * 7) columns and S rows, one row per score. For each resampling, the columns contain:
  1) pred_total: total number of patients of predClass
  2) pred_OL: number of pred_total with a CNV in the selected network
  3) pred_OL_pct: 2) divided by 1) (percent)
  4) other_total: total number of patients of the other class (non-predClass)
  5) other_OL: number of other_total with a CNV in the selected network
  6) other_OL_pct: 5) divided by 4) (percent)
  7) relEnr: 6) divided by 3)
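A performance table of this shape can be turned into ROC points. The sketch below rests on several assumptions:

- Following the resamplingPerformance definitions (other_OL counts non-predClass patients with a CNV in the selected net), pred_pct/100 is read as a true-positive rate and other_pct/100 as a false-positive rate.
- A stricter cutoff selects a subset of the networks, so both rates can only fall as the cutoff rises. Sorting the points by false-positive rate therefore traces a monotone curve.
- roc_from_tally is a hypothetical helper, and the toy table is made up, not real RR_featureTally output.

```r
# Hypothetical helper: ROC points and trapezoidal AUC from a table with
# pred_pct and other_pct columns (both percentages). Assumes pred_pct
# acts as a true-positive rate and other_pct as a false-positive rate.
roc_from_tally <- function(perf) {
  # Add the trivial (0,0) and (1,1) end points.
  fpr <- c(0, perf$other_pct / 100, 1)
  tpr <- c(0, perf$pred_pct / 100, 1)
  ord <- order(fpr, tpr)
  fpr <- fpr[ord]
  tpr <- tpr[ord]
  # Trapezoidal rule over consecutive points.
  auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
  list(points = data.frame(fpr = fpr, tpr = tpr), auc = auc)
}

# Toy performance table (made-up values).
perf <- data.frame(score = 1:3,
                   pred_pct = c(100, 75, 50),
                   other_pct = c(50, 25, 0))
roc <- roc_from_tally(perf)
roc$auc
# 0.875
```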

Examples

library(netDx)

data(cnv_patientNetCount) # patient presence/absence in nets
data(cnv_pheno)           # patient ID, label
data(cnv_netScores)       # network scores for resampling
data(cnv_TTstatus)        # train/test status
data(cnv_netPass)         # nets passing label enrichment

d <- tempdir()
out <- RR_featureTally(cnv_patientNetCount,
    cnv_pheno, cnv_TTstatus, "case", cnv_netScores,
    outDir = d, enrichLabels = TRUE, enrichedNets = cnv_netPass,
    maxScore = 30L)
print(summary(out))

netDx documentation built on Dec. 11, 2020, 2:01 a.m.