RUMIcurve: Information accretion based predictor assessment (across many...


View source: R/findRuMi.R

Description

Reads a tab-delimited file of true annotations for a set of sequences and one or more tab-delimited files of predicted annotations, with their scores, for the same sequences. It then calculates and outputs the average remaining uncertainty (RU), misinformation (MI), and semantic similarity (SS) at a series of score thresholds set by the user-specified increment.
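
For readers new to these measures, the following base-R sketch shows the per-sequence quantities in the information-accretion framework: remaining uncertainty is the total IA of true terms that were not predicted, and misinformation is the total IA of predicted terms that are not true. It assumes both term sets have already been propagated to their GO ancestors and that IA values are supplied as a named numeric vector. The function name ru_mi, the placeholder IDs "GO:A" to "GO:D", and the IA values are hypothetical, and this is not SemDist's internal code.

```r
# Minimal sketch (not SemDist's internal code): per-sequence RU and MI.
# Assumes term sets are already propagated to GO ancestors and `ia` is a
# named numeric vector of IA values indexed by term ID (hypothetical values).
ru_mi <- function(true_terms, pred_terms, ia) {
  missed   <- setdiff(true_terms, pred_terms)  # true terms that were not predicted
  spurious <- setdiff(pred_terms, true_terms)  # predicted terms that are not true
  c(RU = sum(ia[missed]), MI = sum(ia[spurious]))
}

# Toy data: placeholder term IDs and IA values
ia    <- c("GO:A" = 0.5, "GO:B" = 1.2, "GO:C" = 2.0, "GO:D" = 0.8)
truth <- list(seq1 = c("GO:A", "GO:B"), seq2 = c("GO:A", "GO:C"))
preds <- list(seq1 = c("GO:A", "GO:D"), seq2 = c("GO:A", "GO:C"))

# One row per sequence, then the averages over all sequences
per_seq <- t(mapply(ru_mi, truth, preds, MoreArgs = list(ia = ia)))
colMeans(per_seq)
```

RUMIcurve reports such averages once per threshold, where the predicted set for each sequence contains only the predictions scoring at or above that threshold.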

Usage

RUMIcurve(ont, organism, increment = 0.05, truefile, predfiles, 
          IAccr = NULL, add.weighted = FALSE, 
          add.prec.rec = FALSE)

Arguments

ont

A character string naming the ontology to use. One of "CC", "MF", or "BP", corresponding to Cellular Component, Molecular Function, and Biological Process.

organism

A character vector indicating which organism's (or organisms') annotation data to use, e.g. "human".

increment

A numeric value between 0 and 1 giving the spacing between consecutive score thresholds. The iteration starts from a threshold of 1 and steps downward, so an increment of 0.08 results in the thresholds 0.92, 0.84, 0.76, ... being used.

truefile

A character string giving the path of the file from which to read the true annotations for the given sequences. The file should be tab-delimited, with the first column containing the sequence IDs and the second containing GO accessions.

predfiles

A character vector giving the paths of the files to read in as predicted annotations. Each file should be tab-delimited, with the first column containing sequence IDs, the second containing GO accessions, and the third containing the predictor's score (between 0 and 1) for that prediction.

IAccr

An optional named numeric vector of information accretion (IA) values, one per GO term, covering all GO terms being used. If supplied, these values are used in the calculations instead of IA values derived from the R annotation packages.

add.weighted

A logical value indicating whether to add information-content-weighted versions of RU, MI, and SS to the output.

add.prec.rec

A logical value indicating whether to calculate precision, recall, and specificity for the predictions at each threshold and add them to the output.
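
To check the input layouts described for truefile and predfiles, the following base-R sketch writes two tiny files in those layouts and reads one back. The sequence names, scores, and IA values are hypothetical; writing the files without a header row or quoting is an assumption, since the column order is documented here but the presence of a header is not. The last line only illustrates the shape of a named numeric vector such as IAccr expects; a real one must cover all GO terms being used.

```r
# Hypothetical toy annotations in the documented layouts:
# truefile  -> sequence ID, GO accession
# predfiles -> sequence ID, GO accession, score in [0, 1]
true_df <- data.frame(seq = c("seq1", "seq1", "seq2"),
                      go  = c("GO:0003674", "GO:0005488", "GO:0003674"))
pred_df <- data.frame(seq   = c("seq1", "seq1", "seq2"),
                      go    = c("GO:0003674", "GO:0003824", "GO:0003674"),
                      score = c(0.91, 0.40, 0.75))

truefile <- tempfile(fileext = ".txt")
predfile <- tempfile(fileext = ".txt")

# Assumption: tab-delimited, no header row, no quoting
write.table(true_df, truefile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
write.table(pred_df, predfile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

readLines(predfile)  # inspect the raw tab-delimited lines

# Shape of an IAccr-style vector: names are GO IDs (values hypothetical)
myIA <- c("GO:0003674" = 0, "GO:0005488" = 0.4, "GO:0003824" = 1.1)
```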

Value

Returns a named list with the same number of elements as the input predfiles. Each element is a data frame containing all of the requested values at each threshold.
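
Because the result holds one data frame per prediction file, per-predictor summaries can be computed with lapply. The sketch below builds a small stand-in for one element from the first eight thresholds printed in the example output (RU and MI rounded to four decimals) and finds the threshold that minimises the Euclidean distance sqrt(RU^2 + MI^2), a common single-number summary of an RU-MI curve. The threshold column and the list name "predictorA" are hypothetical; the example only confirms that the returned data frames contain RU and MI columns.

```r
# Stand-in for RUMIcurve()'s return value, using values from the example
# output below (rounded). The `threshold` column name is hypothetical.
res <- list(predictorA = data.frame(
  threshold = c(0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60),
  RU = c(10.0426, 9.4152, 8.8812, 8.4920, 8.2416, 7.7435, 7.2537, 6.8864),
  MI = c(1.2271, 1.8839, 2.4082, 3.2160, 3.9499, 4.8260, 5.9837, 7.4464)
))

# For each predictor, the threshold with the smallest sqrt(RU^2 + MI^2)
smin <- lapply(res, function(df) {
  d <- sqrt(df$RU^2 + df$MI^2)
  i <- which.min(d)
  data.frame(threshold = df$threshold[i], Smin = d[i])
})
do.call(rbind, smin)  # one row per predictor
```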

Author(s)

Ian Gonzalez and Wyatt Clark

See Also

computeIA, findRUMI

Examples

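# Using test data sets from SemDist, plot a RUMI curve:
library(SemDist)
truefile <- system.file("extdata", "MFO_LABELS_TEST.txt", package = "SemDist")
predfile <- system.file("extdata", "MFO_PREDS_TEST.txt", package = "SemDist")
avgRUMIvals <- RUMIcurve(ont = "MF", organism = "human", increment = 0.05,
                         truefile = truefile, predfiles = predfile)
firstset <- avgRUMIvals[[1]]  # data frame for the first (only) prediction file
plot(firstset$RU, firstset$MI,
     xlab = "Remaining uncertainty (RU)", ylab = "Misinformation (MI)")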

Example output

Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

Loading required package: GO.db

Loading required package: annotate
Loading required package: XML
Working on data for file: /usr/lib/R/site-library/SemDist/extdata/MFO_PREDS_TEST.txt

Getting true terms

Getting true IAs

Now working on threshold: 0.95

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 10.0425842861668, MI: 1.22712368033041

Now working on threshold: 0.9

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 9.41517267591757, MI: 1.88394232952515

Now working on threshold: 0.85

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 8.88119145459546, MI: 2.40819024669426

Now working on threshold: 0.8

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 8.49203531817439, MI: 3.21601035997672

Now working on threshold: 0.75

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 8.24157466528671, MI: 3.94991555376817

Now working on threshold: 0.7

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 7.74350530058713, MI: 4.82597198319498

Now working on threshold: 0.65

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 7.25367694130847, MI: 5.98367946270065

Now working on threshold: 0.6

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 6.88641834354203, MI: 7.44642060437438

Now working on threshold: 0.55

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 6.33710934362437, MI: 10.1202877672894

Now working on threshold: 0.5

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 5.66909734568201, MI: 15.0188503448115

Now working on threshold: 0.45

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 5.05564397771161, MI: 22.2381391115589

Now working on threshold: 0.4

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 4.35446728729524, MI: 36.5943643395658

Now working on threshold: 0.35

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 3.68508827269144, MI: 56.6821625238097

Now working on threshold: 0.3

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 2.95026896346243, MI: 90.598480334505

Now working on threshold: 0.25

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 2.30162606298156, MI: 123.861596712332

Now working on threshold: 0.2

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 2.12285861213662, MI: 137.051546116007

Now working on threshold: 0.15

Getting sequence predicted terms.

Getting IA values for predicted terms.

Doing the same for the intersect.

RU: 2.11234725107343, MI: 137.547222728205

Now working on threshold: 0.0999999999999999

Getting sequence predicted terms.

Now working on threshold: 0.05

Getting sequence predicted terms.
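
The threshold printed as 0.0999999999999999 in the output above is the kind of value that arises when an increment such as 0.05, which has no exact binary representation, is subtracted from 1 repeatedly; this suggests, though the documentation does not confirm it, that the thresholds are generated that way. If the thresholds need to be reproduced, for example to label plot points, computing each one as 1 minus an integer multiple of increment avoids the accumulated error. The helper name rumi_thresholds, the rounding to 10 decimals, and keeping every strictly positive threshold are assumptions; the exact stopping rule used by RUMIcurve is not documented here.

```r
# Hypothetical helper: thresholds 1 - k * increment for k = 1, 2, ...,
# computed from integer multiples instead of repeated subtraction.
rumi_thresholds <- function(increment) {
  stopifnot(is.numeric(increment), length(increment) == 1,
            increment > 0, increment < 1)
  # Small tolerance in case 1 / increment lands just below an integer
  k <- seq_len(floor(1 / increment + 1e-9))
  thr <- round(1 - k * increment, 10)  # strip floating-point noise
  thr[thr > 0]                         # assumption: keep positive thresholds only
}

rumi_thresholds(0.05)  # 19 values, from 0.95 down to 0.05
rumi_thresholds(0.08)  # begins 0.92, 0.84, 0.76, as described for `increment`
```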

SemDist documentation built on Nov. 8, 2020, 8:27 p.m.