matchValues: Pairing host gene and mature miRNA expression data


Description

These functions pair/match values by name between any two named vectors. The transferValues method maps/transfers values from one type of id (e.g. mature miRNA name) to any other id defined in the MirhostDb database. Combined, these functions make it possible to map between expression/regulation values of miRNAs and their corresponding host genes.

Usage

chooseAll(x, y)

chooseClosestForEach(x, y, abs=TRUE)

chooseOrderedValue(x, y, orderFun=function(z){order(z, decreasing=TRUE)})

doPairData(x, y, chooseFunX=chooseAll, chooseFunY=chooseAll)

doSelectData(x, chooseFunX=chooseAll)

## S4 method for signature 'numeric,MirhostDb'
transferValues(x, object,
                          xNamesAre="mat_mirna_name",
                          toNames="pre_mirna_name",
                          solveFun=chooseAll,
                          filter=list(),
                          na.rm=FALSE)

Arguments

(in alphabetical order)

abs

For chooseClosestForEach: whether the values should be ordered by the absolute value of their differences.

chooseFunX

The function applied to input vector x to select a (or all) value(s) among all values with the same name. See description below for more details.

chooseFunY

The function to be applied to input vector y.

filter

For transferValues: filters to be used in the query to retrieve the mapping between ids. To map values from miRNAs to host genes or transcripts it is advisable to use a DatabaseFilter to restrict to genes defined in one of the databases ("core", "vega" or "otherfeatures"), otherwise the values are mapped to a redundant list of genes (different ids for the same genes defined in the different databases).

na.rm

For transferValues: if na.rm=FALSE (default), the function also returns input values for which no mapping between ids can be established. Otherwise, only values whose names can be mapped to ids of the type toNames are returned.

object

For transferValues: the MirhostDb object.

orderFun

For chooseOrderedValue: the function defining how values should be ordered.

solveFun

For transferValues: the function that should be applied to resolve multi-mappings between ids. See description and examples below for more details.

toNames

Defines to which ids the values should be transferred/mapped. Allowed values are "pre_mirna_id", "pre_mirna_name", "tx_id", "mat_mirna_id", "mat_mirna_name", "gene_id", "gene_name" or "probeset_id".

x

A named numerical vector.

xNamesAre

Defines the type of ids used as names of x. See toNames for the supported types of ids.

y

A named numerical vector.

Simple utilities to select values in a numeric vector

These functions select specific, or all, values for same-named entities in a numeric vector.

chooseAll

This function returns a numeric sequence from 1 to length(x).

chooseOrderedValue

This function returns (by default) the index of the largest value in x.

chooseClosestForEach

This function returns, for each value in y, the index of the closest value in x. The closest value is defined as the value with the smallest absolute difference. The length of the returned numeric vector corresponds to the length of y.
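The behaviour described for the three choose functions can be approximated in base R. The sketch below is only an illustration under stated assumptions: the sketch* names are hypothetical stand-ins, not the package's own implementation, and the abs argument is left out.

```r
## Hypothetical base-R stand-ins for the choose functions (not package code).

## Return all indices of x.
sketchChooseAll <- function(x, y) seq_along(x)

## Return the index of the first value after ordering x
## (by default the largest value).
sketchChooseOrderedValue <- function(x, y,
                                     orderFun = function(z) order(z, decreasing = TRUE)) {
    orderFun(x)[1]
}

## For each value in y, return the index of the value in x with the
## smallest absolute difference (ties resolve to the first such index).
sketchChooseClosestForEach <- function(x, y) {
    vapply(y, function(v) which.min(abs(x - v)), integer(1))
}

sketchChooseAll(c(4, 9, 2))                       # 1 2 3
sketchChooseOrderedValue(c(4, 9, 2))              # 2
sketchChooseClosestForEach(c(1, 4), c(5, 0, 3))   # 2 1 2
```

All three return indices into x rather than values, which is what allows them to be swapped for one another.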

doSelectData

Selects values for same named entities in a submitted numeric vector. The function uses one of the choose functions defined above to select the value and returns a data.frame containing the selected values in column "x", their index in the original input argument x in column "x.idx" and their corresponding name in column "name".
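The selection step can be pictured as "group by name, then apply a choose function within each group". The following base-R sketch assumes exactly that; sketchSelectData and its chooseFun argument are hypothetical names used for illustration, and the row order of the real function may differ.

```r
## Hypothetical sketch of selecting values per name (not package code).
sketchSelectData <- function(x, chooseFun = function(v) seq_along(v)) {
    ## Indices of x grouped by name.
    idxByName <- split(seq_along(x), names(x))
    rows <- lapply(names(idxByName), function(nm) {
        idx <- idxByName[[nm]]
        ## chooseFun returns positions within the group; map them back
        ## to positions in the original vector.
        keep <- idx[chooseFun(x[idx])]
        data.frame(x = unname(x[keep]), x.idx = keep, name = nm,
                   stringsAsFactors = FALSE)
    })
    do.call(rbind, rows)
}

vals <- c(a = 1, c = 2, a = 3, b = 4, c = 5)
## Keep everything.
sketchSelectData(vals)
## Keep only the largest value per name.
largest <- sketchSelectData(vals,
                            chooseFun = function(v) order(v, decreasing = TRUE)[1])
largest
```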

Pairing data

doPairData

This function takes two named numeric vectors and matches/pairs values between them based on their names. The lengths of the two vectors do not have to match; names present in only one of the two vectors are added to the result. The aim of the function is to pair entries in input vector x with values in input vector y based on their names, i.e. values with the same name are matched with each other and returned in the same row of the result table. If x and/or y contain several entries with the same name, values are repeated such that each value in x is paired with each value in y with the same name (i.e. if x contains 2 values with the name "a" and y contains 3 with that name, the function pairs each of the 2 values in x with each of the 3 values in y, resulting in 6 rows in the result table for name "a"). This default behaviour can be changed by specifying a function other than chooseAll for argument chooseFunX or chooseFunY. Note that chooseFunX is applied first, before chooseFunY, so the results may differ when arguments x and y are swapped. See the examples and the note below for more information.

The function returns a data.frame with columns "name" (the common name for the values in x and y), "x" (the values from input vector x), "x.idx" (an integer value representing the original index of the value in the input vector x), "y" (the values from the input vector y, paired to x) and "y.idx" (the index of the value in the input vector y).
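The default all-against-all pairing resembles a full outer join on the name column. The base-R sketch below uses merge to illustrate this; it is an analogy under that assumption, not the package's implementation, and the toy vectors are made up for the illustration.

```r
## Toy input: "a" occurs twice in x and three times in y;
## "e" is only in x, "n" is only in y.
x <- c(a = 1, a = 2, e = 7)
y <- c(a = 10, a = 20, a = 30, n = 4)

dx <- data.frame(name = names(x), x = unname(x), x.idx = seq_along(x),
                 stringsAsFactors = FALSE)
dy <- data.frame(name = names(y), y = unname(y), y.idx = seq_along(y),
                 stringsAsFactors = FALSE)

## Full outer join on "name": every x value is paired with every
## same-named y value; unmatched names are kept with NA.
paired <- merge(dx, dy, by = "name", all = TRUE)
paired
```

Name "a" yields 2 x 3 = 6 rows, while "e" and "n" each yield a single row with NA in the columns of the vector that lacks that name.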

transferValues

This function takes a named numeric vector as input and maps the names to the new type of ids specified by toNames. Only mappings between ids defined in the MirhostDb are supported (i.e. "pre_mirna_id", "pre_mirna_name", "tx_id", "mat_mirna_id", "mat_mirna_name", "gene_id", "gene_name" or "probeset_id").

Multi-mappings between ids (e.g. "mat_mirna_name" with "pre_mirna_name", as each pre-miRNA encodes two mature miRNAs) can be resolved with solveFun. The same functions as for chooseFunX of doPairData are supported, i.e. chooseAll (selects all ids, possibly repeating the value) or chooseOrderedValue (selects the value of the first id as ordered by the sorting function). See the examples below.

The method returns a data.frame with 3 columns: the first ("x") with the numeric values from the input argument x, the second (named according to argument toNames) with the ids to which the values have been mapped, and the third (named according to argument xNamesAre) with the original names in the input argument x.
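How a multi-mapping is resolved can be illustrated without a database. The sketch below uses a made-up mapping table with placeholder ids (matA, pre1, ...) instead of a MirhostDb query; the two strategies only mimic what chooseAll and chooseOrderedValue are described to do.

```r
## Toy values and a made-up id mapping (placeholder names, not real ids).
## matA and matB map to the same target; matC maps to two targets.
exprs <- c(matA = 8.0, matB = 8.3, matC = 5.6)
mapping <- data.frame(mat = c("matA", "matB", "matC", "matC"),
                      pre = c("pre1", "pre1", "pre2", "pre3"),
                      stringsAsFactors = FALSE)

## chooseAll-like: keep every mapping, repeating values where needed.
allMapped <- data.frame(x = unname(exprs[mapping$mat]),
                        pre = mapping$pre,
                        mat = mapping$mat,
                        stringsAsFactors = FALSE)

## chooseOrderedValue-like: per target id keep only the largest value.
best <- do.call(rbind, lapply(split(allMapped, allMapped$pre),
                              function(d) d[which.max(d$x), ]))
allMapped
best
```

With the first strategy pre1 appears twice (once per source id); with the second, each target id carries exactly one value.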

Note

The chooseFunX and chooseFunY arguments can be used to modify the way values are selected for same-named entries within the input vector x or y. The chooseAll function simply selects all values. chooseOrderedValue chooses the first of all values with the same name, ordered according to the specified scheme (selecting by default the largest value). chooseClosestForEach chooses, for each value in its input argument y, the value in its input argument x which is closest (most similar) to it.

Any function returning indices can be defined and used with arguments chooseFunX and chooseFunY.
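As an illustration of a user-defined choose function, the sketch below selects the smallest value. The name chooseSmallestValue is hypothetical, and the (x, y) signature is an assumption inferred from the Usage section; how the pairing functions call a custom function internally has not been verified here.

```r
## Hypothetical custom choose function: index of the smallest value in x.
## The second argument is accepted, but ignored, to mirror the (x, y)
## signature of the provided choose functions (an assumption).
chooseSmallestValue <- function(x, y) which.min(x)

chooseSmallestValue(c(4, 9, 2))   # 3
```

Under that assumption, such a function could be supplied as chooseFunX or chooseFunY in the same way as chooseOrderedValue.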

Possible scenarios for pairing of same-name values:

Pair the largest values in each vector

Specify chooseOrderedValue for both chooseFunX and chooseFunY.

Get the largest value in x and pair that with the most similar in y

Specify chooseOrderedValue for chooseFunX and chooseClosestForEach for chooseFunY.

Author(s)

Johannes Rainer

See Also

MirhostDb, listColumns, hostgenes, hosttx

Examples

########################
##
##  Simple value pairing between two numeric vectors.
##
## A and B are named numeric vectors, names corresponding e.g. to
## gene names and values to their expression. Each name can occur
## several times.
##
A <- 1:10
names(A) <- c("b", "a", "c", "b", "a", "a", "e", "f", "a", "c")

B <- c(2, 5, 3, 2, 5, 16, 4, 20)
names(B) <- c("a", "b", "a", "c", "c", "c", "n", "f")

## In the most generic case we want to pair each value in A with each
## value in B with the same name. Each value in A will be repeated as
## many times as there are values with the same name in B and vice
## versa.
doPairData(A, B)

## Next we want to pair all values in A to the value in B with the
## largest value (for the same name). This returns each value in A
## only once.
doPairData(A, B, chooseFunY=chooseOrderedValue)

## Next we select the largest value for all same-named entries for
## both input vectors.
doPairData(A, B, chooseFunX=chooseOrderedValue,
           chooseFunY=chooseOrderedValue)

## Next we first select the largest among all values with the same name
## in x and select from y the value that is closest to that (most similar)
doPairData(A, B, chooseFunX=chooseOrderedValue,
           chooseFunY=chooseClosestForEach)

## Next we reverse the order, i.e. first choose for each value in B the closest
## value in A and then select the largest value in B.
doPairData(A, B, chooseFunX=chooseClosestForEach,
           chooseFunY=chooseOrderedValue)
## As we see, the result is different.

## Finally we select all pairs of most similar values.
doPairData(A, B, chooseFunX=chooseClosestForEach,
              chooseFunY=chooseClosestForEach)
## The function does the following:
## for name b: A x(1, 4), B y(5):
##             choose x: (4) (closest to 5)
##             choose y: (5) (closest to 4)
## for name a: A x(2, 5, 6, 9), B y(2, 3)
##             choose x: (2, 2)
##             choose y: (2, 2)
## for name c: A x(3, 10), B y(2, 5, 16)
##             choose x: (3, 3, 10)
##             choose y: (2, 2, 5)
## for name f: A x(8), B y(20)
##             choose x: (8)
##             choose y: (20)
## for name e: A x(7), B y(NA)
## for name n: A x(NA), B y(4)
## Finally, unique is called on the resulting data.frame, which removes the
## duplicated rows for a and c.

## another example:
C <- c(3, 8, 7, 1)
names(C) <- rep("a", length(C))
D <- c(9, 2, 4, 20, 3)
names(D) <- rep("a", length(D))
doPairData(C, D, chooseFunX=chooseClosestForEach,
           chooseFunY=chooseClosestForEach)


##################
##
##  simple examples for the "chooseFuns"
##
chooseAll(1:5)

chooseOrderedValue(1:5)

chooseClosestForEach(1:10, 5:1)

someVals <- 1:10
names(someVals) <- c("a", "c", "a", "b", "c", "a", "a", "b", "f", "d")
## just return all of them.
doSelectData(someVals)
## select for each the largest value
doSelectData(someVals, chooseFunX=chooseOrderedValue)



##################
##
##  mapping/transferring values
##
## Map mature miRNA expression values to pre-miRNA ids.
## Each pair of mature miRNAs is encoded in a single pre-miRNA. In addition,
## miR-16-5p is encoded in two different pre-miRNAs.
library(MirhostDb.Hsapiens.v75.v20)
mhg <- MirhostDb.Hsapiens.v75.v20
miRNA.exprs <- c(8, 8.3, 5.6, 9.5, 4.6, 13.1)
names(miRNA.exprs) <- c("hsa-miR-15b-3p", "hsa-miR-15b-5p", "hsa-miR-16-5p",
                        "hsa-miR-16-1-3p", "hsa-miR-223-3p", "hsa-miR-223-5p")

transferValues(miRNA.exprs, mhg, xNamesAre="mat_mirna_name",
               toNames="pre_mirna_name")
## we can also use chooseOrderedValue as solve function to select for each
## pre-miRNA only the mature miRNA with the highest expression. That way we get
## a unique value for each pre-miRNA.
transferValues(miRNA.exprs, mhg, xNamesAre="mat_mirna_name",
               toNames="pre_mirna_name", solveFun=chooseOrderedValue)

## A "standard" filter that might be useful to map miRNA data to genes.
df <- list(DatabaseFilter("core"),
           GeneBiotypeFilter("miRNA", condition="!="),
           ArrayFilter("HG-U133_Plus_2"))

jotsetung/mirhostgenes documentation built on May 19, 2019, 9:42 p.m.