Contents: Description; Usage; Arguments; Simple utilities to select values in a numeric vector; Pairing data; Note; Author(s); See Also; Examples
These functions pair/match values by name between any two named vectors. The transferValues method maps/transfers values from one type of id (e.g. mature miRNA name) to any other id defined in the MirhostDb database. Combined, these functions make it possible to map between expression/regulation values of miRNAs and their corresponding host genes.
chooseAll(x, y)
chooseClosestForEach(x, y, abs=TRUE)
chooseOrderedValue(x, y, orderFun=function(z){order(z, decreasing=TRUE)})
doPairData(x, y, chooseFunX=chooseAll, chooseFunY=chooseAll)
doSelectData(x, chooseFunX=chooseAll)
## S4 method for signature 'numeric,MirhostDb'
transferValues(x, object,
               xNamesAre="mat_mirna_name",
               toNames="pre_mirna_name",
               solveFun=chooseAll,
               filter=list(),
               na.rm=FALSE)
(in alphabetical order)
abs |
For |
chooseFunX |
The function applied to input vector x. |
chooseFunY |
The function applied to input vector y. |
filter |
For |
na.rm |
For |
object |
For |
orderFun |
For |
solveFun |
For |
toNames |
Defines to which ids the values should be transferred/mapped. Allowed values
are |
x |
A named numerical vector. |
xNamesAre |
Defines which type of ids the names of |
y |
A named numerical vector. |
These functions select specific, or all, values for same-named entities in a numeric vector.
chooseAll: This function returns a numeric sequence from 1 to length(x).
chooseOrderedValue: This function returns (by default) the index of the largest value in x.
chooseClosestForEach: This function returns, for each value in y, the index of the closest value in x. The closest value is defined as the value with the smallest absolute difference. The length of the returned numeric vector corresponds to the length of y.
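The index-returning behaviour described above can be sketched in base R. The three `sketch*` functions below are hypothetical stand-ins written for illustration; they are not the package's implementations, and the sketch ignores the `abs` argument of chooseClosestForEach.

```r
# Hypothetical base-R stand-ins for the described behaviour (not package code).

# All indices: a sequence from 1 to length(x).
sketchChooseAll <- function(x, y = NULL) {
    seq_along(x)
}

# Index of the first value after ordering; by default the largest value.
sketchChooseOrdered <- function(x, y = NULL,
                                orderFun = function(z) order(z, decreasing = TRUE)) {
    orderFun(x)[1]
}

# For each value in y, the index of the value in x with the smallest
# absolute difference. The result has the same length as y.
sketchChooseClosest <- function(x, y) {
    vapply(y, function(v) which.min(abs(x - v)), integer(1))
}

vals <- c(3, 10, 7)
sketchChooseAll(vals)                   # indices 1, 2, 3
sketchChooseOrdered(vals)               # index 2 (value 10)
sketchChooseClosest(vals, c(2, 8, 11))  # indices 1, 3, 2
```

All three return positions in x rather than values, which is what lets a caller report both the selected value and its original index.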
doSelectData: Selects values for same-named entities in the submitted numeric vector. The function uses one of the choose functions defined above to select the values and returns a data.frame containing the selected values in column "x", their index in the original input argument x in column "x.idx" and their corresponding name in column "name".
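A minimal base-R sketch of this kind of per-name selection is shown below. `sketchSelect` is a hypothetical helper, not the package function; the row order (alphabetical by name) is an artifact of `split` and may differ from what doSelectData returns.

```r
# Hypothetical sketch: apply an index-returning function to the values
# of each name and report value, original index and name.
sketchSelect <- function(x, chooseFun = function(v) seq_along(v)) {
    # Positions in x, grouped by name.
    idxByName <- split(seq_along(x), names(x))
    # chooseFun returns positions within a group; map them back to x.
    picked <- unlist(lapply(idxByName, function(idx) idx[chooseFun(x[idx])]),
                     use.names = FALSE)
    data.frame(x = unname(x[picked]),
               x.idx = picked,
               name = names(x)[picked],
               stringsAsFactors = FALSE)
}

someVals <- c(a = 1, c = 2, a = 3, b = 4)
sketchSelect(someVals)                                        # all four values
sketchSelect(someVals, chooseFun = function(v) which.max(v))  # largest per name
```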
doPairData: This function takes two named numeric vectors and matches/pairs values between them based on their names. The lengths of the two vectors do not have to match; names present in only one of the two vectors are added to the result.
The aim of the function is to pair entries in input vector x with values in input vector y based on their names, i.e. values with the same name are matched with each other and returned in the same row of the result table. If x and/or y contain several entries with the same name, values are repeated such that each value in x is paired with each value in y with the same name (i.e. if x contains 2 values with the name "a", and y 3 with that name, the function pairs each of the 2 values in x with each of the 3 values in y, resulting in 6 rows in the result table for name "a"). This default behaviour can be changed by specifying a function other than chooseAll for argument chooseFunX or chooseFunY. Note that chooseFunX is applied first, before chooseFunY, so the result can differ when arguments x and y are swapped. See examples and note below for more information.
The function returns a data.frame with columns "name" (the common name for the values in x and y), "x" (the values from input vector x), "x.idx" (an integer value representing the original index of the value in the input vector x), "y" (the values from the input vector y, paired to x) and "y.idx" (the index of the value in the input vector y).
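The default all-against-all pairing by name can be illustrated with a full outer join in base R. This is only a sketch of the described result shape using `merge`; `toTable` is a hypothetical helper, and the package may build its result differently.

```r
# Hypothetical helper: turn a named vector into a table with the value,
# its original index and its name.
toTable <- function(v, prefix) {
    out <- data.frame(name = names(v), val = unname(v), idx = seq_along(v),
                      stringsAsFactors = FALSE)
    names(out)[2:3] <- c(prefix, paste0(prefix, ".idx"))
    out
}

x <- c(a = 1, a = 2, b = 3)
y <- c(a = 10, a = 20, a = 30, n = 40)

# all = TRUE keeps names that occur in only one vector (filled with NA).
paired <- merge(toTable(x, "x"), toTable(y, "y"), by = "name", all = TRUE)
paired
# Name "a" yields 2 x 3 = 6 rows; "b" and "n" yield one row each.
```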
transferValues: This method takes a named numeric vector as input and maps the names to the new type of ids specified by toNames. Only mappings between ids defined in the MirhostDb are supported (i.e. "pre_mirna_id", "pre_mirna_name", "tx_id", "mat_mirna_id", "mat_mirna_name", "gene_id", "gene_name" or "probeset_id").
Multi-mappings between ids (e.g. "mat_mirna_name" with "pre_mirna_name", as each pre-miRNA encodes two mature miRNAs) can be resolved with solveFun. The same functions as for chooseFunX of doPairData are supported, i.e. chooseAll (selects all ids, repeating the value where necessary) or chooseOrderedValue (selects the value of the first id ordered by the sorting function). See below for examples.
The method returns a data.frame with 3 columns: the first ("x") with the numeric values from the input argument x, the second (named according to argument toNames) with the ids to which the values have been mapped and the third (named according to argument xNamesAre) with the original names in the input argument x.
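The effect of a multi-mapping and of resolving it can be sketched with a toy mapping table in base R. Everything here is illustrative: the ids ("matA-5p", "preA", ...) are made up, the mapping table stands in for the database lookup, and the resolution step only mimics what a solveFun such as chooseOrderedValue is described to do.

```r
# Toy mapping table with made-up ids (stand-in for the database lookup).
# "matB-5p" maps to two precursors; "preA" is reached by two mature ids.
mapping <- data.frame(
    mat_mirna_name = c("matA-5p", "matA-3p", "matB-5p", "matB-5p"),
    pre_mirna_name = c("preA", "preA", "preB-1", "preB-2"),
    stringsAsFactors = FALSE)

exprs <- c("matA-5p" = 8.3, "matA-3p" = 8.0, "matB-5p" = 5.6)
vals <- data.frame(x = unname(exprs), mat_mirna_name = names(exprs),
                   stringsAsFactors = FALSE)

# Keep all mappings: values are repeated for ids that map more than once.
mapped <- merge(vals, mapping, by = "mat_mirna_name")
mapped <- mapped[, c("x", "pre_mirna_name", "mat_mirna_name")]
mapped

# Resolve multi-mappings: keep only the largest value per target id.
ordered <- mapped[order(mapped$x, decreasing = TRUE), ]
resolved <- ordered[!duplicated(ordered$pre_mirna_name), ]
resolved
```

The column order of `mapped` follows the three-column layout described above (values, target ids, original names).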
The chooseFunX and chooseFunY arguments can be used to modify the way values are selected for same-named entries within the input vector x or y. The chooseAll function simply selects all values. chooseOrderedValue chooses the first of all values with the same name, ordered according to the specified scheme (selecting by default the largest value). chooseClosestBetween chooses the one value in its input argument x which is closest (most similar) to any of the values in its input argument y.
Any function returning indices can be defined and used with arguments chooseFunX and chooseFunY.
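As a rough illustration of a user-defined choose function, the sketch below returns the index of the smallest value. `chooseSmallest` is a hypothetical name, and the `(x, y)` argument list is an assumption based on the signatures shown in the Usage section; how doPairData actually calls the function is not documented here.

```r
# Hypothetical custom choose function: index of the smallest value in x.
# The y argument is accepted only to match the assumed (x, y) signature.
chooseSmallest <- function(x, y = NULL) {
    which.min(x)
}

chooseSmallest(c(5, 2, 9))  # index 2
```

If that assumption holds, such a function could be supplied as chooseFunX or chooseFunY in the same way as chooseOrderedValue.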
Possible scenarios for pairing of same-name values:
Select the largest value in both x and y: specify chooseOrderedValue for both chooseFunX and chooseFunY.
Select the largest value in x and pair that with the most similar in y: specify chooseOrderedValue for chooseFunX and chooseClosestForEach for chooseFunY.
Johannes Rainer
MirhostDb, listColumns, hostgenes, hosttx
########################
##
## Simple value pairing between two numeric vectors.
##
## A and B are named numeric vectors, names corresponding e.g. to
## gene names and values to their expression. Each name can occur
## several times.
##
A <- 1:10
names(A) <- c("b", "a", "c", "b", "a", "a", "e", "f", "a", "c")
B <- c(2, 5, 3, 2, 5, 16, 4, 20)
names(B) <- c("a", "b", "a", "c", "c", "c", "n", "f")
## In the most generic case we want to pair each value in A with each
## value in B with the same name. Each value in A will be repeated as
## many times as there are values with the same name in B and vice
## versa.
doPairData(A, B)
## Next we want to pair all values in A to the value in B with the
## largest value (for the same name). This returns each value in A
## only once.
doPairData(A, B, chooseFunY=chooseOrderedValue)
## Next we select the largest value for all same-named entries for
## both input vectors.
doPairData(A, B, chooseFunX=chooseOrderedValue,
           chooseFunY=chooseOrderedValue)
## Next we first select the largest among all values with the same name
## in x and select from y the value that is closest to that (most similar).
doPairData(A, B, chooseFunX=chooseOrderedValue,
           chooseFunY=chooseClosestForEach)
## Next we reverse the order, i.e. choose first for each value in B the closest
## value in A and next select the largest value in B.
doPairData(A, B, chooseFunX=chooseClosestForEach,
           chooseFunY=chooseOrderedValue)
## As we see, the result is different.
## Finally we select all pairs of most similar values.
doPairData(A, B, chooseFunX=chooseClosestForEach,
           chooseFunY=chooseClosestForEach)
## The function does the following:
## for name b: A x(1, 4), B y(5):
## choose x: (4) (closest to 5)
## choose y: (5) (closest to 4)
## for name a: A x(2, 5, 6, 9), B y(2, 3)
## choose x: (2, 2)
## choose y: (2, 2)
## for name c: A x(3, 10), B y(2, 5, 8)
## choose x: (3, 3, 10)
## choose y: (2, 2, 8)
## for name f: A x(8), B y(20)
## choose x: (8)
## choose y: (20)
## for name e: A x(7), B y(NA)
## for name n: A x(NA), B y(4)
## Finally, unique is called on the resulting data.frame, which reduces the
## rows for a and c.
## another example:
C <- c(3, 8, 7, 1)
names(C) <- rep("a", length(C))
D <- c(9, 2, 4, 20, 3)
names(D) <- rep("a", length(D))
doPairData(C, D, chooseFunX=chooseClosestForEach,
           chooseFunY=chooseClosestForEach)
##################
##
## simple examples for the "chooseFuns"
##
chooseAll(1:5)
chooseOrderedValue(1:5)
chooseClosestForEach(1:10, 5:1)
someVals <- 1:10
names(someVals) <- c("a", "c", "a", "b", "c", "a", "a", "b", "f", "d")
## just return all of them.
doSelectData(someVals)
## select for each the largest value
doSelectData(someVals, chooseFunX=chooseOrderedValue)
##################
##
## mapping/transferring values
##
## Map mature miRNA expression values to pre-miRNA ids.
## Each pair of mature miRNAs is encoded in a single pre-miRNA. In addition,
## miR-16-5p is encoded in two different pre-miRNAs.
library(MirhostDb.Hsapiens.v75.v20)
mhg <- MirhostDb.Hsapiens.v75.v20
miRNA.exprs <- c(8, 8.3, 5.6, 9.5, 4.6, 13.1)
names(miRNA.exprs) <- c("hsa-miR-15b-3p", "hsa-miR-15b-5p", "hsa-miR-16-5p",
                        "hsa-miR-16-1-3p", "hsa-miR-223-3p", "hsa-miR-223-5p")
transferValues(miRNA.exprs, mhg, xNamesAre="mat_mirna_name",
               toNames="pre_mirna_name")
## We can also use chooseOrderedValue as solve function to select for each
## pre-miRNA only the mature miRNA with the highest expression. That way we get
## a unique value for each pre-miRNA.
transferValues(miRNA.exprs, mhg, xNamesAre="mat_mirna_name",
               toNames="pre_mirna_name", solveFun=chooseOrderedValue)
## A "standard" filter that might be useful to map miRNA data to genes.
df <- list(DatabaseFilter("core"),
           GeneBiotypeFilter("miRNA", condition="!="),
           ArrayFilter("HG-U133_Plus_2"))