knitr::opts_knit$set(root.dir = ".")
knitr::opts_chunk$set(collapse = TRUE, warning = TRUE, comment = "#>")
This vignette explains how to work with sets using this package. The package provides a class to store the information efficiently and functions to work with it.
To create a TidySet object, which stores associations between elements and sets, imagine we have several genes associated with a characteristic.
library("BaseSet")
gene_lists <- list(
  geneset1 = c("A", "B"),
  geneset2 = c("B", "C", "D")
)
tidy_set <- tidySet(gene_lists)
tidy_set
This is then stored internally in three slots: relations(), elements(), and sets().
If you have more information for each element or set it can be added:
gene_data <- data.frame(
  stat1 = c(1, 2, 3, 4),
  info1 = c("a", "b", "c", "d")
)
tidy_set <- add_column(tidy_set, "elements", gene_data)
set_data <- data.frame(
  Group = c(100, 200),
  Column = c("abc", "def")
)
tidy_set <- add_column(tidy_set, "sets", set_data)
tidy_set
This data is stored in one of the three slots, which can be directly accessed using their getter methods:
relations(tidy_set)
elements(tidy_set)
sets(tidy_set)
You can add as much information as you want; the only restriction concerns the "fuzzy" column of relations(). See the Fuzzy sets vignette: vignette("Fuzzy sets", "BaseSet").
You can also use the standard R approach with [:
gene_data <- data.frame(
  stat2 = c(4, 4, 3, 5),
  info2 = c("a", "b", "c", "d")
)
tidy_set$info1 <- NULL
tidy_set[, "elements", c("stat2", "info2")] <- gene_data
tidy_set[, "sets", "Group"] <- c("low", "high")
tidy_set
Observe that one can add, replace or delete columns this way.
As you can see it is possible to create a TidySet from a list. More commonly you can create it from a data.frame:
relations <- data.frame(
  elements = c("a", "b", "c", "d", "e", "f"),
  sets = c("A", "A", "A", "A", "A", "B"),
  fuzzy = c(1, 1, 1, 1, 1, 1)
)
TS <- tidySet(relations)
TS
It is also possible to create it from a matrix:
m <- matrix(c(0, 0, 1, 1, 1, 1, 0, 1, 0),
  ncol = 3, nrow = 3,
  dimnames = list(letters[1:3], LETTERS[1:3])
)
m
tidy_set <- tidySet(m)
tidy_set
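To make the link between the matrix and data.frame inputs concrete, the following base R sketch converts the same incidence matrix into a long table with elements, sets and fuzzy columns, like the data.frame example above. It does not call BaseSet, and the helper name matrix_to_relations is hypothetical, used here only for illustration.

```r
# Hypothetical helper (not part of BaseSet): turn an incidence matrix
# (elements in rows, sets in columns) into a long relations table.
matrix_to_relations <- function(m) {
  idx <- which(m != 0, arr.ind = TRUE) # positions of non-zero memberships
  data.frame(
    elements = rownames(m)[idx[, "row"]],
    sets = colnames(m)[idx[, "col"]],
    fuzzy = m[idx], # membership value stored in each non-zero cell
    stringsAsFactors = FALSE
  )
}

m <- matrix(c(0, 0, 1, 1, 1, 1, 0, 1, 0),
  ncol = 3, nrow = 3,
  dimnames = list(letters[1:3], LETTERS[1:3])
)
rel <- matrix_to_relations(m)
rel
```

Each non-zero cell of the matrix becomes one row of the table, so zero cells simply do not appear as relations.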
A TidySet can also be created from GeneSet and GeneSetCollection objects.
Additionally, the package has several functions to read set-related files, such as OBO files (getOBO) and GAF files (getGAF).
It is possible to extract the gene sets as a list, for use with functions such as lapply.
as.list(tidy_set)
If you need a matrix, for example to apply network methods, you can create it with incidence:
incidence(tidy_set)
Several methods are provided to work with sets. In general you can provide a new name for the set resulting from the operation; if you don't, one is generated automatically using naming(). All methods work with fuzzy and non-fuzzy sets.
You can make a union of two sets present in the same object.
BaseSet::union(tidy_set, sets = c("C", "B"), name = "D")
intersection(tidy_set, sets = c("A", "B"), name = "D", keep = TRUE)
The keep argument controls whether the other, previously existing sets are kept:
intersection(tidy_set, sets = c("A", "B"), name = "D", keep = FALSE)
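Since these operations also apply to fuzzy sets, a small base R sketch may help show what union and intersection mean when memberships are not just 0 or 1. It uses the textbook fuzzy-set definitions (maximum for union, minimum for intersection) on hypothetical membership values; it does not call BaseSet, and the function the package actually applies may depend on the arguments you pass, so check the Fuzzy sets vignette for details.

```r
# Hypothetical membership values for two fuzzy sets over the same elements
set_a <- c(a = 0.2, b = 0.9, c = 0.0)
set_b <- c(a = 0.5, b = 0.4, c = 1.0)

# Textbook definitions: element-wise maximum and minimum of the memberships
fuzzy_union <- pmax(set_a, set_b)
fuzzy_intersection <- pmin(set_a, set_b)

fuzzy_union
fuzzy_intersection
```

With memberships restricted to 0 and 1 these two definitions reduce to the usual union and intersection of ordinary sets.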
We can look for the complement of one or several sets:
complement_set(tidy_set, sets = c("A", "B"))
Observe that we haven't provided a name for the resulting set, but we can provide one if we prefer:
complement_set(tidy_set, sets = c("A", "B"), name = "F")
This is the equivalent of setdiff, but clearer:
out <- subtract(tidy_set, set_in = "A", not_in = "B", name = "A-B")
out
name_sets(out)
subtract(tidy_set, set_in = "B", not_in = "A", keep = FALSE)
Note that in the first case there isn't any element present in A that is not in B, but the new set is still stored. In the second case we focus just on the elements that are present in B but not in A.
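For comparison, here is a base R sketch of the same two subtractions with setdiff(), using plain character vectors for the sets defined by the matrix earlier in this vignette (A contains c; B contains a, b and c). This is only an analogy to show which elements each call targets; it does not reproduce how subtract() stores the result in a TidySet.

```r
# Memberships taken from the incidence matrix used in this vignette
A <- c("c")
B <- c("a", "b", "c")

a_minus_b <- setdiff(A, B) # elements in A that are not in B
b_minus_a <- setdiff(B, A) # elements in B that are not in A

a_minus_b
b_minus_a
```

The argument order of setdiff() decides the direction of the subtraction, which is what the explicit set_in and not_in arguments spell out.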
The number of unique elements and sets can be obtained using the nElements() and nSets() methods; nRelations() gives the number of relations.
nElements(tidy_set)
nSets(tidy_set)
nRelations(tidy_set)
If you wish to know all of them in a single call you can use dim(tidy_set).
This summary doesn't provide the number of relations of each set.
You can quickly obtain that with lengths(tidy_set).
The size of each set can be obtained using the set_size() method.
set_size(tidy_set)
Conversely, the number of sets associated with each gene is returned by the element_size() function.
element_size(tidy_set)
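For non-fuzzy sets, these two sizes correspond to the column and row sums of the incidence matrix. The base R sketch below shows that idea on the matrix used earlier in this vignette; it is an analogy only, it does not call BaseSet, and it does not cover how the package reports sizes for fuzzy sets.

```r
# Incidence matrix from earlier: elements in rows, sets in columns
m <- matrix(c(0, 0, 1, 1, 1, 1, 0, 1, 0),
  ncol = 3, nrow = 3,
  dimnames = list(letters[1:3], LETTERS[1:3])
)

set_sizes <- colSums(m != 0)     # number of elements in each set
element_sizes <- rowSums(m != 0) # number of sets each element belongs to

set_sizes
element_sizes
```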
The identifiers of elements and sets can be inspected and renamed using name_elements and name_sets:
name_elements(tidy_set)
name_elements(tidy_set) <- paste0("Gene", seq_len(nElements(tidy_set)))
name_elements(tidy_set)
name_sets(tidy_set)
name_sets(tidy_set) <- paste0("Geneset", seq_len(nSets(tidy_set)))
name_sets(tidy_set)
dplyr verbs
You can also use mutate(), filter(), select(), group_by() and other dplyr verbs with TidySets.
You usually need to select which of the three slots you want to affect with activate():
library("dplyr")
m_TS <- tidy_set %>%
  activate("relations") %>%
  mutate(Important = runif(nRelations(tidy_set)))
m_TS
You can use activate() to select which slot the verbs modify:
set_modified <- m_TS %>%
  activate("elements") %>%
  mutate(Pathway = if_else(elements %in% c("Gene1", "Gene2"), "pathway1", "pathway2"))
set_modified
set_modified %>%
  deactivate() %>% # To apply a filter independently of where it is
  filter(Pathway == "pathway1")
If you think you need group_by, this usually means that you need a new set. You can create one with group.
# A new group of those elements in pathway1
set_modified %>%
  deactivate() %>%
  group(name = "new", Pathway == "pathway1")
set_modified %>% group("pathway1", elements %in% c("Gene1", "Gene2"))
You can use group_by() but it won't return a TidySet.
set_modified %>% deactivate() %>% group_by(Pathway, sets) %>% count()
After grouping or mutating, we might want to move a column from one slot to another. We can do this with move_to():
elements(set_modified)
out <- move_to(set_modified, "elements", "relations", "Pathway")
relations(out)
sessionInfo()