knitr::opts_knit$set(root.dir = ".") knitr::opts_chunk$set(collapse = TRUE, warning = TRUE, comment = "#>")
This vignette explains how to work with sets using this package. The package provides a class to store the information efficiently and functions to work with it.
To create a TidySet object, to store associations between elements and sets
image we have several genes associated with a characteristic.
library("BaseSet") gene_lists <- list( geneset1 = c("A", "B"), geneset2 = c("B", "C", "D") ) tidy_set <- tidySet(gene_lists) tidy_set
This is then stored internally in three slots relations(), elements(), and sets() slots.
If you have more information for each element or set it can be added:
gene_data <- data.frame( stat1 = c( 1, 2, 3, 4 ), info1 = c("a", "b", "c", "d") ) tidy_set <- add_column(tidy_set, "elements", gene_data) set_data <- data.frame( Group = c( 100 , 200 ), Column = c("abc", "def") ) tidy_set <- add_column(tidy_set, "sets", set_data) tidy_set
This data is stored in one of the three slots, which can be directly accessed using their getter methods:
relations(tidy_set) elements(tidy_set) sets(tidy_set)
You can add as much information as you want, with the only restriction for a "fuzzy" column for the relations(). See the Fuzzy sets vignette: vignette("Fuzzy sets", "BaseSet").
You can also use the standard R approach with [:
gene_data <- data.frame( stat2 = c( 4, 4, 3, 5 ), info2 = c("a", "b", "c", "d") ) tidy_set$info1 <- NULL tidy_set[, "elements", c("stat2", "info2")] <- gene_data tidy_set[, "sets", "Group"] <- c("low", "high") tidy_set
Observe that one can add, replace or delete
As you can see it is possible to create a TidySet from a list. More commonly you can create it from a data.frame:
relations <- data.frame(elements = c("a", "b", "c", "d", "e", "f"), sets = c("A", "A", "A", "A", "A", "B"), fuzzy = c(1, 1, 1, 1, 1, 1)) TS <- tidySet(relations) TS
It is also possible from a matrix:
m <- matrix(c(0, 0, 1, 1, 1, 1, 0, 1, 0), ncol = 3, nrow = 3, dimnames = list(letters[1:3], LETTERS[1:3])) m tidy_set <- tidySet(m) tidy_set
Or they can be created from a GeneSet and GeneSetCollection objects.
Additionally it has several function to read files related to sets like the OBO files (getOBO) and GAF (getGAF)
It is possible to extract the gene sets as a list, for use with functions such as lapply.
as.list(tidy_set)
Or if you need to apply some network methods and you need a matrix, you can create it with incidence:
incidence(tidy_set)
To work with sets several methods are provided. In general you can provide a new name for the resulting set of the operation, but if you don't one will be automatically provided using naming(). All methods work with fuzzy and non-fuzzy sets
You can make a union of two sets present on the same object.
BaseSet::union(tidy_set, sets = c("C", "B"), name = "D")
intersection(tidy_set, sets = c("A", "B"), name = "D", keep = TRUE)
The keep argument used here is if you want to keep all the other previous sets:
intersection(tidy_set, sets = c("A", "B"), name = "D", keep = FALSE)
We can look for the complement of one or several sets:
complement_set(tidy_set, sets = c("A", "B"))
Observe that we haven't provided a name for the resulting set but we can provide one if we prefer to
complement_set(tidy_set, sets = c("A", "B"), name = "F")
This is the equivalent of setdiff, but clearer:
out <- subtract(tidy_set, set_in = "A", not_in = "B", name = "A-B") out name_sets(out) subtract(tidy_set, set_in = "B", not_in = "A", keep = FALSE)
See that in the first case there isn't any element present in B not in set A, but the new set is stored. In the second use case we focus just on the elements that are present on B but not in A.
The number of unique elements and sets can be obtained using the nElements() and nSets() methods.
nElements(tidy_set) nSets(tidy_set) nRelations(tidy_set)
If you wish to know all in a single call you can use dim(tidy_set): r dim(tidy_set).
This summary doesn't provide the number of relations of each set.
You can quickly obtain that with lengths(tidy_set): r lengths(tidy_set)
The size of each set can be obtained using the set_size() method.
set_size(tidy_set)
Conversely, the number of sets associated with each gene is returned by the
element_size() function.
element_size(tidy_set)
The identifiers of elements and sets can be inspected and renamed using name_elements and
name_elements(tidy_set) name_elements(tidy_set) <- paste0("Gene", seq_len(nElements(tidy_set))) name_elements(tidy_set) name_sets(tidy_set) name_sets(tidy_set) <- paste0("Geneset", seq_len(nSets(tidy_set))) name_sets(tidy_set)
dplyr verbsYou can also use mutate(), filter(), select(), group_by() and other dplyr verbs with TidySets.
You usually need to activate which three slots you want to affect with activate():
library("dplyr") m_TS <- tidy_set %>% activate("relations") %>% mutate(Important = runif(nRelations(tidy_set))) m_TS
You can use activate to select what are the verbs modifying:
set_modified <- m_TS %>% activate("elements") %>% mutate(Pathway = if_else(elements %in% c("Gene1", "Gene2"), "pathway1", "pathway2")) set_modified set_modified %>% deactivate() %>% # To apply a filter independently of where it is filter(Pathway == "pathway1")
If you think you need group_by usually this could mean that you need a new set.
You can create a new one with group.
# A new group of those elements in pathway1 and with Important == 1 set_modified %>% deactivate() %>% group(name = "new", Pathway == "pathway1")
set_modified %>% group("pathway1", elements %in% c("Gene1", "Gene2"))
You can use group_by() but it won't return a TidySet.
set_modified %>% deactivate() %>% group_by(Pathway, sets) %>% count()
After grouping or mutating sometimes we might be interested in moving a column describing something to other places. We can do by this with:
elements(set_modified) out <- move_to(set_modified, "elements", "relations", "Pathway") relations(out)
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.