In llrs/BaseSet: Working with Sets the Tidy Way

knitr::opts_knit$set(root.dir = ".")
knitr::opts_chunk$set(collapse = TRUE, 
                      warning = TRUE,
                      comment = "#>")

Getting started

This vignette explains how to work with sets using this package. The package provides a class to store the information efficiently and functions to work with it.

The TidySet class

A TidySet object stores associations between elements and sets. To create one, imagine we have several genes associated with a characteristic:

library("BaseSet")
gene_lists <- list(
    geneset1 = c("A", "B"),
    geneset2 = c("B", "C", "D")
)
tidy_set <- tidySet(gene_lists)
tidy_set

This is then stored internally in three slots: relations(), elements(), and sets().

If you have more information for each element or set it can be added:

gene_data <- data.frame(
    stat1     = c( 1,   2,   3,   4 ),
    info1     = c("a", "b", "c", "d")
)

tidy_set <- add_column(tidy_set, "elements", gene_data)
set_data <- data.frame(
    Group     = c( 100 ,  200 ),
    Column     = c("abc", "def")
)
tidy_set <- add_column(tidy_set, "sets", set_data)
tidy_set

This data is stored in one of the three slots, which can be directly accessed using their getter methods:

relations(tidy_set)
elements(tidy_set)
sets(tidy_set)

You can add as much information as you want; the only restriction applies to the "fuzzy" column of relations(). See the Fuzzy sets vignette: vignette("Fuzzy sets", "BaseSet").

You can also use the standard R approach with [ and $:

gene_data <- data.frame(
    stat2     = c( 4,   4,   3,   5 ),
    info2     = c("a", "b", "c", "d")
)

tidy_set$info1 <- NULL
tidy_set[, "elements", c("stat2", "info2")] <- gene_data
tidy_set[, "sets", "Group"] <- c("low", "high")
tidy_set

Observe that one can add, replace or delete columns this way.

Creating a TidySet

As you can see it is possible to create a TidySet from a list. More commonly you can create it from a data.frame:

relations <- data.frame(elements = c("a", "b", "c", "d", "e", "f"), 
                        sets = c("A", "A", "A", "A", "A", "B"), 
                        fuzzy = c(1, 1, 1, 1, 1, 1))
TS <- tidySet(relations)
TS
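
The list and data.frame inputs describe the same information in two shapes. As a minimal base R sketch (no BaseSet functions are called), a named list of sets can be unrolled into a long table with the same three columns used above. Setting fuzzy to 1 for every row is an assumption that all memberships are crisp.

```r
# The named list from the start of this vignette
gene_lists <- list(
    geneset1 = c("A", "B"),
    geneset2 = c("B", "C", "D")
)

# One row per element-set pair; each set name is repeated once per member
relations <- data.frame(
    elements = unlist(gene_lists, use.names = FALSE),
    sets = rep(names(gene_lists), lengths(gene_lists)),
    fuzzy = 1,  # assumption: every membership is crisp
    stringsAsFactors = FALSE
)
relations  # 5 rows: element "B" appears once for each of its two sets
```

A table of this shape has the same column names as the data.frame passed to tidySet() above.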

It is also possible to create one from a matrix:

m <- matrix(c(0, 0, 1, 1, 1, 1, 0, 1, 0), ncol = 3, nrow = 3,  
               dimnames = list(letters[1:3], LETTERS[1:3]))
m
tidy_set <- tidySet(m)
tidy_set

Or they can be created from GeneSet and GeneSetCollection objects. Additionally, the package has several functions to read files related to sets, such as OBO files (getOBO) and GAF files (getGAF).

Converting to other formats

It is possible to extract the gene sets as a list, for use with functions such as lapply.

as.list(tidy_set)

Or if you need to apply some network methods and you need a matrix, you can create it with incidence:

incidence(tidy_set)
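
To make the idea of an incidence matrix concrete, here is a minimal base R sketch that builds one by hand from a plain list, with elements as rows and sets as columns (the layout of the matrix m used earlier). It is only an illustration. That incidence() returns exactly this orientation and storage type is an assumption.

```r
# The sets encoded by the matrix m above, written as a plain list
sets <- list(A = "c", B = c("a", "b", "c"), C = "b")

all_elements <- sort(unique(unlist(sets)))

# One column per set: 1 if the element belongs to the set, 0 otherwise
inc <- vapply(
    sets,
    function(s) as.numeric(all_elements %in% s),
    numeric(length(all_elements))
)
rownames(inc) <- all_elements
inc
```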

Operations with sets

To work with sets, several methods are provided. In general you can provide a new name for the set resulting from the operation, but if you don't, one will be automatically generated using naming(). All methods work with fuzzy and non-fuzzy sets.
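
For fuzzy sets, each relation carries a membership value rather than a plain yes or no. The base R sketch below shows one common textbook convention: the maximum membership for a union and the minimum for an intersection. The membership values are made up for illustration, and whether the package's methods use this convention by default is an assumption, not something stated in this vignette.

```r
# Hypothetical membership values of elements a, b, c in two fuzzy sets
membership_A <- c(a = 0.2, b = 0.9, c = 0)
membership_B <- c(a = 0.5, b = 0.4, c = 1)

# Textbook convention: element-wise maximum for union, minimum for intersection
fuzzy_union <- pmax(membership_A, membership_B)
fuzzy_intersection <- pmin(membership_A, membership_B)

fuzzy_union
fuzzy_intersection
```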

Union

You can make a union of two sets present in the same object.

BaseSet::union(tidy_set, sets = c("C", "B"), name = "D")

Intersection

intersection(tidy_set, sets = c("A", "B"), name = "D", keep = TRUE)

The keep argument controls whether the other, previously existing sets are kept in the result:

intersection(tidy_set, sets = c("A", "B"), name = "D", keep = FALSE)

Complement

We can look for the complement of one or several sets:

complement_set(tidy_set, sets = c("A", "B"))

Observe that we haven't provided a name for the resulting set, but we can provide one if we prefer to:

complement_set(tidy_set, sets = c("A", "B"), name = "F")

Subtract

This is the equivalent of setdiff, but clearer:

out <- subtract(tidy_set, set_in = "A", not_in = "B", name = "A-B")
out
name_sets(out)
subtract(tidy_set, set_in = "B", not_in = "A", keep = FALSE)

Note that in the first case no element is present in A without also being in B, yet the new set is still stored. In the second case we focus just on the elements that are present in B but not in A.
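
As a cross-check of the operations above, the same non-fuzzy sets can be written as plain character vectors and combined with base R. The matrix m encodes A = {c}, B = {a, b, c} and C = {b}. This sketch calls only base functions (prefixed with base:: so that masking by other packages does not matter) and says nothing about how the package implements its methods.

```r
# The sets encoded by the matrix m, as plain character vectors
A <- "c"
B <- c("a", "b", "c")
C <- "b"

u  <- base::union(C, B)      # elements in C or in B
i  <- base::intersect(A, B)  # elements in both A and B
d1 <- base::setdiff(A, B)    # in A but not in B: empty
d2 <- base::setdiff(B, A)    # in B but not in A

u
i
d1
d2
```

The empty result for A minus B matches the first subtract() call, and the two elements left for B minus A match the second.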

Additional information

The number of unique elements, sets and relations can be obtained using the nElements(), nSets() and nRelations() methods.

nElements(tidy_set)
nSets(tidy_set)
nRelations(tidy_set)

If you wish to know all three counts in a single call you can use dim(tidy_set). This summary doesn't provide the number of relations of each set; you can quickly obtain that with lengths(tidy_set).

The size of each set can be obtained using the set_size() method.

set_size(tidy_set)

Conversely, the number of sets associated with each gene is returned by the element_size() function.

element_size(tidy_set)
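
For non-fuzzy sets these two quantities are plain counts. The base R sketch below computes them for the sets encoded by the matrix m, written as a list. It illustrates the idea only. The format returned by set_size() and element_size() may differ, and fuzzy memberships are not handled here.

```r
# The sets encoded by the matrix m above, written as a plain list
sets <- list(A = "c", B = c("a", "b", "c"), C = "b")

# Number of elements in each set
set_sizes <- lengths(sets)

# Number of sets each element belongs to
element_sizes <- table(unlist(sets))

set_sizes      # A = 1, B = 3, C = 1
element_sizes  # a = 1, b = 2, c = 2
```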

The identifiers of elements and sets can be inspected and renamed using name_elements and name_sets:

name_elements(tidy_set)
name_elements(tidy_set) <- paste0("Gene", seq_len(nElements(tidy_set)))
name_elements(tidy_set)
name_sets(tidy_set)
name_sets(tidy_set) <- paste0("Geneset", seq_len(nSets(tidy_set)))
name_sets(tidy_set)

Using `dplyr` verbs

You can also use mutate(), filter(), select(), group_by() and other dplyr verbs with TidySets. You usually need to choose which of the three slots you want to affect with activate():

library("dplyr")
m_TS <- tidy_set %>% 
  activate("relations") %>% 
  mutate(Important = runif(nRelations(tidy_set)))
m_TS

You can use activate to select which slot the verbs modify:

set_modified <- m_TS %>% 
  activate("elements") %>% 
  mutate(Pathway = if_else(elements %in% c("Gene1", "Gene2"), 
                           "pathway1", 
                           "pathway2"))
set_modified
set_modified %>% 
  deactivate() %>% # To apply a filter independently of where it is
  filter(Pathway == "pathway1")

If you think you need group_by, this usually means that you need a new set. You can create one with group:

# A new set containing the elements in pathway1
set_modified %>% 
  deactivate() %>% 
  group(name = "new", Pathway == "pathway1")

set_modified %>% 
  group("pathway1", elements %in% c("Gene1", "Gene2"))

You can use group_by() but it won't return a TidySet.

set_modified %>% 
    deactivate() %>% 
    group_by(Pathway, sets) %>%  
    count()

After grouping or mutating we might want to move a column that describes something from one slot to another. We can do this with move_to():

elements(set_modified)
out <- move_to(set_modified, "elements", "relations", "Pathway")
relations(out)
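
Conceptually, moving a column from the elements to the relations resembles a join on the element identifier: every relation picks up the value of its element. The base R sketch below shows that idea with hand-written data frames that mimic the renamed objects above. It is an analogy, not a description of how move_to() is implemented.

```r
# Hand-written stand-ins for the relations and elements tables
relations <- data.frame(
    elements = c("Gene3", "Gene1", "Gene2", "Gene3", "Gene2"),
    sets = c("Geneset1", "Geneset2", "Geneset2", "Geneset2", "Geneset3"),
    stringsAsFactors = FALSE
)
elements <- data.frame(
    elements = c("Gene1", "Gene2", "Gene3"),
    Pathway = c("pathway1", "pathway1", "pathway2"),
    stringsAsFactors = FALSE
)

# Each relation inherits the Pathway of its element
moved <- merge(relations, elements, by = "elements")
moved
```

Note that merge() sorts the result by the key column, so the row order differs from the input.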

Session info

sessionInfo()

llrs/BaseSet documentation built on Feb. 22, 2025, 9:52 p.m.
