make_comb_mat: Make a Combination Matrix for UpSet Plot


View source: R/Upset.R

Description

Make a Combination Matrix for UpSet Plot

Usage

make_comb_mat(..., mode = c("distinct", "intersect", "union"),
    top_n_sets = Inf, min_set_size = -Inf,
    universal_set = NULL, complement_size = NULL,
    value_fun = NULL, set_on_rows = TRUE)

Arguments

...

The input sets. If a single variable is given, it should be a matrix/data frame or a list. If multiple variables are given, they should be name-value pairs. See the Input section for details.

mode

The mode for forming the combination sets: one of "distinct" (the default), "intersect" or "union". See the Mode section.

top_n_sets

The number of sets to keep, chosen as the sets with the largest sizes.

min_set_size

The minimal set size. Sets smaller than this are not used for generating the combination matrix.

universal_set

The universal set. If it is set, the size of the complement of all sets (the elements of the universal set that are in none of the sets) is also calculated. If it is specified, complement_size is ignored.

complement_size

The size of the complement of all sets. If it is specified, the complement combination set is named with all zeros ("00..."), e.g. "000" for three sets.

value_fun

A function that calculates the size of each combination set. By default, the size of a set of scalar values is the length of the vector, while the size of a region-based set (i.e. a GRanges or IRanges object) is the sum of the widths of the regions in the set.

set_on_rows

Used internally.
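To make the size-related arguments concrete, here is a minimal base-R sketch of what min_set_size, top_n_sets and universal_set describe. The helpers filter_sets() and complement_size_of() are hypothetical and are not part of ComplexHeatmap; applying min_set_size before top_n_sets is an assumption about the order, not a statement about how make_comb_mat() is implemented.

```r
# Hypothetical helpers illustrating the arguments; NOT ComplexHeatmap functions.
# Assumption: min_set_size is applied first, then top_n_sets.
filter_sets <- function(lt, top_n_sets = Inf, min_set_size = -Inf) {
  sizes <- vapply(lt, length, integer(1))
  # drop sets smaller than min_set_size
  keep <- sizes >= min_set_size
  lt <- lt[keep]
  sizes <- sizes[keep]
  # of the remaining sets, keep the top_n_sets largest (original order kept)
  if (length(lt) > top_n_sets) {
    largest <- order(sizes, decreasing = TRUE)[seq_len(top_n_sets)]
    lt <- lt[sort(largest)]
  }
  lt
}

# complement size: elements of the universal set that are in none of the sets
complement_size_of <- function(lt, universal_set) {
  length(setdiff(universal_set, Reduce(union, lt)))
}

lt <- list(a = c("x", "y"), b = c("x", "y", "z"), c = "w")
filter_sets(lt, min_set_size = 2)  # keeps a and b
filter_sets(lt, top_n_sets = 1)    # keeps b
complement_size_of(lt, letters)    # 26 letters minus w, x, y, z: 22
```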

Value

A matrix that also has the class comb_mat.

The following functions can be applied to it: set_name, comb_name, set_size, comb_size, comb_degree, extract_comb and t.comb_mat.

Input

Multiple sets can be represented as:

1. A list of sets where each set is a vector, e.g.:

    list(set1 = c("a", "b", "c"),
         set2 = c("b", "c", "d", "e"),
         ...)  

2. A binary matrix/data frame where rows are elements and columns are sets, e.g.:

      a b c
    h 1 1 1
    t 1 0 1
    j 1 0 0
    u 1 0 1
    w 1 0 0
    ...  
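The two representations carry the same information. ComplexHeatmap's own list_to_matrix() (used in the Examples below) performs this conversion; the sketch below is a hypothetical base-R stand-in named sets_to_binary(), shown only to make the mapping explicit. Sorting the rows alphabetically is an assumption of this sketch.

```r
# Hypothetical base-R stand-in for the list -> binary matrix conversion.
# Rows are the unique elements (sorted), columns are the sets.
sets_to_binary <- function(lt) {
  elements <- sort(unique(unlist(lt)))
  mat <- vapply(lt, function(s) as.integer(elements %in% s),
                integer(length(elements)))
  # vapply returns a plain vector when there is a single element,
  # so rebuild the matrix explicitly with row and column names
  matrix(mat, nrow = length(elements),
         dimnames = list(elements, names(lt)))
}

sets_to_binary(list(set1 = c("a", "b", "c"),
                    set2 = c("b", "c", "d", "e")))
```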

If the variable is a data frame, only the binary columns (containing only 0 and 1) and the logical columns are kept.
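The column selection rule for data frames can be sketched in base R as follows. keep_set_columns() is a hypothetical helper that only mirrors the rule stated above; it is not the code make_comb_mat() runs internally.

```r
# Hypothetical helper: keep logical columns and numeric columns whose
# values are all 0 or 1; drop everything else (e.g. labels, counts).
keep_set_columns <- function(df) {
  is_set_col <- vapply(df, function(x) {
    is.logical(x) || (is.numeric(x) && all(x %in% c(0, 1)))
  }, logical(1))
  df[, is_set_col, drop = FALSE]
}

df <- data.frame(a = c(1, 0, 1), b = c(TRUE, FALSE, TRUE),
                 c = c(2, 5, 1), label = c("h", "t", "j"))
keep_set_columns(df)  # keeps columns a and b
```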

If the sets are genomic regions, they can only be represented as a list of GRanges objects.

Mode

Take three sets (A, B, C) as an example. The UpSet approach enumerates every way of selecting or not selecting each set and calculates the size of each resulting combination set. For three sets, all possible combinations are:

    A B C
    1 1 1
    1 1 0
    1 0 1
    0 1 1
    1 0 0
    0 1 0
    0 0 1  
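For n sets there are 2^n - 1 such codes once the all-zero pattern is excluded. The sketch below generates them in base R; all_comb_codes() is a hypothetical helper, and sorting by degree (the number of 1s) and then by code is chosen only so the result matches the table above.

```r
# Hypothetical helper: all 0/1 selection codes for n sets, without "0...0".
all_comb_codes <- function(n) {
  # every pattern of selecting (1) or not selecting (0) each of the n sets
  grid <- as.matrix(expand.grid(rep(list(c(1, 0)), n)))
  # drop the all-zero pattern: the background is not of interest
  grid <- grid[rowSums(grid) > 0, , drop = FALSE]
  codes <- unname(apply(grid, 1, paste, collapse = ""))
  # sort by degree (number of 1s), then by code, both decreasing
  codes[order(rowSums(grid), codes, decreasing = TRUE)]
}

all_comb_codes(3)
# [1] "111" "110" "101" "011" "100" "010" "001"
```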

A value of 1 means to select that set and 0 means not to select it. E.g., "1 1 0" selects sets A and B but not set C. Note there is no "0 0 0", because the background size is not of interest here. Given these codes for selecting and not selecting sets, we next need to define how the size of each combination set is calculated. There are three modes:

1. distinct mode: 1 means in that set and 0 means not in that set, so "1 1 0" means the elements that are in both A and B but not in C (i.e. setdiff(intersect(A, B), C)). Under this mode, the seven combination sets are the seven regions of the Venn diagram and they are mutually exclusive.

2. intersect mode: 1 means in that set and 0 is not taken into account, so "1 1 0" means the elements that are in both A and B, whether or not they are also in C (i.e. intersect(A, B)). Under this mode, the seven combination sets can overlap.

3. union mode: 1 means in that set and 0 is not taken into account. When there are multiple 1s, they are combined with OR, so "1 1 0" means the elements that are in A or B, whether or not they are also in C (i.e. union(A, B)). Under this mode, the seven combination sets can overlap.
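The three modes can be expressed directly with base R's set operations. comb_set() below is a hypothetical helper that returns the elements of one combination set for a given code; it follows the definitions above and is not ComplexHeatmap's internal implementation. It assumes the code has one character per set, in the same order as the list.

```r
# Hypothetical helper: elements of the combination set for `code` under `mode`.
comb_set <- function(lt, code, mode = c("distinct", "intersect", "union")) {
  mode <- match.arg(mode)
  bits <- as.integer(strsplit(code, "")[[1]])
  selected <- lt[bits == 1]
  others <- lt[bits == 0]
  switch(mode,
    # in every selected set and in none of the other sets
    distinct = setdiff(Reduce(intersect, selected),
                       Reduce(union, others, NULL)),
    # in every selected set; the other sets are ignored
    intersect = Reduce(intersect, selected),
    # in at least one selected set; the other sets are ignored
    union = Reduce(union, selected)
  )
}

lt <- list(A = c("a", "b", "c", "d"),
           B = c("b", "c", "e"),
           C = c("c", "d", "e", "f"))
# sizes of "110" under the three modes: 1, 2 and 5
sapply(c("distinct", "intersect", "union"),
       function(md) length(comb_set(lt, "110", md)))
```

In distinct mode the seven combination sets partition the union of A, B and C, so their sizes add up to the size of that union (6 here); in the other two modes they overlap and the sizes need not add up.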

Examples

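library(ComplexHeatmap)

# three random sets of letters, given as a list
set.seed(123)
lt = list(a = sample(letters, 10),
          b = sample(letters, 15),
          c = sample(letters, 20))
m = make_comb_mat(lt)

# the same sets as a binary matrix (rows: elements, columns: sets)
mat = list_to_matrix(lt)
mat
m = make_comb_mat(mat)

## Not run: 
# genomic regions: a list of GRanges objects built from random BED data
require(circlize)
require(GenomicRanges)
lt = lapply(1:4, function(i) generateRandomBed())
lt = lapply(lt, function(df) GRanges(seqnames = df[, 1], 
    ranges = IRanges(df[, 2], df[, 3])))
names(lt) = letters[1:4]
m = make_comb_mat(lt)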

## End(Not run)

Example output

Loading required package: grid
========================================
ComplexHeatmap version 2.6.2
Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
Github page: https://github.com/jokergoo/ComplexHeatmap
Documentation: http://jokergoo.github.io/ComplexHeatmap-reference

If you use it in published research, please cite:
Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
  genomic data. Bioinformatics 2016.

This message can be suppressed by:
  suppressPackageStartupMessages(library(ComplexHeatmap))
========================================

  a b c
a 0 1 1
b 0 0 1
c 1 1 0
d 0 1 0
e 1 1 1
f 0 0 1
g 0 1 1
h 0 1 1
i 0 1 1
j 1 1 1
k 1 1 0
l 0 1 1
m 0 0 1
n 1 1 0
o 1 0 1
q 0 0 1
r 1 0 1
s 1 1 0
t 0 0 1
u 0 1 1
v 0 0 1
w 0 0 1
x 1 1 1
y 1 1 1
z 0 0 1
Loading required package: circlize
========================================
circlize version 0.4.11
CRAN page: https://cran.r-project.org/package=circlize
Github page: https://github.com/jokergoo/circlize
Documentation: https://jokergoo.github.io/circlize_book/book/

If you use it in published research, please cite:
Gu, Z. circlize implements and enhances circular visualization
  in R. Bioinformatics 2014.

This message can be suppressed by:
  suppressPackageStartupMessages(library(circlize))
========================================

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb

ComplexHeatmap documentation built on Nov. 14, 2020, 2:01 a.m.