regFromCMAP: Find regulators of gene sets using Connectivity Map...

Description Usage Arguments Value Author(s) Examples

View source: R/regFromCMAP.R

Description

regFromCMAP identifies regulators of gene sets from Connectivity Map perturbation data using one-tailed Wilcox, KS or other (not yet implemented) tests

regFromCMAPSingle: identifies regulators of a single gene set from Connectivity Map perturbation data using one-tailed Wilcox, KS or other (not yet implemented) tests

regFromCMAPmcell: identifies regulators of gene sets from multiple cell lines Connectivity Map perturbation data using one-tailed Wilcox, KS or other (not yet implemented) tests

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
regFromCMAP(cmap, gene_sets, gene_set_id_col = "phenotypes",
  gene_id_col = "entrezgene", method = c("ks", "wilcox", "GSEA")[1],
  GSEA_weighting = 1, cutoff = 1, pval_corr_method = "fdr",
  renormalise = F, n_cores = detectCores() - 1, clustermq = F,
  clustermq_seed = 128965, clustermq_memory = 2000,
  clustermq_job_size = 10)

regFromCMAPSingle(cmap_mat, gene_sets, gene_set_name, method = c("ks",
  "wilcox", "GSEA")[1], GSEA_weighting = 1, alternative = c("less",
  "greater")[1])

regFromCMAPmcell(CMap_files, cell_ids = c("A375", "A549", "HA1E",
  "HCC515", "HEPG2", "HT29", "MCF7", "PC3"), pert_types = c("trt_sh.cgs",
  "trt_oe", "trt_xpr"), pert_itimes = c("96 h"), is_touchstone = T,
  min_cell_lines = 1, max_cell_lines = 9, gene_sets,
  gene_set_id_col = "phenotypes", gene_id_col = "entrezgene",
  method = c("ks", "wilcox", "GSEA")[1], GSEA_weighting = 1,
  keep_one_oe = c("one", "other", "all")[1], cutoff = 0.05,
  pval_corr_method = "fdr", renormalise = F, n_cores = detectCores()
  - 1, gene_names = c("all", unique(gene_sets$hgnc_symbol)[3:22])[1],
  landmark_only = F)

Arguments

cmap

object of class 'GCT' [package "cmapR"] with 7 slots containing z-score matrix, perturbation and feature details of the Connectivity map project. Returned by readCMAP or readCMAPsubset

gene_sets

data.table containing gene set id and which genes are assigned to them

gene_set_id_col

name of the column in gene_sets that stores gene set id

gene_id_col

name of the column in gene_sets that stores gene id

method

method for detecting differentially expressed genes. Currently only Wilcox and KS tests are implemented wilcox.test, ks.test.

GSEA_weighting

Weighting of ranking/correlations, see gset, https://cran.r-project.org/web/packages/gsEasy/vignettes/gsEasy-guide.html or Subramanian et. al 2005.

cutoff

FDR-corrected p-value cutoff

pval_corr_method

multiple hypothesis p-value correction method. Details: p.adjust - method.

renormalise

renormalise z-scores for a Connectivity map subset being analysed. Z-score = (X-median(X)) / (mad(X)*1.4826)

n_cores

number of cores to be used in parallel processing (over combinations of clusters). More details: parLapply, makeCluster, detectCores

clustermq

Use clustermq LSF job scheduler (TRUE) as an alternative to parLapply (FALSE). Details: Q

clustermq_seed

When using clustermq: Seed for random number generation.

clustermq_memory

When using clustermq: memory requested for each job

clustermq_job_size

When using clustermq: The number of function calls per job

cmap_mat

Connectivity map data matrix (cmap@mat)

gene_set_name

name of one gene set in gene_sets

alternative

alternative hypothesis for statistical tests, wilcox.test, ks.test

CMap_files

a list of directories and urls produced by loadCMap

cell_ids

a character vector of cell line names, query for available cell line names using perturbTable(CMap_files, ~ cell_id)

pert_types

a character vector of perturbation types, query for available perturbation types using perturbTable(CMap_files, ~ pert_type)

pert_itimes

a character vector of perturbation times, query for available perturbation times using perturbTable(CMap_files, ~ pert_itime)

is_touchstone

logical, select only the perturbation data that is a part of the touchstone dataset (genes profiled across all 9 core cell lines)? Details: https://docs.google.com/document/d/1q2gciWRhVCAAnlvF2iRLuJ7whrGP6QjpsCMq1yWz7dU/

min_cell_lines

keep a gene=>gene-set interaction if at least min_cell_lines support it (inclusive)

max_cell_lines

keep a gene=>gene-set interaction if at most max_cell_lines support it (inclusive)

keep_one_oe

keep only one overexpression experiment per gene and condition. Applicable only when pert_types = "trt_oe". Perturbations where sig_id matches pert_id are retained ("one"). To invert the selection use "other". To select all use "all"

gene_names

a character vector of HUGO gene names. Use "all" to select all perturbations.

landmark_only

look only at landmark genes. Details: https://docs.google.com/document/d/1q2gciWRhVCAAnlvF2iRLuJ7whrGP6QjpsCMq1yWz7dU/edit

Value

data.table containing gene set id, which genes may regulate this set, test statistic and difference in medians between each gene set and all other genes

regFromCMAPSingle: data.table containing gene set id (single), which genes may regulate this set, test statistic and difference in medians between each gene set and all other genes

regFromCMAPmcell: data.table containing gene set id, which genes may regulate these sets and the cell line of origin, test statistic and difference in medians between each gene set and all other genes

Author(s)

Vitalii Kleshchevnikov

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
library(regNETcmap)
library(R.utils)
# read all knockdown perturbations (3.4GB)
CMap_files = loadCMap(directory = "../regulatory_networks_by_cmap/data/cmap/")
# first, let's find cell lines and measurement times
CMAPsub = readCMAPsubset(is_touchstone = T, pert_types = c("trt_sh.cgs"), pert_itimes = c("all"), cell_ids = c("all"), CMap_files)
pdata = as.data.table(CMAPsub@cdesc)
pdata$cell_id = as.character(pdata$cell_id)
pdata$pert_itime = as.character(pdata$pert_itime)
pdata$pert_type = as.character(pdata$pert_type)
perturbTable(formula = cell_id ~ pert_itime, pdata = pdata)
# Pick one cell line and measurement time
CMAPsub = readCMAPsubset(is_touchstone = T, pert_types = c("trt_sh.cgs"), pert_itimes = c("96 h"), cell_ids = c("A375"), CMap_files)
# Read gene sets
trPhe_pair_map_file = "../regulatory_networks_by_cmap/results/phenotypes_genes_pvals_pairwise_mapped"
trPhe_pair_map_file.zip = paste0(trPhe_pair_map_file, ".gz")
gunzip(trPhe_pair_map_file.zip, overwrite = T, remove = F)
pVals_human = fread(trPhe_pair_map_file, sep = "\t", stringsAsFactors = F)
unlink(trPhe_pair_map_file)
# Select genes based on corrected p-value and difference in medians
pVals_human = pVals_human[pVals_corr < 0.05 & diff_median > 0]
pVals_reg = regFromCMAP(cmap = CMAPsub,
    gene_sets = pVals_human,
    gene_set_id_col = "phenotypes", gene_id_col = "entrezgene",
    method = "ks", cutoff = 1, pval_corr_method = "fdr",
    n_cores = detectCores() - 1)
qplot(x = pVals_reg$diff_median, y = -log10(pVals_reg$pVals), geom = "bin2d", bins = 150) + theme_light()

vitkl/regNETcmap documentation built on Feb. 18, 2020, 3:43 a.m.