benchmarkMotifs: Benchmark linear motif instance found using QSLIMFinder...

Description

Benchmark linear motif instances found using QSLIMFinder (SLIMFinder)

Get motifs by ID from the output of linear motif benchmarking

Usage

  benchmarkMotifs(occurence_file = "../viral_project/qslimfinder.Full_IntAct3.FALSE/result/occurence.txt",
  main_file = "../viral_project/qslimfinder.Full_IntAct3.FALSE/result/main_result.txt",
  domain_res_file = "../viral_project/processed_data_files/domain_res_count_20171019.RData",
  motif_setup = "../viral_project/processed_data_files/QSLIMFinder_instances_h2v_qslimfinder.Full_IntAct3.FALSE_clust201802.RData",
  neg_set = c("all_instances", "all_proteins", "random")[1],
  domain_results_obj = "res_count",
  motif_input_obj = "forSLIMFinder_Ready", motif_setup_obj2 = NULL,
  occurence_filt = NULL, one_from_cloud = T,
  dbfile_main = "../viral_project/data_files/instances_all.gff",
  dburl_main = "http://elm.eu.org/instances.gff?q=None&taxon=Homo%20sapiens&instance_logic=",
  dbfile_query = "../viral_project/data_files/instances_query.gff",
  dburl_query = "http://elm.eu.org/instances.gff?q=all&taxon=irus&instance_logic=",
  query_res_query_only = T, motif_types = c("DOC", "MOD", "LIG", "DEG",
  "CLV", "TRG"), all_res_excl_query = T, merge_motif_variants = F,
  seed = 21, N = 100, replace = T, within1sequence = T,
  query_predictor_col = "Sig", all_predictor_col = "Sig",
  normalise = T, minoverlap = 2, minoverlap_redundant = 5,
  merge_domain_data = T, merge_by_occurence_mcols = c("query",
  "interacts_with"), merge_by_domain_res_cols = c("IDs_interactor_viral",
  "IDs_interactor_human", "IDs_domain_human", "Taxid_interactor_human",
  "Taxid_interactor_viral"),
  merge_by_non_query_domain_res_cols = c("IDs_interactor_human_A",
  "IDs_interactor_human_B", "IDs_domain_human_B",
  "Taxid_interactor_human_A", "Taxid_interactor_human_B"),
  filter_by_domain_data = "p.value < 0.05", motif_pval_cutoff = 1,
  select_predictor_per_range = max,
  non_query_domain_res_file = "../viral_project/processed_data_files/predict_domain_human_clust20180819.RData",
  non_query_domain_results_obj = NULL, non_query_domains_N = 0,
  non_query_set_only = F, query_domains_only = F,
  min_non_query_domain_support = 0,
  min_top_domain_support4motif_nq = 0, select_top_domain = F, ...)

queryOCCByMCOL(res, keytype = "IDs_domain_human", key = "IPR032440")

mBenchmarkMotifs(datasets = c("qslimfinder.Full_IntAct3.FALSE"),
  descriptions = c("human network (full IntAct) searched \nfor motifs present in viral proteins"),
  dir = "./", motif_setup_months = "201802", ...)

Arguments

occurence_file

a path to a tsv (txt) file containing QSLIMFinder (SLIMFinder) occurrence output

main_file

a path to a tsv (txt) file containing QSLIMFinder (SLIMFinder) main output

domain_res_file

path to RData containing objects generated by what_we_find_VS_ELM.Rmd script (specifically domain_results_obj object)

motif_setup

path to RData containing objects generated by PPInetwork2SLIMFinder pipeline (specifically motif_input_obj object)

domain_results_obj

character, name of the object containing domain enrichment results (class == XYZinteration_XZEmpiricalPval)

motif_input_obj

character, name of the object of class InteractionSubsetFASTA_list containing FASTA sequences for interacting proteins and the molecular interaction data they correspond to. Each element of the list contains the input for an individual QSLIMFinder run.

motif_setup_obj2

alternative way to provide motif_input_obj (class InteractionSubsetFASTA_list) directly. This object should not require matching domain-protein pairs: it must already have been processed by domainProteinPairMatch. Can be useful for repeating benchmarking.

occurence_filt

QSLIMFinder (SLIMFinder) occurrence output, filtered to the occurrences that could have been found given motif_input_obj.

one_from_cloud

logical, if TRUE use only the single top motif from a motif cloud

dbfile_main

a path to a gff (txt) file containing ELM database motif occurrences (proteins in the main set)

dburl_main

URL from which to download the ELM database file containing motif occurrences (proteins in the main set)

dbfile_query

a path to a gff (txt) file containing ELM database motif occurrences (proteins in the query set)

dburl_query

URL from which to download the ELM database file containing motif occurrences (proteins in the query set)

query_res_query_only

return only GRanges for query proteins, passed to "GRangesINinteractionSubsetFASTA". Do not change the default value.

motif_types

character vector of motif types

all_res_excl_query

If TRUE, the "all" results in the output contain occurrences in all proteins except the query proteins. If FALSE, the "all" results include occurrences in all proteins. Not implemented.

merge_motif_variants

If FALSE (default), merge motif occurrences only if the motifs are variants of the same motif (such as TRG_NLS).

seed

when using random negative sets (neg_set = "random"): seed for RNG for sampling

N

when using random negative sets (neg_set = "random"): number of samples

replace

when using random negative sets (neg_set = "random"): sample the start positions of GRanges with replacement (see randomGRanges)

within1sequence

when using random negative sets (neg_set = "random"): resample GRanges within one sequence or across sequences (see randomGRanges). For example, suppose sequence 1 has two motifs of length 4 and 7. If within1sequence = TRUE, two motifs of the same lengths (4 and 7) are sampled from that same protein. If within1sequence = FALSE, two motifs of lengths 4 and 7 are sampled from any protein in the set used for benchmarking.
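
The difference between the two within1sequence settings can be sketched in base R. This is only an illustration under stated assumptions, not the randomGRanges implementation: the helper sample_motifs, the protein names and the sequence lengths are all hypothetical, and plain data frames stand in for GRanges.

```r
# Toy data: hypothetical protein lengths and motif widths (not from the package).
seq_lengths <- c(P1 = 50L, P2 = 80L)
motifs <- data.frame(seqname = c("P1", "P1"), width = c(4L, 7L),
                     stringsAsFactors = FALSE)

# Hypothetical helper: draw one random start per motif, keeping its width.
sample_motifs <- function(motifs, seq_lengths, within1sequence = TRUE) {
  target <- if (within1sequence) {
    motifs$seqname                                    # stay in the original protein
  } else {
    sample(names(seq_lengths), nrow(motifs), replace = TRUE)  # any protein in the set
  }
  # Latest start that still keeps the whole motif inside the sequence
  max_start <- seq_lengths[target] - motifs$width + 1L
  start <- unname(vapply(max_start, function(m) sample.int(m, 1L), integer(1)))
  data.frame(seqname = target, start = start,
             end = start + motifs$width - 1L, width = motifs$width,
             stringsAsFactors = FALSE)
}

set.seed(21)
within <- sample_motifs(motifs, seq_lengths, within1sequence = TRUE)
across <- sample_motifs(motifs, seq_lengths, within1sequence = FALSE)
```

In both cases the sampled widths stay 4 and 7; only the pool of proteins from which positions are drawn changes.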

query_predictor_col

"Sig" or "p.value" or "domain_motif_pval"

all_predictor_col

"Sig"

normalise

logical, normalise the predictor values in case they do not span the full range from 0 to 1
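
One common way to stretch a predictor onto the full 0 to 1 range is min-max rescaling. The sketch below assumes that scheme; the function name normalise_predictor is hypothetical and the package may normalise differently.

```r
# Hypothetical min-max rescaling; an assumption, not the package's own code.
normalise_predictor <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(x)))  # constant input: avoid 0/0
  (x - rng[1]) / (rng[2] - rng[1])
}

sig <- c(0.001, 0.01, 0.05)      # made-up predictor values
scaled <- normalise_predictor(sig)
range(scaled)                    # smallest value maps to 0, largest to 1
```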

minoverlap

integer, passed to findOverlaps

minoverlap_redundant

integer, minimum overlap used when removing motif classes that match the same occurrence

merge_domain_data

If TRUE, merge domain enrichment results with motif occurrences

merge_by_occurence_mcols

columns of mcols (metadata of GRanges) that contain IDs of [1] the protein with the motif and [2] the proteins with the domain, e.g. c("query", "interacts_with")

merge_by_domain_res_cols

columns of domain enrichment results that contain IDs of [1] the protein with the motif, [2] the proteins with the domain, [3] the domain, [4] and [5] Taxid for the proteins with the motif and the domain respectively, e.g. c("IDs_interactor_viral", "IDs_interactor_human", "IDs_domain_human", "Taxid_interactor_human","Taxid_interactor_viral"). Omit the Taxid columns if they are not present.

merge_by_non_query_domain_res_cols

columns of domain enrichment results for non-query proteins that contain IDs of [1] the protein with the motif, [2] the proteins with the domain, [3] the domain, [4] and [5] Taxid for the proteins with the motif and the domain respectively, e.g. c("IDs_interactor_human_A", "IDs_interactor_human_B", "IDs_domain_human_B", "Taxid_interactor_human_A","Taxid_interactor_human_B"). Omit the Taxid columns if they are not present.

filter_by_domain_data

criteria to filter domain data and restrict motif search datasets (for example, "p.value < 0.05" or "fdr_pval < 0.05 & domain_count_per_IDs_interactor_viral > 1")
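
A filter given as a string, such as "p.value < 0.05", can be applied to a table by parsing it and evaluating it with the table's columns in scope. The sketch below shows that general technique on a made-up data frame; whether benchmarkMotifs evaluates the string this way is an assumption, and the identifiers and p-values are hypothetical.

```r
# Toy domain enrichment table; column names follow the examples above,
# the values are made up for illustration.
domain_res <- data.frame(
  IDs_interactor_viral = c("V1", "V1", "V2"),
  IDs_domain_human = c("IPR000001", "IPR000002", "IPR000001"),
  p.value = c(0.01, 0.20, 0.04),
  stringsAsFactors = FALSE
)

filter_by_domain_data <- "p.value < 0.05"

# Parse the string into an expression and evaluate it against the columns.
keep <- eval(parse(text = filter_by_domain_data), envir = domain_res)
filtered <- domain_res[keep, ]
```

Because the string is evaluated as R code, compound criteria joined with & or | work the same way, provided every column they name exists in the table.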

select_predictor_per_range

function (such as min) that selects the predictor value when multiple values (such as those returned by multiple datasets or by multiple integrated domains) describe the same range
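
Collapsing several predictor values for the same range into one can be illustrated with base R aggregate. This is a sketch of the idea only: the data frame stands in for GRanges with a "Sig" metadata column, and all values are made up.

```r
# Toy predictions: the first range is reported twice (e.g. by two datasets).
preds <- data.frame(
  seqname = c("P1", "P1", "P2"),
  start = c(10L, 10L, 30L),
  end = c(15L, 15L, 36L),
  Sig = c(0.7, 0.9, 0.4)
)

select_predictor_per_range <- max   # default in the usage above; min also works

# One row per unique range, with the selected predictor value.
collapsed <- aggregate(Sig ~ seqname + start + end, data = preds,
                       FUN = select_predictor_per_range)
```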

non_query_domain_res_file

path to RData file containing the result of domain enrichment analysis for non-query proteins

non_query_domain_results_obj

character, name of the object containing domain enrichment results for non-query proteins (class == XYZinteration_XZEmpiricalPval), when provided will be used for filtering datasets.

non_query_domains_N

the number of non-query proteins with predicted domains for each dataset. Used only when non_query_domain_results_obj is not NULL

non_query_set_only

If TRUE, sequence sets searched for motifs are filtered to contain only proteins from non_query_domain_results_obj (interacting partners of a seed). If FALSE, they contain proteins from both non_query_domain_results_obj and domain_results_obj. Used only when non_query_domain_results_obj is not NULL.

query_domains_only

If TRUE, proteins whose sequences will be used for motif search must be predicted to bind the same domains in a seed protein as the domains predicted for the query protein. Used only when non_query_domain_results_obj is not NULL.

min_non_query_domain_support

Minimal number of non-query proteins with the same motif as the query that are predicted to bind the same domain. Used to filter domains and proteins that do not predict top domains. Used only when non_query_domain_results_obj is not NULL.

min_top_domain_support4motif_nq

Similar to min_non_query_domain_support. Minimal number of non-query proteins with the same motif as the query which have the same top-1 domain predicted.

select_top_domain

If TRUE, top domain is selected using a product of domain p-values for all proteins with the same motif (min p-value) found using the same dataset. Used only when non_query_domain_results_obj is not NULL.

...

other arguments passed to findOverlaps

res

object of class benchmarkMotifsResult, the output of benchmarkMotifs

keytype

character, name of the column that contains key identifiers

key

character, identifiers for which to retrieve the result

datasets

character vector, names of the datasets ("Vidal" in "./SLIMFinder_Vidal/result/occurence.txt" or "" in "./SLIMFinder/result/occurence.txt")

descriptions

character vector, description of the datasets (title of the ROC plot)

dir

character, base directory. For example, "./" in "./SLIMFinder_Vidal/result/occurence.txt"

Value

object of class benchmarkMotifsResult containing occurence (GRanges, all, query, just after filtering by motif setup), instances_all (GRanges, known instances in all proteins or all excluding the query proteins), instances_query (GRanges, known instances in query proteins), predictions_all (for ROCR), labels_all (for ROCR), predictions_query (for ROCR), labels_query (for ROCR), overlapping_GRanges_all (GRanges, known instances that we also found), overlapping_GRanges_query (GRanges, known instances that we also found), N_query_prot_with_known_instances, N_query_known_instances, N_all_prot_with_known_instances, N_all_known_instances

GenomicRanges containing motifs for a given key

list of objects of class (benchmarkMotifsResult)
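
The predictions_* and labels_* elements of a benchmarkMotifsResult are described as inputs for ROCR. As a dependency-free sketch of what such a pair of vectors encodes, the area under the ROC curve can be computed with the rank-sum identity. The scores and labels below are made up, the function auc is hypothetical, and the sketch assumes that higher predictor values indicate known instances (label 1); with ROCR itself one would typically pass the same vectors to ROCR::prediction() and then ROCR::performance().

```r
# Made-up scores and 0/1 labels standing in for predictions_all / labels_all.
predictions <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
labels <- c(1, 1, 1, 0, 0, 0)

# AUC via the rank-sum (Mann-Whitney) identity: the fraction of
# positive/negative pairs in which the positive has the higher score.
auc <- function(predictions, labels) {
  r <- rank(predictions)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_value <- auc(predictions, labels)
```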

Author(s)

Vitalii Kleshchevnikov

See Also

ELMdb2GRanges, findOverlapsBench


vitkl/SLIMFinderR documentation built on May 3, 2019, 8:08 p.m.