PPInetwork2SLIMFinder: Find linear motifs (QSLIMFinder or SLIMFinder) in the protein...
In vitkl/SLIMFinderR: Short Linear Motif Search Using QSLIMFinder, Protein Interaction Network and Binding Domain Inference

Description

Find linear motifs (QSLIMFinder or SLIMFinder) in the protein interaction network

Usage

PPInetwork2SLIMFinder(dataset_name = "SLIMFinder",
  interaction_main_set = all_human_interaction,
  interaction_query_set = all_viral_interaction,
  analysis_type = "qslimfinder",
  options = "dismask=T consmask=F cloudfix=T probcut=0.3 minwild=0 maxwild=2 slimlen=5 alphahelix=F maxseq=1500 savespace=1 iuchdir=T",
  domain_res_file = "./processed_data_files/what_we_find_VS_ELM_clust20171019.RData",
  domain_results_obj = "res_count", center_domains = F,
  filter_by_domain = F,
  fasta_path = "./data_files/all_human_viral_proteins.fasta",
  main_set_only = F, domain_pvalue_cutoff = 1,
  SLIMFinder_dir = paste0("./", dataset_name, "/"),
  LSF_project_path = "/hps/nobackup/research/petsalaki/users/vitalii/vitalii/viral_project/",
  software_path = "../software/cluster/", length_set1_min = 2,
  length_set2_min = 1, write_log = T, N_seq = 200,
  seed_list = NULL, query_list = NULL, memory_start = 350,
  memory_step = 100, compare_motifs = T, Njobs_limit = 490,
  CompariMotif3_dburl = "http://elm.eu.org/elms/elms_index.tsv",
  CompariMotif3_dbpath = "./data_files/",
  non_query_domain_res_file = "../viral_project/processed_data_files/predict_domain_human_clust20180819.RData",
  non_query_domain_results_obj = NULL, non_query_domains_N = 0,
  non_query_set_only = c(main_set_only), query_domains_only = T)

Arguments

`dataset_name`	name of the dataset; see `mBenchmarkMotifs`. The default `SLIMFinder_dir` is derived from it.
`interaction_main_set`	clean_MItab class; this set of protein interactions is used to construct QSLIMFinder datasets.
`interaction_query_set`	clean_MItab class; this set of protein interactions is used as the query (and added to the QSLIMFinder datasets). Both interaction sets share seed proteins. The SLIMFinder `analysis_type` also requires this argument because it adds proteins from these interactions to the SLIMFinder datasets.
`analysis_type`	"qslimfinder" or "slimfinder"
`options`	any QSLIMFinder or SLIMFinder command-line options, given as a single space-separated string. For details, see the Commandline section of http://rest.slimsuite.unsw.edu.au/docs&page=module:qslimfinder or http://rest.slimsuite.unsw.edu.au/docs&page=module:slimfinder
`domain_res_file`	relative path to the RData file with domain enrichment results
`domain_results_obj`	name of the object in `domain_res_file` that contains the domain enrichment results (class XYZinteration_XZEmpiricalPval)
`center_domains`	logical, center QSLIMFinder datasets at domains?
`filter_by_domain`	logical, filter by domain? If FALSE, `domain_res_file` is not used.
`fasta_path`	relative path (from the project folder) to the FASTA file containing sequences for all proteins in `interaction_main_set` and `interaction_query_set`
`main_set_only`	logical. If TRUE, sequence sets for motif search contain only proteins from `interaction_main_set`; if FALSE, non-query proteins from `interaction_query_set` are also included. Argument for `listInteractionSubsetFASTA`.
`domain_pvalue_cutoff`	SLIMFinder datasets are constructed from the interactions of proteins containing a domain associated with a query-set protein at a p-value of `domain_pvalue_cutoff` or lower.
`SLIMFinder_dir`	directory, within the project directory, in which SLIMFinder datasets and results are stored
`LSF_project_path`	full path to the project directory
`software_path`	relative path (from the project folder) to the directory containing slimsuite, blast and iupred, e.g. "../software/cluster/" or "../software/"
`length_set1_min`	minimal number of proteins from `interaction_main_set` in a QSLIMFinder dataset. Argument for `filterInteractionSubsetFASTA_list`.
`length_set2_min`	minimal number of proteins from `interaction_query_set` in a QSLIMFinder dataset. Argument for `filterInteractionSubsetFASTA_list`.
`write_log`	logical. If FALSE, `runQSLIMFinder` cannot detect crashed jobs.
`N_seq`	number of sequences per batch
`seed_list`	character vector of UniProtKB accessions that should serve as seeds for QSLIMFinder datasets. These proteins are expected to recognise SLiMs. Overrides the selection of seed proteins by `domain_pvalue_cutoff`.
`query_list`	character vector of UniProtKB accessions that should serve as queries for QSLIMFinder
`memory_start`	integer, amount of memory initially given to each job
`memory_step`	integer, increment by which a job's memory is increased when `memory_start` was not enough and the job failed
`compare_motifs`	logical, compare motifs using CompariMotif3? The procedure is relatively fast but memory-intensive.
`Njobs_limit`	integer, the number of LSF jobs allowed to run simultaneously
`CompariMotif3_dburl`	URL from which to download the motif database for CompariMotif V3. Argument for `runCompariMotif3`.
`CompariMotif3_dbpath`	path to the directory where the ELM database (http://elm.eu.org/), or another linear motif database in the format required by comparimotif_V3 (http://rest.slimsuite.unsw.edu.au/docs&page=module:comparimotif_V3), is saved and kept
`non_query_domain_res_file`	path to the RData file containing the results of domain enrichment analysis for non-query proteins
`non_query_domain_results_obj`	character, name of the object containing domain enrichment results for non-query proteins (class == XYZinteration_XZEmpiricalPval). If provided, it is used to filter datasets.
`non_query_domains_N`	the number of non-query proteins with predicted domains for each dataset. Used only when `non_query_domain_results_obj` is not NULL.
`non_query_set_only`	logical. If TRUE, sequence sets for motif search contain only proteins (interacting partners of a seed) from `non_query_domain_results_obj`; if FALSE, proteins from both `non_query_domain_results_obj` and `domain_results_obj`. Used only when `non_query_domain_results_obj` is not NULL. Defaults to `main_set_only`.
`query_domains_only`	logical. If TRUE, proteins whose sequences are used for motif search must be predicted to bind the same domains in a seed protein as those predicted for the query protein. Used only when `non_query_domain_results_obj` is not NULL.
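The `options` argument takes one space-separated string of `key=value` pairs, as in the default shown under Usage. When options are varied programmatically, it can be convenient to keep them in a named R list and collapse them into that string. The helper below, `to_slimsuite_options()`, is hypothetical (it is not part of SLIMFinderR); it assumes logical values should be written as `T`/`F`, matching the default string.

```r
# Hypothetical helper (not part of SLIMFinderR): collapse a named list of
# QSLIMFinder/SLIMFinder options into the space-separated "key=value" string
# expected by the `options` argument.
to_slimsuite_options <- function(opts) {
  stopifnot(is.list(opts), !is.null(names(opts)), all(names(opts) != ""))
  vals <- vapply(opts, function(v) {
    if (is.logical(v)) {
      if (isTRUE(v)) "T" else "F"   # logicals written as T/F, as in the default string
    } else {
      as.character(v)               # numbers and strings passed through as text
    }
  }, character(1))
  paste0(names(opts), "=", vals, collapse = " ")
}

opts <- to_slimsuite_options(list(dismask = TRUE, consmask = FALSE, cloudfix = TRUE,
                                  probcut = 0.3, maxwild = 2, slimlen = 5, maxseq = 1500))
opts
```

Note that `as.character()` renders small numbers in scientific notation (for example `1e-04`), so such values are safer passed as strings.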

Details

QSLIMFinder command-line options (http://rest.slimsuite.unsw.edu.au/docs&page=module:qslimfinder)

### Basic Input/Output Options ###

seqin=FILE : Sequence file to search [None]

batch=LIST : List of files to search, wildcards allowed. (Over-ruled by seqin=FILE.) [*.dat,*.fas]

query=LIST : Return only SLiMs that occur in 1+ Query sequences (Name/AccNum/Seq Number) [1]

addquery=FILE : Adds query sequence(s) to batch jobs from FILE [None]

maxseq=X : Maximum number of sequences to process [500]

maxupc=X : Maximum UPC size of dataset to process [0]

sizesort=X : Sorts batch files by size prior to running (+1 small->big; -1 big->small; 0 none) [0]

walltime=X : Time in hours before program will abort search and exit [1.0]

resfile=FILE : Main QSLiMFinder results table [qslimfinder.csv]

resdir=PATH : Redirect individual output files to specified directory (and look for intermediates) [QSLiMFinder/]

buildpath=PATH : Alternative path to look for existing intermediate files [SLiMFinder/]

force=T/F : Force re-running of BLAST, UPC generation and SLiMBuild [False]

pickup=T/F : Pick-up from aborted batch run by identifying datasets in resfile using RunID [False]

dna=T/F : Whether the sequences files are DNA rather than protein [False]

alphabet=LIST : List of characters to include in search (e.g. AAs or NTs) [default AA or NT codes]

megaslim=FILE : Make/use precomputed results for a proteome (FILE) in fasta format [None]

megablam=T/F : Whether to create and use all-by-all GABLAM results for (gablamdis) UPC generation [False]

ptmlist=LIST : List of PTM letters to add to alphabet for analysis and restrict PTM data []

ptmdata=DSVFILE : File containing PTM data, including AccNum, ModType, ModPos, ModAA, ModCode

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

SLiMBuild Options I

efilter=T/F : Whether to use evolutionary filter [True]

blastf=T/F : Use BLAST Complexity filter when determining relationships [True]

blaste=X : BLAST e-value threshold for determining relationships [1e-4]

altdis=FILE : Alternative all by all distance matrix for relationships [None]

gablamdis=FILE : Alternative GABLAM results file [None] (!!!Experimental feature!!!)

homcut=X : Max number of homologues to allow (to reduce large multi-domain families) [0]

SLiMBuild Options II

masking=T/F : Master control switch to turn off all masking if False [True]

dismask=T/F : Whether to mask ordered regions (see rje_disorder for options) [False]

consmask=T/F : Whether to use relative conservation masking [False]

ftmask=LIST : UniProt features to mask out (True=EM,DOMAIN,TRANSMEM) []

imask=LIST : UniProt features to inversely ("inclusively") mask. (Seqs MUST have 1+ features) []

compmask=X,Y : Mask low complexity regions (same AA in X+ of Y consecutive aas) [5,8]

casemask=X : Mask Upper or Lower case [None]

motifmask=X : List (or file) of motifs to mask from input sequences []

metmask=T/F : Masks the N-terminal M (can be useful if termini=T) [True]

posmask=LIST : Masks list of position-specific aas, where list = pos1:aas,pos2:aas [2:A]

aamask=LIST : Masks list of AAs from all sequences (reduces alphabet) []

qregion=X,Y : Mask all but the region of the query from (and including) residue X to residue Y [0,-1]
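As a rough illustration of the `compmask=X,Y` rule (same amino acid in X or more of Y consecutive residues), the sketch below scans every window of Y residues and masks the over-represented residue with `X`. It is our own reading of the one-line description above, not QSLIMFinder's implementation; in particular, masking only the repeated residues (rather than the whole window) is an assumption, and `mask_low_complexity()` is a hypothetical name.

```r
# Sketch of low-complexity masking in the spirit of compmask=X,Y.
# Assumption: within any window of `y` consecutive residues, every copy of an
# amino acid occurring `x` or more times is replaced by `mask_char`.
mask_low_complexity <- function(sequence, x = 5, y = 8, mask_char = "X") {
  aa <- strsplit(sequence, "")[[1]]
  n <- length(aa)
  if (n < y) return(sequence)        # no complete window to test
  masked <- aa
  for (start in seq_len(n - y + 1)) {
    win <- start:(start + y - 1)
    counts <- table(aa[win])         # residue counts in the unmasked window
    over <- names(counts)[counts >= x]
    if (length(over) > 0) {
      masked[win[aa[win] %in% over]] <- mask_char
    }
  }
  paste(masked, collapse = "")
}

mask_low_complexity("MKTAAAAAGLLPQR")  # default compmask=5,8: "MKTXXXXXGLLPQR"
```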

SLiMBuild Options III

termini=T/F : Whether to add termini characters (^ & $) to search sequences [True]

minwild=X : Minimum number of consecutive wildcard positions to allow [0]

maxwild=X : Maximum number of consecutive wildcard positions to allow [2]

slimlen=X : Maximum length of SLiMs to return (no. non-wildcard positions) [5]

minocc=X : Minimum number of unrelated occurrences for returned SLiMs. (Proportion of UP if < 1) [0.05]

absmin=X : Used if minocc<1 to define absolute min. UP occ [3]

alphahelix=T/F : Special i, i+3/4, i+7 motif discovery [False]
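`slimlen`, `minwild` and `maxwild` constrain the shape of returned motifs: at most `slimlen` non-wildcard positions, separated by runs of between `minwild` and `maxwild` consecutive wildcards. The hypothetical helper below checks a motif string against these limits. It assumes fixed-length wildcards written as `.`, treats a bracketed ambiguity such as `[KR]` as one position, and ignores the termini characters `^` and `$`; variable-length wildcards (`wildvar=T`) are not handled.

```r
# Hypothetical check (not part of QSLIMFinder or SLIMFinderR): does a SLiM
# pattern respect slimlen / minwild / maxwild?
check_slim <- function(pattern, minwild = 0, maxwild = 2, slimlen = 5) {
  p <- gsub("[$^]", "", pattern)                       # drop termini characters
  # One token per position: "[KR]" counts as a single (ambiguous) position
  tokens <- regmatches(p, gregexpr("\\[[^]]+\\]|.", p, perl = TRUE))[[1]]
  fixed <- which(tokens != ".")                        # non-wildcard positions
  gaps <- diff(fixed) - 1                              # wildcard runs between them
  length(fixed) <= slimlen && all(gaps >= minwild) && all(gaps <= maxwild)
}

check_slim("R..P")          # TRUE: 2 fixed positions, one run of 2 wildcards
check_slim("R...P")         # FALSE: 3 consecutive wildcards > maxwild = 2
check_slim("[KR].LP.L.VI")  # FALSE: 6 fixed positions > slimlen = 5
```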

SLiMBuild Options IV

ambiguity=T/F : (preamb=T/F) Whether to search for ambiguous motifs during motif discovery [True]

ambocc=X : Min. UP occurrence for subvariants of ambiguous motifs (minocc if 0 or > minocc) [0.05]

absminamb=X : Used if ambocc<1 to define absolute min. UP occ [2]

equiv=LIST : List (or file) of TEIRESIAS-style ambiguities to use [AGS,ILMVF,FYW,FYH,KRH,DE,ST]

wildvar=T/F : Whether to allow variable length wildcards [True]

combamb=T/F : Whether to search for combined amino acid degeneracy and variable wildcards [False]

SLiMBuild Options V

musthave=LIST : Returned motifs must contain one or more of the AAs in LIST (reduces search space) []

focus=FILE : FILE containing focal groups for SLiM return (see Manual for details) [None]

focusocc=X : Motif must appear in X+ focus groups (0 = all) [0]

* See also rje_slimcalc options for occurrence-based calculations and filtering *

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

### SLiMChance Options ###

cloudfix=T/F : Restrict output to clouds with 1+ fixed motif (recommended) [False]

slimchance=T/F : Execute main QSLiMFinder probability method and outputs [True]

sigprime=T/F : Calculate more precise (but more computationally intensive) statistical model [False]

sigv=T/F : Use the more precise (but more computationally intensive) fix to mean UPC probability [False]

qexact=T/F : Calculate exact Query motif space (True) or over-estimate from dimers (False) (quicker) [True]

probcut=X : Probability cut-off for returned motifs [0.1]

maskfreq=T/F : Whether to use masked AA Frequencies (True), or (False) mask after frequency calculations [False]

aafreq=FILE : Use FILE to replace individual sequence AAFreqs (FILE can be sequences or aafreq) [None]

aadimerfreq=FILE : Use empirical dimer frequencies from FILE (fasta or *.aadimer.tdt) (!!!Experimental!!!) [None]

negatives=FILE : Multiply raw probabilities by under-representation in FILE (!!!Experimental!!!) [None]

smearfreq=T/F : Whether to "smear" AA frequencies across UPC rather than keep separate AAFreqs [False]

seqocc=T/F : Whether to upweight for multiple occurrences in same sequence (heuristic) [False]

probscore=X : Score to be used for probability cut-off and ranking (Prob/Sig) [Sig]

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

Advanced Output Options I

clouds=X : Identifies motif "clouds" which overlap at 2+ positions in X+ sequences (0=minocc / -1=off) [2]

runid=X : Run ID for resfile (allows multiple runs on same data) [DATE:TIME]

logmask=T/F : Whether to log the masking of individual sequences [True]

slimcheck=FILE : Motif file/list to add to resfile output []

Advanced Output Options II

teiresias=T/F : Replace TEIRESIAS, making *.out and *.mask.fasta files [False]

slimdisc=T/F : Emulate SLiMDisc output format (*.rank & *.dat.rank + TEIRESIAS *.out & *.fasta) [False]

extras=X : Whether to generate additional output files (alignments etc.) [1]

- -1 = No output beyond main results file

- 0 = Generate occurrence file and cloud file

- 1 = Generate occurrence file, alignments and cloud file

- 2 = Generate all additional QSLiMFinder outputs

- 3 = Generate SLiMDisc emulation too (equiv extras=2 slimdisc=T)

targz=T/F : Whether to tar and zip dataset result files (UNIX only) [False]

savespace=X : Delete "unnecessary" files following run (best used with targz): [0]

- 0 = Delete no files

- 1 = Delete all bar *.upc and *.pickle

- 2 = Delete all bar *.upc (pickle added to tar)

- 3 = Delete all dataset-specific files including *.upc and *.pickle (not *.tar.gz)

Advanced Output Options III

topranks=X : Will only output top X motifs meeting probcut [1000]

minic=X : Minimum information content for returned motifs [2.1]

allsig=T/F : Whether to also output all SLiMChance combinations (Sig/SigV/SigPrime/SigPrimeV) [False]
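The default `options` string under Usage changes several of the QSLIMFinder defaults listed above. The sketch below parses that string into a named vector and reports which settings differ from the defaults documented in this list; the `defaults` vector is transcribed by hand from the entries above, and `parse_options()` is a hypothetical helper. `iuchdir` has no entry in this list, so it is reported separately rather than compared.

```r
# Parse a space-separated "key=value" option string into a named character vector
parse_options <- function(x) {
  tokens <- strsplit(trimws(x), "\\s+")[[1]]
  vals <- sub("^[^=]*=", "", tokens)   # text after the first "="
  names(vals) <- sub("=.*$", "", tokens)
  vals
}

usage_options <- "dismask=T consmask=F cloudfix=T probcut=0.3 minwild=0 maxwild=2 slimlen=5 alphahelix=F maxseq=1500 savespace=1 iuchdir=T"

# QSLIMFinder defaults for these keys, copied from the option list above
# (True/False written as T/F to match the option string)
defaults <- c(dismask = "F", consmask = "F", cloudfix = "F", probcut = "0.1",
              minwild = "0", maxwild = "2", slimlen = "5", alphahelix = "F",
              maxseq = "500", savespace = "0")

opts <- parse_options(usage_options)
common <- intersect(names(opts), names(defaults))
changed <- common[opts[common] != defaults[common]]
undocumented <- setdiff(names(opts), names(defaults))

changed       # "dismask" "cloudfix" "probcut" "maxseq" "savespace"
undocumented  # "iuchdir"
```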

Memory requirements for jobs: how much memory is enough for most jobs?

files = list.files("./qslimfinder.Full_IntAct.FALSE/log_dir/log/")
times = sapply(files, function(file) system(paste0("cat ./qslimfinder.Full_IntAct.FALSE/log_dir/log/", file, " | grep Requested"), intern = T))

> table(times)
times
100.00 MB 200.00 MB 300.00 MB 400.00 MB 500.00 MB
       14        61        53        25         3
> table(times) / sum(table(times))
times
 100.00 MB  200.00 MB  300.00 MB  400.00 MB  500.00 MB
0.08974359 0.39102564 0.33974359 0.16025641 0.01923077
> cumsum(table(times) / sum(table(times)))
 100.00 MB  200.00 MB  300.00 MB  400.00 MB  500.00 MB
0.08974359 0.48076923 0.82051282 0.98076923 1.00000000
> 1 - cumsum(table(times) / sum(table(times)))
 100.00 MB  200.00 MB  300.00 MB  400.00 MB  500.00 MB
0.91025641 0.51923077 0.17948718 0.01923077 0.00000000

In this run, about 82% of the 156 jobs requested at most 300 MB and 98% at most 400 MB.
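The distribution above can be used to gauge the defaults `memory_start = 350` and `memory_step = 100`. The sketch below assumes that each failed job is resubmitted with `memory_step` more memory (so attempt k gets `memory_start + (k - 1) * memory_step`) and treats the Requested value from each log as the memory that job needs; both are simplifying assumptions, and `attempts_needed()` is a hypothetical helper.

```r
# Job counts per requested-memory level, copied from table(times) above
need_mb <- c(100, 200, 300, 400, 500)
n_jobs  <- c(14, 61, 53, 25, 3)

# Number of submissions until a job's memory reaches `need`, assuming memory
# grows linearly: attempt k gets memory_start + (k - 1) * memory_step
attempts_needed <- function(need, memory_start = 350, memory_step = 100) {
  1 + ceiling(pmax(0, need - memory_start) / memory_step)
}

att <- attempts_needed(need_mb)
att                                  # 1 1 1 2 3
sum(n_jobs[att > 1]) / sum(n_jobs)   # share of jobs resubmitted at least once (28/156)
sum(att * n_jobs)                    # total submissions for 156 jobs: 187
```

Under these assumptions about 18% of jobs would need at least one resubmission, which matches the share of jobs above 300 MB in the last line of the output above.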

Value

Path to the RData file containing all objects used by this pipeline.

Author(s)

Vitalii Kleshchevnikov

See Also

listInteractionSubsetFASTA, runQSLIMFinder, groupQSLIMFinderCommand, mQSLIMFinderCommand, runCompariMotif3, readQSLIMFinderMain, readQSLIMFinderOccurence, writeInteractionSubsetFASTA_list, domainProteinPairMatch, filterInteractionSubsetFASTA_list, removeInteractionNoFASTA, centerDomains

vitkl/SLIMFinderR documentation built on May 3, 2019, 8:08 p.m.
