call_homer: Motif Enrichment with Homer


Description

Call the findMotifsGenome.pl script from Homer directly from R.

Usage

call_homer(pos_file, genome, output_dir = tempdir(), mask = NULL,
  bg = NULL, chopify = NULL, len = NULL, size = NULL, S = NULL,
  mis = NULL, norevopp = NULL, rna = NULL, mset = NULL, bits = NULL,
  mcheck = NULL, mknown = NULL, gc = NULL, cpg = NULL,
  noweight = NULL, h = NULL, N = NULL, local = NULL, redundant = NULL,
  maxN = NULL, maskMotif = NULL, rand = NULL, ref = NULL,
  oligo = NULL, dumpFasta = NULL, preparse = NULL, preparsedDir = NULL,
  keepFiles = NULL, fdr = NULL, nlen = NULL, nmax = NULL,
  neutral = NULL, olen = NULL, p = NULL, e = NULL, cache = NULL,
  quickMask = NULL, minlp = NULL)

Arguments

pos_file

(GenomicRanges object holding the target positions)

genome

<genome> (name of an installed Homer genome) or path to a FASTA file

output_dir

Path to the output directory for the Homer analysis. Defaults to tempdir().

mask

(mask repeats/lower case sequence, can also add 'r' to genome, e.g. mm9r)

bg

<background position file> (genomic positions to be used as background, default=automatic); background positions overlapping with target positions are removed

chopify

(chop up large background regions to the avg size of target regions)

len

<#>[,<#>,<#>...] (motif length, default=8,10,12) [NOTE: values greater than 12 may cause the program to run out of memory - in these cases decrease the number of sequences analyzed (-N), or try analyzing shorter sequence regions (e.g. -size 100)]

size

<#> (fragment size to use for motif finding, default=200), <#,#> (e.g. -size -100,50 will get sequences from -100 to +50 relative to the center), or given (uses the exact regions you give it)

S

<#> (Number of motifs to optimize, default: 25)

mis

<#> (global optimization: searches for strings with # mismatches, default: 2)

norevopp

(don't search reverse strand for motifs)

rna

(output RNA motif logos and compare to RNA motif database, automatically sets -norevopp)

mset

<vertebrates|insects|worms|plants|yeast|all> (check against motif collections, default: auto)

bits

(scale sequence logos by information content, default: doesn't scale)

mcheck

<motif file> (known motifs to check against de novo motifs)

mknown

<motif file> (known motifs to check for enrichment)

gc

(use GC-percentage for sequence content normalization, now the default)

cpg

(use CpG-percentage instead of GC-percentage for sequence content normalization)

noweight

(no CG correction)

h

(use hypergeometric for p-values, binomial is default)

N

<#> (Number of sequences to use for motif finding, default=max(50k, 2x input))

local

<#> (use local background, # of equal size regions around peaks to use, e.g. 2)

redundant

<#> (Remove redundant sequences matching greater than # percent, e.g. -redundant 0.5)

maxN

<#> (maximum percentage of N's in sequence to consider for motif finding, default: 0.7)

maskMotif

<motif file1> [motif file 2]... (motifs to mask before motif finding)

rand

(randomize target and background sequences labels)

ref

<peak file> (use file for target and background - first argument is list of peak ids for targets)

oligo

(perform analysis of individual oligo enrichment)

dumpFasta

(Dump fasta files for target and background sequences for use with other programs)

preparse

(force new background files to be created)

preparsedDir

<directory> (location to search for preparsed file and/or place new files)

keepFiles

(keep temporary files)

fdr

<#> (Calculate empirical FDR for de novo discovery #=number of randomizations)

nlen

<#> (length of lower-order oligos to normalize in background, default: -nlen 3)

nmax

<#> (Max normalization iterations, default: 160)

neutral

(weight sequences to neutral frequencies, i.e. 25%, 6.25%, etc.)

olen

<#> (lower-order oligo normalization for oligo table, use if -nlen isn't working well)

p

<#> (Number of processors to use, default: 1)

e

<#> (Maximum expected motif instance per bp in random sequence, default: 0.01)

cache

<#> (size in MB for statistics cache, default: 500)

quickMask

(skip full masking after finding motifs, similar to original homer)

minlp

<#> (stop looking for motifs when seed logp score gets above #, default: -10)
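The two-value form of size described above can be illustrated with a little arithmetic. This is a minimal sketch, not Homer code: the helper size_window is hypothetical, it assumes the region center is the midpoint rounded down, and it ignores strand.

```r
# Hypothetical illustration, not part of Homer or homeR: compute the window
# that "-size <left>,<right>" would select around the center of a region.
size_window <- function(start, end, left, right) {
  # Assumption: the center is the midpoint of the region, rounded down.
  center <- floor((start + end) / 2)
  c(start = center + left, end = center + right)
}

# A region spanning 1000-1200 with -size -100,50
w <- size_window(start = 1000, end = 1200, left = -100, right = 50)
print(w)
```

With these assumptions the window runs from 100 bp upstream of the center to 50 bp downstream of it, regardless of how wide the original region was.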

Details

A simple R wrapper for Homer's findMotifsGenome.pl. Instead of command-line flags, it takes R arguments, which are pasted into a Homer command. Flags that modify the output format are not implemented: -nomotif, -find, -enhancers, -enhancersOnly, -basic, -nocheck, -noknown, -nofacts, -opt, -peaks, -homer2.
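To make the idea of pasting R arguments into a Homer command concrete, here is a minimal sketch in base R. It is not the package's actual implementation: the helper build_homer_command is hypothetical, and the conventions it uses (TRUE emits a bare flag, NULL or FALSE drops the flag, vectors are joined with commas) are assumptions that call_homer may not share.

```r
# Hypothetical helper, not part of homeR: turn named R arguments into a
# findMotifsGenome.pl command string.
build_homer_command <- function(pos_file, genome, output_dir, ...) {
  opts <- list(...)
  # Drop arguments left as NULL (mirroring the NULL defaults in Usage)
  # and arguments explicitly set to FALSE.
  opts <- Filter(function(v) !is.null(v) && !identical(v, FALSE), opts)
  flags <- vapply(names(opts), function(nm) {
    val <- opts[[nm]]
    if (isTRUE(val)) {
      paste0("-", nm)                                  # switch, e.g. -mask
    } else {
      paste0("-", nm, " ", paste(val, collapse = ",")) # e.g. -len 8,10,12
    }
  }, character(1))
  # Note: paths are not shell-quoted in this sketch.
  paste(c("findMotifsGenome.pl", pos_file, genome, output_dir, flags),
        collapse = " ")
}

cmd <- build_homer_command("peaks.bed", "mm9", "homer_out",
                           size = 200, len = c(8, 10, 12),
                           mask = TRUE, bg = NULL)
print(cmd)
```

The file names peaks.bed and homer_out are placeholders. The resulting string could then be handed to something like system(), provided Homer is installed and on the PATH.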

Saves all temporary files to output_dir. Note that these files are only deleted when the R session is closed, which can in some cases lead to files from previous runs being reloaded.
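One way to reduce the risk of reloading files from a previous run is to give every run its own empty directory. The sketch below uses only base R; the helper fresh_output_dir is hypothetical and not part of homeR.

```r
# Hypothetical helper, not part of homeR: create a new, empty directory
# for each run so files from an earlier run cannot be picked up again.
fresh_output_dir <- function(prefix = "homer_") {
  path <- tempfile(pattern = prefix)  # unique, unused path inside tempdir()
  dir.create(path)
  path
}

out1 <- fresh_output_dir()
out2 <- fresh_output_dir()
```

Each path could then be passed as output_dir = out1 (or out2) in separate calls. Directories created this way still live under tempdir(), so they are removed when the R session ends.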

Value

List with output: the command line used, known motifs, Homer motifs (de novo) and Homer PWMs.

Author(s)

Malte Thodberg

See Also

GR_to_BED, tempdir


MalteThodberg/homeR documentation built on May 7, 2019, 2:09 p.m.