intersect.bedtools: Intersect two or more bed files (by 'bedtools intersect'...

View source: R/intersect.bedtools.R

intersect.bedtools    R Documentation

Intersect two or more bed files (by bedtools intersect function).

Description

This function runs a command line that uses bedtools intersect to intersect a file A with one or more B files.

Usage

intersect.bedtools(
  a,
  b,
  outputFileName = paste(getwd(), "intersected.bed", sep = "/"),
  abam = FALSE,
  ubam = FALSE,
  bed = FALSE,
  wa = FALSE,
  wb = FALSE,
  loj = FALSE,
  wo = FALSE,
  wao = FALSE,
  u = FALSE,
  c = FALSE,
  C = FALSE,
  v = FALSE,
  f = NULL,
  F. = NULL,
  r = FALSE,
  e = FALSE,
  s = FALSE,
  S = FALSE,
  split = FALSE,
  sorted = FALSE,
  g = NULL,
  srun = FALSE,
  intersect.bedtools.command = paste0("/home/", Sys.getenv("USERNAME"),
    "/anaconda3/bin/intersectBed"),
  return.command = FALSE,
  return.bed = FALSE,
  delete.output = FALSE,
  run.command = TRUE
)

Arguments

a

A single string defining the BAM/BED/GFF/VCF file "A". Each feature in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe.

b

A character vector with one or more BAM/BED/GFF/VCF file(s) "B". It can also be a single string containing wildcard (*) character(s).

outputFileName

Full path to the output file. By default <working.directory>/intersected.bed.

abam

Logical value defining whether file A is a BAM file. Each BAM alignment in A is compared to B in search of overlaps. By default FALSE.

ubam

Logical value defining whether to write the output as uncompressed BAM. The default is to write compressed BAM output (ubam = FALSE).

bed

Logical value defining whether to write the output as BED when using a BAM input (abam = TRUE). The default is to write the output as BAM (bed = FALSE).

wa

Logical value defining whether to write the original entry in A for each overlap. By default FALSE.

wb

Logical value defining whether to write the original entry in B for each overlap. Useful for knowing what A overlaps. Restricted by -f and -r. By default FALSE.

loj

Logical value defining whether to perform a "left outer join". That is, for each feature in A, report each overlap with B. If no overlaps are found, report a NULL feature for B. By default FALSE.

wo

Logical value defining whether to write the original A and B entries plus the number of base pairs of overlap between the two features. Only A features with overlap are reported. Restricted by -f and -r. By default FALSE.

wao

Logical value defining whether to write the original A and B entries plus the number of base pairs of overlap between the two features. Unlike wo, A features without overlap are also reported, with a NULL B feature and overlap = 0. Restricted by -f and -r. By default FALSE.

u

Logical value defining whether to write the original A entry once if any overlaps are found in B. In other words, just report the fact that at least one overlap was found in B. Restricted by -f and -r. By default FALSE.

c

Logical value defining whether to report, for each entry in A, the number of hits in B. Reports 0 for A entries that have no overlap with B. Restricted by -f, -F, -r, and -s. By default FALSE.

C

Logical value defining whether to report, for each entry in A, the number of overlaps with each B file separately, on a distinct line. Reports 0 for A entries that have no overlap with B. Overlaps restricted by -f, -F, -r, and -s. By default FALSE.

v

Logical value defining whether to report only those entries in A that have no overlap in B. Restricted by -f and -r. By default FALSE.

f

Numeric value defining the minimum overlap required as a fraction of A. By default NULL; the bedtools default is 1E-9 (i.e., 1 bp).

F.

Numeric value defining the minimum overlap required as a fraction of B. By default NULL; the bedtools default is 1E-9 (i.e., 1 bp).

r

Logical value defining whether the fraction of overlap (parameter f) must be reciprocal for A and B. In other words, if -f is 0.90 and -r is used, this requires that B overlaps at least 90% of A and that A also overlaps at least 90% of B. By default FALSE.

e

Logical value defining whether the minimum fraction must be satisfied for A OR B. In other words, if -e is used with -f 0.90 and -F 0.10, this requires that either 90% of A is covered OR 10% of B is covered. Without -e, both fractions would have to be satisfied. By default FALSE.

s

Logical value defining whether to force "strandedness". That is, only report hits in B that overlap A on the same strand. Without this option, overlaps are reported without respect to strand. By default FALSE.

S

Logical value defining whether to require different strandedness. That is, only report hits in B that overlap A on the opposite strand. Without this option, overlaps are reported without respect to strand. By default FALSE.

split

Logical value defining whether to treat "split" BAM (i.e., having an "N" CIGAR operation) or BED12 entries as distinct BED intervals. By default FALSE.

sorted

Logical value defining whether to invoke, for very large B files, a "sweeping" algorithm that requires position-sorted input. When using -sorted, memory usage remains low even for very large files. By default FALSE. A bed file can be sorted in a terminal with sort -k1,1 -k2,2n unsorted.bed > sorted.bed, or with the function sort.bed.

g

Genome file that defines the expected chromosome order in the input files, for use with the -sorted option. By default NULL.

srun

Logical value defining whether the command should be run in srun mode. By default FALSE.

intersect.bedtools.command

String defining the command used to call the bedtools intersect function. An example: "/home/user/anaconda3/bin/intersectBed". By default "/home/USERNAME/anaconda3/bin/intersectBed", where USERNAME is the value of Sys.getenv("USERNAME").

return.command

Logical value defining whether to return the string corresponding to the bedtools command. By default FALSE.

return.bed

Logical value defining whether to return the resulting bed as a data.frame. By default FALSE. This parameter is not active when the inputs are BAM files.

delete.output

Logical value defining whether to delete the exported intersected bed file. By default FALSE. This parameter is active only when return.bed = TRUE. Useful when it is sufficient to get the result as a data.frame without saving it.

run.command

Logical value defining whether to run the command line in the system terminal and generate the bed file resulting from the intersection. By default TRUE.
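
The default for intersect.bedtools.command is built from the USERNAME environment variable, which may be empty on some systems, and b accepts either a wildcard string or an explicit character vector. The following is a minimal base R sketch of how these two arguments might be prepared before the call. It assumes the executable is named intersectBed and is on the PATH; the toy directory and file names are hypothetical and created only for illustration.

```r
# Look up the executable on the PATH.
# Sys.which() returns "" when the program is not found.
bedtools_cmd <- unname(Sys.which("intersectBed"))
if (!nzchar(bedtools_cmd)) {
  message("intersectBed not found on PATH; set the full path manually.")
}

# Expand a wildcard into an explicit character vector for 'b'.
# The directory and files below are toy examples created on the fly.
bed_dir <- file.path(tempdir(), "toy_beds")
dir.create(bed_dir, showWarnings = FALSE)
invisible(file.create(file.path(bed_dir, c("rep1.bed", "rep2.bed", "notes.txt"))))

b_files <- Sys.glob(file.path(bed_dir, "*.bed"))
print(basename(b_files))
```

The resulting bedtools_cmd and b_files could then be passed as intersect.bedtools.command and b, respectively.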

Details

To know more about the bedtools intersect function, see the bedtools manual at the following link:
https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html.
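
The interplay of f, F., r and e can be illustrated with plain arithmetic. The sketch below does not call bedtools; it only computes the overlap of two toy intervals in BED coordinates (0-based start, exclusive end) and evaluates the conditions described in the Arguments section. It assumes the thresholds are inclusive, and the interval values are made up for illustration.

```r
# Two toy intervals in BED coordinates (0-based start, end exclusive).
a <- c(start = 100, end = 200)    # length 100
b <- c(start = 105, end = 1000)   # length 895

# Number of overlapping base pairs (0 if the intervals do not overlap).
overlap_bp <- max(0, min(a[["end"]], b[["end"]]) - max(a[["start"]], b[["start"]]))

frac_a <- overlap_bp / (a[["end"]] - a[["start"]])  # fraction of A covered
frac_b <- overlap_bp / (b[["end"]] - b[["start"]])  # fraction of B covered

# f = 0.90 only: 95% of A is covered, so the overlap is kept.
pass_f_only <- frac_a >= 0.90

# f = 0.90 with r = TRUE: both A and B must be covered by at least 90%.
pass_recip <- frac_a >= 0.90 && frac_b >= 0.90

# f = 0.90, F. = 0.10 with e = TRUE: either condition is sufficient.
pass_either <- frac_a >= 0.90 || frac_b >= 0.10

print(c(overlap_bp = overlap_bp, frac_a = frac_a, frac_b = round(frac_b, 3)))
print(c(f_only = pass_f_only, reciprocal = pass_recip, either = pass_either))
```

Here the reciprocal requirement fails because the overlap covers only a small part of the much longer B interval.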

Value

The function generates the file indicated by outputFileName. If requested, it also returns the command line used (return.command = TRUE) and/or the resulting intersected bed as a data.frame (return.bed = TRUE). If both are requested, the output is a named list with two elements: "command" and "intersected.bed".
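
As a hedged illustration of the return options, the sketch below sets up a "dry run" that asks for the command string without executing it. It assumes that the Rseb package is installed and exports intersect.bedtools, and that combining return.command = TRUE with run.command = FALSE returns the command string without calling bedtools; this behavior is inferred from the argument descriptions and is not confirmed here. The toy input files are hypothetical.

```r
# Toy inputs written to temporary files so that the paths exist.
a_file <- tempfile(fileext = ".bed")
b_file <- tempfile(fileext = ".bed")
writeLines("chr1\t100\t200", a_file)
writeLines("chr1\t150\t250", b_file)

# Arguments for a dry run: build the command, but do not execute it.
dry_run_args <- list(
  a = a_file,
  b = b_file,
  wa = TRUE,
  outputFileName = tempfile(fileext = ".bed"),
  return.command = TRUE,
  run.command = FALSE
)

# Only call the function when Rseb is available (assumption: it is exported).
if (requireNamespace("Rseb", quietly = TRUE)) {
  cmd <- do.call(Rseb::intersect.bedtools, dry_run_args)
  print(cmd)
} else {
  message("Rseb is not installed; skipping the call.")
}
```

Inspecting the command string this way can help check the selected options before running a long intersection.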

Examples

intersect.bedtools(a = "bed_file1.bed",
                   b = c("bed_file2.bed", "bed_file3.bed"),
                   wb = TRUE,
                   intersect.bedtools.command = "/home/user/anaconda3/bin/intersectBed")

intersect.bedtools(a = "bed_file1.bed",
                   b = c("bed_file2.bed", "bed_file3.bed"),
                   wa = TRUE,
                   return.bed = TRUE,
                   delete.output = TRUE,
                   intersect.bedtools.command = "/home/user/anaconda3/bin/intersectBed")
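
When sorted = TRUE is used, the inputs must be position-sorted. The following base R sketch shows one way to sort a small BED table by chromosome and start, roughly mirroring sort -k1,1 -k2,2n. It is not the package's sort.bed function; the toy records are made up, and byte-wise chromosome ordering (method = "radix") is an assumption that should match the order expected by bedtools or by the genome file given in g.

```r
# Toy unsorted BED records (chromosome, 0-based start, end).
bed <- data.frame(
  chrom = c("chr2", "chr1", "chr10", "chr1"),
  start = c(500L, 300L, 50L, 100L),
  end   = c(600L, 400L, 150L, 200L),
  stringsAsFactors = FALSE
)

# Order by chromosome name (byte-wise, C-locale ordering via "radix"),
# then by numeric start position.
sorted_bed <- bed[order(bed$chrom, bed$start, method = "radix"), ]

# Write a tab-delimited file without header, row names or quotes.
sorted_path <- tempfile(fileext = ".bed")
write.table(sorted_bed, sorted_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

print(readLines(sorted_path))
```

The path in sorted_path could then be used as a or b together with sorted = TRUE.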


sebastian-gregoricchio/Rseb documentation built on May 15, 2024, 5:45 a.m.