knitr::opts_chunk$set(
  collapse = FALSE,
  results = 'hold',
  message = FALSE,
  comment = "",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

RBedtools

RBedtools is a light wrapper for the bedtools software suite.

I wrote this package for several reasons

Installation

You can install RBedtools from GitHub with:

# install.packages("devtools")
devtools::install_github("vinay-swamy/RBedtools")

Remember to install bedtools as well. Instructions can be found in the bedtools documentation.

Design

I tried to make this package as simple to use as I could. I assume that the user has some knowledge of how bedtools works, so there is very little error checking of bedtools input. If you get an error message from bedtools rather than from this package, it's likely your bedtools command was wrong. The actual format of bed files and data frames is not checked, i.e. whether coordinates are 0-based, the number of columns, etc.
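Since format checking is left to the user, a quick sanity check on a data frame before handing it to the package can catch the most common problems. The sketch below is only an illustration in base R: check_bed_df is a hypothetical helper, not part of RBedtools, and it assumes the first three columns are chromosome, start and end.

```r
# Hypothetical helper, not part of RBedtools: basic sanity checks on a
# BED-like data frame (chrom, start, end in the first three columns).
check_bed_df <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 3)
  start <- df[[2]]
  end <- df[[3]]
  stopifnot(is.numeric(start), is.numeric(end))
  # BED coordinates are 0-based and half-open, so start >= 0 and start <= end
  bad <- which(is.na(start) | is.na(end) | start < 0 | start > end)
  if (length(bad) > 0) {
    stop("invalid BED interval(s) at row(s): ", paste(bad, collapse = ", "))
  }
  invisible(df)
}

bed <- data.frame(chrom = c("chr1", "chr1"), start = c(0, 100), end = c(50, 200))
check_bed_df(bed)  # returns the data frame invisibly when all rows pass
```

A check like this does not verify that the coordinates really are 0-based; it only rejects intervals that cannot be valid under any convention.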

objects

All RBedtools objects are simple S3 objects.

bt_obj: an object that contains a character vector that is the path to a bed file.

bt_cmd: an object containing a stored and unrun bedtools command. See below for more info.
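To make the idea of a "simple S3 object" concrete, here is a minimal sketch of how such objects could be built in base R. The constructors new_bt_obj and new_bt_cmd are hypothetical names used for illustration; the package's real internals may differ.

```r
# Illustrative constructors only; not the package's actual code.
new_bt_obj <- function(path) {
  stopifnot(is.character(path), length(path) == 1)
  # a character path tagged with the class "bt_obj"
  structure(path, class = "bt_obj")
}

new_bt_cmd <- function(cmd) {
  stopifnot(is.character(cmd), length(cmd) == 1)
  # a command string tagged with the class "bt_cmd"; nothing is executed here
  structure(cmd, class = "bt_cmd")
}

obj <- new_bt_obj("/tmp/example.bed")
class(obj)    # "bt_obj"
unclass(obj)  # "/tmp/example.bed"
```

With objects built this way, unclass() simply strips the class attribute and shows the underlying path or command string.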

functions

RBedtools' core function is RBedtools

library(RBedtools)
args('RBedtools')

tool is the bedtools function you would like to use, e.g. intersect, sort, closest, etc.

options are the options for the selected tool. They must be formatted as a single character string in the following way: '-option1 input1 -option2 input2'. The default is no options.

output can be one of three options: blank, a character vector specifying a location to write a bed file, or 'stdout', signifying that the output of this bedtools command will be piped to another command. More on that below.

... The actual arguments for the bedtools function, in the following format: arg1='<path_to_bed>', arg2='<path_to_bed>'
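As a rough illustration of how these pieces map onto a bedtools command line, the sketch below assembles a command string from a tool, an options string and named arguments. build_bedtools_cmd is a hypothetical function written for this example, not the package's implementation, and it does no shell quoting of paths.

```r
# Hypothetical sketch: turn tool, options and named arguments into a
# bedtools command line. Not the package's actual implementation.
build_bedtools_cmd <- function(tool, options = "", ...) {
  args <- list(...)
  if (length(args) == 0) {
    arg_str <- ""
  } else {
    # each named argument becomes "-name value", e.g. a = "a.bed" -> "-a a.bed"
    arg_str <- paste0("-", names(args), " ", unlist(args), collapse = " ")
  }
  parts <- c("bedtools", tool, options, arg_str)
  # drop empty pieces so no double spaces appear in the command
  paste(parts[nzchar(parts)], collapse = " ")
}

build_bedtools_cmd("intersect", options = "-loj", a = "a.bed", b = "b.bed")
# "bedtools intersect -loj -a a.bed -b b.bed"
```

This shows why options are written exactly as on the command line: under this assumption they are pasted into the command unchanged.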

helpers

There are two helper functions for working with data frames: from_data_frame and to_data_frame.

from_data_frame takes a data frame as an input and returns a bedtools object

to_data_frame takes a bedtools object and returns a data frame
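Conceptually, moving between a data frame and a bed file is a tab-delimited write and read with no header. The base R sketch below illustrates that round trip; df_to_bed_file and bed_file_to_df are hypothetical stand-ins and are not how from_data_frame and to_data_frame are necessarily implemented.

```r
# Hypothetical stand-ins for the helpers, using base R only.
df_to_bed_file <- function(df, path = tempfile(fileext = ".bed")) {
  # bed files are tab-delimited and carry no header row or row names
  write.table(df, path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  path
}

bed_file_to_df <- function(path) {
  read.table(path, sep = "\t", header = FALSE, stringsAsFactors = FALSE)
}

bed <- data.frame(chrom = c("chr1", "chr2"), start = c(0L, 10L), end = c(5L, 20L))
path <- df_to_bed_file(bed)
round_trip <- bed_file_to_df(path)
head(round_trip)
```

Because a bed file has no header, column names are lost in the round trip and come back as R's defaults (V1, V2, ...).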

examples

Sort a bed file. Specifying no output writes the result to a file in the /tmp/ directory.

library(RBedtools)
sorted.bed <- RBedtools(tool = 'sort', i='data/gerp.chr1.bed.gz')
unclass(sorted.bed)

Otherwise, the bed file is written to the path given in the output argument.

sorted.bed <- RBedtools(tool = 'sort', output = 'sorted.bed', i='data/gerp.chr1.bed.gz')

Use the to_data_frame function to convert a bt_obj to a data frame.

sorted.bed <- RBedtools(tool = 'sort', i='data/gerp.chr1.bed.gz')
sorted.bed.df <- to_data_frame(sorted.bed)
head(sorted.bed.df)

Piping is also supported (and encouraged).

suppressMessages(library(dplyr))
sorted.bed.df <-  RBedtools(tool = 'sort', i='data/gerp.chr1.bed.gz') %>% to_data_frame
head(sorted.bed.df)

Use a single character string in options to specify options for the bedtools command. Format the options as you would on the command line.

intersect_loj <- RBedtools(tool = 'intersect',
                            options = '-loj',
                            a='data/gerp.chr1.bed.gz',
                            b='data/aluY.chr1.bed.gz') %>% to_data_frame
head(intersect_loj)

Using an R data frame

closest <- sorted.bed.df %>% from_data_frame %>% 
    RBedtools('closest', a=., b='data/gerp.chr1.bed.gz') %>% to_data_frame
head(closest)

An RBedtools command can also be piped to another RBedtools command by using 'stdout' as the output of the first command. When using 'stdout', use the . placeholder for one of the arguments in the second command.

sort_and_intersect <- RBedtools(tool = 'sort', output = 'stdout', i='data/gerp.chr1.bed.gz') %>% 
    RBedtools(tool = 'intersect', a=., b='data/aluY.chr1.bed.gz') %>% 
    to_data_frame 
head(sort_and_intersect)

An important note about piping multiple RBedtools commands

Using 'stdout' makes RBedtools return a bt_cmd object, which is a constructed but unrun command.

output_for_pipe <-  RBedtools(tool = 'sort', output = 'stdout', i='data/gerp.chr1.bed.gz')
class(output_for_pipe)

Commands are not run until the final RBedtools command in the chain is run. This allows for the construction of a single command that uses shell pipes (|).

unclass(output_for_pipe)

If you do not specify 'stdout' when chaining commands, the code will still run, but the output of each command will be written to a file and read back each time, which may significantly slow down your code.
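The difference can be pictured as building one shell pipeline instead of running each step separately. The sketch below only builds a command string; pipe_cmds is a hypothetical helper for illustration, and the exact string RBedtools constructs may differ. It relies on bedtools accepting stdin as a file name for piped input.

```r
# Hypothetical sketch of chaining: an unrun command is joined to the next
# one with a shell pipe, and the piped input is referred to as "stdin".
pipe_cmds <- function(first_cmd, next_cmd) {
  paste(first_cmd, "|", next_cmd)
}

sort_cmd <- "bedtools sort -i data/gerp.chr1.bed.gz"
intersect_cmd <- "bedtools intersect -a stdin -b data/aluY.chr1.bed.gz"

full_cmd <- pipe_cmds(sort_cmd, intersect_cmd)
full_cmd
# "bedtools sort -i data/gerp.chr1.bed.gz | bedtools intersect -a stdin -b data/aluY.chr1.bed.gz"
```

Run as a single pipeline, the sorted intervals stream directly into intersect, so no intermediate file has to be written and read back.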

Full disclaimer: I haven't tested this on all the bedtools functions. I use sort, intersect and closest the most, so these are the ones I tested. If you find a bug or have any improvements, let me know.



vinay-swamy/RBedtools documentation built on Feb. 19, 2020, 5:15 p.m.