find_bed_regions: Find genes, exons, and introns in a gff3 file


Description

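Finds genes, exons, and introns in a gff3 file and returns them as dataframes in "bed" format, or writes them out as tsv files. If tsv files are written out by selecting "write_all" for 'out_type', they will overwrite any existing files with the same name in 'out_dir'.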

Usage

find_bed_regions(
  gff3_file,
  source_select = NULL,
  gene_label = "gene",
  exon_label = "exon",
  verbose = FALSE,
  prefix = NULL,
  out_dir = NULL,
  out_type = c("genes", "introns", "exons", "write_all"),
  ...
)

Arguments

gff3_file

Path to input file in 'gff3' format.

source_select

Character vector; only use regions from these sources. Must match values in 'source' column of gff3 file. Optional.

gene_label

String; value used to indicate genes in gff3 file. Must match at least one value in 'type' column of gff3 file. Default "gene".

exon_label

String; value used to indicate exons in gff3 file. Must match at least one value in 'type' column of gff3 file. Default "exon".
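Because 'source_select', 'gene_label', and 'exon_label' must match values in the file, it can help to inspect the 'source' and 'type' columns before calling the function. The following is a minimal sketch in base R. The toy gff3 lines are made up for illustration, and the column names follow the standard nine-column gff3 layout rather than anything specific to this package.

```r
# Toy gff3 content (hypothetical; not taken from the package's example file).
gff3_lines <- c(
  "##gff-version 3",
  "1\taraport11\tgene\t3631\t5899\t.\t+\t.\tID=gene:AT1G01010",
  "1\taraport11\texon\t3631\t3913\t.\t+\t.\tParent=transcript:AT1G01010.1",
  "1\taraport11\texon\t3996\t4276\t.\t+\t.\tParent=transcript:AT1G01010.1"
)

# Read the nine standard gff3 columns, skipping '#' header lines.
gff3 <- read.delim(
  text = gff3_lines,
  header = FALSE,
  comment.char = "#",
  stringsAsFactors = FALSE,
  col.names = c("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")
)

# Candidate values for 'source_select'
unique(gff3$source)

# Candidate values for 'gene_label' and 'exon_label'
table(gff3$type)
```

For a real file, replace the 'text' argument with the path to the gff3 file.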

verbose

Logical; should 'bedr' functions output all messages?

prefix

String; prefix to attach to tsv files if 'out_type' is "write_all".

out_dir

Directory to write tsv files if 'out_type' is "write_all".

out_type

Type of output to return:
  "genes": dataframe in "bed" format of genes.
  "introns": dataframe in "bed" format of introns.
  "exons": dataframe in "bed" format of exons.
  "write_all": write tab-separated files for each of 'genes', 'introns', and 'exons' to 'out_dir'. The hash digest of the combined genes, introns, and exons will be returned.
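This page does not say how introns are derived. One common approach is to treat them as the parts of a gene left over after removing its exons. The sketch below illustrates that idea for a single gene in base R. The helper 'gaps_between_exons' is hypothetical and not part of baitfindR, and it assumes bed-style coordinates and non-overlapping exons that lie within the gene.

```r
# Hypothetical helper, not part of baitfindR: introns of one gene as the gaps
# left after removing its exons. Coordinates are bed-style (0-based start,
# end not included). Assumes exons do not overlap and lie within the gene.
gaps_between_exons <- function(gene_start, gene_end, exon_start, exon_end) {
  ord <- order(exon_start)
  # Each candidate gap starts where the previous feature ends...
  starts <- c(gene_start, exon_end[ord])
  # ...and ends where the next exon (or the gene end) begins.
  ends <- c(exon_start[ord], gene_end)
  # Drop zero-length gaps, e.g. when an exon touches the gene boundary.
  keep <- ends > starts
  data.frame(start = starts[keep], end = ends[keep])
}

introns_toy <- gaps_between_exons(
  gene_start = 100, gene_end = 1000,
  exon_start = c(100, 400, 800),
  exon_end = c(200, 500, 1000)
)
introns_toy
```

With the toy values above, the result has two rows: 200 to 400 and 500 to 800.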

...

Other arguments. Not used by this function, but meant to be used by drake_plan for tracking during workflows.

Value

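Dataframe in "bed" format if 'out_type' is "genes", "introns", or "exons"; character (the hash digest of the combined genes, introns, and exons) if 'out_type' is "write_all".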

Author(s)

Joel H Nitta, joelnitta@gmail.com

Examples

# Find genes

arabidopsis_gff_file <- system.file(
  "extdata", "Arabidopsis_thaliana_TAIR10_40_small.gff3",
  package = "baitfindR", mustWork = TRUE
)

genes <- find_bed_regions(
  gff3_file = arabidopsis_gff_file,
  source_select = "araport11",
  out_type = "genes"
)
head(genes)

# Find introns
introns <- find_bed_regions(
  gff3_file = arabidopsis_gff_file,
  source_select = "araport11",
  out_type = "introns"
)
head(introns)

# Find exons
exons <- find_bed_regions(
  gff3_file = arabidopsis_gff_file,
  source_select = "araport11",
  out_type = "exons"
)
head(exons)

## Not run: 
# Write genes, introns, and exons out as tsv files
temp_dir <- tempdir()
find_bed_regions(
  gff3_file = arabidopsis_gff_file,
  source_select = "araport11",
  out_type = "write_all",
  out_dir = temp_dir,
  prefix = "arabidopsis"
)

## End(Not run)

joelnitta/baitfindR documentation built on May 7, 2020, 6:21 p.m.