bedify: Parse data by a bed file

Description

Separate a data matrix into list elements based on coordinates from bed format data.

Usage

bedify(myBed, myData, fill_missing = 0L, verbose = 0L)

Arguments

myBed

matrix of bed format data

myData

StringMatrix or IntegerMatrix to be sorted

fill_missing

include records for when there is no data (0, 1). By default these records are omitted.

verbose

should verbose output be generated (0, 1)

Details

Bed format data contain at least three columns. The first column indicates the chromosome (e.g., supercontig, scaffold, or contig). The second contains the starting positions and the third contains the ending positions. Optional columns are in columns four through nine. For example, the fourth column may contain the names of features. All subsequent columns are ignored here. To optimize performance, the data are expected to be formatted as a character matrix. The starting and ending positions are converted to numerics internally.
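As a minimal sketch of the layout described above, a bed table can be assembled as a character matrix with base R. The feature names and coordinates below are invented for illustration, and the as.numeric() calls only mimic the kind of conversion that bedify is described as performing internally.

```r
# Hypothetical three-feature bed table; all values are illustrative.
# Coordinates are stored as character so that the whole matrix is character.
bed_mat <- cbind(
  chrom      = c("chr_1", "chr_1", "chr_1"),
  chromStart = as.character(c(1, 500, 1200)),
  chromEnd   = as.character(c(100, 900, 2000)),
  name       = c("gene_a", "gene_b", "gene_c")
)

# Confirm the matrix is of character type before handing it on.
is.character(bed_mat)

# Converting the coordinates back to numerics allows a quick sanity check
# that every feature starts at or before its end.
starts <- as.numeric(bed_mat[, "chromStart"])
ends   <- as.numeric(bed_mat[, "chromEnd"])
all(starts <= ends)
```

Building the matrix with cbind() on character vectors avoids the whitespace padding that as.matrix() can introduce when it formats the numeric columns of a data.frame.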

The matrix format used here is based on vcf type data. Typically these data have the chromosome as the first column. Each chromosome has its own coordinate system, which begins at one, so working with multiple chromosomes would require reconciling their coordinate systems. Here I take the perspective that you should simply work on one chromosome at a time, so the first column (the chromosome) is ignored. The second column is the position, which is used for sorting. Subsequent columns are not processed but are carried along with each subset.
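Because the chromosome column is ignored, one way to follow the one-chromosome-at-a-time advice is to subset both matrices before calling bedify. The sketch below uses only base R; the object names (vars, bed_all, one_chrom) and their contents are hypothetical.

```r
# Hypothetical variant matrix spanning two chromosomes.
vars <- cbind(
  CHROM    = c("chr_1", "chr_1", "chr_2", "chr_2"),
  POS      = c("10", "250", "10", "75"),
  sample_1 = c("0/1", "0/0", "1/1", "0/1")
)

# Hypothetical bed matrix with one feature per chromosome.
bed_all <- cbind(
  chrom      = c("chr_1", "chr_2"),
  chromStart = c("1", "1"),
  chromEnd   = c("300", "100"),
  name       = c("gene_a", "gene_b")
)

# Keep only the rows for a single chromosome; drop = FALSE preserves
# the matrix structure even when a single row remains.
one_chrom <- "chr_1"
vars_1 <- vars[vars[, "CHROM"] == one_chrom, , drop = FALSE]
bed_1  <- bed_all[bed_all[, "chrom"] == one_chrom, , drop = FALSE]

# vars_1 and bed_1 now share one coordinate system and could be
# passed on as bedify(bed_1, vars_1).
```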

When the matrix is of numeric form the first column, which contains the chromosome identifier (CHROM), must also be numeric. This is because matrix elements must all be of the same type.
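The type constraint can be seen directly in base R. In this sketch the integer chromosome code (chrom_id) and the depth values are made-up placeholders; the point is only that a single character column coerces the whole matrix to character.

```r
# Assumed integer code standing in for the chromosome identifier.
chrom_id <- 1L

# All columns are integer, so the result is an integer matrix.
int_mat <- cbind(
  CHROM   = rep(chrom_id, 3),
  POS     = c(78L, 96L, 406L),
  depth_1 = c(9L, 9L, 21L)
)
is.integer(int_mat)

# Mixing in a character chromosome name coerces every element to character.
mixed <- cbind(CHROM = "chr_1", POS = c(78L, 96L, 406L))
is.character(mixed)
```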

Bed format at UCSC

Examples

bed <- structure(c("chr_290", "chr_4176", "chr_126921", "chr_126921", 
"chr_125157", "chr_125157", "chr_125157", "chr_125157", "chr_126888", 
"chr_126888", "47", "400", "4344", "1", "3712", "6025", "2269", 
"1779", "7930", "4637", "80", "500", "4967", "9066", "6566", 
"6450", "2933", "2226", "11939", "7913", "gene_1", "gene_2", 
"gene_3", "gene_4", "gene_5", "gene_6", "gene_7", "gene_8", "gene_9", 
"gene_10"), .Dim = c(10L, 4L), .Dimnames = list(NULL, c("chrom", 
"chromStart", "chromEnd", "name")))


vcf.matrix <- structure(c("chr_290", "chr_290", "chr_4176", "chr_4176", "chr_50514", 
"chr_64513", "chr_107521", "chr_121987", "chr_122006", "chr_122006", 
"78", "96", "406", "425", "863", "2853", "77", "103", "243", 
"636", "0/1:5,4:9:99:117,0,153", "0/0:9,0:9:99:0,27,255", "0/1:10,11:21:99:255,0,255", 
"0/1:10,11:21:99:255,0,255", "0/1:14,14:28:99:255,0,255", "0/1:29,13:42:99:255,0,255", 
"0/1:26,11:37:99:255,0,255", "0/1:21,14:35:99:255,0,255", "0/0:12,1:13:67:0,4,255", 
"0/1:55,8:63:99:99,0,255", "0/1:10,8:18:99:234,0,255", "0/0:17,0:17:99:0,51,255", 
"0/1:16,13:29:99:255,0,255", "0/1:16,13:29:99:255,0,255", "0/1:26,19:45:99:255,0,255", 
"0/1:50,19:69:99:255,0,255", "0/1:62,17:79:99:255,0,255", "0/1:95,22:117:99:255,0,255", 
"0/1:32,5:37:99:68,0,255", "0/1:69,21:90:99:255,0,255"), .Dim = c(10L, 
4L), .Dimnames = list(NULL, c("CHROM", "POS", "sample_1", "sample_2"
)))


class(bed)
is.character(bed)
class(vcf.matrix)
is.character(vcf.matrix)

var.list <- bedify(bed, vcf.matrix)
table(unlist(lapply(var.list, nrow)))

knausb/coveRage documentation built on May 20, 2019, 12:52 p.m.