read_validpairs: Read Valid-Pairs Files



Description

Reads a 'VALIDPAIRS' file, which lists all valid pairs in each library.

Usage


Arguments

file

Pathname to a ‘*.validPairs.txt(.gz)’ file.

columns

(optional) Names of the columns to be read.

...

Additional arguments passed to readr::read_tsv().
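
The way columns and ... interact can be illustrated with a minimal sketch. It is not the package's implementation: read_pairs_sketch() is a hypothetical stand-in, base R's read.delim() replaces readr::read_tsv() so the snippet needs no extra packages, and the three-column input file is synthetic.

```r
# Hypothetical stand-in for read_validpairs(); NOT the package's actual code.
# Base R's read.delim() is used here in place of readr::read_tsv().
read_pairs_sketch <- function(file, columns = NULL, ...) {
  # Everything in '...' is forwarded unchanged to the underlying reader
  data <- utils::read.delim(file, header = FALSE, stringsAsFactors = FALSE, ...)
  # Keep only the requested columns (by name), if any were given
  if (!is.null(columns)) data <- data[, columns, drop = FALSE]
  data
}

# Synthetic tab-delimited file with made-up values, for illustration only
tmp <- tempfile(fileext = ".txt")
writeLines(c("chrA\t100\t+", "chrB\t200\t-", "chrC\t300\t+"), tmp)

# 'col.names' and 'nrows' travel through '...' to read.delim()
res <- read_pairs_sketch(
  tmp,
  columns = c("chr", "strand"),
  col.names = c("chr", "pos", "strand"),
  nrows = 2
)
print(res)
```

With the real function, arguments understood by readr::read_tsv() would be passed the same way.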

Details

The description here is adapted from GSE84920_README.txt. That file is sparse on what 'VALIDPAIRS' files contain, but it says: "VALIDPAIRS files are for all valid pairs in each library. BEDS were paired based on identical read name and simultaneously associated with the proper cellular index. VALIDPAIRS also include the result of bedtools closest, to determine the closest DpnII restriction site and distance for each mate."

Value

A data.frame with 17 columns:

chr_a
start_a
end_a
chr_b
start_b
end_b
readname

Read names, e.g. D00584:136:HMTLJBCXX:1:1101:10000:101176

col8
col9
col10

Strand (- or +)

col11

Strand (- or +)

inner_barcode

Barcode ...

outer_barcode

Barcode ...

col14
col15
col16
col17

Validation

The read_validpairs() function does some basic validation on the values read.
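
The exact checks are not listed here. As a rough sketch only, validation of this kind might look like the following. validate_pairs_sketch() is a hypothetical helper, the checks are assumptions based on the column descriptions under Value, and the data frame is synthetic.

```r
# Hypothetical checks; what read_validpairs() actually validates is not documented here
validate_pairs_sketch <- function(data) {
  stopifnot(
    is.data.frame(data),
    ncol(data) == 17L,                    # Value section: 17 columns
    is.character(data$readname),
    !anyNA(data$readname),
    all(data$col10 %in% c("+", "-")),     # strand columns hold only "-" or "+"
    all(data$col11 %in% c("+", "-"))
  )
  invisible(data)
}

# Synthetic 17-column data frame; only the three checked columns are filled in
toy <- data.frame(matrix(NA, nrow = 2, ncol = 17))
names(toy)[c(7, 10, 11)] <- c("readname", "col10", "col11")
toy$readname <- c("read_1", "read_2")
toy$col10 <- c("+", "-")
toy$col11 <- c("-", "-")

validate_pairs_sketch(toy)  # passes silently

# An unexpected strand symbol makes the check fail
bad <- toy
bad$col11[2] <- "*"
result <- tryCatch(validate_pairs_sketch(bad), error = function(e) "invalid")
print(result)
```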

Examples

path <- system.file("extdata", package = "GSE84920.parser")
file <- file.path(path, "GSM2254215_ML1.rows=1-1000.validPairs.txt.gz")
data <- read_validpairs(file)
print(data)
# # A tibble: 1,000 x 17
#    chr_a start_a  end_a chr_b start_b  end_b readname  col8  col9 col10 col11 inner_barcode
#    <chr>   <int>  <int> <chr>   <int>  <int> <chr>    <int> <int> <chr> <chr> <chr>        
#  1 mous…  2.28e7 2.28e7 mous…  5.33e7 5.33e7 D00584:…    37    42 +     -     ATCCGCGG     
#  2 huma…  8.86e7 8.86e7 huma…  8.91e7 8.91e7 D00584:…    42    42 -     -     GAGGAGCA     
#  3 huma…  1.27e8 1.27e8 huma…  1.27e8 1.27e8 D00584:…    42    42 +     +     GCTACGGT     
#  4 huma…  4.21e7 4.21e7 huma…  4.27e7 4.27e7 D00584:…    42    42 +     +     AGGTGCGA     
#  5 huma…  2.09e8 2.09e8 huma…  2.32e8 2.32e8 D00584:…    42    35 +     -     GCCTCGAA     
#  6 huma…  5.57e6 5.57e6 huma…  1.50e8 1.50e8 D00584:…    42    42 +     -     GCTCGCTA     
#  7 mous…  4.12e7 4.12e7 mous…  4.12e7 4.12e7 D00584:…    42    42 +     -     GAGGAGCA     
#  8 mous…  6.54e7 6.54e7 mous…  6.59e7 6.59e7 D00584:…    42    42 -     +     TCCGGACA     
#  9 huma…  1.82e8 1.82e8 huma…  1.82e8 1.82e8 D00584:…    42    42 +     -     CAGGCTTG     
# 10 mous…  5.86e7 5.86e7 mous…  6.33e7 6.33e7 D00584:…    42    42 +     +     TCACGAGC     
# # … with 990 more rows, and 5 more variables: outer_barcode <chr>, col14 <chr>, col15 <int>,
# #   col16 <chr>, col17 <int>
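
As a possible follow-up, the strand and barcode columns can be summarized with base R. The sketch below does not need the package: it rebuilds three columns from the ten rows printed above, so the counts describe only those ten rows, not the full file.

```r
# Three columns of the first ten rows, copied from the printed output above
pairs <- data.frame(
  col10 = c("+", "-", "+", "+", "+", "+", "+", "-", "+", "+"),
  col11 = c("-", "-", "+", "+", "-", "-", "-", "+", "-", "+"),
  inner_barcode = c("ATCCGCGG", "GAGGAGCA", "GCTACGGT", "AGGTGCGA", "GCCTCGAA",
                    "GCTCGCTA", "GAGGAGCA", "TCCGGACA", "CAGGCTTG", "TCACGAGC"),
  stringsAsFactors = FALSE
)

# Cross-tabulate the two strand columns
strand_counts <- table(col10 = pairs$col10, col11 = pairs$col11)
print(strand_counts)

# Count how often each inner barcode occurs, most frequent first
barcode_counts <- sort(table(pairs$inner_barcode), decreasing = TRUE)
print(head(barcode_counts, 3))
```

The same calls should work on the full object returned by read_validpairs(), since it has the same column names.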

HenrikBengtsson/ramani documentation built on March 27, 2021, 11:47 p.m.