read_percentages: Read Percentages Files


View source: R/read_percentages.R

Description

A 'percentages' file provides summary statistics for each processed cellular index in combinatorial single-cell Hi-C data.

Usage


Arguments

file

Pathname to a ‘*.percentages.txt(.gz)’ file.

columns

(optional) Names of the columns to be read.

...

Additional arguments passed to readr::read_tsv().

Details

The descriptions here are adapted from GSE84920_README.txt, which refers to the "manuscript" for a description of the "PERCENTAGES" files. The column names returned are ours, because the data files do not provide column names.
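Because the files carry no header line, the reader has to supply the column names itself. The following is a minimal sketch of that idea on toy data, using base R's read.delim() as a stand-in for readr::read_tsv() (which takes the names through its col_names argument). The three-column layout and the object names `col_names`, `lines` and `toy` are illustrative assumptions, not the real file layout or the package's code.

```r
# Illustrative subset of the column names listed under 'Value'.
# ASSUMPTION: a 3-column toy layout, not the real 'percentages' file layout.
col_names <- c("hg19_frac", "mm10_frac", "hg19_count")

# Two header-less, tab-separated toy records
lines <- c("0.68\t0.32\t19081", "0\t1\t0")

# header = FALSE: the first line is data, not column names;
# col.names supplies the names that the file does not provide
toy <- utils::read.delim(text = lines, header = FALSE,
                         col.names = col_names, stringsAsFactors = FALSE)
print(toy)
```

Keeping the names in one character vector makes it easy to see, and to change, which name is attached to which position in the file.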

Value

A data.frame with 17 columns:

hg19_frac

Fraction of reads mapping to hg19

mm10_frac

Fraction of reads mapping to mm10

hg19_count

Number of Reads Mapping to hg19

mm10_count

Number of Reads Mapping to mm10

hg19mm10_count

Total number of read pairs after filtering out interspecies pairs (== hg19_count + mm10_count)

pair_count

Total number of read pairs

inner_barcode

Round 1 Barcode (Inner)

outer_barcode

Round 2 Barcode (Outer)

is_observed

True or Randomized; True are the observed data, Randomized are the result of shuffling all barcode assignments to new reads

Col10

All or Long; All are all intrachromosomal and interchromosomal reads associated with a cellular index; Long are only the interchromosomal reads and the intrachromosomal reads spanning > 20 kbp

dpnii_1x

Number of times a DpnII fragment is observed once

dpnii_2x

Number of times a DpnII fragment is observed twice

dpnii_3x

Number of times a DpnII fragment is observed thrice

dpnii_4x

Number of times a DpnII fragment is observed four times

cistrans_ratio

Cis-trans ratio for that cellular index

hela_allele_frac

If applicable (only HeLa S3 and HAP1 cells), fraction of homozygous alternate HeLa allele calls

celltype

Programmed cell type assignment, if applicable. For ML libraries, only cells with >0.95 in hg19_frac or mm10_frac receive assignments.
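The relationships between the count and fraction columns can be checked by hand. The sketch below rebuilds hg19mm10_count and the two fractions from the counts in the first four rows of the example output on this page, then applies the > 0.95 rule. That the fractions are taken over hg19mm10_count is an assumption inferred from those printed values, and the `assignable` column is a hypothetical name used only here.

```r
# Counts from the first four rows of the example output on this page
counts <- data.frame(
  hg19_count = c(28050L, 19081L, 27010L, 18444L),
  mm10_count = c(1L, 8970L, 1L, 8555L)
)

# Read pairs that map to a single species
counts$hg19mm10_count <- counts$hg19_count + counts$mm10_count

# ASSUMPTION: fractions are relative to hg19mm10_count (not pair_count);
# this reproduces the printed values, e.g. 19081 / 28051 = 0.680
counts$hg19_frac <- counts$hg19_count / counts$hg19mm10_count
counts$mm10_frac <- counts$mm10_count / counts$hg19mm10_count

# Hypothetical flag: rows that pass the > 0.95 species-purity rule
counts$assignable <- counts$hg19_frac > 0.95 | counts$mm10_frac > 0.95
print(counts)
```

Rows 1 and 3 (the observed data) pass the rule, whereas rows 2 and 4 (the randomized barcodes) mix both species and do not.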

Validation

The read_percentages() function does some basic validation on the values read.
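This page does not list which checks are performed. As an illustration only, the sketch below shows the kind of sanity checks that the column descriptions suggest. `check_percentages()` is a hypothetical helper written for this example; it is not part of the package and may differ from what read_percentages() actually validates.

```r
# Hypothetical helper; NOT the checks that read_percentages() runs
check_percentages <- function(data) {
  stopifnot(
    is.data.frame(data),
    # fractions must lie in [0, 1]
    all(data$hg19_frac >= 0 & data$hg19_frac <= 1, na.rm = TRUE),
    all(data$mm10_frac >= 0 & data$mm10_frac <= 1, na.rm = TRUE),
    # counts cannot be negative
    all(data$hg19_count >= 0L, na.rm = TRUE),
    all(data$mm10_count >= 0L, na.rm = TRUE),
    # only the two documented labels are allowed
    all(data$is_observed %in% c("True", "Randomized"))
  )
  invisible(data)
}

# Toy input taken from rows 2 and 5 of the example output on this page
toy <- data.frame(
  hg19_frac = c(0.680, 0),
  mm10_frac = c(0.320, 1),
  hg19_count = c(19081L, 0L),
  mm10_count = c(8970L, 1L),
  is_observed = c("Randomized", "True"),
  stringsAsFactors = FALSE
)
check_percentages(toy)
```

Returning the input invisibly lets such a check sit at the end of a reader function without changing what the caller receives.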

Examples

path <- system.file("extdata", package = "GSE84920.parser")
file <- file.path(path, "GSM2254215_ML1.rows=1-1000.percentages.txt.gz")
data <- read_percentages(file)
print(data)
# # A tibble: 1,000 x 16
#    hg19_frac mm10_frac hg19_count mm10_count pair_count inner_barcode outer_barcode is_observed
#        <dbl>     <dbl>      <int>      <int>      <int> <chr>         <chr>         <chr>      
#  1     1.000 0.0000356      28050          1      28052 ACCACCAC      TCAGATGC      True       
#  2     0.680 0.320          19081       8970      28052 ACCACCAC      TCAGATGC      Randomized 
#  3     1.000 0.0000370      27010          1      28052 ACCACCAC      TCAGATGC      True       
#  4     0.683 0.317          18444       8555      28052 ACCACCAC      TCAGATGC      Randomized 
#  5     0     1                  0          1          1 CATAGCGC      ACTTGATA      True       
#  6     1     0                  1          0          1 CATAGCGC      ACTTGATA      Randomized 
#  7     0     1                  0          1          1 CATAGCGC      ACTTGATA      True       
#  8     1     0                  1          0          1 CATAGCGC      ACTTGATA      Randomized 
#  9     1     0                  2          0          2 GGCCGTTC      GCCATTAA      True       
# 10     1     0                  2          0          2 GGCCGTTC      GCCATTAA      Randomized 
# # … with 990 more rows, and 8 more variables: Col10 <chr>, dpnii_1x <int>, dpnii_2x <int>,
# #   dpnii_3x <int>, dpnii_4x <int>, cistrans_ratio <dbl>, hela_allele_frac <dbl>,
# #   celltype <chr>

print(table(data$celltype))
###   HAP1    HeLa    MEF   Patski Undetermined 
###    156     163    174      152          355 

HenrikBengtsson/ramani documentation built on March 27, 2021, 11:47 p.m.