View source: R/read_percentages.R

Description
A 'percentages' file provides summary statistics for each processed cellular index in combinatorial single-cell Hi-C data.
Usage

read_percentages(file, columns = NULL, ...)
Arguments

file
Pathname to a ‘*.percentages.txt(.gz)’ file.

columns
(Optional) Names of the columns to read.

...
Additional arguments passed to …
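The `file` argument expects the ‘*.percentages.txt(.gz)’ naming pattern, where the gzip suffix is optional. A hypothetical helper (not part of GSE84920.parser, which may not check file names at all) that tests a path against that pattern could look like this:

```r
# Hypothetical helper, not part of GSE84920.parser: does a path follow the
# '*.percentages.txt' or '*.percentages.txt.gz' naming convention?
is_percentages_file <- function(file) {
  # Vectorized over 'file'; the '.gz' suffix is optional
  grepl("\\.percentages\\.txt(\\.gz)?$", file)
}

is_percentages_file("GSM2254215_ML1.rows=1-1000.percentages.txt.gz")  # TRUE
is_percentages_file("GSM2254215_ML1.percentages.txt")                 # TRUE
is_percentages_file("GSM2254215_ML1.txt.gz")                          # FALSE
```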
Details

The descriptions here are adopted from GSE84920_README.txt, which in turn refers to the "manuscript" for a description of the "PERCENTAGES" files. The column names returned are our own, because the data files do not provide column names.
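Because the files carry no header, names have to be attached when reading. The following base-R sketch illustrates that idea under stated assumptions: the file is assumed to be tab-separated, only the first eight columns are modelled (named after the Examples output rather than anything in the file), and `read_percentages_sketch()` is a hypothetical name. The package itself may well use a different reader. The toy input reuses two rows printed in the Examples section.

```r
# Hypothetical sketch: read a header-less, tab-separated 'percentages'-style
# table and attach our own column names (assumed, taken from the example output).
read_percentages_sketch <- function(file, columns = NULL) {
  col_names <- c("hg19_frac", "mm10_frac", "hg19_count", "mm10_count",
                 "pair_count", "inner_barcode", "outer_barcode", "is_observed")
  col_classes <- c("numeric", "numeric", "integer", "integer",
                   "integer", "character", "character", "character")
  # read.table() opens the path with file(), which decompresses gzip input,
  # so both '.txt' and '.txt.gz' work
  data <- read.table(file, header = FALSE, sep = "\t",
                     col.names = col_names, colClasses = col_classes,
                     stringsAsFactors = FALSE)
  # Optionally keep only the requested columns, rejecting unknown names
  if (!is.null(columns)) {
    unknown <- setdiff(columns, col_names)
    if (length(unknown) > 0) {
      stop("Unknown column(s): ", paste(unknown, collapse = ", "))
    }
    data <- data[, columns, drop = FALSE]
  }
  data
}

# Toy gzip-compressed input built from two rows of the example output
tmp <- tempfile(fileext = ".percentages.txt.gz")
con <- gzfile(tmp, "w")
writeLines(c("1.000\t0.0000356\t28050\t1\t28052\tACCACCAC\tTCAGATGC\tTrue",
             "0.680\t0.320\t19081\t8970\t28052\tACCACCAC\tTCAGATGC\tRandomized"),
           con)
close(con)

data <- read_percentages_sketch(tmp, columns = c("hg19_count", "inner_barcode"))
print(data)
```

Subsetting after reading keeps the sketch simple; a real implementation could instead skip unwanted columns at parse time.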
Value

A data.frame with 17 columns:

1. Fraction of reads mapping to hg19
2. Fraction of reads mapping to mm10
3. Number of reads mapping to hg19
4. Number of reads mapping to mm10
5. Total number of read pairs after filtering out interspecies pairs (== …)
6. Total number of read pairs
7. Round 1 barcode (inner)
8. Round 2 barcode (outer)
9. (not described)
10. (not described)
11. Number of times a DpnII fragment is observed once
12. Number of times a DpnII fragment is observed twice
13. Number of times a DpnII fragment is observed three times
14. Number of times a DpnII fragment is observed four times
15. Cis-trans ratio for that cellular index
16. If applicable (only HeLa S3 and HAP1 cells), fraction of homozygous alternate HeLa allele calls
17. Programmed cell type assignment, if applicable. For ML libraries, only cells with >0.95 in …
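The printed example output suggests that each fraction column is that species' read count divided by the sum of the hg19 and mm10 counts. This is an inference from the numbers, not a documented definition; the arithmetic below reproduces it for the first two rows of the Examples output.

```r
# Counts copied from the first two rows of the example output
hg19_count <- c(28050L, 19081L)
mm10_count <- c(1L, 8970L)

# Assumption: fraction = species count / (hg19 count + mm10 count)
total <- hg19_count + mm10_count
hg19_frac <- hg19_count / total
mm10_frac <- mm10_count / total

print(round(hg19_frac, 3))
print(signif(mm10_frac, 3))
```

The mm10 results, about 0.0000356 and 0.320, agree with the printed `mm10_frac` values, and the two fractions of each row sum to 1.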
Validation

The read_percentages() function performs some basic validation of the values read.
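The documentation does not list which checks are performed. The base-R sketch below shows the kind of range and format checks that "basic validation" could plausibly involve; `validate_percentages_sketch()` and every rule in it are assumptions, not the package's actual logic. The column names follow the Examples output.

```r
# Hypothetical validation, not the package's actual checks.
# stopifnot() also fails on NA, so missing values are rejected too.
validate_percentages_sketch <- function(data) {
  stopifnot(
    all(data$hg19_frac >= 0 & data$hg19_frac <= 1),   # fractions in [0, 1]
    all(data$mm10_frac >= 0 & data$mm10_frac <= 1),
    all(data$hg19_count >= 0L),                       # counts non-negative
    all(data$mm10_count >= 0L),
    all(grepl("^[ACGT]+$", data$inner_barcode)),      # barcodes are DNA
    all(grepl("^[ACGT]+$", data$outer_barcode))
  )
  invisible(data)
}

# Toy rows copied from the example output
toy <- data.frame(
  hg19_frac = c(1.000, 0.680), mm10_frac = c(0.0000356, 0.320),
  hg19_count = c(28050L, 19081L), mm10_count = c(1L, 8970L),
  inner_barcode = "ACCACCAC", outer_barcode = "TCAGATGC"
)
validate_percentages_sketch(toy)  # passes silently

# An out-of-range fraction is rejected
bad <- toy
bad$mm10_frac[2] <- 1.5
res <- tryCatch(validate_percentages_sketch(bad), error = function(e) "rejected")
print(res)
```

Failing fast with `stopifnot()` keeps the sketch short; a production reader might instead collect all problems and report them together.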
Examples

path <- system.file("extdata", package = "GSE84920.parser")
file <- file.path(path, "GSM2254215_ML1.rows=1-1000.percentages.txt.gz")
data <- read_percentages(file)
print(data)
# # A tibble: 1,000 x 16
# hg19_frac mm10_frac hg19_count mm10_count pair_count inner_barcode outer_barcode is_observed
# <dbl> <dbl> <int> <int> <int> <chr> <chr> <chr>
# 1 1.000 0.0000356 28050 1 28052 ACCACCAC TCAGATGC True
# 2 0.680 0.320 19081 8970 28052 ACCACCAC TCAGATGC Randomized
# 3 1.000 0.0000370 27010 1 28052 ACCACCAC TCAGATGC True
# 4 0.683 0.317 18444 8555 28052 ACCACCAC TCAGATGC Randomized
# 5 0 1 0 1 1 CATAGCGC ACTTGATA True
# 6 1 0 1 0 1 CATAGCGC ACTTGATA Randomized
# 7 0 1 0 1 1 CATAGCGC ACTTGATA True
# 8 1 0 1 0 1 CATAGCGC ACTTGATA Randomized
# 9 1 0 2 0 2 GGCCGTTC GCCATTAA True
# 10 1 0 2 0 2 GGCCGTTC GCCATTAA Randomized
# # … with 990 more rows, and 8 more variables: Col10 <chr>, dpnii_1x <int>, dpnii_2x <int>,
# # dpnii_3x <int>, dpnii_4x <int>, cistrans_ratio <dbl>, hela_allele_frac <dbl>,
# # celltype <chr>
print(table(data$celltype))
### HAP1 HeLa MEF Patski Undetermined
### 156 163 174 152 355
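The cell-type counts printed above can be summarised further without rerunning the parser. This sketch rebuilds the tabulated vector from those printed counts (an illustration, not package code), confirms that they cover all 1,000 rows, and computes the share of each assigned cell type once "Undetermined" is set aside.

```r
# Counts copied from the table(data$celltype) output above
counts <- c(HAP1 = 156, HeLa = 163, MEF = 174, Patski = 152, Undetermined = 355)
celltype <- rep(names(counts), times = counts)

tab <- table(celltype)
print(sum(tab))  # all rows accounted for

# Share of each cell type among indices that received an assignment
assigned <- tab[names(tab) != "Undetermined"]
print(round(prop.table(assigned), 3))
```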