Home

/

GitHub

/

haizi-zh/cofragr

/

README.MD

README.MD
In haizi-zh/cofragr: R package for cofragmentation analysis

cofragr: cfDNA co-fragmentation analysis in R

Table of Contents

About The Project
Getting Started
- Prerequisites
- Installation
Understanding cofragr
Usage
License
Contact

cofragr is an R package for cofragmentation patterns of cell-free DNA.

The cofragr package is still under development and hasn’t been released to CRAN, so you must install cofragr by using the devtools workflow. You can install devtools two different ways:

You can install the library directly from CRAN:

install.packages("devtools")

You can also install the development version from GitHub:

devtools::install_github("r-lib/devtools")

You can learn more about devtools and get help with its installation here.

Now that you've downloaded devtools, you can use the following command to install the package:

devtools::install_github("epifluidlab/cofragr")

Congratulations! You've successfully downloaded and installed cofragr!

Note: If you're having any problems installing cofragr, make sure that you have the latest version of R installed.

Understanding how cofragr works can be crucial to its successful usage. There are three core functions to this R package:

read_fragments

Calling this function will read the fragment data file, which is typically a BED file. Since this functions uses the package bedtools to read the fragment files, the arguments/options for range and genome will follow the same format.

Options

cofragr::read_fragments(file_path, range = NULL, genome = NULL)

range; default: range = NULL

Specifies the ranges of the BED file that read_fragments will read.

Note that range follows the standard genomic range notation, ex. "chr1:1001-2000".

genome; default: genome = NULL

Specifies the reference genome for the BED file in question.

Note that genome can take any of the genome arguments that GenomeInfoDb::Seqinfo() can.

Again, you can read more about the options behind range and genome here.

contact_matrix

Calling this function will calculate the contact matrix from the fragment data file, which should have already been called by the read_fragments function.

Options

cofragr::contact_matrix(frag, bin_size = 500e3L, n_workers = 1L, subsample = 10e3L, min_sample_size = 100L, bootstrap = 1L, seed = NULL)

Note that frag is already defined as the fragment file as a result of the read_fragments function.

bin_size; default: bin_size = 500e3L

Specifies the size of the bins for the contact matrix in base-pair units.

n_workers; default: n_workers = 1L

Specifies the number of workers for the task at hand.

subsample; default: subsample = 10e3L

Specifies the sampling size for the contact matrix calculations.

Note: In this case, the subsample means for each genomic bin, we only randomly select 1 in every 10,000 fragments for the analysis. This counteracts the fact that the number of fragments are different between different bins.

bootstrap; default: bootstrap = 1L

Specifies the number of bootstrap samples in the contact matrix calculations.

write_contact_matrix

Calling this function will write the contact matrix to a specified filepath; the contact matrix should have already been calculated by the contact_matrix function.

Options

cofragr::write_contact_matrix(cm, file_path, comments = NULL)

comments; default: comments = NULL

Allows the addition of a character vector to be appended at the top of the written BED file as a header.

cofragr provides both R APIs and an R script for the analysis.

The pipeline requires fragment data files as input. A fragment data file is essentially a BED file, with each row representing a cfDNA fragment. Below is an example:

14      19000035        19000198        .       27      +
14      19000044        19000202        .       42      +
14      19000045        19000202        .       20      -
14      19000049        19000202        .       12      +

The six columns are: chrom, start, end, feature name (which we do not use), MAPQ (which is the smallest MAPQ of the two paired reads), and strand (which we do not use).

You can easily prepare the fragment data file from a query-sorted BAM file, or refer to our FinaleDB paper for more details (Zheng, Zhu, and Liu 2020).

The quickest way to start is using the RScript for your dataset.

You can find the script in R package installation:

system.file("extdata/scripts", "cofragr.R", package = "cofragr")

Or simply clone the source code from GitHub and find the script in the directory inst/ext/scripts/.

Then you can simply run some variation of the following, inputting any arguments as you desire.

Rscript cofragr.R 
-i examplefile.bed.gz \
-o output_file_path \
--min-mapq 30 \
--chroms 1:2:3 \
--subsample 10000 \
--min-fraglen 50 \
--max-fraglen 350 \
--sample-id example \

We encourage you to look through the script file, cofragr.R, to understand all of the available arguments.

Output

You will be outputted two files, the calculated contact matrix and its accompanying index file.

cofrag_cm.bed.gz is the calculated contact matrix:

#chrom  start   end     chrom2  start2  end2    score   n_frag1 n_frag2 p_value p_value_sd
1       0       500000  1       0       500000  15.14129840182        435     435     0.58172039812       .
1       0       500000  1       500000  1000000 0       435     2539    0       .
1       0       500000  1       1000000 1500000 0       435     3997    0       .
1       0       500000  1       1500000 2000000 0       435     3707    0       .
1       0       500000  1       2000000 2500000 0       435     4361    0       .

For large-scale real applications, we suggest use workflow management tools such as snakemake, nextflow, etc. We provide a snakemake file. Similar to cofragr.R, you can find it:

In R package installation:

system.file("extdata/scripts", "cofragr.smk", package = "cofragr")

Or simply clone the source code from GitHub and find the script in the directory inst/ext/scripts/.

Note: Usually snakemake workflow is more related to your specific computing environment. As a result, the example above is only for your reference. Have a look in both the snakemake file and the command above, and modify it accordingly.

See LICENSE for more information.

Haizi Zheng haizi.zh@gmail.com
Yaping Liu lyping1986@gmail.com
Ravi Bandaru ravi14.bandaru@gmail.com

haizi-zh/cofragr documentation built on Dec. 20, 2021, 2:45 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

haizi-zh/cofragr
R package for cofragmentation analysis

README.MD
In haizi-zh/cofragr: R package for cofragmentation analysis

cofragr: cfDNA co-fragmentation analysis in R

About The Project

Getting Started

Prerequisites

Installation

Understanding cofragr

read_fragments

Options

contact_matrix

Options

write_contact_matrix

Options

Usage

Preparation

Analysis with RScript

Output

Analysis with Snakemake

License

Contact

R Package Documentation

Browse R Packages

We want your feedback!

haizi-zh/cofragr R package for cofragmentation analysis

README.MD In haizi-zh/cofragr: R package for cofragmentation analysis

cofragr: cfDNA co-fragmentation analysis in R

About The Project

Getting Started

Prerequisites

Installation

Understanding cofragr

read_fragments

Options

contact_matrix

Options

write_contact_matrix

Options

Usage

Preparation

Analysis with RScript

Output

Analysis with Snakemake

License

Contact

R Package Documentation

Browse R Packages

We want your feedback!

haizi-zh/cofragr
R package for cofragmentation analysis

README.MD
In haizi-zh/cofragr: R package for cofragmentation analysis