
RepliSeq

License: GPL v3

Analysis of Repli-seq data in R to study the DNA replication timing program.

Description:

An R package that features a set of functions to conduct Repli-seq data analysis.

This package analyzes Repli-seq data within data.frames, which lets you easily complement your analysis with dplyr, calculate intersections with tidygenomics, and plot your results with ggplot visualizations.

RepliSeq functions include loading multi-fraction Repli-seq assay data (from 2 to N fractions, as defined by your experimental design and your hardware capabilities) as count matrices; rescaling profiles; smoothing profiles; and calculating metrics such as replication timing (calculated as the S50, the moment in S phase when a sequence has been replicated in 50% of cells, on a scale from 0, early, to 1, late) and URI (under-replication index, obtained by comparing two Repli-seq assays).

Installation:

You can install this package by entering the following within R:


devtools::install_github("CL-CHEN-Lab/RepliSeq")

Requirements:

This package depends on:

As mentioned in the DESCRIPTION, this package imports:

In addition, the function writeBigWig() requires UCSC's wigToBigWig application to be installed on the computer. It can be found at encodeproject
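
Before calling writeBigWig(), it can help to check that the wigToBigWig executable is reachable from R. The sketch below uses base R's Sys.which() and assumes the binary keeps its default name, wigToBigWig, and sits somewhere on the PATH; how writeBigWig() itself locates the tool is not covered here.

```r
# Look up the wigToBigWig executable on the PATH (base R, no extra packages).
# Assumption: the binary keeps its default name "wigToBigWig".
wig_path <- Sys.which("wigToBigWig")

if (nzchar(wig_path)) {
  message("wigToBigWig found at: ", wig_path)
} else {
  message("wigToBigWig not found on the PATH; install it before using writeBigWig().")
}
```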

Authors:

Sami EL HILALI and Chunlong CHEN (Institut Curie)

Don't hesitate to contact the authors or open an issue if you have a question or would like to see new features added to this package.

References:

Brison O., El-Hilali S., Azar D., Koundrioukoff S., Schmidt M., Naehse-Kumpf V., Jaszczyszyn Y., Lachages A.M., Dutrillaux B., Thermes C., Debatisse M. and Chen C.L. (2019) Transcription-Mediated Organization of the Replication Initiation Program Across Large Genes Sets Up Common Fragile Sites Genome-Wide. Nat. Commun. 10, 5693.

Chen C.L., Rappailles A., Duquenne L., Huvet M., Guilbaud G., Farinelli L., Audit B., d'Aubenton-Carafa Y., Arneodo A., Hyrien O. and Thermes C. (2010) Impact of replication timing on non-CpG and CpG substitution rates in mammalian genomes. Genome Res. 20, 447-457.

Usage examples:

Below is an overview of how some of the functions are used. For extended documentation, please refer to the vignette how-to-use.

readRS(path_data,fractions):

This function reads a Repli-seq assay from multiple files (one file per fraction) and returns a data frame. It requires bedGraph inputs (see the bedGraph spec) with a one-line header and no other comment lines, such as:

track type=bedGraph name=NT_chr22-s1 description=50kbprofile
chr22 0 50000 0
chr22 50000 100000 0
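
To make the expected input layout concrete, here is a minimal base-R sketch that writes two toy bedGraph files with a one-line header, reads each one while skipping that header, and joins the fractions on their genomic coordinates. It only illustrates the file format and the resulting data frame shape; `write_toy_bdg()` and `read_fraction()` are hypothetical helpers, the values are made up, and this is not necessarily how readRS() is implemented.

```r
# Hypothetical helper: write a toy bedGraph file with a one-line track header.
write_toy_bdg <- function(track_name, values) {
  path <- tempfile(fileext = ".bdg")
  writeLines(c(
    paste0("track type=bedGraph name=", track_name),
    paste("chr22", c(0L, 50000L), c(50000L, 100000L), values, sep = "\t")
  ), path)
  path
}

# Hypothetical helper: read one fraction, skipping the header line.
read_fraction <- function(path, fraction) {
  read.table(path, skip = 1, sep = "\t",
             col.names = c("chr", "start", "stop", fraction),
             stringsAsFactors = FALSE)
}

s1 <- read_fraction(write_toy_bdg("toy-s1", c(0, 3.5)), "S1")
s2 <- read_fraction(write_toy_bdg("toy-s2", c(0, 1.2)), "S2")

# One row per bin, one column per fraction.
merged <- merge(s1, s2, by = c("chr", "start", "stop"), sort = FALSE)
print(merged)
```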


### args :
temp_paths <- c("../inst/extdata/NT_chr22-s1.bdg","../inst/extdata/NT_chr22-s2.bdg",
                "../inst/extdata/NT_chr22-s3.bdg","../inst/extdata/NT_chr22-s4.bdg",
                "../inst/extdata/NT_chr22-s5.bdg","../inst/extdata/NT_chr22-s6.bdg")
temp_fractions <- c("S1","S2","S3","S4","S5","S6")

### 2 fractions RepliSeq
# apply function :
RS_early <- readRS(temp_paths[1:2],temp_fractions[1:2])

### 6 fractions RepliSeq
# apply function :
RS_all <- readRS(temp_paths,temp_fractions)

### 1 fraction RepliSeq ( for S0 controls )
# apply function : 
RS_S0 <- readRS("../inst/extdata/NT_chr22-s0.bdg","S0")

### Result :

tail(RS_early)

| chr   | start    | stop     | S1     | S2    |
|-------|----------|----------|--------|-------|
| chr22 | 51000000 | 51050000 | 12.392 | 4.929 |
| chr22 | 51050000 | 51100000 | 11.604 | 5.887 |
| chr22 | 51100000 | 51150000 | 12.568 | 7.941 |
| chr22 | 51150000 | 51200000 | 9.853  | 5.887 |
| chr22 | 51200000 | 51250000 | 2.584  | 1.711 |
| chr22 | 51250000 | 51300000 | 0.000  | 0.000 |

calculateS50(rs_assay):

This function returns a data frame of genomic coordinates, each associated with a replication timing expressed as an S50 value between 0 (early replicating) and 1 (late replicating).


temp_rs <- data.frame(chr = rep("chr1",7),
                      start = seq(0,6000,1000),
                      stop = seq(1000,7000,1000),
                      S1 = c(0,0,0,1,1,1,1),
                      S2 = c(0,0,1,1,1,1,0),
                      S3 = c(0,1,1,1,1,0,0),
                      S4 = c(1,1,1,1,0,0,0))


temp_S50 <- RepliSeq::calculateS50(temp_rs)

# Result :

print(temp_S50)

| chr  | start | stop | S50   |
|------|-------|------|-------|
| chr1 | 0     | 1000 | 0.875 |
| chr1 | 1000  | 2000 | 0.750 |
| chr1 | 2000  | 3000 | 0.625 |
| chr1 | 3000  | 4000 | 0.500 |
| chr1 | 4000  | 5000 | 0.375 |
| chr1 | 5000  | 6000 | 0.250 |
| chr1 | 6000  | 7000 | 0.125 |
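
The S50 values above can be reproduced by hand with a short base-R sketch. It assumes that each of the N fractions covers an equal 1/N slice of S phase and that the point where the cumulative signal reaches 50% is found by linear interpolation inside the fraction that crosses it. `s50_sketch()` is a hypothetical helper written for illustration; it matches the toy example here but is not claimed to be the package's actual implementation.

```r
# Hypothetical helper, not part of the RepliSeq package.
# counts: signal of one bin across fractions, ordered from early to late.
s50_sketch <- function(counts) {
  n <- length(counts)
  total <- sum(counts)
  if (total == 0) return(NA_real_)          # no signal: S50 is undefined
  cum <- cumsum(counts) / total             # cumulative replicated fraction
  k <- which(cum >= 0.5)[1]                 # first fraction reaching 50%
  before <- if (k == 1) 0 else cum[k - 1]   # cumulative value before fraction k
  # Linear interpolation inside fraction k, which spans [(k - 1) / n, k / n].
  (k - 1 + (0.5 - before) / (cum[k] - before)) / n
}

temp_rs <- data.frame(chr = rep("chr1", 7),
                      start = seq(0, 6000, 1000),
                      stop = seq(1000, 7000, 1000),
                      S1 = c(0, 0, 0, 1, 1, 1, 1),
                      S2 = c(0, 0, 1, 1, 1, 1, 0),
                      S3 = c(0, 1, 1, 1, 1, 0, 0),
                      S4 = c(1, 1, 1, 1, 0, 0, 0))

s50 <- apply(temp_rs[, c("S1", "S2", "S3", "S4")], 1, s50_sketch)
print(round(unname(s50), 3))
```

For the first bin, all signal falls in S4, so the 50% point sits in the middle of the last quarter of S phase, giving 0.875.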

calculateURI(rs_x, rs_y):

This function calculates the URI between two Repli-seq assays. It returns a data frame with the following columns: chr, start, stop, sum_x, sum_y, mean_xy, URI.

####### load second Repli-seq assay for comparison 
####### 6 fractions RepliSeq

# args :

aph_paths <- c("../inst/extdata/Aph_chr22-s1.bdg","../inst/extdata/Aph_chr22-s2.bdg",
               "../inst/extdata/Aph_chr22-s3.bdg","../inst/extdata/Aph_chr22-s4.bdg",
               "../inst/extdata/Aph_chr22-s5.bdg","../inst/extdata/Aph_chr22-s6.bdg")
aph_fractions <- temp_fractions

# read :

RS_aph_all <- readRS(aph_paths,aph_fractions)

# apply function :

aph_nt_uri <- calculateURI(RS_aph_all,RS_all)

# Result :

tail(aph_nt_uri)


| chr   | start    | stop     | sum_x  | sum_y  | mean_xy | URI         |
|-------|----------|----------|--------|--------|---------|-------------|
| chr22 | 51000000 | 51050000 | 27.581 | 30.869 | 29.225  | -1.37048107 |
| chr22 | 51050000 | 51100000 | 30.556 | 31.274 | 30.915  | -0.66372243 |
| chr22 | 51100000 | 51150000 | 38.770 | 36.226 | 37.498  | 0.05718338  |
| chr22 | 51150000 | 51200000 | 32.394 | 26.028 | 29.211  | 1.24529116  |
| chr22 | 51200000 | 51250000 | 10.063 | 8.533  | 9.298   | 0.82273039  |
| chr22 | 51250000 | 51300000 | 0.000  | 0.000  | 0.000   | NaN         |
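
The output columns suggest how the index can be built. The sketch below assumes that URI is the z-score, taken across all bins, of (sum_x - sum_y) / mean_xy. That reading is numerically consistent with the rows shown above, including the NaN for the empty bin, but it is an assumption rather than the package's documented formula. `uri_sketch()` is a hypothetical helper, the toy values are made up, and both assays are assumed to share the same bins in the same order.

```r
# Hypothetical helper, not part of the RepliSeq package.
uri_sketch <- function(rs_x, rs_y, fractions) {
  sum_x <- rowSums(rs_x[, fractions, drop = FALSE])  # total signal per bin, assay x
  sum_y <- rowSums(rs_y[, fractions, drop = FALSE])  # total signal per bin, assay y
  mean_xy <- (sum_x + sum_y) / 2
  delta <- (sum_x - sum_y) / mean_xy                 # 0 / 0 gives NaN for empty bins
  # Assumed normalisation: z-score across bins, ignoring empty bins.
  uri <- (delta - mean(delta, na.rm = TRUE)) / sd(delta, na.rm = TRUE)
  data.frame(rs_x[, c("chr", "start", "stop")],
             sum_x = sum_x, sum_y = sum_y, mean_xy = mean_xy, URI = uri)
}

coords <- data.frame(chr = "chr1",
                     start = seq(0, 3000, 1000),
                     stop = seq(1000, 4000, 1000))
toy_x <- cbind(coords, S1 = c(4, 2, 6, 0), S2 = c(4, 2, 2, 0))
toy_y <- cbind(coords, S1 = c(2, 2, 4, 0), S2 = c(2, 2, 4, 0))

toy_uri <- uri_sketch(toy_x, toy_y, c("S1", "S2"))
print(toy_uri)
```

Under this reading, a positive URI marks bins where assay x carries relatively more signal than assay y compared with the average bin, and a negative URI marks the opposite.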



CL-CHEN-Lab/RepliSeq documentation built on Sept. 11, 2021, 12:04 p.m.