bismark2segment: Extract 4-CpG segments information from bismark reporting...


View source: R/bismark2segment.R

Description

This function converts Bismark output into the format recognised by the beta mixture model or the nonparametric Bayesian clustering algorithm, both of which are used to identify pCSM loci.

Usage

bismark2segment(files,file_type="regular",CpG_file,tmp_folder="temp",split_by_chrom=FALSE)

Arguments

files

File or files, in ".gz" compressed format, containing the CpG methylation information of each sequenced read as generated by bismark_extractor. For a regular methylation dataset, supply a single filename with its full path; for single-cell datasets, supply a vector containing the full-path filename for each cell.

file_type

Type of input dataset: "regular" for regular methylation data, or "single-cell" for single-cell methylation data.

CpG_file

File listing the coordinates of all CpG loci on the forward strand of the target genome, in two tab-separated columns: the chromosome ID in the first column and the location in the second.

split_by_chrom

Logical; intended for single-cell datasets in which the number of cells is very large. When split_by_chrom=TRUE, a list is returned in which each element is the beta mixture model input for one chromosome.
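
The two-column layout described for CpG_file can be illustrated with a small base-R sketch. The chromosome names, positions, and the file name toy_CpG.reference are made up for illustration and do not come from the package. The sketch also assumes the file has no header row, which this page does not state.

```r
# Toy CpG coordinates (hypothetical values, for illustration only)
cpg <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr2"),
  pos   = c(10469L, 10471L, 10484L, 3000827L),
  stringsAsFactors = FALSE
)

# Write two tab-separated columns: chromosome ID, then location.
# Assumption: no header row and no quoting.
cpg_path <- file.path(tempdir(), "toy_CpG.reference")
write.table(cpg, file = cpg_path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back to confirm the two-column layout
check <- read.table(cpg_path, sep = "\t", header = FALSE,
                    col.names = c("chrom", "pos"),
                    stringsAsFactors = FALSE)
print(check)
```

A file written this way could then be passed as CpG_file=cpg_path, provided its coordinates match the genome used for alignment.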

Value

segment

A matrix or a list containing the 4-CpG segment information for pCSM loci identification.

Note

Loading and processing the CpG index may take several minutes.

Examples

library('csmFinder')
#For bulk methylomes, the 4-CpG segments can be extracted as follows:

#get the demo datasets
bismark_result <- system.file("extdata", "bulk_CpG_extract_file", "demo.dataset.gz",
                              package="csmFinder")
CpG_ref <- system.file("extdata", "CpG_plus.reference", package="csmFinder")

#generate the 4-CpG segments
segment <- bismark2segment(files=bismark_result,CpG_file=CpG_ref)

#############################
#For single-cell methylomes, the file_type="single-cell" argument is needed,
#  and the 4-CpG segments can be extracted as follows:

#get the demo datasets
scDataDir <- system.file("extdata", "single_cell_CpG_extract_file",
                         package="csmFinder")
file_list <- list.files(scDataDir, full.names=TRUE)

#generate the 4-CpG segments
scSegment <- bismark2segment(files=file_list,file_type="single-cell",
                             CpG_file=CpG_ref)
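
When the number of cells is very large, the split_by_chrom argument described under Arguments can be added to the single-cell call. The following is only a sketch. It reuses the demo files shipped with csmFinder and has not been checked against the package source. Whether the returned list elements are named by chromosome is an assumption, so the names are printed for inspection.

```r
# Sketch: per-chromosome output for single-cell data.
# The body runs only if csmFinder is installed.
if (requireNamespace("csmFinder", quietly = TRUE)) {
  library(csmFinder)

  # Demo inputs shipped with the package (same files as in the examples)
  CpG_ref <- system.file("extdata", "CpG_plus.reference",
                         package = "csmFinder")
  scDataDir <- system.file("extdata", "single_cell_CpG_extract_file",
                           package = "csmFinder")
  file_list <- list.files(scDataDir, full.names = TRUE)

  # split_by_chrom=TRUE is documented to return a list,
  # one element per chromosome
  scSegmentByChrom <- bismark2segment(files = file_list,
                                      file_type = "single-cell",
                                      CpG_file = CpG_ref,
                                      split_by_chrom = TRUE)

  # Number of per-chromosome elements
  print(length(scSegmentByChrom))
  # Assumption: elements may be named by chromosome; NULL if they are not
  print(names(scSegmentByChrom))
}
```

Splitting by chromosome produces one smaller input per chromosome, which is why the argument is suggested for datasets with many cells.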

Gavin-Yinld/csmFinder documentation built on Sept. 16, 2019, 3:31 p.m.