Data can be imported to gpatterns from raw reads (.fastq files), mapped reads (.bam files) or from tidy_cpgs files were generated by the gpatterns package.

Data: UMI-RRBS of Breast and Lung tumors

To explore gpatterns' import functions, we'll start with a dataset of UMI-RRBS of breast and lung tumors. The dataset contains 16 breast tumors samples from 4 patients, with multiple sections from each sample.

Importing from raw reads (.fastq)

bissli2

bissli2 is a wrapper around bowtie2 that maps bisulfite converted reads and extracts methylation calls. - more about bissli params and usage

gpatterns.bissli2(r1_fastq = 'Breast_2460_1.3_R1/fastq/Breast_2460_1.3_R1.fastq', r2_fastq = 'Breast_2460_1.3_R1/fastq/Breast_2460_1.3_R2.fastq', out_bam = 'Breast_2460_1.3_R1/bam/Breast_2460_1.3.bam', genome_seq = "/net/mraid14/export/data/db/tgdb/hg19/seq", genome_type="ct_ga", bissli2_idx = "/net/mraid14/export/data/tools/bissli2/hg19/hg19")

Importing from mapped reads (.bam)

gpatterns.import_from_bam('bam/Breast_2460_1.3.bam', workdir='Breast_2460_1.3_R1', trim=1, cgs_mask_file='/net/mraid14/export/tgdata/users/aviezerl/proj/gpatterns_nugget/msp1_stickey_ends', frag_intervs='intervs.msp1.fid', rm_off_target=T, maxdist=4, use_sge=F, parallel=TRUE, paired_end=TRUE)

Importing from tidy_cpgs files

gpatterns.import_from_tidy_cpgs(tidy_cpgs="Breast_2460_1.3_R1/tidy_cpgs", track="gpatterns_nugget.Breast_2460_1.3_R1", description="", groot="/home/aviezerl/hg19", max_span=600, parallel=TRUE)

Generating methylation patterns

In order to generate methylation patterns we first need to define the CpG scope, or 'pattern space', i.e. the set of adjacent CpGs we will be looking at. In theory, we could choose a different set of CpGs for each sample separately, in fact, gpatterns does this routinely by generating .patX tracks for each sample with patterns of length X for each CpG.

However, this approach becomes problematic if we want to compare the pattern distribution of samples directly. Therefore, the best practice is to define the 'pattern space' based on all the samples together. gpatterns provides two functions to calculate such space:

Since our example is from RRBS data we would use gpatterns.intervs_to_pat_space with the msp1 fragments:

pat_space <- gpatterns.intervs_to_pat_space(tracks, intervals='intervs.msp1.fid', adjacent=TRUE, pat_len=5, parallel=TRUE)

And now we can create the patterns attributes for each of our tracks:

purrr::walk(tracks, ~ gpatterns.create_patterns_track(track = .x, description = paste0('patterns of ', .x), pat_space=pat_space, max_missing=0))


tanaylab/gpatterns documentation built on May 15, 2023, 6:23 p.m.