slidingWindow: Sliding Windows

View source: R/slidingWindow.r

Description

This function computes sliding windows along the length of the chromosome chosen for analysis and returns a data frame containing the mean SNP-index of both bulks in each window.

Usage

slidingWindow(
  meta,
  chrList,
  chrID,
  windowSize = 1e+06,
  windowStep = 10000,
  vcf.df.SNPindex.filt
)

Arguments

meta

meta information stored inside the VCF file

chrList

list of chromosome IDs

chrID

chromosome ID of interest

windowSize

window size (default=1000000)

windowStep

window step (default=10000)

vcf.df.SNPindex.filt

filtered SNP-index data frame

Details

First, the length of the chosen chromosome is extracted from the meta information stored inside the VCF file. To this end, the meta-information lines that contain a distinguishing sequence of keywords (including ID and length) are extracted into a character vector. For each element of that vector, the characters before and after the chromosome length are removed, so the final character vector contains the lengths of all chromosomes, in the order in which they are named in the VCF file (which matches the order of the elements in chrList).

The length of the chosen chromosome is then found by looking up the index of chrID in chrList, since that index equals the index of the chosen chromosome's length in the vector of lengths.
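The lookup described above can be sketched as follows. This is a minimal illustration, not the package's actual code: the meta lines, chromosome names and lengths are made-up values, and the regular expressions assume contig lines of the form `##contig=<ID=...,length=...>` with ID appearing before length.

```r
# Hypothetical VCF meta lines; names and lengths are invented for illustration
meta <- c(
  "##fileformat=VCFv4.2",
  "##contig=<ID=chrA,length=5000000>",
  "##contig=<ID=chrB,length=4000000>",
  "##contig=<ID=chrC,length=3000000>"
)
chrList <- c("chrA", "chrB", "chrC")
chrID   <- "chrC"

# Keep only the lines that carry both an ID and a length
contigLines <- grep("^##contig=<.*ID=.*length=", meta, value = TRUE)

# Strip the characters before and after the length digits
chrLengths <- as.numeric(sub(".*length=([0-9]+).*", "\\1", contigLines))

# The position of chrID in chrList indexes the matching length
chrLength <- chrLengths[match(chrID, chrList)]
```

Note that this only works when chrList lists the chromosomes in the same order as the contig lines, as the text above requires.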

Once the length of the chromosome has been extracted, the start, mid and stop positions of each window are calculated across the chromosome length. The stop position of each window is calculated by adding the window size (either the default value or the one specified by the user) to the start position of the window. The start position of the first window is 1 and the start position of the following window is calculated by adding the step size to the start position of the previous window.

The windows where the stop position falls past the chromosome length are removed. The start, mid and stop positions of each window in the chromosome are stored in a data frame.
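A minimal sketch of the window calculation, assuming a hypothetical chromosome length of 3 Mb. The mid position is taken here as the midpoint between start and stop, which is an assumption; the text above does not say how the function derives it.

```r
chrLength  <- 3000000  # hypothetical chromosome length
windowSize <- 1e6      # same as the function's default
windowStep <- 1e4      # same as the function's default

# First window starts at 1; each later start adds the step size
starts <- seq(from = 1, to = chrLength, by = windowStep)

# Stop position = start position + window size
stops <- starts + windowSize

# Drop windows whose stop position falls past the chromosome end
keep <- stops <= chrLength

windows <- data.frame(
  start = starts[keep],
  mid   = starts[keep] + windowSize / 2,  # assumed midpoint definition
  stop  = stops[keep]
)
```

With these values the last retained window starts at 1990001 and stops at 2990001, so the final stretch of the chromosome is covered only by windows that begin earlier.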

Then, the input data frame, which is the one returned by filter_SNPindex(), is filtered by the chosen chromosome to restrict the data frame to variants specific to that chromosome.

Next, for each window, only the SNP-indexes of the variants located between the start and stop positions of the window are considered, and the mean SNP-index of those variants is calculated for both the wild-type and the mutant bulk. The mean SNP-indexes of the wild-type and mutant bulks in each window are added in separate columns to the data frame containing the start, mid and stop positions. If no variants are found in a window, 0.5 is stored as the mean SNP-index, to avoid gaps in the plot caused by "Not a Number" (NaN) values. The final data frame with the window positions and mean SNP-indexes is returned.
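The chromosome filter and the per-window averaging can be illustrated on toy data. Everything here is hypothetical: the column names (CHROM, POS, SNPindex.WT, SNPindex.M), the output column names, the helper windowMean() and the values are chosen for illustration and may differ from what filter_SNPindex() and slidingWindow() actually use. Window boundaries are treated as inclusive, which is also an assumption.

```r
# Toy filtered SNP-index table (hypothetical column names and values)
snps <- data.frame(
  CHROM       = c("chrA", "chrA", "chrA", "chrB"),
  POS         = c(5, 8, 25, 7),
  SNPindex.WT = c(0.2, 0.4, 0.6, 0.9),
  SNPindex.M  = c(0.8, 1.0, 0.7, 0.1)
)

# Toy windows; the second one contains no variants
windows <- data.frame(start = c(1, 11, 21),
                      mid   = c(6, 16, 26),
                      stop  = c(11, 21, 31))

# Restrict the table to the chromosome of interest
chrSnps <- snps[snps$CHROM == "chrA", ]

# Mean of `values` for variants inside [start, stop]; 0.5 if the window is empty
windowMean <- function(start, stop, values) {
  inWindow <- chrSnps$POS >= start & chrSnps$POS <= stop
  if (!any(inWindow)) return(0.5)
  mean(values[inWindow])
}

windows$mean_SNPindex_WT <- mapply(windowMean, windows$start, windows$stop,
                                   MoreArgs = list(values = chrSnps$SNPindex.WT))
windows$mean_SNPindex_M  <- mapply(windowMean, windows$start, windows$stop,
                                   MoreArgs = list(values = chrSnps$SNPindex.M))
```

Because empty windows are filled with 0.5 rather than NaN, a value of exactly 0.5 in the result can mean either "no variants" or a genuine mean of 0.5; the two cases cannot be told apart from the output alone.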

Value

Data frame containing start, mid and stop positions of each window, as well as the corresponding mean SNP-index value for each bulk.

Examples

## Default parameters
SNPindex_windows <- slidingWindow(meta=vcf_list$meta, 
                                  chrList=chromList, 
                                  chrID="SL4.0ch03",  
                                  vcf.df.SNPindex.filt=vcf_df_SNPindex_filt)
## Custom parameters
SNPindex_windows <- slidingWindow(meta=vcf_list$meta, 
                                  chrList=chromList, 
                                  chrID="SL4.0ch03", 
                                  windowSize=2000000, 
                                  windowStep=20000, 
                                  vcf.df.SNPindex.filt=vcf_df_SNPindex_filt)

EG-lisy/BSAvis documentation built on Dec. 17, 2021, 5:38 p.m.