clusterMultiMappingReads_stringent: Clustering of contigs


View source: R/RNASeqUtility.R

Description

Contigs containing reads that map to multiple locations are clustered to the contig with the highest read count. If several contigs have the same read count, the longest of them is chosen; if they are also of equal length, the first one in the list is chosen. The procedure starts with the first contig of the unclustered list (sorted by read count and length), which becomes the representative contig: all contigs containing a read of the representative contig are reported. If the read composition of these contigs matches that of the representative contig to at least the fraction given by readCompositionIdentity, they are removed from the unclustered list, together with the representative contig. The representative contig is stored in the clustered list. The next contig of the unclustered list is then chosen, and the procedure repeats until the unclustered list is empty.
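The greedy procedure described above can be sketched in base R on toy data. This is an illustration only, not the package's implementation: the function name cluster_contigs_sketch, the representation of contigs as character vectors of read IDs, and the definition of read similarity (the fraction of a contig's reads that also occur in the representative contig) are all assumptions made for this example.

```r
# Toy input (hypothetical): each contig is a character vector of read IDs.
# The real function works on GRanges objects instead.
contig_reads <- list(
  c1 = c("r1", "r2", "r3", "r4"),
  c2 = c("r1", "r2", "r3"),
  c3 = c("r4", "r5"),
  c4 = c("r6")
)
contig_length <- c(c1 = 100, c2 = 80, c3 = 60, c4 = 50)

cluster_contigs_sketch <- function(contig_reads, contig_length,
                                   identity = 0.95) {
  read_count <- lengths(contig_reads)
  # Sort by read count, then by length (both decreasing).
  # Remaining ties keep their original list order.
  ord <- order(-read_count, -contig_length[names(contig_reads)])
  unclustered <- names(contig_reads)[ord]
  clustered <- character(0)

  while (length(unclustered) > 0) {
    # The first contig of the unclustered list is the representative.
    rep_id <- unclustered[1]
    rep_reads <- contig_reads[[rep_id]]
    others <- unclustered[-1]

    # Assumed similarity measure: fraction of each remaining contig's
    # reads that also occur in the representative contig.
    shared <- vapply(
      others,
      function(id) mean(contig_reads[[id]] %in% rep_reads),
      numeric(1)
    )

    # A contig is absorbed if it shares at least one read and reaches
    # the identity threshold; with identity = 0 one shared read suffices.
    absorbed <- others[shared > 0 & shared >= identity]

    clustered <- c(clustered, rep_id)
    unclustered <- setdiff(others, absorbed)
  }
  clustered
}

cluster_contigs_sketch(contig_reads, contig_length, identity = 0.95)
cluster_contigs_sketch(contig_reads, contig_length, identity = 0)
```

With the default threshold only c2 (all of whose reads occur in c1) is absorbed, so c1, c3 and c4 remain as representatives. With a threshold of 0, c3 is absorbed as well because it shares one read with c1.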

Usage

clusterMultiMappingReads_stringent(contigForCountingGR_unclustered, allReads,
  readCompositionIdentity = 0.95)

Arguments

contigForCountingGR_unclustered

GRanges object of unclustered contigs

allReads

GRanges object of the single reads (obtained from each sample in bed format with bamToBed)

readCompositionIdentity

The fraction of read similarity that contigs must reach in order to be clustered (default 0.95). When 0 is specified, a single shared read is enough for non-representative contigs to be clustered and removed.
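The effect of the threshold on a single pair of contigs can be illustrated as follows. The helpers shared_fraction and is_clustered are hypothetical and not part of the package; the similarity measure (the fraction of a contig's reads that also occur in the representative contig) is an assumption, since the documentation does not define it precisely.

```r
# Hypothetical helper: fraction of a contig's reads that also occur in
# the representative contig (an assumed definition of read similarity).
shared_fraction <- function(contig_reads, representative_reads) {
  mean(contig_reads %in% representative_reads)
}

# Hypothetical decision rule mirroring the argument description.
is_clustered <- function(contig_reads, representative_reads,
                         readCompositionIdentity = 0.95) {
  f <- shared_fraction(contig_reads, representative_reads)
  # At least one shared read is always required, so with a threshold
  # of 0 a single shared read is sufficient.
  f > 0 && f >= readCompositionIdentity
}

representative <- c("r1", "r2", "r3", "r4")
candidate <- c("r4", "r5")  # shares 1 of its 2 reads

is_clustered(candidate, representative)       # default threshold 0.95
is_clustered(candidate, representative, 0.5)  # relaxed threshold
is_clustered(candidate, representative, 0)    # any shared read
```

Under these assumptions the candidate is kept separate at the default threshold and clustered at 0.5 or 0.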

Value

GRanges object of the clustered contigs


SimonSchafferer/RNASeqUtility documentation built on May 10, 2017, 1:41 p.m.