clusterMultiMappingReads_stringent: Clustering of contigs


View source: R/RNASeqUtility.R


Contigs containing reads that map to multiple locations are clustered to the contig with the highest read count. If several contigs have the same read count, the longest of them is chosen; if they also tie in length, the first one in the list is chosen. The procedure starts with the first contig in the unclustered list (sorted by read count and length), which becomes the representative contig: all contigs containing a read of the representative contig are reported. Those contigs whose read composition matches that of the representative contig to at least the fraction given by readCompositionIdentity are removed from the unclustered list, together with the representative contig, which is stored in the clustered list. The next contig in the unclustered list is then chosen, and the process repeats until the unclustered list is empty.
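The greedy procedure described above can be sketched in base R on plain vectors rather than GRanges objects. This is only an illustration under stated assumptions, not the package's implementation: the helper name cluster_contigs_sketch, its arguments, and the toy data are hypothetical, and "read composition identity" is assumed here to mean the fraction of a contig's reads that also occur in the representative contig.

```r
# Hypothetical helper, NOT part of RNASeqUtility: greedy clustering on
# plain R vectors. Assumes every contig holds at least one read.
cluster_contigs_sketch <- function(contig_reads, contig_width,
                                   identity = 0.95) {
  # contig_reads: named list, contig id -> character vector of read ids
  # contig_width: named integer vector of contig lengths
  n_reads <- lengths(contig_reads)
  # Sort by read count, then length (both descending); order() is stable,
  # so remaining ties keep their original list order.
  ord <- order(-n_reads, -contig_width[names(contig_reads)])
  unclustered <- names(contig_reads)[ord]
  clustered <- character(0)
  members <- list()

  while (length(unclustered) > 0) {
    rep_id <- unclustered[1]
    rep_reads <- contig_reads[[rep_id]]
    others <- unclustered[-1]
    # Assumed identity measure: fraction of each remaining contig's reads
    # that also occur in the representative contig.
    shared <- vapply(others, function(id) {
      mean(contig_reads[[id]] %in% rep_reads)
    }, numeric(1))
    # identity = 0 means a single shared read is enough to absorb a contig.
    absorb <- if (identity == 0) shared > 0 else shared >= identity
    clustered <- c(clustered, rep_id)
    members[[rep_id]] <- others[absorb]
    unclustered <- others[!absorb]
  }
  list(representatives = clustered, members = members)
}

# Toy data (made up for illustration)
contig_reads <- list(
  c1 = c("r1", "r2", "r3", "r4"),
  c2 = c("r1", "r2"),
  c3 = c("r4", "r5", "r6"),
  c4 = c("r7")
)
contig_width <- c(c1 = 200L, c2 = 80L, c3 = 150L, c4 = 50L)

res_strict <- cluster_contigs_sketch(contig_reads, contig_width, 0.95)
res_loose  <- cluster_contigs_sketch(contig_reads, contig_width, 0)
```

With the default threshold only c2, whose reads are all contained in c1, is absorbed, so c1, c3 and c4 remain as representatives. With a threshold of 0 the single read shared between c1 and c3 is enough to absorb c3 as well.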


clusterMultiMappingReads_stringent(contigForCountingGR_unclustered, allReads,
  readCompositionIdentity = 0.95)



contigForCountingGR_unclustered: GRanges object of unclustered contigs


allReads: GRanges object of the single reads (obtained from each sample in BED format with bamToBed)


readCompositionIdentity: The fraction of read similarity between contigs that must be reached for them to be clustered (default 0.95). If 0 is specified, then a single shared read leads to clustering, i.e. removal of the non-representative contigs.


GRanges object of the clustered (representative) contigs

SimonSchafferer/RNASeqUtility documentation built on May 10, 2017, 1:41 p.m.