neighborhoodSplit: Split gene groups by neighborhood synteny
In FindMyFriends: Microbial Comparative Genomics in R

Description Usage Arguments Value Methods (by class) See Also Examples

This function evaluates already created gene groups and splits their members into new groups based on the synteny of the flanking genes and the similarity of the sequences. The splitting proceeds through several stages that every gene pair must pass in order to remain in the same group. First, the link between two genes is removed if they belong to the same organism. Then the synteny of the flanking genes is assessed, and if it does not pass the defined threshold the link between the gene pair is removed. Next, the kmer similarity of the two sequences is computed, and if it falls below a certain threshold (see lowerLimit) the link is removed. Lastly, the lengths of the two sequences are compared, and if they differ by more than the allowed deviation (see maxLengthDif) the link is removed.

Based on the resulting graph, cliques are detected and sorted based on the lowest within-clique sequence similarity and neighborhood synteny. The cliques are then added as new groups, provided their members are not already members of a new group, until all members are part of a new group. This approach ensures that all members of a new group pass certain conditions when compared to all other members of the same group.

After the splitting, a refinement step is performed in which gene groups with high similarity that share a neighbor either up- or downstream are merged, to avoid spurious errors resulting from the initial grouping.
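The pairwise filtering stages described above can be illustrated with a small base R sketch. This is not the FindMyFriends implementation: the toy data, the per-pair `synteny` and `kmerSim` scores, the `minSynteny` threshold and the choice to measure length difference relative to the longer sequence are all assumptions made for illustration. Only the names `lowerLimit` and `maxLengthDif` mirror arguments of the function.

```r
# Toy gene group: five genes from three organisms (all values are made up).
genes <- data.frame(
  id       = c("a1", "a2", "b1", "c1", "c2"),
  organism = c("A", "A", "B", "C", "C"),
  length   = c(300, 310, 305, 298, 150),
  stringsAsFactors = FALSE
)

# All unordered gene pairs within the group (row indices into `genes`).
pairs <- as.data.frame(t(combn(nrow(genes), 2)))
names(pairs) <- c("i", "j")

# Hypothetical precomputed scores for each pair, in the order of `pairs`.
pairs$synteny <- c(1.0, 0.9, 0.8, 0.7, 0.2, 0.1, 0.6, 0.9, 0.8, 1.0)
pairs$kmerSim <- c(0.99, 0.95, 0.90, 0.85, 0.90, 0.90, 0.50, 0.92, 0.80, 0.99)

# Illustrative thresholds; minSynteny is a made-up name for this sketch.
minSynteny   <- 0.5
lowerLimit   <- 0.75
maxLengthDif <- 0.1

# Stage 1: drop links between genes from the same organism.
sameOrganism <- genes$organism[pairs$i] == genes$organism[pairs$j]

# Stage 4 input: relative length difference (assumed relative to the longer one).
lenI <- genes$length[pairs$i]
lenJ <- genes$length[pairs$j]
lengthDif <- abs(lenI - lenJ) / pmax(lenI, lenJ)

# A link survives only if it passes every stage.
keep <- !sameOrganism &
  pairs$synteny >= minSynteny &   # Stage 2: neighborhood synteny
  pairs$kmerSim >= lowerLimit &   # Stage 3: kmer similarity
  lengthDif <= maxLengthDif       # Stage 4: sequence length

surviving <- data.frame(
  from = genes$id[pairs$i[keep]],
  to   = genes$id[pairs$j[keep]],
  stringsAsFactors = FALSE
)
print(surviving)
```

In this toy example the surviving links connect a1, b1 and c1 to each other, so these three genes form a clique and would be candidates for one new group, while a2 and c2 end up without links.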

neighborhoodSplit(object, ...)

## S4 method for signature 'pgVirtualLoc'
neighborhoodSplit(object, flankSize,
  forceParalogues, kmerSize, lowerLimit, maxLengthDif,
  guideGroups = NULL, cdhitOpts = list())

`object`	A pgVirtualLoc subclass
`...`	Parameters passed on to the methods.
`flankSize`	The number of flanking genes on each side of the gene to use for comparison.
`forceParalogues`	Force similarity of paralogue genes to 0.
`kmerSize`	The length of kmers used for sequence similarity.
`lowerLimit`	The lower limit of sequence similarity; similarities below this value are set to 0.
`maxLengthDif`	The maximum deviation in sequence length to allow. A value between 0 and 1 is interpreted as a proportion of the sequence length; a value above 1 is interpreted as a fixed length.
`guideGroups`	An integer vector with prior grouping that, all else being equal, should be prioritized. Used internally.
`cdhitOpts`	A list of options to pass on to CD-Hit during the merging step. "l", "n" and "s"/"S" will be overridden.
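The two interpretations of `maxLengthDif` can be sketched with a small helper. The function `lengthsCompatible` is hypothetical and not part of FindMyFriends; in particular, measuring the proportion relative to the longer of the two sequences is an assumption, since the documentation does not state the reference length.

```r
# Hypothetical helper illustrating the two modes of maxLengthDif.
lengthsCompatible <- function(len1, len2, maxLengthDif) {
  dif <- abs(len1 - len2)
  if (maxLengthDif < 1) {
    # Proportional mode (assumed relative to the longer sequence).
    dif / pmax(len1, len2) <= maxLengthDif
  } else {
    # Fixed mode: absolute difference in sequence length.
    dif <= maxLengthDif
  }
}

# A 10-unit difference passes a 10% limit but fails a fixed limit of 5.
lengthsCompatible(300, 310, 0.1)
lengthsCompatible(300, 310, 5)
```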

An object with the same class as object containing the new grouping.

pgVirtualLoc: Neighborhood-based gene group splitting for pgVirtualLoc subclasses

Other group-splitting: kmerSplit

# Load the example pangenome with gene location information and existing groups
testPG <- .loadPgExample(geneLoc=TRUE, withGroups=TRUE)

# Too heavy to run
## Not run: 
testPG <- neighborhoodSplit(testPG, lowerLimit=0.75)

## End(Not run)