R/orderCross.R

Defines functions clusterOrderCross orderCross

Documented in clusterOrderCross orderCross

#' @title Order markers
#' 
#' Order markers within linkage groups using simulated annealing
#' 
#' @description This function orders markers within linkage groups using a simulated annealing heuristic. The underlying implementation is a C++ reimplementation of the fortran code \code{arsa.f} from the \code{seriation} package. The reimplementation allows for multithreading, and is therefore much faster. It also fixes a couple of bugs in the original code. 
#'
#' Parameters \code{cool} and \code{tmin} are standard simulated annealing parameters, and decreasing \code{cool} increases the amount of computation effort. Parameter \code{nReps} gives the number of independent replications of the simulated annealing algorithm to be used. The result of the best replication is then chosen. 
#'
#' Parameter \code{maxMove} gives the maximum number of positions by which to shift a marker, as part of a step within the simulated annealing algorithm. The computational effort of determining whether a proposed move of a particular marker should be accepted, depends on the number of positions by which it is moved. So if the ordering is already approximately correct at the start of the algorithm, proposals that move markers by large distances are expensive, and also unneccessary. These types of proposed changes to the ordering can be avoided by setting \code{maxMove} to some positive value, maybe one tenth of the number of markers. 
#'
#' Parameter \code{effortMultiplier} simply increases or decreases the amount of computational effort. A value of 0.5 requires half as much effort, a value of 1.0 uses the default amount of effort, and a value of 2.0 requires twice as much computational effort. 
#'
#' Parameter \code{randomStart} controls the starting point of each replication of the algorithm. If this parameter is TRUE, then every replication starts form an independent random ordering. If this parameter is FALSE, then every replication starts from the marker ordering given in the input object. 
#' @param mpcrossLG An object of class \code{mpcrossLG}, containing genetic data and linkage groups. 
#' @param cool Rate of cooling
#' @param tmin Minimum temperature
#' @param nReps Number of independent replications of the simulated annealing algorithm
#' @param maxMove Maximum number of positions by which to shift a single marker, as part of the simulated annealing. A value of zero indicates no limit. 
#' @param effortMultiplier Multiplier for the amount of computational effort
#' @param randomStart If TRUE, start from the current ordering
#' @param verbose If TRUE, generate more detailed output
#' @return An object of class \code{mpcrossLG}, identical to the input except with the markers rearranged.
#' @export
orderCross <- function(mpcrossLG, cool = 0.5, tmin = 0.1, nReps = 1, maxMove = 0, effortMultiplier = 1, randomStart = TRUE, verbose = FALSE)
{
	if(!is(mpcrossLG, "mpcrossLG"))
	{
		stop("Input object must have linkage groups")
	}
	if(is.null(mpcrossLG@rf) && is.null(mpcrossLG@lg@imputedTheta))
	{
		stop("Input mpcrossLG object did not contain recombination fraction information")
	}
	if(nMarkers(mpcrossLG) == 1)
	{
		return(mpcrossLG)
	}
	mpcrossLG <- as(mpcrossLG, "mpcrossLG")
	permutation <- .Call("order", mpcrossLG, mpcrossLG@lg@allGroups, cool, tmin, nReps, maxMove, effortMultiplier, randomStart, verbose, PACKAGE="mpMap2")
	return(subset(mpcrossLG, markers = permutation))
}
#' @title Group markers into blocks and arrange those blocks
#' @description Group markers into blocks and arrange those blocks
#' @details In some cases the number of markers is too large to reorder all markers on a chromosome. However, the problem becomes more tractable if the markers are already in a roughly correct ordering to start with. This function is intended to generate that roughly accurate ordering, and then subsequenty local reordering using \code{\link{orderCross}} can be applied to generate a final marker ordering. 
#'
#' The rough ordering is generated by forming some number of groups of markers, using hierarchical clustering. A consensus disimilarity between every group of markers is formed, and this is used to order the groups. That is, we decide whether the markers will be ordered as group 1, group 2, group 3, etc, or group 2, group 1, group 3, etc. The ordering of the markers within each group is unchanged. 
#' @param mpcrossLG An object of class \code{mpcrossLG}, containing genetic data and linkage groups. 
#' @param cool Rate of cooling
#' @param tmin Minimum temperature
#' @param nReps Number of independent replications of the simulated annealing algorithm
#' @param maxMove Maximum number of positions by which to shift a single marker, as part of the simulated annealing. A value of zero indicates no limit. 
#' @param effortMultiplier Multiplier for the amount of computational effort
#' @param randomStart If TRUE, start from the current ordering
#' @param nGroups The number of groups to form using hierarchical clustering
#' @return An object of class \code{mpcrossLG}, identical to the input except with the markers rearranged.
#' @export
clusterOrderCross <- function(mpcrossLG, cool = 0.5, tmin = 0.1, nReps = 1, maxMove = 0, effortMultiplier = 1, randomStart = TRUE, nGroups)
{
	if(!is(mpcrossLG, "mpcrossLG"))
	{
		stop("Input object must have linkage groups")
	}
	if(is.null(mpcrossLG@rf) && is.null(mpcrossLG@lg@imputedTheta))
	{
		stop("Input mpcrossLG object did not contain recombination fraction information")
	}
	if(missing(nGroups) || nGroups < 1)
	{
		stop("Input nGroups must be a positive integer")
	}
	mpcrossLG <- as(mpcrossLG, "mpcrossLG")
	orderedMarkers <- vector(mode = "list")
	for(group in mpcrossLG@lg@allGroups)
	{
		groupAsCharacter <- as.character(group)
		if(!is.null(mpcrossLG@lg@imputedTheta))
		{
			underlying <- mpcrossLG@lg@imputedTheta[[groupAsCharacter]]
		}
		else
		{
			currentGroupCross <- subset(mpcrossLG, groups = group)
			currentGroupCross <- impute(currentGroupCross)
			underlying <- currentGroupCross@lg@imputedTheta[[1]]
		}
		if(nGroups >= length(underlying@markers))
		{
			orderedMarkers[[groupAsCharacter]] <- underlying@markers
		}
		else
		{
			distMatrix <- .Call("rawSymmetricMatrixToDist", underlying, PACKAGE="mpMap2")
			clustered <- fastcluster::hclust(distMatrix, method = "average")
			groupings <- cutree(clustered, nGroups)
			dissimilarityMatrix <- .Call("constructDissimilarityMatrix", underlying, groupings, PACKAGE="mpMap2")
	
			diag(dissimilarityMatrix) <- 0
			orderingOfGroups <- .Call("arsa", nGroups, dissimilarityMatrix[upper.tri(dissimilarityMatrix, diag = TRUE)], cool, tmin, nReps, maxMove, effortMultiplier, randomStart, PACKAGE="mpMap2")
			ordering <- unlist(lapply(1:length(orderingOfGroups), function(x) which(groupings == orderingOfGroups[x]+1)))
			orderedMarkers[[groupAsCharacter]] <- underlying@markers[ordering]
		}
	}
	return(subset(mpcrossLG, markers = unlist(orderedMarkers)))
}
rohan-shah/mpMap2 documentation built on July 21, 2020, 8:58 p.m.