Neuron similarity, search and clustering tools

The main entry point for similarity and search functions is `nblast`.
Traced neurons will normally be converted to the `dotprops` format for
search. When multiple neurons are compared they should be in a
`neuronlist` object.
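
As a minimal sketch of a single query, the snippet below scores one neuron against a small neuronlist. It assumes the `nat` and `nat.nblast` packages are installed and uses the `kcs20` sample data (20 Kenyon cells already in `dotprops` format) that ships with `nat`. The variable name `query_scores` is only illustrative, and the whole block is skipped if the packages are missing.

```r
# Only run when both packages are available
if (requireNamespace("nat", quietly = TRUE) &&
    requireNamespace("nat.nblast", quietly = TRUE)) {
  library(nat)
  library(nat.nblast)

  # kcs20: sample neuronlist of dotprops objects distributed with nat
  data(kcs20, package = "nat")

  # Score the first neuron (query) against every neuron in the list (targets)
  query_scores <- nblast(kcs20[[1]], kcs20)

  # Highest scores first
  print(head(sort(query_scores, decreasing = TRUE)))
}
```

The result is one score per target neuron, so sorting it gives a ranked hit list for the query.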

The current nblast version (2) depends on a scoring matrix. Default
matrices trained using *Drosophila* neurons in the FCWB template brain
space are distributed with this package (see `smat.fcwb`); see the
**Scoring Matrices** section below for creating new scoring matrices.

`nblast` makes use of a more flexible but more complicated function,
`NeuriteBlast`, which includes several additional options. The function
`WeightedNNBasedLinesetMatching` provides the primitive functionality
of finding the nearest neighbour distances and absolute dot products for
two sets of segments. Neither of these functions is intended for end use.
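
To make the primitive operation concrete, here is a brute-force base R sketch of finding, for each query segment, the nearest target segment, its distance and the absolute dot product of the two tangent vectors. It is an illustration only, not the package's implementation: the helper `nn_dist_dot` and all coordinates are made up for this example.

```r
# Toy segment sets: point positions plus unit tangent vectors (made-up values)
query_pts  <- rbind(c(0, 0, 0), c(1, 0, 0), c(2, 0, 0))
query_vec  <- rbind(c(1, 0, 0), c(1, 0, 0), c(1, 0, 0))
target_pts <- rbind(c(0, 1, 0), c(1, 1, 0), c(2, 1, 0), c(5, 5, 0))
target_vec <- rbind(c(1, 0, 0), c(0, 1, 0), c(1, 1, 0) / sqrt(2), c(0, 0, 1))

# Hypothetical helper: brute-force nearest neighbour search
nn_dist_dot <- function(qp, qv, tp, tv) {
  # Squared Euclidean distances, rows = query points, columns = target points
  d2 <- outer(rowSums(qp^2), rowSums(tp^2), "+") - 2 * tcrossprod(qp, tp)
  d  <- sqrt(pmax(d2, 0))                     # guard against tiny negatives
  nn <- max.col(-d, ties.method = "first")    # nearest target per query
  data.frame(
    nn     = nn,
    dist   = d[cbind(seq_len(nrow(qp)), nn)],
    absdot = abs(rowSums(qv * tv[nn, , drop = FALSE]))
  )
}

res <- nn_dist_dot(query_pts, query_vec, target_pts, target_vec)
print(res)
```

Each row of `res` holds one (distance, absolute dot product) pair, which is the kind of per-segment observation that a scoring matrix then converts into a score.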

Calculating all by all similarity scores is facilitated by the
`nblast_allbyall` function, which can take either a neuronlist
as input or a character vector naming (a subset) of neurons in a (large)
neuronlist. The neuronlist containing the input neurons should be resident
in memory, i.e. not a `neuronlistfh`.

Once an all by all similarity score matrix is available it can be used as
the input to a variety of clustering algorithms. `nhclust` provides a
convenient wrapper for R's hierarchical clustering function `hclust`.
If you wish to use another clustering function, then you can use
`sub_dist_mat` to convert a raw similarity score matrix into a normalised
distance matrix (or R `dist` object) suitable for clustering. If you need
a similarity matrix or want to modify the normalisation then you can use
`sub_score_mat`.
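
The workflow described above might look like the following sketch. It assumes `nat` and `nat.nblast` are installed and again uses the `kcs20` sample data from `nat`. The choice of three clusters and the variable names are arbitrary, and the block is skipped when the packages are unavailable.

```r
if (requireNamespace("nat", quietly = TRUE) &&
    requireNamespace("nat.nblast", quietly = TRUE)) {
  library(nat)
  library(nat.nblast)
  data(kcs20, package = "nat")

  # Raw all by all score matrix for the 20 sample neurons
  allbyall_scores <- nblast_allbyall(kcs20)

  # Hierarchical clustering via the nhclust wrapper
  hc <- nhclust(scoremat = allbyall_scores)
  groups <- cutree(hc, k = 3)   # k = 3 is an arbitrary choice for illustration
  print(table(groups))

  # Normalised distances as a dist object, for use with other clustering functions
  d <- sub_dist_mat(rownames(allbyall_scores), scoremat = allbyall_scores,
                    form = "dist")
}
```

The `dist` object `d` can be passed to any function that accepts one, for example `cmdscale` for a low-dimensional embedding.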

Note that raw nblast scores are not symmetric (i.e. S(A,B) is not equal to
S(B,A)) so before clustering we construct a symmetric similarity/distance
matrix `1/2 * ( S(A,B)/S(A,A) + S(B,A)/S(B,B) )`. See the documentation
for `sub_score_mat` for details.
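
The symmetrisation can be worked through by hand in base R. The sketch below uses a made-up 3 x 3 score matrix and assumes, purely for illustration, that `S[a, b]` stores S(A,B) (query in rows); this layout is not necessarily the one the package uses internally.

```r
# Made-up raw scores; S["A", "B"] stands for S(A,B)
ids <- c("A", "B", "C")
S <- matrix(c(10,  4, 2,
               6, 20, 5,
               1,  3, 8),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Divide each row by its self-score: S(A,B) / S(A,A)
S_norm <- sweep(S, 1, diag(S), "/")

# Average the two directions: 1/2 * (S(A,B)/S(A,A) + S(B,A)/S(B,B))
S_sym <- (S_norm + t(S_norm)) / 2

# Turn similarity into a distance and cluster with base R
D <- as.dist(1 - S_sym)
hc_toy <- hclust(D, method = "average")

print(S_sym)
```

For the pair (A, B) this gives 1/2 * (4/10 + 6/20) = 0.35, and every self-comparison has similarity 1, hence distance 0.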

Although nblast is fast and can be parallelised, it makes sense to cache to
disk all by all similarity scores for a group of neurons that will be
subject to repeated clustering or other analysis. The matrix can simply be
saved to disk and then reloaded using base R functions like `save` and
`load`. `sub_score_mat` and `sub_dist_mat` can be used to extract a subset
of scores from this raw score matrix. For large matrices, the `bigmemory`
or `ff` packages allow matrices to be stored on disk and portions loaded
into memory on demand. `sub_score_mat` and `sub_dist_mat` work equally
well for regular in-memory matrices and these disk-backed matrices.
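
A minimal base R sketch of the caching pattern follows. The toy matrix stands in for a real all by all result, and the temporary file is a hypothetical cache location; in practice you would choose a permanent path.

```r
# Toy score matrix standing in for an all by all nblast result (made-up values)
ids <- c("A", "B", "C")
scoremat <- matrix(c(10,  4, 2,
                      6, 20, 5,
                      1,  3, 8),
                   nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

cache_file <- tempfile(fileext = ".rda")   # hypothetical cache location
save(scoremat, file = cache_file)

# Later session: reload instead of recomputing
rm(scoremat)
load(cache_file)   # restores the object under its original name, scoremat

# Plain matrix indexing by neuron name extracts a subset of raw scores
subset_scores <- scoremat[c("A", "C"), c("A", "C")]
print(subset_scores)
```

Keeping row and column names on the cached matrix is what makes name-based subsetting possible after reloading.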

To give an example, for 16,129 neurons from the flycircuit.tw dataset, the
260,144,641 comparisons took about 250 hours of compute time (half a day on
~20 cores). When saved to disk as a single precision (i.e. 4 bytes per
score) `ff` matrix they occupy just over 1 GB.
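
The figures quoted above follow from simple arithmetic, reproduced here in base R as a check. Only the neuron count, bytes per score, compute hours and core count come from the text; the variable names are illustrative.

```r
n_neurons     <- 16129
n_comparisons <- n_neurons^2            # all by all: 260,144,641 scores
bytes_total   <- n_comparisons * 4      # single precision, 4 bytes per score
size_gb       <- bytes_total / 1e9      # decimal gigabytes, a little over 1
wall_hours    <- 250 / 20               # compute hours spread over ~20 cores

print(c(comparisons = n_comparisons, size_gb = size_gb, wall_hours = wall_hours))
```

The wall-clock estimate of 12.5 hours matches the "half a day" quoted above.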

The nblast algorithm depends on appropriately calibrated scoring matrices.
These encapsulate the log odds ratio that a pair of segments come from two
structurally related neurons rather than two unrelated neurons, given the
observed distance and absolute dot product of the two segments. Scoring
matrices can be constructed using the `create_scoringmatrix` function,
supplying a set of matching neurons and a set of non-matching neurons. See
the `create_scoringmatrix` documentation for links to lower-level
functions that provide finer control over construction of the scoring
matrix.
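
The log odds idea can be illustrated with a toy calculation in base R. This is a sketch of the concept, not of how `create_scoringmatrix` is implemented: the bin labels, the counts, the base-2 logarithm and the small pseudocount `eps` are all assumptions chosen for the example.

```r
# Made-up counts of segment pairs, binned by distance (rows) and
# absolute dot product (columns)
bins <- list(distance = c("near", "far"), absdot = c("low", "high"))
match_counts    <- matrix(c(20, 60,
                            10, 10), nrow = 2, byrow = TRUE, dimnames = bins)
nonmatch_counts <- matrix(c(10, 10,
                            40, 40), nrow = 2, byrow = TRUE, dimnames = bins)

# Convert counts to probabilities within each group
p_match    <- match_counts / sum(match_counts)
p_nonmatch <- nonmatch_counts / sum(nonmatch_counts)

# Log odds per bin; eps avoids taking the log of zero for empty bins
eps <- 1e-6
toy_smat <- log2((p_match + eps) / (p_nonmatch + eps))
print(round(toy_smat, 3))
```

Bins that are more common among matching pairs (here, nearby segments) get positive scores, while bins that are more common among non-matching pairs get negative scores.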

There is one package option, `nat.nblast.defaultsmat`, which is `NULL` by
default, but could for example be set to one of the scoring matrices
included with the package such as `"smat.fcwb"` or to a new
user-constructed matrix.
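
Setting the option uses base R's `options` mechanism, as in this short sketch. The variable names `old_opts` and `current_smat` are illustrative, and the previous value is restored at the end so the example has no lasting effect.

```r
# Set the package option, keeping the previous value so it can be restored
old_opts <- options(nat.nblast.defaultsmat = "smat.fcwb")

current_smat <- getOption("nat.nblast.defaultsmat")
print(current_smat)

# Restore whatever was set before (NULL by default)
options(old_opts)
```

To make the setting persist across sessions, the `options` call could be placed in a startup file such as `.Rprofile`.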

Costa, M., Ostrovsky, A.D., Manton, J.D., Prohaska, S., and Jefferis, G.S.X.E. (2014). NBLAST: Rapid, sensitive comparison of neuronal structure and construction of neuron family databases. bioRxiv preprint. doi: 10.1101/006346.

See also: `nblast`, `smat.fcwb`, `nhclust`, `sub_dist_mat`,
`sub_score_mat`, `create_scoringmatrix`
