An implementation of the complete pipeline of the CHICKN algorithm.
Data
A Filebacked Big Matrix n x N.
K
Number of clusters at each call of the clustering method. Default is 2.
k_total
An upper bound on the total number of clusters.
K_W1
A Filebacked Big Matrix. Nystrom kernel matrix s x N, where N is the number of signals in the training collection and s is the Nystrom sample size. The default is NULL, in which case it is generated using
kernel_type
Kernel function type, one of c('Gaussian', 'Laplacian').
distance_type
Distance function type. The available types are Wasserstein-1 ('W1') and Euclidean ('Euclide'). The default value is 'W1'.
Freq
A frequency matrix m x n with frequency vectors in rows. If NULL, the frequency vectors are generated by
ncores
Number of cores. Default is 2.
max_neighbors
Number of neighbors used to estimate the kernel parameter
nblocks
Number of blocks on which the regression is performed. Default is 32.
N0
Number of data vectors used for the variance estimation in
max_Nsize
Number of neighbors used to compute consensus chromatograms.
DoPreimage
Logical that controls whether to compute the consensus chromatograms. Default is TRUE.
DIR_output
A directory to save the results.
DIR_tmp
A directory for temporary files.
BIG
Logical parameter that controls whether the resulting consensus chromatograms are stored as a Filebacked Big Matrix ('Centroid_preimage.bk'). Default is FALSE.
verbose
Logical that indicates whether to display the processing steps.
...
Additional arguments passed on to
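To make the kernel_type and distance_type options concrete, here is a minimal base R sketch of how a kernel value could be derived from a distance. It is an illustration under stated assumptions, not the package's code: the helper names (w1_distance, euclidean_distance, kernel_value) are hypothetical, the signals are assumed to be non-negative and sampled on a common unit-spaced grid, and the parametrization exp(-gamma * d^2) and exp(-gamma * d) is assumed and may differ from what CHICKN_W1 uses internally.

```r
# Wasserstein-1 distance between two non-negative 1-D signals on the same
# unit-spaced grid: the L1 distance between their normalized cumulative sums.
w1_distance <- function(a, b) {
  sum(abs(cumsum(a / sum(a)) - cumsum(b / sum(b))))
}

euclidean_distance <- function(a, b) sqrt(sum((a - b)^2))

# Hypothetical helper: the gamma parametrization below is an assumption.
kernel_value <- function(a, b, gamma,
                         kernel_type = c("Gaussian", "Laplacian"),
                         distance_type = c("W1", "Euclide")) {
  kernel_type <- match.arg(kernel_type)
  distance_type <- match.arg(distance_type)
  d <- if (distance_type == "W1") w1_distance(a, b) else euclidean_distance(a, b)
  if (kernel_type == "Gaussian") exp(-gamma * d^2) else exp(-gamma * d)
}

# Toy signals: b is a shifted one grid step to the right.
a <- c(0, 1, 2, 1, 0)
b <- c(0, 0, 1, 2, 1)
k_gauss <- kernel_value(a, b, gamma = 0.5, kernel_type = "Gaussian", distance_type = "W1")
k_lapl  <- kernel_value(a, b, gamma = 0.5, kernel_type = "Laplacian", distance_type = "Euclide")
```

Because b is a copy of a shifted by one grid step, the Wasserstein-1 distance in this toy case equals the size of the shift, which is the property that makes it attractive for comparing elution profiles that differ mainly in position.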
CHICKN_W1 compresses the data by computing a Nystrom kernel approximation and applying the sketching operator from Keriven et al. (2016). See the Nystrom_kernel and Sketch functions. Clusters are then recovered by operating on the compressed version of the data. The kernel function can be based on either the Wasserstein-1 or the Euclidean distance. The function generates the following files in the DIR_output directory:
'Cluster_assign_out.bk' is a Filebacked Big Matrix N x (maxLevel + 1), which stores the cluster assignment at each hierarchical level.
'Centroids_out.bk' is a Filebacked Big Matrix with the resulting cluster centroids in columns.
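The two compression steps described above can be illustrated with a small, self-contained base R sketch. It does not call the package's Nystrom_kernel or Sketch functions and is not their implementation. It assumes a Gaussian kernel on the Euclidean distance, uniformly sampled landmarks, Gaussian random frequencies, and an in-memory matrix instead of a Filebacked Big Matrix. All names (gauss_kernel, C, sketch) and toy sizes are hypothetical.

```r
set.seed(1)
n <- 20; N <- 200; s <- 10; m <- 15
X <- matrix(rnorm(n * N), nrow = n)   # toy signals in columns (n x N)
gamma <- 0.05                         # assumed kernel parameter

# Gaussian kernel between the columns of A and the columns of B.
gauss_kernel <- function(A, B, gamma) {
  d2 <- outer(colSums(A^2), colSums(B^2), "+") - 2 * crossprod(A, B)
  exp(-gamma * pmax(d2, 0))
}

# Nystrom step: pick s landmark columns and build s x N features C such that
# t(C) %*% C approximates the full N x N kernel matrix.
idx  <- sample(N, s)
K_ss <- gauss_kernel(X[, idx], X[, idx], gamma)
K_sN <- gauss_kernel(X[, idx], X, gamma)
eig  <- eigen(K_ss, symmetric = TRUE)
keep <- eig$values > 1e-10
V    <- eig$vectors[, keep, drop = FALSE]
inv_sqrt <- V %*% (t(V) / sqrt(eig$values[keep]))  # K_ss^(-1/2)
C <- inv_sqrt %*% K_sN

# Sketching step: average of complex exponentials of random projections,
# one entry per frequency vector (frequencies stored in rows).
Freq   <- matrix(rnorm(m * nrow(C)), nrow = m)
sketch <- rowMeans(exp(1i * (Freq %*% C)))
```

The result is a single complex vector of length m summarizing all N compressed signals, so later clustering steps can work on this fixed-size summary rather than on the full data.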
A list with the following attributes:
gamma
is the estimated kernel parameter.
CompressedData
is the Nystrom kernel matrix.
sigma
is the estimated variance.
Frequency
is the frequency matrix m x n.
Clusters
is the cluster assignment.
Permiakova O, Guibert R, Kraut A, Fortin T, Hesse AM, Burger T (2020) "CHICKN: Extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis." BMC Bioinformatics (under revision).
Nystrom_kernel, GenerateFrequencies, hcc_parallel, Preimage, bigstatsr