View source: R/motifs_search.R
motifs_search | R Documentation |
The 'motifs_search' function identifies and ranks motifs within a set of curves based on their frequencies and mean dissimilarities. It takes the candidate motifs grouped by hierarchical clustering in 'cluster_candidate_motifs', selects the best motifs within each cluster, and locates their occurrences in the original curves. The function supports parallel processing to reduce computation time and can handle different dissimilarity metrics and motif selection criteria.
motifs_search(
cluster_candidate_motifs_results,
R_all = cluster_candidate_motifs_results$R_all,
R_m = NULL,
different_R_m_finding = FALSE,
R_m_finding = NULL,
use_real_occurrences = FALSE,
length_diff = Inf,
worker_number = NULL
)
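cluster_candidate_motifs_results |
A list containing the output from the 'cluster_candidate_motifs' function. This list must include elements such as the global radius 'R_all' and the curves (and, if applicable, their derivatives) on which the candidate motifs were found. |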
R_all |
A numeric value representing the global radius used to cut the dendrogram, ensuring that clusters are at least twice this radius apart. This parameter defines the grouping of motifs into clusters. |
R_m |
A numeric vector containing group-specific radii used to identify motif occurrences within each cluster. The length of this vector must match the number of clusters obtained by cutting the dendrogram at a height of '2 * R_all'. If 'NULL', the function automatically determines 'R_m' for each group based on the distances between motifs within the same cluster and all curves. |
different_R_m_finding |
A logical value indicating whether to use a different radius ('R_m_finding') for finding motif occurrences compared to the initial radius ('R_m'). If 'TRUE', 'R_m_finding' is used; otherwise, 'R_m' is employed. This allows for separate tuning of motif occurrence detection. |
R_m_finding |
A numeric vector containing group-specific radii used specifically for finding motif occurrences when 'different_R_m_finding' is set to 'TRUE'. The length of this vector must match the number of clusters obtained by cutting the dendrogram at a height of '2 * R_all'. If 'NULL', 'R_m_finding' is determined automatically for each group based on distances between motifs within the same cluster and all curves. |
use_real_occurrences |
A logical value indicating whether to compute real occurrences of candidate motifs within the curves. If 'TRUE', the function calculates actual frequencies and mean dissimilarities for motif selection, providing more accurate results at the cost of increased computation time. If 'FALSE', it uses approximate frequencies and mean dissimilarities for faster execution. Defaults to 'FALSE'. |
length_diff |
A numeric value specifying the minimum percentage difference in length required among motifs within the same group to retain multiple motifs. This parameter ensures diversity in motif selection by preventing motifs of similar lengths from being selected simultaneously. It is defined as a percentage relative to the length of the most frequent motif. Defaults to 'Inf', meaning no additional motifs are selected based on length differences. |
worker_number |
An integer indicating the number of CPU cores to use for parallel processing. If 'NULL' (the default), the function uses one less than the total number of available cores ('detectCores() - 1'). Setting 'worker_number = 1' forces the function to run sequentially without parallelization. |
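The interplay between 'worker_number', 'R_m', 'R_m_finding', and 'different_R_m_finding' can be illustrated with a minimal R sketch. The helpers 'resolve_workers' and 'choose_search_radii' are hypothetical and are not part of the package; they only mirror the behaviour described above: a 'NULL' worker count falls back to 'detectCores() - 1', and a user-supplied radius vector must contain one entry per cluster obtained by cutting the dendrogram at '2 * R_all'. The automatic estimation of the radii is not reproduced.

```r
# Hypothetical helpers mirroring the documented argument handling;
# they are NOT functions exported by the package.

resolve_workers <- function(worker_number = NULL) {
  if (is.null(worker_number)) {
    n_cores <- parallel::detectCores()
    if (is.na(n_cores)) n_cores <- 1L  # detectCores() may return NA
    # Documented default: one less than the available cores, but at least 1
    worker_number <- max(1L, n_cores - 1L)
  }
  as.integer(worker_number)
}

choose_search_radii <- function(R_m, R_m_finding, different_R_m_finding,
                                n_clusters) {
  # R_m_finding is used only when different_R_m_finding is TRUE
  radii <- if (isTRUE(different_R_m_finding)) R_m_finding else R_m
  # NULL means the radii would be estimated automatically (not sketched here)
  if (is.null(radii)) return(NULL)
  if (length(radii) != n_clusters) {
    stop("one radius per cluster (dendrogram cut at 2 * R_all) is required")
  }
  radii
}

# Example: two clusters; sequential execution forced with worker_number = 1
n_workers <- resolve_workers(1)
radii <- choose_search_radii(R_m = c(0.3, 0.5), R_m_finding = c(0.4, 0.6),
                             different_R_m_finding = TRUE, n_clusters = 2)
```

Here 'radii' is 'c(0.4, 0.6)' because 'different_R_m_finding = TRUE' selects 'R_m_finding'. In the real function, a worker count greater than 1 leads to a parallel cluster being initialized (for example with 'parallel::makeCluster'); the helper above only computes the count.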
The 'motifs_search' function operates through the following steps:
**Parallelization Setup**: Determines the number of worker cores to use based on 'worker_number'. If 'worker_number > 1', it initializes a cluster for parallel processing.
**Input Preparation**: Depending on the dissimilarity metric ('d0_L2', 'd1_L2', or 'd0_d1_L2'), it prepares the data structures 'Y' and 'V' for processing.
**Dendrogram Cutting**: Cuts the hierarchical clustering dendrogram at a height of '2 * R_all' to define clusters of motifs.
**Radius Determination**: If 'R_m' or 'R_m_finding' is not provided, the function calculates these radii for each cluster based on motif distances and K-Nearest Neighbors (KNN) classification.
**Candidate Motif Selection**: Depending on 'use_real_occurrences', the function either computes real occurrences and uses actual frequencies and mean dissimilarities to select motifs, or it uses approximate measures for faster processing.
**Motif Filtering**: Within each cluster, motifs are ranked based on their frequency and mean dissimilarity. Additional motifs can be selected if their lengths differ sufficiently from the most frequent motif, as defined by 'length_diff'.
**Output Compilation**: The selected motifs and their associated properties are compiled into a comprehensive list for further analysis or visualization.
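The dendrogram cutting and motif filtering steps can be sketched in base R as follows. This is an illustrative approximation, not the package's implementation: the function 'select_motifs_sketch' is hypothetical, average linkage is an assumption, the frequencies and mean dissimilarities are taken as given (whether real or approximate), and 'length_diff' is interpreted as a percentage of the top-ranked motif's length by which every additional motif must differ from all motifs already selected in its cluster.

```r
# Hypothetical sketch of the cut-and-filter steps; NOT the package code.
select_motifs_sketch <- function(motif_diss, frequencies, mean_diss, lengths,
                                 R_all, length_diff = Inf) {
  # Cluster candidate motifs (average linkage is an assumption) and cut the
  # dendrogram at height 2 * R_all
  hc <- stats::hclust(stats::as.dist(motif_diss), method = "average")
  groups <- unname(stats::cutree(hc, h = 2 * R_all))

  selected <- integer(0)
  for (g in sort(unique(groups))) {
    idx <- which(groups == g)
    # Rank by frequency (descending), break ties by mean dissimilarity (ascending)
    idx <- idx[order(-frequencies[idx], mean_diss[idx])]
    keep <- idx[1]
    # Assumed meaning of length_diff: percentage of the best motif's length
    tol <- length_diff / 100 * lengths[idx[1]]
    for (j in idx[-1]) {
      # Keep j only if it differs enough in length from every motif kept so far;
      # with length_diff = Inf no additional motif is ever kept
      if (all(abs(lengths[j] - lengths[keep]) >= tol)) keep <- c(keep, j)
    }
    selected <- c(selected, keep)
  }
  list(groups = groups, selected = selected)
}

# Toy example: motifs 1-3 are close to each other, as are motifs 4-5
D <- matrix(c(0,    0.1,  0.2,  1,   1,
              0.1,  0,    0.15, 1,   1,
              0.2,  0.15, 0,    1,   1,
              1,    1,    1,    0,   0.1,
              1,    1,    1,    0.1, 0), nrow = 5, byrow = TRUE)
freq <- c(10, 12, 12, 7, 7)
diss <- c(0.05, 0.08, 0.04, 0.03, 0.02)
len  <- c(20, 30, 21, 15, 40)

res_default <- select_motifs_sketch(D, freq, diss, len, R_all = 0.25)
res_diverse <- select_motifs_sketch(D, freq, diss, len, R_all = 0.25,
                                    length_diff = 20)
```

Cutting at height 0.5 yields two clusters. With the default 'length_diff = Inf', only the top-ranked motif of each cluster (motifs 3 and 5) is kept. With 'length_diff = 20', motifs 2 (length 30) and 4 (length 15) are also retained, while motif 1 is dropped because its length (20) differs from motif 3's length (21) by less than 20% of 21.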
A list containing:
A list of selected motifs derived from 'Y0'.
A list of selected motifs derived from 'Y1' (if applicable).
A numeric vector representing the real lengths of the selected motifs.
A list detailing the occurrences of each selected motif within the curves.
A numeric vector indicating the real frequencies of each selected motif.
A numeric vector representing the average dissimilarity of each selected motif.
A list of matrices corresponding to the original curves, as provided in 'cluster_candidate_motifs_results'.
A list of matrices corresponding to the derivatives of the curves (if applicable), as provided in 'cluster_candidate_motifs_results'.
A numeric vector containing the radii associated with each selected motif.