View source: R/calculate_related_studies.R
calculate_related_studies (R Documentation)
Processes study summary text to identify clusters of related studies. Calculates tf-idf values for n-grams of length 1 and 2 (unigrams and bigrams) and clusters the studies using the ward.D hierarchical clustering method. Adds the results as annotations to the studies.
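The tf-idf and ward.D clustering steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the function's actual implementation: the toy summaries, the letters-only tokenizer, the hypothetical helper `ngrams()`, and the choice of L2-normalized rows with Euclidean distance are all assumptions made for the example. The number of clusters follows the "summaries divided by 3" starting point given for `n_clust`.

```r
# Hypothetical toy study summaries (not real portal data)
summaries <- c(
  s1 = "Tumor growth in mouse models of neurofibromatosis",
  s2 = "Mouse models of plexiform neurofibroma tumor growth",
  s3 = "Patient-reported pain outcomes in a clinical trial",
  s4 = "Clinical trial of pain outcomes in adults",
  s5 = "Drug screening in tumor cell lines",
  s6 = "High-throughput drug screening of cell lines"
)

# Hypothetical tokenizer: lowercase words (unigrams) plus adjacent-word bigrams
ngrams <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[words != ""]
  bigrams <- if (length(words) > 1) paste(head(words, -1), tail(words, -1)) else character(0)
  c(words, bigrams)
}

tokens <- lapply(summaries, ngrams)
vocab <- sort(unique(unlist(tokens)))

# Term frequency matrix: rows = studies, columns = n-grams
tf <- t(vapply(tokens, function(tk) {
  as.numeric(table(factor(tk, levels = vocab))) / length(tk)
}, numeric(length(vocab))))
colnames(tf) <- vocab

# Inverse document frequency, then tf-idf weighting per column
idf <- log(nrow(tf) / colSums(tf > 0))
tfidf <- sweep(tf, 2, idf, "*")

# L2-normalize rows so Euclidean distance tracks cosine similarity
row_norm <- sqrt(rowSums(tfidf^2))
row_norm[row_norm == 0] <- 1
X <- tfidf / row_norm

# Ward hierarchical clustering, cut into roughly n / 3 clusters
hc <- hclust(dist(X), method = "ward.D")
n_clust <- max(2, floor(length(summaries) / 3))
clusters <- cutree(hc, k = n_clust)

# Related studies = the other members of each study's cluster
related <- lapply(names(clusters), function(s) {
  setdiff(names(clusters)[clusters == clusters[[s]]], s)
})
names(related) <- names(clusters)
```

Note that R's `hclust()` documentation distinguishes `ward.D` from `ward.D2`: only the latter squares the dissimilarities to implement Ward's criterion. The sketch uses `ward.D` because that is the method this page names.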
calculate_related_studies(
study_table_id,
n_clust = NULL,
n_k = NULL,
dry_run = TRUE
)
study_table_id | The Synapse ID of the portal study table. You must have write access to this table.
n_clust | Target number of clusters to generate using hierarchical clustering. In practice, the total number of summaries divided by 3 is a good starting point (100 studies gives 33 clusters). If given
n_k | Target number of most closely related studies to generate for each study, using k-nearest neighbors instead; because the number of related studies is specified directly, this may be preferable over using
dry_run | Default = TRUE. Skips annotating the studies and instead prints the study tibble.
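For the `n_k` option, a k-nearest-neighbor lookup over tf-idf vectors might look like the sketch below. The matrix values are made up, and using cosine similarity as the neighbor metric is an assumption; this page does not specify which metric the function uses.

```r
# Hypothetical tf-idf matrix: rows = studies, columns = n-gram features
X <- rbind(
  s1 = c(0.9, 0.1, 0.0, 0.0),
  s2 = c(0.8, 0.2, 0.1, 0.0),
  s3 = c(0.0, 0.1, 0.9, 0.3),
  s4 = c(0.1, 0.0, 0.8, 0.4),
  s5 = c(0.0, 0.9, 0.0, 0.6)
)

# Cosine similarity between every pair of studies
X_norm <- X / sqrt(rowSums(X^2))
sim <- X_norm %*% t(X_norm)
diag(sim) <- -Inf  # a study is never its own neighbor

# Keep the n_k most similar studies for each study
n_k <- 2
related <- lapply(rownames(sim), function(s) {
  names(sort(sim[s, ], decreasing = TRUE))[seq_len(n_k)]
})
names(related) <- rownames(sim)

related$s1  # "s2" "s4"
```

Unlike cluster membership, this gives every study exactly `n_k` related studies, regardless of how large or small its neighborhood is.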
If dry_run is TRUE, returns the study tibble and skips the upload.
## Not run:
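# Cluster-based related studies (hierarchical clustering into 40 clusters)
result1 <- calculate_related_studies(study_table_id = "syn16787123",
                                     n_clust = 40,
                                     dry_run = TRUE)
# kNN-based related studies (4 nearest neighbors per study)
result2 <- calculate_related_studies(study_table_id = "syn16787123",
                                     n_k = 4,
                                     dry_run = TRUE)
# relatedStudies stores each study's related studies as a JSON string
x <- lapply(result1$relatedStudies, jsonlite::fromJSON)
y <- lapply(result2$relatedStudies, jsonlite::fromJSON)
# Compare: for each study, count how many kNN-related studies are also in its cluster
mapply(function(clust_ids, knn_ids) sum(knn_ids %in% clust_ids), x, y)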
## End(Not run)