R/get_mds_article_fits.R

#' Multidimensional scaling for article similarity
#'
#' @param num_articles integer, number of articles to keep (the first `num_articles` rows of `input_df` that fall within `year_range`)
#' @param num_clusters integer, number of clusters to generate
#' @param year_range numeric vector of length 2 giving the first and last year to include (both inclusive), e.g., `c(1947,1948)`
#' @param input_df dataframe, an articles dataframe containing article info and Similarity info (e.g., the one generated by `get_search_article_similarities()`)
#' @param a_vectors matrix, containing the article vectors, with rows looked up by the `index` column of `input_df` (defaults to `article_vectors`)
#'
#' @return dataframe, a modified version of `input_df` containing MDS coordinates and cluster information useful for plotting. Returns `NULL` if `input_df` is `NULL`.
#' @details Columns of the returned dataframe are:
#' - `formatted_column` abstract info formatted as HTML for printing
#' - `title` abstract title
#' - `wrap_title` useful for plotly hover labels
#' - `year` abstract year
#' - `index` abstract index
#' - `Similarity` of search term to abstract
#' - `X` x coordinate from MDS
#' - `Y` y coordinate from MDS
#' - `cluster` from k-means clustering
#'
#' @export
#'
#' @examples
#' 
#' search <- c("president")
#' article_dataframe <- get_search_article_similarities(search, query_type = 1)
#' MDS <- get_mds_article_fits(10, 2, c(1947, 1948), article_dataframe)
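#'
#' # A minimal, self-contained sketch of the same steps on toy data
#' # (cosine similarity, classical MDS, then k-means). Assumptions:
#' # `toy_vectors` is a made-up stand-in for `article_vectors`, and the
#' # cosine similarity is computed by hand instead of with `lsa::cosine()`.
#' set.seed(1)
#' toy_vectors <- matrix(runif(6 * 4), nrow = 6)  # 6 hypothetical articles, 4 dimensions
#' unit_rows <- toy_vectors / sqrt(rowSums(toy_vectors^2))  # scale each row to length 1
#' toy_similarity <- tcrossprod(unit_rows)  # cosine similarity between rows
#' toy_fit <- cmdscale(1 - toy_similarity, eig = TRUE, k = 2)  # MDS on cosine distance
#' toy_clusters <- kmeans(toy_fit$points, centers = 2)  # cluster the 2D coordinates
#' data.frame(X = toy_fit$points[, 1],
#'            Y = toy_fit$points[, 2],
#'            cluster = toy_clusters$cluster)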
get_mds_article_fits <- function(num_articles,
                                 num_clusters,
                                 year_range,
                                 input_df,
                                 a_vectors=article_vectors){
  if(!is.null(input_df)){
    # Keep articles inside the year range, then take the first `num_articles` rows
    input_df <- input_df %>%
      dplyr::filter(year >= year_range[1],
                    year <= year_range[2]) %>%
      dplyr::slice(1:num_articles)
    # Look up the vector for each selected article by its index
    temp_article_matrix <- a_vectors[input_df$index,]
    # Cosine similarity between articles (lsa::cosine compares columns, hence t())
    mdistance <- lsa::cosine(t(temp_article_matrix))
    # Classical MDS on cosine distance (1 - similarity), reduced to 2 dimensions
    fit <- stats::cmdscale(1-mdistance,eig=TRUE, k=2)
    colnames(fit$points) <- c("X","Y")
    # k-means clustering on the 2D MDS coordinates
    cluster <- stats::kmeans(fit$points,num_clusters)
    # Attach the coordinates and cluster labels to the selected articles
    input_df <- cbind(input_df,fit$points,
                      cluster=cluster$cluster)
    return(input_df)
  }
}
CrumpLab/RsemanticLibrarian documentation built on Nov. 11, 2019, 1:04 p.m.