get_most_similar: Most Similar Documents


Description

get_most_similar finds the documents whose tf-idf vectors have the highest cosine-similarity scores against a given tf-idf vector.

Usage

get_most_similar(tfidf_vector, abstracts_dataframe)

Arguments

abstracts_dataframe

A DataFrame returned from the get_pubmed_abstracts function. Must already have tf-idf scores calculated from append_tfidf.

tfidf_vector

A numeric vector of tf-idf weights with the same number of columns as the abstracts DataFrame.

Details

get_most_similar takes a DataFrame returned from the get_pubmed_abstracts function that already has tf-idf scores calculated by append_tfidf, together with a vector of tf-idf scores that has the same number of columns. It returns the indices of the documents with the highest cosine-similarity scores, in descending order of score.
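The ranking step described here can be illustrated with a minimal sketch in base R. This is not the tmine implementation: rank_by_cosine is a hypothetical helper, and it assumes the tf-idf scores are available as a plain numeric matrix with one row per document, which may differ from how the package stores them.

```r
# Hypothetical helper, not part of tmine: rank the rows of a numeric tf-idf
# matrix by cosine similarity to a query vector.
rank_by_cosine <- function(query, tfidf_matrix) {
  # The query needs one weight per tf-idf column.
  stopifnot(length(query) == ncol(tfidf_matrix))

  dot_products <- as.vector(tfidf_matrix %*% query)
  row_norms <- sqrt(rowSums(tfidf_matrix^2))
  query_norm <- sqrt(sum(query^2))

  scores <- dot_products / (row_norms * query_norm)
  scores[is.nan(scores)] <- 0  # all-zero vectors get a score of 0

  # Row indices, most similar document first.
  order(scores, decreasing = TRUE)
}

# Toy data: three documents, three terms.
tfidf_matrix <- rbind(
  c(0, 0, 1),
  c(1, 1, 0),
  c(1, 0, 0)
)
query <- c(1, 0, 0)

ranked <- rank_by_cosine(query, tfidf_matrix)
print(ranked)  # document 3 first, then 2, then 1
```

The third document points in the same direction as the query (score 1), the second shares one of its two terms (score about 0.71), and the first shares none (score 0), so the returned order is 3, 2, 1.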

Value

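A DataFrame containing the indices of the most similar documents, in descending order of cosine-similarity score.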

See Also

get_pubmed_abstracts append_tfidf

Examples

indices <- get_most_similar(tfidf_vector = tfidf_vector, abstracts_dataframe = abstracts_df_with_tfidf)

jlee118/tmine documentation built on May 15, 2019, 9:14 p.m.