View source: R/1_createScicloudList.R
Description:

This is the first function to call when performing an analysis with
scicloud. It returns a list of three components, metaMatrix, Tf_Idf and
wordList, for further use with runAnalysis().
The function reads all scientific papers, as PDF files, from the
"PDFs" folder in your working directory (or from any other specified directory)
to create a metaMatrix. It then pre-processes the text (e.g. stemming words
if stemWords = TRUE) and outputs a tf-idf matrix. As a last step, it fetches the
papers' metadata from Scopus, which requires an Elsevier API key
(https://dev.elsevier.com/index.jsp).
You can restrict the words used in the analysis with the
argument keepWordsFile.
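For readers unfamiliar with tf-idf weighting, the following base-R sketch computes one common variant (term counts normalised by document length, multiplied by the log2 inverse document frequency) on a made-up toy corpus. It only illustrates the concept; it is not scicloud's implementation, and the document names and texts are hypothetical.

```r
# Toy corpus (hypothetical): each element stands in for the
# pre-processed full text of one PDF
docs <- c(
  paper1 = "soil carbon soil nitrogen",
  paper2 = "carbon market policy",
  paper3 = "soil erosion policy policy"
)

# Tokenise on whitespace and build the sorted vocabulary
tokens <- strsplit(docs, "\\s+")
vocab <- sort(unique(unlist(tokens)))

# Raw term counts: documents in rows, terms in columns
tf <- t(vapply(tokens,
               function(tok) as.numeric(table(factor(tok, levels = vocab))),
               numeric(length(vocab))))
colnames(tf) <- vocab

# Term frequency: divide each row by the document's total token count
tf <- tf / rowSums(tf)

# Inverse document frequency: log2(N / number of documents containing the term)
idf <- log2(nrow(tf) / colSums(tf > 0))

# Weight each term column by its idf value
tfidf <- sweep(tf, 2, idf, `*`)
round(tfidf, 3)
```

Terms that occur in every document receive an idf of 0 and thus carry no weight, while rare terms such as "erosion" are emphasised.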
Arguments:

directory: by default, the PDFs are expected to be in a folder named
"PDFs"; this can be changed as needed.

scopusList: a finished metaMatrix from …

myAPIKey: your private API key for communicating with the Scopus API.
You can request one at https://dev.elsevier.com/.

language: defines the language of the stopwords to be filtered.
The default is "SMART". Look at …

stemWords: logical; passed to processMetaDataMatrix.

saveToWd: logical; whether to save the output of the function to the
working directory. This is especially useful for later analysis steps.
The file can be read in by using …

ignoreWords: a character vector of words to be ignored; passed to
processMetaDataMatrix.

keepWordsFile: path to a .csv file that specifies which words to keep
for the analysis. The file can either flag each word with 0 or 1, or
simply list the words to keep, in which case all other words are
disregarded. If no word list is provided, all words are used.

generateWordlist: logical; if set to …
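The keepWordsFile argument accepts either a word list with a 0/1 flag per word or a plain list of words. The base-R sketch below shows how such a file could be written and applied to a vocabulary. The two-column layout (word, keep), the column names and the toy vocabulary are assumptions for illustration only; check the exact format scicloud expects before relying on it.

```r
# Hypothetical keep-words file: one word per row plus a 0/1 keep flag
keepFile <- tempfile(fileext = ".csv")
write.csv(data.frame(word = c("soil", "carbon", "policy", "market"),
                     keep = c(1, 1, 0, 1)),
          keepFile, row.names = FALSE)

# Read the list back. If a flag column is present, keep only the words
# flagged 1; otherwise treat every listed word as a word to keep.
keepTable <- read.csv(keepFile, stringsAsFactors = FALSE)
keepWords <- if (ncol(keepTable) >= 2) {
  keepTable[[1]][keepTable[[2]] == 1]
} else {
  keepTable[[1]]
}

# Filter a toy vocabulary: every word not in keepWords is disregarded
vocab <- c("soil", "carbon", "policy", "erosion", "market")
filteredVocab <- intersect(vocab, keepWords)
filteredVocab
```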
Value:

Returns a list with the following components:

Tf_Idf: the tf-idf document-term matrix.

wordList: a list of all words that have been used in the analysis.

metaMatrix: a matrix with 21 columns containing each PDF's metadata
(DOI, year, authors, etc.) and its full text after pre-processing
and filtering.

Metadata such as title, abstract and journal is retrieved through the
Scopus API. Note that without a valid API key and a working connection to
Scopus from within a recognized network, this information cannot be
retrieved.
Author(s):

Creator of the scicloud workflow: Henrik von Wehrden,
henrik.von_wehrden@leuphana.de
Code by: Jia Yan Ng, Jia.Y.Ng@stud.leuphana.de,
Johann Julius Beeck, johann.j.beeck@stud.leuphana.de,
Lisa Gotzian, lisa.gotzian@stud.leuphana.de,
Prabesh Dhakal, prabesh.dhakal@stud.leuphana.de
First version of scicloud: Matthias Nachtmann, matthias.nachtmann@stud.leuphana.de
See Also:

Other scicloud functions: deleteRDS(), inspectScicloud(), runAnalysis(),
searchScopus()
Examples:

## Not run:
### Workflow of performing an analysis using scicloud
library(scicloud)
myAPIKey <- "YOUR_API_KEY"
# retrieve data from the PDFs and from Scopus via the API
scicloudList <- createScicloudList(myAPIKey = myAPIKey)
# run the analysis with a specified number of clusters
scicloudAnalysis <- runAnalysis(scicloudList = scicloudList, numberOfClusters = 4)
# generate a summary of the analysis
scicloudSpecs <- inspectScicloud(scicloudAnalysis)
## End(Not run)
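Instead of hard-coding the private API key in a script, as in the example, you could read it from an environment variable. The sketch below uses base R only; the variable name ELSEVIER_API_KEY is a hypothetical choice, not something scicloud requires.

```r
# Demo only: set the variable for the current session. Normally you would
# add a line ELSEVIER_API_KEY=... to your ~/.Renviron file instead.
Sys.setenv(ELSEVIER_API_KEY = "YOUR_API_KEY")

# Sys.getenv() returns "" when the variable is not set
myAPIKey <- Sys.getenv("ELSEVIER_API_KEY")
if (!nzchar(myAPIKey)) {
  stop("Please set the ELSEVIER_API_KEY environment variable.")
}

# The key can then be passed on as before, e.g.
# scicloudList <- createScicloudList(myAPIKey = myAPIKey)
```

This keeps the key out of shared scripts and version control.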