dtm: Base document-term matrix


Description

The input data for the LDA model is a document-term matrix. The rows of this matrix correspond to the documents and the columns to the terms. The entry f_i,j is the frequency of the jth term in the ith document. The number of rows equals the size of the corpus and the number of columns equals the size of the vocabulary.
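To make the layout concrete, the following is a minimal sketch in Python of how such a matrix can be built from a toy corpus. It is an illustration only, not the package's corpus2dtm function: the helper name build_dtm, the two example documents and the alphabetical column order are all assumptions made for this example.

```python
from collections import Counter


def build_dtm(docs):
    """Build a dense document-term matrix from tokenized documents.

    build_dtm is a hypothetical helper for illustration; it is not part
    of the Supreme package.
    """
    # Vocabulary: every distinct term in the corpus, sorted so that the
    # column order is deterministic (an assumption of this sketch).
    vocab = sorted({term for doc in docs for term in doc})
    column = {term: j for j, term in enumerate(vocab)}

    # One row per document, one column per term, initialised to zero.
    matrix = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for term, count in Counter(doc).items():
            # Entry (i, j) is the frequency of term j in document i.
            matrix[i][column[term]] = count
    return vocab, matrix


# Toy corpus of two already-tokenized documents (invented data).
docs = [
    ["court", "appeal", "court"],
    ["appeal", "contract"],
]
vocab, matrix = build_dtm(docs)
```

Here the matrix has as many rows as there are documents and as many columns as there are distinct terms. A matrix of the size described on this page would normally be stored in a sparse format rather than as nested lists.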

Details

dtm has 15,485 rows and 52,504 columns. It is the original document-term matrix, obtained by applying the corpus2dtm function to the original corpus of 15,485 documents. These documents are final decisions (sentences) in civil matters delivered by the Italian Supreme Court during the year 2013. Each row of this base matrix represents a document as a simple bag of words, after removal of punctuation, numbers, stopwords and whitespace.
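The cleaning steps named above can be sketched as follows. This is a rough Python illustration under stated assumptions, not the preprocessing actually performed by corpus2dtm: the function name to_bag_of_words, the tiny English stopword set and the lowercasing step are all hypothetical, and the real corpus is in Italian.

```python
import re

# Hypothetical stopword list for illustration only; the real corpus is
# Italian and would need an Italian stopword list.
STOPWORDS = {"the", "of", "in", "a"}


def to_bag_of_words(text, stopwords=STOPWORDS):
    """Reduce raw text to a list of tokens (a bag of words)."""
    # Lowercasing is an assumption of this sketch; the page does not say
    # whether case was normalised.
    text = text.lower()
    # Replace punctuation, digits and underscores with spaces.
    text = re.sub(r"[^\w\s]|\d|_", " ", text)
    # split() with no argument collapses runs of whitespace.
    tokens = text.split()
    # Drop stopwords.
    return [token for token in tokens if token not in stopwords]


tokens = to_bag_of_words("The Court, in 2013, rejected the appeal.")
```

The resulting token list keeps only the content words of the sentence. Counting the tokens of each document then gives one row of the document-term matrix.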


paolofantini/Supreme documentation built on May 24, 2019, 6:14 p.m.