assoc_prepare: Generate the frequency table for association measures


Source: R/assoc_prepare.R

Description

Produces the frequency table required as input for computing association measures between a node word and its collocates.
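As a rough illustration of the kind of table meant here, the sketch below builds a 2x2 contingency table for a single node-collocate pair and computes its expected frequencies, rounded in the way the float_digits argument describes. The helper make_2x2() and its cell layout are hypothetical assumptions for illustration; this is not assoc_prepare()'s internal code, and the columns it actually returns may differ.

```r
# Hypothetical helper (not part of collogetr): build a 2x2 contingency
# table for one node-collocate pair and its expected frequencies.
#   o11: collocate co-occurring with the node
#   o12: collocate occurring without the node
#   o21: node occurring without the collocate
#   o22: neither the collocate nor the node
make_2x2 <- function(o11, o12, o21, o22, float_digits = 3) {
  obs <- matrix(c(o11, o12, o21, o22), nrow = 2, byrow = TRUE,
                dimnames = list(collocate = c("yes", "no"),
                                node      = c("yes", "no")))
  n <- sum(obs)
  # Expected frequency of each cell: row total * column total / grand total
  expected <- outer(rowSums(obs), colSums(obs)) / n
  list(observed = obs,
       expected = round(expected, float_digits))
}

tab <- make_2x2(o11 = 12, o12 = 88, o21 = 188, o22 = 9712)
tab$expected  # expected[1, 1] = 100 * 200 / 10000 = 2
```

Comparing the observed cell o11 (12) with its expected value (2) is what association measures such as the Fisher exact test build on.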

Usage

assoc_prepare(
  colloc_out = NULL,
  window_span = NULL,
  per_corpus = FALSE,
  stopword_list = NULL,
  float_digits = 3
)

Arguments

colloc_out

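The output of colloc_leipzig: either the list returned when the results are kept in the R session, or the character vector of output file names when colloc_leipzig is run with save_interim = TRUE (see Examples).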

window_span

The window-span combination(s) of collocates to focus on for the measure, e.g., "r1" for one word to the right of the node, or a vector of values such as c("r1", "r2"). The default is NULL.

per_corpus

Logical; whether to process the collocates per corpus file (TRUE) or aggregate the data across the corpus files (FALSE). The default is FALSE.

stopword_list

A character vector of stopwords to be removed from the collocates before the association measures are computed.

float_digits

A numeric value giving the number of decimal digits to which the expected frequencies are rounded. The default is 3.
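The following base-R sketch shows, on a hypothetical toy table of collocate counts, what the window_span, stopword_list and per_corpus arguments are described as selecting: the requested window-span positions, the removal of stopwords, and per-file versus pooled counts. The data, the column names and the use of aggregate() are assumptions for illustration only, not how assoc_prepare() is implemented.

```r
# Toy collocate counts (hypothetical; not real colloc_leipzig output).
collocs <- data.frame(
  corpus = c("corp_1", "corp_1", "corp_1", "corp_2", "corp_2", "corp_2"),
  w      = c("bahwa",  "yang",   "dia",    "bahwa",  "dia",    "itu"),
  span   = c("r1",     "r1",     "r2",     "r1",     "r3",     "r1"),
  n      = c(30,       12,       7,        18,       4,        9),
  stringsAsFactors = FALSE
)

window_span   <- paste0("r", 1:2)   # equivalent to c("r1", "r2")
stopword_list <- c("yang", "itu")

# Keep only the requested window-span positions and drop stopwords.
keep     <- collocs$span %in% window_span & !collocs$w %in% stopword_list
selected <- collocs[keep, ]

# Analogue of per_corpus = TRUE: counts kept separate for each corpus file.
per_file <- aggregate(n ~ corpus + w, data = selected, FUN = sum)

# Analogue of per_corpus = FALSE: counts summed across corpus files.
pooled <- aggregate(n ~ w, data = selected, FUN = sum)
pooled  # bahwa 48, dia 7
```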

Value

A tbl_df with two columns, one of which is a nested (list) column holding the input data for row-wise calculation of association measures (e.g., Fisher's exact test with collex_fye).
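To illustrate the row-wise pattern that a nested column enables, here is a minimal base-R sketch. It assumes a data frame with a list column of 2x2 tables (hypothetical values and column names) and runs stats::fisher.test() on each row. It stands in for collex_fye, whose exact interface is not documented on this page.

```r
# Hypothetical nested input: one row per collocate, with its 2x2 table
# stored in a list column (values and column names are illustrative only).
tabs <- data.frame(w = c("bahwa", "dia"), stringsAsFactors = FALSE)
tabs$data <- list(
  matrix(c(12, 188, 88, 9712), nrow = 2),  # observed 12 vs. expected 2
  matrix(c(3, 197, 97, 9703), nrow = 2)    # observed 3 vs. expected 2
)

# Row-wise association measure: Fisher's exact test on each table
fye <- lapply(tabs$data, fisher.test)
tabs$p_value    <- vapply(fye, function(res) res$p.value, numeric(1))
tabs$odds_ratio <- vapply(fye, function(res) unname(res$estimate), numeric(1))

tabs[, c("w", "p_value", "odds_ratio")]
```

Keeping one table per row makes each collocate's input self-contained, so any 2x2-based measure can be applied with the same lapply()/vapply() pattern.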

Examples

# Apologies: the examples below are commented out because of an error
# in parsing the examples section for assoc_prepare and colloc_leipzig
# when building the website with pkgdown.
# I have not yet found a solution to this issue.

# If the colloc_leipzig output is stored as a list in the R session,
# run the following:
# assoc_tb <- assoc_prepare(colloc_out = colloc_leipzig_output,
#                           window_span = "r1",
#                           per_corpus = FALSE,
#                           stopword_list = NULL,
#                           float_digits = 3)

# If the output of colloc_leipzig is saved to disk,
# supply the vector of output file names.
## Example of running colloc_leipzig with "save_interim = TRUE"
# outfiles <- colloc_leipzig(leipzig_path = c('corp_path1.txt', 'corp_path2.txt'),
#                            pattern = "mengatakan",
#                            window = "r",
#                            span = 3,
#                            save_interim = TRUE, # save interim results to disk
#                            freqlist_output_file = "~/Desktop/out_1_freqlist.txt",
#                            colloc_output_file = "~/Desktop/out_2_collocates.txt",
#                            corpussize_output_file = "~/Desktop/out_3_corpus_size.txt",
#                            search_pattern_output_file = "~/Desktop/out_4_search_pattern.txt"
#                            )

## Example of supplying colloc_out with "outfiles"
## (`stopwords` is assumed to be a character vector of stopwords defined beforehand)
# assoc_tb <- assoc_prepare(colloc_out = outfiles,
#                           window_span = "r1",
#                           per_corpus = FALSE,
#                           stopword_list = stopwords,
#                           float_digits = 3)

gederajeg/collogetr documentation built on April 16, 2020, 11:58 a.m.