tokens_replace: Replace types in tokens object


View source: R/tokens_replace.R

Description

Substitute token types based on vectorized one-to-one matching. Because this function is designed for lemmatization or user-defined stemming, it does not support multi-word features or glob and regex patterns. For substitutions involving more complex patterns, use tokens_lookup with exclusive = FALSE.
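The idea of vectorized one-to-one matching can be sketched in base R. This is only an illustration of the concept, not quanteda's implementation: replace_types is a hypothetical helper, and it operates on a plain character vector rather than a tokens object.

```r
# Hypothetical helper (not part of quanteda): replace whole tokens
# through an exact one-to-one lookup table.
replace_types <- function(tokens, pattern, replacement, case_insensitive = TRUE) {
  stopifnot(length(pattern) == length(replacement))
  # Optionally fold case on both sides before matching
  key <- if (case_insensitive) tolower(tokens) else tokens
  pat <- if (case_insensitive) tolower(pattern) else pattern
  idx <- match(key, pat)      # NA where a token has no entry in `pattern`
  hit <- !is.na(idx)
  tokens[hit] <- replacement[idx[hit]]
  tokens
}

toks_chr <- c("The", "committee", "Focused", "on", "foci", "and", "focusing")
result <- replace_types(toks_chr,
                        pattern = c("foci", "focused", "focusing"),
                        replacement = rep("focus", 3))
print(result)
```

Each token is compared as a whole against the pattern vector, which is why multi-word sequences and wildcard patterns fall outside this kind of lookup.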

Usage

tokens_replace(x, pattern, replacement = NULL, case_insensitive = TRUE,
  verbose = quanteda_options("verbose"))

Arguments

x

tokens object whose token elements will be replaced

pattern

a character vector or dictionary. See pattern for more details.

replacement

if pattern is a character vector, then replacement must be a character vector of equal length, for a 1:1 match. If pattern is a dictionary, then replacement should not be used.

case_insensitive

ignore case when matching, if TRUE

verbose

print status messages if TRUE
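How pattern, replacement and case_insensitive interact can be seen on a single sentence. The sketch below assumes the quanteda package is installed; the input sentence and the object names (toks_demo, infle_demo, lemma_demo) are made up for illustration and do not come from the package documentation.

```r
library(quanteda)

# Made-up one-sentence input; not part of the package's data
toks_demo <- tokens("Focus groups focused on FOCUSING")

infle_demo <- c("focused", "focusing")
lemma_demo <- rep("focus", length(infle_demo))  # same length as the pattern

# Default: case is ignored when matching
toks_ci <- tokens_replace(toks_demo, pattern = infle_demo,
                          replacement = lemma_demo)

# Case-sensitive: only tokens identical to a pattern entry are replaced
toks_cs <- tokens_replace(toks_demo, pattern = infle_demo,
                          replacement = lemma_demo,
                          case_insensitive = FALSE)

as.list(toks_ci)
as.list(toks_cs)
```

Comparing the two results shows whether the upper-case token was treated as a match; tokens with no entry in pattern are left unchanged in both cases.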

Examples

toks <- tokens(data_corpus_irishbudget2010)

# lemmatization
infle <- c("foci", "focus", "focused", "focuses", "focusing", "focussed", "focusses")
lemma <- rep("focus", length(infle))
toks2 <- tokens_replace(toks, infle, lemma)
kwic(toks2, "focus*")

# stemming
type <- types(toks)
stem <- char_wordstem(type, "porter")
toks3 <- tokens_replace(toks, type, stem, case_insensitive = FALSE)
identical(toks3, tokens_wordstem(toks, "porter"))

Example output

Package version: 1.3.13
Parallel computing: 1 of 1 threads used.
See https://quanteda.io for tutorials and examples.

Attaching package: 'quanteda'

The following object is masked from 'package:utils':

    View

                                                                           
   [2010_BUDGET_01_Brian_Lenihan_FF, 1092]            . A key feature and | focus | of today's budget is regaining
   [2010_BUDGET_01_Brian_Lenihan_FF, 5477] , our investment projects will | focus | on labour-intensive areas such as
  [2010_BUDGET_02_Richard_Bruton_FG, 2133]        budget and see that the | focus | has been on the front
     [2010_BUDGET_03_Joan_Burton_LAB, 927]         therefore, be the main | focus | of policy. The Labour
    [2010_BUDGET_03_Joan_Burton_LAB, 3592]        the budget had just one | focus | and that was just too
    [2010_BUDGET_03_Joan_Burton_LAB, 4115]           county, however, the | focus | of the feature is not
    [2010_BUDGET_03_Joan_Burton_LAB, 4997]           That is too narrow a | focus | . There is a character
    [2010_BUDGET_03_Joan_Burton_LAB, 5210]    economic revival that has a | focus | other than the dream of
   [2010_BUDGET_04_Arthur_Morgan_SF, 3141]          new jobs. Instead the | focus | was on rates of pay
   [2010_BUDGET_04_Arthur_Morgan_SF, 3721]        what should be the main | focus | of economic recovery, which
   [2010_BUDGET_04_Arthur_Morgan_SF, 6796]  must be completely redrawn to | focus | on the more labour intensive
     [2010_BUDGET_05_Brian_Cowen_FF, 3114]         . The scheme will also | focus | on providing information, via
     [2010_BUDGET_05_Brian_Cowen_FF, 3786] to maximise the efficiency and | focus | of our investment and ensure
     [2010_BUDGET_05_Brian_Cowen_FF, 4466]       place, with a particular | focus | on some of the worst
 [2010_BUDGET_07_Kieran_ODonnell_FG, 1953]  coherent plan which should be | focus | on jobs. The Taoiseach
  [2010_BUDGET_08_Eamon_Gilmore_LAB, 2628]         " More recent studies, | focus | on country cases, provide
[1] TRUE

quanteda documentation built on Nov. 20, 2018, 1:04 a.m.