split_token: Split Tokens


View source: R/split_token.R

Description

Split text into tokens (words and punctuation marks).

Usage

split_token(x, ...)

## Default S3 method:
split_token(x, lower = TRUE, ...)

## S3 method for class 'data.frame'
split_token(x, text.var = TRUE, lower = TRUE, ...)

Arguments

x

A data.frame or character vector with tokens.

lower

Logical. If TRUE, the words are converted to lower case.

text.var

The name of the text variable. If TRUE, split_token tries to detect the text column that contains the tokens.

...

Ignored.

Value

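Returns a list of character vectors of tokens (for character vector input) or an expanded data.table with one row per token (for data.frame input). In the data.table, element_id identifies the original row and token_id gives the position of the token within that row.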
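
The two return shapes can be checked directly. The following is a minimal sketch, not part of the package documentation; it assumes that textshape is installed, that the DATA example data set loads as in the Examples, and that naming the text column as text.var = "state" is accepted in place of the default TRUE.

```r
library(textshape)  # assumption: the textshape package is installed

# Character vector in: a list with one character vector per input string
tokens <- split_token(c("Mr. Brown comes!", "go there"))
is.list(tokens)
length(tokens)

# data.frame in: a data.table with one row per token.
# Passing the column name explicitly is an assumption based on the
# text.var argument description; "state" is the text column of DATA.
data(DATA)
tokens_dt <- split_token(DATA, text.var = "state")
c("element_id", "token_id") %in% names(tokens_dt)
```

Naming the column explicitly may be preferable when a data.frame holds more than one text-like column and automatic detection could pick the wrong one.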

Examples

(x <- c(
    "Mr. Brown comes! He says hello. i give him coffee.",
    "I'll go at 5 p. m. eastern time.  Or somewhere in between!",
    "go there"
))
split_token(x)
split_token(x, lower = FALSE)

data(DATA)
split_token(DATA)
split_token(DATA, lower = FALSE)

## Larger data set
split_token(hamlet)

Example output

[1] "Mr. Brown comes! He says hello. i give him coffee."        
[2] "I'll go at 5 p. m. eastern time.  Or somewhere in between!"
[3] "go there"                                                  
[[1]]
 [1] "mr"     "."      "brown"  "comes"  "!"      "he"     "says"   "hello" 
 [9] "."      "i"      "give"   "him"    "coffee" "."     

[[2]]
 [1] "i'll"      "go"        "at"        "5"         "p"         "."        
 [7] "m"         "."         "eastern"   "time"      "."         "or"       
[13] "somewhere" "in"        "between"   "!"        

[[3]]
[1] "go"    "there"

[[1]]
 [1] "Mr"     "."      "Brown"  "comes"  "!"      "He"     "says"   "hello" 
 [9] "."      "i"      "give"   "him"    "coffee" "."     

[[2]]
 [1] "I'll"      "go"        "at"        "5"         "p"         "."        
 [7] "m"         "."         "eastern"   "time"      "."         "Or"       
[13] "somewhere" "in"        "between"   "!"        

[[3]]
[1] "go"    "there"

        person sex adult    state code element_id token_id
 1:        sam   m     0 computer   K1          1        1
 2:        sam   m     0       is   K1          1        2
 3:        sam   m     0      fun   K1          1        3
 4:        sam   m     0        .   K1          1        4
 5:        sam   m     0      not   K1          1        5
 6:        sam   m     0      too   K1          1        6
 7:        sam   m     0      fun   K1          1        7
 8:        sam   m     0        .   K1          1        8
 9:       greg   m     0       no   K2          2        1
10:       greg   m     0     it's   K2          2        2
11:       greg   m     0      not   K2          2        3
12:       greg   m     0        ,   K2          2        4
13:       greg   m     0     it's   K2          2        5
14:       greg   m     0     dumb   K2          2        6
15:       greg   m     0        .   K2          2        7
16:    teacher   m     1     what   K3          3        1
17:    teacher   m     1   should   K3          3        2
18:    teacher   m     1       we   K3          3        3
19:    teacher   m     1       do   K3          3        4
20:    teacher   m     1        ?   K3          3        5
21:        sam   m     0      you   K4          4        1
22:        sam   m     0     liar   K4          4        2
23:        sam   m     0        ,   K4          4        3
24:        sam   m     0       it   K4          4        4
25:        sam   m     0   stinks   K4          4        5
26:        sam   m     0        !   K4          4        6
27:       greg   m     0        i   K5          5        1
28:       greg   m     0       am   K5          5        2
29:       greg   m     0  telling   K5          5        3
30:       greg   m     0      the   K5          5        4
31:       greg   m     0    truth   K5          5        5
32:       greg   m     0        !   K5          5        6
33:      sally   f     0      how   K6          6        1
34:      sally   f     0      can   K6          6        2
35:      sally   f     0       we   K6          6        3
36:      sally   f     0       be   K6          6        4
37:      sally   f     0  certain   K6          6        5
38:      sally   f     0        ?   K6          6        6
39:       greg   m     0    there   K7          7        1
40:       greg   m     0       is   K7          7        2
41:       greg   m     0       no   K7          7        3
42:       greg   m     0      way   K7          7        4
43:       greg   m     0        .   K7          7        5
44:        sam   m     0        i   K8          8        1
45:        sam   m     0 distrust   K8          8        2
46:        sam   m     0      you   K8          8        3
47:        sam   m     0        .   K8          8        4
48:      sally   f     0     what   K9          9        1
49:      sally   f     0      are   K9          9        2
50:      sally   f     0      you   K9          9        3
51:      sally   f     0  talking   K9          9        4
52:      sally   f     0    about   K9          9        5
53:      sally   f     0        ?   K9          9        6
54: researcher   f     1    shall  K10         10        1
55: researcher   f     1       we  K10         10        2
56: researcher   f     1     move  K10         10        3
57: researcher   f     1       on  K10         10        4
58: researcher   f     1        ?  K10         10        5
59: researcher   f     1     good  K10         10        6
60: researcher   f     1     then  K10         10        7
61: researcher   f     1        .  K10         10        8
62:       greg   m     0      i'm  K11         11        1
63:       greg   m     0   hungry  K11         11        2
64:       greg   m     0        .  K11         11        3
65:       greg   m     0    let's  K11         11        4
66:       greg   m     0      eat  K11         11        5
67:       greg   m     0        .  K11         11        6
68:       greg   m     0      you  K11         11        7
69:       greg   m     0  already  K11         11        8
70:       greg   m     0        ?  K11         11        9
        person sex adult    state code element_id token_id
        person sex adult    state code element_id token_id
 1:        sam   m     0 Computer   K1          1        1
 2:        sam   m     0       is   K1          1        2
 3:        sam   m     0      fun   K1          1        3
 4:        sam   m     0        .   K1          1        4
 5:        sam   m     0      Not   K1          1        5
 6:        sam   m     0      too   K1          1        6
 7:        sam   m     0      fun   K1          1        7
 8:        sam   m     0        .   K1          1        8
 9:       greg   m     0       No   K2          2        1
10:       greg   m     0     it's   K2          2        2
11:       greg   m     0      not   K2          2        3
12:       greg   m     0        ,   K2          2        4
13:       greg   m     0     it's   K2          2        5
14:       greg   m     0     dumb   K2          2        6
15:       greg   m     0        .   K2          2        7
16:    teacher   m     1     What   K3          3        1
17:    teacher   m     1   should   K3          3        2
18:    teacher   m     1       we   K3          3        3
19:    teacher   m     1       do   K3          3        4
20:    teacher   m     1        ?   K3          3        5
21:        sam   m     0      You   K4          4        1
22:        sam   m     0     liar   K4          4        2
23:        sam   m     0        ,   K4          4        3
24:        sam   m     0       it   K4          4        4
25:        sam   m     0   stinks   K4          4        5
26:        sam   m     0        !   K4          4        6
27:       greg   m     0        I   K5          5        1
28:       greg   m     0       am   K5          5        2
29:       greg   m     0  telling   K5          5        3
30:       greg   m     0      the   K5          5        4
31:       greg   m     0    truth   K5          5        5
32:       greg   m     0        !   K5          5        6
33:      sally   f     0      How   K6          6        1
34:      sally   f     0      can   K6          6        2
35:      sally   f     0       we   K6          6        3
36:      sally   f     0       be   K6          6        4
37:      sally   f     0  certain   K6          6        5
38:      sally   f     0        ?   K6          6        6
39:       greg   m     0    There   K7          7        1
40:       greg   m     0       is   K7          7        2
41:       greg   m     0       no   K7          7        3
42:       greg   m     0      way   K7          7        4
43:       greg   m     0        .   K7          7        5
44:        sam   m     0        I   K8          8        1
45:        sam   m     0 distrust   K8          8        2
46:        sam   m     0      you   K8          8        3
47:        sam   m     0        .   K8          8        4
48:      sally   f     0     What   K9          9        1
49:      sally   f     0      are   K9          9        2
50:      sally   f     0      you   K9          9        3
51:      sally   f     0  talking   K9          9        4
52:      sally   f     0    about   K9          9        5
53:      sally   f     0        ?   K9          9        6
54: researcher   f     1    Shall  K10         10        1
55: researcher   f     1       we  K10         10        2
56: researcher   f     1     move  K10         10        3
57: researcher   f     1       on  K10         10        4
58: researcher   f     1        ?  K10         10        5
59: researcher   f     1     Good  K10         10        6
60: researcher   f     1     then  K10         10        7
61: researcher   f     1        .  K10         10        8
62:       greg   m     0      I'm  K11         11        1
63:       greg   m     0   hungry  K11         11        2
64:       greg   m     0        .  K11         11        3
65:       greg   m     0    Let's  K11         11        4
66:       greg   m     0      eat  K11         11        5
67:       greg   m     0        .  K11         11        6
68:       greg   m     0      You  K11         11        7
69:       greg   m     0  already  K11         11        8
70:       greg   m     0        ?  K11         11        9
        person sex adult    state code element_id token_id
        act    tot    scene                                location
    1: Act1    1.1  Scene I Elsinore. A platform before the castle.
    2: Act1    1.1  Scene I Elsinore. A platform before the castle.
    3: Act1    1.1  Scene I Elsinore. A platform before the castle.
    4: Act1    2.1  Scene I Elsinore. A platform before the castle.
    5: Act1    2.1  Scene I Elsinore. A platform before the castle.
   ---                                                             
35950: Act5 1150.3 Scene II                   A hall in the castle.
35951: Act5 1150.3 Scene II                   A hall in the castle.
35952: Act5 1150.3 Scene II                   A hall in the castle.
35953: Act5 1150.3 Scene II                   A hall in the castle.
35954: Act5 1150.3 Scene II                   A hall in the castle.
                  person  died dialogue element_id token_id
    1:          Bernardo FALSE    who's          1        1
    2:          Bernardo FALSE    there          1        2
    3:          Bernardo FALSE        ?          1        3
    4:         Francisco FALSE      nay          2        1
    5:         Francisco FALSE        ,          2        2
   ---                                                     
35950: Prince Fortinbras FALSE      bid       2007        3
35951: Prince Fortinbras FALSE      the       2007        4
35952: Prince Fortinbras FALSE soldiers       2007        5
35953: Prince Fortinbras FALSE    shoot       2007        6
35954: Prince Fortinbras FALSE        .       2007        7

textshape documentation built on May 29, 2021, 1:07 a.m.