DiscoverSubgroups: Performs Subgroup Discovery


View source: R/subgroup.R

Description

Performs subgroup discovery on the given data according to the specified target and configuration.

Usage

DiscoverSubgroups(source, target, config= SDTaskConfig(), as.df=FALSE)

Arguments

source

a data.frame, or a character string giving the file name of an ARFF file to use. When a file name is given, the data is loaded directly by the subgroup discovery algorithms on the Java side, which is more memory efficient than converting a data frame to the Java representation.

target

the target variable (constructed by as.target) to consider for subgroup discovery.

config

an instance of SDTaskConfig providing various parameters for subgroup discovery.

as.df

TRUE if the resulting patterns should be returned as a data.frame (converted using ToDataFrame).
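
As a minimal sketch of how these arguments combine, the call below passes an ARFF file name as source and sets as.df=TRUE so that the patterns come back as a data.frame directly. It assumes that rsubgroup is installed with a working Java setup, and it uses write.arff from the foreign package to create the file from the bundled credit.data. The temporary file and the variable names are illustrative only.

```r
library(rsubgroup)
library(foreign)    # provides write.arff

data(credit.data)

# Write the example data to a temporary ARFF file (illustrative path).
arff.file <- tempfile(fileext = ".arff")
write.arff(credit.data, file = arff.file)

# Pass the file name instead of the data.frame and ask for a data.frame result.
patterns <- DiscoverSubgroups(
    arff.file, as.target("class", "good"),
    new("SDTaskConfig", attributes = c("checking_status", "employment")),
    as.df = TRUE)

head(patterns)
```

With as.df=FALSE (the default), the same call returns the pattern objects instead, which can be converted later with ToDataFrame.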

See Also

DiscoverSubgroupsByTask, as.target, CreateSDTask, SDTaskConfig

Examples

library(rsubgroup)

# subgroup discovery on a data.frame, for a binary target
data(credit.data)
result1 <- DiscoverSubgroups(
    credit.data, as.target("class", "good"), new("SDTaskConfig",
    attributes=c("checking_status", "credit_amount", "employment", "purpose")))
result2 <- DiscoverSubgroups(
    credit.data, as.target("class", "good"), new("SDTaskConfig",
    attributes=c("checking_status", "employment")))

ToDataFrame(result1)
ToDataFrame(result2)

# subgroup discovery for numeric target variable
result3 <- DiscoverSubgroups(
    credit.data, as.target("credit_amount"), new("SDTaskConfig",
    attributes=c("checking_status", "employment")))

ToDataFrame(result3)
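
Because ToDataFrame returns a data.frame, the patterns can be filtered with ordinary R subsetting. The sketch below keeps only the larger subgroups of the first example. The column names (quality, p, size, description) are those printed in the example output; the threshold of 100 and the variable names are arbitrary illustrations.

```r
library(rsubgroup)

data(credit.data)
result1 <- DiscoverSubgroups(
    credit.data, as.target("class", "good"), new("SDTaskConfig",
    attributes=c("checking_status", "credit_amount", "employment", "purpose")))

# Convert the discovered patterns into a data.frame.
patterns1 <- ToDataFrame(result1)

# Coerce defensively in case the size column is not stored as numeric.
sizes <- as.numeric(as.character(patterns1$size))

# Keep subgroups covering at least 100 cases (arbitrary threshold).
large <- patterns1[sizes >= 100, ]

# Show the description, target share and size of the remaining subgroups.
large[, c("description", "p", "size")]
```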

Example output

Loading required package: rJava
Loading required package: foreign
   quality    p size                                                      description
1     72.2 0.88  394                                      checking_status=no checking
2     31.1 0.94  127                    purpose=radio/tv, checking_status=no checking
3     26.5 0.93  115                      employment=>=7, checking_status=no checking
4     22.0 0.78  280                                                 purpose=radio/tv
5     21.7 0.86  139                   employment=1<=X<4, checking_status=no checking
6     19.5 0.96   75                   employment=4<=X<7, checking_status=no checking
7     19.0 0.94   80                                 employment=>=7, purpose=radio/tv
8     14.2 0.96   54                    purpose=used car, checking_status=no checking
9     13.9 0.83  103                                                 purpose=used car
10    13.2 0.78  174                                                employment=4<=X<7
11    12.0 0.85   80                     purpose=new car, checking_status=no checking
12    12.0 0.94   50    employment=>=7, purpose=radio/tv, checking_status=no checking
13    11.9 0.75  253                                                   employment=>=7
14    10.1 0.97   37 purpose=radio/tv, employment=1<=X<4, checking_status=no checking
15     8.7 0.85   59         purpose=furniture/equipment, checking_status=no checking
16     5.9 0.96   23 employment=4<=X<7, purpose=radio/tv, checking_status=no checking
17     5.7 0.82   49                              employment=4<=X<7, purpose=radio/tv
18     5.4 0.89   28                              purpose=used car, employment=1<=X<4
19     4.9 0.78   63                                            checking_status=>=200
20     4.4 0.79   48                       employment=<1, checking_status=no checking
   quality    p size                                        description
1     72.2 0.88  394                        checking_status=no checking
2     26.5 0.93  115        employment=>=7, checking_status=no checking
3     21.7 0.86  139     employment=1<=X<4, checking_status=no checking
4     19.5 0.96   75     employment=4<=X<7, checking_status=no checking
5     13.2 0.78  174                                  employment=4<=X<7
6     11.9 0.75  253                                     employment=>=7
7      4.9 0.78   63                              checking_status=>=200
8      4.4 0.79   48         employment=<1, checking_status=no checking
9      3.1 0.88   17              checking_status=>=200, employment=>=7
10     2.8 0.76   46        employment=4<=X<7, checking_status=0<=X<200
11     1.9 0.78   23           checking_status=>=200, employment=1<=X<4
12     0.5 0.73   15               checking_status=>=200, employment=<1
13     0.3 1.00    1       employment=unemployed, checking_status=>=200
14     0.1 0.71   17 employment=unemployed, checking_status=no checking
     quality    mean size                                     description
1  149645.60 3827.56  269                        checking_status=0<=X<200
2   61284.07 5935.78   23 employment=unemployed, checking_status=0<=X<200
3   58621.00 4216.76   62                           employment=unemployed
4   57496.11 3601.70  174                               employment=4<=X<7
5   56784.07 3939.31   85     checking_status=0<=X<200, employment=1<=X<4
6   32618.13 3980.35   46     employment=4<=X<7, checking_status=0<=X<200
7   19554.13 3696.35   46           employment=4<=X<7, checking_status=<0
8   14361.65 3462.75   75  employment=4<=X<7, checking_status=no checking
9   13606.97 3477.42   66              employment=>=7, checking_status=<0
10   8177.81 3419.95   55        employment=>=7, checking_status=0<=X<200

rsubgroup documentation built on Feb. 23, 2021, 3 a.m.