In Fredo-XVII/TestContR: Randomly Select Test and Control Groups

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Before you use this package you need to answer the following 2 questions:

Do I want a randomized list of individuals or do I want the Top N observations for a list of individuals that I already have?
If randomized, then use a match_* function.
if top n, then use a top_n* function.
Do you have data with only numeric variables, or does your data have both numeric and categorical?
If numeric, then use a *_numeric function.
If mixed, then use a *_mixed function.

So your choices for how you want to proceed are listed below:

match_numeric(): randomized output, numeric input.
match_mixed(): randomized output, mixed input.
topn_numeric(): list of top n matches for output, numeric input.
topn_mixed(): list of top n matches for output, mixed input.

Randomized Sample, match_* functions

R contains a crime data set for the all 50 states. This data set contains data on murder rates, assaults, urban population and the occurrences of rape. The TestContR can be used to match states that have similar crime rates.

library(dplyr)
library(TestContR)

match_numeric(): Random selection of test and control groups/individuals for numeric metrics/variables.

Numeric only dataframe:

df <- datasets::USArrests %>% dplyr::mutate(state = base::row.names(USArrests)) %>%
                               dplyr::select(state, everything())

Expected data set format with individuals/labels/names/id in the first column:

knitr::kable(head(df, n = 10))

Build Test and Control list:

# defaults to 10 obs for the test group with matching controls. Change the size of the test group w/ param "n".

set.seed(99)
TEST_CONTROL_LIST <- TestContR::match_numeric(df)

Results of random selection option:

knitr::kable(TEST_CONTROL_LIST)

Providing a list of Test Groups/Individuals (No randomization of the test group)

TEST_GRP <- tribble(~'TEST','Colorado','Minnesota','Florida','South Carolina')

Example of data frame for the "test_list" input parameter:

knitr::kable(TEST_GRP)

set.seed(99)
TEST_CONTROL_LIST <- TestContR::match_numeric(df, test_list = TEST_GRP)

Results for the "test_list" input parameter:

knitr::kable(TEST_CONTROL_LIST)

match_mixed(): Random selection of test and control groups/individuals with mixed metrics/variables, meaning both numeric and categorical.

Numeric and categorical dataframe:

df <- datasets::USArrests %>% dplyr::mutate(state = base::row.names(datasets::USArrests)) %>%
  base::cbind(datasets::state.division) %>%
  dplyr::select(state, dplyr::everything())

Expected data set format with individuals/labels/names/id in the first column:

knitr::kable(head(df, n = 10))

Build Test and Control list from mixed metrics:

# defaults to 10 obs for the test group with matching controls. Change the size of the test group w/ param "n".

set.seed(99)
TEST_CONTROL_LIST <- TestContR::match_mixed(df)

Results of random selection option:

knitr::kable(TEST_CONTROL_LIST)

Providing a list of Test Groups/Individuals (No randomization of the test group)

TEST_GRP <- tribble(~'TEST','Colorado','Minnesota','Florida','South Carolina')

Example of data frame for the "test_list" input parameter:

knitr::kable(TEST_GRP)

set.seed(99)
TEST_CONTROL_LIST <- TestContR::match_mixed(df, test_list = TEST_GRP)

Results for the "test_list" input parameter:

knitr::kable(TEST_CONTROL_LIST)

Top N matches for individuals or groups, the topn_* functions

NOTE: You can provide more than one group to the topn_* functions, but the function does not remove duplicates in the control list for the more than 1 group or individual. WARNING Because of this, if you provide topn_* functions with a full dataset of size M with a function parameter "n" ~ M and no "test_list", then you will get an Mx~M matrix, where n is the function parameter that determines the size of the list of matches. For the topn_mixed function this may take a very long time to complete. In other words, you should be selective of the size of Top N matches you want to create and it highly advised to use the "test_list" parameter when possible. More on this below.

topn_numeric(): Select Top N Controls for a set of groups or individuals

Build/provide a list of the obs of interest in the test_list:

test_list <- tribble(~"TEST","Colorado")

Numeric only dataframe:

df <- datasets::USArrests %>% dplyr::mutate(state = base::row.names(USArrests)) %>%
                               dplyr::select(state, everything())

Expected data set format with individuals/labels/names/id in the first column:

knitr::kable(head(df, n = 10))

Build the list of Top N matches: Provide the test_list dataframe to the test_list parameter in the function as below.

TOPN_CONTROL_LIST <- TestContR::topn_numeric(df, topN = 10, test_list = test_list)

Results of Top N selection option:

knitr::kable(head(TOPN_CONTROL_LIST,20))

Top N without a Test List: Don't be concerned about the warning; I just wanted to let users know that it would use all the labels in the dataframe.

TOPN_CONTROL_LIST <- TestContR::topn_numeric(df, topN = 10)

Results of Top N selection without Test List:

knitr::kable(head(TOPN_CONTROL_LIST,20))

topN_mixed(): Random selection of test and control groups/individuals with mixed metrics/

Numeric and categorical dataframe:

df <- datasets::USArrests %>% dplyr::mutate(state = base::row.names(datasets::USArrests)) %>%
  base::cbind(datasets::state.division) %>%
  dplyr::select(state, dplyr::everything())

Expected data set format with individuals/labels/names/id in the first column:

knitr::kable(head(df, n = 10))

Build Test and Control list from mixed metrics:

set.seed(99)
TOPN_CONTROL_LIST <- TestContR::topn_mixed(df, topN = 10, test_list = test_list)

Results of Top N selection without Test List:

knitr::kable(head(TOPN_CONTROL_LIST,20))

Top N Mixed without a Test List Don't be concerned about the warning; I just wanted to let users know that it would use all the labels in the dataframe.

TOPN_CONTROL_LIST <- TestContR::topn_mixed(df, topN = 10)

Results of Top N selection without Test List:

knitr::kable(head(TOPN_CONTROL_LIST,20))

Conclusion

Depending on your experiment, it may be prudent to add categorical metrics/variables that will help align your data better. In the above examples, when only using the numerical data Alabama's nearest match is Louisiana, but once region is taken into consideration, Alabama's nearest match is Tennessee. Now you have the tools to create a list of nearest matches for your data whether it is numeric or mixed.

Fredo-XVII/TestContR documentation built on Nov. 5, 2022, 5:54 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Fredo-XVII/TestContR
Randomly Select Test and Control Groups

In Fredo-XVII/TestContR: Randomly Select Test and Control Groups

Randomized Sample, match_* functions

match_numeric(): Random selection of test and control groups/individuals for numeric metrics/variables.

Providing a list of Test Groups/Individuals (No randomization of the test group)

match_mixed(): Random selection of test and control groups/individuals with mixed metrics/variables, meaning both numeric and categorical.

Providing a list of Test Groups/Individuals (No randomization of the test group)

Top N matches for individuals or groups, the topn_* functions

topn_numeric(): Select Top N Controls for a set of groups or individuals

topN_mixed(): Random selection of test and control groups/individuals with mixed metrics/

Conclusion

R Package Documentation

Browse R Packages

We want your feedback!

Fredo-XVII/TestContR Randomly Select Test and Control Groups

In Fredo-XVII/TestContR: Randomly Select Test and Control Groups

Randomized Sample, match_* functions

match_numeric(): Random selection of test and control groups/individuals for numeric metrics/variables.

Providing a list of Test Groups/Individuals (No randomization of the test group)

match_mixed(): Random selection of test and control groups/individuals with mixed metrics/variables, meaning both numeric and categorical.

Providing a list of Test Groups/Individuals (No randomization of the test group)

Top N matches for individuals or groups, the topn_* functions

topn_numeric(): Select Top N Controls for a set of groups or individuals

topN_mixed(): Random selection of test and control groups/individuals with mixed metrics/

Conclusion

R Package Documentation

Browse R Packages

We want your feedback!

Fredo-XVII/TestContR
Randomly Select Test and Control Groups