knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Before you use this package you need to answer the following 2 questions:
So your choices for how you want to proceed are listed below:
R ships with a crime data set, USArrests, covering all 50 US states. It contains arrests per 100,000 residents for murder, assault, and rape, along with the percentage of the population living in urban areas. The TestContR package can be used to match states that have similar crime rates.
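To make "similar crime rates" concrete, here is a minimal base R sketch that finds each state's nearest neighbour by Euclidean distance on standardized columns. It illustrates one common notion of similarity and is an assumption for demonstration purposes, not a description of how TestContR computes its matches.

```r
# Illustration only: nearest neighbour by Euclidean distance on scaled columns.
# This is an assumed notion of "similar", not TestContR's documented method.
x <- scale(datasets::USArrests)        # centre and scale each numeric column
d <- as.matrix(stats::dist(x))         # pairwise Euclidean distances (50 x 50)
diag(d) <- Inf                         # a state must not match itself

nearest <- rownames(d)[apply(d, 1, which.min)]
names(nearest) <- rownames(d)

nearest[["Alabama"]]                   # closest state to Alabama under this metric
```

Scaling matters here because Assault is measured on a much larger numeric range than Murder and would otherwise dominate the distance.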
library(dplyr)
library(TestContR)
Numeric only dataframe:
df <- datasets::USArrests %>%
  dplyr::mutate(state = base::row.names(datasets::USArrests)) %>%
  dplyr::select(state, dplyr::everything())
Expected data set format with individuals/labels/names/id in the first column:
knitr::kable(head(df, n = 10))
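Before matching, it can help to confirm that a data frame really has the expected shape. The sketch below rebuilds the same table in base R and checks that the first column holds unique labels and every other column is numeric. The helper names (`is_label_first`, `all_numeric_rest`) are hypothetical and not part of TestContR.

```r
# Rebuild the example table with the labels in the first column (base R only).
df <- data.frame(
  state = row.names(datasets::USArrests),
  datasets::USArrests,
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Hypothetical sanity checks, not TestContR functions:
# the first column should hold unique character labels ...
is_label_first <- is.character(df[[1]]) && !anyDuplicated(df[[1]])
# ... and every remaining column should be numeric.
all_numeric_rest <- all(vapply(df[-1], is.numeric, logical(1)))

if (!is_label_first || !all_numeric_rest) {
  stop("df does not match the expected label-first, numeric-only layout")
}
```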
Build Test and Control list:
# Defaults to 10 observations for the test group, with matching controls.
# Change the size of the test group with the "n" parameter.
set.seed(99)
TEST_CONTROL_LIST <- TestContR::match_numeric(df)
Results of random selection option:
knitr::kable(TEST_CONTROL_LIST)
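As a sketch of the "n" parameter mentioned in the comment above, the call below asks for a smaller test group. It assumes `n` can be passed by name as shown, and it only runs if TestContR is installed. The object name `SMALL_LIST` is made up for this example.

```r
library(dplyr)

# Same label-first, numeric-only table as in the example above.
df <- datasets::USArrests %>%
  mutate(state = row.names(datasets::USArrests)) %>%
  select(state, everything())

# Assumption: "n" is accepted as a named argument, as the comment above suggests.
if (requireNamespace("TestContR", quietly = TRUE)) {
  set.seed(99)                                        # makes the random draw repeatable
  SMALL_LIST <- TestContR::match_numeric(df, n = 5)   # 5 test observations instead of 10
  print(SMALL_LIST)
}
```

Calling `set.seed()` immediately before the match keeps the randomly selected test group the same across runs.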
TEST_GRP <- tribble(~'TEST','Colorado','Minnesota','Florida','South Carolina')
Example of data frame for the "test_list" input parameter:
knitr::kable(TEST_GRP)
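If the labels of interest already live in a character vector, the same one-column table can be built with `tibble::tibble()`. The sketch below also checks that every requested label exists in the data before matching. The variable names `states` and `missing_labels` are hypothetical, and the check is a suggestion rather than something TestContR requires.

```r
library(tibble)

# Labels of interest, e.g. read from a file or typed by hand.
states <- c("Colorado", "Minnesota", "Florida", "South Carolina")

# One-column table named TEST, equivalent in content to the tribble() call above.
TEST_GRP <- tibble(TEST = states)

# Suggested check: a misspelled label is easier to catch here than after matching.
missing_labels <- setdiff(TEST_GRP$TEST, row.names(datasets::USArrests))
if (length(missing_labels) > 0) {
  stop("Labels not found in the data: ", paste(missing_labels, collapse = ", "))
}
```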
set.seed(99)
TEST_CONTROL_LIST <- TestContR::match_numeric(df, test_list = TEST_GRP)
Results for the "test_list" input parameter:
knitr::kable(TEST_CONTROL_LIST)
Numeric and categorical dataframe:
df <- datasets::USArrests %>%
  dplyr::mutate(state = base::row.names(datasets::USArrests)) %>%
  base::cbind(datasets::state.division) %>%
  dplyr::select(state, dplyr::everything())
Expected data set format with individuals/labels/names/id in the first column:
knitr::kable(head(df, n = 10))
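Binding `state.division` onto `USArrests` by column relies on both objects listing the states in the same order. A small base R sketch can confirm that and give the categorical column a shorter name. The name `division` and the object `df_mixed` are choices made for this example, not names that TestContR expects.

```r
# cbind() pairs rows by position, so the two sources must share the same order.
same_order <- all(row.names(datasets::USArrests) == datasets::state.name)
if (!same_order) stop("USArrests rows and state.name are not aligned")

# Build the mixed table with an explicit (hypothetical) column name.
df_mixed <- data.frame(
  state = row.names(datasets::USArrests),
  datasets::USArrests,
  division = datasets::state.division,   # factor with the nine census divisions
  row.names = NULL,
  stringsAsFactors = FALSE
)

# One class per column: character label, numeric metrics, factor division.
col_classes <- vapply(df_mixed, function(col) class(col)[1], character(1))
col_classes
```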
Build Test and Control list from mixed metrics:
# Defaults to 10 observations for the test group, with matching controls.
# Change the size of the test group with the "n" parameter.
set.seed(99)
TEST_CONTROL_LIST <- TestContR::match_mixed(df)
Results of random selection option:
knitr::kable(TEST_CONTROL_LIST)
TEST_GRP <- tribble(~'TEST','Colorado','Minnesota','Florida','South Carolina')
Example of data frame for the "test_list" input parameter:
knitr::kable(TEST_GRP)
set.seed(99)
TEST_CONTROL_LIST <- TestContR::match_mixed(df, test_list = TEST_GRP)
Results for the "test_list" input parameter:
knitr::kable(TEST_CONTROL_LIST)
Build or provide a list of the observations of interest as the test_list:
test_list <- tribble(~"TEST","Colorado")
Numeric only dataframe:
df <- datasets::USArrests %>%
  dplyr::mutate(state = base::row.names(datasets::USArrests)) %>%
  dplyr::select(state, dplyr::everything())
Expected data set format with individuals/labels/names/id in the first column:
knitr::kable(head(df, n = 10))
Build the list of Top N matches: Provide the test_list dataframe to the test_list parameter in the function as below.
TOPN_CONTROL_LIST <- TestContR::topn_numeric(df, topN = 10, test_list = test_list)
Results of Top N selection option:
knitr::kable(head(TOPN_CONTROL_LIST,20))
Top N without a test list: when no test_list is supplied, the function issues a warning. The warning is expected and only signals that every label in the data frame will be used.
TOPN_CONTROL_LIST <- TestContR::topn_numeric(df, topN = 10)
Results of Top N selection without Test List:
knitr::kable(head(TOPN_CONTROL_LIST,20))
Numeric and categorical dataframe:
df <- datasets::USArrests %>%
  dplyr::mutate(state = base::row.names(datasets::USArrests)) %>%
  base::cbind(datasets::state.division) %>%
  dplyr::select(state, dplyr::everything())
Expected data set format with individuals/labels/names/id in the first column:
knitr::kable(head(df, n = 10))
Build the list of Top N matches from mixed metrics:
set.seed(99)
TOPN_CONTROL_LIST <- TestContR::topn_mixed(df, topN = 10, test_list = test_list)
Results of Top N selection option:
knitr::kable(head(TOPN_CONTROL_LIST,20))
Top N mixed without a test list: when no test_list is supplied, the function issues a warning. The warning is expected and only signals that every label in the data frame will be used.
TOPN_CONTROL_LIST <- TestContR::topn_mixed(df, topN = 10)
Results of Top N selection without Test List:
knitr::kable(head(TOPN_CONTROL_LIST,20))
Depending on your experiment, it may be prudent to add categorical variables that help align your data better. In the examples above, Alabama's nearest match is Louisiana when only the numeric data is used, but once the census division is taken into consideration, Alabama's nearest match is Tennessee. You now have the tools to create a list of nearest matches for your data, whether it is numeric or mixed.
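To see how a categorical variable can shift a nearest match, the sketch below computes a Gower dissimilarity over the numeric columns plus the census division, using `cluster::daisy()`. Gower distance is a common choice for mixed data and is used here as an assumption for illustration. It is not necessarily the metric TestContR uses, so the result may differ from the package output.

```r
# Illustration only: Gower dissimilarity on mixed numeric + categorical columns.
# Assumed metric for demonstration; TestContR's internals may differ.
mixed <- data.frame(
  datasets::USArrests,                   # keeps the state names as row names
  division = datasets::state.division    # categorical column (factor)
)

g <- as.matrix(cluster::daisy(mixed, metric = "gower"))
diag(g) <- Inf                           # a state must not match itself

nearest_mixed <- rownames(g)[apply(g, 1, which.min)]
names(nearest_mixed) <- rownames(g)

nearest_mixed[["Alabama"]]               # closest state once division is included
```

Under Gower distance, two states in different divisions pay a fixed penalty on the categorical column, which tends to pull matches toward states in the same division.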