data.md
In byuidatascience/data4benfords: What the Package Does (One Line, Title Case)

The data is called cities.

The count is stored in the number column, with its first and last digits split into separate columns.

The source of this data is <https://github.com/midnightradio/cse140-data-programming> and <https://simplemaps.com/data/us-cities>.

A data frame with columns:

|variable |class     |description                                 |
|:--------|:---------|:-------------------------------------------|
|country  |character |Either US or fiction                        |
|city     |character |The city within the country                 |
|location |character |The region within which the city is located |
|number   |numeric   |The population of that city                 |
|first    |character |The first digit of number                   |
|last     |character |The last digit of number                    |
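To make the relationship between number, first, and last concrete, here is a minimal sketch in Python. It is not the package's own code: the helper names and the example row are made up for illustration, and it assumes number holds a whole, non-negative count.

```python
def first_digit(number):
    """Return the leading digit of a whole number as a one-character string."""
    return str(abs(int(number)))[0]


def last_digit(number):
    """Return the final digit of a whole number as a one-character string."""
    return str(abs(int(number)))[-1]


def add_digit_columns(row):
    """Return a copy of the row with 'first' and 'last' derived from 'number'."""
    enriched = dict(row)
    enriched["first"] = first_digit(row["number"])
    enriched["last"] = last_digit(row["number"])
    return enriched


# Hypothetical row; the city name and population are invented for the example.
example = {"country": "US", "city": "Exampleville", "location": "Idaho", "number": 28337}
print(add_digit_columns(example))
```

Both digits are kept as strings, which matches the character class listed for first and last in the table.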

The data is called cities_us.

The count is stored in the number column, with its first and last digits split into separate columns.

The source of this data is <https://simplemaps.com/data/us-cities>.

A data frame with columns:

|variable |class     |description                                 |
|:--------|:---------|:-------------------------------------------|
|city     |character |The city within the country                 |
|location |character |The region within which the city is located |
|number   |numeric   |The population of that city                 |
|first    |character |The first digit of number                   |
|last     |character |The last digit of number                    |

The data is called cities_fiction.

The count is stored in the number column, with its first and last digits split into separate columns.

The source of this data is <https://github.com/midnightradio/cse140-data-programming>.

A data frame with columns:

The data is called waitlist.

The count is stored in the number column, with its first and last digits split into separate columns.

The source of this data is <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5942457/>.

A data frame with columns:

|variable |class     |description                                 |
|:--------|:---------|:-------------------------------------------|
|country  |character |The source country of the data              |
|type     |character |The type of medical procedure               |
|details  |character |Further details about the medical procedure |
|month    |character |The month of the year                       |
|year     |numeric   |Year                                        |
|number   |numeric   |The number of people on the waitlist        |
|first    |character |The first digit of number                   |
|last     |character |The last digit of number                    |

The data is called waitlist_finland.

The count is stored in the number column, with its first and last digits split into separate columns.

The source of this data is <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5942457/>.

A data frame with columns:

The data is called waitlist_spain.

The count is stored in the number column, with its first and last digits split into separate columns.

The source of this data is <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5942457/>.

A data frame with columns:

The data is called election.

The count is stored in the number column, with its first and last digits split into separate columns.

The source of this data is <https://github.com/midnightradio/cse140-data-programming>.

A data frame with columns:

|variable  |class     |description                                             |
|:---------|:---------|:-------------------------------------------------------|
|country   |character |The source country of the data                          |
|region    |character |The region within which the election votes were tallied |
|candidate |character |The name of the electoral candidate                     |
|number    |numeric   |The number of votes cast for the candidate              |
|first     |character |The first digit of number                               |
|last      |character |The last digit of number                                |

The data is called election_iran.

The count is stored in the number column, with its first and last digits split into separate columns.

The source of this data is <https://github.com/midnightradio/cse140-data-programming>.

A data frame with columns:

|variable  |class     |description                                             |
|:---------|:---------|:-------------------------------------------------------|
|region    |character |The region within which the election votes were tallied |
|candidate |character |The name of the electoral candidate                     |
|number    |numeric   |The number of votes cast for the candidate              |
|first     |character |The first digit of number                               |
|last      |character |The last digit of number                                |
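The per-country tables carry the same columns as the combined table minus country. Assuming the per-country objects are simply subsets of the combined one (the page does not state this outright), the split could be sketched in Python as below. The function name, country labels, and rows are hypothetical.

```python
def split_by_country(rows):
    """Group rows by their 'country' value and drop that key from each row."""
    groups = {}
    for row in rows:
        remainder = {key: value for key, value in row.items() if key != "country"}
        groups.setdefault(row["country"], []).append(remainder)
    return groups


# Invented rows; the labels "Iran" and "US" are assumed, not taken from the data.
combined = [
    {"country": "Iran", "region": "Region A", "candidate": "Candidate 1", "number": 1200},
    {"country": "US", "region": "Region B", "candidate": "Candidate 2", "number": 3400},
    {"country": "Iran", "region": "Region C", "candidate": "Candidate 1", "number": 560},
]
by_country = split_by_country(combined)
print(len(by_country["Iran"]), len(by_country["US"]))
```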

The data is called election_us.

The count is stored in the number column, with its first and last digits split into separate columns.

The source of this data is <https://github.com/midnightradio/cse140-data-programming>.

A data frame with columns:

The data is called benford.

This data has the counts by first digit for the election, waitlist, and cities data.

The source of this data is <https://github.com/midnightradio/cse140-data-programming>, <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5942457/>, and <https://simplemaps.com/data/us-cities>.

A data frame with columns:

|variable        |class     |description                                              |
|:---------------|:---------|:--------------------------------------------------------|
|data            |character |The data object used to calculate digit counts           |
|country         |character |The location or group within each data object            |
|first           |character |The first digit of the number                            |
|n               |integer   |The count of numbers that started with that digit        |
|percent         |numeric   |The percent of the total for each data and country group |
|benford_percent |numeric   |The expected proportion under Benford's law              |
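Benford's law gives the expected share of numbers with leading digit d as log10(1 + 1/d). The Python sketch below shows how columns like n, percent, and benford_percent could be computed from a list of counts. It is an illustration rather than the package's code: the function names are hypothetical, zeros are skipped by assumption, and shares are expressed as fractions between 0 and 1, which may not match the scale stored in the data.

```python
import math
from collections import Counter


def benford_expected(digit):
    """Expected proportion of numbers whose first digit is `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)


def first_digit_summary(numbers):
    """Tally first digits and pair each tally with its Benford expectation."""
    # Zero has no leading non-zero digit, so it is skipped (an assumption).
    firsts = [str(abs(int(n)))[0] for n in numbers if int(n) != 0]
    counts = Counter(firsts)
    total = sum(counts.values())
    rows = []
    for d in range(1, 10):
        n = counts.get(str(d), 0)
        rows.append({
            "first": str(d),
            "n": n,
            "percent": n / total if total else 0.0,
            "benford_percent": benford_expected(d),
        })
    return rows


for row in first_digit_summary([1, 10, 100, 2]):
    print(row)
```

The nine expected proportions sum to 1, since the product of (1 + 1/d) for d from 1 to 9 is 10.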

The data is called pick_random.

This data has the counts by last digit for the random guesses.

The source of this data is <https://docs.google.com/spreadsheets/d/1TasFdyWr9xN7uWiWw0PkaFDwHYgQiC3y41YKR9CFRlA/edit#gid=0> and <https://www.reddit.com/r/dataisbeautiful/comments/acow6y/asking_over_8500_students_to_pick_a_random_number/>.

A data frame with columns:

|variable     |class     |description                                                            |
|:------------|:---------|:----------------------------------------------------------------------|
|digit        |character |The number of interest between 0-9                                     |
|n_09         |integer   |The count of people that picked that digit. Note 10s were changed to 0 |
|percent_09   |numeric   |The percentage of each digit of the total for the 0-9 digit counts     |
|n_last       |integer   |The count of the last digit of numbers picked between 0 and 1 million. |
|percent_last |numeric   |The percentage of each digit of the total for the last digit counts.   |
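The note on n_09 says that picks of 10 were recoded to 0 before counting. A minimal Python sketch of that recoding and tally follows. The function name and the sample picks are invented, and percent_09 is shown as a fraction, which may differ from the scale used in the data.

```python
from collections import Counter


def tally_picks(picks):
    """Count picks per digit 0-9 after recoding every 10 to 0."""
    recoded = [0 if pick == 10 else pick for pick in picks]
    counts = Counter(recoded)
    total = len(recoded)
    return [
        {
            "digit": str(d),
            "n_09": counts.get(d, 0),
            "percent_09": counts.get(d, 0) / total if total else 0.0,
        }
        for d in range(10)
    ]


# Made-up picks for illustration only.
for row in tally_picks([7, 10, 7, 3]):
    print(row)
```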

The data is called last_digit.

This data has the counts by last digit for the election, waitlist, and cities data.

The source of this data is <https://github.com/midnightradio/cse140-data-programming>, <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5942457/>, and <https://simplemaps.com/data/us-cities>.

A data frame with columns:

|variable     |class     |description                                              |
|:------------|:---------|:--------------------------------------------------------|
|data         |character |The data object used to calculate digit counts           |
|country      |character |The location or group within each data object            |
|last         |character |The last digit of the number                             |
|n            |integer   |The count of numbers that ended with that digit          |
|percent      |numeric   |The percent of the total for each data and country group |
|last_percent |numeric   |The expected proportion under complete randomness        |
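Under complete randomness each of the ten possible last digits is equally likely, so the expected share for every digit is 1/10. The Python sketch below tallies last digits against that uniform expectation. The function name and inputs are hypothetical, and the shares are fractions rather than a confirmed match for the stored scale.

```python
from collections import Counter

# Uniform expectation: ten digits, each with probability 1/10.
UNIFORM_EXPECTED = 1 / 10


def last_digit_summary(numbers):
    """Tally last digits and pair each tally with the uniform expectation."""
    lasts = [str(abs(int(n)))[-1] for n in numbers]
    counts = Counter(lasts)
    total = len(lasts)
    return [
        {
            "last": str(d),
            "n": counts.get(str(d), 0),
            "percent": counts.get(str(d), 0) / total if total else 0.0,
            "last_percent": UNIFORM_EXPECTED,
        }
        for d in range(10)
    ]


for row in last_digit_summary([10, 21, 31, 45]):
    print(row)
```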

The data is called accounting.

The value is stored in the number column, with its first and last digits split into separate columns.

The source of this data is <https://github.com/carloscinelli/benford.analysis> and <https://www.amazon.com/Benfords-Law-Applications-Accounting-Detection/dp/1118152859>.

A data frame with columns:

|variable |class     |description                                    |
|:--------|:---------|:----------------------------------------------|
|data     |character |The data object used to calculate digit counts |
|number   |numeric   |The amount recorded in the accounting data     |
|first    |character |The first digit of number                      |
|last     |character |The last digit of number                       |

The data is called accounting_gm.

The value is stored in the number column, with its first and last digits split into separate columns.

The source of this data is <https://github.com/carloscinelli/benford.analysis> and <https://www.amazon.com/Benfords-Law-Applications-Accounting-Detection/dp/1118152859>.

A data frame with columns:

|variable |class     |description                                |
|:--------|:---------|:------------------------------------------|
|number   |numeric   |The amount recorded in the accounting data |
|first    |character |The first digit of number                  |
|last     |character |The last digit of number                   |

The data is called accounting_government.

The value is stored in the number column, with its first and last digits split into separate columns.

The source of this data is <https://github.com/carloscinelli/benford.analysis> and <https://www.amazon.com/Benfords-Law-Applications-Accounting-Detection/dp/1118152859>.

A data frame with columns:

The data is called accounting_sino.

The value is stored in the number column, with its first and last digits split into separate columns.

The source of this data is <https://github.com/carloscinelli/benford.analysis> and <https://www.amazon.com/Benfords-Law-Applications-Accounting-Detection/dp/1118152859>.

A data frame with columns:

The data is called accounting_utility.

The value is stored in the number column, with its first and last digits split into separate columns.

The source of this data is <https://github.com/carloscinelli/benford.analysis> and <https://www.amazon.com/Benfords-Law-Applications-Accounting-Detection/dp/1118152859>.

A data frame with columns:

The data is called benford_accounting.

This data has the counts by first digit for the accounting data.

The source of this data is <https://github.com/carloscinelli/benford.analysis> and <https://www.amazon.com/Benfords-Law-Applications-Accounting-Detection/dp/1118152859>.

A data frame with columns:

|variable        |class     |description                                       |
|:---------------|:---------|:-------------------------------------------------|
|data            |character |The data object used to calculate digit counts    |
|first           |character |The first digit of the number                     |
|n               |integer   |The count of numbers that started with that digit |
|percent         |numeric   |The percent of the total for each data object     |
|benford_percent |numeric   |The expected proportion under Benford's law       |

The data is called last_digit_accounting.

This data has the counts by last digit for the accounting data.

The source of this data is <https://github.com/carloscinelli/benford.analysis> and <https://www.amazon.com/Benfords-Law-Applications-Accounting-Detection/dp/1118152859>.

A data frame with columns:

|variable     |class     |description                                       |
|:------------|:---------|:-------------------------------------------------|
|data         |character |The data object used to calculate digit counts    |
|last         |character |The last digit of the number                      |
|n            |integer   |The count of numbers that ended with that digit   |
|percent      |numeric   |The percent of the total for each data object     |
|last_percent |numeric   |The expected proportion under complete randomness |

The data is called utility_data.

This data adds a few more variables beyond accounting_utility.

The source of this data is <https://github.com/carloscinelli/benford.analysis> and <https://www.amazon.com/Benfords-Law-Applications-Accounting-Detection/dp/1118152859>.

A data frame with columns:

|variable  |class     |description               |
|:---------|:---------|:-------------------------|
|vendornum |character |Vendor number             |
|date      |Date      |Date of the invoice       |
|invnum    |character |The invoice number        |
|amount    |numeric   |The amount on the invoice |

The data is called government_data.

This data adds a few more variables beyond accounting_government.

The source of this data is <https://github.com/carloscinelli/benford.analysis> and <https://www.amazon.com/Benfords-Law-Applications-Accounting-Detection/dp/1118152859>.

A data frame with columns:

|variable         |class     |description                                                     |
|:----------------|:---------|:---------------------------------------------------------------|
|cardnum          |character |Credit card number used for the purchase                        |
|date             |Date      |The date of the transaction                                     |
|merchnum         |character |The merchant number                                             |
|merchdescription |character |The merchant name and details                                   |
|merchstate       |character |The state where the merchant is located                         |
|merchzip         |character |The zip code of the merchant                                    |
|transtype        |character |The transaction type: A, D, P, or Y                             |
|amount           |numeric   |The amount of the transaction                                   |
|merch_clean      |character |A cleaned merchant name                                         |
|merch_other200   |character |All merchants with fewer than 200 transactions grouped to other |
|merch_other100   |character |All merchants with fewer than 100 transactions grouped to other |
|merch_other50    |character |All merchants with fewer than 50 transactions grouped to other  |
|merch_other10    |character |All merchants with fewer than 10 transactions grouped to other  |
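The merch_other columns collapse infrequent merchants into a single "other" level at four different thresholds. One way to picture that grouping is the Python sketch below. It is an assumption about the logic, not the package's code, and the function name, the "other" label, and the sample merchants are illustrative.

```python
from collections import Counter


def lump_rare(merchants, min_count, other="other"):
    """Replace merchants that appear fewer than min_count times with `other`."""
    counts = Counter(merchants)
    return [name if counts[name] >= min_count else other for name in merchants]


# Invented merchant names; a threshold of 2 stands in for 200, 100, 50, or 10.
cleaned = ["ACME", "ACME", "BOLT", "ACME", "CRATE"]
print(lump_rare(cleaned, min_count=2))
```

Running the same function with each threshold would yield one grouped column per cutoff, with lower cutoffs keeping more distinct merchant names.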

byuidatascience/data4benfords documentation built on May 6, 2020, 10:10 p.m.


|data                  |title                                                                                      |
|:---------------------|:------------------------------------------------------------------------------------------|
|cities                |The population of US cities and cities from fictional sources                              |
|cities_us             |The population of US cities                                                                |
|cities_fiction        |The population of cities from fictional sources                                            |
|waitlist              |The count of citizens on waitlists for medical procedures                                  |
|waitlist_finland      |The count of Finnish citizens on waitlists for medical procedures                          |
|waitlist_spain        |The count of Spanish citizens on waitlists for medical procedures                          |
|election              |The election results for Iran and US presidential elections                                |
|election_iran         |The election results for the 2009 presidential elections in Iran                           |
|election_us           |The election results for the Obama-McCain presidential election in the US                  |
|benford               |The counts and percentage of first digits for all data objects                             |
|pick_random           |The counts and percentage of last digits for college students asked to pick random numbers |
|last_digit            |The counts and percentage of last digits for all data objects                              |
|accounting            |The combined accounting data sets                                                          |
|accounting_gm         |The amounts paid to vendors for the 90 days preceding General Motors' 2009 liquidation     |
|accounting_government |A dataset containing the card transactions for a government entity in 2010                 |
|accounting_sino       |Financial statement numbers from Sino Forest Corporation's 2010 report                     |
|accounting_utility    |A dataset of the 2010 payments data of a division of a West Coast utility company          |
|benford_accounting    |The counts and percentage of first digits for all data objects                             |
|last_digit_accounting |The counts and percentage of last digits for all data objects                              |
|utility_data          |A full dataset of the 2010 payments data of a division of a West Coast utility company     |
|government_data       |A full dataset containing the card transactions for a government entity in 2010            |
