profiles: Cleaned OkCupid profile data


Description

Cleaned profile data for 59,946 OkCupid users who were living within 25 miles of San Francisco, had active profiles on June 26, 2012, had been online in the previous year, and had at least one picture in their profile. The original data and codebook can be found at https://github.com/rudeboybert/JSE_OkCupid.

Usage

profiles

Format

A data.frame with 59946 rows and 22 variables:

age

Age

body_type

Body type

diet

Dietary habits

drinks

Drinking habits

drugs

Drug usage habits

education

Education level

ethnicity

Ethnicity

height

Height in inches

income

Income

job

Job

last_online

Date/time of last login to OkCupid

location

Location

offspring

Children: whether the user has kids and whether they want them

orientation

Sexual orientation

pets

Pet ownership and preferences (e.g., whether the user likes or has dogs and cats)

religion

Religious affiliation

sex

Sex. Note that at the time, OkCupid only allowed a male/female binary; this has since been relaxed.

sign

Astrological sign

smokes

Smoking habits

speaks

Languages spoken

status

Relationship status

essay0

Response to the first essay question ("My self-summary"), truncated to 140 characters

Details

The cleaned version of the profiles data differs from the original in the following ways:

Essay Responses

Due to file size restrictions, only the first 140 characters of each user's first essay response ("My self-summary") are included.

Missing income values

Previously coded as -1, they are now coded as NA

All other missing values

Previously coded as "", they are now coded as NA

offspring and sign

String instances of the HTML entity "&rsquo;" are replaced with apostrophes.

last_online

Date/time strings are converted to POSIXct date-time objects in the US/Pacific time zone using lubridate's parse_date_time().
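The cleaning steps listed above can be sketched in base R as follows. This is a minimal sketch, not the package's actual cleaning script: the two-row `raw` data frame and all of its values are hypothetical, the raw `last_online` format `"%Y-%m-%d-%H-%M"` is an assumption, and base `as.POSIXct()` stands in for lubridate's `parse_date_time()`.

```r
# Hypothetical raw rows mimicking the original coding (NOT the real OkCupid file)
raw <- data.frame(
  income      = c(-1, 50000),
  job         = c("", "other"),
  offspring   = c("doesn&rsquo;t have kids", "has a kid"),
  sign        = c("leo and it&rsquo;s fun to think about", ""),
  last_online = c("2012-06-28-20-30", "2012-06-29-21-41"),  # assumed raw format
  essay0      = c(strrep("a", 200), "short summary"),
  stringsAsFactors = FALSE
)

clean <- raw

# Missing income: -1 becomes NA
clean$income[clean$income == -1] <- NA

# All other missing values: empty strings in character columns become NA
chr_cols <- vapply(clean, is.character, logical(1))
clean[chr_cols] <- lapply(clean[chr_cols], function(x) {
  x[x %in% ""] <- NA
  x
})

# offspring and sign: replace the HTML entity &rsquo; with a plain apostrophe
for (col in c("offspring", "sign")) {
  clean[[col]] <- gsub("&rsquo;", "'", clean[[col]], fixed = TRUE)
}

# Essay: keep only the first 140 characters
clean$essay0 <- substr(clean$essay0, 1, 140)

# last_online: parse to POSIXct in Pacific time
# ("America/Los_Angeles" is the canonical tz name for US/Pacific)
clean$last_online <- as.POSIXct(clean$last_online,
                                format = "%Y-%m-%d-%H-%M",
                                tz = "America/Los_Angeles")

str(clean)
```

Applying the steps in this order means the essay truncation and the entity replacement simply pass NA values through unchanged.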

Source

https://github.com/rudeboybert/JSE_OkCupid

Examples

library(okcupiddata)
data(profiles)
# If using RStudio:
# View(profiles)
summary(profiles$income)

Example output

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  20000   20000   50000  104395  100000 1000000   48442 
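The NA count in this output can be put in proportion with a little arithmetic. The sketch below hard-codes the two counts shown in the summary (48,442 missing incomes out of 59,946 rows) so it runs without the package; with okcupiddata loaded, `mean(is.na(profiles$income))` should give the same share as a fraction.

```r
# Counts taken from the summary() output: NA incomes and total rows
n_missing <- 48442
n_total   <- 59946

# Share of users who did not report an income, in percent
pct_missing <- round(100 * n_missing / n_total, 1)
print(pct_missing)  # [1] 80.8

# Number of users who did report an income
n_reported <- n_total - n_missing
print(n_reported)   # [1] 11504
```

In other words, only about one in five users reported an income, so the income quartiles above describe a small self-selected subset.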

okcupiddata documentation built on May 2, 2019, 3:31 p.m.