spark_read_csv: Read a CSV file into a 'spark_tbl'

View source: R/read-write.R

Description

Read a CSV file into a spark_tbl

Usage

spark_read_csv(
  path,
  schema = NULL,
  na = "NA",
  header = FALSE,
  delim = ",",
  guess_max = 1000,
  ...
)

Arguments

path

string, the path to the file. Needs to be accessible from the cluster.

schema

StructType, the schema used to read the data; inferred from the data if not specified.

na

string, the value used to signify NA values.

header

boolean, whether to treat the first line of the file as a header. Defaults to FALSE.

delim

string, the character used to delimit each column. Defaults to ','.

guess_max

int, the maximum number of records to use for guessing column types.

...

named list, optional arguments passed on to the underlying reader.
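
For illustration, a sketch of forwarding extra options through `...`. This assumes the options are passed to Spark's CSV data source; option names such as `quote` and `dateFormat` come from Apache Spark's CSV reader documentation, not from this package, and the path is hypothetical:

```r
library(tidyspark)

# Sketch only: assumes `...` is forwarded as options to Spark's CSV
# data source, whose option names (quote, escape, dateFormat, ...)
# are documented by Apache Spark rather than by tidyspark itself.
df <- spark_read_csv(
  "/data/events.csv",        # hypothetical path; must be cluster-accessible
  header = TRUE,
  delim = ";",
  quote = "\"",              # Spark CSV reader option
  dateFormat = "yyyy-MM-dd"  # Spark CSV reader option
)
```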

Value

a spark_tbl

Examples

## Not run: 
path_csv <- tempfile()
iris_fix <- iris %>%
  setNames(names(iris) %>% sub("\\.", "_", .)) %>%
  mutate(Species = levels(Species)[Species])
write.csv(iris_fix, path_csv, row.names = FALSE)

# without a specified schema, column types are inferred
spark_read_csv(path_csv, header = TRUE) %>% collect()

# with a specified schema
csv_schema <- SparkR::schema(SparkR::createDataFrame(iris_fix))
spark_read_csv(path_csv, csv_schema, header = TRUE) %>% collect()

## End(Not run)
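
An explicit schema can also be built with SparkR's `structType()` and `structField()` helpers instead of being derived from a local data.frame. A sketch under that assumption (field type names follow SparkR's conventions; `path_csv` is the file written above):

```r
library(SparkR)

# Sketch: construct the StructType by hand rather than via
# SparkR::schema(SparkR::createDataFrame(...)).
iris_schema <- structType(
  structField("Sepal_Length", "double"),
  structField("Sepal_Width",  "double"),
  structField("Petal_Length", "double"),
  structField("Petal_Width",  "double"),
  structField("Species",      "string")
)

spark_read_csv(path_csv, iris_schema, header = TRUE)
```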

danzafar/tidyspark documentation built on Sept. 30, 2020, 12:19 p.m.