Description

Read a CSV file into a spark_tbl.
Usage

spark_read_csv(
  path,
  schema = NULL,
  na = "NA",
  header = FALSE,
  delim = ",",
  guess_max = 1000,
  ...
)
Arguments

path       string, the path to the file. Needs to be accessible from the cluster.

schema     StructType, a schema used to read the data; inferred if not specified.
           See the sketch after this list for building one by hand.

na         string, the value used to signify NA values.

header     boolean, whether to treat the first line of the file as a header. Defaults to FALSE.

delim      string, the character used to delimit each column. Defaults to ",".

guess_max  integer, the maximum number of records to use for guessing column types.

...        named list, optional arguments passed to the reader.
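If type inference is not reliable enough, a schema can be constructed by hand with SparkR::structType() and SparkR::structField() and passed as the schema argument. The sketch below is only an illustration; the column names assume the iris-style data used in the Examples.

# a minimal sketch: hand-built schema for iris-like columns (illustrative only)
iris_schema <- SparkR::structType(
  SparkR::structField("Sepal_Length", "double"),
  SparkR::structField("Sepal_Width", "double"),
  SparkR::structField("Petal_Length", "double"),
  SparkR::structField("Petal_Width", "double"),
  SparkR::structField("Species", "string")
)
spark_read_csv(path_csv, iris_schema, header = TRUE)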
Value

a spark_tbl
Examples

## Not run: 
library(dplyr)   # for %>% and mutate

path_csv <- tempfile()
iris_fix <- iris %>%
  setNames(names(iris) %>% sub("\\.", "_", .)) %>%
  mutate(Species = levels(Species)[Species])
write.csv(iris_fix, path_csv, row.names = FALSE)

# without a specified schema (types are inferred)
spark_read_csv(path_csv, header = TRUE) %>% collect

# with a specified schema
csv_schema <- SparkR::schema(SparkR::createDataFrame(iris_fix))
spark_read_csv(path_csv, csv_schema, header = TRUE) %>% collect

## End(Not run)
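The arguments collected in ... appear to be passed on to the underlying reader, so standard Spark CSV data source options can presumably be supplied by name. The option names below (multiLine, dateFormat) are standard Spark CSV options, but forwarding them through ... here is an assumption, not documented behaviour.

# hedged sketch: passing Spark CSV reader options through ... (forwarding assumed)
spark_read_csv(
  path_csv,
  header = TRUE,
  multiLine = TRUE,            # Spark CSV option: allow fields with embedded newlines
  dateFormat = "yyyy-MM-dd"    # Spark CSV option: format for parsing date columns
)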