spark_read_csv: Read a CSV file into a 'spark_tbl'

View source: R/read-write.R

Description

Read a CSV file into a spark_tbl

Usage

spark_read_csv(
  path,
  schema = NULL,
  na = "NA",
  header = FALSE,
  delim = ",",
  guess_max = 1000,
  ...
)

Arguments

path

string, the path to the file. Needs to be accessible from the cluster.

schema

StructType, the schema used to read the data; inferred from the data if not specified.

na

string, the value used to signify NA values.

header

boolean, whether to treat the first line of the file as a header. Defaults to FALSE.

delim

string, the character used to delimit each column. Defaults to ','.

guess_max

int, the maximum number of records to use for guessing column types.

...

named list, optional arguments passed on to the underlying reader.
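
For illustration, a sketch of forwarding extra options through `...`. This assumes the options are passed to Spark's CSV data source; option names such as `quote` and `dateFormat` come from Apache Spark's CSV reader documentation, not from this package, and the path is hypothetical:

```r
library(tidyspark)

# Sketch only: assumes `...` is forwarded as options to Spark's CSV
# data source, whose option names (quote, escape, dateFormat, ...)
# are documented by Apache Spark rather than by tidyspark itself.
df <- spark_read_csv(
  "/data/events.csv",        # hypothetical path; must be cluster-accessible
  header = TRUE,
  delim = ";",
  quote = "\"",              # Spark CSV reader option
  dateFormat = "yyyy-MM-dd"  # Spark CSV reader option
)
```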

Value

a spark_tbl

Examples

## Not run: 
path_csv <- tempfile()
iris_fix <- iris %>%
  setNames(names(iris) %>% sub("\\.", "_", .)) %>%
  mutate(Species = levels(Species)[Species])
write.csv(iris_fix, path_csv, row.names = FALSE)

# without a specified schema, column types are inferred
spark_read_csv(path_csv, header = TRUE) %>% collect()

# with a specified schema
csv_schema <- SparkR::schema(SparkR::createDataFrame(iris_fix))
spark_read_csv(path_csv, csv_schema, header = TRUE) %>% collect()

## End(Not run)
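
An explicit schema can also be built with SparkR's `structType()` and `structField()` helpers instead of being derived from a local data.frame. A sketch under that assumption (field type names follow SparkR's conventions; `path_csv` is the file written above):

```r
library(SparkR)

# Sketch: construct the StructType by hand rather than via
# SparkR::schema(SparkR::createDataFrame(...)).
iris_schema <- structType(
  structField("Sepal_Length", "double"),
  structField("Sepal_Width",  "double"),
  structField("Petal_Length", "double"),
  structField("Petal_Width",  "double"),
  structField("Species",      "string")
)

spark_read_csv(path_csv, iris_schema, header = TRUE)
```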

danzafar/tidyspark documentation built on Sept. 30, 2020, 12:19 p.m.