View source: R/DSD_ReadStream.R
DSD_ReadStream    R Documentation
A DSD class that reads a data stream (text format) from a file or any R connection.
DSD_ReadStream(
  file,
  k = NA,
  take = NULL,
  sep = ",",
  header = FALSE,
  skip = 0,
  col.names = NULL,
  colClasses = NA,
  outofpoints = c("warn", "ignore", "stop"),
  ...
)
DSD_ReadCSV(
  file,
  k = NA,
  take = NULL,
  sep = ",",
  header = FALSE,
  skip = 0,
  col.names = NULL,
  colClasses = NA,
  outofpoints = c("warn", "ignore", "stop"),
  ...
)
## S3 method for class 'DSD_ReadStream'
close_stream(dsd, ...)
## S3 method for class 'DSD_ReadCSV'
close_stream(dsd, ...)
file          A file/URL or an open connection.
k             Number of true clusters, if known.
take          Indices of columns to extract from the file.
sep           The character string that separates dimensions in data points in the stream.
header        Does the first line contain variable names?
skip          The number of lines of the data file to skip before beginning to read data.
col.names     A vector of optional names for the variables. The default is to use
colClasses    A vector of classes to be assumed for the columns passed on to
outofpoints   Action taken if less than
...           Further arguments are passed on to
dsd           An object of class
DSD_ReadStream uses readLines() and read.table() to read data from an R connection line-by-line and convert it into a data.frame. The connection is responsible for maintaining where the stream is currently being read from. In general, connections will be files stored on disk, but many other types are possible (see connection).
The implementation tries to deal gracefully with slightly corrupted data by dropping points that cannot be read consistently and producing a warning. However, this is not always possible, in which case an error is raised instead.
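The reading strategy described above can be illustrated with base R alone. The following is a minimal sketch, not the package's actual implementation: the helper read_chunk() and the in-memory text connection are assumptions made for illustration only.

```r
# Hypothetical helper (not part of the stream package): read up to n lines
# from an open connection and convert them into a data.frame.
read_chunk <- function(con, n, sep = ",") {
  lines <- readLines(con, n = n)      # the connection keeps track of its position
  if (length(lines) == 0L) return(NULL)
  read.table(text = lines, sep = sep, header = FALSE)
}

# An in-memory connection stands in for a file or URL.
con <- textConnection(c("1,2", "3,4", "5,6"))

first <- read_chunk(con, n = 2)   # reads lines 1 and 2
rest  <- read_chunk(con, n = 2)   # only one line is left

close(con)
```

Because the connection itself remembers the read position, the second call continues where the first one stopped.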
Column names
If the file has column headers in the first line, then they can be used by setting header = TRUE. Alternatively, column names can be set using col.names or a named vector for take. If no column names are specified, then default names will be created.
Columns with names that start with . are considered information columns and are ignored by DSTs. See get_points() for details. Other information columns are used by various functions.
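As a hedged illustration of naming columns, the sketch below writes a small temporary file and reads it back with col.names, marking the third column as an information column by giving it a name that starts with a dot. The file contents and the names x, y and .class are assumptions chosen for this example.

```r
library(stream)

# Temporary three-column file: two coordinates and a class label (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("1.0,2.0,1", "1.5,2.5,2"), tmp)

# The leading dot in ".class" marks the column as an information column.
s <- DSD_ReadStream(tmp, col.names = c("x", "y", ".class"))
pts <- get_points(s, n = 2)

close_stream(s)
unlink(tmp)
```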
Reading the whole stream
By using n = -1 in get_points(), the whole stream is returned.
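A short sketch of reading everything at once, assuming a small temporary file with made-up contents:

```r
library(stream)

# Three made-up data points in a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2", "3,4", "5,6"), tmp)

s <- DSD_ReadStream(tmp)
all_points <- get_points(s, n = -1)   # n = -1 requests all remaining points

close_stream(s)
unlink(tmp)
```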
Resetting and closing a stream
The position in the file can be reset to the beginning or another position using reset_stream(). This fails if the underlying connection is not seekable (see connection).
DSD_ReadStream maintains an open connection to the stream, which needs to be closed using close_stream().
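The following sketch shows one way to rewind and then close a file-backed stream. It assumes a plain file on disk, which is seekable; the temporary file and its contents are made up for illustration.

```r
library(stream)

tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2", "3,4", "5,6"), tmp)

s <- DSD_ReadStream(tmp)
a <- get_points(s, n = 2)   # read the first two points

reset_stream(s)             # rewind to the beginning of the file
b <- get_points(s, n = 2)   # read the same two points again

close_stream(s)             # release the underlying connection
unlink(tmp)
```

Closing the stream before deleting the file avoids leaving an open connection behind.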
DSD_ReadCSV reads a stream from a comma-separated values file.
An object of class DSD_ReadCSV (subclass of DSD_R, DSD).
Michael Hahsler
readLines(), read.table().
Other DSD: DSD(), DSD_BarsAndGaussians(), DSD_Benchmark(), DSD_Cubes(), DSD_Gaussians(), DSD_MG(), DSD_Memory(), DSD_Mixture(), DSD_NULL(), DSD_ReadDB(), DSD_Target(), DSD_UniformNoise(), DSD_mlbenchData(), DSD_mlbenchGenerator(), DSF(), animate_data(), close_stream(), get_points(), plot.DSD(), reset_stream()
# Example 1: creating data and writing it to disk
stream <- DSD_Gaussians(k = 3, d = 2)
write_stream(stream, "data.txt", n = 100, info = TRUE, header = TRUE)
readLines("data.txt", n = 5)
# reading the same data back
stream2 <- DSD_ReadStream("data.txt", header = TRUE)
stream2
# get points
get_points(stream2, n = 5)
plot(stream2, n = 20)
# clean up
close_stream(stream2)
file.remove("data.txt")
# Example 2: Read part of the kddcup1999 data (take only cont. variables)
# col 42 is the class variable
file <- system.file("examples", "kddcup10000.data.gz", package = "stream")
stream <- DSD_ReadCSV(gzfile(file),
  take = c(1, 5, 6, 8:11, 13:20, 23:41, .class = 42), k = 7)
stream
get_points(stream, 5)
# plot 100 points (projected on the first two principal components)
plot(stream, n = 100, method = "pca")
close_stream(stream)