(ref:datastructures-intro)
You Must - (ref:datastructures-must)
You Should - (ref:datastructures-should)
You Could - (ref:datastructures-could)
|Related Areas: | Sensible Defaults | |--------------- |------------------------------------------------------------|
A dataset is a collection of values, usually either numbers (if quantitative) or strings (if qualitative). Values are organised in two ways; every value belongs to a variable and an observation:
The majority of data we work with comes in rectangles. For this data to be tidy, ensure that: 1. Each variable forms a column. 2. Each observation forms a row. 3. Each type of observational unit forms a table.
For more see the section on Tidy Data in R for Data Science or the original paper.
Use tidy data structures as part of your work. You should attempt to convert incoming data into tidy format as quickly as possible. Any data that is output that may be used in other projects should be in tidy format as well as any other required formats.
Data types are the basic units which your language uses to store data, things like integers, doubles, strings and logical data. Typically you are working with data frames, arrays, matricies or lists. These hold multiple items of data in a data structure.
Different types and structures are used for different things, and have different capabilities. To be effective, know about the data types and structures available to you and use the right ones for the job!
The R Programming for Data Science book has a good section on the 'Nuts and Bolts' of R which covers types and structures. For more about the different data structures a good resource is the Advanced R book.
For a list of python datatypes see the:
The R for data science book has a nice section on relational data.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.