Use Appropriate and Tidy Data {#data_structure}

(ref:datastructures-intro)

You Must - (ref:datastructures-must)

You Should - (ref:datastructures-should)

You Could - (ref:datastructures-could)

| Related Areas: | Sensible Defaults |
|----------------|-------------------|

Tidy Data? {#tidy_data}

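A dataset is a collection of values, usually either numbers (if quantitative) or strings (if qualitative). Values are organised in two ways: every value belongs to both a variable and an observation. A variable contains all values that measure the same underlying attribute (such as height or temperature) across units. An observation contains all values measured on the same unit (such as a person or a day) across attributes.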

The majority of data we work with comes in rectangles. For this data to be tidy, ensure that:

1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.

For more, see the section on Tidy Data in R for Data Science or the original paper.

Use tidy data structures as part of your work. Convert incoming data into tidy format as early as possible. Any output data that may be used in other projects should be provided in tidy format, in addition to any other required formats.
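As a rough illustration of converting incoming data to tidy format, the sketch below reshapes a small "wide" table (one column per year) into one row per observation. The data, the column names and the `to_tidy` helper are all hypothetical, and only the Python standard library is used.

```python
# Hypothetical "wide" table: one row per country, one column per year.
# Here the year is a variable, but it is stored in the column headers.
wide_rows = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows so that each output row is one observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            tidy.append({
                id_column: row[id_column],
                variable_name: column,  # former column header becomes a value
                value_name: value,
            })
    return tidy


# Each variable (country, year, cases) now forms a column,
# and each country-year observation forms a row.
tidy_rows = to_tidy(wide_rows, "country", "year", "cases")
for row in tidy_rows:
    print(row)
```

In practice a data frame library would usually do this reshaping for you, for example `DataFrame.melt` in pandas or `pivot_longer` in tidyr.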

Data Types and Structures {#data_types}

Data types are the basic units your language uses to store data, such as integers, doubles, strings and logical values. Typically you will be working with data frames, arrays, matrices or lists; these data structures hold multiple items of data.
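To make the distinction between types and structures concrete, here is a minimal sketch using Python's built-in types. The variable names and values are made up for illustration.

```python
# Basic data types: each stores a single value.
count = 42          # int
mean_age = 37.5     # float (a double-precision number)
region = "North"    # str
is_valid = True     # bool (logical)

# Data structures: each holds multiple items of data.
ages = [34, 41, 29]                       # list: ordered, can be changed
point = (51.5, -0.12)                     # tuple: ordered, cannot be changed
regions = {"North", "South"}              # set: unique items, no order
record = {"region": "North", "age": 34}   # dict: values looked up by key

type_names = [type(v).__name__ for v in (count, mean_age, region, is_valid)]
structure_names = [type(v).__name__ for v in (ages, point, regions, record)]
print(type_names)
print(structure_names)
```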

Different types and structures are used for different things, and have different capabilities. To be effective, know about the data types and structures available to you and use the right ones for the job!
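A small sketch of why the choice matters: the same values behave differently depending on the type and structure that hold them. The input values are hypothetical, standing in for text read from a file.

```python
raw_values = ["10", "5", "5"]  # e.g. values as read from a text file

# As strings, "+" concatenates and sorting is alphabetical.
joined = raw_values[0] + raw_values[1]
sorted_as_text = sorted(raw_values)

# As integers, "+" adds and sorting is numeric.
numbers = [int(v) for v in raw_values]
total = numbers[0] + numbers[1]
sorted_as_numbers = sorted(numbers)

# A set keeps only unique values; a list keeps duplicates.
unique_numbers = set(numbers)

print(joined, sorted_as_text)
print(total, sorted_as_numbers)
print(unique_numbers)
```

Converting values to the intended type as soon as they are read avoids surprises such as "10" sorting before "5".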

R

The R Programming for Data Science book has a good section on the 'Nuts and Bolts' of R which covers types and structures. For more about the different data structures a good resource is the Advanced R book.

Python

For a list of Python data types see the:

Schema {#schema}

The R for data science book has a nice section on relational data.
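As one possible illustration of relational data, the sketch below uses Python's built-in `sqlite3` module to define a two-table schema and join the tables on a shared key. The table names, column names and values are hypothetical. Each type of observational unit (regions, visits) gets its own table, in line with the tidy data rules.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce references

# The schema: column names, types and the relationship between the tables.
conn.executescript("""
    CREATE TABLE region (
        region_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE visit (
        visit_id  INTEGER PRIMARY KEY,
        region_id INTEGER NOT NULL REFERENCES region(region_id),
        patients  INTEGER NOT NULL
    );
""")

conn.executemany("INSERT INTO region VALUES (?, ?)",
                 [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO visit VALUES (?, ?, ?)",
                 [(1, 1, 20), (2, 1, 15), (3, 2, 30)])

# Join the two tables on the shared key and total patients per region.
totals = conn.execute("""
    SELECT region.name, SUM(visit.patients)
    FROM visit
    JOIN region ON region.region_id = visit.region_id
    GROUP BY region.name
    ORDER BY region.name
""").fetchall()
print(totals)
conn.close()
```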



DataS-DHSC/coding_principles_book documentation built on March 11, 2020, 4:13 a.m.