B Selected Glossary of R Terminology

Definitions from Glosario

argument: one of possibly several expressions that are passed to a function.

Boolean: Relating to a variable or data type that can have a logical value of either true or false. Named for George Boole, a 19th-century mathematician. Digital computers are binary systems built on this foundation: logical evaluations between two states, true and false, or 1 and 0.
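
As a brief illustration, the sketch below uses only base R to show that a comparison produces a logical value and that logical values can be combined. The variable name is_larger is made up for this example.

```r
# A comparison evaluates to a logical (Boolean) value
is_larger <- 5 > 3
class(is_larger)            # "logical"

# Logical values combine with & (and), | (or), and ! (not)
TRUE & FALSE                # FALSE
TRUE | FALSE                # TRUE
!TRUE                       # FALSE

# In arithmetic, TRUE counts as 1 and FALSE as 0
sum(c(TRUE, FALSE, TRUE))   # 2
```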

assignment operator: A symbol that assigns the value on the right to an object on the left. In R it is written <-. In RStudio, the keyboard shortcut is Alt + - (Option + - on macOS).
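
A minimal sketch of assignment in base R follows. The object name total is hypothetical.

```r
# Assign the value on the right to the object named on the left
total <- 2 + 3
total            # 5

# Assigning again replaces the previous value
total <- total * 10
total            # 50
```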

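camelCase: A style of writing code that involves naming variables and objects with no space, underscore (_), dot (.), or dash (-) characters, marking the start of each word after the first with a capital letter, as in findPattern. The variant that also capitalizes the first word, as in CalculateSum, is often called upper camel case or PascalCase.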

comment: Text in a script that is not run as code but instead describes what the code is doing. Comments are usually short notes; in R they begin with a #.

Comprehensive R Archive Network (CRAN): A public repository of R packages.

console: A computer terminal where a user may enter commands, or a program, such as a shell, that simulates such a device.

data frame: A two-dimensional data structure for storing tabular data in memory. Rows represent records and columns represent variables.
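
To make the rows-and-columns idea concrete, here is a small sketch in base R. The data frame patients and its values are invented for illustration.

```r
# Hypothetical data: each row is one record, each column one variable
patients <- data.frame(
  id     = c(1, 2, 3),
  age    = c(34, 51, 28),
  smoker = c(FALSE, TRUE, FALSE)
)

dim(patients)      # 3 3 (rows, then columns)
patients$age       # the age column as a vector: 34 51 28
patients[2, ]      # the second record
```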

FALSE: The logical (Boolean) state opposite of “true”. Used in logic and programming to represent a binary state of something.

function: A code block which gathers a sequence of operations into a whole, preserving it for ongoing use by defining a set of tasks that takes zero or more required and optional arguments as inputs and returns expected outputs (return values), if any. Functions enable repeating these defined tasks with one command, known as a function call.
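
The sketch below shows one way these pieces fit together in R: a function definition with a required and an optional argument, followed by two function calls. The name rounded_mean and its arguments are hypothetical.

```r
# Hypothetical function: x is required, digits is optional (default 1)
rounded_mean <- function(x, digits = 1) {
  round(mean(x), digits)   # the value of the last expression is returned
}

# Function calls
rounded_mean(c(1.25, 2.5, 4))               # 2.6, using the default digits
rounded_mean(c(1.25, 2.5, 4), digits = 0)   # 3, overriding the default
```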

logical indexing: To index a vector or other structure with a vector of Booleans, keeping only the values that correspond to true values. Also referred to as masking.
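
For example, the following base R sketch builds a logical vector from a comparison and uses it as a mask. The vector scores and the threshold of 75 are made up.

```r
scores <- c(72, 88, 95, 61)   # hypothetical values

# The comparison produces a logical vector, one element per score
passed <- scores >= 75
passed                        # FALSE TRUE TRUE FALSE

# Indexing with the logical vector keeps only elements where it is TRUE
scores[passed]                # 88 95
```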

Markdown: A markup language with a simple syntax, intended as an easier-to-write alternative to HTML. Markdown is often used for README files, and is the basis for R Markdown.

markup language: A set of rules for annotating text to define its meaning or how it should be displayed. The markup is usually not displayed, but instead controls how the underlying text is interpreted or shown. Markdown and HTML are widely used markup languages for web pages.

object: A data set, a variable, a plot, or, more formally, almost anything in R. If it has a mode, it is an object. Objects include data frames, vectors, matrices, arrays, lists, and functions.

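pipe operator: The %>% operator, provided by the magrittr package and used throughout the tidyverse, which makes the output of one function the input (by default, the first argument) of the next.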
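
As a rough sketch, the example below compares nested function calls with the same steps written using the pipe. It assumes the magrittr package is installed; the vector values is hypothetical.

```r
library(magrittr)   # assumption: the magrittr package is installed

values <- c(1, 4, 9, 16)   # hypothetical data

# Nested calls are read from the inside out
nested <- round(mean(sqrt(values)), 1)

# With the pipe, the same steps read left to right
piped <- values %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)   # TRUE
```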

package: A collection of code, data, and documentation that can be distributed and re-used. Also referred to in some languages as a library or module.

R (programming language): A popular open-source programming language used primarily for data science.

R Markdown: A dialect of Markdown that allows authors to mix prose and code (usually written in R) in a single document.

reproducible research: The practice of describing and documenting research results in such a way that another researcher or person can re-run the analysis code on the same data to obtain the same result.

scalar: A single value of a particular type, such as 1 or “a”. Scalars exist in most languages, but do not really exist in R; in R, values that appear to be scalars are actually vectors of unit length.
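
This can be checked directly in base R. In the sketch below, the name x is arbitrary.

```r
x <- 42
is.vector(x)   # TRUE: the "scalar" is a vector
length(x)      # 1
x[1]           # 42: it can be indexed like any other vector
```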

script: Originally, a program written in a language too user-friendly for “real” programmers to take seriously; the term is now synonymous with program.

seed: A value used to initialize a pseudo-random number generator.
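
A minimal sketch in base R: setting the same seed before each draw reproduces the same pseudo-random numbers. The seed value 123 is an arbitrary choice.

```r
set.seed(123)            # 123 is an arbitrary choice
first_draw <- runif(3)

set.seed(123)            # reset the generator to the same state
second_draw <- runif(3)

identical(first_draw, second_draw)   # TRUE
```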

snake_case: A naming style that separates the parts of a name with underscores, as in first_second_third.

tibble: A modern replacement for R’s data frame, which stores tabular data in columns and rows, defined and used in the tidyverse.

tidy data: Tabular data that satisfies three conditions that facilitate initial cleaning, and later exploration and analysis: (1) each variable forms a column, (2) each observation forms a row, and (3) each type of observational unit forms a table.
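
The first two conditions can be sketched with a small, invented example in base R. The tables untidy and tidy and their values are hypothetical, and the reshaping is written out by hand to keep the example self-contained.

```r
# Hypothetical untidy table: the variable "year" is spread across column names
untidy <- data.frame(
  country    = c("A", "B"),
  cases_2019 = c(10, 20),
  cases_2020 = c(15, 25)
)

# Tidy version: one column per variable (country, year, cases)
# and one row per observation (a country in a given year)
tidy <- data.frame(
  country = rep(untidy$country, times = 2),
  year    = rep(c(2019, 2020), each = nrow(untidy)),
  cases   = c(untidy$cases_2019, untidy$cases_2020)
)

tidy
```

In practice, this kind of reshaping is usually done with a dedicated function such as pivot_longer() from the tidyr package rather than by hand.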

tidyverse: A collection of R packages for operating on tabular data in consistent ways.

TRUE: The logical (Boolean) state opposite of “false”. Used in logic and programming to represent a binary state of something.

vector: A sequence of values, usually of homogeneous type. Vectors are the fundamental data structure in R; a scalar is just a vector with exactly one element.
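
A short base R sketch of these properties follows. The vectors heights and mixed are made up for illustration.

```r
heights <- c(1.62, 1.75, 1.80)   # hypothetical numeric vector
length(heights)                   # 3
heights * 100                     # arithmetic is applied element by element

# Mixing types in c() coerces every element to a common type
mixed <- c(1, "a", TRUE)
class(mixed)                      # "character"
```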