5 R as a Programming Language

You may or may not have used other programming languages before coming to R. Either way, R has several distinctive features which are worth noting.

5.1 Data frames

One of R’s greatest strengths is manipulating data. One of the primary structures for storing data in R is the data frame, and much of your work in R will involve manipulating data frames. A data frame is made up of rows and columns. The top row is a header that names each column. Each subsequent row represents an individual observation, and rows can also have names. Each row is made up of cells, one per column, which hold the data values.

Let’s look at a data frame that is included in R as an example. This data frame is called mtcars and contains some data about common car models. We can look at this data frame using the head function, which previews the first few rows. You can also use the functions colnames or rownames to get the column or row names of a data frame, respectively.

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
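The other inspection functions work the same way. The following is a minimal sketch using only base R and the built-in mtcars data frame; dim is not mentioned in the text and is included here as an extra base R helper, and the variable names are made up for illustration.

```r
# Column names of the data frame
carColumns <- colnames(mtcars)
carColumns

# Row names (here, the car models); head() keeps only the first three
firstModels <- head(rownames(mtcars), 3)
firstModels

# Number of rows and number of columns
carDimensions <- dim(mtcars)
carDimensions
```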

The columns in a data frame can contain different types of information. In this particular data frame, all columns contain numbers of type double, abbreviated <dbl> when the data are printed as a tibble (see below), a type that allows digits after the decimal point. Columns can also be integers, represented by <int> (which don’t allow digits after the decimal point); dates, represented by <date>; booleans or logicals, represented by <lgl> (TRUE or FALSE); characters, represented by <chr> (these contain text); or factors, represented by <fct>, or <fctr> in older versions of the tibble package (these are helpful for storing categorical information such as species names or gear types).
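Column types can also be checked directly in base R. The sketch below is illustrative only: exampleData and its column names are made up for this example and are not part of mtcars.

```r
# Storage type of every column in mtcars
columnTypes <- sapply(mtcars, typeof)
columnTypes

# A small, made-up data frame with one column of each common type
exampleData <- data.frame(
  count   = c(1L, 2L, 3L),                                      # integer
  weight  = c(1.5, 2.5, 3.5),                                   # double
  label   = c("a", "b", "c"),                                   # character
  isLarge = c(FALSE, FALSE, TRUE),                              # logical
  group   = factor(c("x", "y", "x")),                           # factor
  day     = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")), # date
  stringsAsFactors = FALSE  # keep text as character in older versions of R
)

# Class of each column
exampleTypes <- sapply(exampleData, class)
exampleTypes
```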

These days, you may also see the word tibble, which is a modern version of the R data frame and is being used more and more widely. Tibbles generally function much like data frames, but do away with some frustrating features and are generally cleaner. We recommend using tibbles.
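As a rough illustration, a data frame can be converted to a tibble as sketched below. This assumes the tibble package is installed (it is not part of base R), and the column name "model" is a made-up choice for this example.

```r
# Assumes the tibble package is installed; otherwise nothing is run
if (requireNamespace("tibble", quietly = TRUE)) {
  # Tibbles do not keep row names, so store the car models in a column
  # with the hypothetical name "model"
  carsTibble <- tibble::as_tibble(mtcars, rownames = "model")

  # Printing a tibble shows only the first rows, with each column's
  # type abbreviation (such as <chr> or <dbl>) under its name
  print(carsTibble)
}
```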

5.2 Vectors

Another very common data structure is the vector, which stores one-dimensional information, such as a sequence of numbers or character strings. All elements of a vector have the same type. Vectors are built using the c function. Below are two examples:

myNumericVector <- c(1, 2, 3, 4)

myCharacterVector <- c("A", "B", "C", "D")
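Once a vector exists, a few base R operations cover most everyday use. The sketch below redefines the numeric vector so that it runs on its own; the other variable names are made up for illustration.

```r
myNumericVector <- c(1, 2, 3, 4)

# Number of elements
vectorLength <- length(myNumericVector)   # 4

# Elements are accessed by position, starting at 1
secondElement <- myNumericVector[2]       # 2

# Arithmetic is applied to every element at once
doubledVector <- myNumericVector * 2      # 2 4 6 8

# Mixing types forces all elements to a common type (here, character)
mixedClass <- class(c(1, "A"))            # "character"
```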

5.3 Functions

R is a “functional” programming language, which means it gets much of its power from functions. A function is a small chunk of code that performs a specific task: it takes inputs (called arguments) and returns an output. Functions allow common tasks to be performed easily and reproducibly.

R contains many built-in functions, including several for common statistics, as shown below. The examples also include comments that describe what each line does. Comments start with the # character and are not evaluated by R; they are simply there to document the code.

# The sum function takes a vector of numbers as an input, calculates the sum of those numbers, and produces a single number as an output
sum(c(1, 2, 3))
## [1] 6
# The mean function takes a vector of numbers as an input, calculates the mean of those numbers, and produces a single number as an output
mean(c(1, 2, 3))
## [1] 2
# The median function takes a vector of numbers as an input, calculates the median of those numbers, and produces a single number as an output
median(c(1, 2, 3))
## [1] 2

You can even define your own functions, which can be a powerful way to save time on a task you expect to repeat many times. The example below shows how to write a function that takes two numbers as inputs, manipulates them, and returns a single number as an output. Once you have defined a function, you can use it again later on.

# Define the function
myFunction <- function(x, y) {
  z <- (x + 2 * y) / x
  return(z)
}
# Test the function, for example with x = 3 and y = 4
myFunction(3, 4)
## [1] 3.666667
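To illustrate reuse, the sketch below defines a second, deliberately simple function and calls it several times with different inputs. The function averageOfTwo and the variable names are hypothetical and exist only for this example.

```r
# Hypothetical example function: the average of two numbers
averageOfTwo <- function(x, y) {
  result <- (x + y) / 2
  return(result)
}

# The same function can be called again with different inputs
firstResult <- averageOfTwo(2, 4)      # 3
secondResult <- averageOfTwo(10, 20)   # 15

# Because arithmetic in R works element by element, the function
# also accepts vectors of equal length
vectorResult <- averageOfTwo(c(2, 10), c(4, 20))   # 3 15
```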

5.4 Tips on coding and style

It is helpful to code in a consistent manner. This will not only make your code readable by others, but will also help you when you revisit code you have previously written. Using consistent code styling also makes it much easier to collaborate with others. We highly recommend following Google’s style guidelines for R.
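The difference consistency makes can be seen in a small sketch. The conventions shown (the <- assignment operator, spaces around operators and after commas, descriptive names) are common ones chosen for illustration, not a summary of any particular style guide, and all variable names are made up.

```r
# Inconsistent style: mixed assignment operators, cramped spacing, unclear names
a=c(1,2,3)
B<-mean( a )

# Consistent style: `<-` for assignment, spaces around operators and after
# commas, and descriptive names
measurements <- c(1, 2, 3)
meanMeasurement <- mean(measurements)

# Both versions compute the same value; only readability differs
identical(B, meanMeasurement)
```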