Select function

Selecting Columns with dplyr’s select function

Load a Small Data Set

To begin, we’ll use the built-in mtcars dataset, which ships with R and provides details on different car models.

# Load the dplyr package
library(dplyr)

# Copy the built-in mtcars dataset into a working data frame
data = mtcars

Let’s inspect the data

Prior to delving into column selection, it’s important to examine the structure and contents of our dataset.

# View the structure of the data set
str(data)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

# View the first few rows of the data set
head(data)

Now it’s time for the select function

The select() function picks particular columns from a data frame. Columns can be specified by name, by position, or with selection helpers that match column names.

# Select specific columns by name 
selected_columns_by_name = select(data, mpg, hp, wt) # we select "Miles/US Gallon", "horsepower", "Weight"
head(selected_columns_by_name)

The variable selected_columns_by_name contains a subset of the initial data set: only the columns mpg, hp, and wt, with all 32 rows kept.

# Select columns using column indexing
selected_columns_by_index = select(data, 1:3)
head(selected_columns_by_index)

Again, we get a subset of the initial data set, this time chosen by position: the indices 1:3 pick the first three columns, mpg, cyl, and disp.
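Positions are easy to get wrong when the column order changes, so a range of names is often safer. The following is a minimal sketch, assuming the default column order of mtcars shown in the str() output above; the variable names by_position and by_name_range are only illustrative.

```r
library(dplyr)

# Select the first three columns by position
by_position <- select(mtcars, 1:3)

# Select the same columns as a range of names (mpg through disp)
by_name_range <- select(mtcars, mpg:disp)

# Compare the two results; both should hold mpg, cyl, and disp
identical(by_position, by_name_range)
```

The name range reads more clearly than bare indices and does not depend on how many columns come before mpg.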

# Select columns whose names match a pattern, using a selection helper
selected_columns_by_condition <- select(data, starts_with("cyl"))
head(selected_columns_by_condition)

A third way to get a subset with the select function is to use a selection helper that matches column names, here: choose all variables whose names start with “cyl”. Only one column in mtcars matches, so the result contains the single variable cyl.
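starts_with() is one of several name-matching helpers that dplyr provides. As a small sketch of two others, ends_with() and contains(), applied to the same mtcars columns (the variable names here are only illustrative):

```r
library(dplyr)

# Columns whose names end with "p": disp and hp
ends_with_p <- select(mtcars, ends_with("p"))
names(ends_with_p)

# Columns whose names contain "ar": gear and carb
contains_ar <- select(mtcars, contains("ar"))
names(contains_ar)
```

These helpers match on the column names only, not on the values stored in the columns.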

Renaming Columns

You can also use select() to rename columns of the data set.

# Rename the 'mpg' column to 'MilesPerGallon'
renamed_column <- select(data, MilesPerGallon = mpg)
head(renamed_column)

Now we have a subset (one column) stored in a new variable, and that column has a new name, MilesPerGallon, that we can work with. Note that select() keeps only the columns it names, so the other ten columns are dropped.
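If the goal is to rename a column while keeping all the others, dplyr’s rename() function or select() combined with everything() may be a better fit. A minimal sketch, with illustrative variable names:

```r
library(dplyr)

# rename() changes the name and keeps every column in place
renamed_all <- rename(mtcars, MilesPerGallon = mpg)
ncol(renamed_all)

# select() can do the same if the remaining columns are added with everything()
renamed_first <- select(mtcars, MilesPerGallon = mpg, everything())
names(renamed_first)
```

Both results keep all 11 columns; only the name of the first column changes.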

Excluding Columns

In addition to selecting columns from a data set, we can use select() to exclude certain columns.

# Exclude the 'disp', 'gear', and 'am' columns
excluded_columns = select(data, -disp, -gear, -am)
head(excluded_columns)

Now we have a new variable that is a subset of the initial data set. We got this subset by excluding three variables: disp, gear, and am. A minus sign in front of a column name excludes that column, which leaves the remaining eight columns.
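When several columns are dropped at once, the names can also be grouped with c() and negated in one step. The sketch below assumes dplyr 1.0.0 or later for the ! form; the variable names are only illustrative.

```r
library(dplyr)

# One minus sign per column, as in the example above
dropped_minus <- select(mtcars, -disp, -gear, -am)

# The same exclusion with the names grouped in c()
dropped_vector <- select(mtcars, -c(disp, gear, am))

# The complement operator ! applied to the grouped names
dropped_bang <- select(mtcars, !c(disp, gear, am))

# Remaining column names after the exclusion
names(dropped_vector)
```

All three calls should leave the same eight columns, so the choice is mostly a matter of readability.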