1 Introduction

For this part, I’ve provided an introduction to making different types of plots using ggplot2 and how to play with the aesthetics to produce nice graphics.

The “Grammar of Graphics” used by ggplot2 works on a few fundamental ideas. First you set up the plot with ggplot(), then use + between functions to link them. There are lots of different ways to make the same graph, but generally the format will look like ggplot(data, aes()) + geom_xxx(). If you do this, each of the geom functions will inherit the data and aesthetics specified in the “parent” function, so to speak. Note that you can also just write ggplot() to create an empty plot object, then specify the data and aesthetics in each geom function. If you do it this way, you can plot different data sets on top of one another.

The aes() call is the most important part: it is where you specify what will be your x and y, along with other arguments such as group, color, or fill. Later in the document, you’ll see some examples of what each of these do!
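To make the two styles described above concrete, here is a minimal sketch, assuming ggplot2 is installed. The object names (p_inherit, p_layers, setosa_only) are made up for this illustration and are not used elsewhere in the document.

```r
library(ggplot2)

## Style 1: data and aes() go in ggplot(), and every geom inherits them
p_inherit <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth(method = "lm")

## Style 2: ggplot() is left empty and each geom brings its own data
## (setosa_only is a hypothetical subset made just for this sketch)
setosa_only <- subset(iris, Species == "setosa")

p_layers <- ggplot() +
  geom_point(data = iris, aes(x = Sepal.Length, y = Sepal.Width),
             color = "grey60") +
  geom_point(data = setosa_only, aes(x = Sepal.Length, y = Sepal.Width),
             color = "red")

p_inherit
p_layers
```

The second style is more typing, but it is the one to reach for when the layers come from different data frames.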

We’ll first load the packages and the iris data set that comes with R. Two other popular data sets, which come with ggplot2 and are commonly used in examples, are mpg and diamonds. These data are often used in the help pages for various functions as well.

library(ggplot2)        ## plotting!
library(ggcorrplot)     ## correlation plots
library(gridExtra)      ## arranging plots
library(cowplot)        ## arranging plots
library(dplyr)          ## data manipulation
library(gganimate)      ## animations!!

data("iris")            ## load iris data

There are plenty of examples using these data sets, and you can find out more about iris by typing ?iris to open its help page.

I’ll introduce more about ggplot2 themes later on, but for now I’ll set the theme that I prefer to use with the theme_set() function. I typically do this when loading ggplot2 with the following code:

library(ggplot2); theme_set(theme_bw())
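If you want to be able to undo a session-wide theme later, one option is to keep what theme_set() hands back. This is only a sketch of that pattern; old_theme, p, and p_dark are names I made up for the illustration.

```r
library(ggplot2)

## theme_set() returns the previously active theme (invisibly), so keep it
old_theme <- theme_set(theme_bw())

## any plot printed from here on is drawn with theme_bw()
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_point()
p

## a theme added to a single plot overrides the session default
## for that plot only
p_dark <- p + theme_dark()
p_dark

## put the previous default back when you are done
theme_set(old_theme)
```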

2 Histograms

When creating histograms, you can use continuous data or make a bar graph of counts of a factor, which is why geom_bar() can also be used for this kind of plot, with some extra arguments. By default, geom_histogram() will set the number of bins to 30. Sometimes this works, sometimes it’s not nice. You can change the number of bins with bins = # or change the width of the bins with binwidth = #.

##continuous data
##set binwidth
ggplot() + geom_histogram(data = iris, aes(x=Sepal.Width), binwidth = 0.5)

##set # of bins
ggplot() + geom_histogram(data = iris, aes(x=Sepal.Width), bins = 5)

##count/factor data
ggplot() + geom_histogram(data = iris, aes(x=Species), stat="count")
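To see what bins and binwidth actually do, you can peek at the values ggplot2 computed for each bar. The sketch below uses layer_data() from ggplot2 for that; the object names (h_bins, h_width, b_species) are just for this illustration. It also shows geom_bar() doing the counting for a factor.

```r
library(ggplot2)

## same data, two ways of controlling the bins
h_bins  <- ggplot(iris, aes(x = Sepal.Width)) + geom_histogram(bins = 5)
h_width <- ggplot(iris, aes(x = Sepal.Width)) + geom_histogram(binwidth = 0.5)

## layer_data() returns the values ggplot2 computed for a layer:
## one row per bar, with the bar height in the count column
layer_data(h_bins)[, c("xmin", "xmax", "count")]
layer_data(h_width)[, c("xmin", "xmax", "count")]

## for a factor, geom_bar() counts the rows in each level
b_species <- ggplot(iris, aes(x = Species)) + geom_bar()
layer_data(b_species)$count
```

Whichever argument you use, every observation still lands in exactly one bar, so the counts always add up to the number of rows in the data.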

3 Scatterplots

To make scatter plots, you can use the geom_point() function.

ggplot() + geom_point(data = iris, aes(x=Sepal.Length, y=Sepal.Width))

3.1 Add a Smoothing Function

You can add a smoothing function over your points using different methods such as a linear regression, loess, gam, glm, or other functions. Note that you can use geom_smooth() on its own, but for this example I’ve added it in combination with points.

##linear regression
ggplot(data = iris, aes(Sepal.Length, Sepal.Width)) + geom_point() +
  geom_smooth(method = "lm", se = TRUE)

##loess
ggplot(data = iris, aes(Sepal.Length, Sepal.Width)) + geom_point() +
  geom_smooth(method = "loess", se = TRUE)

Since I put the smoothing function AFTER the points, you can see that it overlaps the points. If we flip the order, the points will be on top of the line. Since there are 3 species, we can add color = Species within the aes() of our geom_point(). This means that the points will be colored by Species. If we want to change the color of a whole layer instead, we can write color = xxx outside of the aes() but still within the geom function. You can find out more about colors below!!

ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
  geom_smooth(method = "lm", se = TRUE, color="pink2") + 
  geom_point(aes(color=Species))

4 Line plots

There are a couple of ways to create lines. If you aren’t trying to fit a regression line, you can use geom_line() or geom_path(). The difference between the two is that geom_path() will connect the observations in the order that they appear in the data, whereas geom_line() connects observations in order of the x-axis variable. Note that geom_path() will look the same as geom_line() if your data is already ordered by x.

##line
ggplot() + geom_line(data = iris, aes(Petal.Length, Petal.Width))

##path
ggplot() + geom_path(data = iris, aes(Petal.Length, Petal.Width))
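The difference is easier to see on a tiny data set that is deliberately out of order. This is a sketch with a made-up data frame (zigzag); layer_data() from ggplot2 shows the x values in the order each geom will connect them.

```r
library(ggplot2)

## four made-up points whose rows are NOT sorted by x
zigzag <- data.frame(x = c(1, 3, 2, 4), y = c(1, 3, 2, 4))

p_line <- ggplot(zigzag, aes(x, y)) + geom_line()
p_path <- ggplot(zigzag, aes(x, y)) + geom_path()

## geom_line() sorts by x before connecting the points
layer_data(p_line)$x

## geom_path() keeps the row order of the data
layer_data(p_path)$x
```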

4.1 Adding Intervals

To add shaded areas such as confidence intervals, geom_ribbon() can be used, but you need to specify the limits yourself. We can quickly make a calculation and use it; here the band is simply one standard deviation of Petal.Width above and below the line, not a formal confidence interval. We can make the calculation within the ggplot aesthetics (shown below), or mutate the data with dplyr and use a pipe %>% to bring the data straight into ggplot(), as shown at the end of this document. Notice that with these plots, I specified alpha = 0.3. The alpha argument changes the transparency of the geom and can be used with nearly any geom.

ggplot(data = iris, aes(x=Petal.Length, y=Petal.Width)) + 
  geom_line() + 
  geom_ribbon(aes(ymin = (Petal.Width-sd(Petal.Width)), 
                  ymax = (Petal.Width+sd(Petal.Width))), 
                  alpha = 0.3)        ##change alpha
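If you want the ribbon to show a model-based confidence band rather than plus or minus one standard deviation, one possible approach is to compute the limits with lm() and predict() first. This is a sketch of a different technique from the example above, not a replacement for it; mod and pred are names made up for the illustration.

```r
library(ggplot2)

## fit a straight line; `mod` and `pred` are names made up for this sketch
mod <- lm(Petal.Width ~ Petal.Length, data = iris)

## predict() with interval = "confidence" returns columns fit, lwr, upr
pred <- cbind(iris, predict(mod, interval = "confidence"))

ggplot(pred, aes(x = Petal.Length)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.3) +  ##shaded band
  geom_line(aes(y = fit)) +                                ##fitted line
  geom_point(aes(y = Petal.Width))                         ##raw data
```

The band here is the 95% confidence interval for the fitted line, which is the default level used by predict().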

5 Density Plot

Density plots are a useful alternative to view the distributions of your data. If you use Bayesian methods, this will be 90% of your plots.

##simple
ggplot() + geom_density(data = iris, aes(x=Petal.Length))

##extra stuff
ggplot() + geom_density(data = iris, aes(x=Petal.Length, fill = Species),
                        alpha = 0.4)     ##to change transparency 

6 Boxplots

Boxplots are pretty simple to create and very useful for reasons you’ve probably heard in an introductory stats course.

ggplot() + geom_boxplot(data = iris, aes(x=Species, y=Petal.Width))

One alternative to using boxplots is to use a violin plot, which shows the density of your variable distributions rather than boxes based off quartiles. Again, we can change the color.

ggplot(data = iris, aes(x=Species, y=Petal.Width)) + 
  geom_violin(aes(fill=Species))

7 Correlation Plots

We can make correlation plots easily using the package ggcorrplot. With this package, we just use the ggcorrplot() function on our correlation matrix and can specify whether to show all values, or just the upper or lower triangle. Here I show just the lower. This package is great because it integrates with ggplot2, so you can use other functions to dress up the plot. More on this below.

##make a correlation matrix from the numeric columns (column 5 is Species)
cor <- cor(iris[,-c(5)])
ggcorrplot(cor, type = "lower")

8 Facet Wraps

Facet wrapping is a great way to separately show different groups within your data on the same plot. For example, if you have a bunch of lines that would otherwise overlap, you could separate them.

ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
  geom_smooth(method = "lm", se = TRUE, color="pink3") + 
  geom_point() + 
  facet_wrap(~Species)

Now, we can see it automatically uses the fixed axis scales across all facets. If you want to use different axes for your facets, you can change it using the argument scales = "free".

ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
  geom_smooth(method = "lm", se = TRUE, color="pink3") + 
  geom_point() + 
  facet_wrap(~Species, scales="free")

If you only want to change one axis, you can specify the x or y within the scales argument.

ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
  geom_smooth(method = "lm", se = TRUE, color="pink3") + 
  geom_point() + 
  facet_wrap(~Species, scales="free_x")

If you want to change the title of your facet boxes, you can do so by specifying the new names for each level then using the labeller argument. Note that if you change/specify your scales (more on this below), it will also interpret those labels.

##set names as an object
speciesnames <- c(setosa = "I. setosa", 
                  versicolor = "I. versicolor", 
                  virginica = "I. virginica")

ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
  geom_smooth(method = "lm", se = TRUE, color="pink3") + 
  geom_point() + 
  facet_wrap(~Species, scales="free_x",
             labeller = labeller(Species = speciesnames))

9 Themes + Aesthetics + Extra stuff

There’s a million different ways to make your plots look pretty. When creating plots, you can save them as an object in your R environment and then add more functions afterwards in the form of plotobject + function(). The example below, showing some of the different pre-set themes, demonstrates this format.

9.1 Some Themes

##save a plot as an object
p1 <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color=Species)) + 
  geom_point() 

##some themes
p1 + theme_bw()

p1 + theme_void()

p1 + theme_dark() 

9.2 Axis Labels

There’s a few different ways to change the labels of the axes. I usually use the function labs() to change the x, y, and title at the same time.

ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
  geom_smooth(method = "lm", se = TRUE, color="pink2") + 
  geom_point() +
  labs(x = "Sepal Length", y = "Sepal Width", title = "Plant Parts!")

9.3 Colors

You can use preset colors or a hex code color. For example, in the geom_smooth() calls above, I picked the preset color “pink2”. I have two websites that I typically use to find colors: Colors in R or Hex Color Codes.

The scale functions you need depend on what you are trying to adjust (i.e. fill or color?) and the type of scale. For our correlation plot, for example, we have continuous data that I’d like to show with a gradient from blue to pink.

Two popular packages with preset color palettes are RColorBrewer and viridis, partially because they are integrated into ggplot2. RColorBrewer has a whole list of palettes that you can print, while viridis palettes are designed to be color-blind friendly. Because these are integrated into ggplot2, you can use functions such as scale_fill_viridis_c() (ending in c for continuous, d for discrete, b for binned continuous data).

One thing to know is that there are different ways to change the colors. Using fill will change the filled color, while color will change the outline color. Also, where you put the color argument matters. If you include it within the aes(), it should be a variable in your data, usually a factor (or one that can be coerced into one), whereas if it is outside the aes(), then it should be the name of the color you want to use. You can see the differences in the plots below.

## coloring points
ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
  geom_point(aes(color=Species)) + 
  scale_color_manual(values = c("pink3", "purple", "red"))

## correlation plot
ggcorrplot(corr = cor, type="lower") +  
  scale_fill_gradient2(low="cadetblue1", mid="slateblue", high="pink1",
                      limits = c(-1, +1))

##density plot
##fill by species
ggplot() + geom_density(data = iris, aes(x=Petal.Length, fill = Species),
                        alpha = 0.4,
                        color="grey60")

##fill as red outside the aes()
ggplot() + geom_density(data = iris, aes(x=Petal.Length),
                        alpha = 0.4,
                        color="grey60",
                        fill = "red")

##use group argument within aes() and fill outside
ggplot() + geom_density(data = iris, aes(x=Petal.Length, group=Species),
                        alpha = 0.4,
                        color="grey60",
                        fill = "red")
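A common slip is putting a color name inside aes(). The sketch below compares the two placements using layer_data() from ggplot2, which shows the color each point actually ends up with; p_set and p_map are names made up for this illustration.

```r
library(ggplot2)

## setting: color is outside aes(), so every point is drawn in red
p_set <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(color = "red")

## pitfall: inside aes(), "red" is treated as data (a one-level variable),
## so the points get a color from the default palette, plus a legend
p_map <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(color = "red"))

## the colors that will actually be drawn
unique(layer_data(p_set)$colour)
unique(layer_data(p_map)$colour)
```

If a plot shows an unexpected legend with a single entry named after a color, this is usually the cause.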

9.4 Manual Axes

We can also set manual axes. The arguments will be similar across different functions to set breaks, limits, minor breaks, axis labels, and the axis name. We can use this for x or y axes and for discrete, continuous, or particular transformed axes. Below is a quick example of some arguments.

p1 + 
  scale_color_manual(values = c("pink3", "purple", "red"), ##change colors
                     labels = c("I. setosa",         ##change legend labels
                                "I. versicolor", 
                                "I. virginica")) +   
  scale_x_continuous(breaks = seq(3, 9, by = 1),     ##set breaks
                     limits = c(3, 9),               ##set limits
                     name = "Sepal Length")          ##can change axis name
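One thing worth knowing about limits: in the example above, 3 to 9 covers all of the data, so nothing is lost. With tighter limits, a scale drops the observations outside the range, whereas coord_cartesian() only zooms the view. Here is a sketch of the difference; base_plot, p_limits, and p_zoom are names made up for the illustration.

```r
library(ggplot2)

base_plot <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_point()

## scale limits: points outside 5 to 7 are dropped
## (expect a warning about removed rows when the plot is drawn)
p_limits <- base_plot + scale_x_continuous(limits = c(5, 7))

## coord_cartesian(): only the view changes, all points are kept
p_zoom <- base_plot + coord_cartesian(xlim = c(5, 7))

## count the x values that survive in each version
sum(!is.na(layer_data(p_limits)$x))
sum(!is.na(layer_data(p_zoom)$x))
```

This matters most when a layer such as geom_smooth() fits something to the data, because with scale limits the fit only sees the points that were kept.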

9.5 Text + Legends

You can change the aesthetics of text- bold/italics, size, font, etc.

Going back to our facet wrap example, we can change the species names to bold italics (here also resized and colored red).

ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
  geom_smooth(method = "lm", se = TRUE, color="pink3") + 
  geom_point() + 
  facet_wrap(~Species, scales="free_x",
             labeller = labeller(Species = speciesnames)) + 
  theme(strip.text.x = element_text(
        size = 12, color = "red", face = "bold.italic"
        ))

We can do something similar with legends. Note that in the name of the legend there is \n between the two words; this creates a line break! Using \n works throughout the different text arguments in ggplot2.

ggplot(iris, aes(Sepal.Length, Sepal.Width, color=Species)) + 
  geom_point() + 
  scale_color_manual(values = c("pink3", "purple", "red"), ##change colors
                     labels = c("I. setosa",         ##change legend labels
                                "I. versicolor", 
                                "I. virginica"),
                     name = "Species\nNames") 

9.6 Example- Correlation plot

I was trying to make some cleaner correlation plots and found some code online that I adapted. We’ll use the correlation matrix we made earlier saved as cor.

library(reshape2)
cor_m <- cor
cor_m[upper.tri(cor_m)] <- NA
cor_m <- melt(cor_m, na.rm = TRUE)   ##long format; drops the NA upper triangle

ggplot(data = cor_m, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +  
  scale_fill_gradient2(low="cadetblue1", mid="slateblue",
                                      high="pink1", 
                                      limits = c(-1, +1)) +
  labs(title = "Correlation Matrix of Iris Measurements", 
       x = "", y = "",   ##leave labels blank, otherwise would be Var1 and Var2
       fill = "Correlation \n Measure") +
  theme(legend.title = element_text(color="black", 
                                    size = 10,
                                    face = "bold")) +
  geom_text(aes(x = Var1, y = Var2, label = round(value, 2)), ##add values
            color = "black", 
            size = 5)

9.7 Arranging Plots

There are a few different ways to arrange plots. Here I’ll show an example with the cowplot package, but the gridExtra package is also pretty useful.

p2 <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color=Species)) + 
  geom_point() + 
  scale_color_manual(values = c("pink3", "purple", "red"), ##change colors
                     labels = c("I. setosa",         ##change legend labels
                                "I. versicolor", 
                                "I. virginica")) 

##cowplot
ggdraw() + draw_plot(p2) + draw_plot(p1, x = 0.725, y = 0.65, width = 0.3, 
                                     height = 0.3)

9.8 Animations

To make simple animations, we can create a plot as usual, but add one extra line with the transition_reveal() function from the gganimate package. There’s a lot you can do with the package, but below is a simple start. Simple animations are especially cool if you want to see changes over time.

ggplot() + geom_line(data = iris, aes(Petal.Length, Petal.Width)) +
  transition_reveal(along = Petal.Length)

##group/color by species
ggplot() + geom_line(data = iris, aes(Petal.Length, Petal.Width, color = Species)) +
  transition_reveal(along = Petal.Length)

10 Final Thoughts

ggplot2 is a super powerful package for visualizing data- from raw data to analysis results and even all sorts of spatial data! There are lots of neat tricks and lots of packages that build off the basics- a couple of which were shared today. RStudio has several cheatsheets available, including one for ggplot2 which you can find and download from here. It’s pretty useful and gives examples of each of the different available geoms.

Once you get the hang of creating plots, you can even start creating functions to make plots or manipulate your data with dplyr and use pipes (%>%) to specify the data you want to use. Below are a couple examples of using pipes.

library(dplyr)

##geom_point
iris %>% ggplot() + geom_point(aes(Petal.Length, Petal.Width, color = Species)) + 
  scale_color_manual(values = c("pink", "purple", "red"))

##geom_ribbon
##take data and calculate the standard deviation of petal width
iris %>% mutate(petal.sd = sd(Petal.Width)) %>% 
  ##pipe straight into ggplot() so it will be used as the data argument
  ggplot(aes(x=Petal.Length, y=Petal.Width)) + 
  geom_line() + 
  geom_ribbon(aes(ymin = (Petal.Width-petal.sd), 
                  ymax = (Petal.Width+petal.sd)), 
              alpha = 0.3)        ##change alpha
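As for writing your own plotting functions, here is one possible sketch, assuming ggplot2 and dplyr are installed. scatter_by() is a hypothetical helper I made up for this example, not something from either package; it takes column names as strings and looks them up with the .data pronoun.

```r
library(ggplot2)
library(dplyr)

## scatter_by() is a hypothetical helper written for this sketch;
## .data[[...]] lets aes() look columns up by name from strings
scatter_by <- function(data, xvar, yvar, colorvar) {
  ggplot(data, aes(x = .data[[xvar]], y = .data[[yvar]],
                   color = .data[[colorvar]])) +
    geom_point() +
    labs(x = xvar, y = yvar, color = colorvar)
}

## pipe a filtered data set straight into the helper
p_no_setosa <- iris %>%
  filter(Species != "setosa") %>%
  scatter_by("Petal.Length", "Petal.Width", "Species")

p_no_setosa
```

Because the data is the first argument of the helper, it slots into a pipe the same way ggplot() does.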