For this part, I’ve provided an introduction to making different
types of plots using ggplot2 and how to play with the aesthetics
to produce nice graphics.
The “Grammar of Graphics” used by ggplot2 works on a few fundamental
ideas. First you set up the plot with ggplot(), then use + between
functions to link them. There are lots of different ways to make the
same graph, but generally the format will look like
ggplot(data, aes()) + geom_xxx(). If you do this, each of the geom
functions will inherit the data and aesthetics specified in the
“parent” function, so to speak. Note that you can also just write
ggplot() to create an empty plot object and then specify the data and
aesthetics in each geom function. If you do it this way, you can plot
different data sets on top of one another.
The aes() is the most important part and where you specify what will
be your x, y, and other arguments such as group, color, or fill. Later
in the document, you’ll see some examples of what each of these does!
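As a rough sketch of the two setups described above (the species_means data frame and the object names are made up for this illustration), the first plot sets the data and aesthetics once in ggplot(), while the second leaves ggplot() empty so that each geom can bring its own data:

```r
library(ggplot2)

## data and aesthetics set once in ggplot(), inherited by every geom
p_inherit <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

## a second, smaller data set: mean sepal size per species
species_means <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species,
                           data = iris, FUN = mean)

## empty ggplot(), with data and aesthetics given inside each geom,
## so the means can be drawn on top of the raw points
p_layers <- ggplot() +
  geom_point(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(data = species_means,
             aes(x = Sepal.Length, y = Sepal.Width, color = Species),
             size = 4)
```

Typing p_inherit or p_layers at the console draws the plot.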
We’ll first load the packages and the iris data set that comes
with R. Two popular data sets that come with ggplot2 and are commonly
used in examples are mpg and diamonds. These data are often used in
the help pages for various functions as well.
library(ggplot2) ## plotting!
library(ggcorrplot) ## correlation plots
library(gridExtra) ## arranging plots
library(cowplot) ## arranging plots
library(dplyr) ## data manipulation
library(gganimate) ## animations!!
data("iris") ## load iris data
There are plenty of examples using these data sets, and you can find
out more in the help section by typing ?iris.
I’ll introduce more about ggplot2 themes later on, but for now I’ll
set the theme that I prefer to use with the theme_set() function. I
typically do this when loading ggplot2 with the following code:
library(ggplot2); theme_set(theme_bw())
When creating histograms, you can use continuous data or make a bar
graph of counts of factors, which is why you can also use geom_bar()
to make histograms with some extra arguments. By default,
geom_histogram() sets the number of bins to 30. Sometimes this works,
sometimes it doesn’t look nice. You can change the number of bins with
bins = # or change the width of the bins with binwidth = #.
##continuous data
##set binwidth
ggplot() + geom_histogram(data = iris, aes(x=Sepal.Width), binwidth = 0.5)
##set # of bins
ggplot() + geom_histogram(data = iris, aes(x=Sepal.Width), bins = 5)
#count/factor data
ggplot() + geom_histogram(data = iris, aes(x=Species), stat="count")
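Since geom_bar() was mentioned above, here is a small sketch of the count version using it directly. layer_data() is a ggplot2 helper that returns the data a layer actually draws, which is a handy way to check what the bars contain. The object names are only for illustration.

```r
library(ggplot2)

## geom_bar() counts the rows in each level of a factor by default
p_bar <- ggplot(iris, aes(x = Species)) + geom_bar()

## pull out the computed bar heights (one per species)
bar_counts <- layer_data(p_bar)$count

## same check for a histogram: every flower lands in exactly one bin,
## so the bin counts add up to the number of rows in iris
p_hist <- ggplot(iris, aes(x = Sepal.Width)) + geom_histogram(bins = 5)
hist_counts <- layer_data(p_hist)$count
sum(hist_counts)
```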
To make scatter plots, you can use the geom_point() function.
ggplot() + geom_point(data = iris, aes(x=Sepal.Length, y=Sepal.Width))
You can add a smoothing function over your points using different
methods such as a linear regression, loess, gam, glm, or other
functions. Note that you can use geom_smooth() on its own, but for
this example I’ve added it in combination with points.
##linear regression
ggplot(data = iris, aes(Sepal.Length, Sepal.Width)) + geom_point() +
geom_smooth(method = "lm", se = TRUE)
##loess
ggplot(data = iris, aes(Sepal.Length, Sepal.Width)) + geom_point() +
geom_smooth(method = "loess", se = TRUE)
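For the linear case, here is a hedged sketch of what the smoother is doing: with method = "lm", geom_smooth() fits a straight line of y on x, which I reproduce with a separate lm() call so the coefficients can be inspected. The names fit and p_lm are just for this example.

```r
library(ggplot2)

## fit the same straight-line model by hand
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)
line_coefs <- unname(coef(fit))  ## intercept first, slope second

## draw the points, the smoother, and the hand-fitted line on top
p_lm <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  geom_abline(intercept = line_coefs[1], slope = line_coefs[2],
              linetype = "dashed")
```

The dashed line should sit right on top of the smoother. The slope comes out slightly negative here because all three species are pooled together.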
Since I put the smoothing function AFTER the points, you can see that
the line overlaps the points. If we flip the order, the points will be
on top of the line. Since there are 3 species, we can add
color = Species within the aes() of our geom_point(). This means that
the points will be colored by Species. If we want to set one color
overall, we can write color = xxx outside of the aes() but still
within the geom function. You can find out more about colors below!!
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_smooth(method = "lm", se = TRUE, color="pink2") +
geom_point(aes(color=Species))
There are a couple of ways to create lines. If you aren’t trying to
fit a regression line, you can use geom_line() or geom_path(). The
difference between the two is that geom_path() will connect the
observations in the order that they appear in the data, whereas
geom_line() connects observations in order of the x-axis variable.
Note that geom_path() will look the same as geom_line() if your data
is ordered by x.
##line
ggplot() + geom_line(data = iris, aes(Petal.Length, Petal.Width))
##path
ggplot() + geom_path(data = iris, aes(Petal.Length, Petal.Width))
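The difference can be hard to spot with iris, so here is a small sketch with a made-up data frame (zigzag) whose x values are deliberately out of order. layer_data() shows the order in which each geom will connect the points.

```r
library(ggplot2)

## tiny made-up data set with x out of order
zigzag <- data.frame(x = c(1, 3, 2, 4), y = c(1, 2, 3, 4))

p_line <- ggplot(zigzag, aes(x, y)) + geom_line()  ## connects by x
p_path <- ggplot(zigzag, aes(x, y)) + geom_path()  ## connects by row order

## the x values in the order each geom joins them up
line_order <- layer_data(p_line)$x
path_order <- layer_data(p_path)$x
```

Drawing both plots shows geom_line() as a steadily rising line and geom_path() doubling back on itself between the second and third points.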
To add shaded areas for confidence intervals, geom_ribbon() can be
used, but you need to specify the limits. We can quickly make a
calculation and use it; the example below simply shades plus or minus
one standard deviation of Petal.Width. We can make calculations within
the ggplot aesthetics (shown below), or mutate the data with dplyr and
use a pipe %>% to bring the data straight into ggplot() as shown at
the end of this document. Notice that with these plots, I specified
alpha = 0.3. The alpha argument changes the transparency of the geom
and can be used with ANY geom.
ggplot(data = iris, aes(x=Petal.Length, y=Petal.Width)) +
geom_line() +
geom_ribbon(aes(ymin = (Petal.Width-sd(Petal.Width)),
ymax = (Petal.Width+sd(Petal.Width))),
alpha = 0.3) ##change alpha
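The ribbon above is built from the standard deviation of Petal.Width. If you want an actual confidence band, one possible approach (my assumption here, not the only way) is to fit a model first and let predict() return the lower and upper limits, which then go into ymin and ymax. The lm() model and the names fit, band, and p_band are only for illustration.

```r
library(ggplot2)

## straight-line model of petal width on petal length
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

## fitted values plus lower/upper 95% confidence limits
## (predict() names these columns fit, lwr, and upr)
band <- cbind(iris["Petal.Length"],
              as.data.frame(predict(fit, interval = "confidence")))

p_band <- ggplot(band, aes(x = Petal.Length)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.3) +
  geom_line(aes(y = fit))
```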
Density plots are a useful alternative to view the distributions of your data. If you use Bayesian methods, this will be 90% of your plots.
##simple
ggplot() + geom_density(data = iris, aes(x=Petal.Length))
##extra stuff
ggplot() + geom_density(data = iris, aes(x=Petal.Length, fill = Species),
alpha = 0.4) ##to change transparency
Boxplots are pretty simple to create and very useful for reasons you’ve probably heard in an introductory stats course.
ggplot() + geom_boxplot(data = iris, aes(x=Species, y=Petal.Width))
One alternative to using boxplots is to use a violin plot, which shows the density of your variable distributions rather than boxes based off quartiles. Again, we can change the color.
ggplot(data = iris, aes(x=Species, y=Petal.Width)) +
geom_violin(aes(fill=Species))
We can make correlation plots easily using the package ggcorrplot.
With this package, we just use the ggcorrplot() function on our
correlation matrix and can specify whether to show all values, or just
the upper or lower triangle. Here I show just the lower. This package
is great because it integrates with ggplot2 so you can use other
functions to dress up the plot. More on this below.
##make a correlation
cor <- cor(iris[,-c(5)])
ggcorrplot(cor, type = "lower")
Facet wrapping is a great way to separately show different groups within your data on the same plot. For example, if you have a bunch of lines that would otherwise overlap, you could separate them.
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_smooth(method = "lm", se = TRUE, color="pink3") +
geom_point() +
facet_wrap(~Species)
Now, we can see it automatically uses fixed axis scales across all
facets. If you want to use different axes for your facets, you can
change it using the argument scales = "free".
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_smooth(method = "lm", se = TRUE, color="pink3") +
geom_point() +
facet_wrap(~Species, scales="free")
If you only want to change one axis, you can specify the x or y within the scales argument.
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_smooth(method = "lm", se = TRUE, color="pink3") +
geom_point() +
facet_wrap(~Species, scales="free_x")
If you want to change the title of your facet boxes, you can do so by
specifying the new names for each level then using the labeller
argument. Note that if you change/specify your scales (more on this
below), it will also interpret those labels.
##set names as an object
speciesnames <- c(setosa = "I. setosa",
                  versicolor = "I. versicolor",
                  virginica = "I. virginica")
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_smooth(method = "lm", se = TRUE, color="pink3") +
geom_point() +
facet_wrap(~Species, scales="free_x",
labeller = labeller(Species = speciesnames))
There’s a million different ways to make your plots look pretty. When
creating plots, you can save them as an object in your R environment
and then add more functions in the form of plotobject + function().
The example below, showing different theme presets, demonstrates this
format.
##save a plot as an object
p1 <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color=Species)) +
  geom_point()
##some themes
p1 + theme_bw()
p1 + theme_void()
p1 + theme_dark()
There are a few different ways to change the labels of the axes. I
usually use the function labs() to change the x, y, and title at the
same time.
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_smooth(method = "lm", se = TRUE, color="pink2") +
geom_point() +
labs(x = "Sepal Length", y = "Sepal Width", title = "Plant Parts!")
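One of the other ways, as a small sketch: xlab(), ylab(), and ggtitle() are ggplot2 helpers that each set a single label. The object name p_labels is just for this example.

```r
library(ggplot2)

## same labels as the labs() call, set one at a time
p_labels <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  xlab("Sepal Length") +
  ylab("Sepal Width") +
  ggtitle("Plant Parts!")
```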
You can use preset colors or a hex code color, so in the
geom_smooth(), I picked the color “pink2”. I have two websites that I
typically use to find colors: Colors in R or Hex Color Codes.
The scale function you need depends on what you are trying to adjust (i.e., fill or color) and the type of scale. For our correlation plot, for example, we have continuous data that I’d like to show with a gradient from blue to pink.
Two popular packages with preset color palettes are RColorBrewer and
viridis, partially because they are integrated into ggplot2.
RColorBrewer has a whole list of palettes that you can print, while
viridis palettes are designed to be color-blind friendly. Because
these are integrated into ggplot2, you can use functions such as
scale_fill_viridis_c() (ending in c for continuous, d for discrete,
b for binned continuous data).
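A quick sketch of what these palette functions look like in use, with object names made up for the example. scale_fill_viridis_d(), scale_color_viridis_c(), and scale_color_brewer() all ship with ggplot2, and "Dark2" is one of the RColorBrewer palette names.

```r
library(ggplot2)

## discrete fill: one viridis color per species
p_viridis_d <- ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_density(alpha = 0.4) +
  scale_fill_viridis_d()

## continuous color: points shaded along a viridis gradient
p_viridis_c <- ggplot(iris, aes(Sepal.Length, Sepal.Width,
                                color = Petal.Length)) +
  geom_point() +
  scale_color_viridis_c()

## RColorBrewer palette picked by name
p_brewer <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2")

## to print the available RColorBrewer palettes:
## RColorBrewer::display.brewer.all()
```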
One thing to know is that there are different ways to change the
colors. Using fill will change the filled color while color will
change the outline color. Also, where you put the color argument
matters. If you include it within the aes(), it should be a variable
in your data (for discrete colors, a factor or one that can be coerced
into one), whereas if it is outside the aes(), it should be the name
of the color you want to use. You can see the differences in the plots
below.
## coloring points
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_point(aes(color=Species)) +
scale_color_manual(values = c("pink3", "purple", "red"))
## correlation plot
ggcorrplot(corr = cor, type="lower") +
scale_fill_gradient2(low="cadetblue1", mid="slateblue", high="pink1",
limits = c(-1, +1))
##density plot
##fill by species
ggplot() + geom_density(data = iris, aes(x=Petal.Length, fill = Species),
alpha = 0.4,
color="grey60")
##fill as red outside the aes()
ggplot() + geom_density(data = iris, aes(x=Petal.Length),
alpha = 0.4,
color="grey60",
fill = "red")
##use group argument within aes() and fill outside
ggplot() + geom_density(data = iris, aes(x=Petal.Length, group=Species),
alpha = 0.4,
color="grey60",
fill = "red")
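A common slip with this inside/outside rule, sketched here with made-up object names: putting a color name inside the aes() makes ggplot2 treat "red" as a one-level variable, so the points get the default palette color (plus a legend) instead of actually being red.

```r
library(ggplot2)

## "red" inside aes(): treated as data, so the default palette is used
p_mapped <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(color = "red"))

## "red" outside aes(): treated as the actual color of the points
p_fixed <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(color = "red")

## the colors each layer will really draw with
mapped_colors <- unique(layer_data(p_mapped)$colour)
fixed_colors  <- unique(layer_data(p_fixed)$colour)
```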
We can also set manual axes. The arguments will be similar across different functions to set breaks, limits, minor breaks, axis labels, and the axis name. We can use this for x or y axes and discrete, continuous, or particular transformed axes. Below is a quick example of some arguments.
p1 +
  scale_color_manual(values = c("pink3", "purple", "red"), ##change colors
                     labels = c("I. setosa", ##change legend labels
                                "I. versicolor",
                                "I. virginica")) +
  scale_x_continuous(breaks = seq(3, 9, by = 1), ##set breaks
                     limits = c(3, 9), ##set limits
                     name = "Sepal Length") ##can change axis name
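One thing worth knowing about limits, shown as a sketch with my own choice of a narrower range (5 to 7) than the example above: limits set in a scale drop the observations that fall outside them, while coord_cartesian() only zooms in and keeps all the data. This matters once you add something like a smoother, because the fit changes if points are dropped.

```r
library(ggplot2)

base <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_point()

## scale limits: points outside 5-7 are removed (ggplot2 warns about this)
p_limits <- base + scale_x_continuous(limits = c(5, 7))

## coordinate limits: same window, but every point stays in the plot data
p_zoom <- base + coord_cartesian(xlim = c(5, 7))

## how many flowers fall outside the 5-7 window
n_outside <- sum(iris$Sepal.Length < 5 | iris$Sepal.Length > 7)
n_outside
```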
You can change the aesthetics of text- bold/italics, size, font, etc.
Going back to our facet wrap example, we can change the species names to italics.
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_smooth(method = "lm", se = TRUE, color="pink3") +
geom_point() +
facet_wrap(~Species, scales="free_x",
labeller = labeller(Species = speciesnames)) +
theme(strip.text.x = element_text(
size = 12, color = "red", face = "bold.italic"
))
We can also customize legends. Note that in the legend name below,
there is a \n in the middle of the text, which creates a line break!
Using \n is universal syntax that can be used throughout different
text arguments in ggplot2.
ggplot(iris, aes(Sepal.Length, Sepal.Width, color=Species)) +
geom_point() +
scale_color_manual(values = c("pink3", "purple", "red"), ##change colors
labels = c("I. setosa", ##change legend labels
"I. versicolor",
"I. virginica"),
name = "Species\nNames")
I was trying to make some cleaner correlation plots and found some
code online that I adapted. We’ll use the correlation matrix we made
earlier saved as cor.
library(reshape2)
cor_m <- cor
cor_m[upper.tri(cor_m)] <- NA
cor_m <- melt(cor_m, na.rm=TRUE)
ggplot(data = cor_m, aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low="cadetblue1", mid="slateblue",
high="pink1",
limits = c(-1, +1)) +
labs(title = "Correlation Matrix of Iris Measurements",
x = "", y = "", ##leave labels blank, otherwise would be Var1 and Var2
fill = "Correlation \n Measure") +
theme(legend.title = element_text(color="black",
size = 10,
face = "bold")) +
geom_text(aes(x = Var1, y = Var2, label = round(value, 2)), ##add values
color = "black",
size = 5)
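If you’d rather not load reshape2 just for the melt() step, one possible base R stand-in (my own substitution, not part of the code I adapted) is as.table() followed by as.data.frame(), which also gives one row per cell of the matrix. The names cor_mat and cor_long are only for this sketch.

```r
## correlation matrix of the four numeric iris columns
cor_mat <- cor(iris[, -5])

## blank out the upper triangle, as in the example above
cor_mat[upper.tri(cor_mat)] <- NA

## one row per cell, with columns Var1, Var2, and Freq
cor_long <- as.data.frame(as.table(cor_mat))

## rename Freq so the column names match what melt() produces
names(cor_long)[3] <- "value"

## drop the blanked-out cells
cor_long <- cor_long[!is.na(cor_long$value), ]
```

The resulting cor_long can go into the same ggplot() call in place of cor_m.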
There are a few different ways to arrange plots. Here I’ll show an
example with the cowplot package, but the gridExtra package is also
pretty useful.
p2 <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color=Species)) +
  geom_point() +
  scale_color_manual(values = c("pink3", "purple", "red"), ##change colors
                     labels = c("I. setosa", ##change legend labels
                                "I. versicolor",
                                "I. virginica"))
##cowplot
ggdraw() + draw_plot(p2) + draw_plot(p1, x = 0.725, y = 0.65, width = 0.3,
height = 0.3)
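For comparison, a minimal gridExtra sketch that puts two plots side by side rather than insetting one in the other. The plots left and right are made up for the example. arrangeGrob() builds the layout without drawing it, and grid.arrange() is the version that also draws.

```r
library(ggplot2)
library(gridExtra)

left <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point()
right <- ggplot(iris, aes(x = Species, y = Petal.Width)) +
  geom_boxplot()

## build a one-row, two-column layout without drawing it yet
side_by_side <- arrangeGrob(left, right, ncol = 2)

## to build and draw in one step instead:
## grid.arrange(left, right, ncol = 2)
```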
To make simple animations, we can create a plot as usual, but add one
extra line with the transition_reveal() function from the gganimate
package. There’s a lot you can do with the package, but below is a
simple start. Simple animations are especially cool if you want to see
changes over time.
ggplot() + geom_line(data = iris, aes(Petal.Length, Petal.Width)) +
transition_reveal(along = Petal.Length)
##group/color by species
ggplot() + geom_line(data = iris, aes(Petal.Length, Petal.Width, color = Species)) +
transition_reveal(along = Petal.Length)
ggplot2 is a super powerful package for visualizing data, from raw
data to analysis results and even all sorts of spatial data! There are
lots of neat tricks and lots of packages that build off the basics, a
couple of which were shared today. RStudio has several cheatsheets
available, including one for ggplot2 which you can find and download
from here. It’s pretty useful and gives examples of each of the
different available geoms.
Once you get the hang of creating plots, you can even start creating
functions to make plots or manipulate your data with dplyr and use
pipes (%>%) to specify the data you want to use. Below are a couple
examples of using pipes.
library(dplyr)
##geom_point
iris %>% ggplot() + geom_point(aes(Petal.Length, Petal.Width, color = Species)) +
  scale_color_manual(values = c("pink", "purple", "red"))
##geom_ribbon
##take data and calculate petal width standard deviation
iris %>% mutate(petal.sd = sd(Petal.Width)) %>%
  ##pipe straight into ggplot() so it will be used as the data argument
  ggplot(aes(x=Petal.Length, y=Petal.Width)) +
  geom_line() +
  geom_ribbon(aes(ymin = (Petal.Width-petal.sd),
                  ymax = (Petal.Width+petal.sd)),
              alpha = 0.3) ##change alpha
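As for writing functions to make plots, mentioned above, here is one possible shape for such a helper. species_scatter() is a hypothetical function I made up for this sketch. It takes column names as strings and looks them up with the .data pronoun, and because the data is its first argument it also works at the end of a pipe.

```r
library(ggplot2)
library(dplyr)

## hypothetical helper: scatter plot of any two columns, colored by Species
species_scatter <- function(data, xvar, yvar) {
  ggplot(data, aes(x = .data[[xvar]], y = .data[[yvar]], color = Species)) +
    geom_point() +
    labs(x = xvar, y = yvar)
}

## call it directly...
p_petal <- species_scatter(iris, "Petal.Length", "Petal.Width")

## ...or at the end of a pipe, after some dplyr manipulation
p_no_setosa <- iris %>%
  filter(Species != "setosa") %>%
  species_scatter("Sepal.Length", "Sepal.Width")
```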