R has a number of graphing libraries, including *base* graphics that are installed whenever you install R.

ggplot2 is a graphing library for R that makes beautiful graphs. Its syntax can be formidably complex, with a somewhat steep learning curve.

That being said, learning ggplot2 is worth the effort for three reasons. First, the graphs are beautiful. Second, ggplot2’s syntax, though seemingly arcane at times, forces you to think about the nature of your data and the ideas that you are graphing. Lastly, a little bit of knowledge about ggplot2 can go a long way, and can build a powerful foundation for future learning.

The intent of this tutorial is to build on the idea that:

A little bit of ggplot can go a long way

and to give you a simple introduction to the idea that any ggplot graph is composed of an `aesthetic` + `a geom or two` + `other optional elements like titles and themes`.

So, as a quick and simple example…

`ggplot(my_demo_data,`

(the data that I am using)

`aes(x = my_outcome)) +`

(aesthetic: what I am graphing)

`geom_dotplot(fill = "purple")`

(geom: how I am graphing it)
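The annotated pieces above can be assembled into a single runnable call. This is only a sketch: `my_demo_data` and `my_outcome` are placeholder names from the example, so the data frame below is a hypothetical stand-in filled with simulated values.

```r
library(ggplot2)

# hypothetical stand-in for my_demo_data: 50 simulated values of one outcome
set.seed(123)
my_demo_data <- data.frame(my_outcome = rnorm(n = 50, mean = 100, sd = 15))

quick_plot <- ggplot(my_demo_data,          # the data that I am using
                     aes(x = my_outcome)) + # aesthetic: what I am graphing
  geom_dotplot(fill = "purple")             # geom: how I am graphing it

quick_plot # printing the object draws the graph
```

ggplot2 will likely print a message suggesting that you choose a `binwidth`; the graph is still drawn with the default.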

This document is a *very brief* introduction to the *basic* ideas of ggplot2. More information about ggplot can be found here. More ggplot2 examples can be found here.

In PDF versions of this document, the R code is automatically displayed. If you are looking at a webpage version of this document, click the **Code** buttons on the right to see the actual code.

You will need a few R libraries to work in ggplot.

```
library(ggplot2) # beautiful graphs
library(ggthemes) # nice themes for ggplot2
library(ggbeeswarm) # "beeswarm" plots
library(cowplot) # arrange graphs
library(pander) # nice tables
library(psych) # nice table of descriptive statistics
```

In this example, we simulate some data. But your own learning of ggplot will progress more quickly if you use data that you have access to, on an issue that you care about.

Here are the first few rows of simulated data:

```
# simulated data
N <- 100 # set sample size
predictor <- rnorm(n = N, mean = 100, sd = 25) # n, mean, sd
group <- rbinom(n = N, 1, .5) # n, number of trials, probability
outcome <- predictor +
  10 * group +
  rnorm(n = N,
        mean = 0,
        sd = 15) # outcome is a function of predictor + group + error
group <- factor(group)
mydata <- data.frame(predictor, outcome, group) # make data frame
pander(head(mydata)) # nice looking table of first few rows of data
```

predictor | outcome | group |
---|---|---|
115.8 | 127.3 | 0 |
107.4 | 118.3 | 0 |
92.32 | 105.2 | 1 |
119.8 | 109.9 | 0 |
121.6 | 137.4 | 1 |
62.45 | 57.42 | 1 |

There are 3 essential elements to any ggplot call:

- An *aesthetic* that tells ggplot which variables are being mapped to the *x axis*, *y axis*, (and often other attributes of the graph, such as the *color fill*). Intuitively, the aesthetic can be thought of as *what you are graphing*.
- A *geom* or *geometry* that tells ggplot about the basic structure of the graph. Intuitively, the geom can be thought of as *how you are graphing it*.
- Other options, such as a *graph title*, *axis labels* and *overall theme* for the graph.
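As a rough illustration of how the three elements fit together in one call, here is a minimal sketch. The data frame `example_data` and the title and axis labels are assumptions made for this illustration; the data are simulated in roughly the same shape as `mydata`.

```r
library(ggplot2)

# illustrative data in the same shape as mydata (values are simulated here)
set.seed(1)
example_data <- data.frame(
  predictor = rnorm(n = 100, mean = 100, sd = 25),
  group = factor(rbinom(n = 100, 1, .5))
)
example_data$outcome <- example_data$predictor + rnorm(n = 100, mean = 0, sd = 15)

three_part_plot <- ggplot(example_data,
                          aes(x = predictor,      # 1. aesthetic: what is graphed
                              y = outcome,
                              color = group)) +
  geom_point() +                                  # 2. geom: how it is graphed
  labs(title = "Outcome by Predictor",            # 3. other options: title,
       x = "Predictor",                           #    axis labels,
       y = "Outcome") +
  theme_minimal()                                 #    and overall theme

three_part_plot
```

Removing the `labs()` and `theme_minimal()` lines still gives a complete graph; only the aesthetic and a geom are needed to draw something.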

For one variable:

`p <- ggplot(mydata, aes(x = ...))`

This says there is only one variable running along the horizontal *x* axis in the aesthetic.

The `p <- ...` means that we are *assigning* this graph aesthetic to plot *p*. We can then add other features to plot *p* as we continue our work. This *iterative* nature of ggplot2 is one of the things that makes it so powerful. As your workflow and your documents become more complex, you can build a simple consistent foundation^{1} for your graphs, then add something simple to make a first graph, and a different something simple to make a second graph.
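A small sketch of this iterative workflow, assuming a simulated one-variable data frame (`demo_data` is a hypothetical name, and the two geoms are chosen only for illustration):

```r
library(ggplot2)

# illustrative data: one simulated continuous variable
set.seed(42)
demo_data <- data.frame(outcome = rnorm(n = 100, mean = 100, sd = 25))

# the foundation: data + aesthetic, with no geom yet
p <- ggplot(demo_data, aes(x = outcome))

# add something simple to make a first graph
p_histogram <- p + geom_histogram(bins = 20)

# add a different something simple to make a second graph
p_density <- p + geom_density()

p_histogram
p_density
```

Adding a geom returns a new plot object, so `p` itself stays unchanged and can be reused as the starting point for as many graphs as you like.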

For two variables:

`p <- ggplot(mydata, aes(x = ..., y = ...))`

This says there are two variables in the aesthetic: one for the horizontal *x* axis, and another for the vertical *y* axis.

We can then add different geometries to our plot:

For one var