Intro to R: ggplot2
ggplot2 is an R package for data visualization. It is based on the grammar of graphics, which describes a plot as a combination of data, aesthetic mappings, and geometric objects: plot = data + aesthetics + geometry. This article introduces this powerful package step by step, starting from two basic elements: data and mapping, and geometry.
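To make the formula concrete, here is a minimal sketch, assuming ggplot2 is installed (for example with install.packages("ggplot2")). Each of the three components appears as a separate argument or layer. The mtcars columns wt and mpg are chosen only for illustration.

```r
library(ggplot2)

# data:       the built-in mtcars data frame
# aesthetics: aes() maps car weight (wt) to x and fuel economy (mpg) to y
# geometry:   geom_point() draws one point per row of the data
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

print(p)
```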
- Data and Mapping
Data: the data used to draw the graph. This article mainly uses two built-in datasets as examples: mtcars, which ships with base R, and diamonds, which ships with ggplot2.
Mapping: aes() is the mapping function in ggplot2. A mapping describes how variables in the dataset correspond to visual properties (aesthetics) of the graph. Attributes such as color, shape, and grouping can therefore be driven by variables in the data.
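As a sketch of mapping beyond the x and y axes, the example below maps the number of cylinders (cyl) to color and the transmission type (am) to shape; these variable choices are illustrative. Wrapping them in factor() makes ggplot2 treat them as categories rather than continuous numbers.

```r
library(ggplot2)

# colour and shape are aesthetics too; each is driven by a variable in mtcars
p <- ggplot(mtcars, aes(x = wt, y = mpg,
                        colour = factor(cyl),  # 3 cylinder counts -> 3 colours
                        shape = factor(am))) + # 2 transmission types -> 2 shapes
  geom_point()

print(p)
```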
Here we use the diamonds dataset as the data, with a mapping that puts carat on the x-axis and price on the y-axis.
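A minimal sketch of this step follows. Because no geom has been added yet, printing p shows only the carat and price axes with an empty panel; the geoms introduced in the next section supply the marks.

```r
library(ggplot2)

# data = diamonds (ships with ggplot2); mapping: carat -> x, price -> y
p <- ggplot(diamonds, aes(x = carat, y = price))

print(p)  # axes and grid only: no geometry layer yet
```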
- Geometry
Geom functions control how data points are drawn, and each geom's aesthetic properties (such as position, color, and size) are used to represent variables.
Here are some common geom functions:
geom_histogram()
A histogram displays the distribution of a single continuous variable by dividing its range into bins and counting the observations in each bin.
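For example, here is a sketch of a histogram of diamond prices. The binwidth of 500 (US dollars) is an arbitrary choice for illustration; without binwidth or bins, geom_histogram() typically defaults to 30 bins and prints a message suggesting you pick a better value.

```r
library(ggplot2)

# Only one variable (price) is mapped; the bar heights (counts) are computed by stat_bin()
p <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500)

print(p)
```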
geom_bar()
A bar plot displays a categorical variable and the number of observations in each category.
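A sketch with diamonds: mapping the categorical variable cut to x is enough, because geom_bar() counts the rows in each category itself through its default stat, stat_count().

```r
library(ggplot2)

# cut is an ordered factor: Fair < Good < Very Good < Premium < Ideal
p <- ggplot(diamonds, aes(x = cut)) +
  geom_bar()

print(p)
```

If the bar heights are already stored as a column in the data, geom_col() is the usual alternative to geom_bar().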
geom_boxplot()
A box plot summarizes the distribution of a variable with five numbers: the minimum, lower quartile, median, upper quartile, and maximum. It also makes outliers in the data easy to identify. Note that in ggplot2's geom_boxplot(), the whiskers extend only to the most extreme values within 1.5 times the interquartile range of the box; points beyond that range are drawn individually as outliers.
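A sketch comparing fuel economy (mpg) across cylinder counts in mtcars; factor(cyl) produces one box per group, and the thick line inside each box marks the group's median.

```r
library(ggplot2)

# One box per cylinder group; any points beyond 1.5 * IQR are drawn as outliers
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

print(p)
```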
geom_point()
A scatter plot displays the relationship between two numerical variables, drawing one point per observation.
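Adding geom_point() to the carat and price mapping from the Data and Mapping section gives a scatter plot. With about 54,000 diamonds the points overlap heavily, so this sketch sets alpha (transparency) to 0.1; that value is an illustrative choice, not a requirement.

```r
library(ggplot2)

# Same data and mapping as before, now drawn as points
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1)  # semi-transparent points reduce overplotting

print(p)
```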
So far, we have covered how to draw the simplest graphs with ggplot2. In the next post, I will introduce some more advanced functions that make graphs more informative.