class: center, middle, inverse, title-slide .title[ # Visualizing data with ggplot2
👨🎨 ] .author[ ### S. Mason Garrison ] --- layout: true <div class="my-footer"> <span> <a href="https://DataScience4Psych.github.io/DataScience4Psych/" target="_blank">S. Mason Garrison</a> </span> </div> --- class: middle # ggplot2 ❤️ ♊ --- ## ggplot2 `\(\in\)` tidyverse .pull-left-narrow[ <img src="img/ggplot2-part-of-tidyverse.png" width="60%" style="display: block; margin: auto;" /> ] <!-- markdownlint-disable error --> .pull-right-wide[ - **ggplot2** is tidyverse's data visualization package - Structure of the code for plots can be summarized as ``` r ggplot(data = [[dataset]], mapping = aes(x = [[x-variable]], y = [[y-variable]])) + geom_xxx() + other options ``` ] <!-- markdownlint-enable --> --- ## Data: Australian National Health and Medical Research Council Twin Registry Measurements for 3,808 Australian twin pairs with lots of biometric data (e.g., body mass index (BMI)) <br> .pull-left[ <img src="img/cartoony image of Australian twins.png" width="65%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="img/cartoony image of Australian twins_v2.png" width="65%" style="display: block; margin: auto;" /> ] .footnote[This is what DALL-E thinks "cartoony Australian twins" look like] --- ## Australian National Health and Medical Research Council Twin Registry Measurements for 3,808 Australian twin pairs with lots of biometric data (e.g., body mass index (BMI)) ``` r glimpse(twinData) ``` ``` ## Rows: 3,808 ## Columns: 16 ## $ fam <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,… ## $ age <int> 21, 24, 21, 21, 19, 26, 23, 29, 24, 28, 29, 19… ## $ zyg <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1… ## $ part <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2… ## $ wt1 <int> 58, 54, 55, 66, 50, 60, 65, 40, 60, 76, 48, 70… ## $ wt2 <int> 57, 53, 50, 76, 48, 60, 65, 39, 57, 64, 51, 67… ## $ ht1 <dbl> 1.7000, 1.6299, 1.6499, 1.5698, 1.6099, 1.5999… ## $ ht2 <dbl> 1.7000, 1.6299, 1.6799, 1.6499, 1.6299, 1.5698… ## $ htwt1 <dbl> 20.0692, 
20.3244, 20.2020, 26.7759, 19.2894, 2… ## $ htwt2 <dbl> 19.7232, 19.9481, 17.7154, 27.9155, 18.0662, 2… ## $ bmi1 <dbl> 20.9943, 21.0828, 21.0405, 23.0125, 20.7169, 2… ## $ bmi2 <dbl> 20.8726, 20.9519, 20.1210, 23.3043, 20.2583, 2… ## $ cohort <chr> "younger", "younger", "younger", "younger", "y… ## $ zygosity <fct> MZFF, MZFF, MZFF, MZFF, MZFF, MZFF, MZFF, MZFF… ## $ age1 <int> 21, 24, 21, 21, 19, 26, 23, 29, 24, 28, 29, 19… ## $ age2 <int> 21, 24, 21, 21, 19, 26, 23, 29, 24, 28, 29, 19… ``` .footnote[This is what Australian twins actually look like] --- # Plot <img src="00_ggplot2_files/figure-html/unnamed-chunk-8-1.png" width="70%" style="display: block; margin: auto;" /> --- # Code ``` r ggplot(data = twinData_cleaned, mapping = aes( x = height_t1, y = height_t2, color = zyg)) + geom_point() + labs(title = "Height Comparison between Twins", subtitle = "by zygosity", x = "Height of Twin 1 (m)", y = "Height of Twin 2 (m)", color = "Zygosity", caption = "Source: Australian National Health and Medical Research Council Twin Registry / OpenMx package") + scale_color_viridis_d() ``` --- class: middle # Wrapping Up... 
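---

## Sketch: a runnable version of the plot

The slides never show how `twinData_cleaned` is built, so the plot code cannot be run on its own. As a rough sketch, the chunk below swaps in a simulated data frame, `toy_twins` (a hypothetical name with made-up values, not registry data), that borrows the column names `height_t1`, `height_t2`, and `zyg` from the slides. The `"MZ"` and `"DZ"` labels are also assumptions.

```r
library(ggplot2)

# Hypothetical stand-in for twinData_cleaned: the column names follow the
# slides, but every value here is simulated, not registry data.
set.seed(1)
n <- 200
toy_twins <- data.frame(
  height_t1 = rnorm(n, mean = 1.68, sd = 0.08),
  zyg = factor(sample(c("MZ", "DZ"), n, replace = TRUE))
)
# Twin 2's height is Twin 1's height plus noise, so the points correlate.
toy_twins$height_t2 <- toy_twins$height_t1 + rnorm(n, sd = 0.04)

# Same layers as the slide code: data, aesthetic mapping, geom, labels, scale.
p <- ggplot(data = toy_twins,
            mapping = aes(x = height_t1, y = height_t2, color = zyg)) +
  geom_point() +
  labs(title = "Height Comparison between Twins",
       subtitle = "by zygosity",
       x = "Height of Twin 1 (m)",
       y = "Height of Twin 2 (m)",
       color = "Zygosity") +
  scale_color_viridis_d()

p
```

Storing the plot in `p` before printing makes it easy to add or swap layers later, for example `p + theme_minimal()`.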
--- class: middle # Coding out loud --- .midi[ > **Start with the `twinData_cleaned` data frame** ] .pull-left[ ``` r *ggplot(data = twinData_cleaned) ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-9-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .midi[ > Start with the `twinData_cleaned` data frame, > **map height of Twin 1 to the x-axis** ] .pull-left[ ``` r ggplot(data = twinData_cleaned, * mapping = aes(x = height_t1)) ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-10-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .midi[ > Start with the `twinData_cleaned` data frame, > map height of Twin 1 to the x-axis > **and map height of Twin 2 to the y-axis.** ] .pull-left[ ``` r ggplot(data = twinData_cleaned, mapping = aes(x = height_t1, * y = height_t2)) ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .midi[ > Start with the `twinData_cleaned` data frame, > map height of Twin 1 to the x-axis > and map height of Twin 2 to the y-axis. > **Represent each observation with a point** ] .pull-left[ ``` r ggplot(data = twinData_cleaned, mapping = aes(x = height_t1, y = height_t2)) + * geom_point() ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-12-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .midi[ > Start with the `twinData_cleaned` data frame, > map height of Twin 1 to the x-axis > and map height of Twin 2 to the y-axis. 
> Represent each observation with a point > **and map zygosity to the color of each point.** ] .pull-left[ ``` r ggplot(data = twinData_cleaned, mapping = aes( x = height_t1, y = height_t2, * color = zyg)) + geom_point() ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .midi[ > Start with the `twinData_cleaned` data frame, > map height of Twin 1 to the x-axis > and map height of Twin 2 to the y-axis. > Represent each observation with a point > and map zygosity to the color of each point. > **Title the plot "Height Comparison between Twins"** ] .pull-left[ ``` r ggplot(data = twinData_cleaned, mapping = aes( x = height_t1, y = height_t2, color = zyg)) + geom_point() + * labs(title = "Height Comparison between Twins") ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .midi[ > Start with the `twinData_cleaned` data frame, > map height of Twin 1 to the x-axis > and map height of Twin 2 to the y-axis. > Represent each observation with a point > and map zygosity to the color of each point. > Title the plot "Height Comparison between Twins", > **add the subtitle "by zygosity"** ] .pull-left[ ``` r ggplot(data = twinData_cleaned, mapping = aes( x = height_t1, y = height_t2, color = zyg)) + geom_point() + labs(title = "Height Comparison between Twins", * subtitle = "by zygosity") ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-15-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .midi[ > Start with the `twinData_cleaned` data frame, > map height of Twin 1 to the x-axis, > and map height of Twin 2 to the y-axis. > Represent each observation with a point, > and map zygosity to the color of each point. 
> Title the plot "Height Comparison between Twins", > add the subtitle "by zygosity" > **label the x and y axes as "Height of Twin 1 (m)" and "Height of Twin 2 (m)", respectively** ] .pull-left[ ``` r ggplot(data = twinData_cleaned, mapping = aes( x = height_t1, y = height_t2, color = zyg)) + geom_point() + labs(title = "Height Comparison between Twins", subtitle = "by zygosity", * x = "Height of Twin 1 (m)", y = "Height of Twin 2 (m)") ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .midi[ > Start with the `twinData_cleaned` data frame, > map height of Twin 1 to the x-axis, > and map height of Twin 2 to the y-axis. > Represent each observation with a point, > and map zygosity to the color of each point. > Title the plot "Height Comparison between Twins", > add the subtitle "by zygosity" > label the x and y axes as "Height of Twin 1 (m)" and "Height of Twin 2 (m)", respectively, > **label the legend "Zygosity"** ] .pull-left[ ``` r ggplot(data = twinData_cleaned, mapping = aes( x = height_t1, y = height_t2, color = zyg)) + geom_point() + labs(title = "Height Comparison between Twins", subtitle = "by zygosity", x = "Height of Twin 1 (m)", y = "Height of Twin 2 (m)", * color = "Zygosity") ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-17-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .midi[ > Start with the `twinData_cleaned` data frame, > map height of Twin 1 to the x-axis, > and map height of Twin 2 to the y-axis. > Represent each observation with a point, > and map zygosity to the color of each point. 
> Title the plot "Height Comparison between Twins", > add the subtitle "by zygosity" > label the x and y axes as "Height of Twin 1 (m)" and "Height of Twin 2 (m)", respectively, > label the legend "Zygosity", > **and add a caption for the data source.** ] .pull-left[ ``` r ggplot(data = twinData_cleaned, mapping = aes( x = height_t1, y = height_t2, color = zyg)) + geom_point() + labs(title = "Height Comparison between Twins", subtitle = "by zygosity", x = "Height of Twin 1 (m)", y = "Height of Twin 2 (m)", color = "Zygosity", * caption = "Source: Australian National Health and Medical Research Council Twin Registry / OpenMx package") ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-18-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .midi[ > Start with the `twinData_cleaned` data frame, > map height of Twin 1 to the x-axis, > and map height of Twin 2 to the y-axis. > Represent each observation with a point, > and map zygosity to the color of each point. > Title the plot "Height Comparison between Twins", > add the subtitle "by zygosity" > label the x and y axes as "Height of Twin 1 (m)" and "Height of Twin 2 (m)", respectively, > label the legend "Zygosity", > and add a caption for the data source. 
> **Finally, use a discrete color scale that is designed to be perceived by viewers with common forms of color blindness.** ] .pull-left[ ``` r ggplot(data = twinData_cleaned, mapping = aes( x = height_t1, y = height_t2, color = zyg)) + geom_point() + labs(title = "Height Comparison between Twins", subtitle = "by zygosity", x = "Height of Twin 1 (m)", y = "Height of Twin 2 (m)", color = "Zygosity", caption = "Source: Australian National Health and Medical Research Council Twin Registry / OpenMx package") + * scale_color_viridis_d() ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-19-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Plot <img src="00_ggplot2_files/figure-html/unnamed-chunk-20-1.png" width="70%" style="display: block; margin: auto;" /> --- # Code ``` r ggplot(data = twinData_cleaned, mapping = aes( x = height_t1, y = height_t2, color = zyg)) + geom_point() + labs(title = "Height Comparison between Twins", subtitle = "by zygosity", x = "Height of Twin 1 (m)", y = "Height of Twin 2 (m)", color = "Zygosity", caption = "Source: Australian National Health and Medical Research Council Twin Registry / OpenMx package") + scale_color_viridis_d() ``` --- # Narrative .midi[ + Start with the `twinData_cleaned` data frame, map height of Twin 1 to the x-axis, and map height of Twin 2 to the y-axis. + Represent each observation with a point, and map zygosity to the color of each point. + Title the plot "Height Comparison between Twins", add the subtitle "by zygosity", label the x and y axes as "Height of Twin 1 (m)" and "Height of Twin 2 (m)", respectively, label the legend "Zygosity", and add a caption for the data source. + Finally, use a discrete color scale that is designed to be perceived by viewers with common forms of color blindness. ] --- ## Argument names .tip[ You can omit the names of the first two arguments when building plots with `ggplot()`. 
] .pull-left[ ``` r ggplot(data = twinData_cleaned, mapping = aes(x = height_t1, y = height_t2, color = zyg)) + geom_point() + scale_color_viridis_d() ``` ] .pull-right[ ``` r ggplot(twinData_cleaned, aes(x = height_t1, y = height_t2, color = zyg)) + geom_point() + scale_color_viridis_d() ``` ] --- class: middle # Wrapping Up... --- class: middle # Aesthetics --- ## Aesthetics options Commonly used characteristics of plotting characters that can be **mapped to a specific variable** in the data are - `color` - `shape` - `size` - `alpha` (transparency) --- ## Color .pull-left[ ``` r ggplot(twinData_cleaned, aes(x = height_t1, y = height_t2, * color = zyg)) + geom_point() + scale_color_viridis_d() ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Shape Mapped to a different variable than `color` .pull-left[ ``` r ggplot(twinData_cleaned, aes(x = height_t1, y = height_t2, color = zyg, * shape = cohort)) + geom_point() + scale_color_viridis_d() ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-22-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Shape Mapped to same variable as `color` .pull-left[ ``` r ggplot(twinData_cleaned, aes(x = height_t1, y = height_t2, color = zyg, * shape = zyg)) + geom_point() + scale_color_viridis_d() ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-23-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Size .pull-left[ ``` r ggplot(twinData_cleaned, aes(x = height_t1, y = height_t2, color = zyg, shape = zyg, * size = age)) + geom_point() + scale_color_viridis_d() ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Alpha .pull-left[ ``` r ggplot(twinData_cleaned, aes(x = height_t1, y = height_t2, color = zyg, shape = zyg, size = age, * alpha = family)) + geom_point() + 
scale_color_viridis_d() ``` ] .pull-right[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-25-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .pull-left[ **Mapping** ``` r ggplot(twinData_cleaned, aes(x = height_t1, y = height_t2, * size = age, * alpha = family)) + geom_point() ``` <img src="00_ggplot2_files/figure-html/unnamed-chunk-26-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ **Setting** ``` r ggplot(twinData_cleaned, aes(x = height_t1, y = height_t2)) + * geom_point(size = 2, alpha = 0.5) ``` <img src="00_ggplot2_files/figure-html/unnamed-chunk-27-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Mapping vs. setting - **Mapping:** Determine the size, alpha, etc. of points based on the values of a variable in the data - goes into `aes()` - **Setting:** Determine the size, alpha, etc. of points **not** based on the values of a variable in the data - goes into `geom_*()` - (in the previous example, we used `geom_point()`, - but we'll learn about other geoms soon!) --- class: middle # Faceting --- ## Faceting - Smaller plots that display different subsets of the data - Useful for exploring conditional relationships and large data --- ### Plot <img src="00_ggplot2_files/figure-html/unnamed-chunk-28-1.png" width="70%" style="display: block; margin: auto;" /> --- ### Code ``` r ggplot(twinData_cleaned, aes(x = height_t1, y = height_t2)) + geom_point() + * facet_grid(cohort ~ zyg) ``` --- ## Various ways to facet In the next few slides, describe what each plot displays. Think about how the code relates to the output. .question[ **Note:** The plots in the next few slides do not have proper titles, axis labels, etc. because we want you to figure out what's happening in the plots. But you should always label your plots! 
] --- ``` r ggplot(twinData_cleaned, aes(x = height_t1, y = height_t2)) + geom_point() + * facet_grid(cohort ~ sex) ``` <img src="00_ggplot2_files/figure-html/unnamed-chunk-29-1.png" width="60%" style="display: block; margin: auto;" /> --- ``` r ggplot(twinData_cleaned, aes(x = height_t1, y = height_t2)) + geom_point() + * facet_grid(sex ~ cohort) ``` <img src="00_ggplot2_files/figure-html/unnamed-chunk-30-1.png" width="60%" style="display: block; margin: auto;" /> --- ``` r ggplot(twinData_cleaned, aes(x = height_t1, y = height_t2)) + geom_point() + * facet_wrap(~ cohort) ``` <img src="00_ggplot2_files/figure-html/unnamed-chunk-31-1.png" width="60%" style="display: block; margin: auto;" /> --- ``` r ggplot(twinData_cleaned, aes(x = height_t1, y = height_t2)) + geom_point() + * facet_grid(. ~ cohort) ``` <img src="00_ggplot2_files/figure-html/unnamed-chunk-32-1.png" width="60%" style="display: block; margin: auto;" /> --- ``` r ggplot(twinData_cleaned, aes(x = height_t1, y = height_t2)) + geom_point() + * facet_wrap(~ cohort, ncol = 2) ``` <img src="00_ggplot2_files/figure-html/unnamed-chunk-33-1.png" width="60%" style="display: block; margin: auto;" /> --- ## Faceting summary - `facet_grid()`: - 2d grid - `rows ~ cols` - use `.` for no split - `facet_wrap()`: 1d ribbon wrapped according to number of rows and columns specified or available plotting area --- ## Facet and color .pull-left-narrow[ ``` r ggplot( twinData_cleaned, aes(x = height_t1, y = height_t2, * color = zyg)) + geom_point() + facet_grid(cohort ~ sex) + * scale_color_viridis_d() ``` ] .pull-right-wide[ <img src="00_ggplot2_files/figure-html/unnamed-chunk-34-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Facet and color, no legend .pull-left-narrow[ ``` r ggplot( twinData_cleaned, aes(x = height_t1, y = height_t2, color = zyg)) + geom_point() + facet_grid(cohort ~ sex) + scale_color_viridis_d() + * guides(color = "none") ``` ] .pull-right-wide[ <img 
src="00_ggplot2_files/figure-html/unnamed-chunk-35-1.png" width="100%" style="display: block; margin: auto;" /> ] --- class: middle # Wrapping Up...
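---

## Sketch: mapping, setting, and faceting side by side

To try the mapping vs. setting distinction and the faceting functions without `twinData_cleaned`, the chunk below may help. It is only a sketch: `toy_twins` is a hypothetical, simulated data frame that reuses the column names `height_t1`, `height_t2`, `age`, `cohort`, and `sex` from the slides, and the factor levels are assumptions.

```r
library(ggplot2)

# Hypothetical stand-in for twinData_cleaned with simulated values.
set.seed(2)
n <- 120
toy_twins <- data.frame(
  height_t1 = rnorm(n, mean = 1.68, sd = 0.08),
  age = sample(18:30, n, replace = TRUE),
  cohort = factor(sample(c("younger", "older"), n, replace = TRUE)),
  sex = factor(sample(c("female", "male"), n, replace = TRUE))
)
toy_twins$height_t2 <- toy_twins$height_t1 + rnorm(n, sd = 0.04)

# Mapping: size sits inside aes(), so it varies with the age variable.
p_mapped <- ggplot(toy_twins,
                   aes(x = height_t1, y = height_t2, size = age)) +
  geom_point()

# Setting: size and alpha sit inside geom_point(), so every point gets
# the same fixed values regardless of the data.
p_set <- ggplot(toy_twins, aes(x = height_t1, y = height_t2)) +
  geom_point(size = 2, alpha = 0.5)

# Faceting: a 2d grid with cohort in the rows and sex in the columns.
p_grid <- p_set + facet_grid(cohort ~ sex)

# Faceting: a 1d ribbon of panels, wrapped into two columns.
p_wrap <- p_set + facet_wrap(~ cohort, ncol = 2)

p_grid
```

Printing `p_mapped`, `p_set`, or `p_wrap` instead of `p_grid` shows the other variants for comparison.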