Introduction
This vignette walks through the steps needed to recreate a figure from Garrison and Rodgers (2016). The example uses kinship pair data prepared with the discord package and visualizes the results using updated ggplot2 tools.
Data Preparation
Data Cleaning
This section reuses the data preparation pipeline developed in the regression vignette.
That vignette demonstrated how to set up data for discordant regression analysis by using discord data processing tools. Those tools facilitate the construction of kinship links, including identifying sibling pairs, merging sibling characteristics, and calculating pair-level variables.
Here, we reuse that same pipeline to prepare the data for plotting. Specifically, we apply the same kinship pairing, data merging, and cleaning procedures, but our focus is now on visualizing patterns rather than fitting regression models.
The underlying dataset is the NLSY79, which includes measures of flu vaccination and socioeconomic status (SES) for kinship pairs. As in the regression vignette, we restrict the sample to pairs who are housemates and have a relatedness coefficient of 0.5.
For full details on the data processing and kinship link construction, see the regression vignette.
## Setup: Use discord_data
# Visualizing the Results
library(discord)
library(NlsyLinks)
library(tidyverse)
library(ggplot2)
library(grid)
library(gridExtra)
library(janitor)
# Load the data
data(data_flu_ses)
# Get kinship links for individuals with the following variables:
link_vars <- c(
"FLU_total", "FLU_2008", "FLU_2010",
"FLU_2012", "FLU_2014", "FLU_2016",
"S00_H40", "RACE", "SEX"
)
link_pairs <- Links79PairExpanded %>%
filter(RelationshipPath == "Gen1Housemates" & RFull == 0.5)
df_link <- CreatePairLinksSingleEntered(
outcomeDataset = data_flu_ses,
linksPairDataset = link_pairs,
outcomeNames = link_vars
)
consistent_kin <- df_link %>%
group_by(SubjectTag_S1, SubjectTag_S2) %>%
count(
FLU_2008_S1, FLU_2010_S1,
FLU_2012_S1, FLU_2014_S1,
FLU_2016_S1, FLU_2008_S2,
FLU_2010_S2, FLU_2012_S2,
FLU_2014_S2, FLU_2016_S2
) %>%
na.omit()
# Create the flu_modeling_data object with only consistent responders.
# Clean the column names with the {janitor} package.
flu_modeling_data <- semi_join(df_link,
consistent_kin,
by = c(
"SubjectTag_S1",
"SubjectTag_S2"
)
) %>%
clean_names()

Creating the Discord Data
With the data prepared, we restructure it using
discord_data().
discord_flu <- discord_data(
data = flu_modeling_data,
outcome = "flu_total",
predictors = "s00_h40",
id = "extended_id",
sex = "sex",
race = "race",
pair_identifiers = c("_s1", "_s2"),
demographics = "both"
) %>%
filter(!is.na(s00_h40_mean), !is.na(flu_total_mean))

Because we are interested in differences between kin, we create a new
variable, ses_diff_group, that classifies SES differences
into three categories: “More Advantage”, “Equally Advantage”, and “Less
Advantage”. This variable is later used to group observations in the
marginal density plots.
discord_flu <- discord_flu %>%
mutate(
# Classify the standardized SES difference into three groups (cutoffs at ±0.33 SD)
ses_diff_group = factor(
case_when(
scale(s00_h40_diff) > 0.33 ~ "More Advantage",
scale(s00_h40_diff) < -0.33 ~ "Less Advantage",
abs(scale(s00_h40_diff)) <= 0.33 ~ "Equally Advantage"
),
levels = c(
"Less Advantage",
"Equally Advantage",
"More Advantage"
)
)
)

Setting Up Color Palette
To visually distinguish sibling differences in SES, we create a color palette that transitions from red (indicating lower SES) to blue (indicating higher SES). This palette is used in the scatter plots to color points by the SES difference between siblings, which is the predictor variable in our discord data.
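The chunk that defines the palette is not shown here, yet the plotting code that follows relies on two objects, `shading` and `max_val`. A minimal sketch of what they might look like is given below. Both definitions are assumptions rather than the original values: `shading` is guessed as a three-stop red-purple-blue ramp that reuses the colors from the marginal density plots, and `max_val` is assumed to be the largest absolute SES difference. `toy_diff` is a made-up vector standing in for `discord_flu$s00_h40_diff` so that the block runs on its own.

```r
library(scales)

# Assumed palette: red (lower SES) through purple to blue (higher SES).
shading <- c("firebrick1", "#AD78B6", "dodgerblue1")

# Toy SES differences standing in for discord_flu$s00_h40_diff (hypothetical values).
toy_diff <- c(-2.5, -0.4, 0, 1.1, NA)

# Assumed definition: the largest absolute difference, ignoring missing values.
max_val <- max(abs(toy_diff), na.rm = TRUE)

# rescale() maps the two endpoints onto 0 and 1, the positions at which
# scale_colour_gradientn() anchors the ends of the color ramp.
rescale(c(-max_val, max_val))
```

In the vignette's pipeline, `max_val` would instead be computed from `discord_flu$s00_h40_diff`.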
Plotting the Results
Individual Level Plot
This plot shows individual-level data rather than sibling-pair means or differences. It provides context for the relationship between SES and flu vaccinations before we examine sibling differences.
This scatter plot shows individual SES at age 40 against individual flu vaccinations. Point color indicates the SES difference between siblings.
The first step is to create the base plot with sibling 1 data. In the next code block, we add sibling 2 data to the plot.
# Individual level plot
individual_plot <- individual_plot_sib1 <- ggplot(flu_modeling_data, aes(
x = s00_h40_s1,
y = flu_total_s1,
color = s00_h40_s1 - s00_h40_s2
)) +
geom_point(
size = 0.8, alpha = 0.8, na.rm = TRUE,
position = position_jitter(width = 0.2, height = 0.2)
) +
geom_smooth(method = "lm", se = FALSE, color = "black")

Before adding sibling 2, we can view the plot with sibling 1 only.
individual_plot_sib1 +
scale_colour_gradientn(
name = "Sibling\nDifferences\nin SES",
colours = shading,
na.value = "#AD78B6",
values = scales::rescale(c(-max_val, max_val))
) +
labs(
x = "SES at Age 40",
y = "Flu Vaccination Count"
) +
theme_minimal() +
ggtitle("Individual Level Plot: Sibling 1 Only") +
theme(plot.title = element_text(hjust = 0.5))
#> Warning: Removed 3 rows containing non-finite outside the scale range
#> (`stat_smooth()`).
Now, we add sibling 2 to the plot, using the same color scheme to indicate SES differences between siblings.
individual_plot <- individual_plot +
# Add sibling 2 to the plot
geom_point(
size = 0.8, alpha = 0.8, na.rm = TRUE,
position = position_jitter(width = 0.2, height = 0.2),
aes(
x = s00_h40_s2,
y = flu_total_s2,
# Reverse the sign of the difference so that sibling 2 points take the
# opposite end of the color gradient from sibling 1. This makes it clear
# which sibling a point represents and how the SES difference is encoded.
color = s00_h40_s2 - s00_h40_s1
)
) +
scale_colour_gradientn(
name = "Sibling\nDifferences\nin SES",
colours = shading,
na.value = "#AD78B6",
values = scales::rescale(c(-max_val, max_val))
) +
labs(
x = "SES at Age 40",
y = "Flu Vaccination Count"
) +
theme_minimal() #+
# theme(legend.position = "none")
individual_plot +
ggtitle("Individual Level Plot") +
theme(plot.title = element_text(hjust = 0.5))
#> Warning: Removed 3 rows containing non-finite outside the scale range
#> (`stat_smooth()`).
Between Family Plots
This section creates a between-family plot that visualizes mean SES at age 40 against mean flu vaccinations for sibling pairs. Points are colored based on the SES difference between siblings. Each point represents a sibling pair, with the x-axis showing the average SES of the pair and the y-axis showing the average flu vaccinations.
Additionally, marginal density plots can be added to show the distribution of mean SES and mean flu vaccinations for different SES difference groups.
Like the individual level plot, this between-family plot shows a
positive association between mean SES and mean flu vaccinations. Higher
average SES among sibling pairs is associated with higher average flu
vaccination rates.
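The chunk that builds the between-family scatter plot is not shown here, although the final layout refers to it as `main_plot`. The sketch below shows one way such a plot could be built, following the structure of the other scatter plots in this vignette. The helper `between_family_plot()`, the three-color `shading` vector, the axis labels, the legend position, and the simulated `toy_pairs` data are all assumptions made for illustration; only the column names (`s00_h40_mean`, `flu_total_mean`, `s00_h40_diff`) come from the discord data.

```r
library(ggplot2)

# Assumed palette: red (lower SES) through purple to blue (higher SES).
shading <- c("firebrick1", "#AD78B6", "dodgerblue1")

# Hypothetical helper: pair means on both axes, points colored by the
# SES difference within each pair, plus a linear trend line.
between_family_plot <- function(data, colours) {
  ggplot(data, aes(
    x = s00_h40_mean,
    y = flu_total_mean,
    colour = s00_h40_diff
  )) +
    geom_point(
      size = 0.8, alpha = 0.8, na.rm = TRUE,
      position = position_jitter(width = 0.2, height = 0.2)
    ) +
    geom_smooth(method = "lm", se = FALSE, color = "black") +
    scale_colour_gradientn(
      name = "Sibling\nDifferences\nin SES",
      colours = colours,
      na.value = "#AD78B6"
    ) +
    theme_minimal() +
    theme(legend.position = "left") + # assumed, to line up with xdensity
    labs(x = "Mean SES at Age 40", y = "Mean Flu Vaccinations")
}

# Simulated pair-level data standing in for discord_flu (hypothetical values).
set.seed(1)
toy_pairs <- data.frame(
  s00_h40_mean = rnorm(50, mean = 50, sd = 10),
  s00_h40_diff = rnorm(50, mean = 0, sd = 5)
)
toy_pairs$flu_total_mean <- pmin(5, pmax(
  0, 2 + 0.03 * (toy_pairs$s00_h40_mean - 50) + rnorm(50, sd = 0.5)
))

main_plot <- between_family_plot(toy_pairs, shading)
```

In the vignette's pipeline, the call would presumably be `between_family_plot(discord_flu, shading)` rather than the toy data used here.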
Adding Marginal Density Plots
To enhance the visualization, we can add marginal density plots to the scatter plot. These plots show the distribution of mean SES and mean flu vaccinations, grouped by the SES difference category.
# Marginal X density (SES mean)
xdensity <- ggplot(discord_flu, aes(x = s00_h40_mean, group = ses_diff_group, color = ses_diff_group)) +
geom_density(adjust = 2, linewidth = 1, fill = NA) +
scale_colour_manual(
name = "Sibling\nDifferences\nin SES",
values = c("firebrick1", "#AD78B6", "dodgerblue1")
) +
theme_minimal() +
theme(
legend.position = "left",
axis.title.y = element_blank(),
axis.text.y = element_blank(),
legend.title = element_text(size = 8),
legend.text = element_text(size = 6)
) +
labs(x = NULL, y = NULL)

And for the Y density plot:
# Marginal Y density (Flu mean)
ydensity <- ggplot(discord_flu, aes(x = flu_total_mean, group = ses_diff_group, color = ses_diff_group)) +
geom_density(adjust = 2, linewidth = 1, fill = NA) +
scale_colour_manual(
values = c("firebrick1", "#AD78B6", "dodgerblue1")
) +
coord_flip() +
theme_minimal() +
theme(
legend.position = "none",
axis.title.x = element_blank(),
axis.text.x = element_blank()
) +
labs(x = NULL, y = NULL)

To finalize the marginal plots, we add titles and adjust themes for better presentation.


Assembling the Final Plot
Finally, we arrange the main scatter plot and the marginal density
plots into a cohesive layout using
gridExtra::grid.arrange(). The x-density plot is placed
above the main scatter plot, and the y-density plot is placed to the
right of the main scatter plot. This leaves the top-right cell of the
layout empty, so we fill it with a blank placeholder plot.
# Blank placeholder plot
blankPlot <- ggplot() +
theme_void()
# Final layout
grid.arrange(
arrangeGrob(xdensity, blankPlot, ncol = 2, widths = c(4, 1)),
arrangeGrob(main_plot, ydensity, ncol = 2, widths = c(4, 1)),
heights = c(1.5, 4),
top = textGrob("Sibling Differences in SES and Flu Vaccinations",
gp = gpar(fontsize = 20, font = 3)
)
)
#> `geom_smooth()` using formula = 'y ~ x'
Within Family Plots
This plot compares differences in SES at age 40 to differences in flu vaccinations, with points colored by SES difference. Marginal plots are omitted for simplicity but can be added using the same structure.
# Within Family
## Setup: Use discord_data
# Main scatter plot
main_plot_within <- ggplot(discord_flu, aes(
x = s00_h40_diff,
y = flu_total_diff, colour = s00_h40_diff
)) +
geom_point(
size = 0.8, alpha = 0.8, na.rm = TRUE,
position = position_jitter(width = 0.2, height = 0.2)
) +
geom_smooth(method = "lm", se = FALSE, color = "black") +
scale_colour_gradientn(
name = "Sibling\nDifferences\nin SES",
colours = shading,
na.value = "#AD78B6",
values = scales::rescale(c(
min(discord_flu$s00_h40_diff, na.rm = TRUE),
mean(discord_flu$s00_h40_diff, na.rm = TRUE),
max(discord_flu$s00_h40_diff, na.rm = TRUE)
))
) +
theme_minimal() +
theme(legend.position = "left") +
labs(x = "Diff SES at Age 40", y = "Diff Flu Vaccinations (2006–2016)")
main_plot_within
You can further facet this plot by the difference in SES between kin to see how the relationship varies across different groups. The following code does this and adds a title to the plot.
main_plot_within + facet_wrap(~ses_diff_group, ncol = 1) +
theme(legend.position = "bottom") +
labs(title = "Within Family Differences in SES and Flu Vaccinations")
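As noted at the start of this section, marginal plots for the within-family view can be built with the same structure as the between-family ones. The sketch below shows one possible way to do that with a small helper. The function `marginal_density()` and the simulated `toy_diffs` data are hypothetical and introduced only for illustration; the grouping variable and colors follow the marginal density code shown earlier, and the group boundaries are a simplified version of the ±0.33 cutoffs.

```r
library(ggplot2)

# Hypothetical helper: density of one column, one line per SES difference group.
marginal_density <- function(data, var) {
  ggplot(data, aes(
    x = .data[[var]],
    group = ses_diff_group,
    color = ses_diff_group
  )) +
    geom_density(adjust = 2, linewidth = 1, fill = NA) +
    scale_colour_manual(
      values = c("firebrick1", "#AD78B6", "dodgerblue1")
    ) +
    theme_minimal() +
    theme(legend.position = "none") +
    labs(x = NULL, y = NULL)
}

# Simulated difference scores standing in for discord_flu (hypothetical values).
set.seed(2)
toy_diffs <- data.frame(
  s00_h40_diff = rnorm(60, sd = 5),
  flu_total_diff = rnorm(60)
)

# Group on the standardized SES difference, using cutoffs at -0.33 and 0.33.
toy_diffs$ses_diff_group <- cut(
  as.numeric(scale(toy_diffs$s00_h40_diff)),
  breaks = c(-Inf, -0.33, 0.33, Inf),
  labels = c("Less Advantage", "Equally Advantage", "More Advantage")
)

xdensity_within <- marginal_density(toy_diffs, "s00_h40_diff")
ydensity_within <- marginal_density(toy_diffs, "flu_total_diff") + coord_flip()
```

These two objects could then be arranged around `main_plot_within` with `grid.arrange()`, in the same way as the between-family layout.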
Conclusion
This vignette demonstrated how to visualize sibling differences in SES and flu vaccinations using discord-structured data. Scatter and density plots highlight associations by SES difference group, and coloring points by the SES difference between kin adds a further layer of insight into the data.

