Just how related are Jon Snow and Daenerys Targaryen? The Game of
Thrones canon gives us some clues, but a formal pedigree-based approach
lets us put a number on it. This vignette demonstrates how to compute
coefficients of relatedness using the BGmisc package, along with basic
data manipulation from tidyverse. We will also handle incomplete
parental information programmatically and generate a plot of the
reconstructed pedigree.
Load Packages and Data
We begin by loading the necessary packages and accessing the built-in
ASOIAF pedigree dataset included with BGmisc.
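The loading chunk itself is not shown here. A minimal sketch of what it might look like, assuming both packages are installed and that the dataset is exposed under the name ASOIAF (as the text states):

```r
# Load the pedigree tools and the data-manipulation helpers
library(BGmisc)
library(tidyverse)

# Make the built-in pedigree available in the workspace
# (assumption: the dataset is shipped with BGmisc as "ASOIAF")
data("ASOIAF")
```

The explicit data() call is harmless if the dataset is already lazy-loaded with the package.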
The ASOIAF data contains character IDs, family identifiers, and parent-child links extracted from A Song of Ice and Fire lore.
head(ASOIAF)
## id famID momID dadID name sex
## 1 1 1 NA NA Walder Frey M
## 2 2 1 NA NA Perra Royce F
## 3 3 1 2 1 Stevron Frey M
## 4 4 1 2 1 Emmon Frey M
## 5 5 1 2 1 Aenys Frey M
## 6 6 1 NA NA Corenna Swann F
Prepare and Validate Sex Codes
We use checkSex() to ensure that all individuals have valid sex codes,
repairing as needed. This is important for correct pedigree plotting and
downstream calculations.
df_got <- checkSex(ASOIAF,
  code_male = 1,
  code_female = 0,
  verbose = FALSE, repair = TRUE
)
Compute Relatedness Matrices
We now compute the additive genetic relatedness matrix (add) and the
common nuclear relatedness matrix (cn) from the pedigree using ped2com()
and ped2cn(), respectively. The isChild_method argument specifies how to
identify child-parent relationships; we use "partialparent" to account
for missing parent information. The adjacency_method argument specifies
how to construct the adjacency matrix; we use "direct" for the additive
matrix and "indexed" for the common nuclear matrix. The direct method is
much faster. The sparse argument is set to FALSE to return dense
matrices.
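The chunk that performs this step is not shown. The following is a sketch consistent with the description, not necessarily the exact code used. It rebuilds df_got so that it runs on its own, and it assumes that ped2com() selects the additive component through component = "additive", which the text does not state.

```r
library(BGmisc)

# Repeated from the previous step so the sketch is self-contained
df_got <- checkSex(ASOIAF,
  code_male = 1, code_female = 0,
  verbose = FALSE, repair = TRUE
)

# Additive genetic relatedness (component = "additive" is an assumption)
add <- ped2com(df_got,
  component = "additive",
  isChild_method = "partialparent",
  adjacency_method = "direct",
  sparse = FALSE
)

# Common nuclear relatedness
cn <- ped2cn(df_got,
  isChild_method = "partialparent",
  adjacency_method = "indexed",
  sparse = FALSE
)
```

Both results should be square matrices with one row and one column per individual in the pedigree.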
Convert to Pairwise Format
We convert the component matrices into a long-format table of pairwise
relationships using com2links(). Each row of the resulting data frame
represents a pair of individuals and their relatedness. The function can
return the entire matrix or just the lower triangular part, which is
sufficient here because relatedness is symmetric. We set
writetodisk = FALSE to keep the data in memory.
df_links <- com2links(
  writetodisk = FALSE,
  ad_ped_matrix = add, cn_ped_matrix = cn,
  drop_upper_triangular = TRUE
) # %>%
# filter(ID1 != ID2)
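To make the long format concrete, here is a base R sketch on a hypothetical 3 x 3 relatedness matrix. It is not how com2links() is implemented; it only mimics the shape of the result by keeping one triangle of a symmetric matrix and writing each retained cell as one row. The matrix values and the object names rel and pairs are made up for illustration.

```r
# Hypothetical symmetric relatedness matrix for three individuals
rel <- matrix(
  c(1.00, 0.50, 0.25,
    0.50, 1.00, 0.50,
    0.25, 0.50, 1.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "3"), c("1", "2", "3"))
)

# Keep the lower triangle (diagonal included); use diag = FALSE to drop self-pairs
keep <- lower.tri(rel, diag = TRUE)
idx <- which(keep, arr.ind = TRUE)

# One row per retained cell: the column ID, the row ID, and the coefficient
pairs <- data.frame(
  ID1 = colnames(rel)[idx[, "col"]],
  ID2 = rownames(rel)[idx[, "row"]],
  addRel = rel[keep]
)
pairs
```

Because the matrix is symmetric, the six retained rows carry the same information as all nine cells.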
Locate Jon and Daenerys
Next, we extract the IDs corresponding to Jon Snow and Daenerys
Targaryen. We use filter() to find the rows in df_got whose name matches
each character, and pull() to extract the matching ID values.
# Find the IDs of Jon Snow and Daenerys Targaryen
jon_id <- df_got %>%
  filter(name == "Jon Snow") %>%
  pull(ID)
dany_id <- df_got %>%
  filter(name == "Daenerys Targaryen") %>%
  pull(ID)
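We then filter the pairwise table df_links, first keeping the rows where either ID1 or ID2 corresponds to Jon Snow, and then keeping those where either ID corresponds to Daenerys Targaryen.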
jon_dany_row <- df_links %>%
  filter(ID1 == jon_id | ID2 == jon_id) %>%
  filter(ID1 %in% dany_id | ID2 %in% dany_id)
jon_dany_row
## ID1 ID2 addRel cnuRel
## 1 206 211 0.31274414 0
## 2 211 304 0.01953125 0
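The first row contains the additive relatedness coefficient between Jon and Daenerys, which tells us how closely related they are genetically. The second row also involves ID 211, which suggests that one of the two names matches more than one ID in the dataset. We would expect a value of 0.25 for an aunt-nephew relationship, which is what Jon and Daenerys are in the show. However, the value is 0.3127441, indicating a more complex relationship, plausibly because of marriages between close relatives among their shared ancestors.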
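The 0.25 benchmark can be checked by hand. The sketch below applies the standard tabular method to a hypothetical six-person pedigree, then repeats the calculation with the grandparents recoded as full siblings to show how a marriage between relatives raises the coefficient. The helper additive_matrix() and all IDs are made up for illustration; this is neither BGmisc code nor the ASOIAF pedigree, so it does not reproduce 0.3127441.

```r
# Tabular method; parents must be listed before their children
additive_matrix <- function(ped) {
  n <- nrow(ped)
  A <- matrix(0, n, n, dimnames = list(ped$id, ped$id))
  for (i in seq_len(n)) {
    mom <- match(ped$mom[i], ped$id) # NA when the mother is unknown
    dad <- match(ped$dad[i], ped$id) # NA when the father is unknown
    for (j in seq_len(i - 1)) {
      # Relatedness to an earlier individual is the average over both parents
      from_mom <- if (is.na(mom)) 0 else A[j, mom]
      from_dad <- if (is.na(dad)) 0 else A[j, dad]
      A[i, j] <- A[j, i] <- 0.5 * (from_mom + from_dad)
    }
    # Diagonal is 1 plus half the relatedness between the parents
    inbreeding <- if (is.na(mom) || is.na(dad)) 0 else 0.5 * A[mom, dad]
    A[i, i] <- 1 + inbreeding
  }
  A
}

# Unrelated grandparents: a textbook aunt-nephew pair
ped <- data.frame(
  id  = c("grandfather", "grandmother", "parent", "aunt", "spouse", "nephew"),
  mom = c(NA, NA, "grandmother", "grandmother", NA, "spouse"),
  dad = c(NA, NA, "grandfather", "grandfather", NA, "parent")
)
A_plain <- additive_matrix(ped)
A_plain["aunt", "nephew"] # 0.25

# Same family, but the grandparents are now full siblings
inbred <- ped
sibs <- inbred$id %in% c("grandfather", "grandmother")
inbred$mom[sibs] <- "gg_mother"
inbred$dad[sibs] <- "gg_father"
inbred <- rbind(
  data.frame(id = c("gg_father", "gg_mother"),
             mom = NA_character_, dad = NA_character_),
  inbred
)
A_inbred <- additive_matrix(inbred)
A_inbred["aunt", "nephew"] # 0.375
```

A single sibling marriage one generation up is enough to move the coefficient from 0.25 to 0.375, so values above 0.25 are not surprising in a pedigree with repeated close-kin marriages.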
Plotting the Pedigree with Incomplete Parental Information
To facilitate plotting, we check for individuals with one known parent and one missing parent. For those cases, we assign a placeholder ID to the missing parent.
df_repaired <- checkParentIDs(df_got,
  addphantoms = TRUE,
  repair = TRUE,
  parentswithoutrow = FALSE,
  repairsex = FALSE
) %>% mutate(
  fam = 1,
  affected = case_when(
    ID %in% c(jon_id, dany_id) ~ 1,
    TRUE ~ 0
  )
)
## REPAIR IN EARLY ALPHA
This code creates new IDs for individuals with one known parent and one
missing parent. It checks whether either momID or dadID is missing and,
if so, assigns the missing parent a new ID based on the row number. This
allows us to visualize the pedigree even when some parental information
is incomplete.
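The idea can be sketched in base R on a hypothetical four-person pedigree. This is an illustration of the placeholder logic described above, not the implementation inside checkParentIDs(); the object names and the offset used to build new IDs are assumptions.

```r
# Hypothetical pedigree: person 4 has a known mother but no recorded father
toy <- data.frame(
  ID    = c(1, 2, 3, 4),
  momID = c(NA, NA, 2, 2),
  dadID = c(NA, NA, 1, NA)
)

# Exactly one parent missing (founders with both missing are left alone)
one_missing <- xor(is.na(toy$momID), is.na(toy$dadID))
missing_dad <- one_missing & is.na(toy$dadID)
missing_mom <- one_missing & is.na(toy$momID)

# New IDs are derived from the row number, shifted past the largest existing ID
offset <- max(toy$ID)
toy$dadID[missing_dad] <- offset + which(missing_dad)
toy$momID[missing_mom] <- offset + which(missing_mom)

# Add a row for each placeholder parent so every parent ID has a record
phantoms <- data.frame(
  ID = offset + which(one_missing),
  momID = NA,
  dadID = NA
)
repaired <- rbind(toy, phantoms)
repaired
```

In this toy case, person 4 receives father ID 8, and a founder row with ID 8 is appended to the pedigree.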
Visualize the Pedigree
# fixParents(id = df_got$ID, dadid = df_got$dadID, momid = df_got$momID, sex = df_got$sex, missid = NA)
plotPedigree(df_repaired, affected = df_repaired$affected, verbose = FALSE)
## Did not plot the following people: 85 88 125 142 228 229 258 259 274 275 305 336 357 381 388 405 409 418 420 424 428 451 487
## named list()