
Just how related are Jon Snow and Daenerys Targaryen? The Game of Thrones canon gives us some clues, but a formal pedigree-based approach lets us put a number on it. This vignette demonstrates how to compute coefficients of relatedness using the BGmisc package, along with basic data manipulation from tidyverse. We will also handle incomplete parental information programmatically and generate a plot of the reconstructed pedigree.

Load Packages and Data

We begin by loading the necessary packages and accessing the built-in ASOIAF pedigree dataset included with BGmisc.

The ASOIAF data contains character IDs, family identifiers, and parent-child links extracted from A Song of Ice and Fire lore.
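
The loading chunk itself is not shown on this page. A minimal sketch of what it might look like, assuming BGmisc and tidyverse are installed (the exact calls are an assumption, not taken from the original):

```r
# Assumed setup; the original loading chunk is not shown on this page.
library(BGmisc)    # pedigree functions and the ASOIAF dataset
library(tidyverse) # provides %>%, filter(), pull(), mutate(), case_when()

# Make the built-in pedigree available in the workspace
data("ASOIAF")
```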

head(ASOIAF)
##   id famID momID dadID          name sex
## 1  1     1    NA    NA   Walder Frey   M
## 2  2     1    NA    NA   Perra Royce   F
## 3  3     1     2     1  Stevron Frey   M
## 4  4     1     2     1    Emmon Frey   M
## 5  5     1     2     1    Aenys Frey   M
## 6  6     1    NA    NA Corenna Swann   F

Prepare and Validate Sex Codes

We use checkSex() to ensure that all individuals have valid sex codes, repairing as needed. This is important for correct pedigree plotting and downstream calculations.

df_got <- checkSex(ASOIAF,
  code_male = 1,
  code_female = 0,
  verbose = FALSE, repair = TRUE
)

Compute Relatedness Matrices

We now compute the additive genetic relatedness matrix (add) and the common nuclear relatedness matrix (cn) from the pedigree using ped2com() and ped2cn(), respectively. The isChild_method argument specifies how to identify child-parent relationships. We use “partialparent” to account for missing parent information. The adjacency_method argument specifies how to construct the adjacency matrix. We use “direct” for the additive matrix and “indexed” for the common nuclear matrix. The direct method is much faster. The sparse argument is set to FALSE to return dense matrices.

add <- ped2com(df_got,
  isChild_method = "partialparent",
  component = "additive",
  adjacency_method = "direct",
  sparse = FALSE
)

cn <- ped2cn(df_got,
  isChild_method = "partialparent",
  adjacency_method = "indexed",
  sparse = FALSE
)

Convert to Pairwise Format

We convert the component matrices into a long-format table of pairwise relationships using com2links(). This gives us a long dataframe where each row represents a pair of individuals and their relatedness. The function can return the entire matrix or just the lower triangular part, which is often sufficient for our purposes. We set writetodisk = FALSE to keep the data in memory.

df_links <- com2links(
  writetodisk = FALSE,
  ad_ped_matrix = add, cn_ped_matrix = cn,
  drop_upper_triangular = TRUE
)# %>%
#  filter(ID1 != ID2)
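
To make the long format concrete, here is a base R sketch of the same idea on a toy matrix. It is not how com2links() is implemented; the three IDs and their values are made up, and the column names simply mirror the output shown later in this vignette.

```r
# Toy symmetric relatedness matrix for three hypothetical individuals
ids <- c("101", "102", "103")
toy <- matrix(
  c(1.00, 0.50, 0.25,
    0.50, 1.00, 0.50,
    0.25, 0.50, 1.00),
  nrow = 3, dimnames = list(ids, ids)
)

# Row/column positions of the lower triangle, diagonal included
keep <- which(lower.tri(toy, diag = TRUE), arr.ind = TRUE)

# One row per pair; the symmetric upper triangle is dropped
links <- data.frame(
  ID1 = colnames(toy)[keep[, "col"]],
  ID2 = rownames(toy)[keep[, "row"]],
  addRel = toy[keep]
)

links
```

Self-pairs are kept in this toy version. Because the matrix is symmetric, the lower triangle already holds every distinct pair, which is why dropping the upper triangle loses no information.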

Locate Jon and Daenerys

Next, we extract the IDs corresponding to Jon Snow and Daenerys Targaryen by filtering df_got on the name column and pulling the ID column.

# Find the IDs of Jon Snow and Daenerys Targaryen

jon_id <- df_got %>%
  filter(name == "Jon Snow") %>%
  pull(ID)

dany_id <- df_got %>%
  filter(name == "Daenerys Targaryen") %>%
  pull(ID)

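We then filter the pairwise table df_links to retrieve their relationship: first to the rows where either ID1 or ID2 is Jon's ID, then to the rows where either ID is one of the IDs matched for Daenerys.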

jon_dany_row <- df_links %>%
  filter(ID1 == jon_id | ID2 == jon_id) %>%
  filter(ID1 %in% dany_id | ID2 %in% dany_id)

jon_dany_row
##   ID1 ID2     addRel cnuRel
## 1 206 211 0.31274414      0
## 2 211 304 0.01953125      0

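The filter returns two rows, which suggests that more than one individual in the dataset is named Daenerys Targaryen. The first row, with addRel = 0.3127441, is the pair discussed here: it contains the additive relatedness coefficient between Jon and Daenerys, which allows us to assess how closely related they are genetically. We’d expect to see a value of 0.25 for an aunt-nephew relationship, which is what Jon and Daenerys are in the show. The observed value is higher, indicating that they share ancestry through more than one path.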
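
The 0.25 baseline, and the way extra shared ancestry raises it, can be checked on a toy pedigree. The sketch below uses the textbook tabular method in base R; it is not the algorithm ped2com() uses, and the six-person pedigree and the helper name additive_matrix() are hypothetical. It assumes IDs equal row numbers and that parents are listed before their children.

```r
# Hypothetical toy pedigree: 1 and 2 are grandparents, 3 is the aunt,
# 4 is the father, 5 is the mother, 6 is the nephew.
ped <- data.frame(
  id  = 1:6,
  mom = c(NA, NA, 1, 1, NA, 5),
  dad = c(NA, NA, 2, 2, NA, 4)
)

# Tabular method: fill the matrix one individual at a time
additive_matrix <- function(ped) {
  n <- nrow(ped)
  A <- matrix(0, n, n, dimnames = list(ped$id, ped$id))
  for (i in seq_len(n)) {
    m <- ped$mom[i]
    d <- ped$dad[i]
    # Relatedness of i to each earlier individual j is the average of
    # j's relatedness to i's known parents
    for (j in seq_len(i - 1)) {
      from_mom <- if (is.na(m)) 0 else A[j, m]
      from_dad <- if (is.na(d)) 0 else A[j, d]
      A[i, j] <- A[j, i] <- 0.5 * (from_mom + from_dad)
    }
    # Diagonal is 1 plus half the relatedness between i's parents
    inbreeding <- if (is.na(m) || is.na(d)) 0 else 0.5 * A[m, d]
    A[i, i] <- 1 + inbreeding
  }
  A
}

# Plain aunt-nephew: the mother (5) is unrelated to the family
A_plain <- additive_matrix(ped)
A_plain["3", "6"] # 0.25

# Sibling marriage: the mother (5) is now also a child of 1 and 2
ped_inbred <- ped
ped_inbred$mom[5] <- 1
ped_inbred$dad[5] <- 2
A_inbred <- additive_matrix(ped_inbred)
A_inbred["3", "6"] # 0.5
```

In the second pedigree the aunt is related to the nephew through both of his parents, so the coefficient doubles. The same mechanism, spread over many more generations, would account for a value above 0.25.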

Plotting the Pedigree with Incomplete Parental Information

To facilitate plotting, we check for individuals with one known parent but a missing other. For those cases, we assign a placeholder ID to the missing parent.

df_repaired <- checkParentIDs(df_got,
  addphantoms = TRUE,
  repair = TRUE,
  parentswithoutrow = FALSE,
  repairsex = FALSE
) %>%
  mutate(
    fam = 1,
    affected = case_when(
      ID %in% c(jon_id, dany_id) ~ 1,
      TRUE ~ 0
    )
  )
## REPAIR IN EARLY ALPHA

This code creates placeholder parents for individuals with one known parent and one missing parent. It checks whether either momID or dadID is missing and, if so, assigns a new ID based on the row number. The mutate() call then adds a constant fam column and an affected flag that marks Jon and Daenerys. This allows us to visualize the pedigree even when some parental information is incomplete.
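
As a rough illustration of the placeholder idea, here is a base R sketch on a toy pedigree. The helper add_placeholder_parents() is hypothetical and not part of BGmisc, and it numbers new parents from max(ID) + 1 rather than from the row number, so its IDs will not match what checkParentIDs() produces.

```r
# Toy pedigree: individual 3 has a known mother but no recorded father
ped <- data.frame(
  ID    = c(1, 2, 3),
  momID = c(NA_real_, NA_real_, 1),
  dadID = c(NA_real_, NA_real_, NA_real_),
  sex   = c("F", "M", "M")
)

# Hypothetical helper: add a placeholder row for the one missing parent
add_placeholder_parents <- function(ped) {
  next_id <- max(ped$ID) + 1
  # The loop covers the original rows only; rows appended below are founders
  for (i in seq_len(nrow(ped))) {
    missing_mom <- is.na(ped$momID[i])
    missing_dad <- is.na(ped$dadID[i])
    # Act only when exactly one parent is known
    if (xor(missing_mom, missing_dad)) {
      new_row <- data.frame(
        ID = next_id, momID = NA_real_, dadID = NA_real_,
        sex = if (missing_mom) "F" else "M"
      )
      if (missing_mom) {
        ped$momID[i] <- next_id
      } else {
        ped$dadID[i] <- next_id
      }
      ped <- rbind(ped, new_row)
      next_id <- next_id + 1
    }
  }
  ped
}

repaired <- add_placeholder_parents(ped)
repaired
```

Founders with both parents missing are left alone, so only half-specified families gain a row.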

Visualize the Pedigree

#fixParents(id=df_got$ID, dadid=df_got$dadID, momid=df_got$momID, sex=df_got$sex, missid = NA)

plotPedigree(df_repaired, affected = df_repaired$affected, verbose = FALSE)

## Did not plot the following people: 85 88 125 142 228 229 258 259 274 275 305 336 357 381 388 405 409 418 420 424 428 451 487
## named list()