Extended: Fitting Pedigree-Based Variance Component Models
Source:vignettes/articles/v61_pedigree_model_fitting.Rmd
v61_pedigree_model_fitting.RmdThis vignette extends the example from
vignette("v60_pedigree_model_fitting", package = "BGmisc")
to show how to fit models to multiple families simultaneously. The key
functions are buildOneFamilyGroup() and
buildPedigreeMx(), which translate pedigree data into
OpenMx model specifications.
Scaling Up to Many Families
Here we replicate several estimates of heritability across multiple
families of red squirrels. We use the redsquirrels_full
dataset from the ggpedigree package, which contains
pedigree and phenotypic data on red squirrels from the Kluane region of
the Yukon, Canada. The phenotype we analyze is lifetime reproductive
success (LRS), which is a count of the number of offspring that survive
to a certain age.
library(ggpedigree) # for pedigree data)
library(tidyverse)
data("redsquirrels_full")
ped_krsp <- redsquirrels_full |>
transmute(
ID = as.integer(personID),
momID = as.integer(momID),
dadID = as.integer(dadID),
sex = sex,
famID = as.integer(famID),
lrs = lrs
)
summarizeFamilies(ped_krsp, famID = "famID")$family_summary |> arrange(desc(count))
#> famID count lrs_mean lrs_median lrs_min lrs_max lrs_sd
#> <int> <int> <num> <num> <num> <num> <num>
#> 1: 8 3803 1.173895 0 0 31 3.599582
#> 2: 729 1249 NaN NA NA NA NA
#> 3: 160 103 1.482143 0 0 16 3.411164
#> 4: 38 85 1.304348 0 0 26 4.345468
#> 5: 226 71 1.421053 0 0 21 4.365910
#> ---
#> 1096: 1096 1 NaN NA NA NA NA
#> 1097: 1097 1 NaN NA NA NA NA
#> 1098: 1098 1 NaN NA NA NA NA
#> 1099: 1099 1 NaN NA NA NA NA
#> 1100: 1100 1 NaN NA NA NA NAAs you can see, there are 7799 squirells in 1100 families in this dataset. However, but we are missing several phenotyped individuals (i.e., individuals with non-missing LRS). To fit a multigroup pedigree model, we need to subset to families that have a sufficient number of phenotyped individuals. Here we set a minimum family size of 10 phenotyped individuals, which leaves us with a reduced number of families to include in the analysis.
minim_family_size <- 10
ped_krsp_subset <- ped_krsp |>
group_by(famID) |>
filter(sum(!is.na(lrs)) >= minim_family_size) |>
ungroup()
id_families <- unique(ped_krsp_subset$famID)
n_families <- length(id_families)We now have 17 families with at least 10 phenotyped individuals each. Once we subset to the families with sufficient phenotyped individuals, we can prepare the relatedness matrices and phenotypic data for each family. We pre-allocate lists to store these matrices and data for each family, which will be used to build the OpenMx models.
# Pre-allocate storage
add_list <- vector("list", length(n_families))
cn_list <- vector("list", length(n_families))
mt_list <- vector("list", length(n_families))
obs_ids_list <- vector("list", length(n_families))
pheno_list <- vector("list", length(n_families))Creating starting values for the variance components is important for model convergence. Here we set some reasonable starting values based on prior knowledge or expectations about the trait and the population. These starting values can be adjusted based on the specific dataset and trait being analyzed.
# Starting values for variance components
start_vars <- list(
ad2 = 0.1, # additive genetic
cn2 = 0.1, # common nuclear environment
ce2 = 0, # common extended (not estimated here)
mt2 = 0.1, # mitochondrial
dd2 = 0, # dominance (not estimated here)
am2 = 0, # A x Mt interaction (not estimated here)
ee2 = 0.7 # unique environment
)Model Building
Now we loop through each family, extract the pedigree and phenotypic
data, and prepare the relatedness matrices (additive genetic, common
nuclear environment, and mitochondrial) for each family. We also prepare
the phenotypic data in the format required for OpenMx. Finally, we build
the group models for ACE, MACE, and CE models for each family. The
buildOneFamilyGroup() function is used to create the model
specification for each family, and the buildPedigreeMx()
function is used to combine these group models into a single multigroup
model that can be fitted in OpenMx. As you can see, we are fitting three
different models: ACE (additive genetic, common environment, unique
environment), MACE (additive genetic, common environment, mitochondrial,
unique environment), and CE (common environment, unique environment).
This allows us to compare the models and assess the contribution of each
variance component to the trait of interest (LRS) in these
squirrels.
for (i in seq_len(n_families)) {
ped_i <- subset(ped_krsp_subset, famID == id_families[i])
phenotypic_ids <- ped_i$ID[!is.na(ped_i$lrs)]
A_i <- as.matrix(ped2add(ped_i, sparse = FALSE, keep_ids = phenotypic_ids))
Cn_i <- as.matrix(ped2cn(ped_i, sparse = FALSE, keep_ids = phenotypic_ids))
Mt_i <- as.matrix(ped2mit(ped_i, sparse = FALSE, keep_ids = phenotypic_ids))
n_i <- nrow(A_i)
id_order_i <- rownames(A_i)
pheno_vals <- ped_i$lrs[match(id_order_i, as.character(ped_i$ID))]
obs_ids_i <- make.names(id_order_i[!is.na(pheno_vals)])
pheno_row_i <- matrix(as.double(pheno_vals[!is.na(pheno_vals)]),
nrow = 1,
dimnames = list(NULL, obs_ids_i)
)
rownames(A_i) <- colnames(A_i) <- obs_ids_i
rownames(Cn_i) <- colnames(Cn_i) <- obs_ids_i
rownames(Mt_i) <- colnames(Mt_i) <- obs_ids_i
add_list[[i]] <- A_i
cn_list[[i]] <- Cn_i
mt_list[[i]] <- Mt_i
obs_ids_list[[i]] <- obs_ids_i
pheno_list[[i]] <- pheno_row_i
}For convenience, we build the group models for each family separately
for the ACE, MACE, and CE models. This allows us to easily compare the
models and assess the contribution of each variance component to the
trait of interest (LRS) in these squirrels. The
buildOneFamilyGroup() function is used to create the model
specification for each family, and the buildPedigreeMx()
function is used to combine these group models into a single multigroup
model that can be fitted in OpenMx.
group_models_mace <- lapply(seq_len(n_families), function(i) {
buildOneFamilyGroup(
group_name = paste0("ped", i),
Addmat = add_list[[i]],
Nucmat = cn_list[[i]],
Mtdmat = mt_list[[i]],
full_df_row = pheno_list[[i]],
obs_ids = obs_ids_list[[i]]
)
})
group_models_ace <- lapply(seq_len(n_families), function(i) {
buildOneFamilyGroup(
group_name = paste0("ped", i),
Addmat = add_list[[i]],
Nucmat = cn_list[[i]],
Mtdmat = NULL,
full_df_row = pheno_list[[i]],
obs_ids = obs_ids_list[[i]]
)
})
group_models_ce <- lapply(seq_len(n_families), function(i) {
buildOneFamilyGroup(
group_name = paste0("ped", i),
Addmat = NULL,
Nucmat = cn_list[[i]],
Mtdmat = NULL,
full_df_row = pheno_list[[i]],
obs_ids = obs_ids_list[[i]]
)
})The lapply() function is used to create a list of group
models for each family, which are then combined into a single multigroup
model using the buildPedigreeMx() function. The resulting models can
then be fitted in OpenMx to estimate the variance components for each
family and compare the ACE, MACE, and CE models.
multi_model_mace <- buildPedigreeMx(
model_name = "MultiPedigreeModel",
vars = start_vars,
group_models = group_models_mace,
ci = TRUE
)
multi_model_ace <- buildPedigreeMx(
model_name = "MultiPedigreeModel",
vars = start_vars,
group_models = group_models_ace,
ci = TRUE
)
multi_model_ce <- buildPedigreeMx(
model_name = "MultiPedigreeModel",
vars = start_vars,
group_models = group_models_ce,
ci = TRUE
)Model Fitting
Note that fitting these models can take some time, especially with
many families and large pedigrees. The mxTryHard() function
can be used to attempt to find better-fitting solutions if the initial
optimization does not converge well. In practice, you may want to
experiment with different starting values or optimization settings to
improve convergence.
fitted_multi_mace <- mxRun(multi_model_mace)
fitted_multi_ace <- mxRun(multi_model_ace)
fitted_multi_ce <- mxRun(multi_model_ce)
MACE Model Results
As you can see, we have fit a multigroup pedigree model using 17 families of several thousand squirrels. The model includes additive genetic variance (Vad), common nuclear environmental variance (Vcn), and unique environmental variance (Ver). The mitochondrial variance component (Vmt) was included in the MACE model but not estimated in the ACE model. The results show the proportion of total variance attributed to each component, which can be interpreted as heritability and environmental contributions to the phenotype of interest (LRS) in these squirrels.
summary(fitted_multi_mace)
#> Summary of MultiPedigreeModel
#>
#> free parameters:
#> name matrix row col Estimate Std.Error A lbound ubound
#> 1 vad ModelOne.Vad 1 1 0.0000000001 0.30183118 ! 0!
#> 2 vcn ModelOne.Vcn 1 1 4.1144290806 1.19868649 1e-10
#> 3 vmt ModelOne.Vmt 1 1 0.0000000001 0.09662163 ! 0!
#> 4 ver ModelOne.Ver 1 1 9.2999051171 0.98364306 1e-10
#> 5 meanLI ped1.M 1 X208 1.2569248071 0.11367465
#>
#> confidence intervals:
#> lbound estimate ubound note
#> vad NA 0.0000000001 NA !!!
#> vcn NA 4.1144290806 NA !!!
#> vmt NA 0.0000000001 NA !!!
#> ver NA 9.2999051171 NA !!!
#> To investigate missing CIs, run summary() again, with verbose=T, to see CI details.
#>
#> Model Statistics:
#> | Parameters | Degrees of Freedom | Fit (-2lnL units)
#> Model: 5 2670 14465.02
#> Saturated: NA NA NA
#> Independence: NA NA NA
#> Number of observations/statistics: 17/2675
#>
#> Information Criteria:
#> | df Penalty | Parameters Penalty | Sample-Size Adjusted
#> AIC: 9125.024 14475.02 14480.48
#> BIC: 6900.344 14479.19 14463.86
#> To get additional fit indices, see help(mxRefModels)
#> timestamp: 2026-03-30 22:09:23
#> Wall clock time: 9521.762 secs
#> optimizer: SLSQP
#> OpenMx version number: 2.22.11
#> Need help? See help(mxSummary)
summary(fitted_multi_mace)$CI
#> lbound estimate ubound note
#> vad NA 0.0000000001 NA !!!
#> vcn NA 4.1144290806 NA !!!
#> vmt NA 0.0000000001 NA !!!
#> ver NA 9.2999051171 NA !!!
total_var_mace <- sum(
fitted_multi_mace$ModelOne$Vad$values,
fitted_multi_mace$ModelOne$Vcn$values,
fitted_multi_mace$ModelOne$Vmt$values,
fitted_multi_mace$ModelOne$Ver$values
)
cat("Additive genetic (Vad):", fitted_multi_mace$ModelOne$Vad$values / total_var_mace, "\n")
#> Additive genetic (Vad): 7.454712e-12
cat("Common nuclear (Vcn):", fitted_multi_mace$ModelOne$Vcn$values / total_var_mace, "\n")
#> Common nuclear (Vcn): 0.3067188
cat("Mitochondrial (Vmt):", fitted_multi_mace$ModelOne$Vmt$values / total_var_mace, "\n")
#> Mitochondrial (Vmt): 7.454712e-12
cat("Unique environ. (Ver):", fitted_multi_mace$ModelOne$Ver$values / total_var_mace, "\n")
#> Unique environ. (Ver): 0.6932812As you can see, the MACE model includes an additional variance component for mitochondrial effects (Vmt), which allows us to assess the contribution of mitochondrial inheritance to the trait of interest (LRS) in these squirrels. The results show the proportion of total variance attributed to each component, which can be interpreted as heritability and environmental contributions to LRS in these squirrels. We can also compare the ACE and MACE models to see if including the mitochondrial component changes our estimates of heritability and environmental contributions.
ACE Model Results
Now we can look at the results from the ACE model, which does not include the mitochondrial component. This allows us to compare the estimates of additive genetic and common nuclear environmental variance between the ACE and MACE models, and assess whether including the mitochondrial component changes our estimates of heritability and environmental contributions to LRS in these squirrels.
summary(fitted_multi_ace)
#> Summary of MultiPedigreeModel
#>
#> free parameters:
#> name matrix row col Estimate Std.Error A lbound ubound
#> 1 vad ModelOne.Vad 1 1 1.000055e-10 0.1675854 ! 0!
#> 2 vcn ModelOne.Vcn 1 1 4.114450e+00 1.1233310 1e-10
#> 3 ver ModelOne.Ver 1 1 9.299892e+00 0.9632271 1e-10
#> 4 meanLI ped1.M 1 X208 1.256922e+00 0.1098559
#>
#> confidence intervals:
#> lbound estimate ubound note
#> vad 1e-10 1.000055e-10 0.2676144
#> vcn NA 4.114450e+00 NA !!!
#> ver NA 9.299892e+00 NA !!!
#> To investigate missing CIs, run summary() again, with verbose=T, to see CI details.
#>
#> Model Statistics:
#> | Parameters | Degrees of Freedom | Fit (-2lnL units)
#> Model: 4 2671 14465.02
#> Saturated: NA NA NA
#> Independence: NA NA NA
#> Number of observations/statistics: 17/2675
#>
#> Information Criteria:
#> | df Penalty | Parameters Penalty | Sample-Size Adjusted
#> AIC: 9123.024 14473.02 14476.36
#> BIC: 6897.511 14476.36 14464.09
#> To get additional fit indices, see help(mxRefModels)
#> timestamp: 2026-03-30 19:30:34
#> Wall clock time: 4929.521 secs
#> optimizer: SLSQP
#> OpenMx version number: 2.22.11
#> Need help? See help(mxSummary)
summary(fitted_multi_ace, verbose = T)$CI
#> lbound estimate ubound note
#> vad 1e-10 1.000055e-10 0.2676144
#> vcn NA 4.114450e+00 NA !!!
#> ver NA 9.299892e+00 NA !!!
#summary(fitted_multi_ace)$CI
total_var_ace <- sum(
fitted_multi_ace$ModelOne$Vad$values,
fitted_multi_ace$ModelOne$Vcn$values,
# fitted_multi_ace$ModelOne$Vmt$values,
fitted_multi_ace$ModelOne$Ver$values
)As you can see, the ACE model includes variance components for additive genetic effects (Vad), common nuclear environmental effects (Vcn), and unique environmental effects (Ver), but does not include a component for mitochondrial effects (Vmt). The results show the proportion of total variance attributed to each component, which can be interpreted as heritability and environmental contributions to LRS in these squirrels. We can compare these estimates to those from the MACE model to assess the contribution of mitochondrial inheritance to LRS in these squirrels.
cat("Additive genetic (Vad):", fitted_multi_ace$ModelOne$Vad$values / total_var_ace, "\n")
#> Additive genetic (Vad): 7.455119e-12
cat("Common nuclear (Vcn):", fitted_multi_ace$ModelOne$Vcn$values / total_var_ace, "\n")
#> Common nuclear (Vcn): 0.3067203
cat("Unique environ. (Ver):", fitted_multi_ace$ModelOne$Ver$values / total_var_ace, "\n")
#> Unique environ. (Ver): 0.6932797CE Model Results
summary(fitted_multi_ce)
#> Summary of MultiPedigreeModel
#>
#> free parameters:
#> name matrix row col Estimate Std.Error A lbound ubound
#> 1 vcn ModelOne.Vcn 1 1 4.114448 1.08082448 1e-10
#> 2 ver ModelOne.Ver 1 1 9.299891 0.93752032 1e-10
#> 3 meanLI ped1.M 1 X208 1.256923 0.07604897
#>
#> confidence intervals:
#> lbound estimate ubound note
#> vcn NA 4.114448 5.994869 !!!
#> ver 7.756658 9.299891 11.797608
#> To investigate missing CIs, run summary() again, with verbose=T, to see CI details.
#>
#> Model Statistics:
#> | Parameters | Degrees of Freedom | Fit (-2lnL units)
#> Model: 3 2672 14465.02
#> Saturated: NA NA NA
#> Independence: NA NA NA
#> Number of observations/statistics: 17/2675
#>
#> Information Criteria:
#> | df Penalty | Parameters Penalty | Sample-Size Adjusted
#> AIC: 9121.024 14471.02 14472.87
#> BIC: 6894.678 14473.52 14464.32
#> To get additional fit indices, see help(mxRefModels)
#> timestamp: 2026-03-31 05:14:48
#> Wall clock time: 2201.804 secs
#> optimizer: SLSQP
#> OpenMx version number: 2.22.11
#> Need help? See help(mxSummary)
summary(fitted_multi_ce, verbose = T)$CI
#> lbound estimate ubound note
#> vcn NA 4.114448 5.994869 !!!
#> ver 7.756658 9.299891 11.797608
total_var_ce <- sum(
# fitted_multi_ce$ModelOne$Vad$values,
fitted_multi_ce$ModelOne$Vcn$values,
# fitted_multi_ce$ModelOne$Vmt$values,
fitted_multi_ce$ModelOne$Ver$values
)As you can see, the CE model includes variance components for common nuclear environmental effects (Vcn) and unique environmental effects (Ver), but does not include a component for additive genetic effects (Vad) or mitochondrial effects (Vmt). The results show the proportion of total variance attributed to each component, which can be interpreted as environmental contributions to LRS in these squirrels. We can compare these estimates to those from the ACE and MACE models to assess the contribution of additive genetic and mitochondrial inheritance to LRS in these squirrels.
Model Comparison
All of these models are nested, with the CE model being the most constrained (only common environment and unique environment), the ACE model including additive genetic effects, and the MACE model including both additive genetic effects and mitochondrial effects.
Now we can compare the ACE and MACE models to see if including the mitochondrial component changes our estimates of heritability and environmental contributions. This can provide insights into the role of mitochondrial inheritance in the trait of interest (LRS) in these squirrels. In the original paper, the authors found that a maternally inherited component (which could be due to mitochondria or maternal effects) explained a significant portion of the variance in LRS,.
mxCompare(fitted_multi_mace, fitted_multi_ace)
#> base comparison ep minus2LL df AIC diffLL
#> 1 MultiPedigreeModel <NA> 5 14465.02 2670 14475.02 NA
#> 2 MultiPedigreeModel MultiPedigreeModel 4 14465.02 2671 14473.02 -6.135451e-09
#> diffdf p
#> 1 NA NA
#> 2 1 1However, as you can see when we compare the ACE and MACE models, the inclusion of the mitochondrial component does not substantially change the estimates of additive genetic and common nuclear environmental variance, suggesting that the mitochondrial component may not be a major contributor to LRS in this dataset.
mxCompare(fitted_multi_ace, fitted_multi_ce)
#> base comparison ep minus2LL df AIC diffLL
#> 1 MultiPedigreeModel <NA> 4 14465.02 2671 14473.02 NA
#> 2 MultiPedigreeModel MultiPedigreeModel 3 14465.02 2672 14471.02 -6.202754e-10
#> diffdf p
#> 1 NA NA
#> 2 1 1When we compare the ACE and CE models, we see that the ACE model does not fit significantly better than the CE model. This suggests that the additive genetic component may not be a major contributor to LRS in this dataset, and that the variance in LRS may be primarily due to environmental factors. However, it’s important to interpret these results in the context of the specific dataset and trait being analyzed, and to consider other factors such as sample size, pedigree structure, and potential confounding variables that could influence the estimates of variance components.