
This function summarizes pedigree data by computing key summary statistics for all numeric variables and identifying the originating member (founder) of each family, maternal, and paternal lineage.

Usage

summarizePedigrees(
  ped,
  famID = "famID",
  personID = "ID",
  momID = "momID",
  dadID = "dadID",
  matID = "matID",
  patID = "patID",
  type = c("fathers", "mothers", "families"),
  byr = NULL,
  include_founder = FALSE,
  founder_sort_var = NULL,
  nbiggest = 5,
  noldest = 5,
  skip_var = NULL,
  five_num_summary = FALSE,
  verbose = FALSE
)

Arguments

ped

A pedigree data frame. Must contain person ID, mother ID, and father ID columns, whose names are given by `personID`, `momID`, and `dadID`.

famID

Character. Name of the column to be created in `ped` for the family ID variable.

personID

Character. Name of the column in `ped` for the person ID variable.

momID

Character. Name of the column in `ped` for the mother ID variable.

dadID

Character. Name of the column in `ped` for the father ID variable.

matID

Character. Name of the maternal line ID variable to be created and added to the pedigree.

patID

Character. Name of the paternal line ID variable to be created and added to the pedigree.

type

Character vector. Specifies which summaries to compute. Options: `"fathers"`, `"mothers"`, `"families"`. Default includes all three.

byr

Character. Optional column name for birth year. Used to determine the oldest lineages.

include_founder

Logical. If `TRUE`, includes the founder (originating member) of each lineage in the output.

founder_sort_var

Character. Column used to determine the founder of each lineage. If `NULL` (the default), `byr` is used when available and `personID` otherwise.

nbiggest

Integer. Number of largest lineages to return (sorted by count).

noldest

Integer. Number of oldest lineages to return (sorted by birth year).

skip_var

Character vector. Variables to exclude from summary calculations.

five_num_summary

Logical. If `TRUE`, includes the first quartile (Q1) and third quartile (Q3) in addition to the minimum, median, and maximum values.

verbose

Logical. If `TRUE`, prints progress messages.
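The `matID` and `patID` arguments name lineage ID columns that are derived from the parent links. The following is a minimal base-R sketch of one way such IDs could be built, assuming a maternal (paternal) line is labelled by the ID of the earliest ancestor reached by repeatedly following `momID` (`dadID`). `trace_line()` is a hypothetical helper for illustration only, not the package's implementation.

```r
# Hypothetical helper (not part of the package): label each person's lineage
# by the earliest ancestor reached by repeatedly following one parent column.
trace_line <- function(ped, parent_col, id_col = "ID") {
  # Lookup table: person ID (as character) -> that person's parent ID
  parent_of <- setNames(ped[[parent_col]], as.character(ped[[id_col]]))
  vapply(ped[[id_col]], function(id) {
    current <- id
    # At most nrow(ped) steps, so a loop caused by bad data cannot hang
    for (step in seq_len(nrow(ped))) {
      p <- parent_of[as.character(current)]
      # Stop when the parent is unknown or not present in the data
      if (is.na(p) || !(as.character(p) %in% names(parent_of))) break
      current <- p
    }
    as.numeric(current)
  }, numeric(1))
}

# Toy pedigree using the default column names (ID, momID, dadID)
ped <- data.frame(
  ID    = c(1, 2, 3, 4, 5, 6, 7),
  momID = c(NA, NA, 2, 2, NA, 5, 3),
  dadID = c(NA, NA, 1, 1, NA, 4, NA)
)
ped$matID <- trace_line(ped, "momID")  # maternal line: follow mothers
ped$patID <- trace_line(ped, "dadID")  # paternal line: follow fathers
```

Here person 7 joins maternal line 2 through mother 3, while having no recorded father, so person 7 starts a paternal line of their own.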

Value

A data.frame (or list) containing summary statistics for family, maternal, and paternal lines, as well as the `noldest` oldest and `nbiggest` biggest lines (5 each by default).

Details

The function calculates standard descriptive statistics, including the count of individuals in each lineage, means, medians, minimum and maximum values, and standard deviations. Additionally, if `five_num_summary = TRUE`, the function includes the first and third quartiles (Q1, Q3) to provide a more detailed distributional summary. Users can also specify variables to exclude from the analysis via `skip_var`.
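The statistics described above can be sketched in base R as follows: a per-lineage count plus mean, median, minimum, maximum, and standard deviation for each numeric column, optional Q1/Q3, and exclusion of `skip_var` columns. `summarize_lineages()` and output column names such as `byr_mean` are hypothetical; the real function's output layout may differ.

```r
# Hypothetical helper (not part of the package): descriptive statistics for
# every numeric column, computed separately within each lineage.
summarize_lineages <- function(ped, group_col, skip_var = NULL,
                               five_num_summary = FALSE) {
  num_vars <- names(ped)[vapply(ped, is.numeric, logical(1))]
  # Never summarize the grouping column or any user-excluded columns
  num_vars <- setdiff(num_vars, c(group_col, skip_var))
  groups <- split(ped, ped[[group_col]])
  rows <- lapply(names(groups), function(g) {
    d <- groups[[g]]
    out <- data.frame(group = g, count = nrow(d), stringsAsFactors = FALSE)
    for (v in num_vars) {
      x <- d[[v]]
      out[[paste0(v, "_mean")]]   <- mean(x, na.rm = TRUE)
      out[[paste0(v, "_median")]] <- median(x, na.rm = TRUE)
      out[[paste0(v, "_min")]]    <- min(x, na.rm = TRUE)
      out[[paste0(v, "_max")]]    <- max(x, na.rm = TRUE)
      out[[paste0(v, "_sd")]]     <- sd(x, na.rm = TRUE)
      if (five_num_summary) {
        # Quartiles via R's default quantile method (type 7)
        q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
        out[[paste0(v, "_Q1")]] <- q[1]
        out[[paste0(v, "_Q3")]] <- q[2]
      }
    }
    out
  })
  do.call(rbind, rows)
}

ped <- data.frame(
  ID    = 1:6,
  famID = c(1, 1, 1, 2, 2, 2),
  byr   = c(1900, 1925, 1950, 1910, 1935, 1960)
)
# ID is numeric but meaningless to summarize, so it is skipped
res <- summarize_lineages(ped, "famID", skip_var = "ID",
                          five_num_summary = TRUE)
```

Identifier columns are numeric too, which is why an exclusion list like `skip_var` is useful: without it, the sketch would also report a "mean ID".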

Beyond summary statistics, the function identifies the founding member of each lineage based on the specified sorting variable (`founder_sort_var`), defaulting to birth year (`byr`) when available or `personID` otherwise. Users can retrieve the largest and oldest lineages by setting `nbiggest` and `noldest`, respectively.
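Founder selection and the largest/oldest rankings can be sketched as follows. The sketch assumes that the founder is the first member of a lineage after sorting by the chosen variable (here `byr`), that lineage size is the row count, and that lineage age is the earliest birth year. `find_founders()` and `top_lineages()` are hypothetical helpers, not the package's API.

```r
# Hypothetical helpers (not part of the package).
find_founders <- function(ped, group_col, sort_var) {
  # Order rows by lineage, then by the sorting variable (e.g. birth year)
  ordered <- ped[order(ped[[group_col]], ped[[sort_var]]), , drop = FALSE]
  # The first row within each lineage is taken as its founder
  ordered[!duplicated(ordered[[group_col]]), , drop = FALSE]
}

top_lineages <- function(ped, group_col, byr = "byr",
                         nbiggest = 5, noldest = 5) {
  # Lineage size = number of rows sharing the same lineage ID
  counts <- table(ped[[group_col]])
  sizes <- setNames(as.vector(counts), names(counts))
  # Lineage age = earliest birth year observed in the lineage
  earliest <- tapply(ped[[byr]], ped[[group_col]], min, na.rm = TRUE)
  min_byr <- setNames(as.vector(earliest), names(earliest))
  list(
    biggest = head(sizes[order(sizes, decreasing = TRUE)], nbiggest),
    oldest  = head(min_byr[order(min_byr)], noldest)
  )
}

ped <- data.frame(
  ID    = c(10, 11, 12, 13, 20, 21, 30),
  matID = c("A", "A", "A", "A", "B", "B", "C"),
  byr   = c(1900, 1925, 1950, 1952, 1910, 1880, 1930),
  stringsAsFactors = FALSE
)
founders <- find_founders(ped, "matID", "byr")
tops <- top_lineages(ped, "matID", nbiggest = 2, noldest = 2)
```

In lineage `B`, person 21 is chosen as founder even though 20 has the smaller ID, because sorting is by birth year; sorting by ID instead corresponds to the `personID` fallback when no birth year is available.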