Skip to contents

Slices up families by additive relatedness, creating CSV files grouped by degree of relatedness. Operates on a potentially large file by reading in chunks and binning links by additive relatedness.

Usage

sliceFamilies(
  outcome_name = "AD_demo",
  biggest = TRUE,
  bin_width = 0.1,
  degreerelatedness = 12,
  chunk_size = 2e+07,
  max_lines = 1e+13,
  addRel_ceiling = 1.5,
  input_file = NULL,
  folder_prefix = "data",
  progress_csv = "progress.csv",
  progress_status = "progress.txt",
  data_directory = NULL,
  verbose = FALSE,
  error_handling = FALSE,
  file_column_names = c("ID1", "ID2", "addRel", "mitRel", "cnuRel")
)

Arguments

outcome_name

Name of the outcome variable (used for naming input/output files)

biggest

Logical; whether to process the "biggest" family dataset (TRUE) or all-but-biggest (FALSE)

bin_width

Width of additive relatedness bins (default is 0.10)

degreerelatedness

Maximum degree of relatedness to consider (default 12)

chunk_size

Number of lines to read in each chunk (default 2e7)

max_lines

Max number of lines to process from input file (default 1e13)

addRel_ceiling

Numeric. Maximum relatedness value to bin to. Default is 1.5

input_file

Path to the input CSV file. If NULL, defaults to a specific file based on `biggest` flag.

folder_prefix

Prefix for the output folder (default "data")

progress_csv

Path to a CSV file for tracking progress (default "progress.csv")

progress_status

Path to a text file for logging progress status (default "progress.txt")

data_directory

Directory where output files will be saved. If NULL, it is constructed based on `outcome_name` and `folder_prefix`.

verbose

Logical; whether to print progress messages (default FALSE)

error_handling

Logical. Should more aggressive error handing be attemptted? Default is false

file_column_names

Names of the columns in the input file (default c("ID1", "ID2", "addRel", "mitRel", "cnuRel"))

Value

NULL. Writes CSV files to disk and updates progress logs.