Accurate assessment of the age-dependent disease risk conferred by germline variants in disease susceptibility genes is often hampered by the way the data are collected. Cohort-based data sets frequently overrepresent patients (i.e., carriers of the gene variant of interest who are affected with the associated disease) and underrepresent disease-free carriers. To overcome this problem, penetrance estimates can be based on family-based study designs, in which index patients and their family members are evaluated. This approach facilitates the identification of asymptomatic germline variant carriers. By adjusting for the way these family data are ascertained, an estimate of the penetrance of the pathogenic gene variant can be obtained. However, information on family structure is often incomplete or missing, which complicates estimation of the penetrance because the likelihood cannot be fully adjusted for ascertainment. We present a conditional likelihood for estimating the penetrance of pathogenic gene variants, based on a cohort of multiple families comprising index patients and disease-free and affected non-index carriers, but with missing information on pedigree structure. The proposed estimator corrects for ascertainment in a robust way and is shown to be more accurate than the frequently used Kaplan-Meier estimator of the penetrance function.
- conditional maximum likelihood method
- missing data