Use and misuse of predicted values in epidemiologic data analyses (TG4)

Banff International Research Station (BIRS) for Mathematical Innovation and Discovery

Shaw, Pamela A.

Formal Metadata

Title

Use and misuse of predicted values in epidemiologic data analyses (TG4)

Title of Series

Toward a Comprehensive, Integrated Framework for Advanced Statistical Analyses of Observational Studies (19w5198)

Number of Parts

Author

Shaw, Pamela A.

License

CC Attribution - NonCommercial - NoDerivatives 4.0 International:
You are free to use, copy, distribute and transmit the work or content in unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/58057 (DOI)

Publisher

Banff International Research Station (BIRS) for Mathematical Innovation and Discovery

Release Date

2019

Language

English

Content Metadata

Subject Area

Computer Science, Mathematics

Genre

Workshop/Interactive Format Lecture

Abstract

In many epidemiologic settings, the principal exposure or outcome under study can only be measured imprecisely. To address this errors-in-variables problem, the analyst will sometimes adjust these variables, say through a calibration or prediction equation, and use the resulting predicted value in the analysis in place of the observed value. When a predicted quantity is used in place of an observed value in a data analysis, the impact of the uncertainty in the predicted quantity on the study results needs to be considered, but this is not always done in practice. Such predicted variables usually have Berkson error. In some settings, ignoring this uncertainty, or prediction error, can bias the parameter estimates, the standard errors, or both. We examine three common examples of how predicted values are used in an analysis in place of an error-prone variable: 1) to estimate the distribution of a variable, 2) to compare values of a variable between groups by using the predicted value in a two-group statistic (e.g., a t-statistic) or as an outcome variable in a regression, and 3) to estimate the effect of an error-prone variable on an outcome, where the predicted quantity is used as the exposure variable in a regression. For each example, we present an overview of the potential consequences of using a predicted quantity in place of the true value without appropriate statistical adjustment. We further illustrate some concepts with data from a large population-based cohort, the Hispanic Community Health Study/Study of Latinos (HCHS/SOL).
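As a rough illustration of the first and third examples, the following minimal simulation sketch may help. It is not taken from the lecture and does not use HCHS/SOL data. Everything in it is an assumption made for illustration: a standard normal true exposure, classical additive measurement error of equal variance, a linear outcome model, a calibration equation fitted on the full sample, and the hypothetical names `simulate` and `ols_slope`. Under these assumptions the predicted values should be noticeably less variable than the true exposure, so their distribution understates the true spread, while the slope from regressing the outcome on the predicted exposure should be close to the true slope and the slope on the raw error-prone measurement should be attenuated.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=1.0, seed=1):
    """Simulate a true exposure, an error-prone measurement, and an outcome.

    All distributions and parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]        # true exposure (unobserved in practice)
    w = [xi + rng.gauss(0, 1) for xi in x]         # measurement with classical error
    y = [beta * xi + rng.gauss(0, 1) for xi in x]  # outcome depends on the true exposure

    # Calibration equation: linear regression of the true exposure on the
    # error-prone measurement. In practice this would be fitted in a
    # validation subsample; the full sample is used here for simplicity.
    lam = ols_slope(w, x)
    mw, mx = statistics.fmean(w), statistics.fmean(x)
    x_hat = [mx + lam * (wi - mw) for wi in w]     # predicted exposure

    return {
        "var_true": statistics.pvariance(x),
        "var_pred": statistics.pvariance(x_hat),
        "slope_true": ols_slope(x, y),
        "slope_naive": ols_slope(w, y),      # outcome on raw error-prone measurement
        "slope_pred": ols_slope(x_hat, y),   # outcome on predicted exposure
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

The sketch deliberately omits standard errors. Accounting for the uncertainty in the fitted calibration equation would need an additional step, such as a bootstrap that refits the calibration model in each resample, and that step is not shown here.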