Checklist and Resources#

Checklist#

References#

The code segments in this chapter were created in part with the help of several online tutorials, which were used as references:

Other textbooks and papers that were used as references but have not been directly cited earlier:

What to Learn Next#

If you happen to be handling sensitive data in your project, check out the Working on Sensitive Data Projects chapter.

Alternatively, if you want to make your research project and data analysis pipeline more reproducible, see the chapter on Reproducibility with Make, a build automation tool.

Further Reading#

  • Flexible Imputation of Missing Data: A much more in-depth look at missing data imputation that goes further into characterising data, including mathematical definitions, and describes data imputation methods.

  • Getting Started with naniar: More R functions to visualise data missingness, including one that uses decision trees to map out the proportion of missingness in a variable based on all other variables.

  • The papers cited throughout this chapter are all good resources for further reading, especially the original paper on MICE [vBGO11] and the review papers on missing data handling [OJDSP22, Pig01].

  • For more R visualisation and imputation packages see:

  • The Turing-Roche partnership has some resources on structured missingness:

    • See #ExplainToMe: The Problem of Structured Missing Data for a great animated overview

    • Papers on structured missingness (that were cited previously): [MMC+23] and [JMH+23]

    • For more in-depth recordings from the Turing-Roche Knowledge Series see:

      • Modern Topics on Missing Data, which also provides a brief overview of missing data:

      • Structured Missingness Challenges in Data Integration: