Automated Documentation of End-to-End Experiments in Data Science

Published in the 35th IEEE International Conference on Data Engineering (ICDE), 2019

Recommended citation: S. Redyuk (2019). Automated Documentation of End-to-End Experiments in Data Science. In Ph.D. Symposium track, IEEE 35th International Conference on Data Engineering (ICDE’19), Macau, China.

This paper motivates and outlines my Ph.D. project.

Abstract

Reproducibility plays a crucial role in experimentation. However, the modern research ecosystem and the underlying frameworks are constantly evolving, which makes it extremely difficult to reliably reproduce scientific artifacts such as data, algorithms, trained models, and visualizations. We therefore aim to design a novel system that assists data scientists with rigorous end-to-end documentation of data-oriented experiments. Capturing data lineage, metadata, and other artifacts helps to reproduce and share experimental results. We summarize this challenge as automated documentation of data science experiments. We aim to reduce manual overhead for experimenting researchers and intend to create a novel approach to dataflow and metadata tracking based on analysis of the experiment source code. The envisioned system will accelerate the research process in general and enable the capture of fine-grained meta-information by deriving a declarative representation of data science experiments.

Download paper here

Download poster here