## Smart Data: Design of Experiments Starter

### Introduction

Machine learning is advancing at an unprecedented pace and is starting to have a profound impact. These advances, however, were only possible with massive amounts of data. More importantly, what matters is not the sheer volume of data but the number of training examples (e.g. batches or experiments). If we have too few examples (e.g. fewer than 1,000), providing more data per example does not help. In research and development, we often have to achieve our objectives with a very limited number of newly generated examples: “Do more with less!” Deep learning techniques are therefore often not applicable, and our best approach is to make our data “smart”, i.e. to deliberately plan the data to be generated so that it is as informative and unambiguous as possible. This can be accomplished with Design of Experiments (DOE), the science of systematic data generation.

DOE has quietly been used with success in many industries for decades, enabling its proponents to achieve substantial improvements and to outclass their competition. The best-known DOE approaches have likewise been around for a long time, but recent years have brought important advances that make DOE much more efficient and easier to use for a general audience.
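To make the idea of "smart" data concrete, the following is a minimal sketch of one of the simplest planned designs, a two-level full factorial. It assumes three factors coded as -1 (low) and +1 (high); the function names `full_factorial` and `column_dot_products` are hypothetical and chosen for illustration only. The check at the end shows that the factor columns are balanced and mutually orthogonal, which is what allows each main effect to be estimated independently of the others.

```python
from itertools import product


def full_factorial(n_factors):
    """Return all 2**n_factors runs with factor levels coded as -1 / +1."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]


def column_dot_products(design):
    """Dot product of every pair of factor columns (0 means orthogonal)."""
    columns = list(zip(*design))
    k = len(columns)
    return {
        (i, j): sum(a * b for a, b in zip(columns[i], columns[j]))
        for i in range(k)
        for j in range(i + 1, k)
    }


# Assumption for illustration: three factors, each at two levels.
design = full_factorial(3)
for run in design:
    print(run)

# Balanced: every factor is run equally often at its low and high level.
print([sum(col) for col in zip(*design)])

# Orthogonal: all pairwise dot products between factor columns are zero.
print(column_dot_products(design))
```

Unplanned or historical data rarely has this property: factor settings tend to move together, so their effects cannot be separated cleanly. Full factorials grow quickly with the number of factors, which is where the fractional, optimal and screening designs covered in the resources below come in.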

#### General Tutorials/Textbooks

- Simple introduction to DOE with an easily accessible cake-baking example
- Design of Experiments: A Modern Approach, the best overall textbook, by Bradley Jones and Douglas C. Montgomery
- Optimal Design of Experiments, a very accessible introduction, by Peter Goos and Bradley Jones
- Design and Analysis of Experiments, the classic textbook, by Douglas C. Montgomery
- Stat-Ease Handbook for Experimenters, an extensive free introduction from Stat-Ease

#### Mixture Design

- Experimental Design for Formulation by Wendell F. Smith is probably the best introduction for practitioners
- Getting Your Toe into Mixtures, a free introduction by Stat-Ease
- Experiments with Mixtures, the classic on the subject, by John A. Cornell
- A Primer on Experiments with Mixtures, an abridged and simplified version of the former, by John A. Cornell

#### Robust Parameter Design

- Response Surface Methodology by Raymond H. Myers, Douglas C. Montgomery and Christine M. Anderson-Cook
- Robust Parameter Design: A Review by Timothy J. Robinson, Connie M. Borror and Raymond H. Myers
- The Taguchi Approach discussed by Douglas C. Montgomery
- Presentation, slides and white paper by Stat-Ease
- Taguchi's Parameter Design: A Panel Discussion
- Generalized Linear Models for the Analysis of Taguchi-type Experiments by J. A. Nelder and Y. Lee

#### YouTube

- Statistics Made Easy Channel by Stat-Ease

#### Software

- Design-Expert from Stat-Ease (if you are only allowed one DOE software package)
- JMP from SAS (offering some great features for the advanced user)
- G*Power for classical power calculations in univariate cases
- R / R Task View DOE (inefficient for general DOE work, but great for programming)
- R packages dhglm or jmdem for jointly modeling mean and variance with GLMs in robust parameter designs
- Python (for use from within Stat-Ease 360 or as an alternative to R)

#### Goodies from DA-SOL

- Excel sheet for estimating the required number of runs
- R function for generating Pareto plots for Design-Expert models
- R function for comparing two Pareto plots from Design-Expert models
- Hints for Design-Expert usage

#### Selected References

- Structured information collection before DOE generation by Douglas C. Montgomery
- Using Power and Precision to Size DOEs, a talk by Shari Kraber
- The coding of categorical factors
- Correcting Common Misconceptions About Optimal Experiment Design (slides) by Bradley Jones
- Split-Plot Designs: What, Why, and How by Bradley Jones and Christopher J. Nachtsheim
- 21st century screening experiments: What, why, and how, a discussion of Definitive Screening Designs
- Proper and improper use of Definitive Screening Designs by Bradley Jones
- Weighted A-optimal designs by Jonathan W. Stallings (for prioritizing specific effects)
- Bayesian D-optimal designs by William DuMouchel and Bradley Jones ("if possible" effects in JMP)
- Alias optimal designs by Bradley Jones and Chris J. Nachtsheim (for minimizing aliasing)
- Tutorial on Handling Covariates (talk) by Ryan Lekivetz
- ... to be expanded ...