Smart Data: Design of Experiments Starter

Introduction

We are experiencing unprecedented advances in machine learning, which are starting to have a profound impact. These advances, however, were only possible because massive amounts of data were available. More precisely, it is not the sheer volume of data that matters but the number of training examples (e.g. batches or experiments). If we have too few examples (e.g. fewer than 1000), recording more data per example does not help! In research and development, we often have to achieve our objectives with a very limited number of newly generated examples: “Do more with Less!” Deep learning techniques are therefore often not applicable, and our best approach is to make our data “smart”, i.e., to deliberately plan the data to be generated so that they are as informative and unambiguous as possible. This can be accomplished with Design of Experiments (DOE), the science of systematic data generation.
DOE has quietly been used with success in many industries for decades, enabling its proponents to achieve substantial improvements and to outclass their competition. The best-known DOE approaches have likewise been around for a long time, but recent years have seen important advances that make DOE much more efficient and simpler to use for a general audience.
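
To make the idea of “informative and unambiguous” data a little more concrete, here is a minimal sketch of one of the classical designs, a two-level full factorial, written in plain Python. The factor names are hypothetical placeholders and the helper functions are illustrative only; they are not taken from any of the tools listed on this page. The point of the sketch is that every pair of factor columns is balanced against each other (their products sum to zero), so the effect of one factor cannot be confused with the effect of another.

```python
from itertools import product

def full_factorial(factors):
    """Return every combination of coded levels (-1 = low, +1 = high)."""
    return [dict(zip(factors, levels))
            for levels in product((-1, 1), repeat=len(factors))]

def column_dot(design, a, b):
    """Sum of products of two factor columns; 0 means the columns are orthogonal."""
    return sum(run[a] * run[b] for run in design)

# Hypothetical factor names, chosen only for illustration.
factors = ["temperature", "pressure", "catalyst"]
design = full_factorial(factors)

# Print the run sheet: 2**3 = 8 planned experiments.
for number, run in enumerate(design, start=1):
    print(number, run)

# Check that every pair of factor columns is orthogonal.
for i, a in enumerate(factors):
    for b in factors[i + 1:]:
        print(a, "x", b, "->", column_dot(design, a, b))
```

In practice the run order would also be randomized before the experiments are carried out, and for larger numbers of factors a fractional or optimal design from dedicated DOE software is usually the more economical choice.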

General Tutorials/Textbooks

Mixture Design

Robust Parameter Design

YouTube

Software

  • Design Expert from Statease (if you are allowed only one DOE software package)
  • JMP from SAS (offering some great features for the advanced user)
  • G*Power for classical power calculations in univariate cases
  • R / R Taskview DOE (inefficient for general DOE work, but great for programming)
  • R packages dhglm or jmdem for GLMs that jointly fit the mean and variance in robust parameter designs
  • Python (for use from within Statease 360 or as an alternative to R)
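
As a rough illustration of what a classical univariate power calculation involves, the following Python sketch estimates the number of runs per group needed to detect a difference between two group means. It uses the textbook normal approximation, not the algorithm of G*Power or any other tool listed above, and the function name and default values are assumptions made for this example.

```python
from math import ceil
from statistics import NormalDist

def runs_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate runs per group for a two-sided, two-sample comparison.

    effect_size is the difference in means divided by the standard
    deviation (Cohen's d). Normal approximation: exact t-based
    calculations give slightly larger numbers for small samples.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = standard_normal.inv_cdf(power)           # desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(runs_per_group(0.5))  # medium effect
print(runs_per_group(1.0))  # large effect
```

Halving the effect size to be detected roughly quadruples the required number of runs, which is why the expected effect size should be discussed before the experiments are planned.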

Goodies from DA-SOL

  • Excel Sheet for estimating the required run number
  • R function for generating Pareto plots for Design Expert models
  • R function for comparing two Pareto plots from Design Expert models
  • Hints for Design Expert usage

Selected References

Contact

Dr. Juergen von Frese
Data Analysis Solutions S.L.U.
Telephone:
E-Mail: