This blog post is from James Cawse, Consultant and Principal at Cawse and Effect, LLC. Jim uses his unique blend of chemical knowledge, statistical skills, industrial process experience, and quality commitment to find solutions for his clients' difficult experimental and process problems. He received his Ph.D. in Organic Chemistry from Stanford University. On top of all that, he's a great guy! Visit his website (link above) to find out more about Jim, his background, and his company.
Getting the best information from chemical experimentation using design of experiments (DOE) is a concept that has been around for decades, although it is still painfully underused in chemistry. In a recent article, Leardi [1] pointed this out with an excellent tutorial on basic DOE for chemistry. The classic DOE text Statistics for Experimenters [2] also used many chemical illustrations of DOE methodology. In my consulting practice, however, I have encountered numerous situations where 'vanilla' DOE, whether from a book, software, or a Six Sigma course, struggles mightily because of the inherent complications of chemistry.
The basic rationale for using a statistically based DOE in any science is straightforward. The DOE method provides:
- Points distributed in a rational fashion throughout “experimental space”.
- Noise reduction by averaging and application of efficient statistical tools.
- ‘Synergy’, typically the result of the interactions of two or more factors – easily determined in a DOE.
- An equation (model) that can then be used to predict further results and optimize the system.
All of these are provided in a typical DOE, which generally starts simply with a factorial design.
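As a minimal sketch of that starting point, the following Python snippet enumerates a two-level full factorial in coded units (-1 for the low setting, +1 for the high setting). The function name `full_factorial` and the choice of three factors are illustrative assumptions, not part of any particular DOE package.

```python
from itertools import product

def full_factorial(n_factors, levels=(-1, 1)):
    """Return every combination of the coded levels for n_factors factors."""
    return [tuple(run) for run in product(levels, repeat=n_factors)]

# Three factors at two levels each gives 2**3 = 8 runs
design = full_factorial(3)
for run in design:
    print(run)
```

In practice the run order would be randomized before going into the lab; that step is left out here to keep the sketch short.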
DOE works so well in most scientific disciplines because Mother Nature is kind. In general:
- Most experiments can be performed with small numbers of 'well-behaved' factors, typically simple numeric or qualitative factors at 2-3 levels.
- Interactions typically involve only 2 factors. Three-factor and higher-order interactions are ignored.
- The experimental space is relatively smooth; there are no cliffs (e.g. phase changes).
As a result, additive models are a good fit to the space and can be determined by straightforward regression.
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \dots$$
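To make the regression step concrete, here is a small sketch that fits the intercept, the two main effects, and the two-factor interaction of the model above to a 2x2 factorial. The response values are invented for illustration, and the names `b0`, `b1`, `b2`, `b12` are simply code-friendly labels for the model coefficients. Because the coded columns of a two-level factorial are orthogonal, the least-squares estimate of each coefficient reduces to the sum of (column x response) divided by the number of runs. The quadratic term is omitted because a two-level design cannot estimate it.

```python
# Coded settings (x1, x2) for a 2x2 factorial, with hypothetical responses
runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
y = [20.0, 30.0, 25.0, 45.0]  # illustrative yields, not real data

def fit_two_factor_model(runs, y):
    """Least-squares coefficients for Y = b0 + b1*x1 + b2*x2 + b12*x1*x2.

    Valid as written only for an orthogonal, -1/+1 coded design.
    """
    n = len(runs)
    columns = {
        "b0": [1 for _ in runs],
        "b1": [x1 for x1, _ in runs],
        "b2": [x2 for _, x2 in runs],
        "b12": [x1 * x2 for x1, x2 in runs],
    }
    return {
        name: sum(c * yi for c, yi in zip(col, y)) / n
        for name, col in columns.items()
    }

def predict(coef, x1, x2):
    """Evaluate the fitted model at a coded point."""
    return coef["b0"] + coef["b1"] * x1 + coef["b2"] * x2 + coef["b12"] * x1 * x2

coef = fit_two_factor_model(runs, y)
print(coef)  # {'b0': 30.0, 'b1': 7.5, 'b2': 5.0, 'b12': 2.5}
```

With four runs and four coefficients the model reproduces the data exactly, so replicates or center points would be needed to estimate noise.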
In contrast, chemistry offers unique challenges to the team of experimenter and statistician. Chemistry is a science replete with nonlinearities, complex interactions, and nonquantitative factors and responses. Chemical experiments require more forethought and better planning than most DOEs. Chemistry-specific elements must be considered.
Above all, chemists make mixtures of 'stuff'. These may be catalysts, drugs, personal care items, petrochemicals, or others. A beginner trying to apply DOE to a mixture system may think to start with a conventional cubic factorial design. It soon becomes clear, however, that there is an impossible situation when the (+1, +1, +1) corner requires 100% of A and B and C! The actual experimental space of a mixture is a triangular simplex. This can be rotated into the plane to show a simplex design, and it can easily be extended to higher dimensions, such as a tetrahedron.
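The constraint that the component proportions must sum to 100% can be illustrated with a short sketch of a simplex-lattice design, one of the standard unconstrained mixture layouts. The function name `simplex_lattice` and the three-component, degree-2 example are assumptions made for illustration; exact fractions are used so the sum-to-one check is not affected by rounding.

```python
from fractions import Fraction
from itertools import product

def simplex_lattice(n_components, m):
    """All blends whose proportions are multiples of 1/m and sum to 1."""
    points = []
    for counts in product(range(m + 1), repeat=n_components):
        if sum(counts) == m:  # keep only blends that add up to 100%
            points.append(tuple(Fraction(c, m) for c in counts))
    return points

# Three components, proportions in steps of 1/2:
# the three pure components plus the three 50:50 binary blends
design = simplex_lattice(3, 2)
for blend in design:
    print([str(p) for p in blend])
```

Every point lies on the triangle A + B + C = 1, so the impossible (+1, +1, +1) corner of the cube never appears.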
It is rare that a real mixture experiment will actually use 100% of the components as points. A real experiment will be constrained by upper and lower bounds, or by proportionality requirements. The active ingredients may also be tiny amounts in a solvent. The response to a mixture may be a function of the amount used (fertilizers or insecticides, for example). And the conditions of the process in which the mixture is used may also be important, as in baking a cake or optimizing a pharmaceutical reaction. All of these will require special designs.
Fortunately, all of these simple and complex mixture designs have been extensively studied and are covered by Cornell [3], Anderson et al. [4], and Design-Expert® software.
The goal of a kinetics study is an equation which describes the progress of the reaction. The fundamental reality of chemical kinetics is
Rate = f(concentrations, temperature).
However, the form of the equation is highly dependent on the details of the reaction mechanism! The very simplest reaction has the first-order form
Rate = k*C1
which is easily treated by regression. The next most complex reaction has the form
Rate = k*C1*C2
in which the critical factors are multiplied – no longer the additive form of a typical linear model. The complexity continues to increase with multistep reactions.
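One common way to recover an additive form, sketched here under the assumption that the rate and both concentrations are strictly positive, is to take logarithms. The exponents $a$ and $b$ below are hypothetical reaction orders introduced for illustration; the second-order rate law in the text corresponds to $a = b = 1$.

```latex
\text{Rate} = k\, C_1^{a}\, C_2^{b}
\quad\Longrightarrow\quad
\ln(\text{Rate}) = \ln k + a \ln C_1 + b \ln C_2
```

In this form $\ln k$, $a$, and $b$ enter linearly and could be estimated by ordinary regression on log-transformed data. The transformation also changes the error structure, and it does not help when a multistep mechanism produces a rate law that is a sum or ratio of such terms, so it is a convenience rather than a general fix.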
Catalysis studies are chemical kinetics taken to the highest degree of complication! In industry, catalysts are often improved over years or decades. This process frequently results in increasingly complex catalyst formulations with components that interact in increasingly complex ways. A basic catalyst may have as many as five active co-catalysts. We now find multiple 2-factor interactions pointing to 3-factor interactions. As the catalyst is further refined, the Law of Diminishing Returns sets in. As you get closer to the theoretical limit, any improvement disappears in the noise!
Chemicals are not Numbers
As we look at the actual chemicals which may appear as factors in our experiments, we often find numbers appearing as part of their names. Often the only difference among these molecules is the length of the chain (C-12, C-14, C-16, C-18) and it is tempting to incorporate this as numeric levels of the factor. Actually, this is a qualitative factor; calling it numeric invites serious error! The correct description, now available in Design-Expert, is 'Discrete Numeric'.
The real message, however, is that the experimenters must never take off their 'chemist hat' when putting on a 'statistics hat'!
1. Leardi, R., "Experimental design in chemistry: A tutorial." Anal. Chim. Acta 2009, 652 (1-2), 161-172.
2. Box, G. E. P.; Hunter, J. S.; Hunter, W. G., Statistics for Experimenters. 2nd ed.; Wiley-Interscience: Hoboken, NJ, 2005.
3. Cornell, J. A., Experiments with Mixtures. 3rd ed.; John Wiley and Sons: New York, 2002.
4. Anderson, M. J.; Whitcomb, P. J.; Bezener, M. A., Formulation Simplified. Routledge: New York, 2018.