
The Hyper-Greco-Latin Square Experimental Design as a Formulation Ingredient Selection Tool

This paper was presented at the National Meeting of the NLGI, October 1995.

Introduction

Formulations are functional mixtures. As such, each ingredient is present to provide or modify one or more properties or functions of the final product. A dilemma faces the development team: for each function there are usually at least a few, and often several, possible ingredient choices. Therefore, one of the early tasks in developing a new formulation is to find a particularly good combination of ingredients with which to start an optimization process. This usually must be done in the face of an often overwhelming number of possible combinations of those choices.

This paper is a survey of the risks one faces when embarking on such a task. It features an industrial example of the use of the Latin-square experimental design to minimize those risks during formulation development. Finally, it describes an expansion of the Latin square to a hyper-Greco-Latin square (HGLS), in which 6 different ingredients, each having 5 different choices, can be examined in 25 experiments. The HGLS is a method of ingredient selection that has proven to be a superior strategy for the early stages of formulation development.

Comparisons

One simplifying view of ingredient selection is to consider it as a chain of comparisons among the performance of the several ingredient choices. One wishes to know which of the several choices is best. The basic framework is contained in the following question: "Is Treatment A sufficiently different from Treatment B to be of significant interest?" Examples of comparisons occur in all stages of R&D. These comparisons could be addressing the relative worth of two different additives in a lubricant, two different vendors for the same additive, two different processing treatments (high or low levels of temperature, time, or shear), or perhaps two different machines or analytical instruments. The essential nature of the question is that it can be answered with a yes or no as to whether or not there is an interesting difference between them.

Decisions and Risk

There are two obvious positions one might take depending on the outcome of one's laboratory effort. The experimentalist might declare that "Yes, there is an important difference," or "No, there is not sufficient difference to be of interest."

Our task is to discover the truth about which additive is better. However, we can never actually know this truth; we can only infer it from our experience. There are two ways to draw a conclusion that is congruent with this physical truth. We can declare that a difference between the two treatments exists when one does in fact exist, and we can declare that no significant difference exists when there is truly none.

Conversely, there are two ways to draw a wrong or incorrect conclusion. In such cases, our belief is not congruent with the physical world. We can declare that one system is better than the other, when we have actually mistaken the background noise for an interesting signal. Or, we may declare that there is nothing interesting in the difference, when there is, in fact, a difference that would be important to know about.

False Positives and False Negatives

We will call the former erroneous outcome a 'False Positive.' We make such an error when we mistakenly declare that we have a treatment difference when all we have been doing is observing the background noise. This is also called a Type I error, and the probability of such an event happening is called the α-risk. Experience shows that for many experimentalists (and their supervisors and managers), the rate of false positives can be over 30% in the absence of appropriate statistical defenses. We will discuss the defenses below.

Returning to the issue of errors, we will term the error in which we overlook a true signal a 'False Negative.' We make this error when we mistakenly declare that we do not see a treatment difference, when one actually exists. In such cases, we have overlooked something we would be interested in knowing. This is called a Type II error, and the probability of such an event happening is called the β-risk.

Our situation is shown in Figure 1 below. There are two ways to draw the correct conclusion, and two ways to get it wrong.

[Figure 1: the four possible decision outcomes, two correct and two in error]


The terms Type I and Type II are largely used by statisticians.

Errors versus Mistakes

At this point, we must distinguish between errors and mistakes. We make an error when we draw conclusions that do not agree with the long-term behavior of our system due to the effects of background noise alone. This is not a mistake. A mistake is a blunder. If there is a special situation or event that has a consequence that influences our decision, we have made a mistake. An example of a mistake is using the wrong amount of an additive (mis-weighing), but concluding that the resulting difference was due to the ingredient choice.

Symptoms

The basic symptom of the false positive is irreproducible results. A false positive is usually discovered after further study of the 'interesting' difference that has been declared. The experimentalist finds that the result is not repeatable. Additional consequences often include the expenditure of significant resources trying to figure out how to get the 'good' result again, when actually the result was due to noise. Another significant consequence is the loss of credibility for the researcher (or the lab that generated the data; more about this later). His or her work comes under question if a significant number of these situations occur.
The worst-case scenario for programs with high rates of false positives is new product candidate leads that do not pan out under further investigation. Usually, these further investigations involve more expensive tests, additional tests, and an increased work load. In the worst cases, field-trial failures can result when candidates are moved forward based on false positives. Often it is concluded that the candidates were moved forward too fast.
On the other hand, there are typically no symptoms of false negatives. These are silent errors. Typically, we walk away from the experimental area of study when we declare that there is nothing of interest in the results. Rarely do we go back and re-examine areas that we believe are without interest. Finally, if there is an important discovery that we have overlooked, a false negative creates an opportunity for the competition.

Noise and Its Impact

Background noise and its misinterpretation put a scientist, and his or her business patrons, at risk of drawing the incorrect conclusion and expending R&D resources in pursuit of erroneous hypotheses. The minimization of these errors is the reason to engage in experimental design. It is the major reason to use techniques like the hyper-Greco-Latin square in formulation development.

Variation

These difficulties stem from the fact that all things vary. We can never actually 'repeat' a batch, a run, or a reaction exactly. We will get results that are (usually) similar to but not exactly like our previous experience. Thus, in any comparison we are measuring both the treatment differences and the background noise simultaneously.

Strategy

Minimizing false positive declarations requires a clear expectation of what magnitude of differences we might reasonably expect due to background noise alone. This is like the 'blank' used by analytical chemists. We run comparisons and ask the following question.
"Is the difference I measure unusual or unexpected based upon what I know about the background noise of my system?"
We then protect ourselves from false positives by making positive declarations only if we can answer this question affirmatively. Furthermore, we can decide before we run the experiments how unusual the difference needs to be for us to call it a signal. We ask the following question:
"What is the probability that such a difference could come from background noise alone?"

We use a table of background noise to address this question. This table is the (often misunderstood) Student's t-table.
What is the key to not overlooking true signals? Minimizing false negative declarations requires obtaining enough of the right kind of information to defend against the background noise of the system. We need to average together replicated experiments to 'smooth out' the noise so that the signals are more evident. By using averages we make our comparisons more sensitive.
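As a minimal sketch of this strategy (not part of the original paper), the comparison below uses made-up replicate results for two hypothetical additives, pools their scatter as an estimate of the background noise, and asks whether the observed difference is unusual relative to a critical value taken from the Student's t-table.

    from statistics import mean, stdev

    # Hypothetical replicate results (e.g., a wear-test response) for two additives.
    additive_a = [315, 322, 308, 318, 320]
    additive_b = [300, 305, 298, 310, 302]

    n_a, n_b = len(additive_a), len(additive_b)
    mean_a, mean_b = mean(additive_a), mean(additive_b)

    # Pool the scatter of the two samples to estimate the background noise.
    pooled_var = ((n_a - 1) * stdev(additive_a) ** 2 +
                  (n_b - 1) * stdev(additive_b) ** 2) / (n_a + n_b - 2)
    pooled_sd = pooled_var ** 0.5

    # The t statistic expresses the observed difference in units of its own noise.
    t = (mean_a - mean_b) / (pooled_sd * (1 / n_a + 1 / n_b) ** 0.5)

    # Critical value from the Student's t-table: 8 degrees of freedom, 5% two-sided risk.
    t_critical = 2.306

    print(f"difference = {mean_a - mean_b:.1f}, t = {t:.2f}")
    if abs(t) > t_critical:
        print("Unusual relative to the background noise: declare a difference.")
    else:
        print("Within the background noise: make no positive declaration.")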

How Much Data Do I Need?

Another outcome of the above discussion is this. The amount of data to be averaged can usually be calculated at the outset of any experimental project. One can then decide whether the information is worth the cost and redirect the resources elsewhere if it is not. This is not a bad thing to decide. It is far better than engaging in a project with the vain hope that we can get an answer quickly and/or cheaply.
There are two major benefits to this approach. The first is that we don't fall into the 'just a few more experiments' trap. In my experience, a significant portion of some R&D budgets gets consumed this way. The other benefit is that enough data will be collected before the opportunity to get more data is gone. Field trials with insufficient tests are a classic example that can be avoided. Another is that we will manufacture enough test grease of each type once we know how many tests must be performed.
How much data one needs is dependent upon three things:

  1. How variable the system is. What is the magnitude of the background noise of the system?
  2. How small a difference you want to be reasonably confident of detecting. What is the magnitude of your signal threshold?
  3. How confident you want to be in your conclusions. How much risk are you willing to accept? What is the likelihood that you will make an incorrect declaration?

The mathematical formula is given below.

N = 2s²D²/d²

in which

N is the number of replicated experiments required at each of the two levels,
s is the standard deviation of the population from which the experiments will be taken,
d is the true signal size that one wants to be reasonably sure of detecting, should it really exist, and
D is a constant that incorporates the level of risk that the experimentalist is willing to accept. D = 3.0 if we are willing to accept 5% false positives and 15% false negatives.

We will not dwell on this equation, but it is the fundamental relationship between sample size, signal, noise, and confidence in decision-making.
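As an illustration only, the relationship can be evaluated directly. The sketch below (with hypothetical numbers, not taken from the paper) computes the required replication and rounds up to a whole number of experiments.

    import math

    def replicates_needed(noise_sd, signal, risk_constant=3.0):
        """Replicates per treatment from N = 2 * (D * s / d)**2, rounded up.

        noise_sd      -- s, the standard deviation of the background noise
        signal        -- d, the smallest true difference worth detecting
        risk_constant -- D, 3.0 for roughly 5% false positives and 15% false negatives
        """
        return math.ceil(2 * (risk_constant * noise_sd / signal) ** 2)

    # Hypothetical example: noise of 2 units, smallest interesting difference of 4 units.
    print(replicates_needed(noise_sd=2.0, signal=4.0))  # 5 replicates per treatment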

This relationship has several consequences. For example, the larger the background noise, the more data one needs to detect a difference of any given size at a given level of confidence. Conversely, for a given signal size and amount of data, the larger the noise, the less confident one can be in the decision.

R&D Strategy & Experimental Design

Let us return to the issue of developing a formulation, and choosing winning combinations of ingredients. We now have in hand a framework to explain why the one experiment at a time approach is fraught with difficulties, and why we need to average together several (sometimes many) replicated experiments before we make a decision. However, without some care, this averaging increases our work load and hence our costs. We cannot afford to do this without some way to improve our efficiency.

What we need is a method to combine the right number of replicated experiments in a particular way that gives answers to more than one question at a time. There are many such schemes. Collectively, they are called 'experimental design.'

Experimental design is a collection of rigorous methods for obtaining information about any experimental system under study. The reason to learn and use them is to obtain unambiguous results at minimum cost.

There are several basic types of designs. Those that examine many possible factors and separate which are potent from which are weak are called screening designs. These screen out the interesting from the useless. We can subsequently focus our attention on the interesting.

Early in a formulation development plan, we often wish to pick winning combinations of ingredients. We will optimize their levels, and the production factors such as temperature and time, later. First we need to know which ingredients we wish to study. One such family of designs, the Latin squares, has been used repeatedly in functional mixture development. This author collaborated with Dr. Carl E. Ward in planning to use such a design for grease formulation ingredient selection.

The Hyper-Greco-Latin Square

Let us consider some of the work of Ward and Littlefield, which was reported at last year's NLGI conference. That publication received the Clarence E. Earl Memorial Award for the best technical paper of last year's meeting. Briefly, the work represents the sequential use of experimental design tools in a formulation development program. They used a Latin-square experimental design as a first step. This method allowed them to select the best of a group of 5 choices for each of 3 ingredients. This was the opening move in their program of developing a new grease. The diagram of the Latin square is shown in Figure 2.

[Figure 2: the 5 × 5 Latin square, with E/P agents as columns, rust inhibitors as rows, and copper passivators as letters in the cells]

Ward and Littlefield examined 5 different EP/anti-wear agents, 5 different rust inhibitors, and 5 different copper passivators. Note that the columns and rows contain, respectively, the 5 levels of E/P agents and rust inhibitors. Each of the squares contains one copper passivator. Each square represents a single experimental preparation.

Note that no pair-wise combination of any two ingredients occurs more than once. For example, the E/P anti-wear agent "C2" combined with the rust inhibitor "R4" occurs only once. This is true for each column and row. Each copper passivator occurs once in each row and once in each column. Each square has a single choice of each ingredient type. Also, the plan is balanced in that each ingredient of a given type is used in 5 of the formulations. Thus, the average of the 5 responses from Row 1 can be compared with the averages of the 5 responses from each of Rows 2 through 5. Likewise, the averages of the 5 columns are compared among each other to determine the best-performing E/P agent. Finally, the averages of the 5 experiments for each letter, A through E, are compared. Thus, the best choice of each ingredient can be found easily. Note that in each average, all levels of each of the other factors occur exactly once, so their contributions, in effect, cancel out.
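As an aside not taken from the paper, a Latin square of this size can be generated by cyclically shifting the letter sequence one position per row. The sketch below builds such a square and checks the balance property just described; the assignment of letters to cells is illustrative and is not the Ward-Littlefield layout.

    letters = ["A", "B", "C", "D", "E"]   # e.g., the five copper passivators
    n = len(letters)

    # Cyclic construction: shift the letter sequence one position for each new row.
    square = [[letters[(row + col) % n] for col in range(n)] for row in range(n)]

    for row in square:
        print(" ".join(row))

    # Balance check: every letter appears exactly once in each row and each column.
    assert all(sorted(row) == letters for row in square)
    assert all(sorted(square[r][c] for r in range(n)) == letters for c in range(n))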

This design represents 125 possible combinations. Ward and Littlefield made all of their comparisons with the power of averages of 5 versus averages of 5, and they did so with only 25 experiments. They then prepared a prediction equation and calculated the expected outcome of all 125 possible combinations. The predictive ability of the equation was tested by preparing those formulations that were expected to have superior performance. Thus, they were able to identify winning combinations with a minimum of effort and a maximum of confidence.
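One common way to form such a prediction equation is an additive (main-effects) model: estimate each ingredient level's effect as the deviation of its 5-run average from the grand mean, then add the effects. The sketch below uses made-up responses and a cyclic square, not the Ward-Littlefield data, and is only meant to show the mechanics of ranking all 125 combinations.

    import itertools
    import random
    from statistics import mean

    random.seed(1)
    levels = range(5)

    # Made-up responses for the 25 Latin-square runs, keyed by
    # (EP agent, rust inhibitor, copper passivator).  In a cyclic square the
    # passivator index is (row + column) mod 5.
    runs = {(ep, ri, (ep + ri) % 5): random.uniform(50, 100)
            for ep in levels for ri in levels}

    grand_mean = mean(runs.values())

    def effect(factor, level):
        """Deviation of the 5-run average at this level from the grand mean."""
        return mean(y for key, y in runs.items() if key[factor] == level) - grand_mean

    def predict(ep, ri, cp):
        """Additive prediction equation: grand mean plus the three main effects."""
        return grand_mean + effect(0, ep) + effect(1, ri) + effect(2, cp)

    # Score all 125 possible combinations and rank them; the top few would be
    # prepared and tested to check the prediction.
    ranked = sorted(itertools.product(levels, repeat=3),
                    key=lambda combo: predict(*combo), reverse=True)
    print("best predicted combination (EP, rust inhibitor, Cu passivator):", ranked[0])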

Twenty-five may seem like a lot of experiments, but they examined a total of 125 possible combinations (5³) with the power of averaging 5 vs. 5. This represents an effort of 625 separate preparations, examined with the effort of only 25 experiments. This is an extremely efficient use of one's experimental and testing effort.

Adding One More Variable

Yet, as efficient as buying the results of an effort of 625 experiments for the cost of 25 appears, this design can accommodate additional variables. Consider studying an oxidation inhibitor, in addition to the Ward-Littlefield list of 3. If we do this, we now have 25 experiments in which we examine 5 levels of each of 4 different ingredients, all with an average of 5 vs. an average of 5. This is shown in Figure 3.

[Figure 3: the Greco-Latin square, with the fourth ingredient added as Greek letters]

This Greco-Latin square represents a total of 625 combinations (5⁴) and, with averaging, the effort of 3,125 separate experiments. Again, this is done in only 25 experiments. Each row has each of A, B, C, D, and E only once. Each row also has each of α, β, γ, δ, and ε only once. The same is true for each column. Therefore, we have maintained the balance of the Latin square. Now that we have added Greek letters, the design is called a Greco-Latin square design.

Does it stop here? No! In fact, two more factors may be examined at 5 levels each with these same 25 experiments. We add each as a group of 5 levels. These highly saturated designs are called hyper-Greco-Latin squares. Consider Figure 4, below.

[Figure 4: the hyper-Greco-Latin square, carrying 6 ingredients at 5 levels each in 25 cells]

We wish to examine 6 different factors, each with 5 possible choices. The factors could be anti-oxidants, metal passivators, E/P anti-wear agents, viscosity modifiers, thickeners, thickener complexes, dyes, or any other additive to our formulation. In this 6-ingredient, 5-level case we have 15,625 possible combinations of ingredients (5⁶). We will still consider the contribution of each with the confidence of an average of 5 compared with an average of 5. This represents the effort of 78,125 separate experiments. Again, we will examine them with only 25 experiments. This is efficiency!
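For a square whose side is a prime number, such as 5, a standard construction yields the additional orthogonal squares needed: square k places level (row + k × column) mod 5 in each cell. The sketch below is an assumed layout, not the one in Figure 4; it builds the four squares and verifies that they are mutually orthogonal, so that, together with the row and column assignments, six factors ride on the same 25 runs.

    n = 5  # the square's side; the construction below needs a prime number

    def latin_square(step):
        """5 x 5 square whose cell (r, c) holds level (r + step * c) mod n."""
        return [[(r + step * c) % n for c in range(n)] for r in range(n)]

    squares = {step: latin_square(step) for step in range(1, n)}  # steps 1..4

    def orthogonal(sq1, sq2):
        """Orthogonal means every ordered pair of levels occurs exactly once."""
        pairs = {(sq1[r][c], sq2[r][c]) for r in range(n) for c in range(n)}
        return len(pairs) == n * n

    # The four squares are mutually orthogonal, so no pair-wise combination of
    # ingredient choices is repeated anywhere in the 25 runs.
    assert all(orthogonal(squares[i], squares[j])
               for i in squares for j in squares if i < j)

    # One run of the design: the row index carries factor 1, the column index
    # factor 2, and the four squares carry factors 3 through 6.
    r, c = 2, 3
    print("levels for the run at row 2, column 3:",
          [r, c] + [squares[k][r][c] for k in range(1, n)])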

Although there is not time to go through an actual case study of this design, the method has been used in many industrial settings. Paints, inks, plastics, and pesticides, and now greases, have been developed through the use of this experimental design.

Conclusions

The Hyper-Greco-Latin Square is a robust design for screening the best combinations of ingredients from the many possibilities. The mathematics for its effective use is simple and understandable. This method deserves to be considered the next time a functional mixture formulation is the goal of an R&D effort.

References

  1. Box, George E. P., William G. Hunter, and J. Stuart Hunter; 'Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building'; John Wiley & Sons, 1978.

  2. Ward, Carl E., and Carlos E. Littlefield; 'Experimental Design in New Grease Development'; presented to the 61st Annual Meeting of the National Lubricating Grease Institute; October 23-26, 1994; Palm Springs, CA.
