Approximate Local D-optimal Experimental Design for Binary Response

Technical Report RP-SOR-0501

Hovav A. Dror and David M. Steinberg

Department of Statistics and Operations Research

Raymond and Beverly Sackler Faculty of Exact Sciences

Tel Aviv University

Ramat Aviv 69978

Israel

Email: dms@post.tau.ac.il

November 2005

Abstract: A fast and simple method is proposed that produces approximate multivariate local D-optimal designs of high efficiency for models with binary response. The method assumes availability of a D-optimal design for a parallel normal response linear problem that has the same linear predictor, with an assumption of homogeneous variance; the change required to transform the standard design into an efficient one for a multivariate logit or probit model is to shift any design point whose probability is very low or very high (and is therefore non-informative) into the nearest feasible point of moderate probability.

KEY WORDS: Generalized Linear Models; Logistic Response; Logit; Probit; Design of Experiments

1. INTRODUCTION

Construction of an experimental design for a generalized linear model presents a level of complexity greater than that required for a model with a normally distributed error term of constant variance. Finding an optimal design for a linear model is already a numerically intensive optimization problem. However, some linear models can be characterized and have a known trivial optimal design, and designs for more complicated models can be sought through available software such as "gosset" (Hardin and Sloane 1993), the statistical toolbox in MATLAB (The MathWorks, Inc.), JMP or the SAS OPTEX procedure. Extension from a linear model to a logistic one does not retain these qualities. Trivial solutions to linear model design problems are often factorial designs, utilizing the corners of the design region; for a logistic response, some of these corners have probabilities that are close to zero or one, so that the responses are almost deterministic and are therefore non-informative. Furthermore, the optimal design for a binary response depends on the unknown coefficients, and hence two experiments having the same model but different coefficient values will typically have different optimal designs. Common software packages assume all observations have homogeneous variance and so do not provide a remedy for generalized linear models.

This paper proposes a simple method for constructing an approximate local D-optimal design for a multivariate binary response problem with the logit or probit link.

2. EXPERIMENTAL DESIGN FOR GENERALIZED LINEAR MODELS

Most work on Experimental Design focuses on linear models with a continuous response. A common assumption is that the error term of the model is normally distributed and that its variance is constant over the design region. These assumptions are not met (even asymptotically) with the popular logistic model or with the probit model for binary response data. Standard procedure for the analysis of these models uses iteratively reweighted least squares, which asymptotically leads to weights that enable solving the problem as a simple linear model (McCullagh and Nelder 1989). These weights depend on the parameters of the problem and are generally different for two binary response problems of the same structure but different true (unknown) coefficients.

Frequently, Experimental Design uses optimality criteria based on Fisher's Information Matrix. Of these, D-optimality is the most intensively studied. For the normal response setting with $y = \eta + \varepsilon$ and $\eta = F\beta$, D-optimality maximizes the determinant of the information matrix $F^T F$ and so minimizes the volume of the confidence ellipsoid for the unknown coefficients $\beta$. Atkinson & Donev (1992) note that the assumptions of normality and constancy of variance enter the optimal design through the information matrix. They indicate that instead of $F^T F$, other matrices are appropriate for non-normal distributions, and that once the appropriate matrix has been defined the principles and practice of optimal experimental design are similar to continuous response problems. This means that the optimal experimental design is derived from the information matrix of the parameters, $F^T W F$; the matrix of weights, $W$, depends on the unknown parameters and on the design. This dependence creates a difficulty in constructing an optimal experimental design.
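
The dependence of the weighted information matrix on the coefficients can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: it uses the logit link, for which the weight at a design point is p(1 - p), a 2^2 factorial with an intercept as the regression matrix, and two arbitrary coefficient vectors chosen for the example; the function names are hypothetical.

```python
import math

def logistic_weights(F, beta):
    """Logit-link weights p(1 - p) for each row of the regression matrix F."""
    weights = []
    for row in F:
        eta = sum(f * b for f, b in zip(row, beta))
        p = 1.0 / (1.0 + math.exp(-eta))
        weights.append(p * (1.0 - p))
    return weights

def information_matrix(F, weights):
    """Return F^T W F as nested lists, with W = diag(weights)."""
    k = len(F[0])
    return [[sum(w * row[a] * row[b] for row, w in zip(F, weights))
             for b in range(k)] for a in range(k)]

def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [r[:] for r in M]
    n = len(A)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < 1e-15:
            return 0.0
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            det = -det
        det *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for j in range(c, n):
                A[r][j] -= factor * A[c][j]
    return det

# 2^2 factorial with an intercept column: each row is (1, x1, x2).
F = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]]

# Normal response, constant variance: W is the identity.
det_linear = determinant(information_matrix(F, [1.0] * len(F)))

# Logistic response under two illustrative (assumed) coefficient vectors.
det_small = determinant(information_matrix(F, logistic_weights(F, [0.0, 1.0, 1.0])))
det_large = determinant(information_matrix(F, logistic_weights(F, [0.0, 3.0, 3.0])))

print(det_linear, det_small, det_large)
```

With the larger coefficients, two of the four corners have probabilities very close to zero or one, their weights nearly vanish, and the determinant for the same design drops sharply.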

2.1 Local D-Optimal Designs

Because a D-optimal design depends on the coefficients, its derivation requires us to make some assumptions on the values of the unknowns we want to estimate. When we use an initial estimate (rather than a Bayesian a-priori distribution, for instance) we produce a design that is optimal only locally, for the coefficient values that were adopted. If the initial estimate is poor, the design's performance may be far from optimal. D-optimal designs that are based on an initial estimate for the unknown coefficient values are designated as local D-optimal designs.

2.2 Local D-Optimal Designs for a logistic model

Abdelbasit & Plackett (1983) discuss the construction of local D-optimal designs for binary response with one explanatory variable. They show that for an unconstrained univariate problem with the model $\mathrm{logit}(p) = \beta_0 + \beta_1 x$ the local D-optimal design places half the points where the estimated probability is 0.824 and the other half at the location with probability 0.176. Univariate binary response has been further considered by many authors; see, for example, Atkinson & Donev (1992), Sitter (1992), Hedayat, Yan & Pezzuto (1997), or Mathew & Sinha (2001).
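
The values 0.824 and 0.176 can be checked numerically with a short Python sketch. It assumes, for simplicity, that the search is restricted to symmetric two-point designs with half the runs at linear predictor -c and half at +c; for such a design the determinant of the weighted information matrix is proportional to (c p(1 - p))^2, so it suffices to maximize c p(1 - p). The function names and the ternary search are illustrative choices, not taken from the cited paper.

```python
import math

def logistic(eta):
    """Logistic probability for a given value of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))

def root_det(c):
    """Square root of det(F^T W F), up to a constant factor, for a design
    with half the runs at linear predictor -c and half at +c."""
    p = logistic(c)
    return c * p * (1.0 - p)

def maximize_unimodal(f, lo, hi, tol=1e-10):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

c_opt = maximize_unimodal(root_det, 0.0, 10.0)
p_high = logistic(c_opt)
p_low = 1.0 - p_high
print(round(p_high, 3), round(p_low, 3))  # prints 0.824 0.176
```

The optimal design points in x then follow from the initial coefficient estimates by solving $\beta_0 + \beta_1 x = \pm c$, which is why the design is only locally optimal.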

Although the final coordinates creating the optimal design are found numerically, the function to maximize is derived through analytical procedures. Increasing the number of unknowns complicates both the analytical route and the numerical process. This is probably the reason not much has been published on multivariate design problems. Chipman and Welch (1996) compare D-optimal designs for Generalized Linear Models over a constrained region to linear regression D-optimal designs, including multifactor problems. Their comparison was based on computer generated D-optimal designs, using weighted linear models. This way they show the general phenomenon that points in the linear model optimal design may "move in" from the edges of the design region for the logistic model. Sitter and Torsney (1995) analyze D- and c-optimal designs for binary response experiments with two design variables and various link functions; Smith and Ridout (2003) obtain optimal Bayesian designs for bioassays involving two parallel dose response relationships where the main interest is in doses of one substance; Atkinson, Demetrio & Zocchi (1995) describe a dose response experiment with two variables, one continuous and one indicator variable, considering the special case in which the value of the indicator variable is unknown during the process of the experiment, but is available for the posterior analysis. Atkinson (2005) gives examples of first order D-optimal designs with two variables and discusses usage of standard factorial design as an approximation, similar to the method proposed here; but, assuming one needs to search for the approximate design over irregular design regions, Atkinson concludes that "Searching over the original design space to find optimal design for the generalized linear model would both be easier and lead to a more efficient design than would trying to find such a regression approximation".

Note that the treatment described so far was limited to a first order model with two variables. Woods, Lewis, Eccleston and Russell (2005) offer a method of creating multivariate compromise designs that are robust to uncertainty in aspects of the model, including the uncertainty in the coefficient values, different choices of a link function and various models, including interactions. As can be expected for such complex designs, their method requires intensive computation.

We will now suggest a simple method of constructing approximate local D-optimal designs. This method is not limited to a small number of covariates, or to first-order designs without interactions. It can produce approximate designs almost instantaneously, as it does not require intensive computation, and it is straightforward to implement. There are two cornerstones for this algorithm. First, one needs an optimal design for an analogous linear model (that is, the optimal design for a problem with the same formulation and a constant variance). Second, we compute the probability at the design points according to the specific initial coefficient values.
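
The second cornerstone, evaluating the probability at each point of the linear-model design, can be sketched in Python as follows. This is a minimal illustration under several assumptions: a first-order logit model without interactions, a 2^2 factorial as the linear-model design, an arbitrary initial coefficient estimate, and hypothetical cut-offs of 0.05 and 0.95 for "very low" and "very high" probability. It only identifies the points that would need to be shifted; the shift itself is not shown.

```python
import math
from itertools import product

def logistic_probability(point, beta):
    """P(y = 1) under a first-order logit model; beta[0] is the intercept."""
    eta = beta[0] + sum(b * x for b, x in zip(beta[1:], point))
    return 1.0 / (1.0 + math.exp(-eta))

def extreme_points(design, beta, low=0.05, high=0.95):
    """Return the design points whose probability falls outside [low, high].

    The cut-offs 0.05 and 0.95 are hypothetical, chosen only for illustration.
    """
    return [pt for pt in design
            if not low <= logistic_probability(pt, beta) <= high]

# Assumed inputs: a 2^2 factorial as the linear-model design and an
# illustrative initial estimate of the coefficients.
design = list(product([-1, 1], repeat=2))
beta_guess = [0.0, 2.0, 2.0]

probs = [logistic_probability(pt, beta_guess) for pt in design]
flagged = extreme_points(design, beta_guess)

for pt, p in zip(design, probs):
    print(pt, round(p, 3))
print("points with extreme probability:", flagged)
```

For these assumed coefficients, the corners (-1, -1) and (1, 1) have linear predictors of -4 and 4 and are flagged, while the other two corners have probability 0.5 and are kept.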

3. REGION OF LINEARITY

We are considering a binary response with the logistic link. That is, $p_i = \frac{e^{F_i \beta}}{1 + e^{F_i \beta}}$, $F_i$ being the i-th row of the regression matrix $F$. We attempt to find an approximation for a local optimal design. As explained, the term "local" implies that the design relates to particular values of the coefficients.

Note that while both $\beta$ and feasible points for the design space may be of any di