Title: | Gauss - Dantzig Selector: Aggregation over Random Models |
---|---|
Description: | The method aims to identify important factors in screening experiments by aggregation over random models as studied in Singh and Stufken (2022) <doi:10.48550/arXiv.2205.13497>. This package provides functions to run the Gauss-Dantzig selector on screening experiments when interactions may be affecting the response. Currently, all functions require each factor to be at two levels coded as +1 and -1. |
Authors: | Rakhi Singh [cre, aut] , John Stufken [aut] |
Maintainer: | Rakhi Singh <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.0 |
Built: | 2024-11-08 03:11:23 UTC |
Source: | https://github.com/agrakhi/gdsarm |
The Dantzig selector (DS) finds a solution for the model parameters
of a linear model, beta
using linear programming. For a given delta
,
DS minimizes the L_1-norm (sum of absolute values) of beta
subject to the constraint
that max(|t(X)(y-X * beta)|) <= delta
.
dantzig.delta(X, y, delta, plot = FALSE)
dantzig.delta(X, y, delta, plot = FALSE)
X |
a design matrix. |
y |
a vector of responses. |
delta |
a vector with the values of |
plot |
a boolean value of either TRUE or FALSE with TRUE indicating that the profile plot should be drawn. |
A matrix of the estimated values of beta
with each
row corresponding to a particular value of delta
.
Cand\'es, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics 35 (6), 2313–2351.
Phoa, F. K., Pan, Y. H. and Xu, H. (2009). Analysis of supersaturated designs via the Dantzig selector. Journal of Statistical Planning and Inference 139 (7), 2362–2372.
data(dataHamadaWu) X = dataHamadaWu[,-8] Y = dataHamadaWu[,8] #scale and center X and y scaleX = base::scale(X, center= TRUE, scale = TRUE) scaleY = base::scale(Y, center= TRUE, scale = FALSE) maxDelta = max(abs(t(scaleX)%*%matrix(scaleY, ncol=1))) # Dantzig Selector on 4 equally spaced delta values between 0 and maxDelta dantzig.delta(scaleX, scaleY, delta = seq(0,maxDelta,length.out=4))
data(dataHamadaWu) X = dataHamadaWu[,-8] Y = dataHamadaWu[,8] #scale and center X and y scaleX = base::scale(X, center= TRUE, scale = TRUE) scaleY = base::scale(Y, center= TRUE, scale = FALSE) maxDelta = max(abs(t(scaleX)%*%matrix(scaleY, ncol=1))) # Dantzig Selector on 4 equally spaced delta values between 0 and maxDelta dantzig.delta(scaleX, scaleY, delta = seq(0,maxDelta,length.out=4))
An analytical experiment conducted by Dopico-Garc\' ia et al. (2007) to characterize the chemical composition of white grapes simultaneously determining the most important phenolic compounds and organic acids for the grapes. This example has been further studied in Phoa et al. (2009b) for one phenolic compound, kaempferol-3-Orutinoside + isorhamnetin-3-O glucoside, which is also what we studied. It is accepted for these data that fitting a main-effects model suggests that V3 (Factor C), V4 (Factor D), and interaction V1:V4 (A:D) are active effects.
data(dataCompoundExt)
data(dataCompoundExt)
A data frame with 12 rows and 9 columns:
Factor A
Factor B
Factor C
Factor D
Factor E
Factor F
Factor G
Factor H
Response
Dopico-Garc\' ia, M.S., Valentao, P., Guerra, L., Andrade, P. B., and Seabra, R. M. (2007). Experimental design for extraction and quantification of phenolic compounds and organic acids in white "Vinho Verde" grapes Analytica Chimica Acta, 583(1): 15–22.
Phoa, F. K., Wong, W. K., and Xu, H (2009b). The need of considering the interactions in the analysis of screening designs. Journal of Chemometrics: A Journal of the Chemometrics Society, 23(10): 545–553.
data(dataCompoundExt) X = dataCompoundExt[,-9] Y= dataCompoundExt[,9]
data(dataCompoundExt) X = dataCompoundExt[,-9] Y= dataCompoundExt[,9]
A cast fatigue experiment with 12 runs and 7 factors was originally studied by Hunter et al. (1982), and was later revisited by Hamada and Wu (1992) and Phoa et al. (2009), among others. It is widely accepted for these data that V6 (F) and interaction V6:V7 (F:G) are active effects, with interaction of V1:V5 (A:E) possibly being active as well.
data(dataHamadaWu)
data(dataHamadaWu)
A data frame with 12 rows and 8 columns:
Factor A
Factor B
Factor C
Factor D
Factor E
Factor F
Factor G
Response
Hamada, M. and C. F. J. Wu (1992). Analysis of designed experiments with complex aliasing. Journal of Quality Technology 24 (3), 130–137.
Hunter, G. B., F. S. Hodi, and T. W. Eagar (1982). High cycle fatigue of weld repaired cast Ti-6AI-4V. Metallurgical Transactions A 13 (9), 1589–1594.
Phoa, F. K., Y. H. Pan, and H. Xu (2009). Analysis of supersaturated designs via the Dantzig selector. Journal of Statistical Planning and Inference 139 (7), 2362–2372.
data(dataHamadaWu) X = dataHamadaWu[,-8] Y= dataHamadaWu[,8]
data(dataHamadaWu) X = dataHamadaWu[,-8] Y= dataHamadaWu[,8]
This function runs the Gauss-Dantzig selector on the given columns.
We have two options: either (a) GDS(m) on the m
main
effects, and (b) GDS(m+2fi) on the m
main effects and the corresponding two-factor interactions.
For a given delta
, DS minimizes the L_1-norm (sum of absolute values)
of beta
subject to the constraint that max(|t(X)(y-X * beta)|)
<= delta
.
The GDS is run for multiple values of delta
. We use kmeans and BIC to select a best model.
GDS_givencols(delta.n = 10, design, Y, which.cols = c("main2fi"))
GDS_givencols(delta.n = 10, design, Y, which.cols = c("main2fi"))
delta.n |
a positive integer suggesting the number of delta values
to be tried. |
design |
a |
Y |
a vector of |
which.cols |
a string with either |
A list returning the selected effects as well as the corresponding important factors.
Cand\'es, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics 35 (6), 2313–2351.
Dopico-Garc\' ia, M.S., Valentao, P., Guerra, L., Andrade, P. B., and Seabra, R. M. (2007). Experimental design for extraction and quantification of phenolic compounds and organic acids in white "Vinho Verde" grapes Analytica Chimica Acta, 583(1): 15–22.
Hamada, M. and Wu, C. F. J. (1992). Analysis of designed experiments with complex aliasing. Journal of Quality Technology 24 (3), 130–137.
Hunter, G. B., Hodi, F. S. and Eagar, T. W. (1982). High cycle fatigue of weld repaired cast Ti-6AI-4V. Metallurgical Transactions A 13 (9), 1589–1594.
Phoa, F. K., Pan, Y. H. and Xu, H. (2009). Analysis of supersaturated designs via the Dantzig selector. Journal of Statistical Planning and Inference 139 (7), 2362–2372.
Singh, R. and Stufken, J. (2022). Factor selection in screening experiments by aggregation over random models, 1–31. doi:10.48550/arXiv.2205.13497
data(dataHamadaWu) X = dataHamadaWu[,-8] Y = dataHamadaWu[,8] delta.n = 10 # GDS on main effects GDS_givencols(delta.n, design = X, Y=Y, which.cols = "main") # GDS on main effects and two-factor interactions GDS_givencols(delta.n, design = X, Y=Y) data(dataCompoundExt) X = dataCompoundExt[,-9] Y = dataCompoundExt[,9] delta.n = 10 # GDS on main effects GDS_givencols(delta.n, design = X, Y=Y, which.cols = "main") # GDS on main effects and two-factor interactions GDS_givencols(delta.n, design = X, Y=Y, which.cols = "main2fi")
data(dataHamadaWu) X = dataHamadaWu[,-8] Y = dataHamadaWu[,8] delta.n = 10 # GDS on main effects GDS_givencols(delta.n, design = X, Y=Y, which.cols = "main") # GDS on main effects and two-factor interactions GDS_givencols(delta.n, design = X, Y=Y) data(dataCompoundExt) X = dataCompoundExt[,-9] Y = dataCompoundExt[,9] delta.n = 10 # GDS on main effects GDS_givencols(delta.n, design = X, Y=Y, which.cols = "main") # GDS on main effects and two-factor interactions GDS_givencols(delta.n, design = X, Y=Y, which.cols = "main2fi")
The GDS-ARM procedure consists of three steps. First, it runs
the Gauss Dantzig Selector (GDS) nrep
times, each time
with a different set of nint
randomly selected two-factor interactions.
All m
main effects are included in each GDS run. Second, the best
ntop
models are identified with the smallest BIC. Effects that
appear in at least pkeep x ntop
of the ntop
models are then passed on to the third stage. In the third stage, stepwise
regression is used. With
n
being the number of runs, the stepwise regression starts
with at most n-3
selected effects from the previous step. The
remaining effects from the previous step as well as all main effects are
given a chance to enter into the model using the forward-backward stepwise
regression. The function also has the option of using the modified GDS-ARM.
The modified version incorporates effect heredity in two steps, first, for
each model found by GDS, we ignore active interactions when at least one of
the main effects is not active (for weak heredity) or when both main effects are not
active (for strong heredity); and second, we do the same for the model found
after the stepwise stage of GDS-ARM.
GDSARM( delta.n = 10, nint, nrep, ntop, pkeep, design, Y, cri.penter = 0.01, cri.premove = 0.05, opt.heredity = c("none"), seedvalue = 1234 )
GDSARM( delta.n = 10, nint, nrep, ntop, pkeep, design, Y, cri.penter = 0.01, cri.premove = 0.05, opt.heredity = c("none"), seedvalue = 1234 )
delta.n |
a positive integer suggesting the number of delta values
to be tried. |
nint |
a positive integer representing the number of randomly
chosen interactions. The suggested value to use is the ceiling of 20%
of the total number of interactions, that is, for |
nrep |
a positive integer representing the number of times GDS should
be run. The suggested value is |
ntop |
a positive integer representing the number of top models
to be selected among the |
pkeep |
a number between 0 and 1 representing the proportion of |
design |
a |
Y |
a vector of |
cri.penter |
the p-value cutoff for the most significant effect to enter into the stepwise regression model. The suggested value is 0.01. |
cri.premove |
the p-value cutoff for the least significant effect to exit from the stepwise regression model. The suggested value is 0.05. |
opt.heredity |
a string with either |
seedvalue |
a seed value that will fix the set of interactions being selected. The default value is seed to 1234. |
A list returning the selected effects as well as the corresponding important factors.
Cand\'es, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics 35 (6), 2313–2351.
Dopico-Garc\' ia, M.S., Valentao, P., Guerra, L., Andrade, P. B., and Seabra, R. M. (2007). Experimental design for extraction and quantification of phenolic compounds and organic acids in white "Vinho Verde" grapes Analytica Chimica Acta, 583(1): 15–22.
Hamada, M. and Wu, C. F. J. (1992). Analysis of designed experiments with complex aliasing. Journal of Quality Technology 24 (3), 130–137.
Hunter, G. B., Hodi, F. S. and Eagar, T. W. (1982). High cycle fatigue of weld repaired cast Ti-6AI-4V. Metallurgical Transactions A 13 (9), 1589–1594.
Phoa, F. K., Pan, Y. H. and Xu, H. (2009). Analysis of supersaturated designs via the Dantzig selector. Journal of Statistical Planning and Inference 139 (7), 2362–2372.
Singh, R. and Stufken, J. (2022). Factor selection in screening experiments by aggregation over random models, 1–31. doi:10.48550/arXiv.2205.13497
data(dataHamadaWu) X = dataHamadaWu[,-8] Y = dataHamadaWu[,8] delta.n = 10 n = dim(X)[1] m = dim(X)[2] nint = ceiling(0.2*choose(m,2)) nrep = choose(m,2) ntop = max(20, nint*nrep/(2*choose(m,2))) pkeep = 0.25 cri.penter = 0.01 cri.premove = 0.05 design = X # GDS-ARM with default values GDSARM(delta.n, nint, nrep, ntop, pkeep, X, Y, cri.penter, cri.premove) # GDS-ARM with default values but with weak heredity opt.heredity="weak" GDSARM(delta.n, nint, nrep, ntop, pkeep, X, Y, cri.penter, cri.premove, opt.heredity) data(dataCompoundExt) X = dataCompoundExt[,-9] Y = dataCompoundExt[,9] delta.n = 10 n = dim(X)[1] m = dim(X)[2] nint = ceiling(0.2*choose(m,2)) nrep = choose(m,2) ntop = max(20, nint*nrep/(2*choose(m,2))) pkeep = 0.25 cri.penter = 0.01 cri.premove = 0.05 design = X # GDS-ARM on compound extraction GDSARM(delta.n, nint, nrep, ntop, pkeep, X, Y, cri.penter, cri.premove) # GDS-ARM on compound extraction with strong heredity opt.heredity = "strong" GDSARM(delta.n, nint, nrep, ntop, pkeep, X, Y, cri.penter, cri.premove, opt.heredity)
data(dataHamadaWu) X = dataHamadaWu[,-8] Y = dataHamadaWu[,8] delta.n = 10 n = dim(X)[1] m = dim(X)[2] nint = ceiling(0.2*choose(m,2)) nrep = choose(m,2) ntop = max(20, nint*nrep/(2*choose(m,2))) pkeep = 0.25 cri.penter = 0.01 cri.premove = 0.05 design = X # GDS-ARM with default values GDSARM(delta.n, nint, nrep, ntop, pkeep, X, Y, cri.penter, cri.premove) # GDS-ARM with default values but with weak heredity opt.heredity="weak" GDSARM(delta.n, nint, nrep, ntop, pkeep, X, Y, cri.penter, cri.premove, opt.heredity) data(dataCompoundExt) X = dataCompoundExt[,-9] Y = dataCompoundExt[,9] delta.n = 10 n = dim(X)[1] m = dim(X)[2] nint = ceiling(0.2*choose(m,2)) nrep = choose(m,2) ntop = max(20, nint*nrep/(2*choose(m,2))) pkeep = 0.25 cri.penter = 0.01 cri.premove = 0.05 design = X # GDS-ARM on compound extraction GDSARM(delta.n, nint, nrep, ntop, pkeep, X, Y, cri.penter, cri.premove) # GDS-ARM on compound extraction with strong heredity opt.heredity = "strong" GDSARM(delta.n, nint, nrep, ntop, pkeep, X, Y, cri.penter, cri.premove, opt.heredity)