C.A. Micchelli, W.L. Miranker
Journal of the ACM
We study contextual bandit algorithms for high-dimensional combinatorial action spaces involving a class of structured discrete constrained optimization. We propose a novel approach based on general-purpose regression oracles, leveraging Inverse Gap Weighting (IGW) for seamless integration within an optimization framework. IGW sampling is efficiently managed by: (a) Column generation reformulation of the underlying Mixed Integer Programming (MIP) model, enabling flexible lower-level predictors, causal coherence, and efficient representation of large action spaces. (b) Diverse solution pool generation to balance the exploration-exploitation trade-off in large action spaces. To address discontinuities in the reward function’s gradient due to optimization constraints, we incorporate a risk-averse phased learning strategy. Through ablation studies, we demonstrate that each component is critical for maximizing cumulative rewards. Finally, we validate the practical viability of our approach on a real-world Auto-Scaling problem in IT automation.
C.A. Micchelli, W.L. Miranker
Journal of the ACM
Saurabh Paul, Christos Boutsidis, et al.
JMLR
Joxan Jaffar
Journal of the ACM
Kenneth L. Clarkson, Elad Hazan, et al.
Journal of the ACM