Alexander Zadorojniy, Takayuki Osogami, et al.
IJCAI 2023
We consider the subclass of linear programs that formulate Markov Decision Processes (MDPs). We show that the Simplex algorithm with the Gass-Saaty shadow-vertex pivoting rule is strongly polynomial for a subclass of MDPs called controlled random walks (CRWs); the running time is $O(|S|^3 \cdot |U|^2)$, where $|S|$ denotes the number of states and $|U|$ denotes the number of actions per state. This result improves on the running time of the algorithm of Zadorojniy et al. (Mathematics of Operations Research 34(4):992-1007, 2009) by a factor of $|S|$. In particular, the number of iterations needed by the Simplex algorithm for CRWs is linear in the number of states and does not depend on the discount factor. © 2012 Springer Science+Business Media, LLC.
Guy Even, Joseph Naor, et al.
SIAM Journal on Computing
Guy Even, Sudipto Guha, et al.
SIAM Journal on Computing
Leon Stok, Ilan Spillinger, et al.
EDTC 1995