Image Manipulation via Neuro-Symbolic Networks
Harman Singh, Poorva Garg, et al.
NeurIPS 2022
We address the problem of learning a risk-sensitive policy based on the CVaR risk measure using distributional reinforcement learning. In particular, we show that the standard action-selection strategy, when applied with the distributional Bellman optimality operator, can converge to neither the dynamic, Markovian CVaR nor the static, non-Markovian CVaR. We propose modifications to existing algorithms, including a new distributional Bellman operator, and show that the proposed strategy greatly expands the utility of distributional RL for learning and representing CVaR-optimized policies. Our approach is a simple extension of standard distributional RL algorithms and can therefore take advantage of many recent advances in deep RL. On both synthetic and real data, we empirically show that our algorithm learns better CVaR-optimized policies.
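The abstract contrasts risk-sensitive action selection under CVaR with the standard mean-greedy rule. As a minimal illustration (not the paper's proposed operator), the sketch below computes CVaR at level α from equally weighted quantile samples of a return distribution, as in QR-DQN-style distributional RL, and picks the action maximizing CVaR rather than the mean; the action names and quantile values are hypothetical.

```python
import numpy as np

def cvar_from_quantiles(quantiles, alpha):
    """CVaR_alpha of a return distribution given as equally weighted
    quantile samples: the mean of the worst alpha-fraction of outcomes."""
    q = np.sort(np.asarray(quantiles, dtype=float))
    k = max(1, int(np.ceil(alpha * len(q))))  # number of tail samples
    return q[:k].mean()

# Hypothetical per-action quantile estimates at one state.
action_quantiles = {
    "a0": [-5.0, 0.0, 1.0, 2.0, 10.0],  # higher mean, heavy left tail
    "a1": [0.5, 0.8, 1.0, 1.2, 1.5],    # lower mean, low risk
}

alpha = 0.2
# Risk-sensitive greedy action: maximize CVaR instead of the mean.
best = max(action_quantiles,
           key=lambda a: cvar_from_quantiles(action_quantiles[a], alpha))
```

With these numbers, mean-greedy selection would prefer `a0` (mean 1.6 vs 1.0), while CVaR-greedy selection at α = 0.2 prefers `a1`, since `a0`'s worst-case tail drags its CVaR down to −5.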