Algorithm | Paper Link | Citation |
---|---|---|
TD3 | https://arxiv.org/abs/1802.09477v3 | Fujimoto, Scott, Herke van Hoof, and David Meger. “Addressing function approximation error in actor-critic methods.” In International Conference on Machine Learning, pp. 1587-1596. PMLR, 2018. |
SAC | https://arxiv.org/abs/1801.01290 | Haarnoja, Tuomas, Aurick Zhou, Pieter Abbeel, and Sergey Levine. “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor.” In International Conference on Machine Learning, pp. 1861-1870. PMLR, 2018. |
TQC | https://arxiv.org/abs/2005.04269 | Kuznetsov, Arsenii, Pavel Shvechikov, Alexander Grishin, and Dmitry Vetrov. “Controlling overestimation bias with truncated mixture of continuous distributional quantile critics.” In International Conference on Machine Learning, pp. 5556-5566. PMLR, 2020. |
DDPG | https://arxiv.org/abs/1509.02971 | Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. “Continuous control with deep reinforcement learning.” arXiv preprint arXiv:1509.02971 (2015). |
CTD4 | https://arxiv.org/abs/2405.02576 | Valencia, David, Henry Williams, Yuning Xing, Trevor Gee, Bruce A. MacDonald, and Minas Liarokapis. “CTD4 - A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics.” arXiv preprint arXiv:2405.02576 (2024). |
PPO | https://arxiv.org/abs/1707.06347 | Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. “Proximal policy optimization algorithms.” arXiv preprint arXiv:1707.06347 (2017). |
DQN | https://www.nature.com/articles/nature14236 | Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. “Human-level control through deep reinforcement learning.” Nature 518, no. 7540 (2015): 529-533. |