The Bellman Equation

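State value: $v_{\pi}(s)=\mathbb{E}[G_t|S_t=s]$

Action value: $q_{\pi}(s,a)=\mathbb{E}[G_t|S_t=s,A_t=a]$

Here $G_t=R_{t+1}+\gamma R_{t+2}+\gamma^2 R_{t+3}+\dots$ is the discounted return and $\gamma\in[0,1)$ is the discount rate.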

The Bellman equation (elementwise form):

$$ \begin{aligned} v_{\pi}(s) &=\sum_{a}\pi(a|s)[\sum_r p(r|s,a)r+\gamma\sum_{s'}p(s'|s,a)v_{\pi}(s')]\\ &=\sum_a\pi(a|s)q_{\pi}(s,a) \end{aligned} $$

The Bellman equation (matrix-vector form):

$$ \begin{aligned} v_{\pi}=r_{\pi}+\gamma P_{\pi}v_{\pi} \end{aligned} $$
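To connect the two forms, the following sketch spells out how the vector $r_{\pi}$ and the matrix $P_{\pi}$ can be read off the elementwise equation. The entry names $r_{\pi}(s)$ and $p_{\pi}(s'|s)$ are notation assumed here for illustration; they do not appear in the equations above.

$$ \begin{aligned} r_{\pi}(s) &= \sum_a\pi(a|s)\sum_r p(r|s,a)r\\ p_{\pi}(s'|s) &= \sum_a\pi(a|s)p(s'|s,a)\\ v_{\pi}(s) &= r_{\pi}(s)+\gamma\sum_{s'}p_{\pi}(s'|s)v_{\pi}(s') \end{aligned} $$

Stacking $v_{\pi}(s)$ and $r_{\pi}(s)$ over all states gives the vectors $v_{\pi}$ and $r_{\pi}$, and placing $p_{\pi}(s'|s)$ in row $s$, column $s'$ gives the matrix $P_{\pi}$, so the last line is exactly one row of the matrix-vector form.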

How to solve the Bellman equation:

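  • Closed-form solution: $v_{\pi}=(I-\gamma P_{\pi})^{-1}r_{\pi}$. This requires inverting a matrix, which is expensive for large state spaces, so it is not recommended.
  • Iterative solution: construct the sequence $v_{k+1}=r_{\pi}+\gamma P_{\pi}v_k$ from any initial guess $v_0$; for $\gamma<1$ its limit as $k\to\infty$ is $v_{\pi}$.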
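
The two approaches above can be compared with a minimal NumPy sketch. The two-state `r_pi`, `P_pi`, and `gamma` values are made up for illustration, and the function names, tolerance, and iteration cap are assumptions rather than part of the original notes. The closed-form version calls `np.linalg.solve` instead of forming the inverse explicitly, which is the usual numerically safer way to evaluate $(I-\gamma P_{\pi})^{-1}r_{\pi}$.

```python
import numpy as np


def solve_closed_form(r_pi, P_pi, gamma):
    """Solve (I - gamma * P_pi) v = r_pi as a linear system."""
    n = len(r_pi)
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)


def solve_iterative(r_pi, P_pi, gamma, tol=1e-10, max_iters=10_000):
    """Iterate v_{k+1} = r_pi + gamma * P_pi v_k until successive iterates agree."""
    v = np.zeros_like(r_pi, dtype=float)  # initial guess v_0 = 0 (any guess works)
    for _ in range(max_iters):
        v_next = r_pi + gamma * P_pi @ v
        if np.max(np.abs(v_next - v)) < tol:
            return v_next
        v = v_next
    return v  # hit the iteration cap; may not have fully converged


# Hypothetical 2-state example: state 1 is absorbing with zero reward.
r_pi = np.array([1.0, 0.0])
P_pi = np.array([[0.5, 0.5],
                 [0.0, 1.0]])
gamma = 0.9

v_closed = solve_closed_form(r_pi, P_pi, gamma)
v_iter = solve_iterative(r_pi, P_pi, gamma)
print(v_closed, v_iter)
```

Both functions should return approximately the same vector. The iterative version only needs matrix-vector products, which is why it scales better when the number of states is large.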

Author: Jiamin Liu

Published: 2025-06-26

Updated: 2025-06-26
