Bellman equation
State value: $v_{\pi}(s)=E[G_t|S_t=s]$
Action value: $q_{\pi}(s,a)=E[G_t|S_t=s,A_t=a]$
The Bellman equation (elementwise form):
$$
\begin{aligned}
v_{\pi}(s) &=\sum_{a}\pi(a|s)[\sum_r p(r|s,a)r+\gamma\sum_{s'}p(s'|s,a)v_{\pi}(s')]\\
&=\sum_a\pi(a|s)q_{\pi}(s,a)
\end{aligned}
$$
The Bellman equation (matrix-vector form):
$$
\begin{aligned}
v_{\pi}=r_{\pi}+\gamma P_{\pi}v_{\pi}
\end{aligned}
$$
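As a reading aid, the quantities in the matrix-vector form can be written out by matching terms with the elementwise form. This is a sketch assuming a finite state space; the symbols $r_{\pi}(s)$ and $p_{\pi}(s'|s)$ are shorthand introduced here for illustration:
$$
\begin{aligned}
r_{\pi}(s) &=\sum_a\pi(a|s)\sum_r p(r|s,a)r\\
p_{\pi}(s'|s) &=\sum_a\pi(a|s)p(s'|s,a)\\
v_{\pi}(s) &=r_{\pi}(s)+\gamma\sum_{s'}p_{\pi}(s'|s)v_{\pi}(s')
\end{aligned}
$$
Stacking these over all states gives the vectors $v_{\pi}$ and $r_{\pi}$, and the matrix $P_{\pi}$ whose $(s,s')$ entry is $p_{\pi}(s'|s)$.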
How to solve the Bellman equation:
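- closed-form solution: $v_{\pi}=(I-\gamma P_{\pi})^{-1}r_{\pi}$; this requires a matrix inverse, so it is not recommended
- iterative solution: construct a sequence satisfying $v_{k+1}=r_{\pi}+\gamma P_{\pi}v_k$; the limit of this sequence is the desired $v_{\pi}$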
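
The two approaches can be compared numerically. The following is a minimal NumPy sketch, not a reference implementation: the 3-state `P_pi`, `r_pi`, and `gamma` values are made up for illustration, and the function names, tolerance, and iteration cap are assumptions.

```python
import numpy as np

# Hypothetical 3-state example under a fixed policy (values chosen for illustration).
P_pi = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])  # each row is a probability distribution over next states
r_pi = np.array([0.0, 1.0, 2.0])  # expected immediate reward in each state
gamma = 0.9  # discount rate


def solve_closed_form(P, r, gamma):
    """Solve (I - gamma * P) v = r as a linear system (no explicit inverse)."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)


def solve_iterative(P, r, gamma, tol=1e-10, max_iter=10_000):
    """Iterate v_{k+1} = r + gamma * P v_k until successive iterates differ by less than tol."""
    v = np.zeros_like(r, dtype=float)  # any initial guess works; zeros assumed here
    for _ in range(max_iter):
        v_next = r + gamma * P @ v
        if np.max(np.abs(v_next - v)) < tol:
            return v_next
        v = v_next
    return v


v_closed = solve_closed_form(P_pi, r_pi, gamma)
v_iter = solve_iterative(P_pi, r_pi, gamma)
print("closed form:", v_closed)
print("iterative:  ", v_iter)
```

Both results should agree up to the chosen tolerance. The iterative version only needs matrix-vector products, which is why it is usually preferred when the number of states is large.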