add fourth chapter

skindhu 2024-11-04 19:44:48 +08:00
parent 95044d6979
commit 20f3c0c7f8
1 changed files with 2 additions and 2 deletions


@@ -668,13 +668,13 @@ layers.4.0.weight has gradient mean of 1.3258541822433472
>
> - By the chain rule of backpropagation, **without shortcut connections** the gradient has to pass through every layer in turn:
>
> $$ \frac{\partial L}{\partial X_{1}}=\frac{\partial L}{\partial X_{3}} \cdot \frac{\partial X_{3}}{\partial X_{2}} \cdot \frac{\partial X_{2}}{\partial X_{1}} $$
>
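> Here, if the local gradient of any layer is small, the product shrinks at each layer it passes through, which leads to vanishing gradients.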
>
> - **With shortcut connections**, assuming a shortcut is added around every layer, the gradient gains an additional direct path:
>
> $$ \frac{\partial L}{\partial X_{1}}=\frac{\partial L}{\partial\left(X_{1}+F\left(X_{1}\right)\right)} \cdot\left(1+\frac{\partial F\left(X_{1}\right)}{\partial X_{1}}\right) $$
>
> So even when $` \frac{\partial F\left(X_{1}\right)}{\partial X_{1}} `$ is very small, the gradient can still flow directly to earlier layers through the constant term $`1`$.
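
The two formulas above can be checked with a small scalar sketch. It is illustrative only and is not the network used in this chapter: each layer is reduced to a single hypothetical local derivative, the upstream gradient is assumed to be 1, and the names `grad_plain` and `grad_shortcut` are made up for this example.

```python
from math import prod


def grad_plain(local_grads):
    """Gradient at the first input when layers are stacked as X_{k+1} = F_k(X_k).

    The chain rule multiplies the local derivatives dF_k/dX_k together.
    """
    return prod(local_grads)


def grad_shortcut(local_grads):
    """Gradient at the first input when each layer is X_{k+1} = X_k + F_k(X_k).

    Each factor becomes (1 + dF_k/dX_k), so the constant 1 keeps a direct path open.
    """
    return prod(1.0 + g for g in local_grads)


# Hypothetical local derivatives for five layers (assumed values, not measured).
local_grads = [0.1] * 5

plain = grad_plain(local_grads)
shortcut = grad_shortcut(local_grads)

print(f"without shortcuts: {plain:.6f}")
print(f"with shortcuts:    {shortcut:.6f}")
```

Under these assumed values the plain product collapses to roughly 1e-5, while the shortcut version stays above 1, which matches the qualitative picture described by the formulas.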