How is the cost function of logistic regression differentiated?
I am taking the Stanford Machine Learning course on Coursera.
I tried to take the derivative of the cost function, but I got something completely different.
How is the derivative obtained?
What are the intermediate steps?
Adapted from the notes in the course, which I don't see available (including this derivation) anywhere other than in the student-contributed notes on the page of Andrew Ng's Coursera Machine Learning course.
In what follows, the superscript $(i)$ denotes an individual measurement or training "example".
$$
\begin{align}
\frac{\partial J(\theta)}{\partial \theta_j} &= \frac{\partial}{\partial \theta_j} \,\frac{-1}{m}\sum_{i=1}^m \left[ y^{(i)}\log\left(h_\theta\left(x^{(i)}\right)\right) + \left(1-y^{(i)}\right)\log\left(1-h_\theta\left(x^{(i)}\right)\right)\right] \\[2ex]
&\underset{\text{linearity}}{=} \frac{-1}{m}\sum_{i=1}^m \left[ y^{(i)}\frac{\partial}{\partial \theta_j}\log\left(h_\theta\left(x^{(i)}\right)\right) + \left(1-y^{(i)}\right)\frac{\partial}{\partial \theta_j}\log\left(1-h_\theta\left(x^{(i)}\right)\right) \right] \\[2ex]
&\underset{\text{chain rule}}{=} \frac{-1}{m}\sum_{i=1}^m \left[ y^{(i)}\frac{\frac{\partial}{\partial \theta_j}h_\theta\left(x^{(i)}\right)}{h_\theta\left(x^{(i)}\right)} + \left(1-y^{(i)}\right)\frac{\frac{\partial}{\partial \theta_j}\left(1-h_\theta\left(x^{(i)}\right)\right)}{1-h_\theta\left(x^{(i)}\right)} \right] \\[2ex]
&\underset{h_\theta(x)=\sigma\left(\theta^\top x\right)}{=} \frac{-1}{m}\sum_{i=1}^m \left[ y^{(i)}\frac{\frac{\partial}{\partial \theta_j}\sigma\left(\theta^\top x^{(i)}\right)}{h_\theta\left(x^{(i)}\right)} + \left(1-y^{(i)}\right)\frac{\frac{\partial}{\partial \theta_j}\left(1-\sigma\left(\theta^\top x^{(i)}\right)\right)}{1-h_\theta\left(x^{(i)}\right)} \right] \\[2ex]
&\underset{\sigma'}{=} \frac{-1}{m}\sum_{i=1}^m \left[ y^{(i)}\,\frac{\sigma\left(\theta^\top x^{(i)}\right)\left(1-\sigma\left(\theta^\top x^{(i)}\right)\right)\frac{\partial}{\partial \theta_j}\left(\theta^\top x^{(i)}\right)}{h_\theta\left(x^{(i)}\right)} - \left(1-y^{(i)}\right)\,\frac{\sigma\left(\theta^\top x^{(i)}\right)\left(1-\sigma\left(\theta^\top x^{(i)}\right)\right)\frac{\partial}{\partial \theta_j}\left(\theta^\top x^{(i)}\right)}{1-h_\theta\left(x^{(i)}\right)} \right] \\[2ex]
&\underset{\sigma\left(\theta^\top x\right)=h_\theta(x)}{=} \frac{-1}{m}\sum_{i=1}^m \left[ y^{(i)}\frac{h_\theta\left(x^{(i)}\right)\left(1-h_\theta\left(x^{(i)}\right)\right)\frac{\partial}{\partial \theta_j}\left(\theta^\top x^{(i)}\right)}{h_\theta\left(x^{(i)}\right)} - \left(1-y^{(i)}\right)\frac{h_\theta\left(x^{(i)}\right)\left(1-h_\theta\left(x^{(i)}\right)\right)\frac{\partial}{\partial \theta_j}\left(\theta^\top x^{(i)}\right)}{1-h_\theta\left(x^{(i)}\right)} \right] \\[2ex]
&\underset{\frac{\partial}{\partial \theta_j}\left(\theta^\top x^{(i)}\right)=x_j^{(i)}}{=} \frac{-1}{m}\sum_{i=1}^m \left[ y^{(i)}\left(1-h_\theta\left(x^{(i)}\right)\right)x_j^{(i)} - \left(1-y^{(i)}\right)h_\theta\left(x^{(i)}\right)x_j^{(i)} \right] \\[2ex]
&\underset{\text{distribute}}{=} \frac{-1}{m}\sum_{i=1}^m \left[ y^{(i)} - y^{(i)}h_\theta\left(x^{(i)}\right) - h_\theta\left(x^{(i)}\right) + y^{(i)}h_\theta\left(x^{(i)}\right) \right] x_j^{(i)} \\[2ex]
&\underset{\text{cancel}}{=} \frac{-1}{m}\sum_{i=1}^m \left[ y^{(i)} - h_\theta\left(x^{(i)}\right) \right] x_j^{(i)} \\[2ex]
&= \frac{1}{m}\sum_{i=1}^m \left[ h_\theta\left(x^{(i)}\right) - y^{(i)} \right] x_j^{(i)}
\end{align}
$$
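Note that the $+$ between the two terms becomes a $-$ at the $\sigma'$ step because $\frac{\partial}{\partial \theta_j}\left(1-\sigma\left(\theta^\top x^{(i)}\right)\right)$ picks up a factor of $-1$ from the inner derivative.

If you want to sanity-check the final line numerically, you can compare the analytic gradient $\frac{1}{m}\sum_i\left[h_\theta(x^{(i)})-y^{(i)}\right]x_j^{(i)}$ against a finite-difference approximation of $J(\theta)$. The Python sketch below is mine, not from the course; the helper names `cost` and `grad` and the random test problem are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    # Analytic gradient from the last line above: (1/m) * X^T (h - y)
    m = len(y)
    return (1.0 / m) * X.T @ (sigmoid(X @ theta) - y)

# Hypothetical random test problem, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) < 0.5).astype(float)
theta = rng.normal(size=3)

# Central finite differences, perturbing one coordinate theta_j at a time
eps = 1e-6
fd = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(fd, grad(theta, X, y), atol=1e-8))  # True
```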
The derivative of the sigmoid function is
$$
\begin{align}
\frac{d}{dx}\sigma(x) &= \frac{d}{dx}\left(\frac{1}{1+e^{-x}}\right) \\[2ex]
&= \frac{-(1+e^{-x})'}{(1+e^{-x})^2} \\[2ex]
&= \frac{e^{-x}}{(1+e^{-x})^2} \\[2ex]
&= \left(\frac{1}{1+e^{-x}}\right)\left(\frac{e^{-x}}{1+e^{-x}}\right) \\[2ex]
&= \left(\frac{1}{1+e^{-x}}\right)\left(\frac{1+e^{-x}}{1+e^{-x}} - \frac{1}{1+e^{-x}}\right) \\[2ex]
&= \sigma(x)\left(\frac{1+e^{-x}}{1+e^{-x}} - \sigma(x)\right) \\[2ex]
&= \sigma(x)\,(1-\sigma(x))
\end{align}
$$
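The identity $\sigma'(x)=\sigma(x)(1-\sigma(x))$ can likewise be checked numerically. This snippet is purely illustrative (not part of the course notes), again using a central finite difference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# sigma'(x) should equal sigma(x) * (1 - sigma(x))
x = np.linspace(-5, 5, 101)
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)  # central difference
analytic = sigmoid(x) * (1 - sigmoid(x))

print(np.allclose(numeric, analytic, atol=1e-9))  # True
```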