
### Classification

• A problem of deciding something in a binary (or discrete) way.
• e.g., whether a review is positive/negative, whether an email is spam or not, etc.
• Idea: Linear Regression + Threshold. Fit a linear hypothesis, and predict 1 whenever it exceeds some value (0.5).
• Limitation: what if the positive examples are (3, 4, 5, 100) and the negative examples are (1, 2, 2)? A linear hypothesis is often a poor fit here, because the outlier 100 drags the threshold too far to the right.
• Improvement: the problem above is caused by the hypothesis being linear. What if we used a function whose shape suits this setting better? We would like the minimum and maximum of $h$ pinned to 0 and 1; $h_\theta(x)$ being greater than 1 or less than 0 seems like an undesirable state.
• Logistic regression: use the sigmoid function of the following form. $$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
• Why? Because its graph has very useful properties: it is bounded in $(0, 1)$, monotone, and smooth.
• Interpretation: think of $h_\theta(x)$ as giving the probability that $y = 1$. $$h_\theta(x) = \mathsf{P}(y = 1 \ |\ x ; \theta)$$
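The hypothesis above can be sketched in a few lines of plain Python (function names here are my own, for illustration):

```python
import math

def sigmoid(z):
    """Sigmoid g(z) = 1 / (1 + e^{-z}); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    """Logistic hypothesis h_theta(x) = g(theta^T x)."""
    z = sum(t * xj for t, xj in zip(theta, x))
    return sigmoid(z)

# The output can be read as P(y = 1 | x; theta).
print(sigmoid(0.0))               # 0.5, the threshold point
print(h([0.0, 1.0], [1.0, 2.0]))  # sigmoid(2.0), close to 1
```

Note that `sigmoid(0) = 0.5`, so thresholding $h_\theta(x)$ at 0.5 is the same as checking the sign of $\theta^T x$.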

### Multiple Features

• Treating $\theta$ and $x$ as vectors, exactly as before, logistic regression applies just as well to multiple features.
• In that case, the boundary surface where $h_\theta(x) = 0.5$ is a hyperplane in $\mathbb{R}^n$.
• This is called the decision boundary.
• Logistic regression can be generalized to the following form:
• $h_\theta(x) = g(p(\theta, x))$, where $g(z) = \frac{1}{1 + e^{-z}}$, and
• $p$ can be any of various functions; for example, a polynomial such as $p(\theta, x) = \theta_0 + \theta_1 x_1^2 + \theta_2 x_2^2$.
• In that case, problems whose decision boundary is an ellipse or some other shape can also be handled.

### Logistic Regression

• If we know the cost function and its partial derivatives, we can use gradient descent. $h$ has already been chosen, so…
• If, as in linear regression, we use $\frac{1}{2m}\sum_{i = 1}^{m} \ (h_\theta(x_i) - y_i)^2$, this function is not convex.
• If it is not convex, the convergence of gradient descent is not guaranteed!
• So we should pick a convex function whenever possible. The following function is known to work well. $$Cost_\theta(x, y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$
• If $y = 1$ and $h_\theta(x) = 1$, the cost is 0. A correct prediction incurs zero cost, which is desirable.
• If $y = 1$ and $h_\theta(x) \to 0$, the cost diverges to infinity. This means that predicting 0 for a value that should have been 1 incurs a huge penalty, which matches exactly the intuition we want from logistic regression. The same holds for $y = 0$, symmetrically.
• The case split above is cumbersome to work with (especially for gradient descent). Rearranged neatly… $$Cost_\theta(x, y) = -y\log(h_\theta(x)) - (1-y)\log(1 - h_\theta(x))$$
• ์ด์ , Gradient descent๋ฅผ ์ธ ์ ์๋ค! $(x_i, y_i)$ ๊ฐ training set์ด๋ผ๊ณ  ํ๋ฉด.. $$J(\theta) = -\frac{1}{m}\left(\sum_{i = 1}^{m} y_i\log(h_\theta(x_i)) + (1-y_i)\log(1 - h_\theta(x_i))\right)$$ $$\pdv{}{x_j}J(\theta) = \sum_{i = 1}^{m} (h_\theta(x_i) - y_i) x_j$$
• Linear regression ๋์ gradient descent์ ๋๊ฐ์ ํํ์ ํธ๋ํจ์๋ฅผ ์ป๋๋ค.
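The update rule above can be sketched as batch gradient descent on a toy dataset (the data and hyperparameters here are made up for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(X, y, alpha=0.5, iters=5000):
    """Batch gradient descent for logistic regression.
    X: feature vectors, each with a leading 1 for the intercept; y: 0/1 labels.
    Update: theta_j -= alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        # h_theta(x_i) for every example, using the current theta
        preds = [sigmoid(sum(t * xj for t, xj in zip(theta, xi))) for xi in X]
        # simultaneous update: all gradients use the same preds
        theta = [theta[j] - alpha * sum((preds[i] - y[i]) * X[i][j]
                                        for i in range(m)) / m
                 for j in range(n)]
    return theta

# Toy 1-D data: x <= 2 is class 0, x >= 3 is class 1
X = [[1, 0], [1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]
y = [0, 0, 0, 1, 1, 1]
theta = gradient_descent(X, y)
print(sigmoid(theta[0] + theta[1] * 1) < 0.5)  # True: x = 1 predicted as class 0
print(sigmoid(theta[0] + theta[1] * 4) > 0.5)  # True: x = 4 predicted as class 1
```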

• There are various optimization algorithms, some more powerful than plain gradient descent.
• These advanced algorithms are usually faster than gradient descent and do not require picking $\alpha$ by hand (line search). They are generally far more complex, but show better performance.
• One-vs-All: turn the problem into one-vs-all binary classification problems, and fit a classifier $h_\theta$ for each one.
• After learning the best $h$ for each class, classify a new example by running all of the $h$'s and picking the class whose probability comes out highest.