
Classification

  • The problem of making a binary (or discrete) decision about something.
  • e.g. whether a tumor is malignant or benign, whether an email is spam or not, etc.
  • Idea : linear regression + threshold. Fit a linear hypothesis and predict 1 whenever it exceeds some value (say 0.5).
  • Limitation : what if, say, the positive examples are at (3, 4, 5, 100) and the negative ones at (1, 2, 2)? A linear hypothesis is often a poor fit: the outlier 100 drags the threshold far to the right.
  • Improvement : the problem above is caused by linearity itself. What if we used a function whose shape suits this setting better? We would also like to pin the minimum and maximum of $h$ at 0 and 1; $h_\theta(x)$ exceeding 1 or falling below 0 seems like an undesirable state.
  • Logistic regression : use a sigmoid function of the following form. \(h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}\)
  • Why? Because its graph has very useful properties.
    (picture 2 : graph of the sigmoid function)
  • Interpretation : read $h_\theta(x)$ as the probability that $y = 1$. \(h_\theta(x) = \mathsf{P}(y = 1 \ |\ x ; \theta)\)
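The hypothesis above fits in a few lines of NumPy (a minimal sketch; the function names are my own):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}); maps any real z into (0, 1), with g(0) = 0.5."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """h_theta(x) = g(theta^T x), read as P(y = 1 | x; theta)."""
    return sigmoid(theta @ x)

# The output always lies strictly between 0 and 1, so it reads as a probability.
theta = np.array([0.0, 1.0])
x = np.array([1.0, 3.0])   # x_0 = 1 is the usual intercept feature
p = h(theta, x)            # sigmoid(3), roughly 0.95
```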

Multiple Features

  • Treating $\theta$ and $x$ as vectors, exactly as before, logistic regression applies to multiple features unchanged.
  • In that case, the boundary where $h_\theta(x) = 0.5$ is a hyperplane in $\mathbb{R}^n$.
  • This is called the decision boundary.
  • Logistic regression also generalizes as follows.
    • We can write $h_\theta(x) = g(p(\theta, x))$, such that $g(z) = \frac{1}{1 + e^{-z}}$,
    • and various functions can serve as $p$. For example, a polynomial such as $p(\theta, x) = \theta_0 + \theta_1 x_1^2 + \theta_2 x_2^2$…
    • In that case the decision boundary can be a circle, an ellipse, or some other shape, so such problems become solvable too.
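As a concrete illustration, the hand-picked (not fitted) parameters below make the decision boundary the unit circle $x_1^2 + x_2^2 = 1$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# With p(theta, x) = theta_0 + theta_1 * x1^2 + theta_2 * x2^2, these hand-picked
# parameters give p = x1^2 + x2^2 - 1, so h_theta(x) = 0.5 exactly on the unit circle.
theta = np.array([-1.0, 1.0, 1.0])

def predict(x1, x2):
    """1 outside the unit circle (h >= 0.5), 0 inside it."""
    features = np.array([1.0, x1 ** 2, x2 ** 2])
    return int(sigmoid(theta @ features) >= 0.5)
```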

Logistic Regression

  • If we know the cost function and its partial derivatives, we can run gradient descent. $h$ is already fixed…
  • If, as in linear regression, we used $\frac{1}{2m}\sum_{i = 1}^{m} \ (h_\theta(x_i) - y_i)^2$, the resulting function is not convex.
  • Without convexity, the convergence of gradient descent is not guaranteed!
  • So we should pick a convex function whenever possible. The following is known to work well. \(Cost_\theta(x, y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}\)
  • If $y = 1$ and $h_\theta(x) = 1$, the cost is 0. The cost vanishing on a correct prediction is desirable.
  • If $y = 1$ and $h_\theta(x) \to 0$, the cost diverges to infinity: predicting 0 for a value that should be 1 incurs a large penalty term. This intuition matches exactly what we want from logistic regression. Both properties hold likewise for $y = 0$.
  • The case split makes the expression above unwieldy (especially for gradient descent). Tidying it up… \(Cost_\theta(x, y) = -y\log(h_\theta(x)) - (1-y)\log(1 - h_\theta(x))\)
  • Now we can use gradient descent! With training set $(x_i, y_i)$… \(J(\theta) = -\frac{1}{m}\left(\sum_{i = 1}^{m} y_i\log(h_\theta(x_i)) + (1-y_i)\log(1 - h_\theta(x_i))\right)\) \(\frac{\partial}{\partial \theta_j}J(\theta) = \frac{1}{m}\sum_{i = 1}^{m} (h_\theta(x_i) - y_i)\, x_{ij}\) where $x_{ij}$ is the $j$-th feature of $x_i$.
  • This is exactly the same form of partial derivative as in gradient descent for linear regression.

Advanced Optimization Ideas

  • Optimization Algorithm์€ ๋‹ค์–‘ํ•˜๋‹ค. ๋” ๊ฐ•ํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜๋“ค์ด ์žˆ๋‹ค.
    • Gradient Descent
    • Conjuagte Gradient
    • BFGS algorithm, L-BFGS algorithm
  • ์ฃผ๋กœ Gradient Descent๋ณด๋‹ค ๋น ๋ฅด๊ณ , $\alpha$๋ฅผ ์ง์ ‘ ๊ณ ๋ฅด์ง€ ์•Š์•„๋„ ๋˜๋Š” (Line Search) ๊ณ ๊ธ‰ ์•Œ๊ณ ๋ฆฌ์ฆ˜๋“ค. ๋Œ€์ฒด๋กœ ํ›จ์”ฌ ๋ณต์žกํ•˜์ง€๋งŒ ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ธ๋‹ค.
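For example, SciPy's `minimize` exposes a BFGS optimizer: you supply only the cost and its gradient, and step sizes are chosen internally by line search (a sketch with made-up, slightly overlapping toy data so the optimum is finite):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m

def grad(theta, X, y):
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Toy data with overlap (x = 2 is positive, x = 3 is negative).
X = np.column_stack([np.ones(6), np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

# No alpha to pick: BFGS determines the step size itself.
result = minimize(cost, np.zeros(2), args=(X, y), jac=grad, method="BFGS")
theta = result.x
```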

Multiclass Classification

  • Classification where the answer is one of several classes, not just 0/1.
  • e.g. email classification : Work / Friends / Family / Hobby as 0 / 1 / 2 / 3.
  • One-vs-All : recast the problem as one-vs-all binary classifications, fitting a classifier $h_\theta$ for each class.
  • After learning the best $h$ for each class, classify a new example by running all the $h$'s on it and choosing the class that reports the highest probability.
  • Arguably the most natural form of extension.