
Overfitting Issue

  • Underfitting : ๋ฐ์ดํ„ฐ๊ฐ€ Linearํ•˜์ง€ ์•Š์Œ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ , Linear function fitting์„ ํ•˜๋Š” ๋“ฑ์˜ ์ด์œ ๋กœ fitting๋˜์ง€ ์•Š๋Š” ํ˜„์ƒ
  • Overfitting : 5๊ฐœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ 4์ฐจํ•จ์ˆ˜๋กœ fittingํ•œ๋‹ค๋ฉด? ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ๋Š” 100%์˜ ์ •ํ™•๋„๋ฅผ ๊ฐ–์ง€๋งŒ, ์‹ค์ œ๋กœ ์ข‹์€ ๋ชจ๋ธ๋ง์€ ์•„๋‹˜.
  • ์ด๋ฅผ High-variance๋ผ๊ณ  ํ•œ๋‹ค. High-order ๋‹คํ•ญ์‹์„ ์“ธ ๋•Œ์˜ ๋ฌธ์ œ์ . ์ง€๋‚˜์น˜๊ฒŒ ๋งŽ์€ ์ž์œ ๋„์˜ ๊ฐ€์„ค์„ ํ—ˆ์šฉํ•˜์—ฌ, ๋ณ„๋กœ ์ข‹์€ ๊ฒฐ๊ณผ๊ฐ€ ์•„๋‹ˆ๊ฒŒ ๋จ.
  • Too many features -> Cost function์ด ๋งค์šฐ ์ž‘์ง€๋งŒ ์‹ค์šฉ์ ์œผ๋กœ ๋„์›€์ด ๋˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ ์žˆ์Œ.
  • ์ง€๋‚˜์น˜๊ฒŒ ์ •ํ™•ํ•œ Fitting ๊ณผ์ • ๋•Œ๋ฌธ์—, ํŒŒ์•…ํ•ด์•ผ ํ•  ๊ฒฝํ–ฅ์„ฑ์„ ๋†“์น˜๋Š” ํ˜„์ƒ!!

How to Deal with It?

  • Feature๊ฐœ์ˆ˜ ์ค„์ด๊ธฐ. ์ด๋ถ€๋ถ„์€ Manualํ•˜๊ฒŒ ํ•  ์ˆ˜๋„ ์žˆ๊ณ , Model selection algorithm์„ ์“ธ ์ˆ˜๋„ ์žˆ์Œ.
    • ์ด ๊ณผ์ •์—์„œ ์ง„์งœ ํ•„์š”ํ•œ ์ •๋ณด๋ฅผ ๋†“์น  ์ˆ˜๋„ ์žˆ์Œ. ์‹ค์ œ Feature๊ฐ€ ์ •๋ง ๋ถˆํ•„์š”ํ•œ์ง€ ํŒ์ •ํ•˜๊ธฐ๊ฐ€ ์–ด๋ ต๋‹ค.
  • Regularization. Feature๋Š” ๊ทธ๋Œ€๋กœ ๋“ค๊ณ  ๊ฐ€๋˜, magnitude / value of parameter๋ฅผ ์ค„์ด๋Š” ๋ฐฉ๋ฒ•.

Regularization

  • ex) ํŽ˜๋„ํ‹ฐ๋ฅผ ํ†ตํ•ด $\theta_3, \theta_4$ ๋ฅผ ์ž‘์€ ๊ฐ’์œผ๋กœ ์œ ์ง€ํ•˜๋„๋ก ๊ฐ•์ œํ•˜๊ธฐ. \(J_{\text{new}}(\theta) = J(\theta) + 1000\theta_3^2 + 1000\theta_4^2\)
  • ๊ฒฐ๊ตญ์€ Hypothesis๋ฅผ ๋” ๊ฐ„๋‹จํ•˜๊ฒŒ ํ•˜๋Š” ๊ฒƒ. Overfitting ๋ฌธ์ œ๊ฐ€ ์ค„์–ด๋“ ๋‹ค.
  • ex) Regularization parameter๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ, tradeoff๋ฅผ ๊ฐ•์ œํ•˜๊ธฐ. \(J(\theta) = \frac{1}{2m}\left(\sum_{i = 1}^{m} (h_{\theta}(x_i) - y_i)^2 + \lambda \sum_{i = 1}^{n} \theta_j^2\right)\)
  • $\lambda$๊ฐ€ ๋„ˆ๋ฌด ํฌ๋ฉด -> ์ง€๋‚˜์น˜๊ฒŒ ํฐ Penalty term ๋•Œ๋ฌธ์— Underfitting ๋ฐœ์ƒ.

Regularized Linear Regression

\(\begin{aligned} \frac{\partial}{\partial \theta_j}J(\theta) = \frac{1}{m} \sum_{i = 1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x^{(i)}_j + \frac{\lambda}{m}\theta_j \end{aligned}\)

  • ํŽธ๋ฏธ๋ถ„์‹์„ ์ž˜ ๋ณด๋ฉด, ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์—…๋ฐ์ดํŠธ๊ฐ€ ์ด๋ฃจ์–ด์งˆ ๊ฒƒ์ž„์„ ์•ˆ๋‹ค. \(\theta_j := \theta_j \left( 1- \alpha \frac{\lambda}{m}\right) - \alpha \frac{1}{m} \sum_{i = 1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x^{(i)}_j\)
  • $\left( 1- \alpha \frac{\lambda}{m}\right)$ ์„ ๋งค๋ฒˆ ๊ณฑํ•˜๋Š” ๋Š๋‚Œ์˜ Gradient Descent.

  • Normal equation์„ ์ด์šฉํ•ด์„œ๋„ ๋น„์Šทํ•˜๊ฒŒ ํ•  ์ˆ˜ ์žˆ๋‹ค. \(\theta = \left(X^T X + \lambda L\right)^{-1} X^T y\) ์ด๋•Œ $L$ ์€, Identity์—์„œ ๋งจ ์™ผ์ชฝ ์œ„ ํ•ญ์ด 0์ธ matrix์ด๋‹ค. [[0, 0, 0], [0, 1, 0], [0, 0, 1]] ์ •๋„ ๋Š๋‚Œ.
  • ์›๋ž˜์˜ Linear regression์€ Example๋ณด๋‹ค Feature๊ฐ€ ๋งŽ์œผ๋ฉด Non-invertibleํ•˜๋‹ค. ์ด๋•Œ, Regularization์„ ์“ฐ๋ฉด, $\lambda > 0$์ผ ๋•Œ, $X^T X + \lambda L$๊ฐ€ ๋ฐ˜๋“œ์‹œ invertibleํ•จ์„ ๋ณด์ผ ์ˆ˜ ์žˆ๋‹ค.

Regularized Logistic Regression

  • ๋‹ค์Œ๊ณผ ๊ฐ™์€ update๋ฅผ ์ˆ˜ํ–‰ํ•œ๋‹ค. \(\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i = 1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x^{(i)}_0\) \(\theta_j := \theta_j \left( 1- \alpha \frac{\lambda}{m}\right) - \alpha \frac{1}{m} \sum_{i = 1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x^{(i)}_j\)
  • ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ, ์‹์€ Linear ๋ฒ„์ „๊ณผ ๋˜‘๊ฐ™์ด ์ƒ๊ฒผ๋‹ค. ์ฐจ์ด๋Š” $h_\theta$๋ฟ.