$$
\newcommand{\floor}[1]{\left\lfloor #1 \right\rfloor}
\newcommand{\ceil}[1]{\left\lceil #1 \right\rceil}
\newcommand{\N}{\mathbb{N}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\C}{\mathbb{C}}
\renewcommand{\L}{\mathcal{L}}
\newcommand{\x}{\times}
\newcommand{\contra}{\scalebox{1.5}{$\lightning$}}
\newcommand{\inner}[2]{\left\langle #1 , #2 \right\rangle}
\newcommand{\st}{\text{ such that }}
\newcommand{\for}{\text{ for }}
\newcommand{\Setcond}[2]{ \left\{\, #1 \mid #2 \, \right\}}
\newcommand{\setcond}[2]{\Setcond{#1}{#2}}
\newcommand{\seq}[1]{ \left\langle #1 \right\rangle}
\newcommand{\Set}[1]{ \left\{ #1 \right\}}
\newcommand{\set}[1]{\Set{#1}}
\newcommand{\sgn}{\text{sign}}
\newcommand{\halfline}{\vspace{0.5em}}
\newcommand{\diag}{\text{diag}}
\newcommand{\legn}[2]{\left(\frac{#1}{#2}\right)}
\newcommand{\ord}{\text{ord}}
\newcommand{\di}{\mathrel{|}}
\newcommand{\gen}[1]{\left\langle #1 \right\rangle}
\newcommand{\irr}{\mathrm{irr }}
\renewcommand{\deg}{\mathrm{deg }}
\newcommand{\nsgeq}{\trianglelefteq}
\newcommand{\nsg}{\triangleleft}
\newcommand{\argmin}{\mathrm{argmin}}
\newcommand{\argmax}{\mathrm{argmax}}
\newcommand{\minimize}{\mathrm{minimize}}
\newcommand{\maximize}{\mathrm{maximize}}
\newcommand{\subto}{\mathrm{subject\ to}}
\newcommand{\DKL}[2]{D_{\mathrm{KL}}\left(#1 \di\di #2\right)}
\newcommand{\ReLU}{\mathrm{ReLU}}
\newcommand{\E}{\mathsf{E}}
\newcommand{\V}{\mathsf{Var}}
\newcommand{\Corr}{\mathsf{Corr}}
\newcommand{\Cov}{\mathsf{Cov}}
\newcommand{\covariance}[1]{\Cov\left(#1\right)}
\newcommand{\variance}[1]{\V\left[#1\right]}
\newcommand{\variancewith}[1]{\V\left[#1\right]}
\newcommand{\expect}[1]{\E\left[#1\right]}
\newcommand{\expectwith}[2]{\E_{#1}\left[#2\right]}
\renewcommand{\P}{\mathsf{P}}
\newcommand{\uniform}[2]{\mathrm{Uniform}\left(#1 \dots #2\right)}
\newcommand{\gdist}[2]{\mathcal{N}\left(#1, #2\right)}
\DeclarePairedDelimiter{\norm}{\lVert}{\rVert}
$$
\everymath{\displaystyle}
Back to: ml-study
Normal Equation
- A method that computes the optimal $\theta$ analytically, instead of converging to the minimum through iteration.
- ex) The $\theta$ minimizing $J(\theta) = a\theta^2 + b\theta + c$ ($a > 0$) is easily seen to be $-\frac{b}{2a}$.
- How do we do this when the parameter of $J$ is a vector?
- => Vector calculus: find the $\theta$ at which every partial derivative $\frac{\partial}{\partial \theta_i} J(\theta)$ equals 0.
- Stack the training examples' features into a matrix $X$ (the design matrix), and the corresponding target values into a vector $y$.
- It is known that $\theta = (X^T X)^{-1} X^T y$ solves our linear regression problem.
- Techniques such as feature scaling are unnecessary.
- Compared with gradient descent...
    - Pros: no learning rate $\alpha$ to choose, and no iterative search for a suitable $\alpha$.
    - Cons: matrix multiplication and matrix inversion are very slow; in particular, when the number of features $n$ is large, the matrix computations become impractical.
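The closed-form solution above can be sketched in NumPy (a minimal illustration with made-up toy data; `np.linalg.solve` is used instead of forming the inverse explicitly, which is numerically preferable):

```python
import numpy as np

# Toy design matrix: each row is one example; the first column is
# the intercept term (all ones), the second is a single feature.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])  # targets lying exactly on y = 1 + x

# Normal equation: theta = (X^T X)^{-1} X^T y,
# solved as the linear system (X^T X) theta = X^T y.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # -> approximately [1. 1.]
```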
Noninvertible Case
- What if $(X^T X)$ is not invertible?
- Use the pseudoinverse (the `pinv` function in Octave).
- Two main cases:
    - Two features are actually linearly related.
        - ex) size in feet^2 and size in m^2
        - The design matrix $X$ then has dependent columns.
        - Redundant features -> throw one away.
    - Too many features.
        - Few data points but many features.
        - Drop some features, or use regularization.
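The pseudoinverse route can be sketched in NumPy, whose `np.linalg.pinv` plays the role of Octave's `pinv` (toy made-up data; the third column is an exact multiple of the second, so $X^T X$ is singular):

```python
import numpy as np

# Two columns that are exact multiples of each other
# (like size in feet^2 vs. size in m^2), so X^T X has no inverse.
X = np.array([[1.0, 1.0, 10.764],
              [1.0, 2.0, 21.528],
              [1.0, 3.0, 32.292]])
y = np.array([2.0, 3.0, 4.0])

# The Moore-Penrose pseudoinverse still yields the
# minimum-norm least-squares solution.
theta = np.linalg.pinv(X) @ y
print(X @ theta)  # predictions still match y
```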