์˜ค๋Š˜ ํฌ์ŠคํŒ…์—์„œ๋Š” randomized algorithm ์„ ๋น„๋กฏํ•˜์—ฌ, estimation ์—์„œ ๋งค์šฐ ํ”ํžˆ ํ™œ์šฉ๋˜๋Š” ์ค‘๊ฐ„๊ฐ’์„ ์ด์šฉํ•œ ๊ธฐ๋ฒ•์„ ์†Œ๊ฐœํ•ฉ๋‹ˆ๋‹ค.

Throughout, for a random variable $X$ I write its expectation, variance, and standard deviation as $\E[X]$, $\V[X]$, and $\sigma[X] = \sqrt{\V[X]}$, respectively.

Problem

์–ด๋–ค ๋ฏธ์ง€์˜ ๊ฐ’์— ๋Œ€ํ•œ unbiased estimator random variable $X$๊ฐ€ ์ฃผ์–ด์ง„๋‹ค๊ณ  ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค. (์—ฌ๊ธฐ์„œ unbiased๋ผ ํ•จ์€, $\E[X] = \mu$ ๊ฐ€ ์šฐ๋ฆฌ๊ฐ€ ์ถ”์ •ํ•˜๊ณ ์ž ํ•˜๋Š” ๋ฏธ์ง€์˜ ๊ฐ’๊ณผ ๊ฐ™์Œ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.) Randomized algorithm์˜ ๋งฅ๋ฝ์—์„œ๋Š”, ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์ถœ๋ ฅ์ด ํ™•๋ฅ ๋ณ€์ˆ˜๋กœ ํ‘œํ˜„๋˜๋ฏ€๋กœ ๊ทธ ๊ธฐ๋Œ“๊ฐ’์ด ์ฐธ๊ฐ’๊ณผ ๊ฐ™์€ ์ƒํ™ฉ์ด๋ผ๊ณ  ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

We additionally assume that $\V[X]$ is at most proportional to $\E[X]^2$ (that is, $\sigma[X] = c\E[X]$; everything below also works with the inequality $\leq$). At first glance this looks like a special situation, but it actually arises very often, in algorithm analysis and elsewhere.

์šฐ๋ฆฌ์˜ ๋ชฉํ‘œ๋Š” ์–ด๋–ค ์ƒ์ˆ˜ $\delta$์™€ $\epsilon$์— ๋Œ€ํ•ด, $1 - \delta$ ํ™•๋ฅ ๋กœ $\epsilon$ ๋งŒํผ์˜ ์ƒ๋Œ€ ์˜ค์ฐจ๋ฅผ ๋ณด์žฅํ•˜๊ณ  ์‹ถ์Šต๋‹ˆ๋‹ค. ์ฆ‰, ํ™•๋ฅ ๋ณ€์ˆ˜ $Y$๊ฐ€ ๋‹ค์Œ์„ ๋งŒ์กฑํ•˜๋„๋ก ํ•˜๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค. ํ™•๋ฅ ๋ณ€์ˆ˜์— ์˜ํ•œ $(\epsilon, \delta)$-๊ทผ์‚ฌ
Unbiased estimator $Y$์™€ ์ฐธ๊ฐ’ $\mu = \E[Y]$์— ๋Œ€ํ•ด, ๋‹ค์Œ์ด ์„ฑ๋ฆฝํ•  ๋•Œ $Y$๋ฅผ $\mu$์— ๋Œ€ํ•œ $(\epsilon, \delta)$-๊ทผ์‚ฌ๋ผ๊ณ  ์ •์˜ํ•œ๋‹ค. \(\P[\abs{Y - \mu} \geq \epsilon\mu] \leq \delta\)
(์ด ์šฉ์–ด๋Š” ํ‘œ์ค€์œผ๋กœ ์“ฐ๋Š” ์šฉ์–ด๋Š” ์•„๋‹ˆ๊ณ , ํŽธ์˜์ƒ ๋„์ž…ํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค)


Naive Approach

For any random variable $X$, the following is well known.

Chebyshevโ€™s Inequality
ํ™•๋ฅ ๋ณ€์ˆ˜ $X$์˜ ๊ธฐ๋Œ“๊ฐ’ $\mu = \E[X]$, ๋ถ„์‚ฐ $\V[X]$, ํ‘œ์ค€ํŽธ์ฐจ $\sigma[X] = \sqrt{\V[X]}$์— ๋Œ€ํ•˜์—ฌ, ๋‹ค์Œ์ด ์„ฑ๋ฆฝํ•œ๋‹ค. \(\P[\abs{X - \mu} \geq k\sigma[X]] \leq \frac{1}{k^2}\)
์šฐ๋ฆฌ๋Š” $\sigma[X] = c\mu$๋ฅผ ๊ฐ€์ •ํ•˜์˜€์œผ๋ฏ€๋กœ, ์ด๋Š” ๋‹ค์‹œ ๋งํ•ด $X$์— ๋Œ€ํ•ด ๋‹ค์Œ์ด ์„ฑ๋ฆฝํ•œ๋‹ค๋Š” ์˜๋ฏธ์ž…๋‹ˆ๋‹ค. \(\P[\abs{X - \mu} \geq kc\mu] \leq \frac{1}{k^2}\)

๋ฌด์–ธ๊ฐ€๋ฅผ ๋” ์ž˜ ์ถ”์ •ํ•˜๊ธฐ ์œ„ํ•ด์„œ ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ๋Š” ๊ฐ€์žฅ ๋‚˜์ด๋ธŒํ•œ ๋ฐฉ๋ฒ•์€, ์—ฌ๋Ÿฌ ๊ฐœ์˜ estimator๋ฅผ independentํ•˜๊ฒŒ ์‹คํ–‰ํ•œ ๋‹ค์Œ ๊ทธ๋“ค์˜ ํ‰๊ท ์„ ์ทจํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. $T$๊ฐœ์˜ ํ‰๊ท ์„ ์ทจํ•˜๋ฉด, ๊ธฐ๋Œ“๊ฐ’๊ณผ ๋ถ„์‚ฐ์—๋Š” ์•„๋ž˜ ๊ด€๊ณ„๊ฐ€ ์„ฑ๋ฆฝํ•ฉ๋‹ˆ๋‹ค. \(Y = \frac{1}{T}\sum_{i = 1}^{T} X_i \qquad \Rightarrow \qquad \E[Y] = \E[X],\quad \V[Y] = \V[X] / T\)

๋”ฐ๋ผ์„œ, $k = 1 / \sqrt{\delta}$๋กœ ๋‘๊ณ , $\sigma[Y] = \epsilon\sqrt{\delta}\mu$ ๊ฐ€ ๋˜๋„๋ก $T = \frac{c}{\epsilon^2 \delta}$ ๊ฐœ๋ฅผ ์žก์œผ๋ฉด ๋ชฉํ‘œํ•˜๋Š” $(\epsilon, \delta)$ ๊ทผ์‚ฌ๋ฅผ ๋‹ฌ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Theorem.
If an unbiased estimator $X$ satisfies $\V[X] = O(\E[X]^2)$, then an $(\epsilon, \delta)$-approximation can be achieved with $O(1 / (\epsilon^2 \delta))$ copies.


Median-of-Means Trick

์˜ค๋Š˜ ํฌ์ŠคํŒ…์˜ ๋ฉ”์ธ ์ฃผ์ œ์ธ, ์ด ๋ฐฉ๋ฒ•๋ณด๋‹ค ๋” ์ ์€ ๊ฐœ์ˆ˜๋กœ ๋ชฉํ‘œ๋ฅผ ๋‹ฌ์„ฑํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.

  • Run the estimator $X$ $4c^2 / \epsilon^2$ times and take the mean. Call this estimator $Z$.
  • Run the estimator $Z$ $12 \log (1 / \delta)$ times and take the median. Call this estimator $Y$.

The resulting $Y$ is an $(\epsilon, \delta)$-approximation of $\mu$!

์ˆ˜์‹์„ ์ •ํ™•ํžˆ ์ „๊ฐœํ•˜๊ธฐ ์œ„ํ•ด ์ƒ์ˆ˜๋ฅผ ์ž˜ ๋งž์ถฐ๋†“์•˜์ง€๋งŒ, ์‹ค์ œ๋กœ๋Š” $O(1 / \epsilon^2)$, $O(\log (1 / \delta))$ ๋งŒ ๊ธฐ์–ตํ•˜๋ฉด ์ถฉ๋ถ„ํ•ฉ๋‹ˆ๋‹ค.

Intuitive Explanation

์ค‘๊ฐ„๊ฐ’์€ ํ‰๊ท ์— ๋น„ํ•ด ์–‘์ชฝ ๋์˜ outlier์— ๋œ ๋ฏผ๊ฐํ•œ ๊ฐ’์ž…๋‹ˆ๋‹ค. ์—ฌ๋Ÿฌ ๊ฐœ์˜ estimator๋“ค์ด ๋Œ€์ฒด๋กœ ์ข‹์€ ๊ฐ’์„ ์ œ๊ณตํ•œ๋‹ค๋ฉด, ๋‚ฎ์€ ํ™•๋ฅ ๋กœ ๊ฐ’์ด ํฌ๊ฒŒ ํŠ€๋”๋ผ๋„ ์ค‘๊ฐ„๊ฐ’์€ ๊ทธ ์˜ํ–ฅ์„ ๋งŽ์ด ๋ฐ›์ง€ ์•Š๊ณ  robustํ•˜๊ฒŒ ์ข‹์€ ๊ฐ’์„ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์•„๋ž˜์˜ ์ฆ๋ช…์€ ์‚ฌ์‹ค ์ด ์•„์ด๋””์–ด๋ฅผ formalํ•˜๊ฒŒ ์ž‘์„ฑํ•œ ๊ฒƒ์— ์ง€๋‚˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.


Proof of $(\epsilon, \delta)$ Approximation

์œ„ ์„ค๋ช…์—์„œ, ๋Œ€์ฒด๋กœ ์ข‹์€ ๊ฐ’์„ ์ œ๊ณตํ•œ๋‹ค ๋ฅผ ๋จผ์ € ์ƒ๊ฐํ•˜๊ณ  ๋“ค์–ด๊ฐ‘๋‹ˆ๋‹ค. Estimator $Z$์— ๋Œ€ํ•ด, ๋‹ค์Œ์ด Chebyshev ๋ถ€๋“ฑ์‹์— ์˜ํ•ด ์„ฑ๋ฆฝํ•ฉ๋‹ˆ๋‹ค. \(\P[\abs{Z - \mu} \geq \epsilon\mu] \leq \frac{\V[Z]}{\epsilon^2\mu^2} = \frac{1}{4}\) ์ฆ‰, $Z$ ๋Š” ์ƒ์ˆ˜ (3/4) ํ™•๋ฅ ๋กœ ์šฐ๋ฆฌ๊ฐ€ ์›ํ•˜๋Š” ์ข‹์€ estimation์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ๋ชฉํ‘œ๋Š” ์ด ํ™•๋ฅ ์„ $1 - \delta$ ๊นŒ์ง€ ๋†’์ด๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

$r$๊ฐœ์˜ $Z$ estimator๋“ค์˜ ์ค‘๊ฐ„๊ฐ’์ด ์šฐ๋ฆฌ๊ฐ€ ์›ํ•˜๋Š” ๋ฒ”์œ„๋ฅผ ๋ฒ—์–ด๋‚˜๊ธฐ ์œ„ํ•ด์„œ๋Š”, ์ ์–ด๋„ $r / 2$ ๊ฐœ๋งŒํผ์ด ์šฐ๋ฆฌ๊ฐ€ ์›ํ•˜๋Š” ๋ฒ”์œ„ ๋ฐ–์— ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ($r / 2$๊ฐœ์˜ ๊ฐ’์ด ๊ตฌ๊ฐ„ $[a, b]$ ์‚ฌ์ด์— ์žˆ๋‹ค๋ฉด, ์ค‘๊ฐ„๊ฐ’์ด ๊ทธ ๋ฒ”์œ„์— ๋“ค์–ด๊ฐ„๋‹ค๋Š” ๊ฒƒ์€ ์‰ฝ๊ฒŒ ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค) ์ด๋ฅผ ๋‚˜์œ estimator๋ผ ํ•˜๋ฉด, $r / 2$๊ฐœ์˜ ๊ฐ’์ด ๋‚˜์  ํ™•๋ฅ ์€, ์„ฑ๊ณต ํ™•๋ฅ ์ด $1/4$ ์ดํ•˜์ธ ๋ฒ ๋ฅด๋ˆ„์ด ์‹œํ–‰์„ $r$๋ฒˆ ํ•ด์„œ ์„ฑ๊ณต ํšŸ์ˆ˜๊ฐ€ $r/2$๋ฒˆ ์ด์ƒ์ด๊ธฐ๋ฅผ ๊ธฐ๋Œ€ํ•œ๋‹ค๋Š” ์˜๋ฏธ์ด๋ฏ€๋กœ \(\P[\text{at least } r / 2 \text{ estimators are bad}] = \P[B(r, 1/4) \geq r/2]\) ๋กœ ์“ธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•ด, Chernoff Bound๋ฅผ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค.

Chernoff Bound (for sums of independent random variables).
Let $X_1, X_2, \dots, X_n$ be independent random variables taking values in $\set{0, 1}$, and let $X = \sum X_i$ and $\mu = \E[X]$. Then for $t \in (0, 1)$, the following holds. \(\P[X \geq (1 + t) \mu] \leq \exp\left(-t^{2}\mu/3\right)\)
์ด๋ฅผ ์ ์šฉํ•˜๋ฉด, $\P[B(r, 1/4) \geq r/2] \leq e^{-r / 12}$ ๋ฅผ ์–ป์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, $r = 12 \log (1 / \delta)$ ๋ฅผ ํƒํ•˜๋ฉด, \(\P[\text{median is bad}] \leq \P[\text{at least } r / 2 \text{ estimators are bad}] \leq \delta\) ๊ฐ€ ์„ฑ๋ฆฝํ•˜๋ฏ€๋กœ, ์ค‘๊ฐ„๊ฐ’ $Y$๋Š” $(\epsilon, \delta)$ ๊ทผ์‚ฌ์ž…๋‹ˆ๋‹ค.

๊ทธ๋Ÿฌ๋ฏ€๋กœ, ๋‹ค์Œ์˜ ์ •๋ฆฌ๊ฐ€ ์„ฑ๋ฆฝํ•ฉ๋‹ˆ๋‹ค. Theorem.
Unbiased estimator $X$๊ฐ€ $\V[X] = O(\E[X]^2)$๋ฅผ ๋งŒ์กฑํ•˜๋ฉด, $O(\frac{1}{\epsilon^2}\log(1 / \delta))$ ๊ฐœ๋กœ $(\epsilon, \delta)$ ๊ทผ์‚ฌ๋ฅผ ๋‹ฌ์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค.

Compared with the earlier theorem, the $1 / \delta$ term has been improved to $\log(1 / \delta)$.

์ค‘๊ฐ„๊ฐ’์— ๋Œ€ํ•ด์„œ๋Š” ์žฌ๋ฐŒ๋Š” ์„ฑ์งˆ์ด ์•Œ๋ ค์ ธ ์žˆ์Šต๋‹ˆ๋‹ค.

For the median of a sample drawn from a distribution $X$ whose population median¹ is $\tilde{\mu}$, the following holds.

Theorem. [1]
(Under mild assumptions) If $X$ has density function $f$ and median $\tilde{\mu}$, then the distribution of the median of $2m+1$ samples drawn from $X$ converges to a normal distribution with mean $\tilde{\mu}$ and variance $\frac{1}{8f(\tilde{\mu})^2 m}$.
The theorem involves a few technical but mild assumptions such as absolute continuity and an infinite population, but these are not a major concern in our setting of randomized algorithms.

๋”ฐ๋ผ์„œ, ํ‰๊ท ์˜ ์ค‘๊ฐ„๊ฐ’์ด ์–ด๋–ค ๋ถ„ํฌ๋ฅผ ๋ณด์ผ์ง€๋Š” ๋Œ€๋žต ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ € ์ •๋ฆฌ์—์„œ ์šฐ๋ฆฌ๊ฐ€ ํ•„์š”ํ•œ $f$ ๋Š” (์•Œ๊ธฐ ์–ด๋ ค์šด) ์›๋ž˜ estimator์˜ probability density๊ฐ€ ์•„๋‹Œ, ๊ทธ ํ‘œ๋ณธํ‰๊ท ์˜ pdf์ž…๋‹ˆ๋‹ค ($Z$์— ๋Œ€ํ•ด ์ค‘๊ฐ„๊ฐ’์„ ์ทจํ•˜๋ฏ€๋กœ) Central Limit Theorem์— ์˜ํ•ด, ํ‘œ๋ณธ ํ‰๊ท ์˜ ๋ถ„ํฌ ๋Š” ์ •๊ทœ๋ถ„ํฌ๋กœ ์ˆ˜๋ ดํ•˜๋ฏ€๋กœ, ์ถฉ๋ถ„ํžˆ ํฐ sample size์—์„œ $Z$ estimator๋Š” ๊ฑฐ์˜ ์ •๊ทœ๋ถ„ํฌ์ด๊ณ โ€ฆ ๊ทธ๋Ÿฌ๋ฉด $f$ ๊ฐ’๋„ ์ •ํ™•ํ•˜๊ฒŒ ๊ตฌํ•  ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

CLT์˜ ์ •ํ™•ํ•œ statement๋ฅผ ์ด์šฉํ•˜๋ฉด, ๋ถ„ํฌ์˜ ๊ฑฐ๋ฆฌ ์ƒ์—์„œ ๋งŽ์€ ๊ฒƒ์„ ๋” ๋…ผ์˜ํ•  ์ˆ˜ ์žˆ์ง€๋งŒ, ์ด ํฌ์ŠคํŒ…์˜ ๋ฒ”์œ„๋ฅผ ๋ฒ—์–ด๋‚˜๋ฏ€๋กœ ์ƒ๋žตํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.


ํ‰๊ท ์˜ ์ค‘๊ฐ„๊ฐ’ vs ์ค‘๊ฐ„๊ฐ’์˜ ํ‰๊ท 

์ด ์žฅ์—์„œ, ๋ถ„ํฌ๊ฐ€ ์ •๊ทœ๋ถ„ํฌ์ด๋‹ค ๋ผ๊ณ  ๋งํ•˜๋Š” ๋งŽ์€ ๊ฒฝ์šฐ์— ์‚ฌ์‹ค์€ CLT์— ์˜ํ•ด ์ •๊ทœ๋ถ„ํฌ๋กœ ์ˆ˜๋ ดํ•œ ๊ฒฝ์šฐ๋ฅผ ๋งํ•ฉ๋‹ˆ๋‹ค. ์ด๋•Œ์˜ ๋ถ„ํฌ๋Š” ์ •ํ™•ํžˆ ์ •๊ทœ๋ถ„ํฌ๊ฐ€ ๋˜์ง€ ์•Š์œผ๋ฏ€๋กœ, ๋ชจ๋“  ์ˆ˜์‹์ „๊ฐœ๋Š” ์•ฝ๊ฐ„์˜ :hand-waving: ์ด ํฌํ•จ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. CLT์— ์˜ํ•ด ์–ป์–ด์ง€๋Š” ๋ถ„ํฌ์— ๋Œ€ํ•ด ๋ฌด์–ธ๊ฐ€๋ฅผ ๋…ผ์ฆํ•˜๋ ค๋ฉด convergence in distribution ๋“ฑ ๋ณต์žกํ•œ ๊ฐœ๋…์„ ์ ์šฉํ•ด์•ผ ํ•˜๋ฉฐ, ๊ณผ์ •์—์„œ ์ˆ˜๋ ด์„ ํ•œ๋ฒˆ์”ฉ ์“ธ๋•Œ๋งˆ๋‹ค ์•ฝ๊ฐ„์”ฉ ๋ฌธ์ œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.

ํ‰๊ท ์˜ ์ค‘๊ฐ„๊ฐ’์ด์–ด์•ผ ํ•˜๋Š” ์ด์œ ๊ฐ€ ์žˆ์„๊นŒ์š”? ๊ตฌ์ฒด์ ์œผ๋กœ๋Š”, ์ค‘๊ฐ„๊ฐ’์˜ ํ‰๊ท  ์œผ๋กœ๋Š” ์•ˆ ๋ ๊นŒ์š”?

์ค‘๊ฐ„๊ฐ’์˜ ํ‰๊ท ์—๋Š” ํฌ๊ฒŒ ๋‘ ๊ฐ€์ง€ ๋ฌธ์ œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.

  1. First, there is no guarantee that the mean of medians preserves the expectation of the original $X$ (unbiasedness). If the distribution is skewed, so that its median differs from its mean, each group median is biased, and taking the mean afterwards does not remove this bias.

  2. Is the situation then the same for symmetric distributions? No. For a symmetric distribution unbiasedness is guaranteed (since median = mean), but which of the two is the better estimator is a somewhat subtle question, and typically the median of means gives the better result.

1๋ฒˆ์˜ ๊ฒฝ์šฐ ์กฐ๊ธˆ๋งŒ ์ƒ๊ฐํ•ด ๋ณด๋ฉด ๋ฐ”๋กœ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. 2๋ฒˆ์˜ ๊ฒฝ์šฐ๋Š” ๊ทธ๋ ‡์ง€ ์•Š์œผ๋ฏ€๋กœ, ์ด๋ถ€๋ถ„์„ ๋” ์ƒ๊ฐํ•ด ๋ณด๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค.

์ˆ˜์‹์— ๋Œ€ํ•ด์„œ ์ƒ๊ฐํ•˜๊ธฐ ์ „์—, ์‹คํ—˜์„ ์•ฝ๊ฐ„ ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. $X$๊ฐ€ $[-1, 1]$ ์—์„œ uniformํ•œ ๋ถ„ํฌ๋ผ๊ณ  ์ƒ๊ฐํ•ด ๋ด…์‹œ๋‹ค. ์ด ๋ถ„ํฌ์˜ ํ‰๊ท ์€ 0์ด๊ณ , ๋ถ„์‚ฐ์€ $1 / 3$ ์ด ๋ฉ๋‹ˆ๋‹ค. (์•ž์„œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ํ”„๋ ˆ์ž„์›Œํฌ์— ๋งž์ถ”๋ฉด, ์šฐ๋ฆฌ๊ฐ€ ์˜ˆ์ธกํ•˜๊ณ ์ž ํ•˜๋Š” ๊ฐ’์ด 0์ด๊ณ  ๋ถ„์‚ฐ์ด $1/3$์ด๋ผ๊ณ  ์ƒ๊ฐํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค)

์ด ๋ถ„ํฌ์—์„œ, 10,000๊ฐœ์˜ sample์„ ๋ฝ‘์•„ ์ด๋ฅผ 100๊ฐœ์”ฉ 100๊ฐœ์˜ ๊ทธ๋ฃน์œผ๋กœ ๋‚˜๋ˆ„์–ด, 1) ๊ฐ group์˜ mean์„ ๊ตฌํ•˜๊ณ  median์„ ์ทจํ•˜๋Š” ๊ฒฝ์šฐ 2) ๊ฐ group์˜ median์„ ๊ตฌํ•˜๊ณ  mean์„ ์ทจํ•˜๋Š” ๊ฒฝ์šฐ ๋ฅผ 5,000๋ฒˆ ๋ฐ˜๋ณตํ•˜๋ฉด, ์•„๋ž˜์™€ ๊ฐ™์€ ๊ฒฐ๊ณผ๋ฅผ ์–ป์Šต๋‹ˆ๋‹ค.

[Figure: experiment results for the uniform distribution]

Median-of-mean์˜ ๋ถ„ํฌ๊ฐ€ ๋” ์ข‹์€ (๋ถ„์‚ฐ์ด ์ž‘๊ณ , 0 ์ฃผ๋ณ€์œผ๋กœ ๋ชฐ๋ฆฐ) ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ตฌ์ฒด์ ์œผ๋กœ๋Š” ํ‘œ์ค€ํŽธ์ฐจ๊ฐ€ ์•ฝ 0.0097 vs 0.007๋กœ, 40%์ •๋„์˜ ๊ฐœ์„ ์ด ์žˆ์Šต๋‹ˆ๋‹ค. Kolmogorovโ€“Smirnov test๋ฅผ ์ˆ˜ํ–‰ํ–ˆ์„ ๋•Œ๋„ $p < 2.5 \times 10^{-13}$์œผ๋กœ, ๋ช…ํ™•ํžˆ ํ†ต๊ณ„์ ์œผ๋กœ ์œ ์˜ํ•˜๊ฒŒ ๊ฐœ์„ ์ด ์žˆ์Šต๋‹ˆ๋‹ค.

๋‹ค์‹œ ์ˆ˜์‹์œผ๋กœ ๋Œ์•„์™€์„œ ์ƒ๊ฐํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ์•ž์„œ์˜ theorem์— ๋Œ€์ž…ํ•ด ๋ณด๋ฉดโ€ฆ

Suppose $X$ has expectation $\mu$ and variance $\sigma^2$. If we take medians of groups of $(2m+1)$ samples and then the mean of $(2k+1)$ such medians, then, since the sample mean of normal draws is well known to be normal with the variance divided by the sample size, the result converges to a normal distribution with parameters \(\hat{\mu} = \mu, \qquad \hat{\sigma}^2 = \frac{1}{(8f(\mu)^2 m) (2k + 1)}\) (since we consider symmetric distributions, $\tilde{\mu} = \mu$), where $f$ is the pdf of the original estimator $X$.

Conversely, consider taking means of groups of $(2k+1)$ samples and then the median of $(2m+1)$ such means. By the Central Limit Theorem, the distribution of the mean of $(2k+1)$ samples is (even if the original distribution is not normal) approximately normal with mean $\mu$ and variance $\sigma^2 / (2k + 1)$. Applying the median theorem above to this normal distribution, the median of means converges to a normal distribution with parameters \(\hat{\mu} = \mu, \qquad \hat{\sigma}^2 = \frac{1}{8g(\mu)^2 m}\) where $g$ is the pdf of the distribution of the mean of $(2k+1)$ samples of $X$.

So in the end what matters is how $f(\mu)$ and $g(\mu)$ compare. Concretely, for the claim that the median of means is the better estimator to be true, $g(\mu)$ must be at least $\sqrt{2k + 1}$ times $f(\mu)$.
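Comparing the two limiting variances above makes this explicit: the median of means wins exactly when \(\frac{1}{8g(\mu)^2 m} < \frac{1}{(8f(\mu)^2 m)(2k + 1)} \quad\Longleftrightarrow\quad g(\mu) > \sqrt{2k + 1}\, f(\mu)\)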

$X$๊ฐ€ ์ •๊ทœ๋ถ„ํฌ์ธ ๊ฒฝ์šฐ, $\mu$์—์„œ์˜ ๊ฐ’์ด $1 / \sigma$์— ๋น„๋ก€ํ•˜๋ฏ€๋กœ $g(\mu)$๊ฐ€ ์ •ํ™•ํžˆ $\sqrt{2k+1}$ ๋ฐฐ๊ฐ€ ๋˜๋ฉฐ, ์ด๋•Œ ๋‘ ๋ถ„ํฌ (med-of-mean / mean-of-med) ๊ฐ€ ์ •ํ™•ํžˆ ๊ฐ™์•„์ง€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

$X$๊ฐ€ ๋‹ค๋ฅธ ๋ถ„ํฌ์ธ ๊ฒฝ์šฐ, $g(\mu)$๋Š” ์—ฌ์ „ํžˆ ์ •๊ทœ๋ถ„ํฌ์— ์ˆ˜๋ ดํ•˜๋ฏ€๋กœ $g(\mu)$ ๋Š” $\frac{\sqrt{2k+1}}{\sigma \sqrt{2\pi}}$ ๊ฐ€ ๋ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋ฏ€๋กœ $f(\mu)$ ๊ฐ€ $\frac{1}{\sigma \sqrt{2\pi}}$ ๋ณด๋‹ค ํฐ์ง€ ์ž‘์€์ง€์— ๋”ฐ๋ผ ๊ฐˆ๋ฆฌ๊ฒŒ ๋˜๋ฉฐ, ์ด ๊ฐ’์€ ์ •๊ทœ๋ถ„ํฌ์—์„œ $\mu$ ์œ„์น˜์˜ pdf๊ฐ’ ์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์ •๊ทœ๋ถ„ํฌ๋ณด๋‹ค concentrated๋œ2 ๋ถ„ํฌ๋ฅผ ๊ฐ–๋Š” $X$์— ๋Œ€ํ•ด์„œ๋Š” ์ค‘๊ฐ„๊ฐ’์˜ ํ‰๊ท ์ด, ๊ทธ๋ ‡์ง€ ์•Š์€ $X$์— ๋Œ€ํ•ด์„œ๋Š” ํ‰๊ท ์˜ ์ค‘๊ฐ„๊ฐ’์„ ํƒํ•˜๋Š” ๊ฒƒ์ด ์œ ๋ฆฌํ•ฉ๋‹ˆ๋‹ค.

๊ทธ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋Š” ๋Œ€ํ‘œ์ ์ธ ๋ถ„ํฌ๋กœ๋Š” Laplace ๋ถ„ํฌ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค (PDF์˜ ์ตœ๋Œ“๊ฐ’์ด $\sqrt{\pi}$๋ฐฐ๋งŒํผ ๋” ํฝ๋‹ˆ๋‹ค)

๊ฐ™์€ ์‹คํ—˜์„ Laplace๋ถ„ํฌ์— ๋Œ€ํ•ด ์ง„ํ–‰ํ•˜๋ฉด, ์•„๋ž˜ ๊ฒฐ๊ณผ๋ฅผ ์–ป์Šต๋‹ˆ๋‹ค. $\mu = 0, b = 1$ ์ธ Laplace ๋ถ„ํฌ๋กœ๋ถ€ํ„ฐ ์œ„์™€ ๋˜‘๊ฐ™์ด 5,000 * 100 * 100 ์‹คํ—˜์„ ํ•˜๋ฉด,

[Figure: experiment results for the Laplace distribution]

์™€ ๊ฐ™์ด, ์ค‘๊ฐ„๊ฐ’์˜ ํ‰๊ท ์ด ๋” ๋‚˜์€ ๋ถ„ํฌ๋ฅผ ๊ฐ€์ง์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋‹ค๋งŒ, ์ค‘๊ฐ„๊ฐ’์˜ ํ‰๊ท ์˜ ๊ฒฝ์šฐ ์œ„์™€ ๊ฐ™์ด Chebyshev-Chernoff๋ฅผ ์ด์šฉํ•œ ๋…ผ์ฆ์ด ๋˜‘๊ฐ™์ด ์ ์šฉ๋˜์ง€๋Š” ์•Š๊ณ , CLT์— ์˜์กดํ•ด์„œ ์ฆ๋ช…ํ•ด์•ผ ํ•˜๋ฉฐ, rigor๊ฐ€ ์•ฝ๊ฐ„ ์• ๋งคํ•ด์ง€๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.


Applications

This technique is used heavily in sublinear sketching, e.g. Count-Min and Count-Sketch. I plan to add links once I post about them.


References

  1. A. Merberg and S. J. Miller, Course Notes for Math 162: Mathematical Statistics, The Sample Distribution of the Median, 2008.


Footnotes

  1. ํ™•๋ฅ ๋ถ„ํฌ $X$์˜ median์ด๋ž€, cumulative distribution function $F_X$์— ๋Œ€ํ•ด $F_X(t) = 1/2$ ๊ฐ€ ๋˜๋Š” ์ง€์ ์„ ๋งํ•ฉ๋‹ˆ๋‹ค.ย โ†ฉ

  2. ์—ฌ๊ธฐ์„œ์˜ ์˜๋ฏธ๋Š”, ํ‰๊ท ์—์„œ์˜ ํ•จ์ˆ˜๊ฐ’ $f(\mu)$ ์ด ์ •๊ทœ๋ถ„ํฌ๋ณด๋‹ค ํฌ๋‹ค๋Š” ์˜๋ฏธ์ž…๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ ๋Œ€์นญ์ด๊ณ  $\abs{x - \mu}$์— ๋Œ€ํ•ด ๋‹จ์กฐ๊ฐ์†Œํ•˜๋Š” bell curveํ˜•ํƒœ์˜ pdf๋ฅผ ์ƒ๊ฐํ•˜๋ฉด, curve๊ฐ€ ๋พฐ์กฑํ•จ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.ย โ†ฉ