
์ด๋ฒ ํฌ์คํ์์๋ ๊ตฌํ์ด ์ด๋ค ์์ผ๋ก ๋์ํด์ผ ํ๋์ง๋ฅผ ๋์ถฉ ์์๋ณด๊ธฐ ์ํด, ์๊ฐํ  ์ ์๋ ๊ฐ์ฅ ๋จ์ํ ๋ชจ๋ธ์ธ 1-layer convolution์ ์ด์ฉํด semantic segmentation์ ์๋ํฉ๋๋ค.

This post continues from the Preparation post.

## Building the Model

Pytorch์์ model์ torch.nn.Module ํํ์ ํด๋์ค๋ก ๋ง๋ค ์ ์์ต๋๋ค.

```python
# models/single_conv.py
import torch.nn as nn

class SingleConvolution(nn.Module):
    def __init__(self):
        super(SingleConvolution, self).__init__()
        self.conv_layer1 = nn.Sequential(
            nn.Conv2d(3, 23, kernel_size=3, padding=1),
            nn.ReLU()
        )

    def forward(self, x):
        output = self.conv_layer1(x)
        return output
```


์ฐ๋ฆฌ๋ RGB 3๊ฐ ์ฑ๋์ ๊ฐ๋ ์ด๋ฏธ์ง๋ฅผ ๋ฐ์์, 23๊ฐ ํด๋์ค ์ค ํ๋๋ฅผ ๊ตฌ๋ถํ  ๊ฒ์๋๋ค. ์ฆ, ์๋ ฅ์ $3 \times W \times H$ ํํ์ผ ๊ฒ์ด๋ฉฐ, ์ถ๋ ฅ์ ๊ฐ $(i, j)$ ๋ง๋ค 23๊ฐ์ ํด๋์ค์ ๋ํ probability๋ฅผ ์ถ๋ ฅํด์ผ ํฉ๋๋ค. ($23 \times W \times H$)

์ฐ๋ฆฌ๊ฐ ์๊ฐํ  ์ ์๋ ๊ฐ์ฅ ๊ฐ๋จํ ํํ์ ๋ชจ๋ธ์ ๋จ ํ ๋ฒ์ convolution layer๋ก ๊ตฌ์ฑ๋ ๋ชจ๋ธ์ผ ๊ฒ์๋๋ค. ์ด ๋ชจ๋ธ์ $3 \times f \times f$ ํฌ๊ธฐ์ convolution filter 23๊ฐ๊ฐ ๊ฐ ํด๋์ค์ ๋์ํ๋ฉฐ, ๊ฐ filter๋ trainable weight๊ณผ bias๋ฅผ ๊ฐ์ต๋๋ค. ์ฆ ํ๋ผ๋ฏธํฐ๋ ์ฌ๊ธฐ์ $27 \times 23$๊ฐ์ weight๊ณผ 23๊ฐ์ bias๋ก ์ด 644๊ฐ๊ฐ ๋ฉ๋๋ค. ๊ฐ ํด๋์ค์ ํ๋ฅ ๊ฐ์ convolution์ฐ์ฐ๊ณผ ReLU ํ๋ฒ์ผ๋ก ๋ฐ๋ก ๊ฒฐ๊ณผ๊ฐ์ด ๋์ถ๋ฉ๋๋ค.

์ด ๋ชจ๋ธ์ ์ ์ํ๋ ๊ฒ๊น์ง๋ฅผ ์ฝ๋๋ก ์ฎ๊ธฐ๋ฉด ๋ค์๊ณผ ๊ฐ์ต๋๋ค.

```python
# main.py
from basics import *
from datautils import *
from metrics import *
from evaluate import *
from train import *
from models import *

batch_size = 6
train_set, test_set = import_drone_dataset()
model = SingleConvolution().to(device)
print(summary(model, (3, 600, 400)))
```


์ฌ๊ธฐ์ batch size๋ GPU ๋ฉ๋ชจ๋ฆฌ์ ๋ค์ด๊ฐ๋ ํ ๋ง์ด ์ฑ์ฌ๋ฃ๋ ๊ฒ์ด ์ผ๋ฐ์ ์๋๋ค. ์ ๋ 1070Ti๋ฅผ ์ฐ๊ธฐ ๋๋ฌธ์ 6์ ๋๋ ๊ด์ฐฎ์๊ฒ ๊ฐ์ต๋๋ค. ๋ค๋ฅธ ํจ์๋ค์ ์์ Prep์์ ์ค๋นํ ํจ์๋ค์๋๋ค. summary๊ฐ ๋ฐํํ๋ ๊ฒฐ๊ณผ๊ฐ ์๋์ ๊ฐ์ด ๋ํ๋ฉ๋๋ค.

```
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 23, 600, 400]             644
              ReLU-2         [-1, 23, 600, 400]               0
================================================================
Total params: 644
Trainable params: 644
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 2.75
Forward/backward pass size (MB): 84.23
Params size (MB): 0.00
Estimated Total Size (MB): 86.98
----------------------------------------------------------------
```


์ฐ๋ฆฌ๊ฐ ์์ํ๋ ๋๋ก 644๊ฐ์ trainable param์ ๊ฐ๋ ๊ฒ์ ๋ณผ ์ ์์ต๋๋ค.

## Training

์ด์  ๋ชจ๋ธ์ ์ ์ํ๋ค๋ฉด, ์ด ๋ชจ๋ธ์ 633๊ฐ์ parameter๋ฅผ ์ค์ ๋ก trainํด ์ค์ผ ํฉ๋๋ค. Train์ ํฌ๊ฒ ๋ ๊ณผ์ ์ผ๋ก ์ด๋ฃจ์ด์ง๋๋ค.

1. Feed the training data through a forward pass, produce the final output, and compare it against the ground truth to measure how different they are (the loss function).
2. Apply an optimization algorithm that moves the parameters in the direction that minimizes that value.

์ ์ฒด์ ์ธ CNN ๋ชจ๋ธ ํ๋ จ์ ์ด๋ก ์ ๋ํด์ ๋ค๋ฃฌ ํฌ์คํ๋ค๊ณผ, LeNet์ ์ด์ฉํด์ pytorch์์ classification ํ๋ ํฌ์คํ (LeNet์ผ๋ก MNIST ํ์ด๋ณด๊ธฐ)๊ฐ ์์ผ๋ฏ๋ก ์ด์ชฝ์ ์ฐธ๊ณ ํด ์ฃผ์ธ์.

์ฌ๊ธฐ์์ ํ๋ จ๊ณผ์ ์ LeNet MNISTํ๋ จ๊ณผ ํฌ๊ฒ ๋ค๋ฅด์ง ์์ต๋๋ค.

```python
# train.py
from basics import *
from metrics import *

def train(
    model: nn.Module,
    train_loader,
    epochs: int,
    loss_func, optimizer,
    acc_metric=pixel_accuracy
):
    torch.cuda.empty_cache()
    train_losses = []
    train_acc = []
    model.to(device)
    start = time.time()
    for epoch in range(epochs):
        print(f"EPOCH {epoch+1} training begins...")
        train_start = time.time()
        model.train()
        train_accuracy = 0
        train_loss = 0
        for img, mask in train_loader:
            img = img.to(device)
            mask = mask.to(device)
            optimizer.zero_grad()           # clear gradients from the last step
            output = model(img)             # forward pass
            loss = loss_func(output, mask)  # compare against ground truth
            train_accuracy += acc_metric(output, mask)
            train_loss += loss.item()
            loss.backward()                 # backpropagate
            optimizer.step()                # update parameters
        train_losses.append(train_loss / len(train_loader))
        train_acc.append(train_accuracy / len(train_loader))
        print(f"Train epoch {epoch+1} / {epochs}",
              f"Training Time {(time.time()-train_start)/60:.2f} min")
    history = {'train_loss': train_losses, 'train_acc': train_acc}
    print(f"Total training time {(time.time()-start)/60:.2f} minutes taken")
    return history
```


LeNet์ผ๋ก MNIST ํ์ด๋ณด๋ ํฌ์คํ์์ ๋ค๋ค๋ ๊ฒ๊ณผ ๊ฑฐ์ ๊ฐ์ต๋๋ค. ์ฌ๋ฌ ๋ชจ๋ธ์ ๋ํด ์คํํ๊ธฐ ์ํด ํจ์๋ก ๋ง๋ค์๋ค๋ ์ ๋๋ง ์ฐจ์ด๊ฐ ์์ต๋๋ค. ๋ฌ๋ผ์ง๋ ๋ถ๋ถ์ด ๊ฑฐ์ ์์ผ๋ฏ๋ก, LeNet ํฌ์คํ์ ์ฐธ์กฐํด ์ฃผ์ธ์.

• receives a data loader that loads the train data,
• takes the number of epochs to run as a parameter,
• trains with a given loss function and a given optimizer, and
• takes a method for measuring accuracy (this does not actually affect training; it is only there so we can watch progress).

์ด์ , ๋ง์ง๋ง์ผ๋ก ์ด ๋ชจ๋๋ฅผ ํฉ์ณ์ ์ต์ข ๋ก์ง์ ์์ฑํฉ๋๋ค.

```python
# main.py
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
history = train(
    model=model,
    train_loader=train_loader,
    epochs=5,
    loss_func=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters()),
)
evaluator = ModelEvaluation(model, test_set, pixel_accuracy)
evaluator.evaluate_all()
```


์ด ๋ชจ๋ธ์ ๋ญ๊ฐ ๋ธ๋ ฅ์ ๊ธฐ์ธ์ด๋ ๊ฒ์ ์๋ฏธ๊ฐ ์์ผ๋ฏ๋ก, ์๋ฌด๋ ๊ฒ๋ 5 epoch๋ฅผ ๋๋ฆฝ๋๋ค. loss function๊ณผ optimizer๋ ์ผ๋ฐ์ ์ธ Cross Entropy Loss ์ Adam์ ๊ทธ๋๋ก ์ง์ด๋ฃ์ต๋๋ค. ์ด๋ ๊ฒ ์ข ๊ธฐ๋ค๋ ค ๋ณด๋ฉดโฆ

## Results

This is the result for image #8. The model labels most pixels as void, but on the large chunks of the image it does seem to pick out something clearly non-trivial.

์ด๋ show_qualitative๋ก ๋ฝ์ ๊ฒฐ๊ณผ์ธ๋ฐ, evaluateํ ๊ฒฐ๊ณผ๋ ํ๊ท  pixel accuracy 47%์ ๋๊ฐ ๋์์ต๋๋ค. ์ด ํ๋ก์ ํธ๋ ์์ผ๋ก ๋ค์ํ ๋ฐฉ๋ฒ์ ์ด์ฉํด ์ด๋ฅผ 85% ๋ด์ง๋ ๊ทธ ์ด์์ผ๋ก ์ฌ๋ฆด ๊ณํ์๋๋ค.