
์ด๋ฒˆ ํฌ์ŠคํŒ…์—์„œ๋Š” ๊ตฌํ˜„์ด ์–ด๋–ค ์‹์œผ๋กœ ๋™์ž‘ํ•ด์•ผ ํ•˜๋Š”์ง€๋ฅผ ๋Œ€์ถฉ ์•Œ์•„๋ณด๊ธฐ ์œ„ํ•ด, ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ๋Š” ๊ฐ€์žฅ ๋‹จ์ˆœํ•œ ๋ชจ๋ธ์ธ 1-layer convolution์„ ์ด์šฉํ•ด semantic segmentation์„ ์‹œ๋„ํ•ฉ๋‹ˆ๋‹ค.

This continues from the Preparation post.

Model ๋งŒ๋“ค๊ธฐ

Pytorch์—์„œ model์€ torch.nn.Module ํ˜•ํƒœ์˜ ํด๋ž˜์Šค๋กœ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

# models/single_conv.py
import torch.nn as nn

class SingleConvolution(nn.Module):
    def __init__(self):
        super(SingleConvolution, self).__init__()
        # 3 input channels (RGB) -> 23 output channels, one per class;
        # padding=1 keeps the spatial size unchanged for a 3x3 kernel
        self.conv_layer1 = nn.Sequential(
            nn.Conv2d(3, 23, kernel_size=3, padding=1),
            nn.ReLU()
        )

    def forward(self, x):
        output = self.conv_layer1(x)
        return output

์šฐ๋ฆฌ๋Š” RGB 3๊ฐœ ์ฑ„๋„์„ ๊ฐ–๋Š” ์ด๋ฏธ์ง€๋ฅผ ๋ฐ›์•„์„œ, 23๊ฐœ ํด๋ž˜์Šค ์ค‘ ํ•˜๋‚˜๋ฅผ ๊ตฌ๋ถ„ํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ฆ‰, ์ž…๋ ฅ์€ $3 \times W \times H$ ํ˜•ํƒœ์ผ ๊ฒƒ์ด๋ฉฐ, ์ถœ๋ ฅ์€ ๊ฐ $(i, j)$ ๋งˆ๋‹ค 23๊ฐœ์˜ ํด๋ž˜์Šค์— ๋Œ€ํ•œ probability๋ฅผ ์ถœ๋ ฅํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ($23 \times W \times H$)

์šฐ๋ฆฌ๊ฐ€ ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ๋Š” ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ํ˜•ํƒœ์˜ ๋ชจ๋ธ์€ ๋‹จ ํ•œ ๋ฒˆ์˜ convolution layer๋กœ ๊ตฌ์„ฑ๋œ ๋ชจ๋ธ์ผ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด ๋ชจ๋ธ์€ $3 \times f \times f$ ํฌ๊ธฐ์˜ convolution filter 23๊ฐœ๊ฐ€ ๊ฐ ํด๋ž˜์Šค์— ๋Œ€์‘ํ•˜๋ฉฐ, ๊ฐ filter๋Š” trainable weight๊ณผ bias๋ฅผ ๊ฐ–์Šต๋‹ˆ๋‹ค. ์ฆ‰ ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” ์—ฌ๊ธฐ์„œ $27 \times 23$๊ฐœ์˜ weight๊ณผ 23๊ฐœ์˜ bias๋กœ ์ด 644๊ฐœ๊ฐ€ ๋ฉ๋‹ˆ๋‹ค. ๊ฐ ํด๋ž˜์Šค์˜ ํ™•๋ฅ ๊ฐ’์€ convolution์—ฐ์‚ฐ๊ณผ ReLU ํ•œ๋ฒˆ์œผ๋กœ ๋ฐ”๋กœ ๊ฒฐ๊ณผ๊ฐ’์ด ๋„์ถœ๋ฉ๋‹ˆ๋‹ค.

์ด ๋ชจ๋ธ์„ ์ •์˜ํ•˜๋Š” ๊ฒƒ๊นŒ์ง€๋ฅผ ์ฝ”๋“œ๋กœ ์˜ฎ๊ธฐ๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

# main.py
from basics import *
from datautils import * 
from metrics import * 
from evaluate import *
from train import * 
from models import * 

batch_size = 6
train_set, test_set = import_drone_dataset()
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
model = SingleConvolution().to(device)
summary(model, (3, 600, 400))  # torchsummary prints the table itself

์—ฌ๊ธฐ์„œ batch size๋Š” GPU ๋ฉ”๋ชจ๋ฆฌ์— ๋“ค์–ด๊ฐ€๋Š” ํ•œ ๋งŽ์ด ์šฑ์—ฌ๋„ฃ๋Š” ๊ฒƒ์ด ์ผ๋ฐ˜์ ์ž…๋‹ˆ๋‹ค. ์ €๋Š” 1070Ti๋ฅผ ์“ฐ๊ธฐ ๋•Œ๋ฌธ์— 6์ •๋„๋Š” ๊ดœ์ฐฎ์€๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. ๋‹ค๋ฅธ ํ•จ์ˆ˜๋“ค์€ ์•ž์„œ Prep์—์„œ ์ค€๋น„ํ•œ ํ•จ์ˆ˜๋“ค์ž…๋‹ˆ๋‹ค. summary๊ฐ€ ๋ฐ˜ํ™˜ํ•˜๋Š” ๊ฒฐ๊ณผ๊ฐ€ ์•„๋ž˜์™€ ๊ฐ™์ด ๋‚˜ํƒ€๋‚ฉ๋‹ˆ๋‹ค.

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 23, 600, 400]             644
              ReLU-2         [-1, 23, 600, 400]               0
================================================================
Total params: 644
Trainable params: 644
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 2.75
Forward/backward pass size (MB): 84.23
Params size (MB): 0.00
Estimated Total Size (MB): 86.98
----------------------------------------------------------------

์šฐ๋ฆฌ๊ฐ€ ์˜ˆ์ƒํ–ˆ๋˜ ๋Œ€๋กœ 644๊ฐœ์˜ trainable param์„ ๊ฐ–๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Training

์ด์ œ ๋ชจ๋ธ์„ ์ •์˜ํ–ˆ๋‹ค๋ฉด, ์ด ๋ชจ๋ธ์˜ 633๊ฐœ์˜ parameter๋ฅผ ์‹ค์ œ๋กœ trainํ•ด ์ค˜์•ผ ํ•ฉ๋‹ˆ๋‹ค. Train์€ ํฌ๊ฒŒ ๋‘ ๊ณผ์ •์œผ๋กœ ์ด๋ฃจ์–ด์ง‘๋‹ˆ๋‹ค.

  1. Forward pass๋กœ ํ›ˆ๋ จ์šฉ ๋ฐ์ดํ„ฐ๋ฅผ ๋จน์—ฌ์„œ, ์ตœ์ข… ๊ฒฐ๊ณผ๋ฅผ ๋„์ถœํ•œ ๋‹ค์Œ, ์ด ๊ฒฐ๊ณผ๋ฅผ ground truth์™€ ๋น„๊ตํ•ด์„œ ์–ผ๋งˆ๋‚˜ ๋‹ค๋ฅธ์ง€ (loss function)์˜ ๊ฐ’์„ ์ธก์ •
  2. ๊ทธ ๊ฐ’์„ ์ตœ์†Œํ™”ํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ๋ญ”๊ฐ€ optimization ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉ.

์ „์ฒด์ ์ธ CNN ๋ชจ๋ธ ํ›ˆ๋ จ์˜ ์ด๋ก ์— ๋Œ€ํ•ด์„œ ๋‹ค๋ฃฌ ํฌ์ŠคํŒ…๋“ค๊ณผ, LeNet์„ ์ด์šฉํ•ด์„œ pytorch์—์„œ classification ํ•˜๋Š” ํฌ์ŠคํŒ… (LeNet์œผ๋กœ MNIST ํ’€์–ด๋ณด๊ธฐ)๊ฐ€ ์žˆ์œผ๋ฏ€๋กœ ์ด์ชฝ์„ ์ฐธ๊ณ ํ•ด ์ฃผ์„ธ์š”.

์—ฌ๊ธฐ์„œ์˜ ํ›ˆ๋ จ๊ณผ์ •์€ LeNet MNISTํ›ˆ๋ จ๊ณผ ํฌ๊ฒŒ ๋‹ค๋ฅด์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

# train.py
from basics import * 
from metrics import *

def train(
    model: nn.Module,
    epochs: int,
    train_loader: DataLoader,
    loss_func, optimizer,
    acc_metric=pixel_accuracy
):
    torch.cuda.empty_cache()
    train_losses = []
    train_acc = []
    model.to(device)
    start = time.time()
    for epoch in range(epochs):
        print(f"EPOCH {epoch+1} training begins...")
        train_start = time.time()
        model.train()
        train_accuracy = 0
        train_loss = 0
        for i, data in enumerate(tqdm(train_loader)):
            img, mask = data
            img = img.to(device)
            mask = mask.to(device)
            output = model(img)

            train_accuracy += acc_metric(output, mask)
            loss = loss_func(output, mask)
            train_loss += loss.item()
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()  # clear gradients for the next batch
        # record per-epoch averages so the returned history is not empty
        train_losses.append(train_loss / len(train_loader))
        train_acc.append(train_accuracy / len(train_loader))
        print(f"Train epoch {epoch+1} / {epochs}",
              f"Training Loss {train_loss/len(train_loader):.4f}",
              f"Training Accr {train_accuracy/len(train_loader):.4f}",
              f"Training Time {(time.time()-train_start)/60:.2f} min")
    history = {'train_loss': train_losses, 'train_acc': train_acc}
    print(f"Total training time {(time.time()-start)/60:.2f} minutes taken")
    return history

LeNet์œผ๋กœ MNIST ํ’€์–ด๋ณด๋Š” ํฌ์ŠคํŒ…์—์„œ ๋‹ค๋ค˜๋˜ ๊ฒƒ๊ณผ ๊ฑฐ์˜ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์—ฌ๋Ÿฌ ๋ชจ๋ธ์— ๋Œ€ํ•ด ์‹คํ—˜ํ•˜๊ธฐ ์œ„ํ•ด ํ•จ์ˆ˜๋กœ ๋งŒ๋“ค์—ˆ๋‹ค๋Š” ์ •๋„๋งŒ ์ฐจ์ด๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹ฌ๋ผ์ง€๋Š” ๋ถ€๋ถ„์ด ๊ฑฐ์˜ ์—†์œผ๋ฏ€๋กœ, LeNet ํฌ์ŠคํŒ…์„ ์ฐธ์กฐํ•ด ์ฃผ์„ธ์š”.

  • train data๋ฅผ ๋กœ๋”ฉํ•  data loader๋ฅผ ๋ฐ›๊ณ 
  • ๋ช‡ epoch ๋Œ๋ฆด์ง€๋ฅผ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ๋ฐ›๊ณ 
  • ์–ด๋–ค loss function์„ ์–ด๋–ค optimizer๋กœ ํ›ˆ๋ จํ•˜๊ณ 
  • ์–ด๋–ค ๋ฐฉ๋ฒ•์œผ๋กœ accuracy๋ฅผ ์ธก์ •ํ• ์ง€ (์‚ฌ์‹ค ํ›ˆ๋ จ ์ž์ฒด์—๋Š” ์ƒ๊ด€์ด ์—†๋Š”๋ฐ, ๋ˆˆ์œผ๋กœ ๋ณด๊ธฐ ์œ„ํ•ด์„œ์ž…๋‹ˆ๋‹ค) ์ •ํ•ฉ๋‹ˆ๋‹ค.

์ด์ œ, ๋งˆ์ง€๋ง‰์œผ๋กœ ์ด ๋ชจ๋‘๋ฅผ ํ•ฉ์ณ์„œ ์ตœ์ข… ๋กœ์ง์„ ์ž‘์„ฑํ•ฉ๋‹ˆ๋‹ค.

# main.py
train(
    model = model, 
    epochs = 5,
    train_loader = train_loader, 
    loss_func = nn.CrossEntropyLoss(), 
    optimizer = torch.optim.Adam(model.parameters(), lr=0.003)
)
evaluator = ModelEvaluation(model, test_set, pixel_accuracy)
evaluator.evaluate_all()

์ด ๋ชจ๋ธ์— ๋ญ”๊ฐ€ ๋…ธ๋ ฅ์„ ๊ธฐ์šธ์ด๋Š” ๊ฒƒ์€ ์˜๋ฏธ๊ฐ€ ์—†์œผ๋ฏ€๋กœ, ์•„๋ฌด๋ ‡๊ฒŒ๋‚˜ 5 epoch๋ฅผ ๋Œ๋ฆฝ๋‹ˆ๋‹ค. loss function๊ณผ optimizer๋„ ์ผ๋ฐ˜์ ์ธ Cross Entropy Loss ์™€ Adam์„ ๊ทธ๋Œ€๋กœ ์ง‘์–ด๋„ฃ์Šต๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ์ข€ ๊ธฐ๋‹ค๋ ค ๋ณด๋ฉดโ€ฆ

Results

picture 2
์ด ๊ฒฐ๊ณผ๋Š” 8๋ฒˆ ์ด๋ฏธ์ง€์— ๋Œ€ํ•œ ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค. ๋ชจ๋ธ์ด ๋Œ€๋ถ€๋ถ„์„ void๋กœ ์žก์•„๋‚ด๊ธด ํ–ˆ๋Š”๋ฐ, ๋ญ”๊ฐ€ ์ด๋ฏธ์ง€์˜ ํฐ ์ฒญํฌ๋“ค์— ๋Œ€ํ•ด ๋ถ„๋ช… trivialํ•˜์ง€ ์•Š๊ฒŒ ๋ญ”๊ฐ€๋ฅผ ์žก์•„๋‚ธ ๊ฒƒ ๊ฐ™์•„ ๋ณด์ž…๋‹ˆ๋‹ค.

์ด๋Š” show_qualitative๋กœ ๋ฝ‘์€ ๊ฒฐ๊ณผ์ธ๋ฐ, evaluateํ•œ ๊ฒฐ๊ณผ๋Š” ํ‰๊ท  pixel accuracy 47%์ •๋„๊ฐ€ ๋‚˜์™”์Šต๋‹ˆ๋‹ค. ์ด ํ”„๋กœ์ ํŠธ๋Š” ์•ž์œผ๋กœ ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•์„ ์ด์šฉํ•ด ์ด๋ฅผ 85% ๋‚ด์ง€๋Š” ๊ทธ ์ด์ƒ์œผ๋กœ ์˜ฌ๋ฆด ๊ณ„ํš์ž…๋‹ˆ๋‹ค.