
์•ž์œผ๋กœ ์ด ํ”„๋กœ์ ํŠธ์—์„œ ์‚ฌ์šฉํ•˜๋Š” ์ฝ”๋“œ๋Š” ๋ชจ๋‘ Github Repo ์— ์˜ฌ๋ผ๊ฐˆ ์˜ˆ์ •์ž…๋‹ˆ๋‹ค. ์˜ค๋Š˜์€ ๋จผ์ €, ๋ฐ์ดํ„ฐ ๋“ฑ์„ ์ค€๋น„ํ•˜๋Š” ๊ณผ์ •์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

Data preparation

TU Graz์—์„œ ์ œ๊ณตํ•˜๋Š” Drone aerial image ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•˜๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ๋งํฌ ์—์„œ ๋‹ค์šด๋กœ๋“œ๋ฐ›์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์‚ฌ์ง„ 400์žฅ์˜ ๋ฐ์ดํ„ฐ์…‹์ด์ง€๋งŒ ๊ต‰์žฅํžˆ ์šฉ๋Ÿ‰์ด ํฌ๊ณ  (4.1GB, ๊ฐ ์ด๋ฏธ์ง€๊ฐ€ ๋ฌด๋ ค 6000 by 4000 ์ž…๋‹ˆ๋‹ค) pixel-accurateํ•œ ๋ผ๋ฒจ์ด ๋‹ฌ๋ ค์žˆ๋Š”๋ฐ๋‹ค ํด๋ž˜์Šค๋Š” 23๊ฐœ๋กœ ๋งŽ์ง€ ์•Š์•„์„œ ์ ๋‹นํ•˜๋‹ค๊ณ  ์ƒ๊ฐํ–ˆ์Šต๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ๋Š” 360๊ฐœ๋ฅผ training์—, 40๊ฐœ๋ฅผ test์— ์“ฐ๊ฒ ์Šต๋‹ˆ๋‹ค.

First, we import all the modules we need and dump them in. This is not great practice, but since the point here is to test a variety of models, let's set code aesthetics aside for a while. A Jupyter Notebook or Colab would make experimenting much more convenient, but to be able to push everything to GitHub and read it directly there, we'll write it as ordinary Python code.

import pandas as pd, numpy as np 
import torch, torchvision, PIL.Image, cv2 
import os, sys
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as T
import torch.nn.functional as F
from torchsummary import summary
import time
from tqdm import tqdm

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

device ๋“ฑ์€ ์‚ฌ์‹ค ๋ชจ๋“  ๋”ฅ๋Ÿฌ๋‹์—์„œ ๊ณตํ†ต์ ์œผ๋กœ ์“ฐ๋Š” GPU ์ฝ”๋“œ์ด๋ฏ€๋กœ ๋ณ„๋กœ ํŠน๋ณ„ํ•œ ์˜๋ฏธ๊ฐ€ ์žˆ์ง€๋Š” ์•Š๊ณ , ํŠน์ดํ•œ ์ ์€ mean๊ณผ std์ž…๋‹ˆ๋‹ค. ์ด ๊ฐ’์€ RGB ๊ฐ ์ฑ„๋„์„ normalizeํ•˜๊ธฐ ์œ„ํ•œ ๊ฐ’์ธ๋ฐ์š”. 0.5๊ฐ€ ์•„๋‹Œ ์ด์œ ๋Š” ์ด ๊ฐ’๋“ค์ด ์‚ฌ์‹ค ImageNet์—์„œ ํ›ˆ๋ จ๋œ ๊ฒฐ๊ณผ ๊ฐ’์ธ๋ฐ, ์›์น™์ ์œผ๋กœ๋Š” ์ƒˆ๋กœ์šด mean๊ณผ std๋ฅผ trainํ•˜๋Š” ๊ฒƒ์ด ์˜๋ฏธ๊ฐ€ ์žˆ๊ฒ ์ง€๋งŒ 100๋งŒ์žฅ์˜ ImageNet ๋ฐ์ดํ„ฐ๋ฅผ ๋ฏฟ๊ณ  ๊ทธ๋ƒฅ ์จ๋„ ํฐ ๋ฌธ์ œ๊ฐ€ ์—†์Šต๋‹ˆ๋‹ค.


pytorch์—์„œ custom dataset์„ ์‚ฌ์šฉํ•  ๋•Œ๋Š”, ํด๋ž˜์Šค๋ฅผ ๋งŒ๋“ค๋ฉด ๋ฉ๋‹ˆ๋‹ค.

from basics import * 
class DroneDataset(Dataset):
    def __init__(self, img_path, mask_path, X, test=False):
        self.img_path = img_path
        self.mask_path = mask_path
        self.X = X
        self.test = test
    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        img = cv2.imread(self.img_path + self.X[idx] + '.jpg')
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, dsize=(600, 400), interpolation=cv2.INTER_NEAREST)
        mask = cv2.imread(self.mask_path + self.X[idx] + '.png', cv2.IMREAD_GRAYSCALE)
        mask = cv2.resize(mask, dsize=(600, 400), interpolation=cv2.INTER_NEAREST)

        t = T.Compose([T.ToTensor(), T.Normalize(mean, std)])
        img = t(img)
        mask = torch.from_numpy(mask).long()

        return img, mask

์ผ๋‹จ์€ data augmentation ๋“ฑ์€ ์•„๋ฌด๊ฒƒ๋„ ์ƒ๊ฐํ•˜์ง€ ๋ง๊ณ , ์ •๋ง ์ˆœ์ˆ˜ํ•œ bare minimum๋งŒ ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค.

๊ฐ„๋‹จํžˆ ํ•ด์„ํ•ด๋ณด๋ฉดโ€ฆ

  • __init__ takes img_path, mask_path, and so on, recording where the dataset lives and how to transform it (a transform here being the operation that turns an image into a tensor).
  • __getitem__ is the method we override so the dataset can be indexed like data[3]; it reads an image and returns it suitably converted.
  • 6000 × 4000 is honestly just too big, so the images are resized to 600 × 400. Use NEAREST interpolation when downsizing, so that the labels in the mask do not get corrupted (interpolation schemes that average neighboring pixels would blend label values into meaningless in-between classes).
  • The test flag is kept so that test data can later be treated differently from training data: when we qualitatively inspect segmentation quality, it is convenient to have the original image available for display alongside the prediction. (In the bare-minimum version above, both are converted to tensors in the same way.)

์ด์ œ ์ด ํŒŒ์ผ์„ ์‹ค์ œ ๋ชจ๋ธ์— ์ ์šฉํ•˜๊ธฐ ์œ„ํ•ด, training / test ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ ์ž˜๋ผ์ค˜์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํŽธํ•˜๊ฒŒ ์ž˜๋ผ์ฃผ๋Š” sklearn.model_selection.train_test_split์ด ์žˆ์Šต๋‹ˆ๋‹ค.

from sklearn.model_selection import train_test_split
def import_drone_dataset():
    IMAGE_PATH = "../dataset/semantic_drone_dataset/original_images/"
    MASK_PATH = "../dataset/semantic_drone_dataset/label_images_semantic/"
    name = []
    for dirname, _, filenames in os.walk(IMAGE_PATH):
        for filename in filenames:
            name.append(filename.split('.')[0])  # keep the base name without extension
    df = pd.DataFrame({'id': name}, index=np.arange(0, len(name)))
    X_train, X_test = train_test_split(df['id'].values, test_size=0.1, random_state=0)
    train_set = DroneDataset(IMAGE_PATH, MASK_PATH, X_train, test=False)
    test_set = DroneDataset(IMAGE_PATH, MASK_PATH, X_test, test=True)
    return train_set, test_set
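For training, the datasets returned here would typically be wrapped in a DataLoader to get shuffled mini-batches. A sketch with a dummy stand-in dataset (since the real one needs the image files on disk):

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Stand-in dataset so the snippet runs without the drone images on disk;
# in practice you would pass the train_set from import_drone_dataset().
class DummySet(Dataset):
    def __init__(self, n):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, idx):
        img = torch.rand(3, 400, 600)                   # fake "image"
        mask = torch.zeros(400, 600, dtype=torch.long)  # fake "mask"
        return img, mask

train_loader = DataLoader(DummySet(8), batch_size=4, shuffle=True)
imgs, masks = next(iter(train_loader))
print(imgs.shape, masks.shape)  # torch.Size([4, 3, 400, 600]) torch.Size([4, 400, 600])
```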

Evaluating the model

๋ชจ๋ธ์„ ๋งŒ๋“ค๊ธฐ ์ „์— ์ผ๋‹จ ๋ชจ๋ธ์ด ์žˆ๋‹ค๋ฉด ์–ด๋–ป๊ฒŒ ๋™์ž‘ํ•ด์•ผ ํ• ์ง€๋ฅผ ๋จผ์ € ์ƒ๊ฐํ•ด ๋ด…๋‹ˆ๋‹ค. ์ข€ ์˜ค๋ž˜๋œ ๋ง์ด๊ธด ํ•˜์ง€๋งŒ, ๋จธ์‹ ๋Ÿฌ๋‹์„ ์ •์˜ํ•˜๋Š” ๋ฐฉ๋ฒ• ์ค‘ ํ•œ๊ฐ€์ง€๋Š” T, P, E ๋ผ๊ณ  ํ•ด์„œโ€ฆ

  • Task : there is a clearly defined task we want to perform,
  • Performance Measure : there is a way to measure how well the current program performs, and
  • Experience : the program works to improve P by learning from data.

We haven't written the program yet, but our T is semantic segmentation. How to choose P deserves a post of its own; there are many interesting options such as mIoU and the Hausdorff distance. The easiest one to reason about is simply pixel accuracy: the number of correctly classified pixels divided by the total number of pixels.
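For reference, mIoU is also not hard to sketch, although real implementations differ in details such as how to treat classes absent from an image; the version below is only an illustration, not the metric we use here:

```python
import torch

def mean_iou(output, mask, n_classes=23):
    # Per-class intersection over union, averaged over the classes that
    # actually occur in either the prediction or the ground truth.
    pred = torch.argmax(output, dim=1)   # (N, H, W) predicted labels
    ious = []
    for c in range(n_classes):
        pred_c = (pred == c)
        mask_c = (mask == c)
        union = (pred_c | mask_c).sum().item()
        if union == 0:
            continue                     # class absent everywhere: skip it
        inter = (pred_c & mask_c).sum().item()
        ious.append(inter / union)
    return sum(ious) / len(ious)

# Tiny check: predict class 1 everywhere on a 2x2 image where only half
# of the ground truth is class 1.
output = torch.zeros(1, 2, 2, 2)
output[0, 1] = 1.0
mask = torch.tensor([[[1, 1], [0, 0]]])
print(mean_iou(output, mask, n_classes=2))  # 0.25
```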

Pytorch์—์„œ๋Š” ๋ชจ๋ธ์ด ์–ด๋–ค input image๋ฅผ ๋ฐ›์•„์„œ, model(x) ๊ณผ ๊ฐ™์€ ์‹์œผ๋กœ callํ•ด์„œ inference๋ฅผ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ ๊ฒฐ๊ณผ๋ฅผ ์‹ค์ œ mask์™€ ๋น„๊ตํ•ด์„œ ์ •ํ™•๋„๋ฅผ ์ธก์ •ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

Following the bare-minimum philosophy, we implement only pixel accuracy for now. But keeping in mind that other metrics may be implemented later, this goes into its own file.

def pixel_accuracy(output, mask):
    with torch.no_grad():
        output = torch.argmax(output, dim=1)
        correct = torch.eq(output, mask).int()
        accuracy = float(correct.sum()) / float(correct.numel())
    return accuracy

Computing pixel accuracy does not require gradients for backpropagation, so we wrap it in with torch.no_grad(): to disable them.
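A quick sanity check on toy tensors (the definition is repeated so the snippet runs standalone):

```python
import torch

def pixel_accuracy(output, mask):  # same as above, repeated to run standalone
    with torch.no_grad():
        output = torch.argmax(output, dim=1)
        correct = torch.eq(output, mask).int()
        return float(correct.sum()) / float(correct.numel())

# Logits for 2 classes over a 2x2 "image": argmax gives [[1, 0], [0, 0]].
output = torch.tensor([[[[0.1, 0.9], [0.8, 0.7]],
                        [[0.6, 0.2], [0.3, 0.1]]]])
mask = torch.tensor([[[1, 0], [0, 1]]])   # 3 of 4 pixels match the argmax
print(pixel_accuracy(output, mask))  # 0.75
```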

Now, to make it easy to run tests repeatedly, let's write a separate class that drives the evaluation.

from basics import *

class ModelEvaluation():
    def __init__(self, model, test_dataset, metric):
        self.model = model
        self.test_dataset = test_dataset
        self.metric = metric
        self.model.eval()  # turn off dropout / batch-norm updates for evaluation
    def evaluate_single(self, image, mask):
        image = image.to(device)  # move onto the same device as the model
        mask = mask.to(device)
        with torch.no_grad():
            image = image.unsqueeze(0)
            mask = mask.unsqueeze(0)
            output = self.model(image)
            acc = self.metric(output, mask)
            masked = torch.argmax(output, dim=1)
            masked = masked.cpu().squeeze(0)
        return masked, acc

    def evaluate_all(self):
        accuracy = []
        for i in tqdm(range(len(self.test_dataset))):
            img, mask = self.test_dataset[i]
            pred, acc = self.evaluate_single(img, mask)
            accuracy.append(acc)  # collect per-image accuracy
        print(f"Mean accuracy = {np.mean(accuracy)}")
        return accuracy

    def show_qualitative(self, ind):
        image, mask = self.test_dataset[ind]
        pred_mask, score = self.evaluate_single(image, mask)
        # Undo T.Normalize(mean, std): x_norm = (x - mean) / std, so applying
        # Normalize(-mean/std, 1/std) recovers the original pixel values.
        inv_normalize = T.Normalize(
            mean=[-0.485/0.229, -0.456/0.224, -0.406/0.225],
            std=[1/0.229, 1/0.224, 1/0.225]
        )
        image = inv_normalize(image)
        image = image.cpu().numpy()
        image = image.swapaxes(0, 1)  # (C, H, W) -> (H, W, C) for matplotlib
        image = image.swapaxes(1, 2)
        import matplotlib.pyplot as plt
        fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20, 10))
        ax1.imshow(image)
        ax1.set_title('Original image')
        ax2.imshow(mask)
        ax2.set_title('Ground truth')
        ax3.imshow(pred_mask)
        ax3.set_title(f'Model | score {score:.3f}')
        plt.show()
  • __init__์—์„œ๋Š” ์–ด๋–ค ๋ชจ๋ธ์„ ํ…Œ์ŠคํŠธํ•˜๋Š”์ง€, ์–ด๋–ค ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ํ…Œ์ŠคํŠธํ•˜๋Š”์ง€, ๊ทธ๋ฆฌ๊ณ  ์–ด๋–ค metric์„ ์‚ฌ์šฉํ•  ๊ฒƒ์ธ์ง€๋ฅผ ์ •ํ•ฉ๋‹ˆ๋‹ค.
  • evaluate_single() ์€ ์ด๋ฏธ์ง€ ํ•œ ๊ฐœ๋ฅผ ๋ฐ›์•„์„œ ์ด๋ฅผ normalizeํ•œ๋‹ค์Œ ์‹ค์ œ๋กœ inferenceํ•ด ๋ด…๋‹ˆ๋‹ค. ๊ฒฐ๊ณผ๋กœ predicted mask์™€ ๊ทธ ์ •ํ™•๋„๋ฅผ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค. unsqueeze๋Š” ๊ฐ„๋‹จํžˆ ๊ทธ๋ƒฅ ํ…์„œ๋ฅผ ์ญ‰ ์žก์•„ํŽด์ฃผ๋Š” ์—ฐ์‚ฐ์œผ๋กœ ์ƒ๊ฐํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.
  • evaluate_all() ์€ ํ‰๊ท  ์ •ํ™•๋„๋ฅผ ์ธก์ •ํ•ฉ๋‹ˆ๋‹ค.
  • show_qualitative() ๋Š” ๊ฒฐ๊ณผ์˜ ์ •์„ฑ์  ํ‰๊ฐ€๋ฅผ ์œ„ํ•œ ๊ฒƒ์œผ๋กœ, ํŠน์ • ์ด๋ฏธ์ง€์— ๋Œ€ํ•œ image, ground truth, prediction์„ ๋™์‹œ์— ๋„์›Œ์ค๋‹ˆ๋‹ค. ์‹ค์ œ๋กœ ์ด๋ฏธ์ง€๋ฅผ ๋„์›Œ์•ผ ํ•˜๊ธฐ ๋•Œ๋ฌธ์—, Dataset์„ ๋งŒ๋“ค๋•Œ ToTensor์™€ Normalizeํ–ˆ๋˜ ๊ฒƒ์„ ๋‹ค์‹œ ๊ฑฐ๊พธ๋กœ ๋Œ๋ ค์ค˜์•ผ ํ•ฉ๋‹ˆ๋‹ค. Normalize์˜ ์ •์˜๋ฅผ ์ด์šฉํ•˜์—ฌ ์ด๋ถ€๋ถ„์€ ์ ๋‹นํžˆ ์ฒ˜๋ฆฌํ•ด์ค„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
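Since we have no trained network yet, the shape bookkeeping in evaluate_single can be checked with a throwaway stand-in model (a single 1x1 convolution producing 23 class scores per pixel, purely illustrative):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in: a 1x1 convolution mapping 3 input channels to
# 23 per-pixel class scores, playing the role of a segmentation model.
model = nn.Conv2d(3, 23, kernel_size=1)
model.eval()

image = torch.rand(3, 400, 600)  # one image, shaped as the Dataset returns it

with torch.no_grad():
    output = model(image.unsqueeze(0))             # batch dim added: (1, 23, 400, 600)
    pred = torch.argmax(output, dim=1).squeeze(0)  # back to a (400, 600) label map
print(output.shape, pred.shape)  # torch.Size([1, 23, 400, 600]) torch.Size([400, 600])
```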

๋‹ค์Œ ํฌ์ŠคํŒ…์—์„œ๋Š” train์„ ์–ด๋–ป๊ฒŒ ์‹ค์ œ๋กœ ์‹คํ–‰ํ• ์ง€์™€, ์ด๋ฅผ ์ด์šฉํ•ด์„œ ์•„์ฃผ ๊ฐ„๋‹จํ•œ ๋ชจ๋ธ์„ ํ•œ๋ฒˆ ํ™•์ธํ•ด๋ณด๋Š” ์ •๋„๋ฅผ ์ง„ํ–‰ํ•  ์˜ˆ์ •์ž…๋‹ˆ๋‹ค.