CNN (1)

Convolutional Neural Networks (CNN)

NOTE: This tutorial is intended for advanced users of TensorFlow and assumes expertise and experience in machine learning.

Overview

CIFAR-10 classification is a common benchmark problem in machine learning.

The task is to classify 32 x 32 pixel RGB images into 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

[Figure: CIFAR-10 samples]

For more detail, refer to the CIFAR-10 page and the tech report by Alex Krizhevsky.

Goals

The goal of this tutorial is to build a relatively small convolutional neural network (CNN) for recognizing images.

In the process, this tutorial:

1. Highlights a canonical organization for network architecture, training and evaluation.

2. Provides a template for constructing larger and more sophisticated models.

CIFAR-10 was selected because it is complex enough to exercise much of TensorFlow's ability to scale to large models.

At the same time, the model is small enough to train quickly, which makes it ideal for trying out new ideas and experimenting with new techniques.

Highlights of the Tutorial

The CIFAR-10 tutorial demonstrates several important constructs for designing larger and more sophisticated models in TensorFlow:

· Core mathematical components including convolution, rectified linear activations, max pooling and local response normalization.

· Visualization of network activities during training, including input images, losses and distributions of activations and gradients.

· Routines for calculating the moving average of learned parameters and using these averages during evaluation to boost predictive performance.

· Implementation of a learning rate schedule that systematically decrements over time.

· Prefetching queues for input data to isolate the model from disk latency and expensive image pre-processing.
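Two of the constructs listed above, the decaying learning rate schedule and the moving average of learned parameters, reduce to a few lines of arithmetic. The following is a minimal plain-Python sketch and not the tutorial's TensorFlow code: the names decayed_learning_rate and MovingAverage, and the step-wise ("staircase") exponential form of the decay, are assumptions chosen for illustration.

```python
def decayed_learning_rate(initial_rate, global_step, decay_steps, decay_factor):
    """Staircase exponential decay (assumed form of the schedule).

    The rate is multiplied by decay_factor once every decay_steps steps.
    """
    return initial_rate * decay_factor ** (global_step // decay_steps)


class MovingAverage:
    """Exponential moving average ("shadow" copy) of named parameters."""

    def __init__(self, decay):
        self.decay = decay
        self.shadow = {}

    def update(self, params):
        """Blend the current parameter values into the shadow copies."""
        for name, value in params.items():
            if name not in self.shadow:
                # The first observation initializes the shadow value.
                self.shadow[name] = value
            else:
                self.shadow[name] = (
                    self.decay * self.shadow[name] + (1.0 - self.decay) * value
                )


if __name__ == "__main__":
    for step in (0, 999, 1000, 2500):
        print(step, decayed_learning_rate(0.1, step, decay_steps=1000, decay_factor=0.1))

    ema = MovingAverage(decay=0.9)
    for weight in (1.0, 2.0, 3.0):
        ema.update({"w": weight})
    print(ema.shadow["w"])
```

At evaluation time the shadow values would be used in place of the raw parameters, which smooths out the noise of the last few training steps.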

We also provide a multi-GPU version of the model which demonstrates:

· Configuring a model to train across multiple GPU cards in parallel.

· Sharing and updating variables among multiple GPUs.

We hope that this tutorial provides a launch point for building larger CNNs for vision tasks on TensorFlow.

Model Architecture

The model in this CIFAR-10 tutorial is a multi-layer architecture consisting of alternating convolutions and nonlinearities.

These layers are followed by fully connected layers leading into a softmax classifier.

The model follows the architecture described by Alex Krizhevsky, with a few differences in the top few layers.

This model achieves a peak performance of about 86% accuracy within a few hours of training time on a GPU.

Please see below and the code for details.

The model consists of 1,068,298 learnable parameters and requires about 19.5M multiply-add operations to compute inference on a single image.
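The parameter count quoted above can be checked with simple arithmetic. The sketch below relies on layer sizes that this text does not state: a 24 x 24 x 3 cropped input, 5 x 5 kernels with 64 output channels in both convolutions, pooling that halves each spatial dimension, and 384 and 192 units in the two fully connected layers. These values are recalled from the reference cifar10.py and should be treated as assumptions. Under them, the total comes out to the stated 1,068,298.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus one bias per output channel."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return in_units * out_units + out_units


def count_parameters():
    """Per-layer learnable parameter counts under the assumed layer sizes."""
    side = 24                      # assumed: cropped input is 24 x 24 x 3
    counts = {}
    counts["conv1"] = conv_params(5, 5, 3, 64)
    side = -(-side // 2)           # assumed: pool1 halves the spatial size (24 -> 12)
    counts["conv2"] = conv_params(5, 5, 64, 64)
    side = -(-side // 2)           # assumed: pool2 halves it again (12 -> 6)
    flattened = side * side * 64   # 6 * 6 * 64 = 2304 inputs to the first dense layer
    counts["local3"] = dense_params(flattened, 384)
    counts["local4"] = dense_params(384, 192)
    counts["softmax_linear"] = dense_params(192, 10)
    return counts


if __name__ == "__main__":
    counts = count_parameters()
    for name, value in counts.items():
        print(f"{name:15s} {value:>9,d}")
    print(f"{'total':15s} {sum(counts.values()):>9,d}")
```

Most of the parameters sit in the first fully connected layer, which is why the spatial size after the second pooling step matters so much for the total.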

Code Organization

The code for this tutorial resides in tensorflow/models/image/cifar10/.

File	Purpose
cifar10_input.py	Reads the native CIFAR-10 binary file format.
cifar10.py	Builds the CIFAR-10 model.
cifar10_train.py	Trains a CIFAR-10 model on a CPU or GPU.
cifar10_multi_gpu_train.py	Trains a CIFAR-10 model on multiple GPUs.
cifar10_eval.py	Evaluates the predictive performance of a CIFAR-10 model.

CIFAR-10 Model

The CIFAR-10 network is largely contained in cifar10.py.

The complete training graph contains roughly 765 operations.

We find that we can make the code most reusable by constructing the graph with the following modules:

· Model inputs: inputs() and distorted_inputs() add operations that read and preprocess CIFAR images for evaluation and training, respectively.
· Model prediction: inference() adds operations that perform inference, i.e. classification, on supplied images.
· Model training: loss() and train() add operations that compute the loss, gradients, variable updates and visualization summaries.

The input part of the model is built by the functions inputs() and distorted_inputs(), which read images from the CIFAR-10 binary data files.

These files contain fixed byte length records, so we use tf.FixedLengthRecordReader.

See Reading Data to learn more about how the Reader class works.
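To make "fixed byte length records" concrete, here is a minimal plain-Python sketch of parsing such a file's contents. It does not use tf.FixedLengthRecordReader. The record layout is an assumption taken from the published description of the CIFAR-10 binary format rather than from this text: 1 label byte followed by 3,072 pixel bytes, stored as a 32 x 32 red plane, then green, then blue, each in row-major order. The function name parse_records is hypothetical.

```python
LABEL_BYTES = 1
HEIGHT, WIDTH, DEPTH = 32, 32, 3
IMAGE_BYTES = HEIGHT * WIDTH * DEPTH          # 3072
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES      # 3073 bytes per record


def parse_records(data):
    """Yield (label, image) pairs from raw bytes of fixed-length records.

    image[row][col] is an (r, g, b) tuple of integers in 0..255.
    """
    if len(data) % RECORD_BYTES != 0:
        raise ValueError("data length is not a multiple of the record size")
    plane = HEIGHT * WIDTH
    for start in range(0, len(data), RECORD_BYTES):
        record = data[start:start + RECORD_BYTES]
        label = record[0]
        pixels = record[LABEL_BYTES:]
        # Assumed layout: channel-major planes, so the three channels of one
        # pixel are `plane` bytes apart.
        image = [
            [
                (
                    pixels[row * WIDTH + col],
                    pixels[plane + row * WIDTH + col],
                    pixels[2 * plane + row * WIDTH + col],
                )
                for col in range(WIDTH)
            ]
            for row in range(HEIGHT)
        ]
        yield label, image


if __name__ == "__main__":
    # A synthetic record: label 7, solid color (10, 20, 30).
    fake = bytes([7]) + bytes([10]) * 1024 + bytes([20]) * 1024 + bytes([30]) * 1024
    for label, image in parse_records(fake):
        print(label, image[0][0])
```

Because every record has the same size, the reader never needs delimiters: record i always starts at byte offset i * RECORD_BYTES.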

The images are processed as follows:

· They are cropped to 24 x 24 pixels, centrally for evaluation or randomly for training.

· They are approximately whitened to make the model insensitive to dynamic range.
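The two processing steps above can be sketched in plain Python on nested lists shaped height x width x channels. This is an illustration under stated assumptions, not the tutorial's TensorFlow code: "approximately whitened" is taken here to mean per-image standardization (subtract the image mean and divide by the standard deviation, with a lower bound on the divisor so uniform images do not divide by zero), and all function names are hypothetical.

```python
import math
import random


def crop(image, top, left, size):
    """Return a size x size window of the image starting at (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def central_crop(image, size):
    """Crop around the image center (used for evaluation)."""
    top = (len(image) - size) // 2
    left = (len(image[0]) - size) // 2
    return crop(image, top, left, size)


def random_crop(image, size, rng):
    """Crop at a uniformly random offset (used for training)."""
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return crop(image, top, left, size)


def whiten(image):
    """Scale the image to zero mean and (approximately) unit variance."""
    values = [v for row in image for pixel in row for v in pixel]
    count = len(values)
    mean = sum(values) / count
    variance = sum((v - mean) ** 2 for v in values) / count
    # Assumed guard: never divide by less than 1/sqrt(count).
    adjusted_std = max(math.sqrt(variance), 1.0 / math.sqrt(count))
    return [
        [[(v - mean) / adjusted_std for v in pixel] for pixel in row]
        for row in image
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[[float((r * 32 + c + ch) % 256) for ch in range(3)]
              for c in range(32)] for r in range(32)]
    train_example = whiten(random_crop(image, 24, rng))
    eval_example = whiten(central_crop(image, 24))
    print(len(train_example), len(train_example[0]), len(eval_example))
```

After whitening, two copies of the same scene at different exposure levels map to nearly the same values, which is what "insensitive to dynamic range" refers to.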

For training, we additionally apply a series of random distortions to artificially increase the data set size:

· Randomly flip the image from left to right.

· Randomly distort the image brightness.

· Randomly distort the image contrast.

Please see the Images page for the list of available distortions.
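The three training distortions can be sketched in plain Python on height x width x channels nested lists. The definitions used here are assumptions for illustration: brightness is distorted by adding a random offset to every value, and contrast by scaling each channel's deviation from its mean. The default ranges (an offset of up to 63 and a contrast factor between 0.2 and 1.8) are recalled from the reference input code and are not stated in this text. All function names are hypothetical.

```python
import random


def flip_left_right(image):
    """Mirror the image horizontally."""
    return [list(reversed(row)) for row in image]


def adjust_brightness(image, delta):
    """Add the same offset to every value."""
    return [[[v + delta for v in pixel] for pixel in row] for row in image]


def adjust_contrast(image, factor):
    """Scale each channel's deviation from that channel's mean."""
    channels = len(image[0][0])
    count = len(image) * len(image[0])
    means = [
        sum(pixel[ch] for row in image for pixel in row) / count
        for ch in range(channels)
    ]
    return [
        [
            [(pixel[ch] - means[ch]) * factor + means[ch] for ch in range(channels)]
            for pixel in row
        ]
        for row in image
    ]


def distort(image, rng, max_delta=63.0, contrast_range=(0.2, 1.8)):
    """Apply the three random distortions in sequence (assumed ranges)."""
    if rng.random() < 0.5:
        image = flip_left_right(image)
    image = adjust_brightness(image, rng.uniform(-max_delta, max_delta))
    image = adjust_contrast(image, rng.uniform(*contrast_range))
    return image


if __name__ == "__main__":
    rng = random.Random(42)
    image = [[[float(r), float(c), 1.0] for c in range(4)] for r in range(3)]
    print(distort(image, rng)[0][0])
```

Because a fresh distortion is drawn each time an image is read, the network rarely sees exactly the same pixels twice, which is how the data set is "artificially" enlarged.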

We also attach an image_summary to the images so that we can visualize them in TensorBoard. This is a good practice for verifying that inputs are built correctly.

Reading images from disk and distorting them can use a non-trivial amount of processing time.

To prevent these operations from slowing down training, we run them inside 16 separate threads which continuously fill a TensorFlow queue.
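The threading pattern described above is a standard producer/consumer arrangement. The sketch below reproduces it with Python's standard threading and queue modules rather than a TensorFlow queue. The names start_prefetching and consume, the queue capacity, and the load_example callback are all hypothetical stand-ins for the read-and-distort step.

```python
import queue
import threading


def start_prefetching(load_example, num_examples, num_threads=16, capacity=64):
    """Start worker threads that preprocess examples into a bounded queue."""
    indices = queue.Queue()
    for index in range(num_examples):
        indices.put(index)
    examples = queue.Queue(maxsize=capacity)

    def worker():
        while True:
            try:
                index = indices.get_nowait()
            except queue.Empty:
                return  # No work left for this thread.
            # put() blocks while the queue is full, so producers cannot run
            # arbitrarily far ahead of the training loop.
            examples.put(load_example(index))

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    return examples, threads


def consume(examples, num_examples, batch_size):
    """Pull preprocessed examples off the queue in batches."""
    batches = []
    remaining = num_examples
    while remaining > 0:
        size = min(batch_size, remaining)
        batches.append([examples.get() for _ in range(size)])
        remaining -= size
    return batches


if __name__ == "__main__":
    examples, threads = start_prefetching(lambda i: i * i, num_examples=100)
    batches = consume(examples, num_examples=100, batch_size=32)
    for thread in threads:
        thread.join()
    print([len(batch) for batch in batches])
```

The consumer only ever waits when the queue is empty, so as long as the 16 producers together keep up with training, disk reads and image distortion stay off the critical path. Examples arrive in whatever order the threads finish, which is acceptable for shuffled training data.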

Model Prediction

The prediction part of the model is constructed by the inference() function, which adds operations to compute the logits of the predictions.

That part of the model is organized as follows:

Layer Name	Description
conv1	convolution and rectified linear activation.
pool1	max pooling.
norm1	local response normalization.
conv2	convolution and rectified linear activation.
norm2	local response normalization.
pool2	max pooling.
local3	fully connected layer with rectified linear activation.
local4	fully connected layer with rectified linear activation.
softmax_linear	linear transformation to produce logits.
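Three of the building blocks named in the table can be written out in a few lines of plain Python. This is a sketch for intuition only, not the TensorFlow kernels: rectified linear activation clamps negatives to zero, max pooling keeps the largest value in each window, and local response normalization divides each channel by a power of the summed squares of its neighboring channels. The 3 x 3 window with stride 2 and "same"-style edge handling are assumptions about the reference model, and the function names are hypothetical.

```python
def relu(values):
    """Rectified linear activation: negative inputs become zero."""
    return [max(0.0, v) for v in values]


def max_pool_same(grid, window=3, stride=2):
    """Max pooling over a 2-D grid; windows are clipped at the borders.

    The output has ceil(height / stride) x ceil(width / stride) entries.
    """
    height, width = len(grid), len(grid[0])
    out_h = -(-height // stride)
    out_w = -(-width // stride)
    pad_top = max((out_h - 1) * stride + window - height, 0) // 2
    pad_left = max((out_w - 1) * stride + window - width, 0) // 2
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r0 = i * stride - pad_top
            c0 = j * stride - pad_left
            row.append(max(
                grid[r][c]
                for r in range(max(r0, 0), min(r0 + window, height))
                for c in range(max(c0, 0), min(c0 + window, width))
            ))
        pooled.append(row)
    return pooled


def local_response_norm(channels, depth_radius, bias, alpha, beta):
    """Normalize one pixel's channel vector by its neighbors' energy."""
    normalized = []
    for d, value in enumerate(channels):
        lo = max(0, d - depth_radius)
        hi = min(len(channels), d + depth_radius + 1)
        sqr_sum = sum(v * v for v in channels[lo:hi])
        normalized.append(value / (bias + alpha * sqr_sum) ** beta)
    return normalized


if __name__ == "__main__":
    print(relu([-1.0, 0.0, 2.5]))
    print(max_pool_same([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]))
    print(local_response_norm([1.0, 2.0, 3.0], depth_radius=1, bias=1.0, alpha=1.0, beta=1.0))
```

With a stride of 2, each pooling layer halves the spatial size, so a 24 x 24 feature map becomes 12 x 12 after pool1 and 6 x 6 after pool2.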

[Figure: graph generated from TensorBoard describing the inference operation]

EXERCISE: The output of inference() is un-normalized logits. Try editing the network architecture to return normalized predictions using tf.nn.softmax().
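As background for the exercise, the normalization that softmax applies to a vector of logits can be sketched in plain Python. This is not a solution to the exercise, which asks for a change to the TensorFlow graph. It only shows the computation: exponentiate each logit and divide by the sum, subtracting the maximum first for numerical stability.

```python
import math


def softmax(logits):
    """Convert un-normalized logits into probabilities that sum to 1."""
    shift = max(logits)  # Subtracting the max avoids overflow in exp().
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    probabilities = softmax([2.0, 1.0, 0.1])
    print(probabilities, sum(probabilities))
```

Softmax preserves the ordering of the logits, so the predicted class (the index of the largest value) is the same before and after normalization.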

The inputs() and inference() functions provide all the components necessary to perform evaluation on a model.

We now shift our focus towards building operations for training a model.

EXERCISE: The model architecture in inference() differs slightly from the CIFAR-10 model specified in cuda-convnet. In particular, the top layers of Alex's original model are locally connected and not fully connected. Try editing the architecture to exactly reproduce the locally connected architecture in the top layer.
