CNN (3)

Evaluating a Model

Let us now evaluate how well the trained model performs on a hold-out data set.

The model is evaluated by the script cifar10_eval.py.

It constructs the model with the inference() function and uses all 10,000 images in the evaluation set of CIFAR-10.

It calculates the precision at 1: how often the top prediction matches the true label of the image.
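
As a rough illustration of what precision at 1 measures, the sketch below computes it in plain Python. The function name precision_at_1 and the example scores and labels are hypothetical; the actual script computes this inside TensorFlow.

```python
def precision_at_1(scores, labels):
    """Fraction of examples whose highest-scoring class equals the true label."""
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equally long")
    correct = 0
    for row, label in zip(scores, labels):
        # The index of the largest score is the model's top prediction.
        top_class = max(range(len(row)), key=row.__getitem__)
        if top_class == label:
            correct += 1
    return correct / len(labels)


# Hypothetical class scores for 4 images over 3 classes.
example_scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
example_labels = [1, 0, 1, 1]

# Three of the four top predictions match their labels.
print(precision_at_1(example_scores, example_labels))
```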

To monitor how the model improves during training, the evaluation script runs periodically on the latest checkpoint files created by cifar10_train.py.

python cifar10_eval.py

Be careful not to run the evaluation and training binaries on the same GPU, or you might run out of memory.

Consider running the evaluation on a separate GPU if one is available, or suspending the training binary while the evaluation runs on the same GPU.
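
One common way to keep the two processes on different GPUs is the CUDA_VISIBLE_DEVICES environment variable. This is a CUDA mechanism, not something the tutorial itself prescribes, and the GPU indices and the helper name env_for_gpu below are assumptions for illustration.

```python
import os


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_VISIBLE_DEVICES restricts which physical GPUs a process can see.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


# Assumed layout: training on GPU 0, evaluation on GPU 1.
train_env = env_for_gpu(0)
eval_env = env_for_gpu(1)
print(train_env["CUDA_VISIBLE_DEVICES"], eval_env["CUDA_VISIBLE_DEVICES"])

# Each environment could then be handed to a child process, for example:
#   subprocess.Popen(["python", "cifar10_eval.py"], env=eval_env)
```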

You should see the output:

2015-11-06 08:30:44.391206: precision @ 1 = 0.860

...

The script merely reports the precision @ 1 periodically; in this case it reported 86% accuracy.

cifar10_eval.py also exports summaries that may be visualized in TensorBoard.

These summaries provide additional insight into the model during evaluation.

The training script calculates a moving average version of all learned variables.

The evaluation script substitutes the moving average version for all learned model parameters.

This substitution boosts model performance at evaluation time.
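
To make the idea of a moving average version concrete, here is a minimal sketch of an exponential moving average over a single scalar weight. The decay of 0.5 and the weight values are made up so the effect is visible in a few steps; real training typically uses a decay much closer to 1, and the helper name is hypothetical.

```python
def update_moving_average(shadow, value, decay):
    """One exponential-moving-average step: decay * shadow + (1 - decay) * value."""
    return decay * shadow + (1.0 - decay) * value


# A single hypothetical weight that oscillates from step to step.
weight_history = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]

shadow = weight_history[0]  # start the average at the first value
for w in weight_history[1:]:
    shadow = update_moving_average(shadow, w, decay=0.5)

# The averaged value is smoother than the last raw value.
print("last raw weight:", weight_history[-1])
print("averaged weight:", shadow)
```

Evaluating with the averaged value instead of the last raw value is the substitution described above.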

EXERCISE: Employing averaged parameters may boost predictive performance by about 3% as measured by precision @ 1.

Edit cifar10_eval.py to not employ the averaged parameters for the model and verify that the predictive performance drops.

Training a Model Using Multiple GPU Cards

Modern workstations may contain multiple GPUs for scientific computation.

TensorFlow can leverage this environment to run the training operation concurrently across multiple cards.

Training a model in a parallel, distributed fashion requires coordinating training processes.

In what follows, we use the term model replica for one copy of a model training on a subset of the data.

Naively employing asynchronous updates of model parameters leads to sub-optimal training performance because an individual model replica might be trained on a stale copy of the model parameters.

Conversely, employing fully synchronous updates will be as slow as the slowest model replica.

In a workstation with multiple GPU cards, each GPU will have similar speed and contain enough memory to run an entire CIFAR-10 model.

Thus, we opt to design our training system in the following manner:

· Place an individual model replica on each GPU.

· Update model parameters synchronously by waiting for all GPUs to finish processing a batch of data.

Here is a diagram of this model: [figure not reproduced]

Note that each GPU computes inference as well as the gradients for a unique batch of data.

This setup effectively permits dividing up a larger batch of data across the GPUs.
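
The effect of dividing a larger batch across GPUs can be sketched as a simple even split. This is only an illustration: the function split_batch is hypothetical, and the actual script may feed each GPU its own batch rather than slicing one large batch.

```python
def split_batch(batch, num_gpus):
    """Split a batch into num_gpus equally sized, non-overlapping shards."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if len(batch) % num_gpus != 0:
        raise ValueError("batch size must be divisible by num_gpus")
    shard_size = len(batch) // num_gpus
    return [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_gpus)]


# Stand-in for 128 training examples, identified by index.
large_batch = list(range(128))
shards = split_batch(large_batch, 2)

# Each GPU sees a unique half of the larger batch.
print([len(shard) for shard in shards])
```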

This setup requires that all GPUs share the model parameters.

A well-known fact is that transferring data to and from GPUs is quite slow.

For this reason, we decide to store and update all model parameters on the CPU (see green box).

A fresh set of model parameters is transferred to each GPU whenever a new batch of data is processed by all GPUs.

The GPUs are synchronized in operation.

All gradients are accumulated from the GPUs and averaged (see green box).

The model parameters are updated with the gradients averaged across all model replicas.
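
The synchronous update can be sketched without any framework. In this sketch each gradient is a plain float, whereas real gradients are tensors, and the function names, parameter names, and learning rate are all hypothetical.

```python
def average_gradients(tower_grads):
    """Average per-variable gradients across towers.

    tower_grads holds one dict per GPU, mapping variable name to gradient.
    """
    num_towers = len(tower_grads)
    names = tower_grads[0].keys()
    return {name: sum(g[name] for g in tower_grads) / num_towers for name in names}


def apply_gradients(params, grads, learning_rate):
    """Plain gradient-descent step on the shared parameters."""
    return {name: value - learning_rate * grads[name] for name, value in params.items()}


# Shared parameters, conceptually stored on the CPU.
params = {"w": 1.0, "b": 0.5}
# Hypothetical gradients computed by two GPUs on different batches.
tower_grads = [{"w": 0.2, "b": -0.4}, {"w": 0.6, "b": 0.0}]

# Wait for every GPU, average, then update once.
mean_grads = average_gradients(tower_grads)
params = apply_gradients(params, mean_grads, learning_rate=0.5)
print(mean_grads, params)
```

Because the update happens only after every GPU has reported its gradients, no replica ever trains on stale parameters.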

Placing Variables and Operations on Devices

Placing operations and variables on devices requires some special abstractions.

The first abstraction we require is a function for computing inference and gradients for a single model replica.

In the code we term this abstraction a "tower".

We must set two attributes for each tower:

· A unique name for all operations within a tower.

tf.name_scope() provides this unique name by prepending a scope.

For instance, all operations in the first tower are prepended with tower_0, e.g. tower_0/conv1/Conv2D.

· A preferred hardware device to run the operation within a tower.

tf.device() specifies this.

For instance, all operations in the first tower reside within device('/gpu:0') scope, indicating that they should be run on the first GPU.

All variables are pinned to the CPU and accessed via tf.get_variable() in order to share them in a multi-GPU version. See the how-to on Sharing Variables.
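
The naming and placement scheme can be summarized as a small table of scope names and device strings. The sketch below only builds the strings and does not call TensorFlow; the helper tower_placements is hypothetical, and '/cpu:0' is assumed as the device string for the CPU.

```python
def tower_placements(num_gpus):
    """Return (name scope, device string) pairs, one per tower."""
    return [("tower_%d" % i, "/gpu:%d" % i) for i in range(num_gpus)]


# Assumed device string for the CPU that holds all shared variables.
VARIABLE_DEVICE = "/cpu:0"

for scope, device in tower_placements(2):
    # An operation named conv1/Conv2D inside this scope gets the tower prefix.
    print("%s/conv1/Conv2D runs on %s; variables live on %s"
          % (scope, device, VARIABLE_DEVICE))
```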

Launching and Training the Model on Multiple GPU Cards

If you have several GPU cards installed on your machine, you can use them to train the model faster with the cifar10_multi_gpu_train.py script.

This version of the training script parallelizes the model across multiple GPU cards.

python cifar10_multi_gpu_train.py --num_gpus=2

The training script should output:

Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.

2015-11-04 11:45:45.927302: step 0, loss = 4.68 (2.0 examples/sec; 64.221 sec/batch)

2015-11-04 11:45:49.133065: step 10, loss = 4.66 (533.8 examples/sec; 0.240 sec/batch)

2015-11-04 11:45:51.397710: step 20, loss = 4.64 (597.4 examples/sec; 0.214 sec/batch)

2015-11-04 11:45:54.446850: step 30, loss = 4.62 (391.0 examples/sec; 0.327 sec/batch)

2015-11-04 11:45:57.152676: step 40, loss = 4.61 (430.2 examples/sec; 0.298 sec/batch)

2015-11-04 11:46:00.437717: step 50, loss = 4.59 (406.4 examples/sec; 0.315 sec/batch)

...

Note that the number of GPU cards used defaults to 1. Additionally, if only 1 GPU is available on your machine, all computations will be placed on it, even if you ask for more.
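
The default of 1 behaves like an ordinary command-line flag with a default value. The sketch below uses Python's argparse as a stand-in to show that behavior; the real script may define its flag through a different mechanism, and parse_args is a hypothetical helper.

```python
import argparse


def parse_args(argv):
    """Parse a --num_gpus flag that defaults to 1 when omitted."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_gpus", type=int, default=1,
                        help="How many GPUs to use.")
    return parser.parse_args(argv)


# No flag given: the default applies.
print(parse_args([]).num_gpus)
# Flag given without spaces around the equals sign.
print(parse_args(["--num_gpus=2"]).num_gpus)
```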

EXERCISE: The default setting for cifar10_train.py is to run with a batch size of 128. Try running cifar10_multi_gpu_train.py on 2 GPUs with a batch size of 64 and compare the training speed.
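
When comparing the two runs, it helps to compare examples per second rather than time per step. The sketch below assumes the batch size applies per GPU, so that 2 GPUs with 64 examples each process the same 128 examples per step as the single-GPU default; the step time used is invented for illustration.

```python
def examples_per_step(batch_size_per_gpu, num_gpus):
    """Total examples processed in one synchronous step."""
    return batch_size_per_gpu * num_gpus


def examples_per_sec(total_examples, step_seconds):
    """Throughput for a step that took step_seconds."""
    return total_examples / step_seconds


single_gpu = examples_per_step(128, 1)
two_gpus = examples_per_step(64, 2)
print(single_gpu, two_gpus)

# Invented step time; substitute the durations you actually measure.
print(examples_per_sec(two_gpus, 0.25))
```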

Next Steps

Congratulations! You have completed the CIFAR-10 tutorial.

If you are now interested in developing and training your own image classification system, we recommend forking this tutorial and replacing components to address your image classification problem.

EXERCISE: Download the Street View House Numbers (SVHN) data set.

Fork the CIFAR-10 tutorial and swap in the SVHN as the input data.

Try adapting the network architecture to improve predictive performance.
