CNN (2)

Model Training

The usual method for training a network to perform N-way classification is multinomial logistic regression, also known as softmax regression.

Softmax regression applies a softmax nonlinearity to the output of the network and calculates the cross-entropy between the normalized predictions and a 1-hot encoding of the label.
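The softmax and cross-entropy steps described above can be sketched in plain Python. This is a minimal illustration, not the tutorial's TensorFlow code: the function names and the three example logits are hypothetical.

```python
import math

def softmax(logits):
    """Normalize raw network outputs into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label, num_classes):
    """Cross-entropy between predictions and a 1-hot encoding of `label`."""
    one_hot = [1.0 if i == label else 0.0 for i in range(num_classes)]
    # Only the entry for the true class contributes to the sum.
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0.0)

# Hypothetical network outputs for a 3-way problem whose true class is 0.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
loss = cross_entropy(probs, label=0, num_classes=3)
print(probs, loss)
```

With 10 classes and uniform predictions, this loss equals ln(10), which is a useful reference point for an untrained 10-way classifier.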

For regularization, we also apply the usual weight decay losses to all learned variables.

The objective function for the model is the sum of the cross-entropy loss and all these weight decay terms, as returned by the loss() function.
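As a rough sketch of how such an objective is assembled, the snippet below adds one L2 weight decay term per layer to a cross-entropy value. The layer names, weights, and decay coefficient are made-up illustration values, and the helper names are hypothetical rather than taken from the tutorial's scripts.

```python
def l2_loss(values):
    """Half the sum of squares, a common form of the L2 penalty."""
    return 0.5 * sum(v * v for v in values)

def total_loss(cross_entropy_mean, weights_by_layer, decay_by_layer):
    """Cross-entropy plus one weight decay term per layer."""
    decay_terms = [
        decay_by_layer[name] * l2_loss(weights)
        for name, weights in weights_by_layer.items()
    ]
    return cross_entropy_mean + sum(decay_terms)

# Hypothetical values: two tiny layers with the same decay coefficient.
weights = {"layer_a": [0.5, -0.5], "layer_b": [1.0]}
decay = {"layer_a": 0.004, "layer_b": 0.004}
objective = total_loss(2.3, weights, decay)
print(objective)
```

Because the decay terms are added to the cross-entropy, the reported total loss is always at least as large as the classification loss alone.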

We visualize it in TensorBoard with a scalar_summary:

CIFAR-10 Loss

We train the model using the standard gradient descent algorithm (see Training for other methods) with a learning rate that decays exponentially over time.

CIFAR-10 Learning Rate Decay
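An exponentially decaying learning rate can be written as initial_rate * decay_rate ** (step / decay_steps). The sketch below implements that formula in plain Python, with an optional staircase mode that holds the rate constant between decay boundaries. The function name and the numeric values are assumptions for illustration, not settings confirmed by this post.

```python
def decayed_learning_rate(initial_rate, global_step, decay_steps,
                          decay_rate, staircase=True):
    """Exponential decay of the learning rate as training progresses."""
    if staircase:
        exponent = global_step // decay_steps  # drops in discrete steps
    else:
        exponent = global_step / decay_steps   # decays smoothly
    return initial_rate * decay_rate ** exponent

# Hypothetical schedule: start at 0.1 and divide by 10 every 1000 steps.
for step in (0, 999, 1000, 2500):
    print(step, decayed_learning_rate(0.1, step, 1000, 0.1))
```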

The train() function adds the operations needed to minimize the objective by calculating the gradient and updating the learned variables (see GradientDescentOptimizer for details).

It returns an operation that executes all the calculations needed to train and update the model for one batch of images.
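To make the idea of a single training step concrete, here is a minimal gradient descent update for a one-parameter model fitted by mean squared error. It is a toy stand-in for the tutorial's train() function, not a copy of it: the model, the batch, and the name train_step are all hypothetical.

```python
def train_step(w, batch, learning_rate):
    """One gradient descent update of w for the loss mean((w*x - y)**2)."""
    # Gradient of the batch-averaged squared error with respect to w.
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
    # Move against the gradient to reduce the loss.
    return w - learning_rate * grad

# Hypothetical batch drawn from the line y = 2x.
batch = [(1.0, 2.0), (2.0, 4.0)]
w = 0.0
for _ in range(20):
    w = train_step(w, batch, learning_rate=0.1)
print(w)  # approaches 2.0
```

Each call processes one batch and returns the updated parameter, which mirrors the "one operation per batch" structure described above.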

Launching and Training the Model

Now that we have built the model, let's launch it and run the training operation with the script cifar10_train.py.

python cifar10_train.py

NOTE: The first time you run any target in the CIFAR-10 tutorial, the CIFAR-10 dataset is automatically downloaded.

The dataset is ~160MB, so you may want to grab a quick cup of coffee during your first run.

You should see the output:

Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.

2015-11-04 11:45:45.927302: step 0, loss = 4.68 (2.0 examples/sec; 64.221 sec/batch)

2015-11-04 11:45:49.133065: step 10, loss = 4.66 (533.8 examples/sec; 0.240 sec/batch)

2015-11-04 11:45:51.397710: step 20, loss = 4.64 (597.4 examples/sec; 0.214 sec/batch)

2015-11-04 11:45:54.446850: step 30, loss = 4.62 (391.0 examples/sec; 0.327 sec/batch)

2015-11-04 11:45:57.152676: step 40, loss = 4.61 (430.2 examples/sec; 0.298 sec/batch)

2015-11-04 11:46:00.437717: step 50, loss = 4.59 (406.4 examples/sec; 0.315 sec/batch)

...

The script reports the total loss every 10 steps as well as the speed at which the last batch of data was processed. A few comments:

• The first batch of data can be inordinately slow (e.g. several minutes) as the preprocessing threads fill up the shuffling queue with 20,000 processed CIFAR images.

• The reported loss is the average loss of the most recent batch. Remember that this loss is the sum of the cross-entropy and all weight decay terms.

• Keep an eye on the processing speed of a batch. The numbers shown above were obtained on a Tesla K40c. If you are running on a CPU, expect slower performance.
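The two speed figures in each log line are related: examples per second is the batch size divided by the seconds spent on the batch. The sketch below reproduces that arithmetic and the log layout. The batch size of 128 is an assumption (it is consistent with the figures shown, since 128 / 0.240 is about 533), and format_progress is a hypothetical helper.

```python
def format_progress(step, loss, batch_size, duration):
    """Build a progress line from the batch size and elapsed seconds."""
    examples_per_sec = batch_size / duration  # throughput for this batch
    return "step %d, loss = %.2f (%.1f examples/sec; %.3f sec/batch)" % (
        step, loss, examples_per_sec, duration)

# Assumed batch size of 128 and a batch that took 0.240 seconds.
print(format_progress(step=10, loss=4.66, batch_size=128, duration=0.240))
```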

EXERCISE: When experimenting, it is sometimes annoying that the first training step can take so long. Try decreasing the number of images that initially fill up the queue.

Search for NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN in cifar10.py.
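One way to reason about the exercise: if the initial queue fill is a fixed fraction of the epoch size, shrinking the epoch constant shrinks the wait before the first step. The sketch below assumes a fraction of 0.4, chosen because 0.4 of CIFAR-10's 50,000 training images gives the 20,000 reported in the log. Both the fraction and the helper name are assumptions, so check the scripts for the actual values.

```python
# Assumed fraction of an epoch that must be queued before training starts.
MIN_FRACTION_OF_EXAMPLES_IN_QUEUE = 0.4

def min_queue_examples(num_examples_per_epoch):
    """Number of images to preload into the shuffling queue."""
    return int(num_examples_per_epoch * MIN_FRACTION_OF_EXAMPLES_IN_QUEUE)

print(min_queue_examples(50000))  # full training set
print(min_queue_examples(5000))   # smaller value for quick experiments
```

A smaller fill shortens startup but also weakens shuffling, so it is best kept to experiments.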

cifar10_train.py periodically saves all model parameters in checkpoint files, but it does not evaluate the model.

cifar10_eval.py uses the checkpoint files to measure predictive performance (see Evaluating a Model below).

If you followed the previous steps, then you have now started training a CIFAR-10 model.

Congratulations!

The terminal text returned from cifar10_train.py provides minimal insight into how the model is training. We want more insight into the model during training:

· Is the loss really decreasing, or is that just noise?

· Is the model being provided appropriate images?

· Are the gradients, activations and weights reasonable?

· What is the learning rate currently at?

TensorBoard provides this functionality, displaying data exported periodically from cifar10_train.py via a SummaryWriter.

For instance, we can watch how the distribution of activations and the degree of sparsity in local3 features evolve during training.

Individual loss functions, as well as the total loss, are particularly interesting to track over time.

However, the loss exhibits a considerable amount of noise due to the small batch size employed by training.

In practice, we find it extremely useful to visualize the moving averages of the losses in addition to their raw values.

See how the scripts use ExponentialMovingAverage for this purpose.
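The smoothing idea can be sketched without TensorFlow. The function below applies the usual exponential moving average update, shadow = decay * shadow + (1 - decay) * value, to a list of raw losses. The loss values and the choice to seed the average with the first value are assumptions for illustration.

```python
def exponential_moving_average(values, decay=0.9):
    """Smooth a noisy series with an exponential moving average."""
    smoothed = []
    shadow = values[0]  # assumption: seed the average with the first value
    for value in values:
        shadow = decay * shadow + (1.0 - decay) * value
        smoothed.append(shadow)
    return smoothed

# Hypothetical noisy per-batch losses.
raw_losses = [4.68, 4.40, 4.75, 4.30, 4.62, 4.20]
for raw, avg in zip(raw_losses, exponential_moving_average(raw_losses)):
    print("raw = %.2f, averaged = %.3f" % (raw, avg))
```

A decay closer to 1 gives a smoother curve that reacts more slowly to real changes in the loss.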
