TensorFlow Mechanics 101 (3)

Train the Model

Once the graph is built, it can be iteratively trained and evaluated in a loop controlled by the user code in fully_connected_feed.py.

The Graph

At the top of the run_training() function is a Python with statement indicating that all of the built ops are to be associated with the default global tf.Graph instance.

with tf.Graph().as_default():

A tf.Graph is a collection of ops that may be executed together as a group.

Most TensorFlow uses only need to rely on the single default graph.

More complicated uses with multiple graphs are possible, but are beyond the scope of this simple tutorial.

The Session

Once all of the build preparation has been completed and all of the necessary ops have been generated, a tf.Session is created for running the graph.

sess = tf.Session()

Alternatively, a Session may be created in a with block for scoping:

with tf.Session() as sess:

The empty argument list to tf.Session() indicates that this code will attach to (or create, if not yet created) the default local session.

Immediately after creating the session, all of the tf.Variable instances are initialized by calling sess.run() on their initialization op.

init = tf.initialize_all_variables()
sess.run(init)

The sess.run() method runs the complete subset of the graph that corresponds to the op(s) passed as parameters.

In this first call, the init op is a tf.group that contains only the initializers for the variables.

None of the rest of the graph is run here; that happens in the training loop below.
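The idea that a fetch only runs the part of the graph it depends on can be illustrated without TensorFlow. The following is a minimal sketch on a toy dependency dictionary; the op names and the ops_needed() helper are hypothetical and do not reflect how TensorFlow stores or schedules its graph.

```python
# Toy dependency graph: each op name maps to the ops it depends on.
# These names are illustrative and are not TensorFlow's internal op names.
TOY_GRAPH = {
    "weights": [],
    "biases": [],
    "init": ["weights", "biases"],
    "images": [],
    "logits": ["images", "weights", "biases"],
    "loss": ["logits"],
    "train_op": ["loss"],
}


def ops_needed(graph, fetches):
    """Return the set of ops that must run to produce the given fetches."""
    needed = set()
    stack = list(fetches)
    while stack:
        op = stack.pop()
        if op in needed:
            continue
        needed.add(op)
        # Everything this op depends on must run as well.
        stack.extend(graph[op])
    return needed


init_subset = ops_needed(TOY_GRAPH, ["init"])
train_subset = ops_needed(TOY_GRAPH, ["train_op"])
print(sorted(init_subset))
print(sorted(train_subset))
```

In this toy model, fetching "init" touches only the variables, while fetching "train_op" pulls in the loss and everything the loss depends on.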

Train Loop

After initializing the variables with the session, training may begin.

The user code controls the training per step, and the simplest loop that can do useful training is:

for step in xrange(max_steps):
    sess.run(train_op)

However, this tutorial is slightly more complicated in that it must also slice up the input data for each step to match the previously generated placeholders.

Feed the Graph

For each step, the code generates a feed dictionary that contains the set of examples on which to train for that step, keyed by the placeholder ops they represent.

In the fill_feed_dict() function, the given DataSet is queried for its next batch_size set of images and labels, and tensors matching the placeholders are filled with those images and labels.

images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size)

A Python dictionary object is then generated with the placeholders as keys and the corresponding feed tensors as values.

feed_dict = {
    images_placeholder: images_feed,
    labels_placeholder: labels_feed,
}

This dictionary is passed into the sess.run() function's feed_dict parameter to provide the input examples for this step of training.
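The batching and dictionary-building steps described above can be sketched in plain Python. ToyDataSet is a hypothetical stand-in for the tutorial's DataSet (it does not shuffle), the placeholders are represented by plain strings, and batch_size is passed explicitly here rather than read from FLAGS; all of these are assumptions made for illustration.

```python
class ToyDataSet(object):
    """Hypothetical stand-in for the tutorial's DataSet: serves examples in order."""

    def __init__(self, images, labels):
        assert len(images) == len(labels)
        self._images = images
        self._labels = labels
        self._index = 0

    def next_batch(self, batch_size):
        """Return the next batch_size images and labels, restarting at the end."""
        if self._index + batch_size > len(self._images):
            self._index = 0  # start a new pass over the data
        start = self._index
        self._index += batch_size
        return self._images[start:self._index], self._labels[start:self._index]


def fill_feed_dict(data_set, images_pl, labels_pl, batch_size):
    """Build a feed dictionary keyed by the (stand-in) placeholders."""
    images_feed, labels_feed = data_set.next_batch(batch_size)
    return {
        images_pl: images_feed,
        labels_pl: labels_feed,
    }


# Strings stand in for the real placeholder ops in this sketch.
images_placeholder = "images_placeholder"
labels_placeholder = "labels_placeholder"

data_set = ToyDataSet(images=["img0", "img1", "img2", "img3", "img4"],
                      labels=[7, 2, 1, 0, 4])
first = fill_feed_dict(data_set, images_placeholder, labels_placeholder, 2)
second = fill_feed_dict(data_set, images_placeholder, labels_placeholder, 2)
third = fill_feed_dict(data_set, images_placeholder, labels_placeholder, 2)
print(first)
print(second)
print(third)
```

Each call advances through the data by one batch, so consecutive training steps see different examples until the data set restarts.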

Check the Status

The code specifies two values to fetch in its run call: [train_op, loss].

for step in xrange(FLAGS.max_steps):
    feed_dict = fill_feed_dict(data_sets.train,
                               images_placeholder,
                               labels_placeholder)
    _, loss_value = sess.run([train_op, loss],
                             feed_dict=feed_dict)

Because there are two values to fetch, sess.run() returns two items, in the same order as the fetches.

Each Tensor in the list of values to fetch corresponds to a numpy array in the returned result, filled with the value of that tensor during this step of training.

Since train_op is an Operation with no output value, the corresponding returned element is None and is therefore discarded.

However, the value of the loss tensor may become NaN if the model diverges during training, so we capture this value for logging.

Assuming that the training runs fine without NaNs, the training loop also prints a simple status text every 100 steps to let the user know the state of training.

if step % 100 == 0:
    print 'Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration)
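The status line above uses a duration value that the excerpt does not define. A minimal sketch of how the per-step timing, the NaN check, and the periodic logging might fit together is shown below; fake_train_step() is a hypothetical stand-in for the sess.run() call, and raising an error on NaN is an assumption of this sketch rather than something the tutorial code is stated to do.

```python
import math
import time


def fake_train_step(step):
    """Hypothetical stand-in for sess.run([train_op, loss]): returns (None, loss)."""
    return None, 2.3 / (1.0 + 0.01 * step)


def run_loop(max_steps, log_every=100):
    """Run the loop and return the status lines that were logged."""
    lines = []
    for step in range(max_steps):
        start_time = time.time()
        _, loss_value = fake_train_step(step)
        duration = time.time() - start_time

        # Assumption: stop early if the loss has diverged to NaN.
        if math.isnan(loss_value):
            raise ValueError("Model diverged with loss = NaN at step %d" % step)

        if step % log_every == 0:
            lines.append("Step %d: loss = %.2f (%.3f sec)"
                         % (step, loss_value, duration))
    return lines


status_lines = run_loop(max_steps=250)
for line in status_lines:
    print(line)
```

With 250 steps and logging every 100 steps, the loop reports at steps 0, 100, and 200.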

Visualize the Status

In order to emit the events files used by TensorBoard, all of the summaries (in this case, only one) are collected into a single op during the graph building phase.

summary_op = tf.merge_all_summaries()

Then, after the session is created, a tf.train.SummaryWriter may be instantiated to write the events files, which contain both the graph itself and the values of the summaries.

summary_writer = tf.train.SummaryWriter(FLAGS.train_dir, graph_def=sess.graph_def)

Lastly, the events file is updated with new summary values every time the summary_op is run and its output is passed to the writer's add_summary() function.

summary_str = sess.run(summary_op, feed_dict=feed_dict)
summary_writer.add_summary(summary_str, step)

Once the events files are written, TensorBoard may be run against the training folder to display the values from the summaries.

[Figure: MNIST TensorBoard]

NOTE: For more information about how to build and run TensorBoard, see the accompanying tutorial TensorBoard: Visualizing Your Training.

Save a Checkpoint

In order to emit a checkpoint file that may later be used to restore a model for further training or evaluation, we instantiate a tf.train.Saver.

saver = tf.train.Saver()

In the training loop, the saver.save() method is called periodically to write a checkpoint file to the training directory with the current values of all the trainable variables.

saver.save(sess, FLAGS.train_dir, global_step=step)

At some later point, training might be resumed by using the saver.restore() method to reload the model parameters.

saver.restore(sess, FLAGS.train_dir)
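The save-then-restore cycle can be illustrated with a plain-Python analogy. This sketch writes a parameter dictionary to a JSON file whose name is tagged with the step number and reads it back; save_checkpoint(), restore_checkpoint(), the "model" prefix, and the JSON format are all hypothetical and are not the file format or API of tf.train.Saver.

```python
import json
import os
import tempfile


def save_checkpoint(params, prefix, global_step):
    """Write params to '<prefix>-<global_step>.json' and return the path."""
    path = "%s-%d.json" % (prefix, global_step)
    with open(path, "w") as f:
        json.dump(params, f)
    return path


def restore_checkpoint(path):
    """Read back the parameters stored by save_checkpoint()."""
    with open(path) as f:
        return json.load(f)


# A temporary directory stands in for the training directory.
train_dir = tempfile.mkdtemp()
prefix = os.path.join(train_dir, "model")

params = {"weights": [0.5, -1.25], "biases": [0.0]}
saved_path = save_checkpoint(params, prefix, global_step=1000)
restored = restore_checkpoint(saved_path)
print(restored == params)
```

Tagging each file with its step number keeps earlier checkpoints from being overwritten, which makes it possible to resume from a chosen point in training.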

Evaluate the Model

Every thousand steps, the code attempts to evaluate the model against the training, validation, and test datasets.

The do_eval() function is called three times, once for each of these datasets.

print 'Training Data Eval:'
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.train)
print 'Validation Data Eval:'
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.validation)
print 'Test Data Eval:'
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.test)

Note that more complicated usage would usually sequester data_sets.test so that it is only checked after significant amounts of hyperparameter tuning.

For the sake of a simple little MNIST problem, however, we evaluate against all of the data.

Build the Eval Graph

Before opening the default Graph, the test data should have been fetched by calling the get_data(train=False) function with the parameter set to grab the test dataset.

test_all_images, test_all_labels = get_data(train=False)

Before entering the training loop, the Eval op should have been built by calling the evaluation() function from mnist.py with the same logits/labels parameters as the loss() function.

eval_correct = mnist.evaluation(logits, labels_placeholder)

The evaluation() function simply generates a tf.nn.in_top_k op that can automatically score each model output as correct if the true label can be found in the K most-likely predictions.

In this case, we set the value of K to 1 so that a prediction is only considered correct if it is for the true label.

eval_correct = tf.nn.in_top_k(logits, labels, 1)
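To make the top-K scoring rule concrete, here is a minimal pure-Python sketch of the same idea. The in_top_k() function below is an illustrative reimplementation, not TensorFlow's code; it treats the true class as being in the top K when fewer than K classes score strictly higher, which is an assumption about how ties are handled. The logits and labels are made-up values.

```python
def in_top_k(predictions, targets, k):
    """Return one boolean per example: is the true class among the top k scores?"""
    results = []
    for scores, target in zip(predictions, targets):
        target_score = scores[target]
        # Count classes that score strictly higher than the true class.
        num_better = sum(1 for s in scores if s > target_score)
        results.append(num_better < k)
    return results


# Made-up scores for three examples over three classes.
logits = [
    [0.1, 2.5, 0.3],  # true class 1 has the highest score
    [1.2, 0.4, 3.0],  # true class 0 has the second-highest score
    [0.0, 0.2, 0.9],  # true class 2 has the highest score
]
labels = [1, 0, 2]

top1 = in_top_k(logits, labels, 1)
top2 = in_top_k(logits, labels, 2)
print(top1)  # [True, False, True]
print(top2)  # [True, True, True]
```

With K set to 1 the second example is scored as wrong, because another class outranks the true label; relaxing K to 2 accepts it.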

Eval Output

One can then create a loop that fills a feed_dict and calls sess.run() against the eval_correct op to evaluate the model on the given dataset.

for step in xrange(steps_per_epoch):
    feed_dict = fill_feed_dict(data_set,
                               images_placeholder,
                               labels_placeholder)
    true_count += sess.run(eval_correct, feed_dict=feed_dict)

The true_count variable simply accumulates all of the predictions that the in_top_k op has determined to be correct.

From there, the precision may be calculated by simply dividing by the total number of examples.

precision = float(true_count) / float(num_examples)
print ' Num examples: %d Num correct: %d Precision @ 1: %0.02f' % (
    num_examples, true_count, precision)
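The accumulation and division can be sketched end to end in plain Python. In this sketch, do_eval_sketch() is a hypothetical helper that takes per-example correctness flags in place of the eval_correct op, and it derives steps_per_epoch and num_examples by integer division on the batch size, which is an assumption about how the loop bounds are computed; leftover examples that do not fill a whole batch are skipped.

```python
def do_eval_sketch(correct_flags, batch_size):
    """Count correct predictions batch by batch and return the precision."""
    # Assumption: only whole batches are evaluated.
    steps_per_epoch = len(correct_flags) // batch_size
    num_examples = steps_per_epoch * batch_size

    true_count = 0
    for step in range(steps_per_epoch):
        batch = correct_flags[step * batch_size:(step + 1) * batch_size]
        # Stand-in for sess.run(eval_correct, ...): correct predictions in this batch.
        true_count += sum(batch)

    precision = float(true_count) / float(num_examples)
    return num_examples, true_count, precision


# Made-up correctness flags for ten examples.
flags = [True, True, False, True, True, False, True, True, True, False]
num_examples, true_count, precision = do_eval_sketch(flags, batch_size=4)
print("Num examples: %d Num correct: %d Precision @ 1: %0.02f"
      % (num_examples, true_count, precision))
```

With ten flags and a batch size of 4, two full batches are evaluated, so the precision is computed over 8 examples rather than 10.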
