TFJob provides a Kubernetes custom resource that makes it easy to run distributed or non-distributed TensorFlow jobs on Kubernetes.
More on the TensorFlow Operator at https://github.com/kubeflow/tf-operator
To install the operator, all you have to run is:
k3ai apply tensorflow-op
We present here a sample from the TensorFlow Operator repository at https://github.com/kubeflow/tf-operator
We first need to add a persistent volume and a claim. To do so, apply the two YAML manifests below, copying and pasting each command in order.
kubectl apply -f - << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: tfevent-volume
  labels:
    type: local
    app: tfjob
spec:
  capacity:
    storage: 10Gi
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/data
EOF
Now we add the PersistentVolumeClaim (PVC):
kubectl apply -f - << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tfevent-volume
  namespace: kubeflow
  labels:
    type: local
    app: tfjob
spec:
  accessModes:
    # ReadWriteOnce matches the PersistentVolume; local-path does not support ReadWriteMany
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
Note: Because we use local-path as the storage class and run on a single-node cluster, we can't use ReadWriteMany. See the Rancher local-path provisioner issue https://github.com/rancher/local-path-provisioner/issues/70#issuecomment-574390050
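Before deploying the example, it can help to check the state of the claim. The sketch below wraps a standard kubectl query in a small helper; the function name pvc_phase is hypothetical and not part of k3ai or kubectl. Depending on how the storage class handles binding, the claim may stay Pending until a pod actually uses it, so a Pending result at this point is not necessarily an error.

```shell
# Print the phase (for example Pending or Bound) of a PersistentVolumeClaim.
# Arguments: claim name, namespace.
# pvc_phase is a hypothetical helper name used only for illustration.
pvc_phase() {
  kubectl get pvc "$1" -n "$2" -o jsonpath='{.status.phase}'
}

# Usage, with the names used on this page:
#   pvc_phase tfevent-volume kubeflow
```

The helper only reads cluster state, so it is safe to run repeatedly.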
Now we deploy the example:
kubectl apply -f https://raw.githubusercontent.com/kubeflow/tf-operator/master/examples/v1/mnist_with_summaries/tf_job_mnist.yaml
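Training takes a little while, so you may want to block until the job finishes before reading its logs. The following is a minimal sketch that assumes the TFJob is named mnist in the kubeflow namespace (as the label selector used on this page suggests) and that the operator sets a Succeeded condition on the resource. The helper name wait_for_tfjob and the 10m default timeout are hypothetical choices for illustration.

```shell
# Block until the named TFJob reports the Succeeded condition, or time out.
# Arguments: job name, namespace, optional timeout (defaults to 10m, an
# arbitrary value chosen for this sketch).
wait_for_tfjob() {
  kubectl wait --for=condition=Succeeded "tfjob/$1" -n "$2" --timeout="${3:-10m}"
}

# Usage, assuming the job created by the example manifest:
#   wait_for_tfjob mnist kubeflow
```

If the job fails instead of succeeding, the command simply times out, so a non-zero exit status means you should inspect the pods and logs.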
You can observe the result of the example with:
kubectl logs -l tf-job-name=mnist -n kubeflow --tail=-1
It should output something similar to this (only part of the output is shown here):
...
Adding run metadata for 799
Accuracy at step 800: 0.957
Accuracy at step 810: 0.9698
Accuracy at step 820: 0.9676
Accuracy at step 830: 0.9676
Accuracy at step 840: 0.9677
Accuracy at step 850: 0.9673
Accuracy at step 860: 0.9676
Accuracy at step 870: 0.9654
Accuracy at step 880: 0.9694
Accuracy at step 890: 0.9708
Adding run metadata for 899
Accuracy at step 900: 0.9737
Accuracy at step 910: 0.9708
Accuracy at step 920: 0.9721
Accuracy at step 930: 0.972
Accuracy at step 940: 0.9639
Accuracy at step 950: 0.966
Accuracy at step 960: 0.9654
Accuracy at step 970: 0.9683
Accuracy at step 980: 0.9685
Accuracy at step 990: 0.9666
Adding run metadata for 999
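If you only care about the final accuracy, you can filter the log stream rather than reading it all. This is a small sketch that assumes the log lines keep the "Accuracy at step N: X" shape shown above; the helper name last_accuracy is hypothetical.

```shell
# Read training logs on stdin and print the value from the last
# "Accuracy at step N: X" line. Prints nothing if no such line is found.
# last_accuracy is a hypothetical helper name used only for illustration.
last_accuracy() {
  awk '/Accuracy at step [0-9]+:/ { acc = $NF } END { if (acc != "") print acc }'
}

# Usage, assuming the mnist TFJob from this page has produced logs:
#   kubectl logs -l tf-job-name=mnist -n kubeflow --tail=-1 | last_accuracy
```

Because the helper reads from standard input, it works equally well on a saved log file.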