Run a TFX pipeline using Kubeflow Pipelines

zhenxia-jiuyou / 2024-02-02 / Original

1. What is Kubeflow Pipelines for a TFX pipeline?

Kubeflow Pipelines is an orchestrator for TFX pipelines that runs on a Kubernetes cluster.

LocalDagRunner is an orchestrator for TFX pipelines that runs locally.

from tfx import v1 as tfx

# run a TFX pipeline locally using LocalDagRunner
tfx.orchestration.LocalDagRunner().run(
_create_schema_pipeline(
pipeline_name=SCHEMA_PIPELINE_NAME,
pipeline_root=SCHEMA_PIPELINE_ROOT,
data_root=DATA_ROOT,
schema_path=SCHEMA_PATH,
metadata_path=SCHEMA_METADATA_PATH,
module_file=_trainer_module_file,
serving_model_dir=SERVING_MODEL_DIR,
)
)


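# compile the TFX pipeline for Kubeflow Pipelines using KubeflowDagRunner
# (this writes the Kubeflow workflow definition instead of executing the pipeline locally)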
tfx.orchestration.experimental.KubeflowDagRunner().run(
_create_schema_pipeline(
pipeline_name=SCHEMA_PIPELINE_NAME,
pipeline_root=SCHEMA_PIPELINE_ROOT,
data_root=DATA_ROOT,
schema_path=SCHEMA_PATH,
metadata_path=SCHEMA_METADATA_PATH,
module_file=_trainer_module_file,
serving_model_dir=SERVING_MODEL_DIR,
)
)

2. Steps to run a TFX pipeline on Kubeflow Pipelines

2.1 Generate pipeline.yaml (the definition file of the Kubeflow pipeline):

tfx.orchestration.experimental.KubeflowDagRunner().run(
_create_schema_pipeline(
pipeline_name=SCHEMA_PIPELINE_NAME,
pipeline_root=SCHEMA_PIPELINE_ROOT,
data_root=DATA_ROOT,
schema_path=SCHEMA_PATH,
metadata_path=SCHEMA_METADATA_PATH,
module_file=_trainer_module_file,
serving_model_dir=SERVING_MODEL_DIR,
)
)
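In TFX 1.x, as far as I know, KubeflowDagRunner().run() compiles the pipeline instead of running it: by default it writes <pipeline_name>.tar.gz into the current directory with pipeline.yaml inside (the output_dir and output_filename arguments of KubeflowDagRunner change where it goes). Below is a minimal sketch, assuming that package layout, for extracting pipeline.yaml so it can be edited in the following steps; extract_pipeline_yaml is a hypothetical helper, not part of TFX.

```python
import tarfile
from pathlib import Path


def extract_pipeline_yaml(package_path: str, output_dir: str = ".") -> Path:
    """Copy pipeline.yaml out of a package written by KubeflowDagRunner.

    Assumption: the package is a gzip-compressed tarball that contains a
    member named "pipeline.yaml". Verify this against your TFX version.
    """
    with tarfile.open(package_path, "r:gz") as tar:
        member = tar.getmember("pipeline.yaml")  # raises KeyError if absent
        extracted = tar.extractfile(member)
        if extracted is None:
            raise ValueError("pipeline.yaml is not a regular file in the package")
        target = Path(output_dir) / "pipeline.yaml"
        target.write_bytes(extracted.read())
    return target
```

Usage, assuming the pipeline name is detect_anomolies_on_wafer_tfdv_schema (the pipeline id that appears in the generated file): extract_pipeline_yaml("detect_anomolies_on_wafer_tfdv_schema.tar.gz").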

2.2 Change the image registry in pipeline.yaml, because, like gcr.io, the default registry Docker Hub (docker.io) is not accessible from China.

# in file pipeline.yaml

# original image generated by tfx.orchestration.experimental.KubeflowDagRunner().run();
# it is equivalent to docker.io/tensorflow/tfx:1.14.0 on Docker Hub, which is not accessible from China.
#image: tensorflow/tfx:1.14.0

# replacement image
# docker.nju.edu.cn does not have tfx:1.14.0 yet: it is the latest version
# and the mirror has not synced it,
# so use tfx:1.13.0 instead.
image: docker.nju.edu.cn/tensorflow/tfx:1.13.0
imagePullPolicy: Never
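Editing every container by hand is error-prone, since each template in pipeline.yaml has its own image line. A minimal sketch of doing this replacement programmatically; it works line by line so comments and formatting survive, it assumes the generated file does not already set imagePullPolicy, and replace_image is a hypothetical helper name.

```python
import re

OLD_IMAGE = "tensorflow/tfx:1.14.0"
NEW_IMAGE = "docker.nju.edu.cn/tensorflow/tfx:1.13.0"


def replace_image(yaml_text: str, old: str = OLD_IMAGE, new: str = NEW_IMAGE,
                  pull_policy: str = "Never") -> str:
    """Rewrite every `image: <old>` line and add imagePullPolicy right below it.

    Only lines whose whole value is exactly `old` are touched, so commented-out
    lines such as `#image: tensorflow/tfx:1.14.0` are left alone.
    """
    pattern = re.compile(r"^(\s*)image:\s*" + re.escape(old) + r"\s*$")
    out = []
    for line in yaml_text.splitlines():
        match = pattern.match(line)
        if match:
            indent = match.group(1)  # keep the container's indentation
            out.append(f"{indent}image: {new}")
            out.append(f"{indent}imagePullPolicy: {pull_policy}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Usage: p = Path("pipeline.yaml"); p.write_text(replace_image(p.read_text())), with Path imported from pathlib.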

Attention:

  1. The image tensorflow/tfx:1.13.0 is about 30 GB, and its blobs (the gzip-compressed layers) are about 9 GB, so it is better to pull (download) it beforehand. imagePullPolicy: Never means the image is never pulled when the container starts; the other options are Always and IfNotPresent.
  2. imagePullPolicy: Never requires the image to exist on every node the pod may be assigned to. Otherwise, once the pod is scheduled (assigned to a node in the Kubernetes cluster) onto a node without the image, it fails with:
    'Warning ErrImageNeverPull 2m36s (x10 over 4m16s) kubelet Container image "docker.nju.edu.cn/tensorflow/tfx:1.13.0" is not present with pull policy of Never'.
    Alternatively, set a node affinity so the pod is scheduled only onto the node that has the image:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - maye-inspiron-5547
  3. containerd keeps images in namespaces. The default namespace is "default", but the images used by containers in a Kubernetes cluster live in namespace "k8s.io"; if the required image is not in "k8s.io", Kubernetes cannot see it.

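crictl is the CLI for the Kubernetes Container Runtime Interface (CRI).
crictl has only one image namespace, k8s.io, which is therefore also the namespace it pulls images into by default.
If you pull an image with ctr without putting it in the k8s.io namespace, crictl cannot see that local image.
ctr is the command-line tool that ships with containerd. There are three namespaces in total: default, k8s.io and moby; ctr uses default unless told otherwise.
nerdctl is a Docker-compatible CLI for containerd.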
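Because the kubelet only sees images in the k8s.io namespace, one way to confirm that imagePullPolicy: Never will work on a node is to look for the image in the output of sudo crictl images on that node. A minimal sketch of that check; it assumes crictl's default table output (IMAGE, TAG, IMAGE ID, SIZE columns), and image_present is a hypothetical helper.

```python
def image_present(crictl_images_output: str, repository: str, tag: str) -> bool:
    """Return True if repository:tag appears in `crictl images` output.

    Assumes the default table layout: a header row, then one image per line
    with the repository in the first column and the tag in the second.
    """
    for line in crictl_images_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[0] == repository and fields[1] == tag:
            return True
    return False
```

Usage on a node: image_present(subprocess.run(["sudo", "crictl", "images"], capture_output=True, text=True, check=True).stdout, "docker.nju.edu.cn/tensorflow/tfx", "1.13.0").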

ctr image ls: lists images in namespace "default" (ctr -n k8s.io image ls lists the Kubernetes namespace).

Pull an image into the k8s.io namespace:

nerdctl pull nginx:latest --namespace k8s.io

List the images in the k8s.io namespace:

sudo nerdctl images --namespace k8s.io

Attention:

Running nerdctl image list --namespace k8s.io without sudo shows no images.

This is because nerdctl then runs rootless, and accessing the image namespace k8s.io requires root privileges.

Copy an image from one namespace to another:

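# my-image stands for the full image reference shown by: ctr -n default image ls
sudo ctr -n default image export my-image.tar my-image
sudo ctr -n k8s.io image import my-image.tar

# or,
sudo nerdctl --namespace default save -o my-image.tar my-image
sudo nerdctl --namespace k8s.io load -i my-image.tar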
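The ctr route can be scripted so both steps always use the same tar file and the intended namespaces. A minimal sketch, assuming ctr is on PATH and sudo is needed for the k8s.io namespace; copy_image_commands and copy_image are hypothetical helpers, and dry_run defaults to True so the script only prints the commands until you opt in.

```python
import shlex
import subprocess
from typing import List


def copy_image_commands(image: str, tar_path: str,
                        src_ns: str = "default", dst_ns: str = "k8s.io") -> List[List[str]]:
    """Build the ctr commands that move `image` from src_ns to dst_ns via a tar file."""
    return [
        ["sudo", "ctr", "-n", src_ns, "image", "export", tar_path, image],
        ["sudo", "ctr", "-n", dst_ns, "image", "import", tar_path],
    ]


def copy_image(image: str, tar_path: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, run it and stop on failure."""
    for cmd in copy_image_commands(image, tar_path):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, copy_image("docker.nju.edu.cn/tensorflow/tfx:1.13.0", "tfx.tar", dry_run=False) would export the image from default and import it into k8s.io.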

2.3 Mount the paths that the components of the TFX pipeline need to access. Each component runs in its own pod, and each container has its own isolated file system, so these paths must be mounted as volumes: pipeline_root, input_base of ExampleGen, module_path of Transform and Trainer, and artifact_uri of SchemaImporter.
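The rule behind this step is that every path a component reads or writes must lie under one of its container's mountPaths; otherwise the container only sees its own file system. A minimal sketch of that check, where covering_mount is a hypothetical helper. Note that the default pipeline-root in the generated file, pipelines/detect_anomolies_on_wafer_tfdv_schema, is relative and falls under no mount, which is why the workflow argument overrides it with /tfx/tfx_pv/pipelines/detect_anomolies_on_wafer_tfdv_schema.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def covering_mount(path: str, mount_paths: Iterable[str]) -> Optional[str]:
    """Return the mountPath that contains `path`, or None if no mount covers it.

    If several mounts contain the path, the deepest (most specific) one wins.
    """
    p = PurePosixPath(path)
    best = None
    for mount in mount_paths:
        m = PurePosixPath(mount)
        if p == m or m in p.parents:
            if best is None or len(m.parts) > len(PurePosixPath(best).parts):
                best = mount
    return best
```

For ImportExampleGen, the mounts are /maye/trainEvalData and /tfx/tfx_pv, so both input_base and the overridden pipeline-root resolve to a mount.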

# in file pipeline.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: detect-anomolies-on-wafer-tfdv-schema-
  
  #name: detect-anomolies-on-wafer-tfdv-schema-maye
  
  annotations: {pipelines.kubeflow.org/kfp_sdk_version: 1.8.0, pipelines.kubeflow.org/pipeline_compilation_time: '2024-01-07T22:16:36.438482',
    pipelines.kubeflow.org/pipeline_spec: '{"description": "Constructs a Kubeflow
      pipeline.", "inputs": [{"default": "pipelines/detect_anomolies_on_wafer_tfdv_schema",
      "name": "pipeline-root"}], "name": "detect_anomolies_on_wafer_tfdv_schema"}'}
  labels: {pipelines.kubeflow.org/kfp_sdk_version: 1.8.0}
spec:
  entrypoint: detect-anomolies-on-wafer-tfdv-schema
  
  affinity:               #### putting affinity in the workflow spec means
    nodeAffinity:         #### every component of the workflow uses this node affinity.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - maye-inspiron-5547
  
  volumes:
  - name: wafer-data              #### define volume for input_base
    hostPath: 
      path: /home/maye/trainEvalData
      type: Directory
     
  - name: transform-module        #### define volume for module_path of Transform
    hostPath:
      path: /home/maye/maye_temp/tfx_user_code_Transform-0.0+35148d2579a5a421da4bda3bd371de44bf8888bb4ea4f5cc424f859c6e4db9db-py3-none-any.whl
      type: File
            
  - name: trainer-module         #### define volume for module_path of Trainer
    hostPath: 
      path: /home/maye/maye_temp/tfx_user_code_Trainer-0.0+35148d2579a5a421da4bda3bd371de44bf8888bb4ea4f5cc424f859c6e4db9db-py3-none-any.whl
      type: File    
  
  - name: schema-path           #### define volume for artifact_uri of SchemaImporter
    hostPath:
      path: /home/maye/maye_temp/detect_anomalies_in_wafer_schema
      type: Directory  
      
      
  - name: tfx-pv               #### define volume for pipeline-root
    persistentVolumeClaim:
          claimName: tfx-pv-claim
      
  
  templates:
  - name: detect-anomolies-on-wafer-tfdv-schema
    inputs:
      parameters:    
      - {name: pipeline-root}    #### reference argument pipeline-root of workflow
      
    dag:
      tasks:
      - name: importexamplegen
        template: importexamplegen
        arguments:
          parameters:
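          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}  #### actual argument
                                                                                 #### for pipeline-root
        volumes:    #### note: Argo DAG tasks have no volumes field; the mounting is done by volumeMounts in each template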
        - name: wafer-data     #### reference volume for input_base
        - name: tfx-pv         #### reference volume for pipeline_root
      - name: pusher
        template: pusher
        dependencies: [trainer]
        arguments:
          parameters:
          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}
          
        volumes:
        - name: tfx-pv  
          
      - name: schema-importer
        template: schema-importer
        arguments:
          parameters:
          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}
          
        volumes:
        - name: tfx-pv  
        - name: schema-path
        
          
      - name: statisticsgen
        template: statisticsgen
        dependencies: [importexamplegen]
        arguments:
          parameters:
          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}    
        volumes:
        - name: tfx-pv
           
      - name: trainer
        template: trainer
        dependencies: [importexamplegen, transform]
        arguments:
          parameters:
          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}            
        volumes:
        - name: trainer-module
        - name: tfx-pv        
          
      - name: transform
        template: transform
        dependencies: [importexamplegen, schema-importer]
        arguments:
          parameters:
          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}    
        volumes:
        - name: transform-module   
        - name: tfx-pv
        - name: schema-path
          
  - name: importexamplegen
    container:
      args:
      - --pipeline_root           #### pass pipeline_root as argument of container shell cmd
      - '{{inputs.parameters.pipeline-root}}'
      - --kubeflow_metadata_config
      - |-
        {
          "grpc_config": {
            "grpc_service_host": {
              "environment_variable": "METADATA_GRPC_SERVICE_HOST"
            },
            "grpc_service_port": {
              "environment_variable": "METADATA_GRPC_SERVICE_PORT"
            }
          }
        }
      - --node_id
      - ImportExampleGen
      - --tfx_ir
      - |-
        {
          "pipelineInfo": {
            "id": "detect_anomolies_on_wafer_tfdv_schema"
          },
          "nodes": [
            {
              "pipelineNode": {
                "nodeInfo": {
                  "type": {
                    "name": "tfx.components.example_gen.import_example_gen.component.ImportExampleGen"
                  },
                  "id": "ImportExampleGen"
                },
                "contexts": {
                  "contexts": [
                    {
                      "type": {
                        "name": "pipeline"
                      },
                      "name": {
                        "fieldValue": {
                          "stringValue": "detect_anomolies_on_wafer_tfdv_schema"
                        }
                      }
                    },
                    {
                      "type": {
                        "name": "pipeline_run"
                      },
                      "name": {
                        "runtimeParameter": {
                          "name": "pipeline-run-id",
                          "type": "STRING"
                        }
                      }
                    },
                    {
                      "type": {
                        "name": "node"
                      },
                      "name": {
                        "fieldValue": {
                          "stringValue": "detect_anomolies_on_wafer_tfdv_schema.ImportExampleGen"
                        }
                      }
                    }
                  ]
                },
                "outputs": {
                  "outputs": {
                    "examples": {
                      "artifactSpec": {
                        "type": {
                          "name": "Examples",
                          "properties": {
                            "split_names": "STRING",
                            "span": "INT",
                            "version": "INT"
                          },
                          "baseType": "DATASET"
                        }
                      }
                    }
                  }
                },
                "parameters": {
                  "parameters": {
                    "output_config": {
                      "fieldValue": {
                        "stringValue": "{}"
                      }
                    },
                    "input_config": {
                      "fieldValue": {
                        "stringValue": "{\n  \"splits\": [\n    {\n      \"name\": \"train\",\n      \"pattern\": \"train\"\n    },\n    {\n      \"name\": \"eval\",\n      \"pattern\": \"eval\"\n    }\n  ]\n}"
                      }
                    },
                    "output_data_format": {
                      "fieldValue": {
                        "intValue": "6"
                      }
                    },
                    "input_base": {  ## [^note]
                      "fieldValue": {  #### the value is mounted path of
                        "stringValue": "/maye/trainEvalData"   ## volume input_base.
                      }                                         
                    },
                    "output_file_format": {
                      "fieldValue": {
                        "intValue": "5"
                      }
                    }
                  }
                },
                "downstreamNodes": [
                  "StatisticsGen",
                  "Trainer",
                  "Transform"
                ],
                "executionOptions": {
                  "cachingOptions": {}
                }
              }
            }
          ],
          "runtimeSpec": {
            "pipelineRoot": {
              "runtimeParameter": {
                "name": "pipeline-root",    #### parameter pipeline-root of ImportExampleGen;
                "type": "STRING",           #### its value is passed via the container argument --pipeline_root
                "defaultValue": {         #### and lies under the mountPath of volume tfx-pv.
                  "stringValue": "pipelines/detect_anomolies_on_wafer_tfdv_schema"
                }
              }
            },
            "pipelineRunId": {
              "runtimeParameter": {
                "name": "pipeline-run-id",
                "type": "STRING"
              }
            }
          },
          "executionMode": "SYNC",
          "deploymentConfig": {
            "@type": "type.googleapis.com/tfx.orchestration.IntermediateDeploymentConfig",
            "executorSpecs": {
              "ImportExampleGen": {
                "@type": "type.googleapis.com/tfx.orchestration.executable_spec.BeamExecutableSpec",
                "pythonExecutorSpec": {
                  "classPath": "tfx.components.example_gen.import_example_gen.executor.Executor"
                }
              }
            },
            "customDriverSpecs": {
              "ImportExampleGen": {
                "@type": "type.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec",
                "classPath": "tfx.components.example_gen.driver.FileBasedDriver"
              }
            },
            "metadataConnectionConfig": {
              "@type": "type.googleapis.com/ml_metadata.ConnectionConfig",
              "sqlite": {
                "filenameUri": "metadata/detect_anomolies_on_wafer_tfdv_schema/metadata.db",
                "connectionMode": "READWRITE_OPENCREATE"
              }
            }
          }
        }
      - --metadata_ui_path
      - /mlpipeline-ui-metadata.json
      - --runtime_parameter
      - pipeline-root=STRING:{{inputs.parameters.pipeline-root}}
      command: [python, -m, tfx.orchestration.kubeflow.container_entrypoint]
      ...
      image: docker.nju.edu.cn/tensorflow/tfx:1.13.0
      imagePullPolicy: Never
      volumeMounts: 
      - mountPath: /maye/trainEvalData     #### mount volume for input_base
        name: wafer-data   
      - mountPath: /tfx/tfx_pv             #### mount volume for pipeline_root
        name: tfx-pv

    inputs:
      parameters:
      - {name: pipeline-root}     #### formal parameter of the container
    outputs:
      artifacts:
      - {name: mlpipeline-ui-metadata, path: /mlpipeline-ui-metadata.json}
     ...

  - name: pusher
    container:
      ...
      image: docker.nju.edu.cn/tensorflow/tfx:1.13.0
      imagePullPolicy: Never
      volumeMounts: 
      - mountPath: /tfx/tfx_pv
        name: tfx-pv      
      ...

  - name: schema-importer
    container:
      ...
      image: docker.nju.edu.cn/tensorflow/tfx:1.13.0
      imagePullPolicy: Never  
      volumeMounts: 
      - mountPath: /tfx/tfx_pv
        name: tfx-pv
        
      - mountPath: /tfx/pipelines/detect_anomalies_in_wafer_schema
        name: schema-path
        readOnly: True
      ...

  - name: statisticsgen
    container:
      ...
      image: docker.nju.edu.cn/tensorflow/tfx:1.13.0
      imagePullPolicy: Never      
      volumeMounts: 
      - mountPath: /tfx/tfx_pv
        name: tfx-pv
    ...
  - name: trainer
    container:
      ...

      image: docker.nju.edu.cn/tensorflow/tfx:1.13.0
      imagePullPolicy: Never     
      volumeMounts: 
      - mountPath: /tfx/pipelines/tfx_user_code_Trainer-0.0+35148d2579a5a421da4bda3bd371de44bf8888bb4ea4f5cc424f859c6e4db9db-py3-none-any.whl
        name: trainer-module
        readOnly: True  
      - mountPath: /tfx/tfx_pv
        name: tfx-pv
      ...

  - name: transform
    container:
      ...

      image: docker.nju.edu.cn/tensorflow/tfx:1.13.0
      imagePullPolicy: Never 
      volumeMounts:
      - mountPath: /tfx/pipelines/tfx_user_code_Transform-0.0+35148d2579a5a421da4bda3bd371de44bf8888bb4ea4f5cc424f859c6e4db9db-py3-none-any.whl
        name: transform-module
        readOnly: True     
      - mountPath: /tfx/pipelines/detect_anomalies_in_wafer_schema
        name: schema-path
        readOnly: True     
      - mountPath: /tfx/tfx_pv
        name: tfx-pv
      ...

  arguments:
    parameters:     #### actual argument pipeline-root of the workflow
    - {name: pipeline-root, value: /tfx/tfx_pv/pipelines/detect_anomolies_on_wafer_tfdv_schema}
    
  serviceAccountName: pipeline-runner
  

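[^note]: parameter input_base of ImportExampleGen. Its value must be the mountPath of volume wafer-data inside the container (/maye/trainEvalData), not the host path (/home/maye/trainEvalData).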
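Since the volume names in spec.volumes (wafer-data, transform-module, trainer-module, schema-path, tfx-pv) must match the names in each template's volumeMounts exactly, a typo only shows up when the pod fails to start. A minimal sketch of a pre-submit check; it assumes pipeline.yaml has already been parsed into a dict (for example with PyYAML's yaml.safe_load), and undefined_volume_mounts is a hypothetical helper.

```python
from typing import Any, Dict, List


def undefined_volume_mounts(workflow: Dict[str, Any]) -> List[str]:
    """Return "template/volume" entries whose volumeMount name is never declared.

    A mount is valid if its name matches a workflow-level volume in
    spec.volumes or a template-level volume in that template's volumes.
    """
    spec = workflow.get("spec") or {}
    declared = {v["name"] for v in spec.get("volumes") or []}
    problems = []
    for template in spec.get("templates") or []:
        local = {v["name"] for v in template.get("volumes") or []}
        container = template.get("container") or {}  # DAG templates have no container
        for mount in container.get("volumeMounts") or []:
            if mount["name"] not in declared and mount["name"] not in local:
                problems.append(f"{template['name']}/{mount['name']}")
    return problems
```

An empty list means every mount refers to a declared volume.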

3. Error & Solution

[ERROR: Failed to pull image]

(base) maye@maye-Inspiron-5547:~$ kubectl describe pod detect-anomolies-on-wafer-tfdv-schema-ldvtw-1952722848 -n kubeflow
Events:
Type Reason Age From Message
Warning Failed 52m (x4 over 92m) kubelet Error: ImagePullBackOff
Warning Failed 13m (x9 over 92m) kubelet Error: ErrImagePull
Warning Failed 4m42s (x9 over 92m) kubelet Failed to pull image "tensorflow/tfx:1.14.0": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/tensorflow/tfx:1.14.0": failed to copy: httpReadSeeker: failed open: unexpected status code https://pft7f97f.mirror.aliyuncs.com/v2/tensorflow/tfx/blobs/sha256:f2cce533751060f702397991bc7f0acf6d691c898fe1c7cc25b3ece25a409879?ns=docker.io: 500 Internal Server Error - Server message: unknown: unknown error
Normal BackOff 4m17s (x13 over 92m) kubelet Back-off pulling image "tensorflow/tfx:1.14.0"
(base) maye@maye-Inspiron-5547:~$

[Solution]
This happens because docker.io is not accessible from China. Replace it with a mirror such as docker.nju.edu.cn in pipeline.yaml, i.e. change "tensorflow/tfx:1.14.0" to "docker.nju.edu.cn/tensorflow/tfx:1.14.0" (or to docker.nju.edu.cn/tensorflow/tfx:1.13.0 if the mirror has not synced 1.14.0 yet, as in step 2.2).
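The event message shows why the short name fails: "tensorflow/tfx:1.14.0" is resolved to "docker.io/tensorflow/tfx:1.14.0", and the node's configured mirror (here an aliyuncs mirror) returned 500. A minimal sketch of that normalization, following the usual Docker convention (no registry host means docker.io, and single-segment names get library/), plus rewriting a Docker Hub image to a mirror. normalize_image and to_mirror are hypothetical helpers; docker.nju.edu.cn is the mirror used in this post.

```python
def normalize_image(ref: str) -> str:
    """Expand a short image reference to a fully qualified one.

    "tensorflow/tfx:1.14.0" -> "docker.io/tensorflow/tfx:1.14.0"
    "nginx:latest"          -> "docker.io/library/nginx:latest"
    """
    first, _, rest = ref.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        return ref  # first segment is already a registry host
    if not rest:
        return f"docker.io/library/{ref}"  # official image such as nginx
    return f"docker.io/{ref}"


def to_mirror(ref: str, mirror: str = "docker.nju.edu.cn") -> str:
    """Point a Docker Hub image at a mirror registry, keeping repository and tag."""
    full = normalize_image(ref)
    if not full.startswith("docker.io/"):
        raise ValueError(f"not a Docker Hub image: {ref}")
    return mirror + "/" + full[len("docker.io/"):]
```

For example, to_mirror("tensorflow/tfx:1.14.0") gives the replacement used in the solution above.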

4. Complete pipeline.yaml

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: detect-anomolies-on-wafer-tfdv-schema-
  
  #name: detect-anomolies-on-wafer-tfdv-schema-maye
  
  annotations: {pipelines.kubeflow.org/kfp_sdk_version: 1.8.0, pipelines.kubeflow.org/pipeline_compilation_time: '2024-01-07T22:16:36.438482',
    pipelines.kubeflow.org/pipeline_spec: '{"description": "Constructs a Kubeflow
      pipeline.", "inputs": [{"default": "pipelines/detect_anomolies_on_wafer_tfdv_schema",
      "name": "pipeline-root"}], "name": "detect_anomolies_on_wafer_tfdv_schema"}'}
  labels: {pipelines.kubeflow.org/kfp_sdk_version: 1.8.0}
spec:
  entrypoint: detect-anomolies-on-wafer-tfdv-schema
  
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - maye-inspiron-5547
  
   
  
  volumes:
  - name: wafer-data
    hostPath: 
      path: /home/maye/trainEvalData
      type: Directory
     
  - name: transform-module
    hostPath:
      path: /home/maye/maye_temp/tfx_user_code_Transform-0.0+35148d2579a5a421da4bda3bd371de44bf8888bb4ea4f5cc424f859c6e4db9db-py3-none-any.whl
      type: File
      
      
  - name: trainer-module
    hostPath: 
      path: /home/maye/maye_temp/tfx_user_code_Trainer-0.0+35148d2579a5a421da4bda3bd371de44bf8888bb4ea4f5cc424f859c6e4db9db-py3-none-any.whl
      type: File
      
  
  - name: schema-path
    hostPath:
      path: /home/maye/maye_temp/detect_anomalies_in_wafer_schema
      type: Directory  
      
      
  - name: tfx-pv
    persistentVolumeClaim:
          claimName: tfx-pv-claim
      
  
  
  templates:
  - name: detect-anomolies-on-wafer-tfdv-schema
    inputs:
      parameters:
      - {name: pipeline-root}
          
      
    dag:
      tasks:
      - name: importexamplegen
        template: importexamplegen
        arguments:
          parameters:
          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}
          
          
        volumes:
        - name: wafer-data     
        - name: tfx-pv  
          
      - name: pusher
        template: pusher
        dependencies: [trainer]
        arguments:
          parameters:
          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}
          
        volumes:
        - name: tfx-pv  
          
      - name: schema-importer
        template: schema-importer
        arguments:
          parameters:
          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}
          
        volumes:
        - name: tfx-pv  
        - name: schema-path
        
          
      - name: statisticsgen
        template: statisticsgen
        dependencies: [importexamplegen]
        arguments:
          parameters:
          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}
          
          #artifacts:
          #- {name: import_example_gen_outputs, from: "{{tasks.importexamplegen.outputs.artifacts.import_example_gen_outputs}}"}
    
        
        volumes:
        - name: tfx-pv
          
        
    
      - name: trainer
        template: trainer
        dependencies: [importexamplegen, transform]
        arguments:
          parameters:
          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}
          
       
        volumes:
        - name: trainer-module
        - name: tfx-pv  
       
          
      - name: transform
        template: transform
        dependencies: [importexamplegen, schema-importer]
        arguments:
          parameters:
          - {name: pipeline-root, value: '{{inputs.parameters.pipeline-root}}'}
          
        volumes:
        - name: transform-module   
        - name: tfx-pv
        - name: schema-path
          
          
  - name: importexamplegen
    container:
      args:
      - --pipeline_root
      - '{{inputs.parameters.pipeline-root}}'
      - --kubeflow_metadata_config
      - |-
        {
          "grpc_config": {
            "grpc_service_host": {
              "environment_variable": "METADATA_GRPC_SERVICE_HOST"
            },
            "grpc_service_port": {
              "environment_variable": "METADATA_GRPC_SERVICE_PORT"
            }
          }
        }
      - --node_id
      - ImportExampleGen
      - --tfx_ir
      - |-
        {
          "pipelineInfo": {
            "id": "detect_anomolies_on_wafer_tfdv_schema"
          },
          "nodes": [
            {
              "pipelineNode": {
                "nodeInfo": {
                  "type": {
                    "name": "tfx.components.example_gen.import_example_gen.component.ImportExampleGen"
                  },
                  "id": "ImportExampleGen"
                },
                "contexts": {
                  "contexts": [
                    {
                      "type": {
                        "name": "pipeline"
                      },
                      "name": {
                        "fieldValue": {
                          "stringValue": "detect_anomolies_on_wafer_tfdv_schema"
                        }
                      }
                    },
                    {
                      "type": {
                        "name": "pipeline_run"
                      },
                      "name": {
                        "runtimeParameter": {
                          "name": "pipeline-run-id",
                          "type": "STRING"
                        }
                      }
                    },
                    {
                      "type": {
                        "name": "node"
                      },
                      "name": {
                        "fieldValue": {
                          "stringValue": "detect_anomolies_on_wafer_tfdv_schema.ImportExampleGen"
                        }
                      }
                    }
                  ]
                },
                "outputs": {
                  "outputs": {
                    "examples": {
                      "artifactSpec": {
                        "type": {
                          "name": "Examples",
                          "properties": {
                            "split_names": "STRING",
                            "span": "INT",
                            "version": "INT"
                          },
                          "baseType": "DATASET"
                        }
                      }
                    }
                  }
                },
                "parameters": {
                  "parameters": {
                    "output_config": {
                      "fieldValue": {
                        "stringValue": "{}"
                      }
                    },
                    "input_config": {
                      "fieldValue": {
                        "stringValue": "{\n  \"splits\": [\n    {\n      \"name\": \"train\",\n      \"pattern\": \"train\"\n    },\n    {\n      \"name\": \"eval\",\n      \"pattern\": \"eval\"\n    }\n  ]\n}"
                      }
                    },
                    "output_data_format": {
                      "fieldValue": {
                        "intValue": "6"
                      }
                    },
                    "input_base": {
                      "fieldValue": {
                        "stringValue": "/maye/trainEvalData"
                      }
                    },
                    "output_file_format": {
                      "fieldValue": {
                        "intValue": "5"
                      }
                    }
                  }
                },
                "downstreamNodes": [
                  "StatisticsGen",
                  "Trainer",
                  "Transform"
                ],
                "executionOptions": {
                  "cachingOptions": {}
                }
              }
            }
          ],
          "runtimeSpec": {
            "pipelineRoot": {
              "runtimeParameter": {
                "name": "pipeline-root",
                "type": "STRING",
                "defaultValue": {
                  "stringValue": "pipelines/detect_anomolies_on_wafer_tfdv_schema"
                }
              }
            },
            "pipelineRunId": {
              "runtimeParameter": {
                "name": "pipeline-run-id",
                "type": "STRING"
              }
            }
          },
          "executionMode": "SYNC",
          "deploymentConfig": {
            "@type": "type.googleapis.com/tfx.orchestration.IntermediateDeploymentConfig",
            "executorSpecs": {
              "ImportExampleGen": {
                "@type": "type.googleapis.com/tfx.orchestration.executable_spec.BeamExecutableSpec",
                "pythonExecutorSpec": {
                  "classPath": "tfx.components.example_gen.import_example_gen.executor.Executor"
                }
              }
            },
            "customDriverSpecs": {
              "ImportExampleGen": {
                "@type": "type.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec",
                "classPath": "tfx.components.example_gen.driver.FileBasedDriver"
              }
            },
            "metadataConnectionConfig": {
              "@type": "type.googleapis.com/ml_metadata.ConnectionConfig",
              "sqlite": {
                "filenameUri": "metadata/detect_anomolies_on_wafer_tfdv_schema/metadata.db",
                "connectionMode": "READWRITE_OPENCREATE"
              }
            }
          }
        }
      - --metadata_ui_path
      - /mlpipeline-ui-metadata.json
      - --runtime_parameter
      - pipeline-root=STRING:{{inputs.parameters.pipeline-root}}
      command: [python, -m, tfx.orchestration.kubeflow.container_entrypoint]
      env:
      - name: WORKFLOW_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'}
      - name: KFP_POD_NAME
        valueFrom:
          fieldRef: {fieldPath: metadata.name}
      - name: KFP_POD_UID
        valueFrom:
          fieldRef: {fieldPath: metadata.uid}
      - name: KFP_NAMESPACE
        valueFrom:
          fieldRef: {fieldPath: metadata.namespace}
      - name: WORKFLOW_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'}
      - name: KFP_RUN_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''pipeline/runid'']'}
      - name: ENABLE_CACHING
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''pipelines.kubeflow.org/enable_caching'']'}
      envFrom:
      - configMapRef: {name: metadata-grpc-configmap, optional: true}
      #image: tensorflow/tfx:1.14.0
      
      image: docker.nju.edu.cn/tensorflow/tfx:1.13.0
      imagePullPolicy: Never
      
      
      volumeMounts: 
      - mountPath: /maye/trainEvalData
        name: wafer-data
        
      - mountPath: /tfx/tfx_pv
        name: tfx-pv
       
      
  
      
    inputs:
      parameters:
      - {name: pipeline-root}
    outputs:
      artifacts:
      - {name: mlpipeline-ui-metadata, path: /mlpipeline-ui-metadata.json}
        
      #- {name: import_example_gen_outputs, path: /tmp/pipelines}
        
        
        
    metadata:
      labels:
        add-pod-env: "true"
        pipelines.kubeflow.org/pipeline-sdk-type: tfx
        pipelines.kubeflow.org/kfp_sdk_version: 1.8.0
        pipelines.kubeflow.org/enable_caching: "true"
  - name: pusher
    container:
      args:
      - --pipeline_root
      - '{{inputs.parameters.pipeline-root}}'
      - --kubeflow_metadata_config
      - |-
        {
          "grpc_config": {
            "grpc_service_host": {
              "environment_variable": "METADATA_GRPC_SERVICE_HOST"
            },
            "grpc_service_port": {
              "environment_variable": "METADATA_GRPC_SERVICE_PORT"
            }
          }
        }
      - --node_id
      - Pusher
      - --tfx_ir
      - |-
        {
          "pipelineInfo": {
            "id": "detect_anomolies_on_wafer_tfdv_schema"
          },
          "nodes": [
            {
              "pipelineNode": {
                "nodeInfo": {
                  "type": {
                    "name": "tfx.components.pusher.component.Pusher",
                    "baseType": "DEPLOY"
                  },
                  "id": "Pusher"
                },
                "contexts": {
                  "contexts": [
                    {
                      "type": {
                        "name": "pipeline"
                      },
                      "name": {
                        "fieldValue": {
                          "stringValue": "detect_anomolies_on_wafer_tfdv_schema"
                        }
                      }
                    },
                    {
                      "type": {
                        "name": "pipeline_run"
                      },
                      "name": {
                        "runtimeParameter": {
                          "name": "pipeline-run-id",
                          "type": "STRING"
                        }
                      }
                    },
                    {
                      "type": {
                        "name": "node"
                      },
                      "name": {
                        "fieldValue": {
                          "stringValue": "detect_anomolies_on_wafer_tfdv_schema.Pusher"
                        }
                      }
                    }
                  ]
                },
                "inputs": {
                  "inputs": {
                    "model": {
                      "channels": [
                        {
                          "producerNodeQuery": {
                            "id": "Trainer"
                          },
                          "contextQueries": [
                            {
                              "type": {
                                "name": "pipeline"
                              },
                              "name": {
                                "fieldValue": {
                                  "stringValue": "detect_anomolies_on_wafer_tfdv_schema"
                                }
                              }
                            },
                            {
                              "type": {
                                "name": "pipeline_run"
                              },
                              "name": {
                                "runtimeParameter": {
                                  "name": "pipeline-run-id",
                                  "type": "STRING"
                                }
                              }
                            },
                            {
                              "type": {
                                "name": "node"
                              },
                              "name": {
                                "fieldValue": {
                                  "stringValue": "detect_anomolies_on_wafer_tfdv_schema.Trainer"
                                }
                              }
                            }
                          ],
                          "artifactQuery": {
                            "type": {
                              "name": "Model",
                              "baseType": "MODEL"
                            }
                          },
                          "outputKey": "model"
                        }
                      ]
                    }
                  }
                },
                "outputs": {
                  "outputs": {
                    "pushed_model": {
                      "artifactSpec": {
                        "type": {
                          "name": "PushedModel",
                          "baseType": "MODEL"
                        }
                      }
                    }
                  }
                },
                "parameters": {
                  "parameters": {
                    "custom_config": {
                      "fieldValue": {
                        "stringValue": "null"
                      }
                    },
                    "push_destination": {
                      "fieldValue": {
                        "stringValue": "{\n  \"filesystem\": {\n    \"base_directory\": \"serving_model/detect_anomolies_on_wafer_tfdv\"\n  }\n}"
                      }
                    }
                  }
                },
                "upstreamNodes": [
                  "Trainer"
                ],
                "executionOptions": {
                  "cachingOptions": {}
                }
              }
            }
          ],
          "runtimeSpec": {
            "pipelineRoot": {
              "runtimeParameter": {
                "name": "pipeline-root",
                "type": "STRING",
                "defaultValue": {
                  "stringValue": "pipelines/detect_anomolies_on_wafer_tfdv_schema"
                }
              }
            },
            "pipelineRunId": {
              "runtimeParameter": {
                "name": "pipeline-run-id",
                "type": "STRING"
              }
            }
          },
          "executionMode": "SYNC",
          "deploymentConfig": {
            "@type": "type.googleapis.com/tfx.orchestration.IntermediateDeploymentConfig",
            "executorSpecs": {
              "Pusher": {
                "@type": "type.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec",
                "classPath": "tfx.components.pusher.executor.Executor"
              }
            },
            "metadataConnectionConfig": {
              "@type": "type.googleapis.com/ml_metadata.ConnectionConfig",
              "sqlite": {
                "filenameUri": "metadata/detect_anomolies_on_wafer_tfdv_schema/metadata.db",
                "connectionMode": "READWRITE_OPENCREATE"
              }
            }
          }
        }
      - --metadata_ui_path
      - /mlpipeline-ui-metadata.json
      - --runtime_parameter
      - pipeline-root=STRING:{{inputs.parameters.pipeline-root}}
      command: [python, -m, tfx.orchestration.kubeflow.container_entrypoint]
      env:
      - name: WORKFLOW_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'}
      - name: KFP_POD_NAME
        valueFrom:
          fieldRef: {fieldPath: metadata.name}
      - name: KFP_POD_UID
        valueFrom:
          fieldRef: {fieldPath: metadata.uid}
      - name: KFP_NAMESPACE
        valueFrom:
          fieldRef: {fieldPath: metadata.namespace}
      - name: WORKFLOW_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'}
      - name: KFP_RUN_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''pipeline/runid'']'}
      - name: ENABLE_CACHING
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''pipelines.kubeflow.org/enable_caching'']'}
      envFrom:
      - configMapRef: {name: metadata-grpc-configmap, optional: true}
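      # default image written by KubeflowDagRunner (Docker Hub, not reachable from China)
      #image: tensorflow/tfx:1.14.0
      #image: dockerproxy.com/tensorflow/tfx:1.14.0
      # mirror image; the mirror does not host 1.14.0 yet, so 1.13.0 is used
      image: docker.nju.edu.cn/tensorflow/tfx:1.13.0
      # Never: the kubelet will not pull the image, so it must already exist on the node
      imagePullPolicy: Never
      volumeMounts:
      - mountPath: /tfx/tfx_pv
        name: tfx-pv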
    inputs:
      parameters:
      - {name: pipeline-root}
    outputs:
      artifacts:
      - {name: mlpipeline-ui-metadata, path: /mlpipeline-ui-metadata.json}
    metadata:
      labels:
        add-pod-env: "true"
        pipelines.kubeflow.org/pipeline-sdk-type: tfx
        pipelines.kubeflow.org/kfp_sdk_version: 1.8.0
        pipelines.kubeflow.org/enable_caching: "true"
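  # --- schema-importer ---
  # Runs the TFX Importer node "schema_importer". It registers the existing schema at
  # artifact_uri /tfx/pipelines/detect_anomalies_in_wafer_schema/ as a Schema artifact
  # for the downstream Transform node. Inside the pod, that path is provided by the
  # read-only "schema-path" volumeMount of this template.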
  - name: schema-importer
    container:
      args:
      - --pipeline_root
      - '{{inputs.parameters.pipeline-root}}'
      - --kubeflow_metadata_config
      - |-
        {
          "grpc_config": {
            "grpc_service_host": {
              "environment_variable": "METADATA_GRPC_SERVICE_HOST"
            },
            "grpc_service_port": {
              "environment_variable": "METADATA_GRPC_SERVICE_PORT"
            }
          }
        }
      - --node_id
      - schema_importer
      - --tfx_ir
      - |-
        {
          "pipelineInfo": {
            "id": "detect_anomolies_on_wafer_tfdv_schema"
          },
          "nodes": [
            {
              "pipelineNode": {
                "nodeInfo": {
                  "type": {
                    "name": "tfx.dsl.components.common.importer.Importer"
                  },
                  "id": "schema_importer"
                },
                "contexts": {
                  "contexts": [
                    {
                      "type": {
                        "name": "pipeline"
                      },
                      "name": {
                        "fieldValue": {
                          "stringValue": "detect_anomolies_on_wafer_tfdv_schema"
                        }
                      }
                    },
                    {
                      "type": {
                        "name": "pipeline_run"
                      },
                      "name": {
                        "runtimeParameter": {
                          "name": "pipeline-run-id",
                          "type": "STRING"
                        }
                      }
                    },
                    {
                      "type": {
                        "name": "node"
                      },
                      "name": {
                        "fieldValue": {
                          "stringValue": "detect_anomolies_on_wafer_tfdv_schema.schema_importer"
                        }
                      }
                    }
                  ]
                },
                "outputs": {
                  "outputs": {
                    "result": {
                      "artifactSpec": {
                        "type": {
                          "name": "Schema"
                        }
                      }
                    }
                  }
                },
                "parameters": {
                  "parameters": {
                    "artifact_uri": {
                      "fieldValue": {
                        "stringValue": "/tfx/pipelines/detect_anomalies_in_wafer_schema/"
                      }
                    },
                    "reimport": {
                      "fieldValue": {
                        "intValue": "0"
                      }
                    },
                    "output_key": {
                      "fieldValue": {
                        "stringValue": "result"
                      }
                    }
                  }
                },
                "downstreamNodes": [
                  "Transform"
                ],
                "executionOptions": {
                  "cachingOptions": {}
                }
              }
            }
          ],
          "runtimeSpec": {
            "pipelineRoot": {
              "runtimeParameter": {
                "name": "pipeline-root",
                "type": "STRING",
                "defaultValue": {
                  "stringValue": "pipelines/detect_anomolies_on_wafer_tfdv_schema"
                }
              }
            },
            "pipelineRunId": {
              "runtimeParameter": {
                "name": "pipeline-run-id",
                "type": "STRING"
              }
            }
          },
          "executionMode": "SYNC",
          "deploymentConfig": {
            "@type": "type.googleapis.com/tfx.orchestration.IntermediateDeploymentConfig",
            "metadataConnectionConfig": {
              "@type": "type.googleapis.com/ml_metadata.ConnectionConfig",
              "sqlite": {
                "filenameUri": "metadata/detect_anomolies_on_wafer_tfdv_schema/metadata.db",
                "connectionMode": "READWRITE_OPENCREATE"
              }
            }
          }
        }
      - --metadata_ui_path
      - /mlpipeline-ui-metadata.json
      - --runtime_parameter
      - pipeline-root=STRING:{{inputs.parameters.pipeline-root}}
      command: [python, -m, tfx.orchestration.kubeflow.container_entrypoint]
      env:
      - name: WORKFLOW_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'}
      - name: KFP_POD_NAME
        valueFrom:
          fieldRef: {fieldPath: metadata.name}
      - name: KFP_POD_UID
        valueFrom:
          fieldRef: {fieldPath: metadata.uid}
      - name: KFP_NAMESPACE
        valueFrom:
          fieldRef: {fieldPath: metadata.namespace}
      - name: WORKFLOW_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'}
      - name: KFP_RUN_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''pipeline/runid'']'}
      - name: ENABLE_CACHING
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''pipelines.kubeflow.org/enable_caching'']'}
      envFrom:
      - configMapRef: {name: metadata-grpc-configmap, optional: true}
      #image: tensorflow/tfx:1.14.0
      #image: dockerproxy.com/tensorflow/tfx:1.14.0
      image: docker.nju.edu.cn/tensorflow/tfx:1.13.0
      imagePullPolicy: Never
      volumeMounts:
      - mountPath: /tfx/tfx_pv
        name: tfx-pv
      - mountPath: /tfx/pipelines/detect_anomalies_in_wafer_schema
        name: schema-path
        readOnly: true
    inputs:
      parameters:
      - {name: pipeline-root}
    outputs:
      artifacts:
      - {name: mlpipeline-ui-metadata, path: /mlpipeline-ui-metadata.json}
    metadata:
      labels:
        add-pod-env: "true"
        pipelines.kubeflow.org/pipeline-sdk-type: tfx
        pipelines.kubeflow.org/kfp_sdk_version: 1.8.0
        pipelines.kubeflow.org/enable_caching: "true"
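  # --- statisticsgen ---
  # Runs StatisticsGen as a Beam executable (BeamExecutableSpec) on the "examples"
  # output of ImportExampleGen. exclude_splits is [], so statistics are computed
  # for every split.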
  - name: statisticsgen
    container:
      args:
      - --pipeline_root
      - '{{inputs.parameters.pipeline-root}}'
      - --kubeflow_metadata_config
      - |-
        {
          "grpc_config": {
            "grpc_service_host": {
              "environment_variable": "METADATA_GRPC_SERVICE_HOST"
            },
            "grpc_service_port": {
              "environment_variable": "METADATA_GRPC_SERVICE_PORT"
            }
          }
        }
      - --node_id
      - StatisticsGen
      - --tfx_ir
      - |-
        {
          "pipelineInfo": {
            "id": "detect_anomolies_on_wafer_tfdv_schema"
          },
          "nodes": [
            {
              "pipelineNode": {
                "nodeInfo": {
                  "type": {
                    "name": "tfx.components.statistics_gen.component.StatisticsGen",
                    "baseType": "PROCESS"
                  },
                  "id": "StatisticsGen"
                },
                "contexts": {
                  "contexts": [
                    {
                      "type": {
                        "name": "pipeline"
                      },
                      "name": {
                        "fieldValue": {
                          "stringValue": "detect_anomolies_on_wafer_tfdv_schema"
                        }
                      }
                    },
                    {
                      "type": {
                        "name": "pipeline_run"
                      },
                      "name": {
                        "runtimeParameter": {
                          "name": "pipeline-run-id",
                          "type": "STRING"
                        }
                      }
                    },
                    {
                      "type": {
                        "name": "node"
                      },
                      "name": {
                        "fieldValue": {
                          "stringValue": "detect_anomolies_on_wafer_tfdv_schema.StatisticsGen"
                        }
                      }
                    }
                  ]
                },
                "inputs": {
                  "inputs": {
                    "examples": {
                      "channels": [
                        {
                          "producerNodeQuery": {
                            "id": "ImportExampleGen"
                          },
                          "contextQueries": [
                            {
                              "type": {
                                "name": "pipeline"
                              },
                              "name": {
                                "fieldValue": {
                                  "stringValue": "detect_anomolies_on_wafer_tfdv_schema"
                                }
                              }
                            },
                            {
                              "type": {
                                "name": "pipeline_run"
                              },
                              "name": {
                                "runtimeParameter": {
                                  "name": "pipeline-run-id",
                                  "type": "STRING"
                                }
                              }
                            },
                            {
                              "type": {
                                "name": "node"
                              },
                              "name": {
                                "fieldValue": {
                                  "stringValue": "detect_anomolies_on_wafer_tfdv_schema.ImportExampleGen"
                                }
                              }
                            }
                          ],
                          "artifactQuery": {
                            "type": {
                              "name": "Examples",
                              "baseType": "DATASET"
                            }
                          },
                          "outputKey": "examples"
                        }
                      ],
                      "minCount": 1
                    }
                  }
                },
                "outputs": {
                  "outputs": {
                    "statistics": {
                      "artifactSpec": {
                        "type": {
                          "name": "ExampleStatistics",
                          "properties": {
                            "span": "INT",
                            "split_names": "STRING"
                          },
                          "baseType": "STATISTICS"
                        }
                      }
                    }
                  }
                },
                "parameters": {
                  "parameters": {
                    "exclude_splits": {
                      "fieldValue": {
                        "stringValue": "[]"
                      }
                    }
                  }
                },
                "upstreamNodes": [
                  "ImportExampleGen"
                ],
                "executionOptions": {
                  "cachingOptions": {}
                }
              }
            }
          ],
          "runtimeSpec": {
            "pipelineRoot": {
              "runtimeParameter": {
                "name": "pipeline-root",
                "type": "STRING",
                "defaultValue": {
                  "stringValue": "pipelines/detect_anomolies_on_wafer_tfdv_schema"
                }
              }
            },
            "pipelineRunId": {
              "runtimeParameter": {
                "name": "pipeline-run-id",
                "type": "STRING"
              }
            }
          },
          "executionMode": "SYNC",
          "deploymentConfig": {
            "@type": "type.googleapis.com/tfx.orchestration.IntermediateDeploymentConfig",
            "executorSpecs": {
              "StatisticsGen": {
                "@type": "type.googleapis.com/tfx.orchestration.executable_spec.BeamExecutableSpec",
                "pythonExecutorSpec": {
                  "classPath": "tfx.components.statistics_gen.executor.Executor"
                }
              }
            },
            "metadataConnectionConfig": {
              "@type": "type.googleapis.com/ml_metadata.ConnectionConfig",
              "sqlite": {
                "filenameUri": "metadata/detect_anomolies_on_wafer_tfdv_schema/metadata.db",
                "connectionMode": "READWRITE_OPENCREATE"
              }
            }
          }
        }
      - --metadata_ui_path
      - /mlpipeline-ui-metadata.json
      - --runtime_parameter
      - pipeline-root=STRING:{{inputs.parameters.pipeline-root}}
      command: [python, -m, tfx.orchestration.kubeflow.container_entrypoint]
      env:
      - name: WORKFLOW_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'}
      - name: KFP_POD_NAME
        valueFrom:
          fieldRef: {fieldPath: metadata.name}
      - name: KFP_POD_UID
        valueFrom:
          fieldRef: {fieldPath: metadata.uid}
      - name: KFP_NAMESPACE
        valueFrom:
          fieldRef: {fieldPath: metadata.namespace}
      - name: WORKFLOW_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'}
      - name: KFP_RUN_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''pipeline/runid'']'}
      - name: ENABLE_CACHING
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''pipelines.kubeflow.org/enable_caching'']'}
      envFrom:
      - configMapRef: {name: metadata-grpc-configmap, optional: true}
      #image: tensorflow/tfx:1.14.0
      #image: dockerproxy.com/tensorflow/tfx:1.14.0
      image: docker.nju.edu.cn/tensorflow/tfx:1.13.0
      imagePullPolicy: Never
      volumeMounts:
      - mountPath: /tfx/tfx_pv
        name: tfx-pv
    inputs:
      parameters:
      - {name: pipeline-root}
      #artifacts:
      #- {name: import_example_gen_outputs, path: /tmp/pipelines}
    outputs:
      artifacts:
      - {name: mlpipeline-ui-metadata, path: /mlpipeline-ui-metadata.json}
      #- {name: statistics_gen_outputs, path: /tmp/pipelines}
    metadata:
      labels:
        add-pod-env: "true"
        pipelines.kubeflow.org/pipeline-sdk-type: tfx
        pipelines.kubeflow.org/kfp_sdk_version: 1.8.0
        pipelines.kubeflow.org/enable_caching: "true"
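  # --- trainer ---
  # Runs Trainer with tfx.components.trainer.executor.GenericExecutor. module_path
  # points at the tfx_user_code_Trainer wheel under /tfx/pipelines/, which the
  # read-only "trainer-module" volumeMount of this template maps into the container.
  # train_args sets num_steps to 21, and custom_config passes {"epochs": 50}.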
  - name: trainer
    container:
      args:
      - --pipeline_root
      - '{{inputs.parameters.pipeline-root}}'
      - --kubeflow_metadata_config
      - |-
        {
          "grpc_config": {
            "grpc_service_host": {
              "environment_variable": "METADATA_GRPC_SERVICE_HOST"
            },
            "grpc_service_port": {
              "environment_variable": "METADATA_GRPC_SERVICE_PORT"
            }
          }
        }
      - --node_id
      - Trainer
      - --tfx_ir
      - |-
        {
          "pipelineInfo": {
            "id": "detect_anomolies_on_wafer_tfdv_schema"
          },
          "nodes": [
            {
              "pipelineNode": {
                "nodeInfo": {
                  "type": {
                    "name": "tfx.components.trainer.component.Trainer",
                    "baseType": "TRAIN"
                  },
                  "id": "Trainer"
                },
                "contexts": {
                  "contexts": [
                    {
                      "type": {
                        "name": "pipeline"
                      },
                      "name": {
                        "fieldValue": {
                          "stringValue": "detect_anomolies_on_wafer_tfdv_schema"
                        }
                      }
                    },
                    {
                      "type": {
                        "name": "pipeline_run"
                      },
                      "name": {
                        "runtimeParameter": {
                          "name": "pipeline-run-id",
                          "type": "STRING"
                        }
                      }
                    },
                    {
                      "type": {
                        "name": "node"
                      },
                      "name": {
                        "fieldValue": {
                          "stringValue": "detect_anomolies_on_wafer_tfdv_schema.Trainer"
                        }
                      }
                    }
                  ]
                },
                "inputs": {
                  "inputs": {
                    "examples": {
                      "channels": [
                        {
                          "producerNodeQuery": {
                            "id": "ImportExampleGen"
                          },
                          "contextQueries": [
                            {
                              "type": {
                                "name": "pipeline"
                              },
                              "name": {
                                "fieldValue": {
                                  "stringValue": "detect_anomolies_on_wafer_tfdv_schema"
                                }
                              }
                            },
                            {
                              "type": {
                                "name": "pipeline_run"
                              },
                              "name": {
                                "runtimeParameter": {
                                  "name": "pipeline-run-id",
                                  "type": "STRING"
                                }
                              }
                            },
                            {
                              "type": {
                                "name": "node"
                              },
                              "name": {
                                "fieldValue": {
                                  "stringValue": "detect_anomolies_on_wafer_tfdv_schema.ImportExampleGen"
                                }
                              }
                            }
                          ],
                          "artifactQuery": {
                            "type": {
                              "name": "Examples",
                              "baseType": "DATASET"
                            }
                          },
                          "outputKey": "examples"
                        }
                      ],
                      "minCount": 1
                    },
                    "transform_graph": {
                      "channels": [
                        {
                          "producerNodeQuery": {
                            "id": "Transform"
                          },
                          "contextQueries": [
                            {
                              "type": {
                                "name": "pipeline"
                              },
                              "name": {
                                "fieldValue": {
                                  "stringValue": "detect_anomolies_on_wafer_tfdv_schema"
                                }
                              }
                            },
                            {
                              "type": {
                                "name": "pipeline_run"
                              },
                              "name": {
                                "runtimeParameter": {
                                  "name": "pipeline-run-id",
                                  "type": "STRING"
                                }
                              }
                            },
                            {
                              "type": {
                                "name": "node"
                              },
                              "name": {
                                "fieldValue": {
                                  "stringValue": "detect_anomolies_on_wafer_tfdv_schema.Transform"
                                }
                              }
                            }
                          ],
                          "artifactQuery": {
                            "type": {
                              "name": "TransformGraph"
                            }
                          },
                          "outputKey": "transform_graph"
                        }
                      ]
                    }
                  }
                },
                "outputs": {
                  "outputs": {
                    "model": {
                      "artifactSpec": {
                        "type": {
                          "name": "Model",
                          "baseType": "MODEL"
                        }
                      }
                    },
                    "model_run": {
                      "artifactSpec": {
                        "type": {
                          "name": "ModelRun"
                        }
                      }
                    }
                  }
                },
                "parameters": {
                  "parameters": {
                    "train_args": {
                      "fieldValue": {
                        "stringValue": "{\n  \"num_steps\": 21\n}"
                      }
                    },
                    "custom_config": {
                      "fieldValue": {
                        "stringValue": "{\"epochs\": 50}"
                      }
                    },
                    "eval_args": {
                      "fieldValue": {
                        "stringValue": "{}"
                      }
                    },
                    "module_path": {
                      "fieldValue": {
                        "stringValue": "detect_anomalies_in_wafer_trainer@/tfx/pipelines/tfx_user_code_Trainer-0.0+35148d2579a5a421da4bda3bd371de44bf8888bb4ea4f5cc424f859c6e4db9db-py3-none-any.whl"
                      }
                    }
                  }
                },
                "upstreamNodes": [
                  "ImportExampleGen",
                  "Transform"
                ],
                "downstreamNodes": [
                  "Pusher"
                ],
                "executionOptions": {
                  "cachingOptions": {}
                }
              }
            }
          ],
          "runtimeSpec": {
            "pipelineRoot": {
              "runtimeParameter": {
                "name": "pipeline-root",
                "type": "STRING",
                "defaultValue": {
                  "stringValue": "pipelines/detect_anomolies_on_wafer_tfdv_schema"
                }
              }
            },
            "pipelineRunId": {
              "runtimeParameter": {
                "name": "pipeline-run-id",
                "type": "STRING"
              }
            }
          },
          "executionMode": "SYNC",
          "deploymentConfig": {
            "@type": "type.googleapis.com/tfx.orchestration.IntermediateDeploymentConfig",
            "executorSpecs": {
              "Trainer": {
                "@type": "type.googleapis.com/tfx.orchestration.executable_spec.PythonClassExecutableSpec",
                "classPath": "tfx.components.trainer.executor.GenericExecutor"
              }
            },
            "metadataConnectionConfig": {
              "@type": "type.googleapis.com/ml_metadata.ConnectionConfig",
              "sqlite": {
                "filenameUri": "metadata/detect_anomolies_on_wafer_tfdv_schema/metadata.db",
                "connectionMode": "READWRITE_OPENCREATE"
              }
            }
          }
        }
      - --metadata_ui_path
      - /mlpipeline-ui-metadata.json
      - --runtime_parameter
      - pipeline-root=STRING:{{inputs.parameters.pipeline-root}}
      command: [python, -m, tfx.orchestration.kubeflow.container_entrypoint]
      env:
      - name: WORKFLOW_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'}
      - name: KFP_POD_NAME
        valueFrom:
          fieldRef: {fieldPath: metadata.name}
      - name: KFP_POD_UID
        valueFrom:
          fieldRef: {fieldPath: metadata.uid}
      - name: KFP_NAMESPACE
        valueFrom:
          fieldRef: {fieldPath: metadata.namespace}
      - name: WORKFLOW_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'}
      - name: KFP_RUN_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''pipeline/runid'']'}
      - name: ENABLE_CACHING
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''pipelines.kubeflow.org/enable_caching'']'}
      envFrom:
      - configMapRef: {name: metadata-grpc-configmap, optional: true}
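      # image generated by KubeflowDagRunner; Docker Hub is not reachable from China
      #image: tensorflow/tfx:1.14.0
      # alternative mirror (dockerproxy.com)
      #image: dockerproxy.com/tensorflow/tfx:1.14.0
      # Nanjing University mirror; it has no tfx:1.14.0 yet, so 1.13.0 is used
      image: docker.nju.edu.cn/tensorflow/tfx:1.13.0
      # Never: kubelet does not pull, so the image must already be present on every node
      imagePullPolicy: Never
      volumeMounts:
      # user code wheel for the Trainer
      - mountPath: /tfx/pipelines/tfx_user_code_Trainer-0.0+35148d2579a5a421da4bda3bd371de44bf8888bb4ea4f5cc424f859c6e4db9db-py3-none-any.whl
        name: trainer-module
        readOnly: True
      # persistent volume that holds pipeline_root
      - mountPath: /tfx/tfx_pv
        name: tfx-pv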
    inputs:
      parameters:
      - {name: pipeline-root}
    outputs:
      artifacts:
      - {name: mlpipeline-ui-metadata, path: /mlpipeline-ui-metadata.json}
    metadata:
      labels:
        add-pod-env: "true"
        pipelines.kubeflow.org/pipeline-sdk-type: tfx
        pipelines.kubeflow.org/kfp_sdk_version: 1.8.0
        pipelines.kubeflow.org/enable_caching: "true"
  - name: transform
    container:
      args:
      - --pipeline_root
      - '{{inputs.parameters.pipeline-root}}'
      - --kubeflow_metadata_config
      - |-
        {
          "grpc_config": {
            "grpc_service_host": {
              "environment_variable": "METADATA_GRPC_SERVICE_HOST"
            },
            "grpc_service_port": {
              "environment_variable": "METADATA_GRPC_SERVICE_PORT"
            }
          }
        }
      - --node_id
      - Transform
      - --tfx_ir
      - |-
        {
          "pipelineInfo": {
            "id": "detect_anomolies_on_wafer_tfdv_schema"
          },
          "nodes": [
            {
              "pipelineNode": {
                "nodeInfo": {
                  "type": {
                    "name": "tfx.components.transform.component.Transform",
                    "baseType": "TRANSFORM"
                  },
                  "id": "Transform"
                },
                "contexts": {
                  "contexts": [
                    {
                      "type": {
                        "name": "pipeline"
                      },
                      "name": {
                        "fieldValue": {
                          "stringValue": "detect_anomolies_on_wafer_tfdv_schema"
                        }
                      }
                    },
                    {
                      "type": {
                        "name": "pipeline_run"
                      },
                      "name": {
                        "runtimeParameter": {
                          "name": "pipeline-run-id",
                          "type": "STRING"
                        }
                      }
                    },
                    {
                      "type": {
                        "name": "node"
                      },
                      "name": {
                        "fieldValue": {
                          "stringValue": "detect_anomolies_on_wafer_tfdv_schema.Transform"
                        }
                      }
                    }
                  ]
                },
                "inputs": {
                  "inputs": {
                    "schema": {
                      "channels": [
                        {
                          "producerNodeQuery": {
                            "id": "schema_importer"
                          },
                          "contextQueries": [
                            {
                              "type": {
                                "name": "pipeline"
                              },
                              "name": {
                                "fieldValue": {
                                  "stringValue": "detect_anomolies_on_wafer_tfdv_schema"
                                }
                              }
                            },
                            {
                              "type": {
                                "name": "pipeline_run"
                              },
                              "name": {
                                "runtimeParameter": {
                                  "name": "pipeline-run-id",
                                  "type": "STRING"
                                }
                              }
                            },
                            {
                              "type": {
                                "name": "node"
                              },
                              "name": {
                                "fieldValue": {
                                  "stringValue": "detect_anomolies_on_wafer_tfdv_schema.schema_importer"
                                }
                              }
                            }
                          ],
                          "artifactQuery": {
                            "type": {
                              "name": "Schema"
                            }
                          },
                          "outputKey": "result"
                        }
                      ],
                      "minCount": 1
                    },
                    "examples": {
                      "channels": [
                        {
                          "producerNodeQuery": {
                            "id": "ImportExampleGen"
                          },
                          "contextQueries": [
                            {
                              "type": {
                                "name": "pipeline"
                              },
                              "name": {
                                "fieldValue": {
                                  "stringValue": "detect_anomolies_on_wafer_tfdv_schema"
                                }
                              }
                            },
                            {
                              "type": {
                                "name": "pipeline_run"
                              },
                              "name": {
                                "runtimeParameter": {
                                  "name": "pipeline-run-id",
                                  "type": "STRING"
                                }
                              }
                            },
                            {
                              "type": {
                                "name": "node"
                              },
                              "name": {
                                "fieldValue": {
                                  "stringValue": "detect_anomolies_on_wafer_tfdv_schema.ImportExampleGen"
                                }
                              }
                            }
                          ],
                          "artifactQuery": {
                            "type": {
                              "name": "Examples",
                              "baseType": "DATASET"
                            }
                          },
                          "outputKey": "examples"
                        }
                      ],
                      "minCount": 1
                    }
                  }
                },
                "outputs": {
                  "outputs": {
                    "post_transform_anomalies": {
                      "artifactSpec": {
                        "type": {
                          "name": "ExampleAnomalies",
                          "properties": {
                            "split_names": "STRING",
                            "span": "INT"
                          }
                        }
                      }
                    },
                    "updated_analyzer_cache": {
                      "artifactSpec": {
                        "type": {
                          "name": "TransformCache"
                        }
                      }
                    },
                    "transform_graph": {
                      "artifactSpec": {
                        "type": {
                          "name": "TransformGraph"
                        }
                      }
                    },
                    "post_transform_schema": {
                      "artifactSpec": {
                        "type": {
                          "name": "Schema"
                        }
                      }
                    },
                    "pre_transform_schema": {
                      "artifactSpec": {
                        "type": {
                          "name": "Schema"
                        }
                      }
                    },
                    "post_transform_stats": {
                      "artifactSpec": {
                        "type": {
                          "name": "ExampleStatistics",
                          "properties": {
                            "split_names": "STRING",
                            "span": "INT"
                          },
                          "baseType": "STATISTICS"
                        }
                      }
                    },
                    "pre_transform_stats": {
                      "artifactSpec": {
                        "type": {
                          "name": "ExampleStatistics",
                          "properties": {
                            "split_names": "STRING",
                            "span": "INT"
                          },
                          "baseType": "STATISTICS"
                        }
                      }
                    }
                  }
                },
                "parameters": {
                  "parameters": {
                    "disable_statistics": {
                      "fieldValue": {
                        "intValue": "0"
                      }
                    },
                    "force_tf_compat_v1": {
                      "fieldValue": {
                        "intValue": "0"
                      }
                    },
                    "module_path": {
                      "fieldValue": {
                        "stringValue": "detect_anomalies_in_wafer_trainer@/tfx/pipelines/tfx_user_code_Transform-0.0+35148d2579a5a421da4bda3bd371de44bf8888bb4ea4f5cc424f859c6e4db9db-py3-none-any.whl"
                      }
                    },
                    "custom_config": {
                      "fieldValue": {
                        "stringValue": "null"
                      }
                    }
                  }
                },
                "upstreamNodes": [
                  "ImportExampleGen",
                  "schema_importer"
                ],
                "downstreamNodes": [
                  "Trainer"
                ],
                "executionOptions": {
                  "cachingOptions": {}
                }
              }
            }
          ],
          "runtimeSpec": {
            "pipelineRoot": {
              "runtimeParameter": {
                "name": "pipeline-root",
                "type": "STRING",
                "defaultValue": {
                  "stringValue": "pipelines/detect_anomolies_on_wafer_tfdv_schema"
                }
              }
            },
            "pipelineRunId": {
              "runtimeParameter": {
                "name": "pipeline-run-id",
                "type": "STRING"
              }
            }
          },
          "executionMode": "SYNC",
          "deploymentConfig": {
            "@type": "type.googleapis.com/tfx.orchestration.IntermediateDeploymentConfig",
            "executorSpecs": {
              "Transform": {
                "@type": "type.googleapis.com/tfx.orchestration.executable_spec.BeamExecutableSpec",
                "pythonExecutorSpec": {
                  "classPath": "tfx.components.transform.executor.Executor"
                }
              }
            },
            "metadataConnectionConfig": {
              "@type": "type.googleapis.com/ml_metadata.ConnectionConfig",
              "sqlite": {
                "filenameUri": "metadata/detect_anomolies_on_wafer_tfdv_schema/metadata.db",
                "connectionMode": "READWRITE_OPENCREATE"
              }
            }
          }
        }
      - --metadata_ui_path
      - /mlpipeline-ui-metadata.json
      - --runtime_parameter
      - pipeline-root=STRING:{{inputs.parameters.pipeline-root}}
      command: [python, -m, tfx.orchestration.kubeflow.container_entrypoint]
      env:
      - name: WORKFLOW_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'}
      - name: KFP_POD_NAME
        valueFrom:
          fieldRef: {fieldPath: metadata.name}
      - name: KFP_POD_UID
        valueFrom:
          fieldRef: {fieldPath: metadata.uid}
      - name: KFP_NAMESPACE
        valueFrom:
          fieldRef: {fieldPath: metadata.namespace}
      - name: WORKFLOW_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''workflows.argoproj.io/workflow'']'}
      - name: KFP_RUN_ID
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''pipeline/runid'']'}
      - name: ENABLE_CACHING
        valueFrom:
          fieldRef: {fieldPath: 'metadata.labels[''pipelines.kubeflow.org/enable_caching'']'}
      envFrom:
      - configMapRef: {name: metadata-grpc-configmap, optional: true}
      # same image swap as in the other components
      #image: tensorflow/tfx:1.14.0
      #image: dockerproxy.com/tensorflow/tfx:1.14.0
      image: docker.nju.edu.cn/tensorflow/tfx:1.13.0
      imagePullPolicy: Never
      volumeMounts:
      # user code wheel named in the Transform's module_path parameter
      - mountPath: /tfx/pipelines/tfx_user_code_Transform-0.0+35148d2579a5a421da4bda3bd371de44bf8888bb4ea4f5cc424f859c6e4db9db-py3-none-any.whl
        name: transform-module
        readOnly: True
      # schema directory
      - mountPath: /tfx/pipelines/detect_anomalies_in_wafer_schema
        name: schema-path
        readOnly: True
      # persistent volume that holds pipeline_root
      - mountPath: /tfx/tfx_pv
        name: tfx-pv
    inputs:
      parameters:
      - {name: pipeline-root}
    outputs:
      artifacts:
      - {name: mlpipeline-ui-metadata, path: /mlpipeline-ui-metadata.json}
    metadata:
      labels:
        add-pod-env: "true"
        pipelines.kubeflow.org/pipeline-sdk-type: tfx
        pipelines.kubeflow.org/kfp_sdk_version: 1.8.0
        pipelines.kubeflow.org/enable_caching: "true"
  arguments:
    parameters:
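    # pipeline_root on the tfx-pv persistent volume, mounted at /tfx/tfx_pv
    - {name: pipeline-root, value: /tfx/tfx_pv/pipelines/detect_anomolies_on_wafer_tfdv_schema}
    # alternative: keep pipeline_root on HDFS
    #- {name: pipeline-root, value: hdfs:///pipelines/detect_anomolies_on_wafer_tfdv_schema}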
  serviceAccountName: pipeline-runner
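The image swap (Docker Hub image replaced by the docker.nju.edu.cn mirror, plus `imagePullPolicy: Never`) has to be redone for every component each time pipeline.yaml is regenerated. Below is a minimal Python sketch that does it as a text post-processing step. `rewrite_images` and `rewrite_file` are hypothetical helper names, and the sketch assumes each `image:` line in the generated file is block-style YAML with an unquoted image name.

```python
import re

# Values used in this example; change them for another TFX version or mirror.
ORIGINAL_IMAGE = "tensorflow/tfx:1.14.0"
MIRROR_IMAGE = "docker.nju.edu.cn/tensorflow/tfx:1.13.0"

# Matches a block-style "image: <ref>" line plus an optional imagePullPolicy line
# directly below it at the same indentation. Commented "#image:" lines do not match.
_IMAGE_LINE = re.compile(
    r"^(?P<indent>[ \t]*)image:[ \t]*(?P<image>\S+)[ \t]*"
    r"(?:\n(?P=indent)imagePullPolicy:[^\n]*)?$",
    re.MULTILINE,
)


def rewrite_images(yaml_text: str,
                   original: str = ORIGINAL_IMAGE,
                   replacement: str = MIRROR_IMAGE,
                   pull_policy: str = "Never") -> str:
    """Point every container using `original` at `replacement` and set imagePullPolicy."""

    def _sub(match: re.Match) -> str:
        if match.group("image") != original:
            return match.group(0)  # leave containers with other images untouched
        indent = match.group("indent")
        return (f"{indent}image: {replacement}\n"
                f"{indent}imagePullPolicy: {pull_policy}")

    return _IMAGE_LINE.sub(_sub, yaml_text)


def rewrite_file(path: str = "pipeline.yaml") -> None:
    """Rewrite a generated pipeline definition in place."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(rewrite_images(text))
```

Calling `rewrite_file("pipeline.yaml")` right after `KubeflowDagRunner().run(...)` keeps the edit repeatable. Working on raw text rather than parsing the YAML leaves the embedded JSON IR and the layout of the file untouched.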

Attention:

  1. A container has its own filesystem, which is discarded when the container exits, so files written inside it no longer exist afterwards. The pipeline_root of a TFX pipeline must be a persistent directory that every component of the pipeline can read and write, so this example puts it on a persistent volume. Preferably, the persistent volume should be a dedicated disk rather than share a disk with the OS: a persistent volume cannot stop its users from filling the whole underlying disk, so sharing a disk with the OS may affect the OS.
  2. Each component of a TFX pipeline runs in a container in its own pod (one component, one pod), and these pods usually run on different nodes of the Kubernetes cluster; distributing pods over multiple nodes is the reason for using a cluster in the first place. The volume behind pipeline_root therefore has to be reachable from every node, which is why this example uses NFS for it. The NFS export should also preferably live on a dedicated disk: NFS cannot restrict clients to part of the disk, so sharing a disk with the OS may affect the OS and is less secure.
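Both points can be made concrete with a statically provisioned NFS PersistentVolume, a claim bound to it, and a workflow volume that refers to the claim. The Python sketch below emits these objects as JSON, which `kubectl apply -f` accepts. Only the volume name `tfx-pv` comes from the pipeline.yaml above. The NFS server, export path, size, claim name `tfx-pvc` and the `kubeflow` namespace are assumptions, and the function names are hypothetical.

```python
import json


def nfs_pv_manifests(server: str, path: str, size: str = "50Gi",
                     pv_name: str = "tfx-pv", pvc_name: str = "tfx-pvc",
                     namespace: str = "kubeflow") -> list:
    """Return a statically provisioned NFS PersistentVolume and a claim bound to it."""
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": pv_name},
        "spec": {
            # For NFS the capacity is only used to match claims; it is not a quota.
            "capacity": {"storage": size},
            # ReadWriteMany: component pods on different nodes mount it at the same time.
            "accessModes": ["ReadWriteMany"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": "",
            "nfs": {"server": server, "path": path},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            # Empty class + volumeName: bind to the PV above, not a dynamic provisioner.
            "storageClassName": "",
            "volumeName": pv_name,
            "resources": {"requests": {"storage": size}},
        },
    }
    return [pv, pvc]


def workflow_volume(pvc_name: str = "tfx-pvc", volume_name: str = "tfx-pv") -> dict:
    """Entry for the Workflow's spec.volumes; the name must match the volumeMounts name."""
    return {"name": volume_name, "persistentVolumeClaim": {"claimName": pvc_name}}


if __name__ == "__main__":
    # Hypothetical NFS server and export path on a dedicated disk.
    items = nfs_pv_manifests("192.168.1.10", "/data/tfx_pv")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

ReadWriteMany is the point of using NFS here. A ReadWriteOnce volume could only be mounted read-write by pods on a single node at a time.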

Note:

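  1. Mirror sites for hub.docker.com

Docker registry mirrors available in China:

DaoCloud mirror
Mirror address: https://docker.m.daocloud.io
Supports: Docker Hub, GCR, K8S, GHCR, Quay, NVCR, etc.
Free for public use: yes

NetEase Cloud
Mirror address: https://hub-mirror.c.163.com
Supports: Docker Hub
Free for public use: yes

Docker image proxy
Mirror address: https://dockerproxy.com
Supports: Docker Hub, GCR, K8S, GHCR
Free for public use: yes

Baidu Cloud
Mirror address: https://mirror.baidubce.com
Supports: Docker Hub
Free for public use: yes

Nanjing University mirror
Mirror address: https://docker.nju.edu.cn
Supports: Docker Hub, GCR, GHCR, Quay, NVCR, etc.
Free for public use: yes

Shanghai Jiao Tong University mirror
Mirror address: https://docker.mirrors.sjtug.sjtu.edu.cn/
Supports: Docker Hub, GCR, etc.
Restrictions: none

Alibaba Cloud
Mirror address: https://<your_code>.mirror.aliyuncs.com
Supports: Docker Hub
Restrictions: you must log in to an account to obtain your CODE [1]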
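These addresses are normally used in one of two ways. One is a `registry-mirrors` entry in Docker's daemon.json. The other is a host prefix in front of the image name, which is what pipeline.yaml does with `docker.nju.edu.cn/tensorflow/tfx`. The Python sketch below covers both. `mirror_ref` and `daemon_json` are hypothetical helpers. The `library/` rule for single-name official images follows Docker Hub's convention. Whether every mirror accepts prefixed names this way is an assumption to check per mirror.

```python
import json

# Public mirror addresses from the list above (Alibaba Cloud needs a per-account code).
MIRRORS = [
    "https://docker.m.daocloud.io",
    "https://hub-mirror.c.163.com",
    "https://dockerproxy.com",
    "https://mirror.baidubce.com",
    "https://docker.nju.edu.cn",
    "https://docker.mirrors.sjtug.sjtu.edu.cn",
]

_DOCKER_HUB_PREFIXES = ("docker.io/", "index.docker.io/", "registry-1.docker.io/")


def mirror_ref(image: str, mirror_host: str) -> str:
    """Rewrite a Docker Hub image reference to be pulled through `mirror_host`.

    Assumes the mirror serves Docker Hub repositories directly under its host name,
    as docker.nju.edu.cn/tensorflow/tfx does in this example.
    """
    name = image
    for prefix in _DOCKER_HUB_PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    first, sep, _ = name.partition("/")
    # A first path component containing "." or ":" (or "localhost") is a registry host.
    if sep and ("." in first or ":" in first or first == "localhost"):
        raise ValueError(f"{image!r} is not a Docker Hub image")
    if not sep:
        name = "library/" + name  # official images live under library/
    return f"{mirror_host}/{name}"


def daemon_json(mirrors, existing=None) -> str:
    """Docker daemon.json content with registry-mirrors set, keeping any other keys."""
    config = dict(existing or {})
    config["registry-mirrors"] = list(mirrors)
    return json.dumps(config, indent=2)
```

`registry-mirrors` only applies to pulls from Docker Hub and is read only by the Docker daemon (usually from /etc/docker/daemon.json, followed by a daemon restart). Kubernetes nodes that run containerd directly do not read this file. With `imagePullPolicy: Never`, the prefixed image still has to be loaded onto every node.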

  2. Trace a Linux process running in the background to see why it fails (the signals it receives and how it exits):
strace -e trace=none -p <PID>
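With `-e trace=none`, strace prints no system calls. It only shows the signals delivered to the process and a final line reporting how the process ended. Adding `-o strace.log` (a standard strace option) saves that output to a file. The small Python sketch below summarizes such a log. `summarize_strace` and `summarize_file` are hypothetical helpers. The line formats in the comments are strace's usual output and may differ slightly between versions.

```python
import re

# Lines strace normally prints for a traced process (format may vary by version):
#   --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, ...} ---   a delivered signal
#   +++ exited with 1 +++                                      normal exit
#   +++ killed by SIGKILL +++                                  death by signal
_SIGNAL = re.compile(r"^(?:\[pid\s+\d+\]\s+)?--- (SIG[A-Z0-9]+) ", re.MULTILINE)
_EXITED = re.compile(r"\+\+\+ exited with (\d+) \+\+\+")
_KILLED = re.compile(r"\+\+\+ killed by (SIG[A-Z0-9]+)")


def summarize_strace(log: str) -> dict:
    """Summarize the signals and final status recorded by `strace -e trace=none`."""
    exited = _EXITED.search(log)
    killed = _KILLED.search(log)
    return {
        "signals": _SIGNAL.findall(log),
        "exit_code": int(exited.group(1)) if exited else None,
        "killed_by": killed.group(1) if killed else None,
    }


def summarize_file(path: str = "strace.log") -> dict:
    """Read a log written with `strace -o <path>` and summarize it."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return summarize_strace(f.read())
```

A result such as `killed_by: "SIGKILL"` with no preceding signal line often points to something outside the process, such as the kernel OOM killer. Confirm that in the kernel log rather than relying on the strace output alone.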

Reference:

  1. https://zhuanlan.zhihu.com/p/642560164