Data Science Pipeline
Creating pipelines from version-controlled, shareable code is often the preferred approach. The Kubeflow Pipelines (kfp) SDK provides a Python API for building pipelines and is distributed as a Python package that you can install with the pip install kfp command. Using this package, you write Python code to define a pipeline, compile it into YAML format, and then import it into OpenShift AI.
This deep dive does not focus on the specifics of using the SDK. Instead, it supplies the necessary files for you to review and upload.
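For context, the following is a minimal sketch of what a kfp pipeline definition compiled to YAML can look like. The component logic, pipeline name, and output file name are illustrative placeholders, not the actual Fraud Detection pipeline code.

```python
# Minimal kfp (v2) sketch: define a pipeline in Python and compile it to YAML.
# The component logic and names below are placeholders for illustration only.
from kfp import dsl, compiler


@dsl.component(base_image="python:3.11")
def train_model(epochs: int) -> str:
    # In a real pipeline, this step would fetch data and train the model.
    return f"trained for {epochs} epochs"


@dsl.pipeline(name="example-training-pipeline")
def training_pipeline(epochs: int = 10):
    train_model(epochs=epochs)


if __name__ == "__main__":
    # Produces the YAML file that you import into OpenShift AI.
    compiler.Compiler().compile(training_pipeline, "example_pipeline.yaml")
```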
Enabling Data Science Pipelines
In this section, you prepare your environment to train a model automatically using data science pipelines.
In the OpenShift AI dashboard, on the Fraud Detection page, click the Pipelines tab.
![Pipelines](_images/pipeline-menu.png)
Click the Configure pipeline server button.
![Create Pipeline](_images/create-pipeline-server.png)
In the Configure Pipeline Server form, locate the Access Key field marked by the key icon. Open the dropdown menu and select Pipeline Artifacts to automatically fill the form with the necessary credentials for the connection.
![Fill Create Pipeline Form](_images/create-pipeline-server-form.png)
Click Configure pipeline server, and wait until the loading spinner disappears and Start by importing a pipeline is displayed. You do not need to click the Start by importing a pipeline button; when it becomes enabled, it indicates that the pipeline server is ready.
This process might take 5 minutes.
Running a Data Science Pipeline
From your Jupyter environment, download the 7_get_data_train_upload.yaml file to your local disk by selecting the file, right-clicking it, and selecting Download.
![Download Pipeline](_images/download-pipeline.png)
To import this file into the pipeline server, return to the Pipelines tab and click the Import pipeline button.
![Pipeline Import](_images/pipeline-import.png)
In the Import pipeline form, enter a Pipeline name and a Pipeline description.
Click Upload and select 7_get_data_train_upload.yaml from your local files.
Finally, click the Import pipeline button to import and save the pipeline.
![Pipeline Import form](_images/pipeline-import-upload.png)
The pipeline is registered but not executed until you create a run.
Expand the pipeline item, click the action menu (⋮), and then select View runs.
![Pipeline Run](_images/pipeline-view-runs.png)
Click Create run and fill the form with the following values:
- Experiment: keep the default value, Default
- Name: Run 1
- Pipeline: select the pipeline that you uploaded
You can leave the other fields with their default values.
![Create Run](_images/pipeline-create-run-form.png)
After you create the run, the pipeline executes automatically: it fetches the training data, retrains the model with that data, and publishes the updated model.
![Pipeline Run In Progress](_images/pipeline-run-in-progress.png)
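If you prefer to automate these steps instead of using the dashboard, the kfp SDK also includes a client that can start a run from a compiled pipeline file programmatically. The sketch below is an assumption-laden example: the pipeline server route and the OpenShift bearer token are placeholders that depend on your cluster, and the run and experiment names simply mirror the values used above.

```python
# Sketch: start a pipeline run with the kfp client instead of the dashboard.
# The host URL and token are placeholders; use your pipeline server's API
# route and a valid OpenShift bearer token from your login session.
from kfp import Client

kfp_client = Client(
    host="https://<pipeline-server-route>",      # placeholder route
    existing_token="<openshift-bearer-token>",   # placeholder token
)

# Create a run directly from the compiled pipeline package.
kfp_client.create_run_from_pipeline_package(
    pipeline_file="7_get_data_train_upload.yaml",
    arguments={},
    run_name="Run 1",
    experiment_name="Default",
)
```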