Model Serving with GitOps

In this section, we’ll deploy the model and the game application using Argo CD, a declarative GitOps continuous delivery tool for Kubernetes. Argo CD follows GitOps principles: the desired state of the application is declared in a Git repository, and Argo CD ensures the deployed application matches it by continuously monitoring the repository and automatically applying changes to the Kubernetes cluster.

An Argo CD instance has already been installed and configured in your argocd-{user} project. This instance is provided by the OpenShift GitOps Operator.

Go to the OpenShift Web Console and select the argocd-{user} project.

You can access the Argo CD web interface by clicking the Argo CD route URL, which is displayed in the Topology view and named argocd-server.

Argo CD Topology
Make sure you access Argo CD through the argocd-server route URL, and not the Cluster Argo CD route, so that your permissions are correctly set up.

The Argo CD web interface is protected by OAuth, and you can log in using your cluster credentials:

Username: {user}
Password: {password}

When prompted, authorize the Argo CD application to access the OpenShift API. You can then access the Argo CD web interface and manage applications.

Argo CD Login
Argo CD provides a web interface and a command-line interface (CLI) for managing applications. You can view the status of the deployed applications, perform manual synchronization, roll back to previous versions, and manage application configurations.

1. Deploying the model as an Argo CD Application

Next, we’ll create an Argo CD Application to deploy the AI model with KServe and OpenVINO Model Server.

For this purpose, we created a Helm chart that deploys the model through two KServe objects: a ServingRuntime and an InferenceService.

These point to the v2 model that was trained and pushed to the Model Registry by the Kubeflow pipeline, as discussed in the previous chapter.
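
For reference, the chart's default values for these parameters might look roughly like the sketch below (the exact layout of values.yaml inside the rps-game-manifests repository is an assumption):

# helm-model/values.yaml (illustrative sketch)
model:
  # Name used for the InferenceService and ServingRuntime objects
  name: yolo-rps
  # Object storage path of the model version to serve (overridden per user via Helm parameters)
  path: models/{user}-yolo-rps-v2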

Select + NEW APP from the Argo CD web interface, and fill in the following details (an equivalent declarative Application manifest is sketched after the list):

Application Name: ai-model
Project Name: default
Sync Policy: Automatic
Self Heal: Enabled
Repository URL: {gitea_console_url}/{user}/rps-game-manifests.git
Revision: main
Path: helm-model
Cluster URL: https://kubernetes.default.svc
Namespace: wksp-{user}
Helm Parameter model.path: models/{user}-yolo-rps-v2
Helm Parameter model.name: yolo-rps

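If you prefer a fully declarative workflow, the same Application can also be expressed as a manifest created in the argocd-{user} namespace (a minimal sketch using the standard Argo CD Application schema, mirroring the values above):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ai-model
  namespace: argocd-{user}
spec:
  project: default
  source:
    repoURL: {gitea_console_url}/{user}/rps-game-manifests.git
    targetRevision: main
    path: helm-model
    helm:
      parameters:
        - name: model.path
          value: models/{user}-yolo-rps-v2
        - name: model.name
          value: yolo-rps
  destination:
    server: https://kubernetes.default.svc
    namespace: wksp-{user}
  syncPolicy:
    # Automatic sync with self-heal, as selected in the form above
    automated:
      selfHeal: true
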
Argo CD AI Model
Helm is a package manager for Kubernetes that allows you to define, install, and manage applications as packages called charts. A Helm chart contains the necessary templates and configuration files to deploy an application in a Kubernetes cluster.
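
For illustration, the chart's InferenceService template could consume the model.name and model.path parameters along these lines (a sketch based on assumptions about the chart's internals, not its exact contents):

# helm-model/templates/inferenceservice.yaml (illustrative sketch)
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: {{ .Values.model.name }}
spec:
  predictor:
    model:
      modelFormat:
        name: onnx
      # ServingRuntime with the same name, defined in a sibling template
      runtime: {{ .Values.model.name }}
      storage:
        # Data connection secret holding the S3 credentials
        key: aws-connection-my-storage
        # Object path supplied through the Argo CD Helm parameter model.path
        path: {{ .Values.model.path }}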

Hit CREATE when finished. After the application is created, Argo CD automatically synchronizes it and deploys the AI model in the Data Science namespace (wksp-{user}) managed by Argo CD. You can view the status of the application, the resources deployed, and the synchronization status in the Argo CD web interface.

It might take a few minutes for the model to be fully deployed.
Argo CD AI Model

From the OpenShift AI Dashboard link, you should now see the model deployment with its inference API up and running and a green status.

If you click on the Internal and External endpoint details, you’ll see the URLs for querying the deployed model both from inside the cluster and from outside it:

Inferencing Model

2. KServe and InferenceService

You can also review the KServe deployment on the cluster.

Select your AI project, wksp-{user}.

Find the Knative Service named yolo-rps-predictor.

Model deployed

You can review the newly created InferenceService and its associated ServingRuntime:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    openshift.io/display-name: yolo-rps
    serving.knative.openshift.io/enablePassthrough: 'true'
    sidecar.istio.io/inject: 'true'
    sidecar.istio.io/rewriteAppHTTPProbers: 'true'
  name: yolo-rps
  namespace: wksp-{user}
  finalizers:
    - inferenceservice.finalizers
  labels:
    opendatahub.io/dashboard: 'true'
spec:
  predictor:
    maxReplicas: 1
    minReplicas: 1
    model:
      modelFormat:
        name: onnx
        version: '1'
      name: ''
      resources:
        limits:
          cpu: '2'
          memory: 3Gi
        requests:
          cpu: '1'
          memory: 1Gi
      runtime: yolo-rps
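      # Data connection secret and object path where the pipeline uploaded the v2 model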
      storage:
        key: aws-connection-my-storage
        path: models/{user}-yolo-rps-v2
---
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  annotations:
    opendatahub.io/accelerator-name: ''
    opendatahub.io/apiProtocol: REST
    opendatahub.io/template-display-name: OpenVINO Model Server
    opendatahub.io/template-name: kserve-ovms
    openshift.io/display-name: yolo-rps
  name: yolo-rps
  namespace: wksp-{user}
  labels:
    opendatahub.io/dashboard: 'true'
spec:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: '8888'
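  # OpenVINO Model Server container, serving REST on port 8888 and gRPC on port 8001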
  containers:
    - args:
        - '--model_name={{.Name}}'
        - '--port=8001'
        - '--rest_port=8888'
        - '--model_path=/mnt/models'
        - '--file_system_poll_wait_seconds=0'
        - '--grpc_bind_address=0.0.0.0'
        - '--rest_bind_address=0.0.0.0'
        - '--target_device=AUTO'
        - '--metrics_enable'
      image: 'quay.io/modh/openvino_model_server@sha256:e4ece481b8dd866163da7ffcd3e95389ef7fb3d32a9a0dbb056b07839652a4f7'
      name: kserve-container
      ports:
        - containerPort: 8888
          protocol: TCP
      volumeMounts:
        - mountPath: /dev/shm
          name: shm
  multiModel: false
  protocolVersions:
    - v2
    - grpc-v2
  supportedModelFormats:
    - autoSelect: true
      name: openvino_ir
      version: opset13
    - name: onnx
      version: '1'
    - autoSelect: true
      name: tensorflow
      version: '1'
    - autoSelect: true
      name: tensorflow
      version: '2'
    - autoSelect: true
      name: paddle
      version: '2'
    - autoSelect: true
      name: pytorch
      version: '2'
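  # In-memory emptyDir volume mounted at /dev/shm for the model server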
  volumes:
    - emptyDir:
        medium: Memory
        sizeLimit: 2Gi
      name: shm

3. Next Steps

Now that we have deployed the v2 RoShambo game model using the Model Server in OpenShift AI with Argo CD, we can integrate it with the game application to enable AI-powered gameplay. In the next section, we’ll test the inference endpoint from a Jupyter notebook by querying the model with sample input data and verifying the predictions.