Interactive Session (3): Serving a Model Using Kubernetes (OP)
- Due Apr 10 at 11:59pm
- Points 0
- Questions 0
- Available Mar 23 at 11:59pm - Apr 12 at 11:59pm
- Time Limit None
Instructions
Welcome to CMPT 756 Interactive Session (3). In this Interactive Session, which is based on a Google Cloud Platform tutorial, you will deploy and serve a model using one GPU on a Google Kubernetes Engine (GKE) Standard cluster with NVIDIA Triton Inference Server. You will first deploy a simple application using Kubernetes. Then, once you are more comfortable with GKE, you will create a Standard Kubernetes cluster and a node pool with an L4 GPU, deploy an inference server in your cluster, and serve your model.
This Interactive Session is estimated to take about 50 minutes, including all preparation and additional practice. You can start it whenever you prefer and work on it at your own pace until the due date. You may finish or redo this Interactive Session as many times as you wish; your highest score among all attempts will be the one counted.
The Interactive Session closely follows a GCP tutorial, adding some required preparation and a few helpful hints along the way. If you need further information or a more detailed breakdown of any step during this hands-on experience, please do not hesitate to consult the Google Cloud Platform documentation.
Acknowledgement
The Google Cloud Platform documentation and tutorials were the main resources used in preparing this Interactive Session. The figures and the steps for configuring resources are reproduced as published in the Google Cloud documentation and tutorials (accessed January 2025), with minor modifications to add further detail.