Improve Price Performance for LLM Serving with vLLM on TPU & GKE

Dive into a hands-on workshop designed exclusively for AI developers. Learn to leverage the power of Google Cloud TPUs, the custom accelerators behind Google Gemini, for highly efficient LLM inference using vLLM. In this trial run for Google Developer Experts (GDEs), you'll build and deploy Gemma 3 27B on Trillium TPUs with vLLM and Google Kubernetes Engine (GKE). Explore advanced tooling like Dynamic Workload Scheduler (DWS) for TPU provisioning, Google Cloud Storage (GCS) for model checkpoints, and essential observability and monitoring solutions. Your live feedback will directly shape the future of this workshop, and we encourage you to share your experience with the vLLM/TPU integration on your social channels.

Location: Room 207

Duration: 1 hour

Speaker(s):

Author:

Niranjan Hira

Senior Product Manager

Google Cloud

As a Product Manager in our AI Infrastructure team, Hira looks out for how Google Cloud offerings can help customers and partners build more helpful AI experiences for users. With over 30 years of experience building applications and products across multiple industries, he likes to hog the whiteboard and tell developer tales.

Session Type:

Workshop