By Vlado Djerek
Last modified June 9, 2025

Building a Multi-Agent Code Development Flow with Flowise on GKE

This tutorial shows how to deploy and use Flowise on GKE (Google Kubernetes Engine) to build and operate AI applications using a low-code/no-code approach.

Overview

This tutorial is designed for developers and platform engineers interested in leveraging Flowise on GKE to build customized LLM flows and AI agents while offloading resource-intensive tasks to a managed cluster. Managed clusters automatically handle the complex infrastructure requirements of AI applications, such as scaling GPU nodes for model inference and managing variable workloads, allowing developers to focus on building AI solutions rather than managing servers. Flowise is a low-code/no-code platform that enables developers to build and deploy AI applications and multi-agent systems through a visual drag-and-drop interface, without extensive programming.

As an example, we will create a multi-agent application that acts as a software development team: a software developer and a code reviewer coordinated by a supervisor. The example demonstrates Flowise’s core multi-agent coordination capabilities, showing how AI agents can collaborate on complex tasks, with a supervisor orchestrating between a code writer and a reviewer. For more info, you may want to read the docs.

What you will learn

  1. Provision the required infrastructure automatically (using Terraform). A GKE Autopilot cluster is used by default.
  2. Install Flowise on the GKE cluster.
  3. Configure Flowise to build and manage LLM flows.
  4. Build a sample AI application (e.g., a chatbot or agent) using Flowise components.
  5. Serve the resulting AI application from the GKE cluster.

Filesystem structure

├── agentflow.json  # Example Flowise agentflow.
└── terraform/      # Terraform config that creates required infrastructure.
    ├── cloudsql.tf
    ├── default_env.tfvars
    ├── main.tf
    ├── network.tf
    ├── outputs.tf
    ├── variables.tf
    ├── bucket.tf
    └── workload_identity.tf

Before you begin

  1. Ensure you have a GCP project with billing enabled, and enable the GKE API.
  2. Ensure you have the following tools installed on your workstation:
     • gcloud CLI
     • kubectl
     • Terraform
     • Helm
     • git

If you previously installed the gcloud CLI, get the latest version by running:

gcloud components update

Ensure that you are signed in using the gcloud CLI tool. Run the following command:

gcloud auth application-default login
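
If you have not yet set a default project for gcloud, you can do so now (replace <PROJECT_ID> with your project ID):

gcloud config set project <PROJECT_ID>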

Infrastructure Setup

Clone the repository

Clone the repository with our guides and cd to the flowise/ directory by running these commands:

git clone https://github.com/ai-on-gke/tutorials-and-examples.git
cd tutorials-and-examples/flowise

Enable Necessary APIs

Enable the APIs required for GKE, Artifact Registry, Cloud Build, and Cloud Storage:

gcloud services enable \
    container.googleapis.com \
    artifactregistry.googleapis.com \
    cloudbuild.googleapis.com \
    storage.googleapis.com
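
To confirm that the APIs were enabled, you can optionally list the enabled services and filter for the ones above:

gcloud services list --enabled | grep -E 'container|artifactregistry|cloudbuild|storage'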

Create cluster and other resources

In this section, we use Terraform to automate the creation of infrastructure resources. For more details on how this is done, refer to the Terraform config in the terraform/ folder. By default, the configuration provisions a GKE Autopilot cluster, but it can be changed to a Standard cluster by setting autopilot_cluster = false, as shown below.
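
For example, to switch to a Standard cluster you could add the following line to default_env.tfvars (or your copy of it); the autopilot_cluster variable name comes from the Terraform config in this repository:

autopilot_cluster = false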

It creates the following resources. For more information, such as resource names and other details, refer to the Terraform config:

  • Service Accounts:

    1. Cluster IAM Service Account (its name is derived from the cluster name, e.g. tf-gke-<cluster name>) – manages permissions for the GKE cluster.
    2. Application’s IAM Service Account (named flowise-tf by default; this can be changed in the Terraform config) – manages permissions for the deployed application to access the LLM models stored in a Cloud Storage bucket.

  • Cloud Storage Bucket – stores data such as LLM models.

  • Artifact Registry – stores container images for the application.

  • Cloud SQL instance – stores Flowise data such as chat messages, conversational context, etc. You can confirm that the data is persisted, for example in Cloud SQL Studio, after completing the tutorial.

  1. Go to the terraform directory:

    cd terraform
    
  2. Specify the following values inside the default_env.tfvars file (or make a separate copy):
    <PROJECT_ID> – replace with your project id (you can find it in the project settings).
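
    For example, assuming the file uses the literal <PROJECT_ID> placeholder, you can substitute your current gcloud project automatically:

    sed -i "s/<PROJECT_ID>/$(gcloud config get-value project)/" default_env.tfvars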

  3. Initialize the Terraform modules:

    terraform init
    
  4. Optionally run the plan command to view an execution plan:

    terraform plan -var-file=default_env.tfvars
    
  5. Execute the plan:

    terraform apply -var-file=default_env.tfvars
    

    And you should see your resources created:

    
    Apply complete! Resources: 25 added, 0 changed, 0 destroyed.
    
    Outputs:
    
    bucket_name = "flowise-tf"
    cloudsql_database_name = "flowise"
    cloudsql_database_secret_name = "db-secret"
    cloudsql_database_user = "flowise"
    cloudsql_instance_ip = "<ip>"
    gke_cluster_location = "us-central1"
    gke_cluster_name = "flowise-tf"
    k8s_service_account_name = "flowise-tf"
    project_id = "<your project_id>"
    
  6. Configure your kubectl context:

    gcloud container clusters get-credentials $(terraform output -raw gke_cluster_name) --region $(terraform output -raw gke_cluster_location) --project $(terraform output -raw project_id)
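
    You can verify that kubectl now points at the new cluster:

    kubectl cluster-info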
    

Deploy Ollama to serve LLMs

  1. Run this command to create a deployment manifest. The command substitutes the required values from the Terraform outputs:

    cat <<EOF > ../ollama-deployment.yml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ollama
    spec:
      selector:
        matchLabels:
          app: ollama
      template:
        metadata:
          labels:
            app: ollama
          annotations:
            gke-gcsfuse/volumes: 'true'
        spec:
          serviceAccountName: $(terraform output -raw k8s_service_account_name)
          nodeSelector:
            cloud.google.com/gke-accelerator: nvidia-l4
          containers:
            - name: ollama
              image: ollama/ollama:latest
              ports:
                - containerPort: 11434
              volumeMounts:
                - name: ollama-data
                  mountPath: /root/.ollama/
              resources:
                limits:
                  nvidia.com/gpu: 1
          volumes:
            - name: ollama-data
              csi:
                driver: gcsfuse.csi.storage.gke.io
                volumeAttributes:
                  bucketName: $(terraform output -raw bucket_name)
                  mountOptions: implicit-dirs,only-dir=ollama
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: ollama
    spec:
      selector:
        app: ollama
      ports:
        - protocol: TCP
          port: 11434
          targetPort: 11434
    EOF
    
  2. Deploy the created manifest:

    kubectl apply -f ../ollama-deployment.yml
    
  3. Wait until Ollama is successfully deployed:

    kubectl rollout status deployment/ollama
    
  4. Pull the llama3.2 model within the Ollama server pod:

    kubectl exec $(kubectl get pod -l app=ollama -o name) -c ollama -- ollama pull llama3.2
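
    Optionally, verify that the model is now available inside the pod:

    kubectl exec $(kubectl get pod -l app=ollama -o name) -c ollama -- ollama list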
    

Flowise Deployment and Configuration

  1. Add the Flowise Helm repository:

    helm repo add cowboysysop https://cowboysysop.github.io/charts/
    
  2. Create a YAML file with values to customize the Flowise Helm chart:

    cat <<EOF > ../values.yml
    externalPostgresql:
      enabled: true
      host: $(terraform output -raw cloudsql_instance_ip)
      port: 5432
      username: $(terraform output -raw cloudsql_database_user)
      existingSecret: $(terraform output -raw cloudsql_database_secret_name)
      existingSecretKeyPassword: password
      database: $(terraform output -raw cloudsql_database_name)
    readinessProbe:
      enabled: true
      initialDelaySeconds: 120
    serviceAccount:
      create: false
      name: $(terraform output -raw k8s_service_account_name)
    EOF
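
    Before installing, you can optionally check that the database secret referenced above exists in the cluster:

    kubectl get secret $(terraform output -raw cloudsql_database_secret_name)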
    
  3. Install the Flowise Helm chart with the values from the file created previously. To learn more about the chart, refer to its page, especially the templates and default values.

    helm install flowise cowboysysop/flowise -f ../values.yml
    
  4. Wait for the deployment to complete:

    kubectl rollout status deployment/flowise
    

    You can have a look at the running pods to verify that everything is deployed:

    kubectl get pods
    

    There should be pods for both Ollama and Flowise, and the output should be similar to:

    NAME                       READY   STATUS    RESTARTS      AGE
    flowise-5c89d977b7-9lnw8   1/1     Running   0             24m
    ollama-87557bbf4-5pk98     2/2     Running   0             30m
    

    You can also verify that the respective services are created:

    kubectl get svc
    

    The output should be similar to:

    NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
    flowise      ClusterIP   34.118.227.187   <none>        3000/TCP    36m
    kubernetes   ClusterIP   34.118.224.1     <none>        443/TCP     72m
    ollama       ClusterIP   34.118.232.55    <none>        11434/TCP   43m
    
  5. Forward the port of the Flowise service to access its web UI:

    kubectl port-forward svc/flowise 3000:3000
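
    If local port 3000 is already in use, you can map a different local port to the service port, for example:

    kubectl port-forward svc/flowise 8080:3000

    and then open the UI at http://localhost:8080 instead.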
    

    In case of errors, try looking at logs:

    kubectl logs -l app.kubernetes.io/name=flowise
    

Trying the multi-agent example

In this example, we create an agentflow that uses LLMs from the Ollama service that we deployed earlier on the GKE cluster alongside Flowise. All nodes of the example agentflow use only locally deployed LLMs served by Ollama.

Load the example agentflow

  1. Open the web UI at http://localhost:3000

  2. Create a new agentflow by clicking the “Add New” button in the “Agentflows” section:

    (screenshot)

  3. Click on the Load Agents button and select our example agentflow from the file tutorials-and-examples/flowise/agentflow.json in the repository. This is essentially the built-in Software Team template from the Flowise marketplace, but it uses the Ollama Chat Model to access locally hosted LLMs.

    (screenshot)

  4. The example agentflow should now be loaded. Give it a name and save it by clicking the save button to start using it:

    (screenshot)

Tip: use the Max Loop Count parameter of the Loop nodes to prevent infinite loops.

Use the Agentflow

The Supervisor node processes the initial prompt and creates a task for the Software Engineer node. When the task is done, the Supervisor passes the result to the Code Reviewer node and repeats the cycle until the code is approved.

  1. Open the chat window …

    (screenshot)

    … and expand it for convenience:

    (screenshot)

  2. Enter the prompt. In our example, we prompt the supervisor to write a snake game.

    When the flow is completed, you can see its visualization:

    (screenshot)

    First, the Supervisor asks the Software Engineer worker to write the code:

    (screenshot)

    The Supervisor then passes the code to the Code Reviewer worker:

    (screenshot)

    After some iterations, the Code Reviewer should approve the code and the Generate Final Answer node prints the result:

    (screenshot)

[Optional] Use as an embedding or API endpoint

The agentflow can be accessed through an API for further automation, or embedded into a webpage. For more info, check the docs.

  1. Click on API Endpoint:

    (screenshot)

  2. Choose one of the options. Use the API endpoint for programmatic integration into existing applications, or use embedding to add the agentflow directly to websites as an interactive chat widget. An example API call is shown below.

    (screenshot)
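
    As a sketch, assuming you chose the API option and the port-forward from earlier is still running, a call to the Flowise prediction endpoint might look like this (replace the <CHATFLOW_ID> placeholder with the ID shown in the API Endpoint dialog):

    curl http://localhost:3000/api/v1/prediction/<CHATFLOW_ID> \
        -H 'Content-Type: application/json' \
        -d '{"question": "Write a snake game in Python"}'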

Cleaning up

  1. Destroy the provisioned infrastructure:

    terraform destroy -var-file=default_env.tfvars
    

Troubleshooting

Some models are unable to work with agents.

If you change the model, be aware that it may not work properly with the agents, and a prompt may simply stop without any descriptive messages from Flowise. In this case, you can look at the Flowise logs:

 kubectl logs -l app.kubernetes.io/name=flowise

or the Ollama logs:

 kubectl logs -l app=ollama

Timeout

By default, GCP assigns a 30-second timeout to the proxy. This causes issues when a response takes longer than the 30-second threshold to return. To fix this, make the following changes:

Note: to set the timeout to 10 minutes (for example), we specify 600 seconds below.

Create a backendconfig.yaml file with the following content:

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: flowise-backendconfig
  namespace: your-namespace
spec:
  timeoutSec: 600

Run the command:

kubectl apply -f backendconfig.yaml

Reference the BackendConfig from the Flowise service by adding the following annotation in the values.yml file:

service:
  annotations:
    cloud.google.com/backend-config: '{"default": "flowise-backendconfig"}'
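
Then re-apply the Helm chart with the updated values (run this from the terraform/ directory, as during installation, or adjust the path to values.yml accordingly):

helm upgrade flowise cowboysysop/flowise -f ../values.yml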

You can also read about this issue in the GCP deployment docs.

