MCP Servers running on Kubernetes

In the previous post, Build secure and cheap Amazon EKS Auto Mode, I used cert-manager to obtain a wildcard certificate for the Ingress. This post explores running various MCP servers in Kubernetes to power a self-hosted web chat application, similar to ChatGPT, for data queries.

MCP Architecture

This post will guide you through the following steps:

  • ToolHive Installation: Setting up ToolHive, a secure manager for MCP servers in Kubernetes.
  • MCP Server Deployment: Deploying fetch and osv MCP servers.
  • LibreChat Installation: Installing and configuring LibreChat, a self-hosted web chat application.
  • vLLM Installation: Setting up vLLM, a high-throughput inference engine for Large Language Models.
  • Open WebUI Installation: Setting up Open WebUI, a user-friendly interface for chat interactions.

By the end of this tutorial, you’ll have a fully functional chat application powered by MCP servers and local LLM inference running on your EKS cluster.

Requirements

You will need the following environment variables for the steps below. Replace the placeholder values with your own:

export AWS_REGION="${AWS_REGION:-us-east-1}"
export CLUSTER_FQDN="${CLUSTER_FQDN:-k01.k8s.mylabs.dev}"
export CLUSTER_NAME="${CLUSTER_FQDN%%.*}"
export MY_EMAIL="petr.ruzicka@gmail.com"
export TMP_DIR="${TMP_DIR:-${PWD}}"
export KUBECONFIG="${KUBECONFIG:-${TMP_DIR}/${CLUSTER_FQDN}/kubeconfig-${CLUSTER_NAME}.conf}"
export TAGS="${TAGS:-Owner=${MY_EMAIL},Environment=dev,Cluster=${CLUSTER_FQDN}}"
mkdir -pv "${TMP_DIR}/${CLUSTER_FQDN}"
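
Before continuing, verify that the kubeconfig from the previous post works and that the cluster is reachable (a quick sanity check; it assumes the EKS cluster from the previous post is already running):

kubectl get nodes -o wide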

Install ToolHive

ToolHive is an open-source, lightweight, and secure manager for MCP (Model Context Protocol) servers that simplifies deploying and operating them in Kubernetes environments.

ToolHive

Install the toolhive-operator-crds and toolhive-operator Helm charts:

# renovate: datasource=github-tags depName=stacklok/toolhive extractVersion=^toolhive-operator-crds-(?<version>.*)$
TOOLHIVE_OPERATOR_CRDS_HELM_CHART_VERSION="0.0.11"
helm upgrade --install --version="${TOOLHIVE_OPERATOR_CRDS_HELM_CHART_VERSION}" toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds
# renovate: datasource=github-tags depName=stacklok/toolhive extractVersion=^toolhive-operator-(?<version>.*)$
TOOLHIVE_OPERATOR_HELM_CHART_VERSION="0.1.8"
helm upgrade --install --version="${TOOLHIVE_OPERATOR_HELM_CHART_VERSION}" --namespace toolhive-system --create-namespace toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator
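
Before deploying any MCP servers, confirm that the operator is running. This is a minimal sanity check; the deployment name toolhive-operator is an assumption based on the Helm release name:

# Wait for the operator deployment to become ready, then list its pods
kubectl -n toolhive-system rollout status deployment/toolhive-operator --timeout=5m
kubectl -n toolhive-system get pods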

Deploy MCP Servers

Deploy the fetch MCP server using the example manifest from the ToolHive repository:

# renovate: datasource=github-tags depName=stacklok/toolhive
TOOLHIVE_VERSION="0.2.0"
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v${TOOLHIVE_VERSION}/examples/operator/mcp-servers/mcpserver_fetch.yaml

Create the OSV MCP server, which provides tools for querying the OSV (Open Source Vulnerabilities) database:

tee "${TMP_DIR}/${CLUSTER_FQDN}/k8s-toolhive-mcpserver-osv.yml" << EOF | kubectl apply -f -
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: osv
  namespace: toolhive-system
spec:
  image: ghcr.io/stackloklabs/osv-mcp/server
  transport: streamable-http
  port: 8080
  permissionProfile:
    type: builtin
    name: network
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 50m
      memory: 64Mi
EOF
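
You can verify that both MCPServer resources are running and that ToolHive created the proxy Services (mcp-fetch-proxy and mcp-osv-proxy) referenced later in the LibreChat configuration; the mcpservers resource name assumes the plural form defined by the CRDs installed above:

# List the MCPServer custom resources and the pods/Services the operator created for them
kubectl -n toolhive-system get mcpservers
kubectl -n toolhive-system get pods,svc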

Enabling Karpenter to Provision amd64 Node Pools

vLLM requires NVIDIA GPUs and amd64-based instances. To allow Karpenter to provision suitable nodes, create a new GPU-capable NodeClass and two amd64 NodePools (one with GPUs for vLLM, one general-purpose) as shown below:

tee "${TMP_DIR}/${CLUSTER_FQDN}/k8s-karpenter-nodepool-amd64.yml" << EOF | kubectl apply -f -
apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  name: my-default-gpu
spec:
$(kubectl get nodeclasses default -o yaml | yq '.spec | pick(["role", "securityGroupSelectorTerms", "subnetSelectorTerms"])' | sed 's/\(.*\)/  \1/')
  ephemeralStorage:
    size: 40Gi
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: my-default-amd64-gpu
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: my-default-gpu
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["${AWS_REGION}a"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["g4dn.xlarge"] # g4dn.xlarge: NVIDIA T4 GPU, 4 vCPUs, 16 GiB RAM, x86_64 architecture
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
  limits:
    cpu: 8
    memory: 32Gi
    nvidia.com/gpu: 2
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: my-default-amd64
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: my-default
      requirements:
        - key: eks.amazonaws.com/instance-category
          operator: In
          values: ["t"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["${AWS_REGION}a"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  limits:
    cpu: 8
    memory: 32Gi
EOF
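
After applying the manifests, confirm that the new NodeClass and NodePools were accepted. Once GPU workloads are scheduled, you can also watch the NodeClaims Karpenter creates (a quick check, assuming the NodeClaim resource is available in EKS Auto Mode):

# List the node provisioning resources and watch new nodes being claimed
kubectl get nodeclasses,nodepools
kubectl get nodeclaims -w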

Install vLLM

vLLM is a high-throughput and memory-efficient inference engine for Large Language Models (LLMs). It provides fast and scalable LLM serving with features like continuous batching, PagedAttention, and support for various model architectures.

vLLM

Install the vllm Helm chart and modify its default values:

# renovate: datasource=helm depName=vllm registryUrl=https://vllm-project.github.io/production-stack
VLLM_HELM_CHART_VERSION="0.1.5"

helm repo add vllm https://vllm-project.github.io/production-stack
cat > "${TMP_DIR}/${CLUSTER_FQDN}/helm_values-vllm.yml" << EOF
servingEngineSpec:
  runtimeClassName: ""
  modelSpec:
    - name: tinyllama-1-1b-chat-v1-0
      annotations:
        model: tinyllama-1-1b-chat-v1-0
      podAnnotations:
        model: tinyllama-1-1b-chat-v1-0
      repository: vllm/vllm-openai
      tag: latest
      modelURL: TinyLlama/TinyLlama-1.1B-Chat-v1.0
      replicaCount: 1
      requestCPU: 2
      requestMemory: 8Gi
      requestGPU: 1
      limitCPU: 8
      limitMemory: 32Gi
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values: ["amd64"]
routerSpec:
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      cpu: 2
      memory: 4Gi
  nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
EOF
helm upgrade --install --version "${VLLM_HELM_CHART_VERSION}" --namespace vllm --create-namespace --values "${TMP_DIR}/${CLUSTER_FQDN}/helm_values-vllm.yml" vllm vllm/vllm-stack

Install LibreChat

LibreChat is an open-source, self-hosted web chat application designed as an enhanced alternative to ChatGPT. It supports multiple AI providers (including OpenAI, Azure, Google, and more), offers a user-friendly interface, conversation management, plugin support, and advanced features like prompt templates and file uploads.

LibreChat

Create the librechat namespace and a secret containing the required credential environment variables:

kubectl create namespace librechat
(
  set +x
  kubectl create secret generic --namespace librechat librechat-credentials-env \
    --from-literal=CREDS_KEY="$(openssl rand -hex 32)" \
    --from-literal=CREDS_IV="$(openssl rand -hex 16)" \
    --from-literal=JWT_SECRET="$(openssl rand -hex 32)" \
    --from-literal=JWT_REFRESH_SECRET="$(openssl rand -hex 32)"
)
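
Optionally confirm that the secret exists and contains all four keys before installing the chart:

# Show the secret's keys and value sizes without printing the values themselves
kubectl -n librechat describe secret librechat-credentials-env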

Install the librechat Helm chart and modify its default values:

# renovate: datasource=helm depName=librechat registryUrl=https://charts.blue-atlas.de
LIBRECHAT_HELM_CHART_VERSION="1.8.10"

helm repo add librechat https://charts.blue-atlas.de
cat > "${TMP_DIR}/${CLUSTER_FQDN}/helm_values-librechat.yml" << EOF
librechat:
  # https://www.librechat.ai/docs/configuration/dotenv
  configEnv:
    ALLOW_EMAIL_LOGIN: "true"
    ALLOW_REGISTRATION: "true"
    ENDPOINTS: agents,custom
    existingSecretName: librechat-credentials-env
  # https://github.com/danny-avila/LibreChat/blob/main/librechat.example.yaml
  configYamlContent: |
    version: 1.2.1
    cache: true
    endpoints:
      custom:
        - name: vLLM
          apiKey: vllm
          baseURL: http://vllm-router-service.vllm.svc.cluster.local/v1
          models:
            default: ['TinyLlama/TinyLlama-1.1B-Chat-v1.0']
            fetch: true
    mcpServers:
      fetch:
        type: streamable-http
        url: http://mcp-fetch-proxy.toolhive-system.svc.cluster.local:8080/mcp
      osv:
        type: streamable-http
        url: http://mcp-osv-proxy.toolhive-system.svc.cluster.local:8080/mcp
  imageVolume:
    enabled: false
ingress:
  annotations:
    gethomepage.dev/enabled: "true"
    gethomepage.dev/description: LibreChat is an open-source, self-hosted web chat application designed as an enhanced alternative to ChatGPT
    gethomepage.dev/group: Apps
    gethomepage.dev/icon: https://raw.githubusercontent.com/danny-avila/LibreChat/8f20fb28e549949b05e8b164d8a504bc14c0951a/client/public/assets/logo.svg
    gethomepage.dev/name: LibreChat
    nginx.ingress.kubernetes.io/auth-url: https://oauth2-proxy.${CLUSTER_FQDN}/oauth2/auth
    nginx.ingress.kubernetes.io/auth-signin: https://oauth2-proxy.${CLUSTER_FQDN}/oauth2/start?rd=\$scheme://\$host\$request_uri
  hosts:
    - host: librechat.${CLUSTER_FQDN}
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls:
    - hosts:
        - librechat.${CLUSTER_FQDN}
# https://github.com/bitnami/charts/blob/main/bitnami/mongodb/values.yaml
mongodb:
  nodeSelector:
    kubernetes.io/arch: amd64
meilisearch:
  enabled: false
EOF
helm upgrade --install --version "${LIBRECHAT_HELM_CHART_VERSION}" --namespace librechat --values "${TMP_DIR}/${CLUSTER_FQDN}/helm_values-librechat.yml" librechat librechat/librechat

LibreChat

Install Open WebUI

Open WebUI is a user-friendly, self-hosted web interface for chat interactions that can connect to OpenAI-compatible APIs such as the vLLM router deployed above.

Open WebUI

Install the open-webui Helm chart and modify its default values:

# renovate: datasource=helm depName=open-webui registryUrl=https://helm.openwebui.com
OPEN_WEBUI_HELM_CHART_VERSION="6.29.0"

helm repo add open-webui https://helm.openwebui.com/
cat > "${TMP_DIR}/${CLUSTER_FQDN}/helm_values-open-webui.yml" << EOF
ollama:
  enabled: false
pipelines:
  enabled: false
ingress:
  enabled: true
  annotations:
    gethomepage.dev/enabled: "true"
    gethomepage.dev/description: Open WebUI is a user friendly web interface for chat interactions.
    gethomepage.dev/group: Apps
    gethomepage.dev/icon: https://raw.githubusercontent.com/open-webui/open-webui/14a6c1f4963892c163821765efcc10c5c4578454/static/static/favicon.svg
    gethomepage.dev/name: Open WebUI
    nginx.ingress.kubernetes.io/auth-url: https://oauth2-proxy.${CLUSTER_FQDN}/oauth2/auth
    nginx.ingress.kubernetes.io/auth-signin: https://oauth2-proxy.${CLUSTER_FQDN}/oauth2/start?rd=\$scheme://\$host\$request_uri
  host: open-webui.${CLUSTER_FQDN}
extraEnvVars:
  - name: ADMIN_EMAIL
    value: ${MY_EMAIL}
  - name: ENV
    value: dev
  - name: WEBUI_URL
    value: https://open-webui.${CLUSTER_FQDN}
  - name: OPENAI_API_BASE_URL
    value: http://vllm-router-service.vllm.svc.cluster.local/v1
  - name: DEFAULT_MODELS
    value: TinyLlama/TinyLlama-1.1B-Chat-v1.0
  - name: ENABLE_EVALUATION_ARENA_MODELS
    value: "False"
  - name: ENABLE_CODE_INTERPRETER
    value: "False"
EOF
helm upgrade --install --version "${OPEN_WEBUI_HELM_CHART_VERSION}" --namespace open-webui --create-namespace --values "${TMP_DIR}/${CLUSTER_FQDN}/helm_values-open-webui.yml" open-webui open-webui/open-webui

Open WebUI

Enjoy … 😉

This post is licensed under CC BY 4.0 by the author.