Slack bot for answering questions from Confluence using Amazon Bedrock
Slack bot that answers questions from your Confluence wiki using Amazon Bedrock Knowledge Base, OpenSearch Serverless, and LiteLLM on Amazon EKS, provisioned with OpenTofu.
This post builds directly on top of Amazon EKS with Open WebUI and AWS Bedrock managed by OpenTofu, which provisions an Amazon EKS cluster with LiteLLM (an OpenAI-compatible proxy in front of AWS Bedrock), Envoy Gateway, cert-manager, ExternalDNS and Karpenter - all driven by OpenTofu. That cluster is the foundation here; this post adds a Retrieval-Augmented Generation (RAG) pipeline on top of it.
The goal is a Slack bot that answers from your own Confluence wiki. An Amazon Bedrock Knowledge Base crawls a single Confluence space, embeds the pages with Amazon Titan and stores the vectors in Amazon OpenSearch Serverless. LiteLLM then retrieves the relevant chunks on every question and feeds them to a Bedrock model, while Collmbo - a small Slack bot - is the chat front-end. Collmbo sends a plain chat request and all the RAG wiring lives in LiteLLM and AWS.
The setup should align with the following criteria:
- An Amazon Bedrock Knowledge Base indexes one configurable Confluence space (the space key is an OpenTofu variable)
- Confluence credentials are rendered by OpenTofu into an AWS Secrets Manager secret - the only credential store the Bedrock connector accepts
- Amazon OpenSearch Serverless is the vector store backing the Knowledge Base (the only store the Confluence connector supports)
- LiteLLM performs RAG server-side through an always-on vector store attached to a model - the bot passes no extra parameters
- Collmbo runs as a single container (
ghcr.io/iwamot/collmbo) deployed by OpenTofu into the existing EKS cluster - Collmbo talks to a dedicated LiteLLM release inside the cluster through the OpenAI-compatible API (
http://litellm-kb.litellm-kb.svc:4000/v1) - no model credentials in the bot - Slack connectivity uses Socket Mode, so the bot needs no public endpoint and no inbound load balancer
Architecture
flowchart TD
User(["fa:fa-user Slack User"])
Slack["fa:fa-slack Slack"]
Confluence["fa:fa-confluence Confluence Cloud"]
User -- "message / @mention" --> Slack
Slack <== "Socket Mode (outbound WebSocket)" ==> CO
CO -- "OpenAI API" --> LL
LL -- "Retrieve" --> KB
KB -- "vectors" --> AOSS
LL -- "Converse (+context)\nSigV4" --> BR
BR --> GR
KB -. "crawl + embed (ingestion job)" .-> Confluence
subgraph AWS["fa:fa-cloud AWS us-east-1"]
BR["fa:fa-brain Amazon Bedrock"]
GR["fa:fa-lock Bedrock Guardrail"]
KB["fa:fa-book Bedrock Knowledge Base"]
AOSS["fa:fa-database OpenSearch Serverless"]
SM["fa:fa-key Secrets Manager\n(Confluence creds)"]
subgraph EKS["fa:fa-dharmachakra Amazon EKS"]
CO["fa:fa-robot Collmbo"]
LL["fa:fa-server LiteLLM (litellm-kb)"]
end
end
KB -. "reads creds" .-> SM
The request flow:
- A user sends a message in Slack (direct message or
@mentionin a channel). - Slack delivers the event to Collmbo over the existing Socket Mode WebSocket.
- Collmbo builds an OpenAI-format chat request and sends it to the in-cluster
litellm-kbservice (http://litellm-kb.litellm-kb.svc:4000/v1), selecting the knowledge-base-attached model (...-kb). - Because that model has a vector store attached, LiteLLM calls the Bedrock Knowledge Base
RetrieveAPI, which searches the Confluence vectors in OpenSearch Serverless and returns the most relevant chunks. - LiteLLM prepends the retrieved chunks to the conversation and calls Amazon Bedrock with SigV4 (via EKS Pod Identity), applying the Bedrock Guardrail configured in the base post.
- The grounded answer travels back through LiteLLM to Collmbo, which posts it to the Slack thread.
Separately, on a schedule (or on demand), the Knowledge Base ingestion job crawls the configured Confluence space, chunks and embeds every page with Titan, and writes the vectors into OpenSearch Serverless - this is what populates the store the Retrieve step reads.
Requirements
This post assumes the cluster from Amazon EKS with Open WebUI and AWS Bedrock managed by OpenTofu is already described - the same OpenTofu working directory (${TMP_DIR}/${CLUSTER_FQDN}), S3 backend, providers (helm, kubectl) and the litellm release are reused. You will need the AWS CLI configured exactly as in that post:
1
2
3
4
# AWS Credentials
export AWS_ACCESS_KEY_ID="xxxxxxxxxxxxxxxxxx"
export AWS_SECRET_ACCESS_KEY="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export AWS_SESSION_TOKEN="xxxxxxxx"
The environment variables (AWS_REGION, CLUSTER_FQDN, TMP_DIR, TF_VAR_cluster_fqdn, TF_VAR_tags, the Google OIDC variables, …) are the same ones exported in the base post and are not repeated here.
The Amazon Titan Text Embeddings V2 model must be enabled in the Bedrock console for the target region (one-time, per account) - the Knowledge Base uses it to embed Confluence pages.
You also need a Confluence Cloud site (a free *.atlassian.net instance works), at least one space with a few pages, and an Atlassian API token for basic authentication. Map them - together with the “space key” to index - to the TF_VAR_* names the OpenTofu code expects (in CI these come from secrets):
1
2
3
4
export TF_VAR_confluence_url="${MY_CONFLUENCE_URL:-https://mylabsdev.atlassian.net}"
export TF_VAR_confluence_username="${MY_CONFLUENCE_EMAIL:-petr.ruzicka@gmail.com}"
export TF_VAR_confluence_api_token="${MY_CONFLUENCE_API_TOKEN:-${MY_ATLASSIAN_PERSONAL_TOKEN:-confluence-api-token-placeholder}}"
export TF_VAR_confluence_space_key="${MY_CONFLUENCE_SPACE_KEY:-myspace}"
Install the required tools:
1
mise use opentofu@1.12.1 aws@2.35.2 kubectl@1.36.1
Create a Slack App
Creating the app, installing it to a workspace, and copying the Bot User OAuth Token (xoxb-...) are the same steps described in Amazon Bedrock AgentCore Slack Bot deployed with OpenTofu under Create a Slack App - follow that section, with two Collmbo-specific differences:
- Use a manifest When creating the app, choose From an app manifest (instead of From scratch) and paste Collmbo’s
manifest.yml. It already enables Socket Mode and sets the required bot scopes. - Generate an App-Level Token Socket Mode needs an extra token: go to Settings > Basic Information > App-Level Tokens, choose Generate Token and Scopes, add the
connections:writescope, and copy the App-Level Token (xapp-1-...).
Slack permissions and events
The manifest is the source of truth for the bot’s permissions, so creating the app From an app manifest is strongly recommended. Collmbo’s manifest.yml sets exactly the following.
Bot token scopes (oauth_config.scopes.bot):
| Scope | Why Collmbo needs it |
|---|---|
channels:history | Read messages in public channels it is invited to |
groups:history | Read messages in private channels |
im:history | Read direct messages |
mpim:history | Read group direct messages |
im:write | Open direct-message conversations |
chat:write | Post replies |
chat:write.public | Post in public channels without being a member |
files:read | Read uploaded images / PDFs for multimodal input |
files:write | Upload files back to Slack |
users:read | Look up the sender’s locale (the set_locale middleware) |
Subscribed bot events (settings.event_subscriptions.bot_events):
app_home_openedmessage.channelsmessage.groupsmessage.immessage.mpim
Map secrets to the TF_VAR_* names Collmbo’s OpenTofu code expects:
1
2
export TF_VAR_slack_app_token="${MY_SLACK_APP_TOKEN:-xapp-placeholder}"
export TF_VAR_slack_bot_token="${MY_SLACK_BOT_TOKEN:-xoxb-placeholder}"
OpenTofu Code
All resources below are added to the same OpenTofu working directory used by the base post (${TMP_DIR}/${CLUSTER_FQDN}), so they share its state and providers and can reference its resources directly (the EKS module, the Bedrock guardrail aws_bedrock_guardrail.ai_safety, and the cluster add-ons).
Knowledge Base and Collmbo variables
Add the Slack token variables plus the Confluence connection variables. They are all populated from the TF_VAR_* environment variables exported above:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
tee "${TMP_DIR}/${CLUSTER_FQDN}/collmbo-variables.tf" << \EOF
variable "slack_app_token" {
description = "Slack App-Level Token (xapp-1-...) used for Socket Mode"
type = string
sensitive = true
}
variable "slack_bot_token" {
description = "Slack Bot User OAuth Token (xoxb-...)"
type = string
sensitive = true
}
variable "confluence_url" {
description = "Confluence Cloud base URL (must end in .atlassian.net)"
type = string
}
variable "confluence_username" {
description = "Atlassian account email used for basic authentication"
type = string
}
variable "confluence_api_token" {
description = "Atlassian API token used in place of the password"
type = string
sensitive = true
}
variable "confluence_space_key" {
description = "Key of the single Confluence space to index (e.g. DOCS)"
type = string
}
EOF
Confluence credentials in Secrets Manager
The Bedrock Confluence connector reads its credentials only from AWS Secrets Manager (Parameter Store is not accepted). For basic authentication the secret must contain username (the account email), password (the API token) and hostUrl. OpenTofu renders it from the variables above:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
tee "${TMP_DIR}/${CLUSTER_FQDN}/kb-confluence-secret.tf" << \EOF
resource "aws_secretsmanager_secret" "confluence" {
name = "${local.cluster_name}-confluence"
description = "Confluence basic-auth credentials for the Bedrock Knowledge Base"
recovery_window_in_days = 0
}
resource "aws_secretsmanager_secret_version" "confluence" {
secret_id = aws_secretsmanager_secret.confluence.id
secret_string = jsonencode({
username = var.confluence_username
password = var.confluence_api_token
hostUrl = var.confluence_url
})
}
EOF
OpenSearch Serverless vector store
The Confluence connector is only supported with an Amazon OpenSearch Serverless vector store (S3 Vectors is not accepted for managed connectors), so the Knowledge Base stores its embeddings in an OpenSearch Serverless collection of type VECTORSEARCH.
The collection needs three policies before it can be created or used: an encryption policy (required before creation), a network policy, and a data access policy granting both the OpenTofu caller (to create the index) and the Knowledge Base role (to read/write vectors):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
tee "${TMP_DIR}/${CLUSTER_FQDN}/kb-opensearch.tf" << \EOF
locals {
aoss_collection_name = "${local.cluster_name}-confluence"
# Field names / dimensions shared by the index and the Knowledge Base mapping.
aoss_index_name = "bedrock-knowledge-base-index"
aoss_vector_field = "bedrock-knowledge-base-vector"
aoss_text_field = "AMAZON_BEDROCK_TEXT_CHUNK"
aoss_metadata_field = "AMAZON_BEDROCK_METADATA"
aoss_vector_dimension = 1024
# Convert assumed-role STS ARN to IAM role ARN for the AOSS principal:
# arn:aws:sts::<acct>:assumed-role/<role>/<session>
# -> arn:aws:iam::<acct>:role/<role>. Roles with paths (for example
# AWSReservedSSO_*) may need a manual principal.
aoss_caller_arn = replace(
replace(data.aws_caller_identity.current.arn, "/:sts::(\\d+):assumed-role/([^/]+)/.*/", ":iam::$1:role/$2"),
"/^(arn:aws:iam::\\d+:role/[^/]+)/.*/", "$1"
)
}
resource "aws_opensearchserverless_security_policy" "encryption" {
name = local.aoss_collection_name
type = "encryption"
policy = jsonencode({
Rules = [{ Resource = ["collection/${local.aoss_collection_name}"], ResourceType = "collection" }]
AWSOwnedKey = true
})
}
resource "aws_opensearchserverless_security_policy" "network" {
name = local.aoss_collection_name
type = "network"
policy = jsonencode([{
Rules = [
{ Resource = ["collection/${local.aoss_collection_name}"], ResourceType = "collection" },
{ Resource = ["collection/${local.aoss_collection_name}"], ResourceType = "dashboard" },
]
AllowFromPublic = true
}])
}
resource "aws_opensearchserverless_collection" "confluence" {
name = local.aoss_collection_name
type = "VECTORSEARCH"
standby_replicas = "DISABLED"
depends_on = [aws_opensearchserverless_security_policy.encryption]
}
resource "aws_opensearchserverless_access_policy" "data" {
name = local.aoss_collection_name
type = "data"
policy = jsonencode([{
Rules = [
{ Resource = ["collection/${local.aoss_collection_name}"], Permission = ["aoss:*"], ResourceType = "collection" },
{ Resource = ["index/${local.aoss_collection_name}/*"], Permission = ["aoss:*"], ResourceType = "index" },
]
Principal = [
local.aoss_caller_arn,
aws_iam_role.bedrock_kb.arn,
]
}])
}
EOF
OpenSearch index (created out of band)
Bedrock expects the vector index to already exist in the collection, but there is no Terraform, CloudFormation or CDK resource that creates an OpenSearch Serverless index - only the AWS CLI can.
A terraform_data resource runs aws opensearchserverless create-index with the kNN schema and is re-run whenever the collection is replaced:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
tee "${TMP_DIR}/${CLUSTER_FQDN}/kb-index.tf" << \EOF
locals {
aoss_index_schema = jsonencode({
settings = { index = { knn = true } }
mappings = {
properties = {
(local.aoss_vector_field) = {
type = "knn_vector"
dimension = local.aoss_vector_dimension
method = { name = "hnsw", engine = "faiss", space_type = "l2" }
}
(local.aoss_text_field) = { type = "text" }
(local.aoss_metadata_field) = { type = "text", index = false }
}
}
})
}
resource "terraform_data" "aoss_index" {
triggers_replace = [aws_opensearchserverless_collection.confluence.id]
provisioner "local-exec" {
interpreter = ["/bin/bash", "-c"]
command = <<-CMD
set -euo pipefail
ID=${aws_opensearchserverless_collection.confluence.id}
NAME=${local.aoss_index_name}
# Retry create-index (the data access policy is eventually consistent) and
# consider it done once get-index can read it back. This covers propagation
# delay, an already-existing index, and the index becoming queryable.
for i in $(seq 1 20); do
aws opensearchserverless create-index --id "$ID" --index-name "$NAME" \
--index-schema ${jsonencode(local.aoss_index_schema)} 2> /dev/null || true
if aws opensearchserverless get-index --id "$ID" --index-name "$NAME" > /dev/null 2>&1; then
echo "index ready"
exit 0
fi
echo "waiting for index (attempt $i)..."
sleep 15
done
echo "index did not become ready in time" >&2
exit 1
CMD
}
depends_on = [aws_opensearchserverless_access_policy.data]
}
EOF
Bedrock Knowledge Base and Confluence data source
The Knowledge Base ties everything together: an IAM role Bedrock assumes (with permission to read the Secrets Manager secret, embed with Titan, and access the OpenSearch collection), the vector configuration, and a Confluence data source restricted to the single space via a Space inclusion filter on the space key:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
tee "${TMP_DIR}/${CLUSTER_FQDN}/kb.tf" << \EOF
data "aws_iam_policy_document" "bedrock_kb_assume" {
statement {
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["bedrock.amazonaws.com"]
}
condition {
test = "StringEquals"
variable = "aws:SourceAccount"
values = [data.aws_caller_identity.current.account_id]
}
}
}
data "aws_iam_policy_document" "bedrock_kb" {
statement {
sid = "ReadConfluenceSecret"
actions = ["secretsmanager:GetSecretValue"]
resources = [aws_secretsmanager_secret.confluence.arn]
}
statement {
sid = "InvokeEmbeddingModel"
actions = ["bedrock:InvokeModel"]
resources = ["arn:aws:bedrock:*::foundation-model/amazon.titan-embed-text-v2:0"]
}
statement {
sid = "AccessOpenSearch"
actions = ["aoss:APIAccessAll"]
resources = [aws_opensearchserverless_collection.confluence.arn]
}
}
resource "aws_iam_role" "bedrock_kb" {
name = "${local.cluster_name}-bedrock-kb"
assume_role_policy = data.aws_iam_policy_document.bedrock_kb_assume.json
}
resource "aws_iam_role_policy" "bedrock_kb" {
name = "bedrock-kb"
role = aws_iam_role.bedrock_kb.id
policy = data.aws_iam_policy_document.bedrock_kb.json
}
resource "aws_bedrockagent_knowledge_base" "confluence" {
name = "${local.cluster_name}-confluence"
role_arn = aws_iam_role.bedrock_kb.arn
knowledge_base_configuration {
type = "VECTOR"
vector_knowledge_base_configuration {
embedding_model_arn = "arn:aws:bedrock:${data.aws_region.current.region}::foundation-model/amazon.titan-embed-text-v2:0"
}
}
storage_configuration {
type = "OPENSEARCH_SERVERLESS"
opensearch_serverless_configuration {
collection_arn = aws_opensearchserverless_collection.confluence.arn
vector_index_name = local.aoss_index_name
field_mapping {
vector_field = local.aoss_vector_field
text_field = local.aoss_text_field
metadata_field = local.aoss_metadata_field
}
}
}
depends_on = [terraform_data.aoss_index]
}
resource "aws_bedrockagent_data_source" "confluence" {
name = "confluence-${var.confluence_space_key}"
knowledge_base_id = aws_bedrockagent_knowledge_base.confluence.id
# RETAIN avoids Bedrock DELETE_UNSUCCESSFUL during `tofu destroy`: collection
# teardown removes vectors anyway, so explicit per-vector purge is unnecessary.
data_deletion_policy = "RETAIN"
data_source_configuration {
type = "CONFLUENCE"
confluence_configuration {
source_configuration {
host_url = var.confluence_url
host_type = "SAAS"
auth_type = "BASIC"
credentials_secret_arn = aws_secretsmanager_secret.confluence.arn
}
crawler_configuration {
filter_configuration {
type = "PATTERN"
pattern_object_filter {
filters {
object_type = "Space"
inclusion_filters = ["^${var.confluence_space_key}$"]
}
}
}
}
}
}
}
# Start a Bedrock ingestion job when the data source is created/replaced.
# Terraform has no StartIngestionJob resource, so terraform_data runs the AWS CLI.
resource "terraform_data" "confluence_ingestion" {
triggers_replace = [aws_bedrockagent_data_source.confluence.data_source_id]
provisioner "local-exec" {
interpreter = ["/bin/bash", "-c"]
command = <<-CMD
set -euo pipefail
aws bedrock-agent start-ingestion-job \
--knowledge-base-id ${aws_bedrockagent_knowledge_base.confluence.id} \
--data-source-id ${aws_bedrockagent_data_source.confluence.data_source_id} \
--query 'ingestionJob.{id:ingestionJobId,status:status}'
CMD
}
depends_on = [aws_bedrockagent_data_source.confluence]
}
output "knowledge_base_id" {
description = "Bedrock Knowledge Base ID"
value = aws_bedrockagent_knowledge_base.confluence.id
}
output "data_source_id" {
description = "Confluence data source ID"
value = aws_bedrockagent_data_source.confluence.data_source_id
}
EOF
Dedicated LiteLLM release for RAG
The base post owns the litellm Helm release and is left untouched. Rather than mutating that release at runtime, this post deploys a second, dedicated LiteLLM release (litellm-kb) whose proxy_config declares the Knowledge Base wiring natively in the chart values: a vector_store_registry entry for the Bedrock Knowledge Base and an always-on ...-kb model that references it through vector_store_ids. Every request to that model runs retrieve → augment → generate server-side.
Because this
litellm-kbrelease is backed by a database, LiteLLM reads avector_store_idfrom theLiteLLM_ManagedVectorStorestable, notvector_store_registry(BerriAI/litellm#25947): an absent ID silently returns no context. A small idempotentJobtherefore registers the Knowledge Base via the admin API, shipped throughextraResourcesas apost-install,post-upgradehook.
It is a self-contained copy of the base proxy (own namespace, Pod Identity, PostgreSQL and master key), so the only thing it shares with the base post is the Bedrock guardrail and Knowledge Base it points at. The dedicated pod role grants the Bedrock invoke/guardrail permissions plus bedrock:Retrieve for the Knowledge Base:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
tee "${TMP_DIR}/${CLUSTER_FQDN}/litellm-kb.tf" << \EOF
# Master key for the dedicated LiteLLM-KB release (independent from the base one).
resource "random_password" "litellm_kb_master_key" {
length = 32
special = false
}
# IAM policy for the litellm-kb pod: Bedrock invoke with guardrail + KB Retrieve.
data "aws_iam_policy_document" "litellm_kb" {
statement {
sid = "BedrockInvoke"
actions = [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream",
"bedrock:Converse",
"bedrock:ConverseStream",
]
resources = [
"arn:aws:bedrock:*::foundation-model/*",
"arn:aws:bedrock:*:*:inference-profile/*",
]
condition {
test = "StringEquals"
variable = "bedrock:GuardrailIdentifier"
values = [aws_bedrock_guardrail.ai_safety.guardrail_arn]
}
}
statement {
sid = "BedrockApplyGuardrail"
actions = ["bedrock:ApplyGuardrail"]
resources = [aws_bedrock_guardrail.ai_safety.guardrail_arn]
}
statement {
sid = "BedrockListAndGet"
actions = [
"bedrock:ListFoundationModels",
"bedrock:GetFoundationModel",
"bedrock:ListInferenceProfiles",
"bedrock:GetInferenceProfile",
]
resources = ["*"]
}
statement {
sid = "BedrockRetrieve"
actions = ["bedrock:Retrieve"]
resources = [aws_bedrockagent_knowledge_base.confluence.arn]
}
}
module "litellm_kb_pod_identity" {
source = "terraform-aws-modules/eks-pod-identity/aws"
# renovate: datasource=terraform-module depName=terraform-aws-modules/eks-pod-identity/aws
version = "2.8.1"
name = "${local.cluster_name}-litellm-kb"
attach_custom_policy = true
source_policy_documents = [data.aws_iam_policy_document.litellm_kb.json]
associations = {
main = {
cluster_name = module.eks.cluster_name
namespace = "litellm-kb"
service_account = "litellm-kb"
}
}
}
resource "helm_release" "litellm_kb" {
# renovate: datasource=docker depName=docker.litellm.ai/berriai/litellm-helm
version = "1.89.2"
name = "litellm-kb"
chart = "oci://docker.litellm.ai/berriai/litellm-helm"
namespace = "litellm-kb"
create_namespace = true
wait = true
values = [<<-YAML
replicaCount: 1
image:
repository: ghcr.io/berriai/litellm-database
pullPolicy: Always
resources:
requests:
memory: 1Gi
masterkey: sk-${random_password.litellm_kb_master_key.result}
serviceAccount:
create: true
name: litellm-kb
service:
port: 4000
db:
deployStandalone: true
postgresql:
image:
tag: latest
auth:
password: ${random_password.litellm_kb_master_key.result}
postgres-password: ${random_password.litellm_kb_master_key.result}
disableSchemaUpdate: false
migrationJob:
enabled: false
proxy_config:
model_list:
# Always-on RAG model: every call queries the Bedrock Knowledge Base
# (Retrieve) and prepends the context before invoking Bedrock.
- model_name: us.anthropic.claude-haiku-4-5-20251001-v1:0-kb
litellm_params:
model: bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0
aws_region_name: ${data.aws_region.current.region}
vector_store_ids: ["${aws_bedrockagent_knowledge_base.confluence.id}"]
# Claude on Bedrock rejects temperature+top_p together.
# Collmbo sends temperature, so drop top_p.
additional_drop_params: ["top_p"]
guardrailConfig:
guardrailIdentifier: ${aws_bedrock_guardrail.ai_safety.guardrail_arn}
guardrailVersion: "DRAFT"
trace: "disabled"
vector_store_registry:
- vector_store_name: confluence-knowledge-base
litellm_params:
vector_store_id: ${aws_bedrockagent_knowledge_base.confluence.id}
custom_llm_provider: bedrock
vector_store_description: Confluence pages indexed by Amazon Bedrock
litellm_settings:
drop_params: true
general_settings:
store_model_in_db: true
store_prompts_in_spend_logs: true
# With a database attached, LiteLLM reads vector stores from the
# LiteLLM_ManagedVectorStores table, not "vector_store_registry"
# an unregistered KB ID falls back to the OpenAI provider and silently
# loses RAG context. This idempotent post-install,post-upgrade hook Job
# writes the DB row via the admin API.
extraResources:
- apiVersion: batch/v1
kind: Job
metadata:
name: litellm-kb-vector-store-register
namespace: litellm-kb
annotations:
# Run after the release each install/upgrade, once the proxy exists,
# and self-delete on success so the next upgrade re-runs it.
helm.sh/hook: post-install,post-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
app.kubernetes.io/name: litellm-kb-vector-store-register
spec:
# The script already waits for proxy readiness, so retries here only
# cover genuine registration errors; keep the default backoff so real
# failures surface instead of being retried many times.
backoffLimit: 6
ttlSecondsAfterFinished: 600
template:
metadata:
labels:
app.kubernetes.io/name: litellm-kb-vector-store-register
spec:
restartPolicy: OnFailure
securityContext:
runAsNonRoot: true
runAsUser: 65534
seccompProfile:
type: RuntimeDefault
containers:
- name: register
# renovate: datasource=docker depName=curlimages/curl
image: curlimages/curl:8.18.0
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
env:
- name: MASTER_KEY
valueFrom:
secretKeyRef:
name: litellm-kb-masterkey
key: masterkey
- name: KB_ID
value: "${aws_bedrockagent_knowledge_base.confluence.id}"
- name: BASE_URL
value: http://litellm-kb.litellm-kb.svc:4000
command:
- /bin/sh
- -c
- |
set -eu
echo "Waiting for LiteLLM proxy to be ready..."
until curl -sf -m 5 "$${BASE_URL}/health/readiness" > /dev/null; do
sleep 5
done
# Register the Knowledge Base as a bedrock vector store.
# /vector_store/new returns 200 on success and an "already
# exists" error on re-runs, so the POST response alone is
# enough to be idempotent - no separate existence check.
echo "Registering vector store $${KB_ID} (provider: bedrock)..."
RESP="$(curl -s -m 30 -w '\n%%{http_code}' "$${BASE_URL}/vector_store/new" \
-H "Authorization: Bearer $${MASTER_KEY}" \
-H "Content-Type: application/json" \
-d "{\"vector_store_id\":\"$${KB_ID}\",\"custom_llm_provider\":\"bedrock\",\"vector_store_name\":\"confluence-knowledge-base\",\"vector_store_description\":\"Confluence pages indexed by Amazon Bedrock\"}")"
CODE="$(printf '%s' "$${RESP}" | tail -n1)"
echo "Response: $${RESP}"
if [ "$${CODE}" = "200" ]; then
echo "Registered successfully."
elif printf '%s' "$${RESP}" | grep -q "already exists"; then
echo "Already exists - treating as success."
else
echo "Registration failed." >&2
exit 1
fi
YAML
]
depends_on = [
kubectl_manifest.nodepool_default,
module.litellm_kb_pod_identity,
helm_release.cert_manager,
aws_bedrockagent_knowledge_base.confluence,
]
}
# HTTPRoute exposes the litellm-kb API through the Envoy Gateway at
# litellm-kb.${var.cluster_fqdn} (the base post owns the plain "litellm" host).
resource "kubectl_manifest" "litellm_kb_httproute" {
yaml_body = <<-YAML
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: litellm-kb
namespace: litellm-kb
spec:
parentRefs:
- name: eg
namespace: envoy-gateway-system
sectionName: https
hostnames:
- litellm-kb.${var.cluster_fqdn}
rules:
- backendRefs:
- name: litellm-kb
port: 4000
YAML
depends_on = [
helm_release.litellm_kb,
kubectl_manifest.gateway,
]
}
EOF
Collmbo Deployment
Collmbo is published only as a container image (there is no Helm chart), so it is deployed with a plain Kubernetes Deployment through the alekc/kubectl provider. The key wiring is in the environment variables:
LLM_MODELis set to the knowledge-base-attached model (openai/us.anthropic.claude-haiku-4-5-...-kb). Theopenai/prefix routes the call through LiteLLM’s OpenAI-compatible path; the-kbsuffix selects the model that performs RAG server-side. This single value is the only Collmbo-side change needed for Confluence answers.OPENAI_API_BASEpoints at the in-clusterlitellm-kbService.OPENAI_API_KEYuses thelitellm-kbmaster key (random_password.litellm_kb_master_key) so no extra secret is invented.SLACK_FORMATTING_ENABLEDlets Collmbo rewrite the model’s inline emphasis into Slackmrkdwn(**bold**->*bold*,*italic*->_italic_) before posting, so replies render with Slack formatting rather than raw Markdown.
The image also ships a default config/mcp.yml that enables the public AWS Knowledge MCP server. Collmbo loads those tools at startup and passes them to the model on every request, and their AWS-centric descriptions nudge the bot into answering as an AWS assistant regardless of the question. A ConfigMap with an empty servers: [] list is mounted over that file to disable MCP - this also keeps the tool list empty so it never competes with Knowledge Base retrieval.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
tee "${TMP_DIR}/${CLUSTER_FQDN}/collmbo.tf" << \EOF
# Dedicated namespace for the bot.
resource "kubectl_manifest" "collmbo_namespace" {
yaml_body = <<-YAML
apiVersion: v1
kind: Namespace
metadata:
name: collmbo
YAML
}
# The image's default /app/config/mcp.yml enables the AWS Knowledge MCP server,
# whose tools bias every reply toward AWS. Mount an empty list to disable MCP.
resource "kubectl_manifest" "collmbo_mcp_config" {
yaml_body = <<-YAML
apiVersion: v1
kind: ConfigMap
metadata:
name: collmbo-mcp
namespace: collmbo
data:
mcp.yml: |
servers: []
YAML
depends_on = [kubectl_manifest.collmbo_namespace]
}
# Secret holding the Slack Socket Mode tokens and the litellm-kb master key
# reused as the OpenAI-compatible API key. Rendered from OpenTofu variables.
resource "kubectl_manifest" "collmbo_secret" {
yaml_body = <<-YAML
apiVersion: v1
kind: Secret
metadata:
name: collmbo
namespace: collmbo
type: Opaque
stringData:
SLACK_APP_TOKEN: ${var.slack_app_token}
SLACK_BOT_TOKEN: ${var.slack_bot_token}
OPENAI_API_KEY: sk-${random_password.litellm_kb_master_key.result}
YAML
sensitive_fields = ["stringData"]
depends_on = [kubectl_manifest.collmbo_namespace]
}
# Collmbo Deployment. The container makes only outbound connections (Slack
# Socket Mode + LiteLLM), so it exposes no ports and needs no Service.
resource "kubectl_manifest" "collmbo_deployment" {
yaml_body = <<-YAML
apiVersion: apps/v1
kind: Deployment
metadata:
name: collmbo
namespace: collmbo
labels:
app.kubernetes.io/name: collmbo
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: collmbo
template:
metadata:
labels:
app.kubernetes.io/name: collmbo
spec:
securityContext:
runAsNonRoot: true
seccompProfile:
type: RuntimeDefault
containers:
- name: collmbo
# renovate: datasource=docker depName=ghcr.io/iwamot/collmbo
image: ghcr.io/iwamot/collmbo:11.0.16
env:
# Route through LiteLLM's OpenAI-compatible endpoint to the
# knowledge-base-attached model so every answer is grounded in
# Confluence (RAG happens inside LiteLLM, not in the bot).
- name: LLM_MODEL
value: openai/us.anthropic.claude-haiku-4-5-20251001-v1:0-kb
- name: OPENAI_API_BASE
value: http://litellm-kb.litellm-kb.svc:4000/v1
- name: SLACK_FORMATTING_ENABLED
value: "true"
envFrom:
- secretRef:
name: collmbo
resources:
requests:
cpu: 50m
memory: 256Mi
limits:
memory: 512Mi
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
volumeMounts:
- name: tmp
mountPath: /tmp
# Disable the bundled AWS Knowledge MCP server (empty server list)
- name: mcp-config
mountPath: /app/config/mcp.yml
subPath: mcp.yml
readOnly: true
volumes:
- name: tmp
emptyDir: {}
- name: mcp-config
configMap:
name: collmbo-mcp
YAML
depends_on = [
kubectl_manifest.collmbo_secret,
kubectl_manifest.collmbo_mcp_config,
helm_release.litellm_kb,
]
}
# Restrict Collmbo egress: cluster DNS, the litellm-kb Service, and HTTPS (Slack).
resource "kubectl_manifest" "collmbo_networkpolicy" {
yaml_body = <<-YAML
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: collmbo
namespace: collmbo
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: collmbo
policyTypes:
- Ingress
- Egress
ingress: []
egress:
# DNS resolution
- to:
- namespaceSelector: {}
ports:
- protocol: UDP
port: 53
- protocol: TCP
port: 53
# In-cluster litellm-kb
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: litellm-kb
ports:
- protocol: TCP
port: 4000
# Outbound HTTPS to Slack (Socket Mode)
- ports:
- protocol: TCP
port: 443
YAML
depends_on = [kubectl_manifest.collmbo_namespace]
}
EOF
OpenTofu Code - apply
Initialise the OpenTofu working directory and apply. This is the same idempotent apply used by the base post - re-running it simply adds the Knowledge Base, OpenSearch Serverless collection, Confluence data source, the litellm-kb release and Collmbo to the existing cluster:
1
2
3
4
5
6
if aws s3api head-bucket --bucket "${CLUSTER_FQDN}" 2> /dev/null; then
tofu -chdir="${TMP_DIR}/${CLUSTER_FQDN}" init
if [[ ! ${MY_TASK:-} =~ delete: ]]; then
tofu -chdir="${TMP_DIR}/${CLUSTER_FQDN}" apply -auto-approve
fi
fi
Test the integration
Invite the bot to a channel and ask it something that is documented “only” in your Confluence space:
1
2
/invite @Slack Bot
@Slack Bot What is the recommended way to secure new Kubernetes clusters?
Collmbo replies in channels, threads, and DMs. Because LLM_MODEL points at the ...-kb model, LiteLLM retrieves the relevant Confluence chunks from the Knowledge Base and the Bedrock model answers from them - so the reply reflects your wiki, not the model’s general knowledge.
The ingestion job is started asynchronously by the apply (it is not waited on), so give it a few minutes to finish before testing - until the first job reaches
COMPLETEthe Knowledge Base is empty and answers fall back to the model’s general knowledge. Track progress with thelist-ingestion-jobsCLI command (or the Bedrock console).
Clean-up
Collmbo, the dedicated litellm-kb release, the Knowledge Base, the OpenSearch Serverless collection and the Confluence secret all live inside the base cluster’s OpenTofu state, so the standard clean-up from Amazon EKS with Open WebUI and AWS Bedrock managed by OpenTofu removes them together with the rest of the cluster - there is nothing extra to delete.
The full teardown (tofu destroy, S3 state removal, Karpenter EC2 cleanup, …) is performed by the base post’s clean-up steps, which recreate both OpenTofu files and destroy the whole working directory at once:
1
2
3
4
5
6
export TF_VAR_slack_app_token="anything"
export TF_VAR_slack_bot_token="anything"
export TF_VAR_confluence_url="https://example.atlassian.net"
export TF_VAR_confluence_username="anything"
export TF_VAR_confluence_api_token="anything"
export TF_VAR_confluence_space_key="DOCS"
Enjoy … 😉

