Stage 10 — Event-Driven Ingestion via Kafka
How to complete this stage
Enable Kafka-based ingestion so the IA node can consume RDF data from an event stream rather than only from direct API uploads. This stage is optional; complete it if you require event-driven ingestion. Before starting, check that:
- ianode-access is running (Stage 4).
- Secure Agent Graph is running in secure mode (Stage 9), or you are prepared to restart it.
- Kafka is available (local or containerised, depending on your setup).
Open a terminal for Kafka-related commands.
Approach and rationale
Kafka ingestion demonstrates production-style integration, where data arrives asynchronously from other systems.
It allows the IA node to:
- Ingest data from a stream of events.
- Update the graph without direct client uploads.
- Support integration testing that mirrors real deployments.
This stage is optional but recommended for environments that require streaming integration.
10.1 Prepare the metadata directory
Kafka ingestion requires a persistent metadata directory used by the ingestion components. From the Secure Agent Graph project directory:
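For example, assuming the metadata directory is the `databases` directory referenced in the operational notes at the end of this stage (adjust the name if your configuration uses a different location):

```shell
# Create the persistent metadata directory if it does not already exist.
# "databases" is the directory name used by the ingestion components;
# run this from the Secure Agent Graph project directory.
mkdir -p databases
ls -ld databases
```

`mkdir -p` is safe to re-run: it succeeds silently if the directory already exists.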
Expected behaviour
Creating this directory lets Kafka ingestion store and reuse its internal state across restarts. Without it, ingestion may fail or stall.
10.2 Start Kafka
Start Kafka using the approach defined by your environment (for example using Docker Compose if provided by the project or your platform tooling).
Kafka should be running and reachable on the expected bootstrap address (commonly localhost:9092) before continuing.
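If your project does not provide its own Compose file, a minimal single-node Kafka for local development might look like the sketch below. The image tag, ports, and listener settings are illustrative, not prescribed by this project; adapt them to your platform tooling.

```yaml
# Illustrative single-node Kafka (KRaft mode) for local development.
services:
  kafka:
    image: apache/kafka:3.7.0
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      # Advertise localhost so clients on the host can connect
      # (see the operational note on advertised listeners below).
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@localhost:9093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```

With a file like this in place, `docker compose up -d` starts the broker on localhost:9092.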
10.3 Restart Secure Agent Graph with Kafka enabled
Stop Secure Agent Graph (Ctrl+C) and restart it using the Kafka configuration:
cd ~/src/secure-agent-graph
USER_ATTRIBUTES_URL=http://localhost:8091 \
JWKS_URL="http://localhost:9229/${USER_POOL_ID}/.well-known/jwks.json" \
java \
-classpath "sag-server/target/classes:sag-system/target/classes:sag-docker/target/dependency/*" \
uk.gov.dbt.ndtp.secure.agent.graph.SecureAgentGraph \
--config sag-docker/mnt/config/dev-server-kafka.ttl
Expected behaviour
- Starts the IA node in Kafka-enabled mode.
- Listens for RDF messages on the configured Kafka topic.
- Ingests messages into the graph automatically.
10.4 Send RDF messages using the provided tooling
Use the Kafka tooling provided by the project (for example jena-kafka-client and the fk script) to publish RDF messages to the configured topic.
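The exact publishing command depends on the project tooling (the fk script's arguments are project-specific and not reproduced here), but the payload itself is ordinary RDF. A minimal Turtle message might look like the following; the prefix and triples are purely illustrative:

```turtle
@prefix ex: <http://example.org/> .

ex:sensor-42 ex:hasReading  "18.5" ;
             ex:recordedAt  "2024-01-01T00:00:00Z" .
```

Save a payload like this to a file and publish it to the configured topic using the project's tooling.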
Expected behaviour
When messages are published successfully:
- The IA node consumes them.
- The graph is updated.
- The data becomes queryable via SPARQL and GraphQL.
10.5 Verify ingested data is queryable
Fetch an authentication token if required by your secure configuration. Query the graph using SPARQL or GraphQL as demonstrated in Stage 9. Verify that:
- Data published via Kafka is visible.
- Data remains subject to authentication and ABAC filtering.
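As a quick check, a generic SPARQL query along these lines should return the triples you published in 10.4, once run against the secured endpoint with a valid token:

```sparql
# List a sample of triples; the Kafka-published data should appear
# among the results (subject to ABAC filtering for your identity).
SELECT ?s ?p ?o
WHERE {
  ?s ?p ?o .
}
LIMIT 10
```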
Operational notes
Kafka ingestion depends on setting the correct topic and bootstrap configuration in dev-server-kafka.ttl.
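As an illustration only, a Kafka connector definition in a configuration file like dev-server-kafka.ttl typically ties together the topic, bootstrap servers, state file, and target service. The vocabulary and property names below follow the jena-fuseki-kafka module and may differ in the version bundled with this project; treat this as a sketch, not the authoritative configuration:

```turtle
PREFIX fk: <http://jena.apache.org/fuseki/kafka#>

# Illustrative connector definition; property names and values
# must match the jena-fuseki-kafka version actually in use.
[] a fk:Connector ;
   fk:topic             "knowledge" ;
   fk:bootstrapServers  "localhost:9092" ;
   fk:stateFile         "databases/Replay-knowledge.state" ;
   fk:fusekiServiceName "/knowledge" .
```

Note that the state file lives under the `databases` directory created in 10.1, which is why that directory must exist and be writable.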
If ingestion appears stalled, check that:
- The databases directory exists and is writable.
- The topic exists and matches the configuration.
- The node has been restarted after enabling Kafka configuration.
- Kafka is reachable from your host environment.
If running Kafka in Docker, ensure the advertised listener configuration allows connections from your host.
10.6 Checkpoint
At the end of this stage:
- Secure Agent Graph is running with Kafka enabled.
- RDF messages published to Kafka are ingested into the graph.
- Ingested data is queryable.
- Authentication and ABAC filtering still apply.
If ingestion does not work as expected, verify Kafka connectivity and configuration before proceeding.