PQCA CBOMkit Architecture – Post-Quantum Cryptography Alliance

CBOMkit is a collection of tools built by IBM (now under the Post-Quantum Cryptography Alliance) for generating and managing Cryptographic Bills of Materials (CBOMs).

A CBOM is a structured inventory of the cryptographic assets present in a software system: algorithms, keys, protocols, and their properties. The kit enables organizations to discover what cryptographic mechanisms their code depends on, how they are configured, and whether those configurations meet defined security standards, including readiness for post-quantum cryptography.

Figure 1: CBOMkit Architecture

The system is organized into five interconnected components, each with a distinct role:

Component	Former IBM codename	Role
sonar-cryptography	Hyperion	Cryptographic detection engine (SonarQube plugin)
cbomkit-lib	Hyperion (library layer)	Standalone Java library wrapping the detection engine
cbomkit-action	–	GitHub Action for CI/CD-integrated scanning
cbomkit	Coeus + Themis + Mnemosyne	Full application with REST API, web UI, and compliance evaluation
cbomkit-theia	Theia	Filesystem and container image scanner for cryptographic artifacts (Go CLI)

The first four components form a layered dependency hierarchy. The detection engine (sonar-cryptography) sits at the base and is consumed by cbomkit-lib, which in turn is consumed by both cbomkit-action and cbomkit.

cbomkit-theia operates independently as a complementary tool. While the source-code stack detects cryptographic API usage in code, Theia detects cryptographic assets in deployment artifacts: certificates, keys, secrets, and configuration files found on a filesystem or inside a container image.

Component Architecture

1. Detection Engine: sonar-cryptography

The detection engine is a SonarQube plugin that identifies cryptographic usage in source code. It supports Java, Python, and Go. The engine operates on Abstract Syntax Trees (ASTs) generated by the respective SonarQube language plugins, not on raw source text.

Internal structure:

The plugin is organized into Maven submodules, each responsible for a specific stage of the analysis process:

engine/ – Language-agnostic detection framework. Defines core abstractions such as DetectionStore, DetectionExecutive, and IDetectionRule.
java/, python/, go/ – Language-specific modules. Each provides detection rules, an AST translator, and a check registrar that integrates with the corresponding SonarQube language plugin.
mapper/ – Converts raw detections into a structured semantic model based on the INode hierarchy representing algorithms, keys, and protocols.
enricher/ – Adds derived metadata such as OIDs, default parameters, and algorithm-specific properties not explicitly present in source code.
output/ – Generates a CycloneDX CBOM JSON file from the enriched INode tree.
common/ – Shared utilities, including observer pattern infrastructure for event-driven communication.
sonar-cryptography-plugin/ – Plugin entry point. Registers language extensions and defines the post-job that triggers CBOM generation after analysis.

Figure 2: sonar-cryptography Detection Pipeline

Detection pipeline:

The detection process follows a multi-stage pipeline for each supported language.

AST construction: SonarQube language plugins parse source files and produce typed ASTs with resolved symbols.
Rule application: Detection rules, defined using DetectionRuleBuilder, are applied to the AST. Each rule targets a specific API usage pattern, such as javax.crypto.Cipher#getInstance in Java or Cipher from cryptography.hazmat.primitives.ciphers in Python. Rules define which parameters contain cryptographic values and how they are extracted.
Detection store construction: Matched rules populate a DetectionStore, a tree structure that records the detection location and captured parameters. Nested detections are represented as child stores.
Mapping: A language-specific ITranslator converts entries in the DetectionStore into nodes of the INode semantic model. The model includes types such as Algorithm, Key, Protocol, Mode, Padding, KeyLength, and DigestSize. Each node includes an origin attribute indicating whether it was detected, inferred, or enriched.
Enrichment: The Enricher traverses the INode tree and applies algorithm-specific enrichment. For example, it may add an OID for AES-128-GCM or assign a default mode when none is explicitly defined in source code.
Output generation: CBOMOutputFile converts the enriched INode tree into a CycloneDX v1.6 CBOM. Each cryptographic asset is represented as a component of type cryptographic-asset. Evidence entries record file paths and line numbers. Dependency relationships reflect structural associations, such as a key used with a specific algorithm.
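The stages above can be pictured with a small, self-contained sketch. It is not the plugin's code (which is Java): the Python names Detection, AlgorithmNode, map_detection, enrich, and to_cbom are hypothetical stand-ins for DetectionStore, INode, ITranslator, Enricher, and CBOMOutputFile, and the OID table holds a single illustrative entry. The JSON field names follow the CycloneDX 1.6 schema.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    """Hypothetical stand-in for one DetectionStore entry."""
    file: str
    line: int
    transformation: str   # captured parameter, e.g. "AES/GCM/NoPadding"
    key_bits: int         # captured from a nested (child) detection


@dataclass
class AlgorithmNode:
    """Hypothetical stand-in for an INode describing an algorithm."""
    name: str
    mode: str
    key_bits: int
    file: str
    line: int
    oid: Optional[str] = None


# Illustrative enrichment table with a single entry (AES-128-GCM).
OIDS = {("AES", 128, "GCM"): "2.16.840.1.101.3.4.1.6"}


def map_detection(d: Detection) -> AlgorithmNode:
    """Mapping stage: split the JCA-style transformation string."""
    name, mode, _padding = d.transformation.split("/")
    return AlgorithmNode(name, mode, d.key_bits, d.file, d.line)


def enrich(node: AlgorithmNode) -> AlgorithmNode:
    """Enrichment stage: add an OID when the table knows the algorithm."""
    node.oid = OIDS.get((node.name, node.key_bits, node.mode))
    return node


def to_cbom(nodes) -> dict:
    """Output stage: emit CycloneDX 1.6 cryptographic-asset components."""
    components = []
    for n in nodes:
        crypto = {
            "assetType": "algorithm",
            "algorithmProperties": {
                "mode": n.mode.lower(),
                "parameterSetIdentifier": str(n.key_bits),
            },
        }
        if n.oid:
            crypto["oid"] = n.oid
        components.append({
            "type": "cryptographic-asset",
            "name": f"{n.name}-{n.key_bits}-{n.mode}",
            "cryptoProperties": crypto,
            "evidence": {"occurrences": [{"location": n.file, "line": n.line}]},
        })
    return {"bomFormat": "CycloneDX", "specVersion": "1.6", "components": components}


detections = [Detection("src/main/java/App.java", 42, "AES/GCM/NoPadding", 128)]
cbom = to_cbom([enrich(map_detection(d)) for d in detections])
print(json.dumps(cbom, indent=2))
```

The printed document contains one component named AES-128-GCM whose evidence points back to the (made-up) file and line of the detection.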

Cryptographic libraries covered:

Language	Libraries
Java	Java Cryptography Architecture (JCA), BouncyCastle Lightweight API
Python	pyca/cryptography
Go	Standard library crypto/, golang.org/x/crypto*

SonarQube integration:

The plugin integrates with SonarQube through CryptographyPlugin.java, which implements the Plugin interface. During initialization, it registers rule definitions and check registrars for each supported language.

After the scan completes, OutputFileJob (a PostJob) collects detections from all language-specific Aggregator instances through ScannerManager and writes the final CBOM file. The default output filename is cbom.json. This can be configured using the sonar.cryptoScanner.cbom property.

2. Scanning Library: cbomkit-lib

cbomkit-lib is a Java library that wraps the Sonar Cryptography Plugin and exposes a standalone programmatic API for cryptographic scanning without requiring a running SonarQube installation. Consumers invoke it directly from Java code to scan a local directory and receive a CBOM as output.

Note: The sonar-cryptography plugin is designed to run inside a SonarQube server. Without cbomkit-lib, you would need to deploy a SonarQube instance, install the plugin, configure a project, and run SonarScanner to produce a CBOM. cbomkit-lib removes this requirement by reusing the core detection, translation, enrichment, and serialization modules directly as Java library dependencies. However, it still relies on SonarQube’s language parser libraries (sonar-java, sonar-python, sonar-go) for AST generation. These run as embedded dependencies, not as part of a SonarQube server.

The library provides a two-phase API structured around indexing and scanning.

Figure 3: cbomkit-lib Phases (Indexing & Scanning)

Indexing phase:

IndexingService (abstract) is the base class for language-specific module discovery. Concrete implementations (JavaIndexService, PythonIndexService, GoIndexService) walk a project directory tree, locate language-specific build files (for example, pom.xml, build.gradle, go.mod), and construct a list of ProjectModule records. Each ProjectModule contains the module identifier, its base path, and the list of source files (InputFile objects from the Sonar plugin API) that belong to it.

The indexing step serves two purposes: it decomposes a repository into logical units aligned with the project’s build structure, and it filters files for each language, excluding test directories, generated code, and non-source artifacts.
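The idea behind indexing can be sketched in a few lines of Python. This is an illustration only, not the library's Java implementation: the function name index_java_modules, the set of excluded directory names, and the rule of assigning each file to its nearest enclosing build file are all assumptions made for the example.

```python
import os
from dataclasses import dataclass, field

BUILD_FILES = {"pom.xml", "build.gradle"}                    # Java markers named in the text
EXCLUDED_DIRS = {"test", "tests", "target", "build", ".git"}  # assumed exclusion list


@dataclass
class ProjectModule:
    identifier: str                 # module path relative to the repository root
    base_path: str
    source_files: list = field(default_factory=list)


def index_java_modules(root):
    """Discover modules by build file and attach non-test .java files to them."""
    root = os.path.abspath(root)
    module_dirs, java_files = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED_DIRS)
        if BUILD_FILES.intersection(filenames):
            module_dirs.append(dirpath)
        java_files += [os.path.join(dirpath, f)
                       for f in sorted(filenames) if f.endswith(".java")]

    modules = {d: ProjectModule(os.path.relpath(d, root), d) for d in module_dirs}
    for path in java_files:
        # A file belongs to the deepest module directory that contains it.
        owners = [d for d in module_dirs if os.path.commonpath([d, path]) == d]
        if owners:
            modules[max(owners, key=len)].source_files.append(path)
    return list(modules.values())
```

Calling index_java_modules on a multi-module Maven checkout would return one ProjectModule per pom.xml, with files under src/test left out.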

Scanning phase:

ScannerService (abstract) is the base class for language-specific scanners. Each concrete implementation accepts a list of ProjectModule objects and runs the corresponding detection rules from the Sonar Cryptography Plugin against the source files within those modules. Results are returned as ScanResultDTO, an immutable record containing the resulting CBOM, the number of files and lines scanned, and timing information.

The CBOM class is a record that wraps CycloneDX’s Bom object. It provides utilities for merging multiple CBOM instances (for example, one per language), enriching them with metadata such as git URL, revision, and commit hash, serializing to JSON, and writing to disk.
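As a rough picture of what merging and metadata enrichment involve, the sketch below combines CycloneDX documents represented as plain Python dictionaries. It does not reproduce the CBOM class: merge_cboms is a hypothetical helper, de-duplication by bom-ref is an assumed strategy, and the metadata property names gitUrl, revision, and commit are illustrative.

```python
def merge_cboms(*boms, git_url=None, revision=None, commit=None):
    """Merge CycloneDX dicts, de-duplicating components by bom-ref."""
    merged = {"bomFormat": "CycloneDX", "specVersion": "1.6",
              "components": [], "dependencies": []}
    seen_refs = set()
    deps = {}  # ref -> ordered list of dependsOn targets
    for bom in boms:
        for comp in bom.get("components", []):
            ref = comp.get("bom-ref")
            if ref is not None and ref in seen_refs:
                continue  # same asset already contributed by another scan
            if ref is not None:
                seen_refs.add(ref)
            merged["components"].append(comp)
        for dep in bom.get("dependencies", []):
            targets = deps.setdefault(dep["ref"], [])
            for target in dep.get("dependsOn", []):
                if target not in targets:
                    targets.append(target)
    merged["dependencies"] = [{"ref": r, "dependsOn": t} for r, t in deps.items()]

    # Attach repository metadata as CycloneDX name/value properties.
    props = [{"name": k, "value": v}
             for k, v in (("gitUrl", git_url), ("revision", revision), ("commit", commit))
             if v]
    if props:
        merged["metadata"] = {"properties": props}
    return merged
```

For example, merging a Java result and a Python result that both report the same bom-ref keeps a single copy of that component.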

Java scanning requires additional consideration. Because detecting Java cryptographic API usage often depends on compile-time symbol resolution, the Java scanner works best when the project has been built beforehand, although this is optional. Supplying compiled class files or dependency JARs improves symbol resolution accuracy, and JavaScannerService accepts paths to JAR directories and class directories for this purpose.

Progress reporting:

The library includes an optional progress reporting mechanism via the IProgressDispatcher interface. Implementations receive ProgressMessage records during scanning, enabling real-time feedback to the caller. The message type system (ProgressMessageType enum) includes LABEL, DETECTION, WARNING, ERROR, GITURL, and BRANCH. If the dispatcher detects that the client has disconnected, it raises ClientDisconnected, which the scanner handles as a controlled interruption.
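The dispatcher contract described above can be approximated as follows. This is a Python illustration of the pattern, not the Java interface itself: ListDispatcher, its send method, and the scan loop are hypothetical, while the message types and the ClientDisconnected name come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class ProgressMessageType(Enum):
    LABEL = "LABEL"
    DETECTION = "DETECTION"
    WARNING = "WARNING"
    ERROR = "ERROR"
    GITURL = "GITURL"
    BRANCH = "BRANCH"


@dataclass(frozen=True)
class ProgressMessage:
    type: ProgressMessageType
    message: str


class ClientDisconnected(Exception):
    """Raised by a dispatcher when nobody is listening any more."""


class ListDispatcher:
    """Hypothetical dispatcher that records messages while the client is connected."""

    def __init__(self, is_connected=lambda: True):
        self.is_connected = is_connected
        self.messages = []

    def send(self, message):
        if not self.is_connected():
            raise ClientDisconnected()
        self.messages.append(message)


def scan(files, dispatcher):
    """Report progress per file; stop cleanly if the client goes away."""
    scanned = 0
    try:
        for path in files:
            dispatcher.send(ProgressMessage(ProgressMessageType.LABEL, f"Scanning {path}"))
            scanned += 1
    except ClientDisconnected:
        pass  # controlled interruption: return what was done so far
    return scanned
```

The point of the design is that a disconnect surfaces as an exception at the next progress update, so the scanner needs no separate cancellation channel.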

3. GitHub Action: cbomkit-action

cbomkit-action packages cbomkit-lib into a GitHub Action that developers can include in their CI/CD workflows. It runs as a Docker container built on Red Hat UBI 8 with OpenJDK 21.

4. Full Application: cbomkit

cbomkit is a full-stack application that provides a persistent and queryable store of CBOMs (Mnemosyne), a web interface for visualization (Coeus), and a compliance evaluation engine for policy assessment (Themis). The application supports scanning source code directly and importing externally generated CBOM files.

Figure 4: cbomkit Architecture

Scan lifecycle:

A scan begins when a client sends a POST request to /api/v1/scan (synchronous) or establishes a WebSocket connection to /v1/scan/{clientId} (asynchronous with live progress). The presentation layer dispatches a RequestScanCommand to the CommandBus.

RequestScanCommandHandler creates a ScanAggregate, the root aggregate for a scan operation, and emits either ScanRequestedEvent (for Git URLs) or PurlScanRequestedEvent (for Package URLs, which must first be resolved to a Git URL).

ScanEventHandler listens to these events and issues sub-commands to the ScanProcessManager:

ResolvePurlCommand – Resolves a Package URL to a Git repository URL using GitHub API or deps.dev
CloneGitRepositoryCommand – Clones the repository to a temporary directory using JGit, with optional authentication
IdentifyPackageFolderCommand – Identifies the relevant subdirectory when scanning a specific package
IndexModulesCommand – Executes language-specific IndexingService
ScanCommand – Executes language-specific ScannerService

After each step, ScanProcessManager updates the ScanAggregate by applying domain events through ScanRepository. The aggregate state is reconstructed from its event log. Progress updates are dispatched via WebSocketProgressDispatcher for WebSocket scans or retained in memory for synchronous responses.
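Reconstructing aggregate state from an event log is a standard event-sourcing technique, and a minimal sketch helps make it concrete. The Python below is an assumption-laden illustration: ScanRequestedEvent and ScanFinishedEvent are named in the text, but their fields, the intermediate LanguageScanFinishedEvent, and the in-memory repository are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanRequestedEvent:
    scan_id: str
    git_url: str


@dataclass(frozen=True)
class LanguageScanFinishedEvent:   # hypothetical intermediate event
    scan_id: str
    language: str
    detections: int


@dataclass(frozen=True)
class ScanFinishedEvent:
    scan_id: str


class ScanAggregate:
    """State is never stored directly; it is derived by applying events in order."""

    def __init__(self):
        self.git_url = None
        self.results = {}
        self.finished = False

    def apply(self, event):
        if isinstance(event, ScanRequestedEvent):
            self.git_url = event.git_url
        elif isinstance(event, LanguageScanFinishedEvent):
            self.results[event.language] = event.detections
        elif isinstance(event, ScanFinishedEvent):
            self.finished = True
        return self

    @classmethod
    def replay(cls, events):
        aggregate = cls()
        for event in events:
            aggregate.apply(event)
        return aggregate


class InMemoryScanRepository:
    """Hypothetical repository keeping an append-only event log per scan."""

    def __init__(self):
        self._log = {}

    def append(self, event):
        self._log.setdefault(event.scan_id, []).append(event)

    def load(self, scan_id):
        return ScanAggregate.replay(self._log.get(scan_id, []))
```

Loading a scan replays its log from the first event, so the same sequence of events always yields the same aggregate state.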

When all language scans complete, ScanFinishedEvent is emitted. CBOMProjector handles this event, merges per-language CBOMs using CBOM.merge(), and persists the result as a CBOMReadModel in PostgreSQL.

Figure 5: cbomkit Lifecycle

CBOM storage and retrieval:

The system separates scan state (event-sourced via ScanRepository) from query state (stored in CBOMReadModel). Clients retrieve CBOMs using the query-side API:

GET /api/v1/cbom/{projectIdentifier} – Retrieve a CBOM by project
GET /api/v1/cbom/last/{limit} – Retrieve most recent CBOMs
POST /api/v1/cbom/{projectIdentifier} – Upload an externally generated CBOM
DELETE /api/v1/cbom/{projectIdentifier} – Delete a stored CBOM
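A client for these endpoints can be sketched with the Python standard library. The paths and methods are the ones listed above; everything else is an assumption for illustration, including the base URL, the JSON content type for uploads, and percent-encoding of the project identifier. The functions only build request objects, so nothing is sent until one is passed to urllib.request.urlopen.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8081"  # assumed address of a local cbomkit backend


def _cbom_url(project_identifier):
    # Encode the identifier so characters such as "/" stay inside one path segment.
    return f"{BASE_URL}/api/v1/cbom/{quote(project_identifier, safe='')}"


def get_cbom(project_identifier):
    return Request(_cbom_url(project_identifier), method="GET")


def last_cboms(limit):
    return Request(f"{BASE_URL}/api/v1/cbom/last/{int(limit)}", method="GET")


def upload_cbom(project_identifier, cbom):
    body = json.dumps(cbom).encode("utf-8")
    return Request(_cbom_url(project_identifier), data=body,
                   headers={"Content-Type": "application/json"}, method="POST")


def delete_cbom(project_identifier):
    return Request(_cbom_url(project_identifier), method="DELETE")
```

Against a running instance, urllib.request.urlopen(last_cboms(5)) would issue the second request in the list.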

Compliance evaluation:

The compliance endpoint (/api/v1/compliance/check) evaluates a CBOM against defined cryptographic policies. The built-in BasicQuantumSafeComplianceService checks detected algorithms and keys against a whitelist of approved quantum-safe OIDs and algorithm names. Assets not in the whitelist are marked non-compliant. Assets that cannot be classified are marked as unknown.
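The three outcomes can be illustrated with a short allowlist check. This sketch is not BasicQuantumSafeComplianceService: the allowlist contents are a two-entry example rather than the service's actual list, and treating an asset with neither a name nor an OID as unknown is an assumption about what "cannot be classified" means.

```python
# Illustrative allowlist only; the real service ships its own list.
ALLOWED_NAMES = {"ML-KEM-768", "ML-DSA-65"}
ALLOWED_OIDS = {"2.16.840.1.101.3.4.4.2"}  # NIST OID for ML-KEM-768


def classify(component):
    """Return 'compliant', 'non-compliant', or 'unknown' for one CBOM component."""
    crypto = component.get("cryptoProperties", {})
    name = component.get("name")
    oid = crypto.get("oid")
    if not name and not oid:
        return "unknown"          # nothing to compare against the allowlist
    if (name and name.upper() in ALLOWED_NAMES) or oid in ALLOWED_OIDS:
        return "compliant"
    return "non-compliant"


def check(cbom):
    """Classify every cryptographic asset in a CycloneDX dict, keyed by bom-ref."""
    return {
        component.get("bom-ref", component.get("name", "?")): classify(component)
        for component in cbom.get("components", [])
        if component.get("type") == "cryptographic-asset"
    }
```

Matching on either the name or the OID means an asset detected under a vendor-specific name can still pass if enrichment attached a recognized OID.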

For advanced policy evaluation, the application supports an external regulator service configured via CBOMKIT_REGULATOR_API_BASE and optional integration with Open Policy Agent (OPA) through a sidecar container.

Frontend:

The Vue.js frontend communicates with the backend over HTTP and WebSocket. It provides interfaces to initiate scans, browse stored CBOMs, visualize cryptographic asset hierarchies, and review compliance results. The IBM Carbon Design System is used as the UI component library.

5. Filesystem and Image Scanner: cbomkit-theia

cbomkit-theia is a Go CLI tool that detects cryptographic assets in container images and local directories. Unlike the source-code-focused detection performed by sonar-cryptography, Theia operates on deployment artifacts. It inspects files in a filesystem or container image to identify certificates, keys, secrets, and cryptography-related configuration files. It produces or enriches CycloneDX v1.6 CBOMs.

Theia addresses a limitation in the CBOM pipeline. Source code scanning identifies cryptographic API usage but does not capture runtime cryptographic materials such as deployed TLS certificates, private keys stored on disk, java.security policy files, or OpenSSL configuration that defines cipher suites. Theia detects these artifacts.

Supported data sources:

Theia accepts multiple input types through a unified Filesystem abstraction:

Local directory (plain filesystem)
Local Docker image from daemon, TAR archive, or Dockerfile (built dynamically)
Local OCI image as directory or TAR archive
OCI image from a remote registry
Docker image from DockerHub
Singularity image

For container images, Theia uses Anchore Stereoscope to resolve the image, squash layers into a single virtual filesystem, and expose it through the same Filesystem interface used for local directories.
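The value of a single Filesystem abstraction is that scanners never need to know where files come from. The Python sketch below mirrors that idea with two hypothetical classes named after the Go implementations described later in this article; the method names are Python renderings of WalkDir, Open, Exists, and GetIdentifier, GetConfig is omitted, and the fnmatch-style pattern semantics are an assumption rather than Theia's actual ignore rules.

```python
import fnmatch
import os


class PlainFilesystem:
    """Local directory exposed through a small, source-agnostic interface."""

    def __init__(self, root):
        self.root = os.path.abspath(root)

    def walk_dir(self, fn):
        """Call fn with every file path, relative to the root, using '/' separators."""
        for dirpath, dirnames, filenames in os.walk(self.root):
            dirnames.sort()
            for name in sorted(filenames):
                rel = os.path.relpath(os.path.join(dirpath, name), self.root)
                fn(rel.replace(os.sep, "/"))

    def open(self, path):
        return open(os.path.join(self.root, path), "rb")

    def exists(self, path):
        return os.path.exists(os.path.join(self.root, path))

    def get_identifier(self):
        return self.root


class FilteredFilesystem:
    """Decorator that hides paths matching any ignore pattern."""

    def __init__(self, inner, ignore_patterns):
        self.inner = inner
        self.patterns = list(ignore_patterns)

    def _ignored(self, path):
        return any(fnmatch.fnmatchcase(path, p) for p in self.patterns)

    def walk_dir(self, fn):
        self.inner.walk_dir(lambda p: None if self._ignored(p) else fn(p))

    def open(self, path):
        return self.inner.open(path)

    def exists(self, path):
        return not self._ignored(path) and self.inner.exists(path)

    def get_identifier(self):
        return self.inner.get_identifier()
```

Because FilteredFilesystem wraps any object with the same methods, the same filter could sit in front of a squashed container image just as well as a local directory.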

Architecture:

Theia consists of four packages:

cmd/ – CLI layer built with Cobra. Exposes two subcommands: dir (scan a local directory) and image (scan a container image). Each subcommand builds a dependency injection container using Uber Dig, configures a Filesystem implementation, and invokes the scanner.
provider/ – Data source abstractions. Contains three sub-packages:
provider/filesystem/ – Defines the Filesystem interface (WalkDir, Open, Exists, GetConfig, GetIdentifier) and implementations PlainFilesystem (local directories) and FilteredFilesystem (adds glob-based ignore patterns via .cbomkitignore, configuration, or CLI flags).
provider/docker/ – Provides GetImage and BuildImage functions for resolving container images via Stereoscope and the Docker client. GetSquashedFilesystem returns a Layer implementing the Filesystem interface.
provider/cyclonedx/ – BOM parsing, creation, serialization, and component/dependency merge utilities. Handles CycloneDX JSON encoding and decoding.
scanner/ – Core scanning engine. RunScan orchestrates the pipeline: parse optional input BOM, instantiate configured plugins, execute plugins against the filesystem, attach tool metadata, and write the resulting BOM to stdout. The scanner struct manages ordered plugin execution based on PluginType (append plugins first, then verification plugins, then others).
scanner/plugins/ – Plugin implementations. Each plugin implements the Plugin interface.
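The ordered plugin execution described for the scanner can be sketched as a stable sort on plugin type. The code is Python rather than Go, and the names PluginType.APPEND, VERIFY, and OTHER, the Plugin class, and its update_bom method are hypothetical; only the ordering rule (append plugins, then verification plugins, then others) is taken from the text.

```python
from enum import IntEnum


class PluginType(IntEnum):
    APPEND = 0   # add new components to the BOM
    VERIFY = 1   # cross-check components that already exist
    OTHER = 2


class Plugin:
    """Hypothetical plugin: a name, a type, and a BOM-updating callback."""

    def __init__(self, name, plugin_type, update):
        self.name = name
        self.plugin_type = plugin_type
        self._update = update

    def update_bom(self, fs, bom):
        self._update(fs, bom)


def run_plugins(plugins, fs, bom):
    """Run plugins in type order; sorted() is stable, so ties keep input order."""
    executed = []
    for plugin in sorted(plugins, key=lambda p: p.plugin_type):
        plugin.update_bom(fs, bom)
        executed.append(plugin.name)
    return executed
```

Running verification after append plugins matters because a verification plugin can then see components that were only discovered during the same run.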

Figure 6: cbomkit-theia

Plugins:

Plugin	Purpose
certificates	Walks the filesystem for X.509 certificates, parses each certificate, and adds it to the CBOM with signature algorithm, public key, and key algorithm details.
secrets	Uses Gitleaks to detect secrets and cryptographic key material (private, public, symmetric keys). Detected items are added as CBOM components.
javasecurity	Locates the java.security file, reads jdk.tls.disabledAlgorithms, and cross-references against an input CBOM. Assigns a confidence score (0–1) indicating executability. Requires --bom.
opensslconf	Locates OpenSSL configuration files (for example, openssl.cnf), extracts TLS versions and cipher suites, and adds them as protocol and configuration components.
problematicca	Identifies certificates issued by known problematic or untrusted certificate authorities and flags them in the CBOM.
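To give a feel for the javasecurity row, the sketch below reads jdk.tls.disabledAlgorithms from text in java.security format and lists which components of an input CBOM it names. It is a simplification written for this article, not the plugin's logic: the helper names are hypothetical, matching is by exact algorithm name only (constraints such as "keySize < 1024" are ignored), and no confidence score is computed.

```python
SAMPLE = r"""
# Excerpt in the style of a java.security file
jdk.tls.disabledAlgorithms=SSLv3, TLSv1, TLSv1.1, RC4, DES, \
    MD5withRSA, DH keySize < 1024
"""


def read_property(text, key):
    """Return a property value, joining lines that end with a backslash."""
    logical, buf = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if not buf and (not line or line.startswith("#")):
            continue                      # skip blank lines and comments
        if line.endswith("\\"):
            buf += line[:-1]              # continuation: keep collecting
            continue
        logical.append(buf + line)
        buf = ""
    if buf:
        logical.append(buf)
    for entry in logical:
        name, sep, value = entry.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


def disabled_algorithms(text):
    """Upper-cased algorithm names, dropping any trailing constraint."""
    value = read_property(text, "jdk.tls.disabledAlgorithms") or ""
    return {entry.split()[0].upper() for entry in value.split(",") if entry.strip()}


def flag_disabled(bom, disabled):
    """Names of BOM components that the JVM configuration disables."""
    return [c["name"] for c in bom.get("components", [])
            if c.get("name", "").upper() in disabled]
```

With the sample above, a BOM listing RC4 and AES-128-GCM would have only RC4 flagged, which is the kind of signal a verification plugin can use to judge whether detected code paths are actually usable at runtime.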

Integration with source-code CBOMs:

Theia is designed to operate with sonar-cryptography. A typical workflow is:

Run a source-code scanner (cbomkit-lib, cbomkit-action, or cbomkit) to produce a CBOM describing cryptographic API usage.
Run cbomkit-theia image <image> --bom cbom.json to enrich that CBOM with runtime artifacts from the deployment image.

If a BOM is provided via --bom, Theia parses it, executes all plugins (including verification plugins such as javasecurity), and outputs the enriched CBOM. If no BOM is provided, Theia generates a new BOM containing only filesystem-detected assets.

Data Flow Across the System

The following figure shows how data moves through the components when a scan is triggered from the full application, with all layers working together:

Figure 7: Data Flow across CBOMkit Components

When using Theia, the flow is independent of the source-code pipeline:

Figure 8: Data Flow across cbomkit-theia

Theia’s output can be consumed directly by compliance tools, or it can be uploaded to cbomkit via POST /api/v1/cbom/{projectIdentifier} for storage and compliance evaluation alongside source-code CBOMs.

Contributed By: Shubham Kumar, NgKore Foundation


Shubham Kumar is an open-source contributor with a background in applied cryptography and telecom networks. He is actively involved in several open-source communities and currently serves as a BAC member at the OpenSSL Foundation, a TAC member at the NgKore Foundation, and a TSC member at LFN ONAP. He enjoys exploring new things in open source and contributing back to the ecosystem.