Conformance Monitoring with TLA+
Distributed systems like µONOS can be difficult to design and even harder to implement. When systems don't strictly conform to their intended properties, unpredictable behaviors can emerge. Even with good design and coding practices in place, building confidence in a distributed system can require extensive testing and evaluation.
To aid in the identification of concurrency bugs and help build confidence µONOS services adhere to their specifications, µONOS supports a special form of monitoring designed to detect violations of the system's properties. Conformance monitoring in µONOS uses formal modelling techniques to specify and verify system properties in near-real time.
Deployment
When performance conformance monitoring on a µONOS cluster, users must use Helm to deploy a single monitor pod per TLA+ specification.
Prerequisites
The conformance monitor currently requires Kafka to stream traces to TLC.
To deploy Kafka using Helm, first add the incubator repository to your Helm configuration:
$ helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
Define a values.yaml
file for your Kafka cluster, configuring the topics to be used by the conformance monitor to consume traces and produce alerts:
replicas: 1 zookeeper: replicaCount: 1 topics: - name: traces partitions: 3 replicationFactor: 1 - name: alerts partitions: 1 replicationFactor: 1
Then, install the incubator/kafka
chart with the desired configuration overrides:
$ helm install kafka incubator/kafka --values values.yaml
The successful completion of the kafka-config
pod indicates the Kafka brokers have completed startup and configured topics have been created.
Installation
Once the Kafka cluster has been configured, this chart can be deployed to perform near real-time conformance monitoring on Kafka streams. The monitor uses TLA+ specifications to evaluate traces received from Kafka, and invariants specified in the chart configuration can detect safety violations in the trace stream.
Several artifacts are required to by the chart:
model
- the name of the module to evaluatemodules
- an array of TLA+ module files to mount to the monitor podspec
- the specification to evaluateinit
- the state initialization predicate (required ifspec
is not configured)next
- the next state relation (required ifspec
is not configured)
Additional options can be used to specify invariants and other constraints on the model checker:
invariants
- an array of invariants to check for each traceconstants
- a mapping of constant values to assign to the modelconstraints
- an array of state constraintsproperties
- an array of model properties
$ helm install my-monitor --set modules={Cache.tla,CacheHistory.tla} --set model=Cache.tla --set config.spec=Spec --set config.invariants={TypeOK}
Specifications
The role of the conformance monitor is to track the system state over time and detect violations of safety guarantees by analyzing the system state. To do so, TLA+ specifications are used to provide a formal model for state transitions and specify the invariants to which the system must conform.
TLA+ specifications consist of one or more modules defined in .tla
files. A module defines a set of formulas for evaluating the state of the system. The specification provides a formula describing the initial system state and its transitions:
---------------------------- MODULE MonotonicTrace -------------------------- EXTENDS Naturals, Sequences \* A list of variables in the spec vars == <<...>> \* The init predicate describing the initial state of the system Init == ... \* The next state relation describing possible state transitions Next == ... \* The system specification describes the initial system state and next state relations Spec == Init /\ [][Next]_<<vars>> =============================================================================
Typically, the TLA+ model checker, TLC, computes and evaluates every state that can be reached by the spec according to its initial state and next state relation. Conformance monitoring specs operate on an infinite stream of traces, using the NextTrace
operator to consume and check traces in near-real time:
INSTANCE Traces \* The previous trace version VARIABLE prevVersion \* The current trace version VARIABLE nextVersion \* A list of variables in the spec vars == <<prevVersion, nextVersion>> \* Read a trace record from the stream and update the previous and next versions Read == LET record == NextTrace IN /\ PrintT(record) /\ prevVersion' = nextVersion /\ nextVersion' = record.version \* The init predicate describing the initial state of the system Init == /\ prevVersion = 0 /\ nextVersion = 0 \* The next state relation describing possible state transitions Next == \/ Read \/ UNCHANGED <<vars>>
The final component of a conformance spec is the invariant(s). Invariants are predicates that describe the properties of a well behaved (conforming) system. After each trace is consumed and processed from the traces stream, TLC will evaluate invariants to determine whether the system's state conforms to its safety properties:
\* An invariant verifying that the trace version is monotonically increasing TypeOK == nextVersion # 0 => nextVersion < prevVersion
In the event the invariant is violated by the system traces, the PublishAlert
operator can be used to publish an alert.
INSTANCE Alerts \* An invariant verifying that the trace version is monotonically increasing TypeOK == \/ nextVersion # 0 => nextVersion < prevVersion \/ PublishAlert([msg |-> "Invariant violated", prevVersion |-> prevVersion, nextVersion |-> nextVersion])
With the initial state predicate, the next state relation, and the type invariants, a complete conformance monitoring spec can be compiled:
---------------------------- MODULE MonotonicTrace -------------------------- EXTENDS Naturals, Sequences INSTANCE Traces INSTANCE Alerts \* The previous trace version VARIABLE prevVersion \* The current trace version VARIABLE nextVersion \* A list of variables in the spec vars == <<prevVersion, nextVersion>> \* An invariant verifying that the trace version is monotonically increasing TypeOK == \/ nextVersion # 0 => nextVersion < prevVersion \/ PublishAlert([msg |-> "Invariant violated", prevVersion |-> prevVersion, nextVersion |-> nextVersion]) \* Read a trace record from the traces stream and update the previous and next versions Read == LET record == NextTrace IN /\ PrintT(record) /\ prevVersion' = nextVersion /\ nextVersion' = record.version \* The init predicate describing the initial state of the system Init == /\ prevVersion = 0 /\ nextVersion = 0 \* The next state relation describing possible state transitions Next == \/ Read \/ UNCHANGED <<vars>> \* The system specification describes the initial system state and next state relations Spec == Init /\ [][Next]_<<vars>> =============================================================================
To deploy the specification for conformance monitoring, the chart must be configured to identify the specification Spec
and its invariant TypeOK
:
$ helm install my-monitor --set modules={MonotonicTrace.tla} --set model=MonotonicTrace.tla --set config.spec=Spec --set config.invariants={TypeOK}
The model will be initialized with the prevVersion
and nextVersion
set to 0
. As traces are pushed onto the traces stream, the model checker will Read
the trace records, update the model state, and evaluate the invariant TypeOK
.