Statement on the incident during epoch 270

What happened?

Our leader schedule predicted a single block for us during this epoch, which we missed due to a possible bug in cardano-node.

Why did it happen?

Due to an unknown bug in cardano-node, the instance running on our block producer got stuck in an abnormal state. It was running and reporting itself as functional, but it was not processing any transactions, and gLiveView showed it as "syncing". For the slot in which we were scheduled to produce a block, the node reported 0 forged and 0 missed blocks. Restarting the block producer brought it back to a working state, but by then it was too late. There was no outage of the underlying systems or network: our OS performance metrics, which a separate monitoring server collects over the network every 15 seconds, show no gaps.

What have we done to ensure this does not happen again?

The missed block could have been prevented if we had been monitoring processed TXs and alerting on a sudden drop in processed TXs on the producer node: a manual intervention (restart) before the scheduled slot would have fixed the problem in time. To make sure this does not happen again, we have set up new, detailed monitoring of the producer node using cardano-node's built-in Prometheus exporter. It collects various block producer metrics and alerts us when processed TXs or relay connections drop below acceptable values. We are confident that, should this bug recur, we will now catch it before it costs us a block.
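For operators who want a similar safeguard, below is a minimal Python sketch of this kind of check, not necessarily the exact setup we run. It scrapes the producer's Prometheus endpoint, compares two consecutive scrapes, and raises an alert if no new TXs were processed or if connected peers fall below a floor. It assumes the legacy cardano-node tracing defaults: metrics served at `http://127.0.0.1:12798/metrics` and the metric names `cardano_node_metrics_txsProcessedNum_int` and `cardano_node_metrics_connectedPeers_int`. Verify both against your own node's output. The thresholds, the five-minute interval and the `monitor` function are hypothetical and should be tuned to your pool.

```python
from __future__ import annotations

import time
import urllib.request

# Assumed default endpoint (legacy tracing, "hasPrometheus": ["127.0.0.1", 12798]).
METRICS_URL = "http://127.0.0.1:12798/metrics"

# Assumed metric names; check them against your node's /metrics output.
TX_METRIC = "cardano_node_metrics_txsProcessedNum_int"
PEERS_METRIC = "cardano_node_metrics_connectedPeers_int"

# Hypothetical thresholds; tune to your pool's normal traffic.
MIN_NEW_TXS = 1   # minimum TXs processed between two scrapes
MIN_PEERS = 2     # minimum connected peers (relays)


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text format into {metric_name: value}.

    Simplified: skips comment lines, drops labels, assumes no spaces in labels.
    """
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name = parts[0].split("{", 1)[0]
        try:
            metrics[name] = float(parts[1])
        except ValueError:
            continue
    return metrics


def check(prev: dict[str, float], curr: dict[str, float]) -> list[str]:
    """Compare two consecutive scrapes and return alert messages (empty if healthy)."""
    missing = [m for m in (TX_METRIC, PEERS_METRIC) if m not in curr]
    if missing:
        return [f"missing metrics: {', '.join(missing)}"]
    alerts: list[str] = []
    if TX_METRIC in prev:
        new_txs = curr[TX_METRIC] - prev[TX_METRIC]
        # A negative delta means the counter reset (node restart); skip that interval.
        if 0 <= new_txs < MIN_NEW_TXS:
            alerts.append(
                f"only {new_txs:.0f} TXs processed since last scrape (minimum {MIN_NEW_TXS})"
            )
    if curr[PEERS_METRIC] < MIN_PEERS:
        alerts.append(
            f"connected peers dropped to {curr[PEERS_METRIC]:.0f} (minimum {MIN_PEERS})"
        )
    return alerts


def scrape(url: str = METRICS_URL) -> dict[str, float]:
    """Fetch and parse the node's metrics endpoint."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


def monitor(interval_s: int = 300) -> None:
    """Hypothetical polling loop; replace print() with your paging/notification hook."""
    prev: dict[str, float] = {}
    while True:
        try:
            curr = scrape()
        except OSError as exc:  # endpoint down or hung also counts as an alert
            print("ALERT: metrics endpoint unreachable:", exc)
        else:
            for msg in check(prev, curr):
                print("ALERT:", msg)
            prev = curr
        time.sleep(interval_s)
```

Comparing deltas between scrapes, rather than checking absolute values, is what catches the failure described above: a node that still answers and reports itself as running but whose processed-TX counter has stopped moving. Treating an unreachable endpoint as an alert covers the case where the node hangs completely.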