Stress testing as a metric - next steps

Setting clear and relevant metrics for a decentralized product is challenging because it’s difficult or impossible to know what happens on the user’s device. To gain some insight, we are using telemetry, but this has limitations.

After a review with Insights, the preferred method for confirming the functional behavior of a given protocol and implementation under real conditions remains large-scale simulation and stress testing.

Using this stack overview from https://blog.waku.org/2024-06-20-message-reliability/ for terminology:

The Core Protocols API is being stress tested by the Vac-DST team, as per their Waku commitments.

  • Regression testing is performed at every nwaku release.
  • Work on the req-res protocols (store, light push, filter) is still in progress (AFAIK).

The bulk of the 2024 Waku milestones focuses on Status app message reliability, also known as the “messaging API”. These protocols have been implemented in status-go and are being moved to go-waku.

The Vac-QA team has done an excellent job testing those protocols from a higher layer (the Chat protocol) in networks with packet drop and latency.

However, we still need stress testing to confirm the protocols behave as expected at scale.

Stress Testing the Messaging API

The question is, how do we get this done?

There are two strategies:

A. Ensure that the reliability protocols for relay and resource-restricted clients are fully migrated to go-waku, and proceed with stress testing go-waku nodes.

B. Wait until those protocols are migrated to nwaku and used in Status Desktop, and then proceed with the stress testing.

In both cases, this means exposing those protocols via the node’s REST API.

The final goal is for nwaku’s REST API and FFI API to be closely aligned and to include those new protocols, but this is unlikely to be delivered before mid-2025.
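To make this concrete, here is a minimal Go sketch of what a stress-test driver could do once a protocol is reachable over the node’s REST API. It assumes nwaku’s relay publish endpoint (`POST /relay/v1/messages/{pubsubTopic}`) and a local node listening on REST port 8645; the topics in `main` are placeholders, and the new reliability protocols would need equivalent endpoints exposed before they can be driven this way:

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// publish sends one message to a Waku node over its REST relay API.
// Endpoint shape assumed from recent nwaku releases; adjust to the actual API.
func publish(node, pubsubTopic, contentTopic string, payload []byte) error {
	// Relay message body: base64-encoded payload plus content topic.
	body, _ := json.Marshal(map[string]string{
		"payload":      base64.StdEncoding.EncodeToString(payload),
		"contentTopic": contentTopic,
	})
	endpoint := fmt.Sprintf("%s/relay/v1/messages/%s", node, url.PathEscape(pubsubTopic))
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("publish failed: %s", resp.Status)
	}
	return nil
}

func main() {
	// Hypothetical topics for illustration only.
	err := publish("http://127.0.0.1:8645", "/waku/2/default-waku/proto",
		"/stress/1/test/proto", []byte("hello"))
	fmt.Println("publish error:", err)
}
```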

Stress Testing as a Metric

Stress testing is likely the best metric we have to confirm the efficacy of our work and to verify that protocols and implementations behave as promised, so that applications can rely on them.

However, for stress testing to serve as a metric that measures the success of a milestone, the gap between software delivery and the stress-testing run needs to be minimal, so that the project team delivering a milestone gets feedback early enough on whether the metric criteria are fulfilled and can make corrections before moving to production.

Considering the backlog DST is working with, the question is whether this is an achievable task and whether projects can use simulation reports as milestone metrics.

Moreover, if we agree that project/protocol metrics should be partially or mostly driven by simulations, then this should be part of the Vac-DST narrative.

So, before diving deeper into the topic, I would like to take a step back and clarify what we are referring to here when it comes to “stress testing”.

From my point of view, this has two different meanings:

  1. Push the software to its limits, i.e. make a node publish as many messages per second as possible, make it handle as many connections as possible, and so on.
  2. Check the lowest possible conditions under which the software remains usable, i.e. the lowest possible bandwidth, the highest packet drop rate, low hardware resources, and so on.

We are probably interested in both, but I think 1 gives us more important insights than 2. The problem is that it is also much harder to do: at a certain point it becomes hard to know whether the software is failing because you are pushing it too hard, or because the hardware itself cannot handle the load.
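To illustrate meaning 1, here is a minimal Go sketch of a ramp: keep raising the publish rate in steps until the error ratio crosses a threshold, and report the last sustained rate. The threshold, window, and `publish` callback are placeholders for the real call against the node under test, not an existing DST tool:

```go
package main

import (
	"fmt"
	"time"
)

// rampUntilFailure increases the publish rate step by step and returns the
// last rate (msgs/s) at which the error ratio stayed under 1%.
func rampUntilFailure(publish func() error, startRate, step, maxRate int, window time.Duration) int {
	sustained := 0
	for rate := startRate; rate <= maxRate; rate += step {
		interval := time.Second / time.Duration(rate)
		deadline := time.Now().Add(window)
		total, failures := 0, 0
		// Hold each rate for a fixed window and count failed publishes.
		for time.Now().Before(deadline) {
			if err := publish(); err != nil {
				failures++
			}
			total++
			time.Sleep(interval)
		}
		if float64(failures)/float64(total) > 0.01 {
			return sustained // the previous rate was the limit
		}
		sustained = rate
	}
	return sustained
}

func main() {
	// Placeholder publisher that never fails; swap in a real REST/FFI call.
	limit := rampUntilFailure(func() error { return nil }, 10, 10, 50, 2*time.Second)
	fmt.Println("max sustained publish rate (msgs/s):", limit)
}
```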

I think both of them are doable, but in order to achieve the short time gap you mention, we would first need to agree on and establish a “protocol” of actions for points 1 and 2 (if we are interested in both), polish it, iterate over it, and once it is ready, start applying it to every version/release.
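As a sketch of what such a repeatable “protocol” could look like, a declarative scenario list applied unchanged to every release; the field names and values are illustrative, not an existing DST or vaclab format:

```go
package main

import "fmt"

// Scenario pins down one stress-test configuration so the same set can be
// re-run against each release. Illustrative fields only.
type Scenario struct {
	Name          string
	Nodes         int     // network size
	PublishRate   int     // msgs/s per publisher (sense 1: push to the limit)
	MessageBytes  int     // payload size
	PacketLossPct float64 // sense 2: degraded-network knobs
	LatencyMs     int
	BandwidthKbps int
}

func main() {
	perRelease := []Scenario{
		{Name: "relay-throughput", Nodes: 1000, PublishRate: 100, MessageBytes: 1024},
		{Name: "lossy-low-bandwidth", Nodes: 300, PublishRate: 1, MessageBytes: 1024,
			PacketLossPct: 5, LatencyMs: 200, BandwidthKbps: 512},
	}
	for _, s := range perRelease {
		fmt.Printf("would run scenario %q against this release\n", s.Name)
	}
}
```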

Right now I think it is an achievable task with the vaclab, and indeed it can be part of the DST narrative. But in DST we should also discuss resources and time allocation for this: it is something we could consider for next quarter, but personally I don’t see it as viable for this one.


Thanks @Alberto for your input.

This conversation stems from the FURP initiative from the Insights team and the IFT directive for projects to define their metrics per milestone. cc @petty.

It seems (from the Insights team) that running simulations and confirming a protocol’s and software’s ability to handle X (i.e., point 1) is the preferred way to handle performance measurement for decentralized protocols, as the metrics we can get from deployment are limited due to the decentralized nature.
I agree with this view.

Hence, if projects must set metrics, and the best metrics come from simulations, then it should be clearly stated in Vac-DST’s NCT that support to projects for milestone metrics is needed.
And until DST can do so (as you say, it is work, and hence needs to be planned, etc.), project metrics related to protocol performance are most likely limited to real-world measurements coming from a subset of the network (i.e., our fleet nodes).

Cc @ksr