Chat Protocol Testing Roadmap

Status Quo

status-cli-tests

When ownership of the Status chat protocols moved to the Waku team, the first initiative we planned was to increase our testing capabilities to help with software quality.

The aim was to improve test coverage of the chat protocol implementation by complementing the existing status-go unit tests and library tests with “black box” or “API” testing.
Status-QA was already performing UI level testing (one layer above), and Vac-QA was already performing API testing on Waku (one layer below).

The result is status-cli-tests, which tests chat protocol behaviour in various network environments and under other process conditions (e.g. suspension). It validates the functional behaviour of the chat protocol implementation over Waku.

You can find those tests here [1] and results there [2].
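
To give a flavour of what “black box” testing means here, below is a minimal sketch in the shape of a Python test. The node handles and method names are illustrative, not the actual status-cli-tests API:

```python
from typing import Protocol


class ChatNode(Protocol):
    """Stand-in for a black-box handle on a running status-go node.
    Both the class and its methods are illustrative, not the real API."""
    public_key: str

    def send_message(self, recipient_pubkey: str, text: str) -> None: ...
    def wait_for_message(self, text: str, timeout: float) -> bool: ...


def test_one_to_one_delivery(alice: ChatNode, bob: ChatNode) -> None:
    # Drive two independent nodes through their public API only,
    # and assert on externally observable behaviour.
    alice.send_message(bob.public_key, "hello")
    assert bob.wait_for_message("hello", timeout=30.0)
```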

Status PR Dogfooding

As the Waku team took ownership of the Waku integration in status-go, we needed to move fast and transferred a lot of our development effort to this domain. Moving fast usually means breaking things, which was not an option.

Hence, we put a special dogfooding process in place [3] [4], where a developer making a change in go-waku/status-go had to:

  1. Create a PR and branch in status-go for the change
  2. Create a temporary PR in status-desktop that uses the status-go branch to test
  3. Same in status-mobile
  4. Add this to the Waku dogfooding backlog
  5. Every Thursday, in two meetings to cover different timezones, the Waku team proceeded with the dogfooding: they used the resulting binaries to test the new features/changes together, analysed the issues encountered and the logs, and reported them back to the developer [5]

This process does add a lot of boilerplate. The dogfooding sessions have since been extended to cover any changes in Waku, giving team members an opportunity to share and test their latest work.
While fewer Status-related changes have been made within the Waku team lately, I am looking forward to the dogfooding of nwaku-based Status Desktop builds [6].

I intend to keep the current process in place until the initiatives below are completed.

nwaku benchmarks

The first task of the Vac-DST team was to benchmark nwaku in large-scale simulations.
While the 10k-node simulation isn't yet stable [7], a lot came out of this work [8], including nwaku release-to-release non-regression performance testing [9] and deep dives into specific protocols such as discv5 [10].

Initiatives

There are a few reasons why we are not stopping here.

From a whole-stack point of view, we have a few big changes coming. And like every refactoring, your refactoring is only as good as your regression testing :slight_smile:
Only with strong test coverage can we have the confidence to proceed with bold changes.

The upcoming changes are:

  • Replacing go-waku with nwaku [11] [12]. This means swapping the entire network stack, so both functional behaviour and performance need to be validated.
  • Extracting a private chat library: this is yet to be defined and planned (hoping to start in 2025 H2). The steps are: understand and specify the private chat protocols [13], add more tests (see the initiatives below), define an API, implement it, and move the code.

Increase functional test coverage of private chats

I define private chat protocols as one-to-one chats and private group chats. At this point in time, this excludes Communities, user settings back-up, device pairing and syncing, and peer sync.

The first step is to complement the existing specifications [14] to build a better documented understanding of those protocols and their usage of Waku. This is tracked with the Specify private chat protocol deliverable [15].

Specifications will enable us to reason about the volume and frequency of messages being sent, allowing us to:

  • cap message volume [16] at a rate limit that enables scalability, using RLN to enforce this rate limit.
  • define the scalability of private chats and make statements such as “assuming a user sends N msgs per day to Y contacts, then we can scale to 1 million users on N shards by applying a rate limit of M msgs per 10 min via RLN” (see the back-of-the-envelope sketch after this list).
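
The arithmetic behind such statements is straightforward. Below is a minimal back-of-the-envelope sketch in Python; every number in it is an illustrative assumption, not a measured value or an agreed target:

```python
# Back-of-the-envelope scalability estimate for private chats.
# All inputs are illustrative assumptions, not measured values.
users = 1_000_000           # assumed target user base
msgs_per_user_per_day = 50  # assumed 1:1 + group messages sent per user
shards = 8                  # assumed number of Waku shards for private chats

msgs_per_day_per_shard = users * msgs_per_user_per_day / shards
msgs_per_second_per_shard = msgs_per_day_per_shard / 86_400
print(f"{msgs_per_second_per_shard:.0f} msg/s per shard")  # ~72 msg/s

# A per-user RLN rate limit per 10-minute epoch that stays well above
# typical usage while capping spam (again, an illustrative value):
rate_limit_per_10_min = 100
```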

Thorough specifications also enable the Vac-QA team to complement the existing status-cli-tests suite with new functional tests. By having clear expectations of behaviour, we can better test said behaviour.

Simulate mobile environments [17]

One of the challenges of running chat protocols in a mobile environment is the variety of events that can affect the network and the process. This includes, but is not limited to: network switches, network outages, IP changes, the process being killed or suspended, packet loss, low bandwidth, high latency, IPv6-only networks, data saver mode, battery saving mode, VPN usage, the app running in the background, etc.

status-cli-tests [1] included some of those scenarios from inception (e.g. packet loss); others were added after bugs were found by Status-QA (e.g. an IP change due to a network switch).
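
To illustrate, this is roughly how such a condition can be injected on a Linux test node. A minimal sketch using tc/netem via Python; the real suite's tooling may differ:

```python
import subprocess


def add_packet_loss(interface: str, loss_pct: float) -> None:
    """Inject packet loss on `interface` using Linux tc/netem.
    Requires root; typically run inside the test node's container."""
    subprocess.run(
        ["tc", "qdisc", "add", "dev", interface, "root",
         "netem", "loss", f"{loss_pct}%"],
        check=True,
    )


def clear_network_conditions(interface: str) -> None:
    """Remove the netem qdisc and restore normal networking."""
    subprocess.run(
        ["tc", "qdisc", "del", "dev", interface, "root"],
        check=True,
    )


# Example: exercise a 1:1 chat exchange while eth0 drops 5% of packets.
# add_packet_loss("eth0", 5.0)
# ... send messages, assert delivery/retries ...
# clear_network_conditions("eth0")
```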

This initiative is about putting mobile dev experts (Status Mobile dev and QA) and protocol test analysts (Vac-QA) in the same (virtual) room to understand what can happen to an app running on Android and iOS, and simulate those scenarios on the chat protocol implementations.

This will allow us to be more proactive about software quality on mobile. In addition to improving chat protocol testing, it will increase our confidence that nwaku can support all those scenarios when embedded in the Status Mobile app.

A future step would be to include those scenarios directly in the nwaku testing suite, so any issue or regression can be found even earlier in the development cycle.

Define and Implement Baseline Benchmarks for Chat protocols [18]

We do not intend to increase test coverage or benchmarking for go-waku. Go-waku development is halted; only bug fixes are pushed, and they are scarce (one bug fix in Status 2.33). There is limited value in benchmarking or further testing software that does not change.
Our development efforts are focused on nwaku and chat protocols implementation in status-go.

While our focus on chat protocols is currently more about understanding and documenting than modifying, now is a good time to define benchmarks, so that we have a baseline for when we get back to changing code or protocols.

Moreover, with the adoption of FURPS in the organization [19] [20], baseline benchmarks are needed to verify the Rs (Reliability) and Ps (Performance).

Hence, we are defining a number of Chat protocol benchmarks [21] for our friends at Vac-DST to run.

Those benchmarks will be useful to make statements such as “we are reducing bandwidth usage by X”, and to ensure that the switch to nwaku does not degrade software performance.
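
As an example of what a baseline benchmark could look like, here is a sketch measuring end-to-end 1:1 delivery latency. The `sender`/`receiver` handles reuse the illustrative node API from the earlier sketch; none of this is the actual benchmark code:

```python
import statistics
import time


def one_to_one_latency(sender, receiver, rounds: int = 100) -> dict:
    """Measure end-to-end delivery latency for 1:1 messages.
    `sender` and `receiver` are hypothetical black-box node handles."""
    samples = []
    for i in range(rounds):
        text = f"bench-{i}"
        start = time.monotonic()
        sender.send_message(receiver.public_key, text)
        receiver.wait_for_message(text, timeout=30.0)
        samples.append(time.monotonic() - start)
    samples.sort()
    return {
        "p50_s": statistics.median(samples),
        "p95_s": samples[int(0.95 * len(samples)) - 1],
        "max_s": samples[-1],
    }


# Run the same benchmark against a go-waku-based and an nwaku-based build
# and compare distributions release-to-release to catch regressions.
```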

Checking those benchmarks, as well as the status-cli-tests results [1], can then be added to the Status apps and status-go release processes.

Conclusion

A layered approach to software (Waku vs chat protocols vs application) not only allows for more maintainable code, but also makes it possible to test individual pieces, both from the inside (unit tests/library tests) and from the outside (black box/API testing).

While the focus right now is not on adding more functionality to the chat protocols, we are planning major architectural changes, as well as working towards scalability for private chats.

Having comprehensive test suites and baseline benchmarks will help ensure that no regressions, functional or performance-related, are introduced when rolling out those changes.

References
