Duo distills our development docs into context, shifts flaky-test fixes left, and closes its own knowledge gaps — getting better every cycle.
End-to-end tests flake on timing, selectors, and shared state. We retry, quarantine, and eventually fix them — but the fix lives in one merge request and one engineer's head. Duo and our agents never carry that hard-won context into the next review.
How to stabilize a page object or wait correctly lives with reviewers — not where an agent can reach it.
Flakiness usually surfaces after merge, in nightly E2E runs — far from the change that introduced it.
Each fix is one-off. Nothing distills it back into guidance the platform can reuse.

Turn every fix into durable context — then feed it straight back into review.

A scheduled pipeline reads docs.gitlab.com/development, extracts our end-to-end testing conventions, and compiles them into artifacts the AI applies on every change.
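As a rough illustration of the distilling step, the sketch below splits a Markdown doc page on its headings, keeps the sections that mention end-to-end testing, and compiles them into one JSON artifact. Everything named here (KEYWORDS, extract_conventions, compile_artifact, the artifact shape) is a hypothetical stand-in, and fetching the pages is left out; the real pipeline may work quite differently.

```python
import json
import re

# Hypothetical keywords that mark a section as an E2E testing convention.
KEYWORDS = ("end-to-end", "e2e", "flaky", "page object", "wait")


def extract_conventions(markdown: str) -> list[dict]:
    """Split a Markdown page on headings and keep sections that mention a keyword."""
    sections = []
    title, body = None, []
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # A new heading closes the previous section, if there was one.
            if title is not None:
                sections.append((title, "\n".join(body).strip()))
            title, body = match.group(2).strip(), []
        elif title is not None:
            body.append(line)
    if title is not None:
        sections.append((title, "\n".join(body).strip()))

    kept = []
    for section_title, text in sections:
        haystack = f"{section_title}\n{text}".lower()
        if any(keyword in haystack for keyword in KEYWORDS):
            kept.append({"title": section_title, "guidance": text})
    return kept


def compile_artifact(pages: dict[str, str]) -> str:
    """Compile per-page conventions into one JSON artifact, sorted for stable diffs."""
    artifact = {path: extract_conventions(md) for path, md in sorted(pages.items())}
    return json.dumps(artifact, indent=2, sort_keys=True)
```

Keyword matching is the simplest possible filter and will over-select (any section containing "wait" qualifies); it stands in for whatever extraction the scheduled job actually uses.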

Content generated by AI should be seen as a starting point and verified before use.

A second scheduled job classifies merged flaky-test fixes by root cause and checks each recurring pattern against the testing docs — opening a merge request wherever guidance is missing.
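A minimal sketch of the gap-finding step, assuming fixes arrive as short text descriptions and the testing docs as one string. The root-cause buckets mirror the timing, selector, and shared-state causes named earlier; the keyword lists, the function names, and the min_occurrences threshold are illustrative assumptions. Opening the merge request itself is not shown.

```python
from collections import Counter

# Root-cause buckets from the causes named above; the keywords are illustrative.
ROOT_CAUSES = {
    "timing": ("timeout", "sleep", "wait", "race"),
    "selectors": ("selector", "locator", "data-testid", "element not found"),
    "shared state": ("shared state", "leftover", "cleanup", "test order"),
}


def classify(fix_description: str) -> str:
    """Return the first root-cause bucket whose keywords appear in the description."""
    text = fix_description.lower()
    for cause, keywords in ROOT_CAUSES.items():
        if any(keyword in text for keyword in keywords):
            return cause
    return "unclassified"


def find_doc_gaps(fixes: list[str], docs_text: str, min_occurrences: int = 2) -> list[str]:
    """List recurring root causes that the docs never mention."""
    counts = Counter(classify(fix) for fix in fixes)
    docs = docs_text.lower()
    gaps = []
    for cause, seen in counts.items():
        # Only recurring, classified patterns are candidates for new guidance.
        if cause == "unclassified" or seen < min_occurrences:
            continue
        if not any(keyword in docs for keyword in ROOT_CAUSES[cause]):
            gaps.append(cause)
    return sorted(gaps)
```

Each cause returned by find_doc_gaps would be a candidate for a docs merge request. A real classifier would likely need more than substring matching, since "race" also matches "trace", for example.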

Each merged fix becomes documented context, distilled into the next skill and review pass. Knowledge compounds instead of resetting.
Fixes move from nightly E2E failures into the merge request, where they are cheapest to make and easiest to learn from.
Two scheduled jobs do the distilling and the gap-finding. No manual curation backlog, no doc rot.

Pilot on the E2E browser suite, prove the loop closes once end to end, then expand to every flaky-test bucket.
