GitLab
Self-learning systems · gitlab-org #21742 · @pedropombeiro

A self-learning loop for flaky tests

Duo distills our development docs into context, shifts flaky-test fixes left, and closes its own knowledge gaps — getting better every cycle.

© GitLab Inc.
The problem

Today, every flaky test starts from zero

End-to-end tests flake on timing, selectors, and shared state. We retry, quarantine, and eventually fix them — but the fix lives in one merge request and one engineer's head. Duo and our agents never carry that hard-won context into the next review.

Knowledge stays tribal

How to stabilize a page object or wait correctly lives with reviewers — not where an agent can reach it.

Fixes land too late

Flakiness usually surfaces after merge, in nightly E2E runs — far from the change that introduced it.

Context never accrues

Each fix is one-off. Nothing distills it back into guidance the platform can reuse.

The idea

What if the platform taught itself?

Turn every fix into durable context — then feed it straight back into review.

Step 01 · scheduled CI job · Automated

Distill the docs into context Duo can use

A scheduled pipeline reads docs.gitlab.com/development, extracts our end-to-end testing conventions, and compiles them into artifacts the AI applies on every change.

It captures conventions like:
{{ assertRule }}
{{ selectorRule }}
{{ stateRule }}
development docs: docs.gitlab.com/development
→ distill-agent-context: .gitlab-ci.yml · schedule: {{ distillSchedule }}
→ Agent skill: {{ skillName }} · SKILL.md
→ Review instructions: {{ reviewInstructionsFile }}
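The distillation step can be pictured with a minimal Ruby sketch. It assumes the conventions appear as bullet points under Markdown headings; `extract_conventions`, `render_skill`, the heading filter, and the output layout are all hypothetical and are not taken from the actual distill-agent-context job.

```ruby
# Hypothetical sketch: pull bullet-point conventions out of a Markdown doc
# and render them as the body of a skill file. The heading filter and the
# output layout are assumptions, not the real pipeline.

# Returns { heading => [bullet text, ...] } for headings matching the pattern.
def extract_conventions(markdown, heading_pattern)
  rules = Hash.new { |hash, key| hash[key] = [] }
  current = nil
  markdown.each_line do |line|
    line = line.chomp
    if (match = line.match(/\A#+\s+(.*)\z/))
      heading = match[1].strip
      # Only collect bullets that sit under a matching heading.
      current = heading.match?(heading_pattern) ? heading : nil
    elsif current && (bullet = line.match(/\A\s*[-*]\s+(.*)\z/))
      rules[current] << bullet[1].strip
    end
  end
  rules
end

# Renders the collected rules as Markdown sections with "-" bullets.
def render_skill(rules)
  rules.map do |heading, bullets|
    (["## #{heading}"] + bullets.map { |b| "- #{b}" }).join("\n")
  end.join("\n\n")
end

# Made-up sample input, for illustration only.
sample = <<~DOC
  # Testing guide
  Intro text.
  ## End-to-end waits
  - Assert the new state before asserting absence.
  ## Unrelated
  - Not extracted.
DOC

puts render_skill(extract_conventions(sample, /end-to-end/i))
```

A real job would also need to fetch the docs and commit the generated files; both are left out here.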
Step 02 · in the merge request · Automated

Fix flakiness before it ever merges

Duo flags the flaky pattern inline, citing the distilled rule.
An agent proposes the fix in the same MR.
Flakiness is prevented, not just retried away.

Content generated by AI should be seen as a starting point and verified before use.

Merge request !240958 · reorder Capybara assertions
spec/features/admin/admin_sees_background_migrations_spec.rb
  click_button 'Resume'
− expect(page).not_to have_button 'Resume'
− expect(page).to have_button 'Pause'
+ expect(page).to have_button 'Pause'
+ expect(page).not_to have_button 'Resume'
GitLab Duo Code Review
The negative matcher runs before the page settles, so Capybara polls the old button for the full 30-second timeout. Confirm the new state with a positive matcher first, then check for absence — per our testing guide.
Suggested fix applied by agent
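The ordering rule in this merge request can be approximated with a naive text check. The sketch below is an illustration only: it is not how Duo Code Review detects the pattern, the function name `negative_first_pairs` is hypothetical, and the regular expressions cover only simple one-line `expect(...)` statements.

```ruby
# Hypothetical heuristic: find places where a negative Capybara expectation
# is immediately followed by a positive one, which is the ordering the
# review comment above asks to reverse.
NEGATIVE = /expect\(.+\)\.(not_to|to_not)\s+have_/
POSITIVE = /expect\(.+\)\.to\s+have_(?!no_)/

# Returns the 1-based line numbers of negative expectations that are
# directly followed by a positive expectation.
def negative_first_pairs(spec_source)
  lines = spec_source.lines.map(&:strip)
  lines.each_cons(2).with_index(1).select do |(first, second), _line_no|
    first.match?(NEGATIVE) && second.match?(POSITIVE)
  end.map { |(_pair, line_no)| line_no }
end

before = <<~SPEC
  click_button 'Resume'
  expect(page).not_to have_button 'Resume'
  expect(page).to have_button 'Pause'
SPEC

p negative_first_pairs(before) # => [2]
```

A line-based regex is cheap to run but misses multi-line expectations and `have_no_*` matchers; a check built on a Ruby parser would be more robust.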
Steps 03–04 · scheduled CI job · Manually tested · needs automation

Mine the fixes, find the gaps, write the docs

A second scheduled job classifies merged flaky-test fixes by root cause and checks each recurring pattern against the testing docs — opening a merge request wherever guidance is missing.

most common cause · documented in
Waiting on the right signal · testing-rspec.md
Query asserts in :js specs · testing-rspec.md
Timestamp ordering · testing-rspec.md
E2E waits & navigation · testing-e2e.md (new)
Gap identified: Assert UI before reload
Distill flaky-test fix learnings into testing principles
merge request !241516 (adds a new testing-e2e principle)
Source: agent gap analysis of 64 fixed flaky-test issues, gitlab-org#603519
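The gap check itself reduces to a small set operation: count fixes per root cause, then keep the recurring causes that no doc covers. The sketch below assumes the fixes are already classified; the `Fix` struct, the `documentation_gaps` function, the `min_occurrences` threshold, and the sample counts are hypothetical and do not reproduce the 64-issue analysis.

```ruby
# Hypothetical sketch of the gap analysis. Classification of each fix is
# assumed to have happened upstream (for example, by an agent).
Fix = Struct.new(:issue, :root_cause)

# Returns recurring root causes that have no documentation entry,
# most frequent first. `documented` maps root cause => doc file.
def documentation_gaps(fixes, documented, min_occurrences: 2)
  fixes.map(&:root_cause).tally
       .select { |cause, count| count >= min_occurrences && !documented.key?(cause) }
       .sort_by { |_cause, count| -count }
       .map(&:first)
end

# Illustrative data only: issue numbers and counts are placeholders.
fixes = [
  Fix.new(1, "Waiting on the right signal"),
  Fix.new(2, "Waiting on the right signal"),
  Fix.new(3, "Assert UI before reload"),
  Fix.new(4, "Assert UI before reload"),
  Fix.new(5, "Assert UI before reload"),
  Fix.new(6, "Timestamp ordering")
]
documented = {
  "Waiting on the right signal" => "testing-rspec.md",
  "Timestamp ordering" => "testing-rspec.md"
}

p documentation_gaps(fixes, documented) # => ["Assert UI before reload"]
```

Each returned cause is a candidate for a documentation merge request; a one-off cause stays below the threshold and is ignored.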
Why it works

Every fix makes the next review smarter

Self-reinforcing

Each merged fix becomes documented context, distilled into the next skill and review pass. Knowledge compounds instead of resetting.

Shift-left by default

Fixes move from nightly E2E failures into the merge request, where they are cheapest to make and easiest to learn from.

Runs itself

Two scheduled jobs do the distilling and the gap-finding (the second is manually tested today and still needs automation). No manual curation backlog, no doc rot.

Next

Close the loop

Pilot on the E2E browser suite, prove it closes once, then expand to every flaky-test bucket.

Work item: gitlab-org #21742
Surface: Duo Code Review + agents
First bucket: E2E browser pages