docs: design GlitchTip/Gitea deduplication and linking (#78) #95

Merged
sysadmin merged 1 commit from docs/issue-78-glitchtip-deduplication-linking into master 2026-07-02 15:01:37 -05:00
Showing only changes of commit 6df3c86e89
@@ -0,0 +1,59 @@
# GlitchTip-Gitea Deduplication and Linking Design
- **Status:** Design (child of #74)
- **Issue:** #78 (parent: #74 / #75)
- **Date:** 2026-07-02
## 1. Overview and Goals
To prevent automated error reporting from flooding the Gitea issue tracker with duplicate tickets for the same underlying GlitchTip error, the filing orchestrator must deduplicate reports before filing them. Every Gitea issue it files must also link back to its originating GlitchTip error through structured metadata.
## 2. Structured Metadata Marker
Each Gitea issue filed by the orchestrator will contain a machine-readable, structured metadata block in its body. This metadata will contain the GlitchTip issue ID and fingerprint.
We will use a hidden HTML comment at the end of the issue body:
```markdown
<!-- glitchtip-metadata: {"issue_id": "12345", "fingerprint": "abc123xyz"} -->
```
Because the block is an HTML comment, it does not appear in the rendered issue. Orchestrators can parse it reliably without cluttering the user interface or affecting human readability.
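A minimal Python sketch of writing and reading the marker, assuming the exact comment format shown above. `render_marker` and `parse_marker` are hypothetical helper names, not part of any existing library.

```python
import json
import re
from typing import Optional

# Matches the marker anywhere in an issue body. The lazy group stops at the
# first "}" that is followed by the closing "-->".
MARKER_RE = re.compile(r"<!--\s*glitchtip-metadata:\s*(\{.*?\})\s*-->", re.DOTALL)


def render_marker(issue_id: str, fingerprint: str) -> str:
    """Return the hidden metadata comment to append to a Gitea issue body."""
    payload = json.dumps({"issue_id": issue_id, "fingerprint": fingerprint})
    if "-->" in payload:
        # A literal "-->" would terminate the HTML comment early.
        raise ValueError("metadata must not contain '-->'")
    return f"<!-- glitchtip-metadata: {payload} -->"


def parse_marker(body: Optional[str]) -> Optional[dict]:
    """Extract the metadata dict from an issue body, or None if absent or invalid."""
    match = MARKER_RE.search(body or "")
    if match is None:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None
```

`render_marker("12345", "abc123xyz")` produces the exact comment shown in the example above, and `parse_marker` returns `None` for bodies that mention the marker text without a well-formed JSON payload.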
## 3. Search and Duplicate Detection Strategy
Before the orchestrator files a new issue, it must search the target Gitea repository for any existing issues referencing the same GlitchTip error.
### Search Process
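1. **API Query:** Query the repository's issues endpoint (`GET /api/v1/repos/{owner}/{repo}/issues`) with the search term `glitchtip-metadata` (`q=glitchtip-metadata`). This narrows the results to issues that carry the marker, which in practice are the ones filed by this workflow. The query must cover **both open and closed** issues (Gitea API `state=all`) and must follow pagination (`page`, `limit`) until every result has been read.
2. **Client-side Parsing:** Read the body of each matching issue (the list endpoint returns it) and extract the metadata block.
3. **Identity Match:** Compare the extracted `issue_id` and `fingerprint` with those of the incoming GlitchTip error. If either value matches, the Gitea issue is flagged as a duplicate; closed duplicates are handled as described in Section 4.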
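The three steps can be sketched in Python as follows. `fetch_page` is a hypothetical callable standing in for the authenticated HTTP GET against the issues endpoint, so the paging and matching logic can be exercised without a live Gitea instance. The sketch also assumes the server honors `limit=50` (Gitea caps page size server-side) and that Gitea's issue search matches the marker text in issue bodies.

```python
import json
import re
from typing import Callable, Dict, Iterable, Iterator, List, Optional

MARKER_RE = re.compile(r"<!--\s*glitchtip-metadata:\s*(\{.*?\})\s*-->", re.DOTALL)


def search_params(page: int, limit: int = 50) -> Dict[str, object]:
    """Query string for one page of marker-bearing issues, open AND closed."""
    return {"q": "glitchtip-metadata", "state": "all", "type": "issues",
            "page": page, "limit": limit}


def iter_marked_issues(fetch_page: Callable[[Dict[str, object]], List[dict]],
                       limit: int = 50) -> Iterator[dict]:
    """Step 1: yield issues page by page until a short or empty page arrives.

    `fetch_page` (hypothetical) performs GET /api/v1/repos/{owner}/{repo}/issues
    with the given query and returns the decoded JSON list.
    """
    page = 1
    while True:
        batch = fetch_page(search_params(page, limit))
        yield from batch
        if len(batch) < limit:
            return
        page += 1


def parse_marker(body: Optional[str]) -> Optional[dict]:
    """Step 2: pull the metadata dict out of an issue body."""
    match = MARKER_RE.search(body or "")
    if match is None:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def find_duplicate(issues: Iterable[dict], issue_id: str,
                   fingerprint: str) -> Optional[dict]:
    """Step 3: first issue whose metadata matches by issue_id OR fingerprint."""
    for issue in issues:
        meta = parse_marker(issue.get("body"))
        if meta is None:
            continue  # mentions the search term but carries no valid marker
        if meta.get("issue_id") == issue_id or meta.get("fingerprint") == fingerprint:
            return issue
    return None
```

Because `find_duplicate` ignores the issue's `state`, a closed match is returned just like an open one; deciding what to do with it is left to the mode logic in Section 4.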
## 4. Handling Closed Matching Issues (Open Owner Decision)
When a matching Gitea issue is found but is **closed**, no single behavior is correct in every case: reopening could trap a flaky error in an endless reopen/close cycle, while filing a new issue could reintroduce the duplicate spam this design is meant to prevent.
The orchestrator must support configurable modes for this scenario:
* Mode A: **Ask Human** (Prompt for a decision: reopen, file new, or ignore). *Proposed default.*
* Mode B: **Comment-Only** (Post a comment in the closed Gitea issue noting that the error recurred, rather than reopening it).
* Mode C: **Reopen** (Reopen the closed Gitea issue and apply `status:triage` / `status:in-progress`).
* Mode D: **Create New** (Ignore the closed issue and file a new one, linking it to the previous closed issue).
> [!IMPORTANT]
> **Open Owner Decision:** The final default behavior and Mode configuration must be confirmed by the owner prior to implementation.
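As an illustration only (the owner decision above is still open), the configured mode could map to a single Gitea REST call as sketched below. The mode strings follow the values listed in Section 7; `ClosedMatchMode` and `plan_closed_match` are hypothetical names, and the status labels from Mode C are deliberately not applied in this sketch.

```python
from enum import Enum
from typing import Optional, Tuple

# (HTTP method, API path, JSON payload) for a single Gitea REST call.
Request = Tuple[str, str, dict]


class ClosedMatchMode(str, Enum):
    ASK = "ask"          # Mode A: defer to a human
    COMMENT = "comment"  # Mode B: comment on the closed issue
    REOPEN = "reopen"    # Mode C: reopen the closed issue
    NEW = "new"          # Mode D: file a new issue linked to the old one


def plan_closed_match(mode: ClosedMatchMode, owner: str, repo: str,
                      closed_number: int, new_title: str,
                      new_body: str) -> Optional[Request]:
    """Return the Gitea call for a closed duplicate, or None when a human must decide."""
    issues = f"/api/v1/repos/{owner}/{repo}/issues"
    if mode is ClosedMatchMode.ASK:
        return None
    if mode is ClosedMatchMode.COMMENT:
        return ("POST", f"{issues}/{closed_number}/comments",
                {"body": "This GlitchTip error has recurred."})
    if mode is ClosedMatchMode.REOPEN:
        # Status labels (status:triage / status:in-progress) are not applied here.
        return ("PATCH", f"{issues}/{closed_number}", {"state": "open"})
    if mode is ClosedMatchMode.NEW:
        # Put the link first so the metadata marker stays at the end of the body.
        body = f"Previous occurrence: #{closed_number}\n\n{new_body}"
        return ("POST", issues, {"title": new_title, "body": body})
    raise ValueError(f"unsupported mode: {mode!r}")
```

Parsing the configuration with `ClosedMatchMode("reopen")` raises `ValueError` for any unrecognized string, so a typo in the mode setting fails fast instead of silently falling through to some default.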
## 5. Concurrency and Race Condition Mitigation
Because multiple runs of the orchestrator can execute concurrently (e.g. parallel Jenkins builds or multiple webhook deliveries), two runs may both search for duplicates, both find none, and both create a new issue for the same error.
### Mitigation Strategies
1. **Single-Concurrency Gate:** Limit execution of the issue filing runbook to a single-concurrency queue (e.g. GitHub Actions `concurrency` groups, or the Jenkins Lockable Resources plugin).
2. **Double-Check Query:** Add a randomized delay (jitter of 0-5 seconds) before creating the issue, then repeat the duplicate search immediately before POSTing the new issue. This narrows the race window but does not close it, so it complements the gate rather than replacing it.
3. **Idempotency Header / Cache:** (Optional) If a persistent runner is used, keep a lightweight, short-lived external state store or cache of recently filed fingerprints.
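Strategy 1 is CI configuration and is not shown here. Strategies 2 and 3 can be sketched in Python as below, assuming hypothetical `find_existing` and `create_issue` callables that wrap the Gitea search and create calls. The in-process `RecentFilings` cache stands in for the optional external store and is itself an assumption; it only helps when the same runner process handles repeated deliveries.

```python
import random
import time
from typing import Callable, Dict, Optional, Tuple


class RecentFilings:
    """Hypothetical short-lived cache of fingerprints this runner filed (strategy 3)."""

    def __init__(self, ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._filed: Dict[str, float] = {}

    def seen_recently(self, fingerprint: str) -> bool:
        filed_at = self._filed.get(fingerprint)
        return filed_at is not None and self._clock() - filed_at < self._ttl

    def record(self, fingerprint: str) -> None:
        self._filed[fingerprint] = self._clock()


def file_if_absent(fingerprint: str,
                   find_existing: Callable[[str], Optional[dict]],
                   create_issue: Callable[[str], dict],
                   cache: Optional[RecentFilings] = None,
                   max_jitter: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep,
                   rng: Callable[[], float] = random.random) -> Tuple[Optional[dict], bool]:
    """Return (issue, created). `issue` is None only when the cache short-circuits."""
    if cache is not None and cache.seen_recently(fingerprint):
        return None, False
    existing = find_existing(fingerprint)
    if existing is not None:
        return existing, False
    # Strategy 2: jitter in [0, max_jitter) spreads out runs that started together...
    sleep(rng() * max_jitter)
    # ...then repeat the duplicate search immediately before the POST.
    existing = find_existing(fingerprint)
    if existing is not None:
        return existing, False
    created = create_issue(fingerprint)
    if cache is not None:
        cache.record(fingerprint)
    return created, True
```

Injecting `sleep`, `rng`, and `clock` keeps the timing deterministic under test; in production the defaults (`time.sleep`, `random.random`, `time.monotonic`) apply.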
## 6. Spam Prevention (Spam Cap)
To protect Gitea from an unexpected surge in errors (e.g. during a major site outage), the orchestrator must cap the number of new issues it files per execution:
- **Default Cap:** Maximum of 5 new Gitea issues filed per execution run.
- **Exceeded Behavior:** If the cap is reached, the runbook will halt filing new issues, log a warning, and print a summary of all skipped issues to the console/audit logs.
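A minimal Python sketch of the cap, assuming the input list has already been deduplicated so that only genuinely new issues count toward the limit. `file_with_cap`, the `create_issue` callable, and the logger name are hypothetical.

```python
import logging
from typing import Callable, Dict, List, Tuple

log = logging.getLogger("glitchtip_filer")  # hypothetical logger name

DEFAULT_SPAM_CAP = 5


def file_with_cap(errors: List[Dict], create_issue: Callable[[Dict], Dict],
                  cap: int = DEFAULT_SPAM_CAP) -> Tuple[List[Dict], List[Dict]]:
    """File at most `cap` issues; return (filed issues, skipped errors)."""
    filed: List[Dict] = []
    skipped: List[Dict] = []
    for error in errors:
        if len(filed) >= cap:
            skipped.append(error)  # cap reached: stop filing, keep for the summary
            continue
        filed.append(create_issue(error))
    if skipped:
        log.warning("Spam cap of %d reached; skipped %d GlitchTip errors",
                    cap, len(skipped))
        for error in skipped:
            log.warning("skipped: issue_id=%s fingerprint=%s",
                        error.get("issue_id"), error.get("fingerprint"))
    return filed, skipped
```

Returning the skipped errors, rather than only logging them, lets the caller write the same summary to an audit log.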
## 7. Testing Strategy (Mocked Verification)
Unit tests for the implementing orchestrator must use mocked Gitea/GlitchTip APIs to assert:
1. **Deduplication:** A second run with a matching fingerprint does not trigger issue creation.
2. **State Search:** Both open and closed issues are queried (`state=all`).
3. **Closed Match Mode:** Each mode (`comment`, `reopen`, `new`, `ask`) behaves as configured.
4. **Spam Cap:** No more than the capped number of issues is created, even when GlitchTip returns more errors.
5. **No Secrets/PII Leak:** Metadata and issue content contain no credentials or PII.
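The mocked style of test can be sketched with the standard library's `unittest.mock`. The Gitea client interface (`list_issues`, `create_issue`) and the tiny `run_once` orchestrator are hypothetical stand-ins so the example is self-contained; the tests cover items 1, 2, 4, and 5 above.

```python
import json
import re
import unittest
from unittest import mock

MARKER_RE = re.compile(r"<!--\s*glitchtip-metadata:\s*(\{.*?\})\s*-->", re.DOTALL)


def run_once(gitea, errors, cap=5):
    """Hypothetical orchestrator: dedupe by fingerprint, then file up to `cap` issues."""
    existing = gitea.list_issues({"q": "glitchtip-metadata", "state": "all", "type": "issues"})
    known = set()
    for issue in existing:
        match = MARKER_RE.search(issue.get("body") or "")
        if match:
            known.add(json.loads(match.group(1)).get("fingerprint"))
    filed = 0
    for err in errors:
        if err["fingerprint"] in known or filed >= cap:
            continue
        # Only whitelisted fields reach the issue body.
        marker = json.dumps({"issue_id": err["issue_id"], "fingerprint": err["fingerprint"]})
        gitea.create_issue(title=err["title"],
                           body=f"{err['title']}\n\n<!-- glitchtip-metadata: {marker} -->")
        known.add(err["fingerprint"])  # also prevents duplicates within one run
        filed += 1
    return filed


def make_error(i):
    return {"issue_id": str(i), "fingerprint": f"fp{i}", "title": f"Error {i}",
            "auth_header": "Bearer s3cr3t"}  # fake secret that must never be copied


class OrchestratorTests(unittest.TestCase):
    def setUp(self):
        self.gitea = mock.Mock()
        self.gitea.list_issues.return_value = []

    def test_matching_fingerprint_is_not_refiled(self):
        body = '<!-- glitchtip-metadata: {"issue_id": "1", "fingerprint": "fp1"} -->'
        self.gitea.list_issues.return_value = [{"number": 3, "state": "closed", "body": body}]
        self.assertEqual(run_once(self.gitea, [make_error(1)]), 0)
        self.gitea.create_issue.assert_not_called()

    def test_search_covers_open_and_closed(self):
        run_once(self.gitea, [])
        self.assertEqual(self.gitea.list_issues.call_args.args[0]["state"], "all")

    def test_spam_cap(self):
        self.assertEqual(run_once(self.gitea, [make_error(i) for i in range(8)], cap=5), 5)
        self.assertEqual(self.gitea.create_issue.call_count, 5)

    def test_no_secret_in_body(self):
        run_once(self.gitea, [make_error(1)])
        self.assertNotIn("s3cr3t", self.gitea.create_issue.call_args.kwargs["body"])
```

Saved as a test module, this runs under `python -m unittest`. A real suite would also cover item 3 by asserting the request produced for each closed-match mode.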