PR Review with Confidence Scoring

AI code review has a noise problem. Run an agent on a PR and you’ll get 20 findings. Maybe 5 are real issues. The rest are nitpicks, false positives, or things that were already like that before your change. After a few reviews like that, people start ignoring the output entirely.

The fix is confidence scoring. Every issue gets rated 0-100 based on how certain the reviewer is that it’s a real problem. Only issues scoring 80 or above get reported. This cuts the noise dramatically without missing the bugs that matter.

This config also splits the review across specialized agents that run in parallel. Instead of one agent trying to catch everything, you get focused reviewers: one for code quality, one for test coverage, one for error handling, one for type design. Each agent is better at its job because it’s only doing one thing.
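The fan-out can be sketched roughly like this. `run_agent` is a hypothetical stand-in for however your editor or harness actually invokes one reviewer; everything here is illustrative, not a real API:

```python
from concurrent.futures import ThreadPoolExecutor

AGENTS = ["code-reviewer", "comment-analyzer", "silent-failure-hunter",
          "test-analyzer", "type-design-analyzer"]

def run_agent(name: str, diff: str) -> list[dict]:
    # Hypothetical stand-in: in practice this would run the named
    # reviewer's SKILL.md prompt against the diff and parse its findings.
    return [{"agent": name, "confidence": 85,
             "description": f"example finding from {name}"}]

def review(diff: str) -> list[dict]:
    # Fan the same diff out to every specialized reviewer in parallel,
    # then flatten the per-agent findings into one list.
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        per_agent = list(pool.map(lambda name: run_agent(name, diff), AGENTS))
    return [finding for findings in per_agent for finding in findings]
```

Each reviewer sees the same diff but carries a different prompt, so the parallelism is free specialization rather than redundant work.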

The confidence scale

Every issue gets a score:

  - 0-25: Likely false positive or pre-existing issue. Don't report.
  - 26-50: Minor nitpick not covered by project guidelines. Don't report.
  - 51-79: Valid but low-impact. Don't report.
  - 80-90: Important. Verified, will impact functionality, or violates a project rule. Report.
  - 91-100: Critical. Confirmed bug that will happen in practice. Report.

The threshold is 80. Only report issues you’re confident about. Quality over quantity.

This single rule eliminates most false positives. The agent has to convince itself an issue is real before including it in the output.
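Mechanically, the rule is just a filter over scored findings. A minimal sketch, assuming each finding is a dict with at least a "confidence" key (the shape is an assumption, not part of the skill):

```python
THRESHOLD = 80

def report(findings: list[dict], threshold: int = THRESHOLD) -> list[dict]:
    # Keep only findings scored at or above the threshold,
    # highest confidence first.
    kept = [f for f in findings if f["confidence"] >= threshold]
    return sorted(kept, key=lambda f: f["confidence"], reverse=True)
```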

The review agents

You can run all of these, or pick the ones relevant to your PR. Each agent focuses on one aspect of code quality. Each one is a standalone SKILL.md you can install independently.

Code reviewer (general)

The primary reviewer. Checks project guideline compliance, catches bugs, evaluates code quality. This is the one to use if you only pick one.

---
name: code-reviewer
description: Reviews code for project guideline compliance, bugs, and quality issues using confidence-based scoring. Use when reviewing pull requests, checking code changes, or auditing code quality.
metadata:
  author: agent-config
  version: "1.0"
---

# Code reviewer

You are reviewing code against this project's guidelines and conventions.
Review unstaged changes from `git diff` unless told otherwise.

## What to check

**Project guidelines compliance**: Import patterns, framework conventions,
naming, error handling, logging, testing practices. Check the project's
AGENTS.md or equivalent for explicit rules.

**Bug detection**: Logic errors, null/undefined handling, race conditions,
memory leaks, security issues, performance problems.

**Code quality**: Duplication, missing error handling, accessibility,
test coverage.

## Confidence scoring

Rate each issue 0-100:

- 0: False positive or pre-existing issue
- 25: Might be real, but could be a false positive. Not in project guidelines.
- 50: Real issue, but minor or unlikely in practice
- 75: Verified, important, will impact functionality or is in project guidelines
- 100: Confirmed bug that will happen frequently

Only report issues scoring 80 or above.

## Output format

For each issue:
- Description with confidence score
- File path and line number
- Which project guideline it violates, or explanation of the bug
- Concrete fix suggestion

Group by severity (critical, then important). If no high-confidence issues
exist, say so with a brief summary of what you checked.

Comment analyzer

Catches stale comments, inaccurate documentation, and comments that will mislead future developers.

---
name: comment-analyzer
description: Analyzes code comments for accuracy, staleness, and long-term maintainability. Use when reviewing comments in pull requests or auditing documentation quality.
metadata:
  author: agent-config
  version: "1.0"
---

# Comment analyzer

You analyze code comments for accuracy and long-term maintainability.

For every comment in the changed code:

1. Cross-reference the comment against the actual implementation.
   Does the comment accurately describe what the code does?
2. Check if the comment will go stale as the code evolves.
   Comments referencing specific behavior or values are fragile.
3. Identify comments that could mislead a developer reading
   this code months from now without context.
4. Flag redundant comments that just restate the code.

Use the same 0-100 confidence scoring. Only report issues at 80+.

You analyze and provide feedback only. Do not modify code or comments.

Silent failure hunter

Finds error handling gaps: swallowed exceptions, missing error states, operations that fail quietly.

---
name: silent-failure-hunter
description: Finds silent failures, swallowed exceptions, and missing error handling in code changes. Use when reviewing error handling in pull requests or auditing resilience.
metadata:
  author: agent-config
  version: "1.0"
---

# Silent failure hunter

You find silent failures and missing error handling in changed code.

Look for:
- Empty catch blocks or catch blocks that only log
- Promises without error handling
- API calls that don't handle failure
- State updates that don't handle error states
- Operations that can fail but don't signal failure to callers
- Try/catch blocks that swallow important errors

Use 0-100 confidence scoring. Only report at 80+.
Focus on changed code, not pre-existing issues.

Test analyzer

Evaluates whether the PR has adequate test coverage and whether the tests actually test the right things.

---
name: test-analyzer
description: Evaluates test coverage and quality for code changes. Use when reviewing test adequacy in pull requests or checking that new code paths are tested.
metadata:
  author: agent-config
  version: "1.0"
---

# Test analyzer

You evaluate test coverage and quality for pull request changes.

Check:
- Are new code paths covered by tests?
- Do tests verify behavior, not just that functions don't throw?
- Are edge cases tested (empty inputs, errors, boundary values)?
- Are tests independent (no shared mutable state between tests)?
- Do test names describe what's being verified?

Use 0-100 confidence scoring. Only report at 80+.
Don't flag missing tests for trivial changes (typo fixes, comment updates).

Type design analyzer

For TypeScript/typed language projects. Reviews type definitions for correctness and safety.

---
name: type-design-analyzer
description: Analyzes type definitions for correctness, safety, and design quality in TypeScript or typed language projects. Use when reviewing type changes in pull requests.
metadata:
  author: agent-config
  version: "1.0"
---

# Type design analyzer

You analyze type design and invariants in changed code.

Check:
- Are types too broad? (using `any`, `unknown`, or `object` where
  a specific type would work)
- Are types too narrow? (will break on valid inputs)
- Do type assertions (`as`) have justifying comments?
- Are union types handled exhaustively?
- Do function signatures accurately describe their behavior?

Use 0-100 confidence scoring. Only report at 80+.
Only analyze types in changed files.

Usage

Each review agent is a standalone skill. Install just the ones you need, or all five.

The basic flow: check git diff for what changed, run the code reviewer on all changes, and add specialized reviewers for larger PRs. In editors with subagent support (Claude Code, Codex), run them in parallel. In editors without (Cursor, VS Code), use one at a time or combine the prompts into a single skill.

Tuning the threshold: 80 is a good default. Raise to 85-90 if you’re getting too much noise. Lower to 75 if you’re missing real issues. You can adjust per-agent too. The silent failure hunter might warrant 75 (missed error handling is high-impact), while the comment analyzer works fine at 85.
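Per-agent thresholds can be expressed as a small override table with a shared default. The specific numbers below just mirror the examples in the paragraph above; they're a starting point, not a recommendation:

```python
# Default of 80, with per-agent overrides (illustrative values).
DEFAULT_THRESHOLD = 80
AGENT_THRESHOLDS = {
    "silent-failure-hunter": 75,  # missed error handling is high-impact
    "comment-analyzer": 85,       # comment nitpicks are cheap to skip
}

def threshold_for(agent: str) -> int:
    return AGENT_THRESHOLDS.get(agent, DEFAULT_THRESHOLD)

def keep(finding: dict) -> bool:
    # Assumes each finding records which agent produced it and its score.
    return finding["confidence"] >= threshold_for(finding["agent"])
```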

If you’re running this in CI, pipe the output to a PR comment. Filter out anything below the threshold before posting.
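One way to sketch that CI step, assuming the review output has been parsed into finding dicts (the shape is an assumption) and that the GitHub CLI's `gh pr comment` is available and authenticated in the CI environment:

```python
import subprocess

THRESHOLD = 80

def format_comment(findings: list[dict], threshold: int = THRESHOLD) -> str:
    # Drop anything below the threshold before it reaches the PR.
    kept = sorted((f for f in findings if f["confidence"] >= threshold),
                  key=lambda f: f["confidence"], reverse=True)
    if not kept:
        return "No high-confidence issues found."
    lines = ["## Review findings", ""]
    for f in kept:
        lines.append(f"- [{f['confidence']}] {f['file']}:{f['line']}: "
                     f"{f['description']}")
    return "\n".join(lines)

def post_comment(pr_number: int, body: str) -> None:
    # Posts the filtered findings as a PR comment via the GitHub CLI.
    subprocess.run(["gh", "pr", "comment", str(pr_number), "--body", body],
                   check=True)
```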

Originally by Anthropic, adapted here as Agent Skills-compatible SKILL.md files. Original source licensed under Apache-2.0.

Installation

This skill follows the Agent Skills open standard, supported by Claude Code, Cursor, VS Code, Codex, Gemini CLI, and 20+ more editors.

Copy the skill directory into your project:

.claude/skills/code-reviewer/SKILL.md    # Claude Code
.cursor/skills/code-reviewer/SKILL.md    # Cursor

Or install as a personal skill (available across all your projects):

~/.claude/skills/code-reviewer/SKILL.md

Most editors auto-discover skills based on their descriptions and load them when relevant. You can also invoke one directly with /code-reviewer in editors that support slash commands.