Static Analysis

Added March 5, 2026 Source: Trail of Bits

This skill uses CodeQL to scan your codebase for security vulnerabilities, supporting a wide range of languages. It handles the entire process, from building CodeQL databases and creating data extensions to processing SARIF output. Use this when you need to perform in-depth security analysis on your project.

Installation

This skill has dependencies (scripts or reference files). Install using the method below to make sure everything is in place.

npx skills add trailofbits/skills

Requires Node.js 18+. The skills CLI auto-detects your editor and installs to the right directory.

Or install manually from the source repository.

SKILL.md (reference - install via npx or source for all dependencies)

---
name: codeql
description: >-
  Scans a codebase for security vulnerabilities using CodeQL's interprocedural data flow and
  taint tracking analysis. Triggers on "run codeql", "codeql scan", "codeql analysis", "build
  codeql database", or "find vulnerabilities with codeql". Supports "run all" (security-and-quality
  suite) and "important only" (high-precision security findings) scan modes. Also handles
  creating data extension models and processing CodeQL SARIF output.
allowed-tools:
  - Bash
  - Read
  - Write
  - Edit
  - Glob
  - Grep
  - AskUserQuestion
  - TaskCreate
  - TaskList
  - TaskUpdate
  - TaskGet
  - TodoRead
  - TodoWrite
---

# CodeQL Analysis

Supported languages: Python, JavaScript/TypeScript, Go, Java/Kotlin, C/C++, C#, Ruby, Swift.

**Skill resources:** Reference files and templates are located at `{baseDir}/references/` and `{baseDir}/workflows/`.

## Essential Principles

1. **Database quality is non-negotiable.** A database that builds is not automatically good. Always run quality assessment (file counts, baseline LoC, extractor errors) and compare against expected source files. A cached build, where nothing is recompiled, gives the extractor nothing to observe and produces no useful extraction.

2. **Data extensions catch what CodeQL misses.** Even projects using standard frameworks (Django, Spring, Express) have custom wrappers around database calls, request parsing, or shell execution. Skipping the create-data-extensions workflow means missing vulnerabilities in project-specific code paths.

3. **Explicit suite references prevent silent query dropping.** Never pass pack names directly to `codeql database analyze` — each pack's `defaultSuiteFile` applies hidden filters that can produce zero results. Always generate a custom `.qls` suite file.

4. **Zero findings needs investigation, not celebration.** Zero results can indicate poor database quality, missing models, wrong query packs, or silent suite filtering. Investigate before reporting clean.

5. **macOS Apple Silicon requires workarounds for compiled languages.** Exit code 137 indicates an `arm64e`/`arm64` architecture mismatch, not a build failure. Try Homebrew arm64 tools or Rosetta before falling back to `build-mode=none`.

6. **Follow workflows step by step.** Once a workflow is selected, execute it step by step without skipping phases. Each phase gates the next — skipping quality assessment or data extensions leads to incomplete analysis.

## Output Directory

All generated files (database, build logs, diagnostics, extensions, results) are stored in a single output directory.

- **If the user specifies an output directory** in their prompt, use it as `OUTPUT_DIR`.
- **If not specified**, default to `./static_analysis_codeql_1`. If that already exists, increment to `_2`, `_3`, etc.

In both cases, **always create the directory** with `mkdir -p` before writing any files.

```bash
# Resolve output directory
if [ -n "$USER_SPECIFIED_DIR" ]; then
  OUTPUT_DIR="$USER_SPECIFIED_DIR"
else
  BASE="static_analysis_codeql"
  N=1
  while [ -e "${BASE}_${N}" ]; do
    N=$((N + 1))
  done
  OUTPUT_DIR="${BASE}_${N}"
fi
mkdir -p "$OUTPUT_DIR"
```

The output directory is resolved **once** at the start before any workflow executes. All workflows receive `$OUTPUT_DIR` and store their artifacts there:

```
$OUTPUT_DIR/
├── rulesets.txt                 # Selected query packs (logged after Step 3)
├── codeql.db/                   # CodeQL database (dir containing codeql-database.yml)
├── build.log                    # Build log
├── codeql-config.yml            # Exclusion config (interpreted languages)
├── diagnostics/                 # Diagnostic queries and CSVs
├── extensions/                  # Data extension YAMLs
├── raw/                         # Unfiltered analysis output
│   ├── results.sarif
│   └── <mode>.qls
└── results/                     # Final results (filtered for important-only, copied for run-all)
    └── results.sarif
```
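
A minimal sketch of preparing that layout up front, assuming the subdirectory names from the tree above. `init_output_layout` is a hypothetical helper, not part of the skill's workflows; `codeql.db/` is deliberately left for `codeql database create` to create.

```bash
# Hypothetical helper: create the fixed subdirectories of the output directory.
init_output_layout() {
  local out_dir="$1" sub
  for sub in raw results diagnostics extensions; do
    mkdir -p "$out_dir/$sub" || return 1
  done
}

# Usage (after OUTPUT_DIR has been resolved):
#   init_output_layout "$OUTPUT_DIR"
```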

### Database Discovery

A CodeQL database is identified by the presence of a `codeql-database.yml` marker file inside its directory. When searching for existing databases, **always collect all matches** — there may be multiple databases from previous runs or for different languages.

**Discovery command:**

```bash
# Find ALL CodeQL databases (top-level and one subdirectory deep)
find . -maxdepth 3 -name "codeql-database.yml" -not -path "*/\.*" 2>/dev/null \
  | while read -r yml; do dirname "$yml"; done
```

- **Inside `$OUTPUT_DIR`:** `find "$OUTPUT_DIR" -maxdepth 2 -name "codeql-database.yml"`
- **Project-wide (for auto-detection):** `find . -maxdepth 3 -name "codeql-database.yml"` — covers databases at the project top level (`./db-name/`) and one subdirectory deep (`./subdir/db-name/`). Does not search deeper.

Never assume a database is named `codeql.db` — discover it by its marker file.

**When multiple databases are found:**

For each discovered database, collect metadata to help the user choose:

```bash
# For each database, extract language and creation time
for db in "${FOUND_DBS[@]}"; do
  CODEQL_LANG=$(codeql resolve database --format=json -- "$db" 2>/dev/null | jq -r '.languages[0]')
  CREATED=$(grep '^creationMetadata:' -A5 "$db/codeql-database.yml" 2>/dev/null | grep 'creationTime' | awk '{print $2}')
  echo "$db — language: $CODEQL_LANG, created: $CREATED"
done
```

Then use `AskUserQuestion` to let the user select which database to use, or to build a new one. **Skip `AskUserQuestion` if the user explicitly stated which database to use or to build a new one in their prompt.**

## Quick Start

For the common case ("scan this codebase for vulnerabilities"):

```bash
# 1. Verify CodeQL is installed
if ! command -v codeql >/dev/null 2>&1; then
  echo "NOT INSTALLED: codeql binary not found on PATH"
else
  codeql --version || echo "ERROR: codeql found but --version failed (check installation)"
fi

# 2. Resolve output directory
BASE="static_analysis_codeql"; N=1
while [ -e "${BASE}_${N}" ]; do N=$((N + 1)); done
OUTPUT_DIR="${BASE}_${N}"; mkdir -p "$OUTPUT_DIR"
```

Then execute the full pipeline: **build database → create data extensions → run analysis** using the workflows below.
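
As a rough orientation, the first and last stages correspond to the CLI calls below. This is a dry-run sketch that only prints the commands: `print_pipeline` is a hypothetical helper, the `run-all.qls` filename assumes run-all mode, and the actual workflows add build-method fallbacks, quality assessment, and extension handling that are not shown.

```bash
# Dry-run sketch: print the CodeQL CLI calls that bracket the pipeline.
# The language value and suite filename are assumptions for illustration.
print_pipeline() {
  local out_dir="$1" lang="$2"
  local db="$out_dir/codeql.db"
  # Stage 1: build the database (compiled languages may also need --command)
  printf 'codeql database create %s --language=%s --source-root=.\n' "$db" "$lang"
  # Stage 2: handled by the create-data-extensions workflow, not a single command
  printf '# (create data extensions in %s/extensions)\n' "$out_dir"
  # Stage 3: analyze with an explicit suite file, never a bare pack name
  printf 'codeql database analyze %s --format=sarif-latest --output=%s/raw/results.sarif -- %s/raw/run-all.qls\n' \
    "$db" "$out_dir" "$out_dir"
}

print_pipeline "static_analysis_codeql_1" python
```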

## When to Use

- Scanning a codebase for security vulnerabilities with deep data flow analysis
- Building a CodeQL database from source code (with build capability for compiled languages)
- Finding complex vulnerabilities that require interprocedural taint tracking or AST/CFG analysis
- Performing comprehensive security audits with multiple query packs

## When NOT to Use

- **Writing custom queries** - Use a dedicated query development skill
- **CI/CD integration** - Use GitHub Actions documentation directly
- **Quick pattern searches** - Use Semgrep or grep for speed
- **No build capability** for compiled languages - Consider Semgrep instead
- **Single-file or lightweight analysis** - Semgrep is faster for simple pattern matching

## Rationalizations to Reject

These shortcuts lead to missed findings. Do not accept them:

- **"security-extended is enough"** - It is the baseline. Always check if Trail of Bits packs and Community Packs are available for the language. They catch categories `security-extended` misses entirely.
- **"The database built, so it's good"** - A database that builds does not mean it extracted well. Always run quality assessment and check file counts against expected source files.
- **"Data extensions aren't needed for standard frameworks"** - Even Django/Spring apps have custom wrappers that CodeQL does not model. Skipping extensions means missing vulnerabilities.
- **"build-mode=none is fine for compiled languages"** - It produces severely incomplete analysis. Only use as an absolute last resort. On macOS, try the arm64 toolchain workaround or Rosetta first.
- **"The build fails on macOS, just use build-mode=none"** - Exit code 137 is caused by `arm64e`/`arm64` mismatch, not a fundamental build failure. See [macos-arm64e-workaround.md](references/macos-arm64e-workaround.md).
- **"No findings means the code is secure"** - Zero findings can indicate poor database quality, missing models, or wrong query packs. Investigate before reporting clean results.
- **"I'll just run the default suite"** / **"I'll just pass the pack names directly"** - Each pack's `defaultSuiteFile` applies hidden filters and can produce zero results. Always use an explicit suite reference.
- **"I'll put files in the current directory"** - All generated files must go in `$OUTPUT_DIR`. Scattering files in the working directory makes cleanup impossible and risks overwriting previous runs.
- **"Just use the first database I find"** - Multiple databases may exist for different languages or from previous runs. When more than one is found, present all options to the user. Only skip the prompt when the user already specified which database to use.
- **"The user said 'scan', that means they want me to pick a database"** - "Scan" is not database selection. If multiple databases exist and the user didn't name one, ask.

---

## Workflow Selection

This skill has three workflows. **Once a workflow is selected, execute it step by step without skipping phases.**

| Workflow | Purpose |
|----------|---------|
| [build-database](workflows/build-database.md) | Create CodeQL database using build methods in sequence |
| [create-data-extensions](workflows/create-data-extensions.md) | Detect or generate data extension models for project APIs |
| [run-analysis](workflows/run-analysis.md) | Select rulesets, execute queries, process results |

### Auto-Detection Logic

**If user explicitly specifies** what to do (e.g., "build a database", "run analysis on ./my-db"), execute that workflow directly. **Do NOT call `AskUserQuestion` for database selection if the user's prompt already makes their intent clear** — e.g., "build a new database", "analyze the codeql database in static_analysis_codeql_2", "run a full scan from scratch".

**Default pipeline for "test", "scan", "analyze", or similar:** Discover existing databases first, then decide.

```bash
# Find ALL CodeQL databases by looking for codeql-database.yml marker file
# Search top-level dirs and one subdirectory deep
FOUND_DBS=()
while IFS= read -r yml; do
  db_dir=$(dirname "$yml")
  codeql resolve database -- "$db_dir" >/dev/null 2>&1 && FOUND_DBS+=("$db_dir")
done < <(find . -maxdepth 3 -name "codeql-database.yml" -not -path "*/\.*" 2>/dev/null)

echo "Found ${#FOUND_DBS[@]} existing database(s)"
```

| Condition | Action |
|-----------|--------|
| No databases found | Resolve new `$OUTPUT_DIR`, execute build → extensions → analysis (full pipeline) |
| One database found | Use `AskUserQuestion`: reuse it or build new? |
| Multiple databases found | Use `AskUserQuestion`: list all with metadata, let user pick one or build new |
| User explicitly stated intent | Skip `AskUserQuestion`, act on their instructions directly |
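
The table can be read as a small dispatch function. A minimal sketch, assuming the caller has already counted the discovered databases and decided whether the prompt states explicit intent; `choose_action` and the action labels it prints are hypothetical names.

```bash
# Hypothetical helper: map (database count, explicit intent) to the next action.
choose_action() {
  local db_count="$1" explicit_intent="${2:-no}"
  if [ "$explicit_intent" = "yes" ]; then
    echo "follow-user-instructions"   # skip AskUserQuestion entirely
    return 0
  fi
  case "$db_count" in
    0) echo "full-pipeline" ;;        # build, then extensions, then analysis
    1) echo "ask-reuse-or-build" ;;   # AskUserQuestion: reuse or build new
    *) echo "ask-pick-or-build" ;;    # AskUserQuestion: list all, pick or build new
  esac
}

# Usage with the array from the discovery snippet:
#   choose_action "${#FOUND_DBS[@]}"
choose_action 2
```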

### Database Selection Prompt

When existing databases are found **and the user did not explicitly specify which to use**, present via `AskUserQuestion`:

```
header: "Existing CodeQL Databases"
question: "I found existing CodeQL database(s). What would you like to do?"
options:
  - label: "<db_path_1> (language: python, created: 2026-02-24)"
    description: "Reuse this database"
  - label: "<db_path_2> (language: cpp, created: 2026-02-23)"
    description: "Reuse this database"
  - label: "Build a new database"
    description: "Create a fresh database in a new output directory"
```

After selection:
- **If user picks an existing database:** Set `$OUTPUT_DIR` to the directory that contains the database (its parent directory), set `$DB_NAME` to the selected path, then proceed to extensions → analysis.
- **If user picks "Build new":** Resolve a new `$OUTPUT_DIR`, execute build → extensions → analysis.
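
For the reuse case, both variables can be derived from the selected path. A minimal sketch, assuming the path comes straight from the discovery step; `select_existing_db` is a hypothetical helper. Note that a database sitting at the project top level yields `.` as its parent.

```bash
# Hypothetical helper: derive DB_NAME and OUTPUT_DIR from the chosen database path.
select_existing_db() {
  DB_NAME="${1%/}"                   # drop a trailing slash, if any
  OUTPUT_DIR=$(dirname "$DB_NAME")   # the directory that contains the database
}

select_existing_db "./static_analysis_codeql_2/codeql.db/"
echo "OUTPUT_DIR=$OUTPUT_DIR DB_NAME=$DB_NAME"
```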

### General Decision Prompt

If the user's intent is ambiguous (neither database selection nor workflow is clear), ask:

```
I can help with CodeQL analysis. What would you like to do?

1. **Full scan (Recommended)** - Build database, create extensions, then run analysis
2. **Build database** - Create a new CodeQL database from this codebase
3. **Create data extensions** - Generate custom source/sink models for project APIs
4. **Run analysis** - Run security queries on existing database

[If databases found: "I found N existing database(s): <list paths with language>"]
[Show output directory: "Output will be stored in <OUTPUT_DIR>"]
```

---

## Reference Index

| File | Content |
|------|---------|
| **Workflows** | |
| [workflows/build-database.md](workflows/build-database.md) | Database creation with build method sequence |
| [workflows/create-data-extensions.md](workflows/create-data-extensions.md) | Data extension generation pipeline |
| [workflows/run-analysis.md](workflows/run-analysis.md) | Query execution and result processing |
| **References** | |
| [references/macos-arm64e-workaround.md](references/macos-arm64e-workaround.md) ([source](https://raw.githubusercontent.com/trailofbits/skills/main/plugins/static-analysis/skills/codeql/references/macos-arm64e-workaround.md)) | Apple Silicon build tracing workarounds |
| [references/build-fixes.md](references/build-fixes.md) | Build failure fix catalog |
| [references/quality-assessment.md](references/quality-assessment.md) ([source](https://raw.githubusercontent.com/trailofbits/skills/main/plugins/static-analysis/skills/codeql/references/quality-assessment.md)) | Database quality metrics and improvements |
| [references/extension-yaml-format.md](references/extension-yaml-format.md) ([source](https://raw.githubusercontent.com/trailofbits/skills/main/plugins/static-analysis/skills/codeql/references/extension-yaml-format.md)) | Data extension YAML column definitions and examples |
| [references/sarif-processing.md](references/sarif-processing.md) | jq commands for SARIF output processing |
| [references/diagnostic-query-templates.md](references/diagnostic-query-templates.md) ([source](https://raw.githubusercontent.com/trailofbits/skills/main/plugins/static-analysis/skills/codeql/references/diagnostic-query-templates.md)) | QL queries for source/sink enumeration |
| [references/important-only-suite.md](references/important-only-suite.md) ([source](https://raw.githubusercontent.com/trailofbits/skills/main/plugins/static-analysis/skills/codeql/references/important-only-suite.md)) | Important-only suite template and generation |
| [references/run-all-suite.md](references/run-all-suite.md) | Run-all suite template |
| [references/ruleset-catalog.md](references/ruleset-catalog.md) | Available query packs by language |
| [references/threat-models.md](references/threat-models.md) | Threat model configuration |
| [references/language-details.md](references/language-details.md) ([source](https://raw.githubusercontent.com/trailofbits/skills/main/plugins/static-analysis/skills/codeql/references/language-details.md)) | Language-specific build and extraction details |
| [references/performance-tuning.md](references/performance-tuning.md) | Memory, threading, and timeout configuration |

---

## Success Criteria

A complete CodeQL analysis run should satisfy:

- [ ] Output directory resolved (user-specified or auto-incremented default)
- [ ] All generated files stored inside `$OUTPUT_DIR`
- [ ] Database built (discovered via `codeql-database.yml` marker) with quality assessment passed (baseline LoC > 0, errors < 5%)
- [ ] Data extensions evaluated — either created in `$OUTPUT_DIR/extensions/` or explicitly skipped with justification
- [ ] Analysis run with explicit suite reference (not default pack suite)
- [ ] All installed query packs (official + Trail of Bits + Community) used or explicitly excluded
- [ ] Selected query packs logged to `$OUTPUT_DIR/rulesets.txt`
- [ ] Unfiltered results preserved in `$OUTPUT_DIR/raw/results.sarif`
- [ ] Final results in `$OUTPUT_DIR/results/results.sarif` (filtered for important-only, copied for run-all)
- [ ] Zero-finding results investigated (database quality, model coverage, suite selection)
- [ ] Build log preserved at `$OUTPUT_DIR/build.log` with all commands, fixes, and quality assessments
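
The quality thresholds in the checklist can be expressed as a simple gate. A minimal sketch, assuming the error rate means files with extractor errors divided by total extracted files (the checklist does not define the denominator); `quality_gate` is a hypothetical helper, and the three counts must come from the quality-assessment step.

```bash
# Hypothetical gate: baseline LoC must be > 0 and the extractor error rate < 5%.
# Arguments: baseline LoC, files with extractor errors, total extracted files.
quality_gate() {
  local baseline_loc="$1" error_files="$2" total_files="$3"
  if [ "$baseline_loc" -le 0 ] || [ "$total_files" -le 0 ]; then
    echo "FAIL: nothing was extracted"
    return 1
  fi
  # Integer form of: error_files / total_files >= 5%
  if [ $((error_files * 100)) -ge $((total_files * 5)) ]; then
    echo "FAIL: extractor error rate is 5% or higher"
    return 1
  fi
  echo "PASS"
}

quality_gate 12000 3 200   # 3 of 200 files (1.5%) had extractor errors
```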


---

## Companion Files

The following reference files are included for convenience:

### references/build-fixes.md

# Build Fixes

Fixes to apply when a CodeQL database build method fails. Try these in order, then retry the current build method. **Log each fix attempt.**
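
The snippets below call `log_step`, `log_cmd`, and `log_result`, which this file does not define. A minimal stand-in, assuming they simply append plain-text entries to the file named by `$LOG_FILE` (defaulting here to `$OUTPUT_DIR/build.log`); the real helpers may format entries differently.

```bash
# Stand-in logging helpers (assumed behavior: append entries to $LOG_FILE).
LOG_FILE="${LOG_FILE:-${OUTPUT_DIR:-.}/build.log}"

# Start a new log section with a UTC timestamp
log_step()   { printf '\n[%s] STEP: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$LOG_FILE"; }
# Record the command about to be run
log_cmd()    { printf '  $ %s\n' "$*" >> "$LOG_FILE"; }
# Record the outcome of the step
log_result() { printf '  => %s\n' "$*" >> "$LOG_FILE"; }
```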

## 1. Clean existing state

```bash
log_step "Applying fix: clean existing state"
rm -rf "$DB_NAME"
log_result "Removed $DB_NAME"
```

## 2. Clean build cache

```bash
log_step "Applying fix: clean build cache"
CLEANED=""
make clean 2>/dev/null && CLEANED="$CLEANED make"
rm -rf build CMakeCache.txt CMakeFiles 2>/dev/null && CLEANED="$CLEANED cmake-artifacts"
./gradlew clean 2>/dev/null && CLEANED="$CLEANED gradle"
mvn clean 2>/dev/null && CLEANED="$CLEANED maven"
cargo clean 2>/dev/null && CLEANED="$CLEANED cargo"
log_result "Cleaned: $CLEANED"
```

## 3. Install missing dependencies

> **Note:** The commands below install the *target project's* dependencies so CodeQL can trace the build. Use whatever package manager the target project expects (`pip`, `npm`, `go mod`, etc.) — these are not the skill's own tooling preferences.

```bash
log_step "Applying fix: install dependencies"

# Python — use target project's package manager (pip/uv/poetry)
if [ -f requirements.txt ]; then
  log_cmd "pip install -r requirements.txt"
  pip install -r requirements.txt 2>&1 | tee -a "$LOG_FILE"
fi
if [ -f setup.py ] || [ -f pyproject.toml ]; then
  log_cmd "pip install -e ."
  pip install -e . 2>&1 | tee -a "$LOG_FILE"
fi

# Node - log installed packages
if [ -f package.json ]; then
  log_cmd "npm install"
  npm install 2>&1 | tee -a "$LOG_FILE"
fi

# Go
if [ -f go.mod ]; then
  log_cmd "go mod download"
  go mod download 2>&1 | tee -a "$LOG_FILE"
fi

# Java - log downloaded dependencies
if [ -f build.gradle ] || [ -f build.gradle.kts ]; then
  log_cmd "./gradlew dependencies --refresh-dependencies"
  ./gradlew dependencies --refresh-dependencies 2>&1 | tee -a "$LOG_FILE"
fi
if [ -f pom.xml ]; then
  log_cmd "mvn dependency:resolve"
  mvn dependency:resolve 2>&1 | tee -a "$LOG_FILE"
fi

# Rust
if [ -f Cargo.toml ]; then
  log_cmd "cargo fetch"
  cargo fetch 2>&1 | tee -a "$LOG_FILE"
fi

log_result "Dependencies installed - see above for details"
```

## 4. Handle private registries

If dependencies require authentication, ask the user:
```
AskUserQuestion: "Build requires private registry access. Options:"
  1. "I'll configure auth and retry"
  2. "Skip these dependencies"
  3. "Show me what's needed"
```

```bash
# Log authentication setup if performed
log_step "Private registry authentication configured"
log_result "Registry: <REGISTRY_URL>, Method: <AUTH_METHOD>"
```

**After fixes:** Retry the current build method. If it still fails, move to the next method.

### references/sarif-processing.md

# SARIF Processing

jq commands for processing CodeQL SARIF output. Used in the run-analysis workflow Step 5.

> **SARIF structure note:** `security-severity` and `level` are stored on rule definitions (`.runs[].tool.driver.rules[]`), NOT on individual result objects. Results reference rules by `ruleIndex`. The jq commands below join results with their rule metadata.
>
> **Portability note:** These jq patterns assume CodeQL SARIF output where `ruleIndex` is populated. For SARIF from other tools (e.g., Semgrep), use `ruleId`-based lookups instead.

> **Directory convention:** Unfiltered output lives in `$RAW_DIR` (`$OUTPUT_DIR/raw`). Final results live in `$RESULTS_DIR` (`$OUTPUT_DIR/results`). The summary commands below operate on `$RESULTS_DIR/results.sarif` (the final output).

## Count Findings

```bash
jq '.runs[].results | length' "$RESULTS_DIR/results.sarif"
```

## Summary by SARIF Level

```bash
jq -r '
  .runs[] |
  . as $run |
  .results[] |
  ($run.tool.driver.rules[.ruleIndex].defaultConfiguration.level // "unknown")
' "$RESULTS_DIR/results.sarif" \
  | sort | uniq -c | sort -rn
```

## Summary by Security Severity (most useful for triage)

```bash
jq -r '
  .runs[] |
  . as $run |
  .results[] |
  ($run.tool.driver.rules[.ruleIndex].properties["security-severity"] // "none") + " | " +
  .ruleId + " | " +
  (.locations[0].physicalLocation.artifactLocation.uri // "?") + ":" +
  ((.locations[0].physicalLocation.region.startLine // 0) | tostring) + " | " +
  (.message.text // "no message" | .[0:80])
' "$RESULTS_DIR/results.sarif" | sort -rn | head -20
```

## Summary by Rule

```bash
jq -r '.runs[].results[] | .ruleId' "$RESULTS_DIR/results.sarif" \
  | sort | uniq -c | sort -rn
```

## Important-Only Post-Filter

If scan mode is "important only", filter out medium-precision results with `security-severity` < 6.0 from the report. The suite includes all medium-precision security queries to let CodeQL evaluate them, but low-severity medium-precision findings are noise.

The filter reads from `$RAW_DIR/results.sarif` (unfiltered) and writes to `$RESULTS_DIR/results.sarif` (final). The raw file is preserved unmodified.

```bash
# Filter important-only results: drop medium-precision findings with security-severity < 6.0
# Medium-precision queries without a security-severity score default to 0.0 (excluded).
# High, very-high, and unknown-precision results are always kept regardless of security-severity;
# any other precision value (e.g., low) is dropped.
# Reads from raw/, writes to results/, preserving the unfiltered original.
RAW_DIR="$OUTPUT_DIR/raw"
RESULTS_DIR="$OUTPUT_DIR/results"
mkdir -p "$RESULTS_DIR"
jq '
  .runs[] |= (
    . as $run |
    .results = [
      .results[] |
      ($run.tool.driver.rules[.ruleIndex].properties.precision // "unknown") as $prec |
      ($run.tool.driver.rules[.ruleIndex].properties["security-severity"] // null) as $raw_sev |
      (if $prec == "medium" then ($raw_sev // "0" | tonumber) else 10 end) as $sev |
      select(
        ($prec == "high") or ($prec == "very-high") or ($prec == "unknown") or
        ($prec == "medium" and $sev >= 6.0)
      )
    ]
  )
' "$RAW_DIR/results.sarif" > "$RESULTS_DIR/results.sarif"
```
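
For spot-checking a single rule, the same keep/drop decision can be restated outside jq. A minimal sketch that mirrors the filter above; `keep_result` is a hypothetical helper taking a precision string and a `security-severity` value (empty when the rule has none).

```bash
# Hypothetical helper: exit status 0 means keep, 1 means drop.
keep_result() {
  local precision="${1:-unknown}" severity="${2:-0}"
  case "$precision" in
    high|very-high|unknown) return 0 ;;   # always kept
    medium)
      # kept only when security-severity >= 6.0 (a missing score counts as 0)
      awk -v s="$severity" 'BEGIN { exit !(s + 0 >= 6.0) }' ;;
    *) return 1 ;;                        # any other precision is dropped
  esac
}

if keep_result medium 5.0; then echo "keep"; else echo "drop"; fi   # prints "drop"
```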

### references/run-all-suite.md

# Run-All Query Suite

In run-all mode, generate a custom `.qls` query suite file at runtime. This ensures all queries from all installed packs actually execute, avoiding the silent filtering caused by each pack's `defaultSuiteFile`.

## Why a Custom Suite

When you pass a pack name directly to `codeql database analyze` (e.g., `-- codeql/cpp-queries`), CodeQL uses the pack's `defaultSuiteFile` field from `qlpack.yml`. For official packs, this is typically `codeql-suites/<lang>-code-scanning.qls`, which applies strict precision and severity filters. This silently drops many queries and can produce zero results for small codebases.

The run-all suite explicitly references the broadest built-in suite (`security-and-quality`) for official packs and loads third-party packs with minimal filtering.

## Suite Template

Generate this file as `run-all.qls` in the raw output directory (`$OUTPUT_DIR/raw/`) before running analysis:

```yaml
- description: Run-all — all security and quality queries from all installed packs
# Official queries: use security-and-quality suite (broadest built-in suite)
- import: codeql-suites/<CODEQL_LANG>-security-and-quality.qls
  from: codeql/<CODEQL_LANG>-queries
# Third-party packs (include only if installed, one entry per pack)
# - queries: .
#   from: trailofbits/<CODEQL_LANG>-queries
# - queries: .
#   from: GitHubSecurityLab/CodeQL-Community-Packs-<CODEQL_LANG>
# Minimal filtering — only select alert-type queries
- include:
    kind:
      - problem
      - path-problem
- exclude:
    deprecated: //
- exclude:
    tags contain:
      - modeleditor
      - modelgenerator
```

## Generation Script

```bash
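# NOTE: OUTPUT_DIR and CODEQL_LANG must be set before running this script (e.g., CODEQL_LANG=cpp)
# NOTE: INSTALLED_THIRD_PARTY_PACKS must be a space-separated list of pack names (may be empty)
: "${OUTPUT_DIR:?ERROR: OUTPUT_DIR must be set before generating suite}"
: "${CODEQL_LANG:?ERROR: CODEQL_LANG must be set before generating suite}"

RAW_DIR="$OUTPUT_DIR/raw"
SUITE_FILE="$RAW_DIR/run-all.qls"
mkdir -p "$RAW_DIR"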

cat > "$SUITE_FILE" << HEADER
- description: Run-all — all security and quality queries from all installed packs
- import: codeql-suites/${CODEQL_LANG}-security-and-quality.qls
  from: codeql/${CODEQL_LANG}-queries
HEADER

# Add each installed third-party pack
for PACK in $INSTALLED_THIRD_PARTY_PACKS; do
  cat >> "$SUITE_FILE" << PACK_ENTRY
- queries: .
  from: ${PACK}
PACK_ENTRY
done

# Append minimal filtering rules (quoted heredoc — no expansion needed)
cat >> "$SUITE_FILE" << 'FILTERS'
- include:
    kind:
      - problem
      - path-problem
- exclude:
    deprecated: //
- exclude:
    tags contain:
      - modeleditor
      - modelgenerator
FILTERS

# Verify the suite resolves correctly and report how many queries it selects.
# Capture the output first: piping straight into `wc -l` would hide a resolve failure.
if RESOLVED=$(codeql resolve queries "$SUITE_FILE"); then
  echo "Suite generated: $SUITE_FILE ($(printf '%s\n' "$RESOLVED" | grep -c .) queries)"
else
  echo "ERROR: Suite file failed to resolve. Check CODEQL_LANG=$CODEQL_LANG and installed packs." >&2
fi
```

## How This Differs From Important-Only

| Aspect | Run all | Important only |
|--------|---------|----------------|
| Official pack suite | `security-and-quality` (all security + code quality) | All queries loaded, filtered by precision |
| Third-party packs | All `problem`/`path-problem` queries | Only `security`-tagged queries with precision metadata |
| Precision filter | None | high/very-high always; medium only if security-severity >= 6.0 |
| Post-analysis filter | None | Drops medium-precision results with security-severity < 6.0 |

### references/ruleset-catalog.md

# Ruleset Catalog

## Official CodeQL Suites

| Suite | False Positives | Use Case |
|-------|-----------------|----------|
| `security-extended` | Low | **Default** - Security audits |
| `security-and-quality` | Medium | Comprehensive review |
| `security-experimental` | Higher | Research, vulnerability hunting |

**Usage:** `codeql/<lang>-queries:codeql-suites/<lang>-security-extended.qls`

**Languages:** `cpp`, `csharp`, `go`, `java`, `javascript`, `python`, `ruby`, `swift`

---

## Trail of Bits Packs

| Pack | Language | Focus |
|------|----------|-------|
| `trailofbits/cpp-queries` | C/C++ | Memory safety, integer overflows |
| `trailofbits/go-queries` | Go | Concurrency, error handling |
| `trailofbits/java-queries` | Java | Security, code quality |

**Install:**
```bash
codeql pack download trailofbits/cpp-queries
codeql pack download trailofbits/go-queries
codeql pack download trailofbits/java-queries
```

---

## CodeQL Community Packs

| Pack | Language |
|------|----------|
| `GitHubSecurityLab/CodeQL-Community-Packs-JavaScript` | JavaScript/TypeScript |
| `GitHubSecurityLab/CodeQL-Community-Packs-Python` | Python |
| `GitHubSecurityLab/CodeQL-Community-Packs-Go` | Go |
| `GitHubSecurityLab/CodeQL-Community-Packs-Java` | Java |
| `GitHubSecurityLab/CodeQL-Community-Packs-CPP` | C/C++ |
| `GitHubSecurityLab/CodeQL-Community-Packs-CSharp` | C# |
| `GitHubSecurityLab/CodeQL-Community-Packs-Ruby` | Ruby |

**Install:**
```bash
codeql pack download GitHubSecurityLab/CodeQL-Community-Packs-<Lang>
```

**Source:** [github.com/GitHubSecurityLab/CodeQL-Community-Packs](https://github.com/GitHubSecurityLab/CodeQL-Community-Packs)

---

## Verify Installation

```bash
# List all installed packs
codeql resolve qlpacks

# Check specific packs
codeql resolve qlpacks | grep -E "(trailofbits|GitHubSecurityLab)"
```
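
To feed the installed packs into suite generation, the listing can be filtered by language. A minimal sketch, assuming each line of `codeql resolve qlpacks` output starts with the pack name followed by whitespace, and that third-party packs follow the naming shown in the tables above; `third_party_packs_for_lang` is a hypothetical helper.

```bash
# Hypothetical helper: read a pack listing on stdin and print the Trail of Bits
# and Community pack names for one language as a space-separated list.
third_party_packs_for_lang() {
  local lang="$1"
  awk '{ print $1 }' \
    | grep -Ei "^(trailofbits/${lang}-queries|GitHubSecurityLab/CodeQL-Community-Packs-${lang})\$" \
    | paste -sd ' ' -
}

# Usage:
#   INSTALLED_THIRD_PARTY_PACKS=$(codeql resolve qlpacks | third_party_packs_for_lang "$CODEQL_LANG")
```

The match is case-insensitive because the Community pack names capitalize the language (`CPP`, `JavaScript`) while `CODEQL_LANG` values are lowercase.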

### references/threat-models.md

# Threat Models Reference

Control which source categories are active during CodeQL analysis. By default, only `remote` sources are tracked.

## Available Models

| Model | Sources Included | When to Enable | False Positive Impact |
|-------|------------------|----------------|----------------------|
| `remote` | HTTP requests, network input | Always (default). Covers web services, APIs, network-facing code. | Low — these are the most common attack vectors. |
| `local` | Command line args, local files | CLI tools, batch processors, desktop apps where local users are untrusted. | Medium — generates noise for web-only services where CLI args are developer-controlled. |
| `environment` | Environment variables | Apps that read config from env vars at runtime (12-factor apps, containers). Skip for apps that only read env at startup into validated config objects. | Medium — many env reads are startup-only config, not runtime-tainted data. |
| `database` | Database query results | Second-order injection scenarios: stored XSS, data from shared databases where other writers are untrusted. | High — most apps trust their own database. Only enable when auditing for stored/second-order attacks. |
| `file` | File contents | File upload processors, log parsers, config file readers that accept user-provided files. | Medium — triggers on all file reads including trusted config files. |

## Default Behavior

With no `--threat-model` flag, CodeQL uses `remote` only (the `default` group). This is correct for most web applications and APIs. Expanding beyond `remote` is useful when the application's trust boundary extends to local inputs.

## Usage

Enable additional threat models with the `--threat-model` flag (singular, NOT `--threat-models`):

```bash
# Paths are relative to $OUTPUT_DIR; `codeql database analyze` requires --format and --output.
# Web service (default: remote only, no flag needed)
codeql database analyze codeql.db \
  --format=sarif-latest --output=raw/results.sarif \
  -- raw/run-all.qls

# CLI tool: local users can provide malicious input
codeql database analyze codeql.db \
  --threat-model local \
  --format=sarif-latest --output=raw/results.sarif \
  -- raw/run-all.qls

# Container app reading env vars from untrusted orchestrator
codeql database analyze codeql.db \
  --threat-model local --threat-model environment \
  --format=sarif-latest --output=raw/results.sarif \
  -- raw/run-all.qls

# Full coverage: audit mode for all input vectors
codeql database analyze codeql.db \
  --threat-model all \
  --format=sarif-latest --output=raw/results.sarif \
  -- raw/run-all.qls

# Enable all except database (to reduce noise)
codeql database analyze codeql.db \
  --threat-model all --threat-model '!database' \
  --format=sarif-latest --output=raw/results.sarif \
  -- raw/run-all.qls
```

The `--threat-model` flag can be repeated. Each invocation adds (or removes with `!` prefix) a threat model group. The `remote` group is always enabled by default — use `--threat-model '!default'` to disable it (rare). The `all` group enables everything, and `!<name>` disables a specific model.

Multiple models can be combined. Each additional model expands the set of sources CodeQL considers tainted, increasing coverage but potentially increasing false positives. Start with the narrowest set that matches the application's actual threat model, then expand if needed.
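
When the set of models is chosen at run time, the repeated flags can be generated rather than typed out. A minimal sketch; `threat_model_flags` is a hypothetical helper that only formats its arguments and does not validate model names.

```bash
# Hypothetical helper: turn a list of model names into repeated --threat-model flags.
threat_model_flags() {
  local flags="" model
  for model in "$@"; do
    flags="${flags:+$flags }--threat-model $model"
  done
  printf '%s\n' "$flags"
}

threat_model_flags all '!database'   # prints: --threat-model all --threat-model !database

# Usage: leave the substitution unquoted so it splits into separate arguments
#   codeql database analyze "$DB_NAME" $(threat_model_flags local environment) ...
```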

### references/performance-tuning.md

# Performance Tuning

## Memory Configuration

### CODEQL_RAM Environment Variable

Control maximum heap memory (in MB):

```bash
# 48GB for large codebases
CODEQL_RAM=48000 codeql database analyze codeql.db ...

# 16GB for medium codebases
CODEQL_RAM=16000 codeql database analyze codeql.db ...
```

**Guidelines:**
| Codebase Size | Recommended RAM |
|---------------|-----------------|
| Small (<100K LOC) | 4-8 GB |
| Medium (100K-1M LOC) | 8-16 GB |
| Large (1M+ LOC) | 32-64 GB |
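
The table can be turned into a default for `CODEQL_RAM`. A minimal sketch, assuming the upper end of the small and medium ranges and the 48000 MB figure from the example above for large codebases; `ram_for_loc` is a hypothetical helper, and the line count must be supplied by the caller.

```bash
# Hypothetical helper: suggest a CODEQL_RAM value (in MB) from lines of code.
ram_for_loc() {
  local loc="$1"
  if [ "$loc" -lt 100000 ]; then
    echo 8000     # small: top of the 4-8 GB range
  elif [ "$loc" -lt 1000000 ]; then
    echo 16000    # medium: top of the 8-16 GB range
  else
    echo 48000    # large: middle of the 32-64 GB range
  fi
}

# Usage:
#   CODEQL_RAM=$(ram_for_loc 250000) codeql database analyze codeql.db ...
ram_for_loc 250000
```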

## Thread Configuration

### Analysis Threads

```bash
# Use all available cores
codeql database analyze codeql.db --threads=0 ...

# Use specific number
codeql database analyze codeql.db --threads=8 ...
```

**Note:** `--threads=0` uses all available cores. For shared machines, use an explicit count.

## Query-Level Timeouts

Prevent individual queries from running indefinitely:

```bash
# Set the query evaluation timeout (in seconds)
codeql database analyze codeql.db --timeout=600 ...
```

A 10-minute timeout (`600`) catches runaway queries without killing legitimate complex analysis. Taint-tracking queries on large codebases may need longer.

## Evaluator Diagnostics

When analysis is slow, use `--evaluator-log` to identify which queries consume the most time:

```bash
codeql database analyze codeql.db \
  --evaluator-log=evaluator.log \
  --format=sarif-latest \
  --output=results.sarif \
  -- codeql/python-queries:codeql-suites/python-security-extended.qls

# Summarize the log
codeql generate log-summary evaluator.log --format=text
```

The summary shows per-query timing and tuple counts. Queries producing millions of tuples are likely the bottleneck.

## Disk Space

| Phase | Typical Size | Notes |
|-------|-------------|-------|
| Database creation | 2-10x source size | Compiled languages are larger due to build tracing |
| Analysis cache | 1-5 GB | Stored in database directory |
| SARIF output | 1-50 MB | Depends on finding count |

Check available space before starting:

```bash
df -h .
du -sh static_analysis_codeql_*/codeql.db 2>/dev/null
```
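
The database-creation row of the table suggests a pre-flight check. A minimal sketch, assuming the 10x upper bound as the worst case and ignoring the analysis cache and SARIF output; `db_space_ok` is a hypothetical helper.

```bash
# Hypothetical helper: succeed when free space covers the worst-case database size.
# Arguments: source size in KB, available space in KB, growth factor (default 10).
db_space_ok() {
  local src_kb="$1" avail_kb="$2" factor="${3:-10}"
  [ "$avail_kb" -ge $((src_kb * factor)) ]
}

# Usage, run from the project root:
#   src_kb=$(du -sk . | awk '{ print $1 }')
#   avail_kb=$(df -Pk . | awk 'NR == 2 { print $4 }')
#   db_space_ok "$src_kb" "$avail_kb" || echo "WARNING: low disk space for database creation"
```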

## Caching Behavior

CodeQL caches query evaluation results inside the database directory. Subsequent runs of the same queries skip re-evaluation.

| Scenario | Cache Effect |
|----------|-------------|
| Re-run same packs | Fast — uses cached results |
| Add new query pack | Only new queries evaluate |
| `codeql database cleanup` | Clears cache — forces full re-evaluation |
| `--rerun` flag | Ignores cache for this run |

**When to clear cache:**
- After deploying new data extensions (cache may hold stale results)
- When investigating unexpected zero-finding results
- Before benchmark comparisons (ensures consistent timing)

```bash
# Clear evaluation cache
codeql database cleanup codeql.db
```

## Troubleshooting Performance

| Symptom | Likely Cause | Solution |
|---------|--------------|----------|
| OOM during analysis | Not enough RAM | Increase `CODEQL_RAM` |
| Slow database creation | Complex build | Use `--threads`, simplify build |
| Slow query execution | Large codebase | Reduce query scope, add RAM |
| Database too large | Too many files | Use exclusion config (`codeql-config.yml` with `paths-ignore`) |
| Single query hangs | Runaway evaluation | Use `--timeout` and check `--evaluator-log` |
| Repeated runs still slow | Cache not used | Check you're using the same database path |

Originally by Trail of Bits, adapted here as an Agent Skills compatible SKILL.md.
