Replicate

Added March 5, 2026 Source: Replicate

Access and run a vast collection of AI models using the Replicate API. This skill handles finding appropriate models, validating inputs against their schemas, and managing prediction requests to retrieve results. It’s useful for adding diverse machine learning features to your agent without hosting models yourself.

Installation

This skill is self-contained. Copy the SKILL.md below directly into your project to get started.

.claude/skills/replicate/SKILL.md    # Claude Code
.cursor/skills/replicate/SKILL.md    # Cursor

Or install as a personal skill (available across all your projects):

~/.claude/skills/replicate/SKILL.md

You can also install using the skills CLI:

npx skills add replicate/skills --skill replicate

Requires Node.js 18+.

SKILL.md

---
name: replicate
description: Discover, compare, and run AI models using Replicate's API
---

## Docs

- Reference docs: https://replicate.com/docs/llms.txt
- HTTP API schema: https://api.replicate.com/openapi.json
- MCP server: https://mcp.replicate.com
- Set an `Accept: text/markdown` header when requesting docs pages to get a Markdown response.

## Workflow

Here's a common workflow for using Replicate's API to run a model:

1. **Choose the right model** - Search with the API or ask the user
2. **Get model metadata** - Fetch model input and output schema via API
3. **Create prediction** - POST to /v1/predictions
4. **Poll for results** - GET prediction until status is "succeeded"
5. **Return output** - Usually URLs to generated content

## Choosing models

- Use the search and collections APIs to find and compare the best models. Do not list all the models via API, as it's basically a firehose.
- Collections are curated by Replicate staff, so they're vetted.
- Official models are in the "official" collection.
- Use official models because they:
  - are always running
  - have stable API interfaces
  - have predictable output pricing
  - are maintained by Replicate staff
- If you must use a community model, be aware that it can take a long time to boot.
- You can create always-on deployments of community models, but you pay for model uptime.

## Running models

Models take time to run. There are three ways to run a model via API and get its output:

1. Create a prediction, store its id from the response, and poll until completion.
2. Set a `Prefer: wait` header when creating a prediction for a blocking synchronous response. Only recommended for very fast models.
3. Set an HTTPS webhook URL when creating a prediction, and Replicate will POST to that URL when the prediction completes.

Follow these guideliness when running models:

- Use the "POST /v1/predictions" endpoint, as it supports both official and community models.
- Every model has its own OpenAPI schema. Always fetch and check model schemas to make sure you're setting valid inputs. Even popular models change their schemas.
- Validate input parameters against schema constraints (minimum, maximum, enum values). Don't generate values that violate them.
- When unsure about a parameter value, use the model's default example or omit the optional parameter.
- Don't set optional inputs unless you have a reason to. Stick to the required inputs and let the model's defaults do the work.
- Use HTTPS URLs for file inputs whenever possible. You can also send base64-encoded files, but they should be avoided.
- Fire off multiple predictions concurrently. Don't wait for one to finish before starting the next.
- Output file URLs expire after 1 hour, so back them up if you need to keep them, using a service like Cloudflare R2.
- Webhooks are a good mechanism for receiving and storing prediction output.

Originally by Replicate, adapted here as an Agent Skills compatible SKILL.md.

This skill follows the Agent Skills open standard, supported by Claude Code, Cursor, Codex, Gemini CLI, and 20+ more editors.

Works with

Claude Code Cursor Codex CLI Gemini CLI VS Code Windsurf Amp Roo Code Goose Trae OpenCode Spring AI

Agent Skills format — supported by 20+ editors. Learn more