Tinybird Best Practices

Added March 5, 2026 · Source: Tinybird

Apply Tinybird best practices for file formats, SQL rules, and data modeling. This skill guides you through creating datasources, pipes, and endpoints, ensuring your SQL queries are optimized and your schemas are well-designed. It also includes patterns for materialized views and data deduplication.

Installation

This skill is self-contained. Copy the SKILL.md below directly into your project to get started.

.claude/skills/tinybird-best-practices/SKILL.md    # Claude Code
.cursor/skills/tinybird-best-practices/SKILL.md    # Cursor

Or install as a personal skill (available across all your projects):

~/.claude/skills/tinybird-best-practices/SKILL.md

You can also install using the skills CLI:

npx skills add tinybirdco/tinybird-agent-skills --skill tinybird

Requires Node.js 18+.

SKILL.md

---
name: tinybird
description: Tinybird file formats, SQL rules, optimization patterns, and best practices for datasources, pipes, endpoints, and materialized views.
---

# Tinybird Best Practices

Guidance for Tinybird file formats, SQL rules, optimization patterns, and data modeling. Use this skill when creating or editing Tinybird datafiles.

## When to Apply

- Creating or updating Tinybird resources (.datasource, .pipe, .connection)
- Writing or optimizing SQL queries
- Designing endpoint schemas and data models
- Working with materialized views or copy pipes
- Implementing deduplication patterns
- Reviewing or refactoring Tinybird project files

## Rule Files

- `rules/project-files.md`
- `rules/datasource-files.md`
- `rules/pipe-files.md`
- `rules/endpoint-files.md`
- `rules/materialized-files.md`
- `rules/sink-files.md`
- `rules/copy-files.md`
- `rules/connection-files.md`
- `rules/sql.md`
- `rules/endpoint-optimization.md` ([source](https://raw.githubusercontent.com/tinybirdco/tinybird-agent-skills/main/skills/tinybird-best-practices/rules/endpoint-optimization.md))
- `rules/tests.md`
- `rules/deduplication-patterns.md`

## Quick Reference

- Project local files are the source of truth.
- SQL is SELECT-only with Tinybird templating rules and strict parameter handling.
- Use MergeTree by default; AggregatingMergeTree for materialized targets.
- Filter early, select only needed columns, push complex work later in the pipeline.


---

## Companion Files

The following reference files are included for convenience:

### rules/project-files.md

# Project Files

## Project Root

- By default, create a `tinybird/` folder at the project root and nest Tinybird folders under it.
- Ensure the `.tinyb` credentials file sits in the directory from which CLI commands are run.

## tb info

Use `tb info` to confirm CLI context, especially for credentials issues.

It reports information about both the Local and Cloud environments:
- Where the CLI is loading the `.tinyb` file from
- The currently logged-in workspace
- API URL
- UI URL
- ClickHouse HTTP interface URL

## File Locations

Default locations (use these unless the project uses a different structure):

- Endpoints: `/endpoints`
- Materialized pipes: `/materializations`
- Sink pipes: `/sinks`
- Copy pipes: `/copies`
- Connections: `/connections`
- Datasources: `/datasources`
- Fixtures: `/fixtures`

## File-Specific Rules

See these rule files for detailed requirements:

- `rules/datasource-files.md`
- `rules/pipe-files.md`
- `rules/endpoint-files.md`
- `rules/materialized-files.md`
- `rules/sink-files.md`
- `rules/copy-files.md`
- `rules/connection-files.md`

After making changes in the project files, check `rules/build-deploy.md` for next steps.

### rules/datasource-files.md

# Datasource Files

- Content cannot be empty.
- Datasource names must be unique.
- No indentation for property names (DESCRIPTION, SCHEMA, ENGINE, etc.).
- Use MergeTree by default.
- Use AggregatingMergeTree for materialized targets.
- Always use JSON paths for schema (example: `user_id` String `json:$.user_id`).
- Array syntax: `items` Array(String) `json:$.items[:]`.
- DateTime64 requires precision (use DateTime64(3)).
- Only include ENGINE_PARTITION_KEY and ENGINE_PRIMARY_KEY when explicitly requested.
- Import configuration:
  - S3/GCS: set IMPORT_CONNECTION_NAME, IMPORT_BUCKET_URI, IMPORT_SCHEDULE (GCS supports @on-demand only, S3 supports @auto).
  - Kafka: set KAFKA_CONNECTION_NAME, KAFKA_TOPIC, KAFKA_GROUP_ID.
- For landing datasources created from a .ndjson file with no schema specified, use:
  - `SCHEMA >`
  - `` `data` String `json:$` ``

Example:

```
DESCRIPTION >
    Some meaningful description of the datasource

SCHEMA >
    `column_name_1` Type `json:$.column_name_1`,
    `column_name_2` Type `json:$.column_name_2`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "partition_key"
ENGINE_SORTING_KEY "sorting_key_1, sorting_key_2"
```

## Updating Datasource Schemas (Cloud)

If a schema change is incompatible with the deployed Cloud datasource, add a `FORWARD_QUERY` to transform data to the new schema. The query is a SELECT list only (no FROM/WHERE). Use `accurateCastOrDefault` for lossy conversions.

Example:

```
FORWARD_QUERY >
    SELECT timestamp, CAST(session_id, 'UUID') as session_id, action, version, payload
```

## Sharing Datasources

```
SHARED_WITH >
    destination_workspace,
    other_destination_workspace
```

Limitations:
- Shared datasources are read-only.
- You cannot share a shared datasource.
- You cannot create a materialized view from a shared datasource.

### rules/pipe-files.md

# Pipe Files (General)

- Pipe names must be unique.
- Node names must differ from the pipe name and any resource name.
- No indentation for property names (DESCRIPTION, NODE, SQL, TYPE, etc.).
- Allowed TYPE values: endpoint, copy, materialized, sink.
- Mark the output node by placing the TYPE declaration after it (it should be the last node in the file).

Example:

```
DESCRIPTION >
    Some meaningful description of the pipe

NODE node_1
SQL >
    SELECT ...
TYPE endpoint
```

### rules/endpoint-files.md

# Endpoint Files

Endpoint files are `.pipe` files with `TYPE endpoint` and should live under `/endpoints`.

- Follow all general pipe rules.
- Ensure SQL follows Tinybird SQL rules (templating, SELECT-only, parameters).
- Include the output node in TYPE or in the last node.

Example:

```
DESCRIPTION >
    Some meaningful description of the endpoint

NODE endpoint_node
SQL >
    SELECT ...
TYPE endpoint
```

## Endpoint URLs

- Run `tb endpoint ls` to list all endpoints and their URLs.
- Include dynamic parameters when needed.
- Date formats:
  - DateTime64: `YYYY-MM-DD HH:MM:SS.MMM`
  - DateTime: `YYYY-MM-DD HH:MM:SS`
  - Date: `YYYYMMDD`
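For example, a parameterized endpoint call might look like this (the endpoint name, host, token, and parameters are placeholders, not values from this project):

```
curl "https://api.tinybird.co/v0/pipes/my_endpoint.json?token=<token>&date_from=2024-01-01&date_to=2024-01-01"
```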

## OpenAPI definitions

- Fetch `<api_base_url>/v0/pipes/openapi.json?token=<token>` (for example with curl) to get the OpenAPI definition for all endpoints.

### rules/materialized-files.md

# Materialized Pipe Files

- Do not create by default unless requested.
- Create under `/materializations`.
- Use TYPE MATERIALIZED and set DATASOURCE to the target datasource.
- Use State modifiers in the pipe; use AggregateFunction in the target datasource.
- Use Merge modifiers when reading AggregateFunction columns.
- Put all dimensions in ENGINE_SORTING_KEY, ordered from least to most cardinality.

Example:

```
NODE daily_sales
SQL >
    SELECT toStartOfDay(starting_date) day, country, sumState(sales) as total_sales
    FROM teams
    GROUP BY day, country

TYPE MATERIALIZED
DATASOURCE sales_by_day
```

Target datasource example:

```
SCHEMA >
    `total_sales` AggregateFunction(sum, Float64),
    `sales_count` AggregateFunction(count, UInt64),
    `dimension_1` String,
    `dimension_2` String,
    `date` DateTime

ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(date)"
ENGINE_SORTING_KEY "date, dimension_1, dimension_2"
```
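Reading from the target requires the matching -Merge combinators. A sketch, assuming the schema above lives in a datasource named `sales_agg` (the name is illustrative):

```sql
SELECT date, dimension_1, dimension_2,
    sumMerge(total_sales) as total_sales,
    countMerge(sales_count) as sales_count
FROM sales_agg
GROUP BY date, dimension_1, dimension_2
```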

## Common gotchas

- Materialized Views act as insert triggers: deleting or truncating the original Data Source does not affect the materialized Data Source.
- Because each block of inserted rows is transformed independently, operations such as GROUP BY, ORDER BY, DISTINCT, and LIMIT need an engine that can merge partial results, such as AggregatingMergeTree or SummingMergeTree.
- A Materialized View built with a JOIN updates only when new rows arrive in the Data Source in the FROM clause; inserts into the joined Data Source do not trigger it.

### rules/sink-files.md

# Sink Pipe Files

- Do not create by default unless requested.
- Create under `/sinks`.
- Valid external systems: Kafka, S3, GCS.
- Sink pipes depend on a connection; reuse existing connections when possible.
- Do not include EXPORT_SCHEDULE unless explicitly requested.
- Use TYPE SINK and set EXPORT_CONNECTION_NAME.

Example:

```
DESCRIPTION Sink pipe that exports hourly sales every hour using my_connection

NODE daily_sales
SQL >
    %
    SELECT toStartOfDay(starting_date) day, country, sum(sales) as total_sales
    FROM teams
    WHERE day BETWEEN toStartOfDay(now()) - interval 1 day AND toStartOfDay(now())
    AND country = {{ String(country, 'US') }}
    GROUP BY day, country

TYPE sink
EXPORT_CONNECTION_NAME "my_connection"
EXPORT_BUCKET_URI "s3://tinybird-sinks"
EXPORT_FILE_TEMPLATE "daily_prices"
EXPORT_SCHEDULE "0 * * * *"
EXPORT_FORMAT "csv"
EXPORT_COMPRESSION "gz"
EXPORT_STRATEGY "truncate"
```

### rules/copy-files.md

# Copy Pipe Files

- Do not create by default unless requested.
- Create under `/copies`.
- Do not include COPY_SCHEDULE unless explicitly requested.
- Use TYPE COPY and TARGET_DATASOURCE.
- The default `COPY_MODE` is `append`, but it is better to set it explicitly; the other option is `replace`.

Example:

```
DESCRIPTION Copy pipe that exports hourly sales every hour to the sales_hour_copy Data Source

NODE daily_sales
SQL >
    %
    SELECT toStartOfDay(starting_date) day, country, sum(sales) as total_sales
    FROM teams
    WHERE day BETWEEN toStartOfDay(now()) - interval 1 day AND toStartOfDay(now())
    AND country = {{ String(country, 'US') }}
    GROUP BY day, country

TYPE COPY
TARGET_DATASOURCE sales_hour_copy
COPY_SCHEDULE 0 * * * *
COPY_MODE append
```

### rules/connection-files.md

# Connection Files

- Content cannot be empty.
- Connection names must be unique.
- No indentation for property names.
- Supported types: kafka, gcs, s3.
- If user requests an unsupported type, report it and do not create it.

Kafka example:
```
TYPE kafka
KAFKA_BOOTSTRAP_SERVERS {{ tb_secret("PRODUCTION_KAFKA_SERVERS", "localhost:9092") }}
KAFKA_SECURITY_PROTOCOL SASL_SSL
KAFKA_SASL_MECHANISM PLAIN
KAFKA_KEY {{ tb_secret("PRODUCTION_KAFKA_USERNAME", "") }}
KAFKA_SECRET {{ tb_secret("PRODUCTION_KAFKA_PASSWORD", "") }}
```

S3 example:
```
TYPE s3
S3_REGION {{ tb_secret("PRODUCTION_S3_REGION", "") }}
S3_ARN {{ tb_secret("PRODUCTION_S3_ARN", "") }}
```

GCS service account example:
```
TYPE gcs
GCS_SERVICE_ACCOUNT_CREDENTIALS_JSON {{ tb_secret("PRODUCTION_GCS_SERVICE_ACCOUNT_CREDENTIALS_JSON", "") }}
```

GCS HMAC example:
```
TYPE gcs
GCS_HMAC_ACCESS_ID {{ tb_secret("gcs_hmac_access_id") }}
GCS_HMAC_SECRET {{ tb_secret("gcs_hmac_secret") }}
```

### rules/sql.md

# SQL Rules

## Core Principles

1. Filter early and read as little data as possible.
2. Select only needed columns.
3. Do complex work later in the pipeline.
4. Prefer ClickHouse functions; only supported functions are allowed.

## Query Requirements

- SQL must be valid ClickHouse SQL with Tinybird templating (Tornado).
- Only SELECT statements are allowed.
- Avoid CTEs; use nodes or subqueries instead.
- Do not use system tables (system.tables, system.datasources, information_schema.tables).
- Do not use CREATE/INSERT/DELETE/TRUNCATE or currentDatabase().

## Parameter and Templating Rules

- If parameters are used, the query must start with `%` on its own line.
- Parameter functions: String, DateTime, Date, Float32, Float64, Int, Integer, UInt8, UInt16, UInt32, UInt64, UInt128, UInt256, Int8, Int16, Int32, Int64, Int128, Int256.
- Parameter names must be different from column names.
- Default values must be hardcoded.
- Parameters are never quoted.
- In `defined()` checks, do not quote the parameter name.

Bad:
```
SELECT * FROM events WHERE session_id={{String(my_param, "default")}}
```

Good:
```
%
SELECT * FROM events WHERE session_id={{String(my_param, "default")}}
```
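For optional parameters, `defined()` checks follow the same rules; note that the parameter name is not quoted (the `events` table and `country` parameter are illustrative):

```
%
SELECT * FROM events
WHERE 1 = 1
{% if defined(country) %}
    AND country = {{String(country)}}
{% end %}
```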

## Join and Aggregation Rules

- Filter before JOINs and GROUP BY.
- Avoid joining tables with >1M rows without filtering.
- Avoid nested aggregates; use subqueries instead.
- Use AggregateFunction columns with -Merge combinators.
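For example, instead of a nested aggregate such as `max(count())`, compute the inner aggregate in a subquery (a sketch over a hypothetical `events` table):

```sql
SELECT max(events_per_user) as max_events
FROM (
    SELECT user_id, count() as events_per_user
    FROM events
    GROUP BY user_id
)
```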

## Operation Order

1. WHERE filters
2. Select needed columns
3. JOIN
4. GROUP BY / aggregates
5. ORDER BY
6. LIMIT
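A query following this order might look like the sketch below (the `events` and `users` tables, columns, and parameter are illustrative):

```sql
%
SELECT e.user_id, u.plan, count() as event_count
FROM (
    -- 1-2. filter early and select only needed columns
    SELECT user_id
    FROM events
    WHERE timestamp >= {{DateTime(date_from, '2024-01-01 00:00:00')}}
) e
-- 3. join the already-filtered data
JOIN users u ON u.user_id = e.user_id
-- 4-6. aggregate, order, limit
GROUP BY e.user_id, u.plan
ORDER BY event_count DESC
LIMIT 100
```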

## External Tables

Iceberg:
```
FROM iceberg('s3://bucket/path/to/table', {{tb_secret('aws_access_key_id')}}, {{tb_secret('aws_secret_access_key')}})
```

Postgres:
```
FROM postgresql({{ tb_secret("db_host_port") }}, 'database', 'table', {{tb_secret('db_username')}}, {{tb_secret('db_password')}}, 'schema_optional')
```

Do not split host and port into multiple secrets.

### rules/tests.md

# Tests

- Test file name must match the pipe name.
- Scenario names must be unique inside a test file.
- Parameters format: `param1=value1&param2=value2`.
- Preserve case and formatting when user provides parameters.
- If no parameters, create a single test with empty parameters.
- Use fixture data for expected results; do not query endpoints or SQL to infer data.
- Before creating tests, analyze fixture files used by the endpoint tables.
- `expected_result` should always be an empty string; the tool fills it.
- Only create tests when explicitly requested (e.g. "Create tests for this endpoint").
- If asked to "test" or "call" an endpoint, call the endpoint instead of creating tests.

Test format:

```
- name: kpis_single_day
  description: Test hourly granularity for a single day
  parameters: date_from=2024-01-01&date_to=2024-01-01
  expected_result: ''
```

### rules/deduplication-patterns.md

# Deduplication and Lambda Architecture

Strategies for handling duplicates and combining batch with real-time processing.

## Deduplication Strategy Selection

| Strategy | When to use |
|----------|-------------|
| Query-time (`argMax`, `LIMIT BY`, subquery) | Prototyping or small datasets |
| ReplacingMergeTree | Large datasets, need latest row per key |
| Periodic snapshots (Copy Pipes) | Freshness not critical, need rollups or different sorting keys |
| Lambda architecture | Need freshness + complex transformations that MVs can't handle |

For dimensional/small tables, periodic full replace is usually best.

## Query-time Deduplication

```sql
-- argMax: get latest value per key
SELECT post_id, argMax(views, updated_at) as views
FROM posts GROUP BY post_id

-- LIMIT BY
SELECT * FROM posts ORDER BY updated_at DESC LIMIT 1 BY post_id

-- Subquery
SELECT * FROM posts WHERE (post_id, updated_at) IN (
    SELECT post_id, max(updated_at) FROM posts GROUP BY post_id
)
```

## ReplacingMergeTree

```
ENGINE "ReplacingMergeTree"
ENGINE_SORTING_KEY "unique_id"
ENGINE_VER "updated_at"
ENGINE_IS_DELETED "is_deleted"  # optional, UInt8: 1=deleted, 0=active
```

- Always query with `FINAL` or use an alternative deduplication method
- Deduplication happens during merges (asynchronous, uncontrollable)
- **Do not** build AggregatingMergeTree MVs on top of ReplacingMergeTree—MVs only see incoming blocks, not merged state, so duplicates persist

```sql
SELECT * FROM posts FINAL WHERE post_id = {{Int64(post_id)}}
```

## Snapshot-based Deduplication (Copy Pipes)

Use Copy Pipes when:
- ReplacingMergeTree + FINAL is too slow
- You need different sorting keys that change with updates
- You need downstream Materialized Views for rollups

Choosing `COPY_MODE`:
- The default is `append`.
- Use `replace` for full refreshes when the table is not massive and you cannot control when duplicates occur.
- Keep `append` when you control duplicate generation and can process incrementally.

```
NODE generate_snapshot
SQL >
    SELECT post_id, argMax(views, updated_at) as views, max(updated_at) as updated_at
    FROM posts_raw
    GROUP BY post_id

TYPE COPY
TARGET_DATASOURCE posts_snapshot
COPY_SCHEDULE 0 * * * *
COPY_MODE replace
```

## Lambda Architecture

Combine batch snapshots with real-time queries when:
- Aggregating over ReplacingMergeTree (MVs only see incoming blocks, not merged state, so they fail here)
- Window functions require full table scans
- Handling CDC workloads
- `uniqState` performance is problematic
- Endpoints require JOINs at query time

### Pattern

1. **Batch layer**: a Copy Pipe creates periodic deduplicated snapshots or intermediate tables.
2. **Real-time layer**: query fresh data since the last snapshot.
3. **Serving layer**: UNION ALL combines both.

```sql
SELECT * FROM posts_snapshot
UNION ALL
SELECT post_id, argMax(views, updated_at) as views, max(updated_at) as updated_at
FROM posts_raw
WHERE updated_at > (SELECT max(updated_at) FROM posts_snapshot)
GROUP BY post_id
```

### Freshness vs Cost Trade-off

- More frequent Copy Pipe runs = fresher snapshots but higher cost
- Less frequent = stale batch layer but real-time layer covers the gap
- Balance based on query patterns and data volume

## argMax with Null Values

**Warning**: `argMaxMerge` prefers non-null values over null, even with lower timestamps.

Workaround—convert nulls to epoch before aggregation:
```sql
SELECT post_id,
    argMaxState(CASE WHEN flagged_at IS NULL THEN toDateTime('1970-01-01 00:00:00') ELSE flagged_at END, updated_at) as flagged_at
FROM posts
GROUP BY post_id
```

Handle the sentinel value in downstream queries.
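Downstream, the sentinel can be converted back to NULL with `nullIf` (a sketch; `posts_agg` stands for a hypothetical materialized target holding the `argMaxState` column):

```sql
SELECT post_id,
    nullIf(argMaxMerge(flagged_at), toDateTime('1970-01-01 00:00:00')) as flagged_at
FROM posts_agg
GROUP BY post_id
```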

Originally by Tinybird, adapted here as an Agent Skills-compatible SKILL.md.

This skill follows the Agent Skills open standard, supported by Claude Code, Cursor, Codex, Gemini CLI, and 20+ more editors.
