
Python API Reference

fdl exposes the same operations available in the CLI as a Python package, so pipelines and orchestrators (e.g. Dagster, Airflow) can drive DuckLake catalogs without spawning a subprocess.

At a glance

import fdl

fdl.init("mydata", target_url="s3://my-bucket")
fdl.pull("default")

with fdl.connect("default") as conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1), (2), (3)")

fdl.push("default")

Function-to-command mapping

Python API                   CLI
fdl.init(name, ...)          fdl init NAME
fdl.pull(target)             fdl pull TARGET
fdl.push(target)             fdl push TARGET
fdl.run(target, command)     fdl run TARGET -- COMMAND
fdl.sync(target, command)    fdl sync TARGET -- COMMAND
fdl.connect(target)          (use directly via DuckDB)

fdl.connect() has no CLI command of its own: it is a Python-only entry point, which the CLI uses internally to implement fdl sql.

Conventions

  • target is a required positional argument. There is no implicit "default" target; pass the target name explicitly.
  • Each function accepts a project_dir: Path | None keyword. When omitted, fdl walks up from the current working directory to find the nearest fdl.toml, mirroring CLI behavior.
  • Console output (progress lines, conflict detection, etc.) matches the CLI and is written to stdout.
  • fdl.run() and fdl.sync() return the subprocess exit code as an int. They do not raise on non-zero exit; check the return value.
  • fdl.init() is not idempotent: it raises FileExistsError if fdl.toml is already present. Initialize once (typically via the CLI) and commit fdl.toml to the repo; the Python API is for day-to-day operations, not reinitialization.

See the Dagster guide for a worked example of using these APIs inside a Dagster asset.

Reference

fdl

Frozen DuckLake: manage DuckLake catalogs on object storage.

fdl is both a CLI (fdl) and a Python API. The CLI handlers are thin wrappers over the same internal functions exposed as the Python API, so the two stay in sync by construction.

Example

import fdl

fdl.pull("default")
with fdl.connect("default") as conn:
    conn.execute("CREATE TABLE cities (name VARCHAR, pop INTEGER)")
fdl.push("default")

init

init(name: str, *, target_name: str = 'default', target_url: str | None = None, public_url: str = 'http://localhost:4001', project_dir: Path | None = None) -> None

Initialize an fdl project (CLI: fdl init).

Writes fdl.toml and creates .fdl/{target_name}/ducklake.sqlite. If initialization fails, any partially created fdl.toml or .fdl/ directory is rolled back.

PARAMETER DESCRIPTION
name

Datasource name. Must be a valid SQL identifier.

TYPE: str

target_name

Target name in fdl.toml.

TYPE: str DEFAULT: 'default'

target_url

Storage URL for the target. Defaults to default_target_url().

TYPE: str | None DEFAULT: None

public_url

Public URL the dataset will be served under.

TYPE: str DEFAULT: 'http://localhost:4001'

project_dir

Directory to initialize in. Defaults to Path.cwd().

TYPE: Path | None DEFAULT: None

RAISES DESCRIPTION
ValueError

If name is not a valid SQL identifier.

FileExistsError

If fdl.toml already exists in the project.
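The checks and rollback described for init can be sketched with the standard library. This is a hypothetical stand-in, not fdl's code: init_sketch is an illustrative name, the identifier rule is an assumption (an unquoted SQL identifier), the fdl.toml contents are a placeholder because the real layout is not shown on this page, and the empty ducklake.sqlite file stands in for the real DuckLake catalog that fdl creates.

```python
import re
import shutil
from pathlib import Path

# Assumption: "valid SQL identifier" is read as an unquoted identifier
# (letter or underscore, then letters, digits, underscores). fdl may be
# stricter, for example by also rejecting reserved words.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def init_sketch(name: str, project_dir: Path, target_name: str = "default") -> None:
    """Hypothetical outline of fdl.init's checks and rollback."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"{name!r} is not a valid SQL identifier")
    toml_path = project_dir / "fdl.toml"
    if toml_path.exists():
        raise FileExistsError(toml_path)

    fdl_dir = project_dir / ".fdl"
    fdl_dir_existed = fdl_dir.exists()
    try:
        # Placeholder contents: not the real fdl.toml layout.
        toml_path.write_text(f'name = "{name}"\n')
        target_dir = fdl_dir / target_name
        target_dir.mkdir(parents=True, exist_ok=True)
        # Placeholder: fdl creates a real DuckLake catalog at this path.
        (target_dir / "ducklake.sqlite").touch()
    except BaseException:
        # Remove fdl.toml, and .fdl/ if this call created it, then re-raise.
        toml_path.unlink(missing_ok=True)
        if not fdl_dir_existed:
            shutil.rmtree(fdl_dir, ignore_errors=True)
        raise
```

Checking for an existing fdl.toml before writing anything is what makes the FileExistsError safe: a second call fails without touching the files the first call created.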

sync

sync(target: str, command: list[str] | None = None, *, force: bool = False, project_dir: Path | None = None) -> int

Run command then push in one step (CLI: fdl sync).

When the command exits non-zero, push is skipped and the exit code is returned as-is.

PARAMETER DESCRIPTION
target

Target name defined in fdl.toml.

TYPE: str

command

Command to run. When None, reads the command field from fdl.toml.

TYPE: list[str] | None DEFAULT: None

force

Override conflict detection on push.

TYPE: bool DEFAULT: False

project_dir

Project directory containing fdl.toml. Defaults to the nearest ancestor that contains one.

TYPE: Path | None DEFAULT: None

RETURNS DESCRIPTION
int

The subprocess exit code (0 if both run and push succeeded).

RAISES DESCRIPTION
FileNotFoundError

If fdl.toml cannot be located.

ValueError

If target is not defined in fdl.toml, or if command is None and no command is set in fdl.toml.

PushConflictError

When the remote has been updated since the last pull (only when force=False).
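The documented sync contract (run the command, skip the push on a non-zero exit, return the exit code unchanged) can be outlined with injected callables. sync_sketch and check_exit are hypothetical names; run and push stand in for fdl's internals and are not part of its API. check_exit shows the caller-side check that the conventions require, since fdl.run() and fdl.sync() do not raise on a non-zero exit.

```python
from collections.abc import Callable, Sequence


def sync_sketch(
    run: Callable[[Sequence[str]], int],
    push: Callable[[], None],
    command: Sequence[str],
) -> int:
    """Hypothetical outline of the documented sync behavior, not fdl's code."""
    exit_code = run(command)
    if exit_code != 0:
        # Documented behavior: a failing command skips the push and the
        # exit code is returned as-is.
        return exit_code
    # A push failure (e.g. a conflict) propagates as an exception.
    push()
    return 0


def check_exit(exit_code: int, what: str) -> None:
    """Raise on a non-zero exit code, since fdl.run/fdl.sync only return it."""
    if exit_code != 0:
        raise RuntimeError(f"{what} exited with code {exit_code}")
```

In a pipeline step, the check might wrap the real call, e.g. check_exit(fdl.sync("default", ["python", "etl.py"]), "fdl sync"), where etl.py is a hypothetical script. Raising there lets the orchestrator mark the step as failed instead of silently continuing.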

connect

connect(target: str, *, project_dir: Path | None = None) -> Iterator[duckdb.DuckDBPyConnection]

Open a DuckDB connection with the DuckLake catalog attached.

The datasource (named by the name field in fdl.toml) is attached and selected via USE, so table references can be unqualified:

with fdl.connect("default") as conn:
    conn.execute("CREATE TABLE cities (name VARCHAR, pop INTEGER)")
    rows = conn.execute("SELECT * FROM cities").fetchall()

PARAMETER DESCRIPTION
target

Target name defined in fdl.toml.

TYPE: str

project_dir

Project directory containing fdl.toml. Defaults to the nearest ancestor that contains one.

TYPE: Path | None DEFAULT: None

YIELDS DESCRIPTION
DuckDBPyConnection

A DuckDB connection with the DuckLake catalog attached.

RAISES DESCRIPTION
FileNotFoundError

If fdl.toml or the local catalog file cannot be located. Run fdl.init or fdl.pull first.

ValueError

If target is not defined in fdl.toml.
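To illustrate what "attached and selected via USE" means, the sketch below builds the kind of SQL a connect-like helper could run. It assumes DuckLake's ATTACH syntax with a SQLite metadata catalog (the ducklake:sqlite: prefix and the DATA_PATH option); attach_statements is a hypothetical name, and fdl's actual statements and options may differ. The data path is passed in directly because the real value comes from ducklake_data_path(), whose rules are not described here.

```python
def attach_statements(catalog_path: str, datasource: str, data_path: str) -> list[str]:
    """Hypothetical helper: SQL that a connect-like function could execute."""

    def quote_literal(value: str) -> str:
        # Escape single quotes for use inside a SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    def quote_ident(value: str) -> str:
        # Escape double quotes for use as a quoted SQL identifier.
        return '"' + value.replace('"', '""') + '"'

    return [
        "INSTALL ducklake",
        "LOAD ducklake",
        f"ATTACH {quote_literal('ducklake:sqlite:' + catalog_path)} "
        f"AS {quote_ident(datasource)} (DATA_PATH {quote_literal(data_path)})",
        # USE makes bare table names resolve inside the attached catalog.
        f"USE {quote_ident(datasource)}",
    ]
```

With DuckDB installed, each statement could be passed to DuckDBPyConnection.execute() in order; after the final USE, a query such as SELECT * FROM cities resolves against the DuckLake catalog without a datasource prefix.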

default_target_url

default_target_url() -> str

Return the default target URL: $XDG_DATA_HOME/fdl when XDG_DATA_HOME is set, otherwise ~/.local/share/fdl.

Returns a display-friendly path using ~ when under the home directory.

fdl_target_dir

fdl_target_dir(target_name: str) -> Path

Return the target-specific directory under .fdl/ (for example, .fdl/default for the target named default).

ducklake_data_path

ducklake_data_path(catalog_url: str) -> str

Derive DuckLake DATA_PATH from a catalog URL or path.