Mold Scripting
Mold scripts are written in the Python subset supported by Monty, a Rust implementation of Python's core semantics. No system Python required.
The transform function
Your script must define a function named transform that receives data and returns the result. Declare only the parameters you need: args, env, and headers are passed as keyword arguments:
# Minimal: data only
def transform(data, **_):
    return [item for item in data if item["active"]]
# With args
def transform(data, args, **_):
    return [item for item in data if item["age"] > int(args["min_age"])]
# Full signature: all parameters
def transform(data, args, env, headers, pipeline):
    # process data
    return data
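To see how the signatures above behave, the sketch below simulates a call outside fimod. The `call_mold` helper, the function names `minimal` and `with_args`, and the sample values are hypothetical stand-ins, assuming every optional parameter is passed by keyword as described; they are not part of fimod's API.

```python
# Hypothetical harness: simulates fimod passing the optional parameters
# by keyword. Inside fimod you only write the transform function.
def call_mold(transform, data):
    return transform(data, args={"min_age": "30"}, env={}, headers=None, pipeline=None)


def minimal(data, **_):
    # **_ swallows args, env, headers and pipeline
    return [item for item in data if item["active"]]


def with_args(data, args, **_):
    # args is picked out by name; the rest still lands in **_
    return [item for item in data if item["age"] > int(args["min_age"])]


people = [
    {"name": "Ada", "age": 36, "active": True},
    {"name": "Bob", "age": 25, "active": False},
]

print(call_mold(minimal, people))    # keeps Ada only
print(call_mold(with_args, people))  # keeps Ada only
```

Note the `int(...)` conversion: the sample passes `min_age` as a string, on the assumption that values coming from the command line arrive as text.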
Inline expressions (-e)
For quick one-liners, use -e. The expression receives data and its return value is the output:
fimod s -i data.json -e '[u for u in data if u["active"]]'
fimod s -i data.json -e '{"count": len(data)}'
Multi-statement expressions need an explicit def transform:
fimod s -i data.json -e '
def transform(data, args, env, headers):
    result = []
    for item in data:
        result.append(item["name"].upper())
    return result
'
Data types
Data arrives as standard Python types:
| Type | When |
|---|---|
| dict {} | JSON objects, YAML/TOML mappings |
| list [] | JSON arrays, CSV datasets, NDJSON lines |
| str | TXT input (raw content), Lines elements |
| int, float, bool, None | Primitives |
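Because the top-level type depends on the input format, a mold meant to accept several formats can branch on it with isinstance. The following is a minimal sketch; the returned "kind"/"size" structure is purely illustrative.

```python
def transform(data, **_):
    # JSON object, YAML or TOML mapping
    if isinstance(data, dict):
        return {"kind": "mapping", "size": len(data)}
    # JSON array, CSV rows, NDJSON lines
    if isinstance(data, list):
        return {"kind": "sequence", "size": len(data)}
    # TXT input arrives as one raw string
    if isinstance(data, str):
        return {"kind": "text", "size": len(data)}
    # int, float, bool or None
    return {"kind": "primitive", "size": None}


print(transform({"a": 1}))
print(transform([1, 2, 3]))
print(transform("hello"))
```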
Built-in functions
fimod injects a set of helpers into every mold, so no import is needed. See Built-ins Reference for complete signatures.
Regex (re_*)
Powered by fancy-regex, which supports lookahead, lookbehind, backreferences, and atomic groups. This is not Python's re module; see Built-ins Reference for the full syntax differences.
# Find all email addresses
def transform(data, args, env, headers):
    return re_findall(r"\w+@\w+\.\w+", data["text"])
# Clean whitespace
def transform(data, args, env, headers):
    return {"cleaned": re_sub(r"\s+", " ", data["text"])}
# Split on multiple delimiters
def transform(data, args, env, headers):
    return re_split(r"[,;]\s*", data["tags"])
# Lookahead: extract usernames from emails
def transform(data, args, env, headers):
    return re_findall(r"\w+(?=@)", data["text"])
# Capture groups: extract structured data
def transform(data, args, env, headers):
    m = re_search(r"(?P<user>\w+)@(?P<domain>\w+)", data["email"])
    if m:
        return {"user": m["groups"][0], "domain": m["named"]["domain"]}
    return None
# Replacement with group references: Python syntax (\1, \g<name>)
def transform(data, args, env, headers):
    return re_sub(r"(\w+)@(\w+)", r"\2/\1", data["text"])
# Named group replacement
def transform(data, args, env, headers):
    return re_sub(r"(?P<user>\w+)@(?P<domain>\w+)", r"\g<domain>/\g<user>", data["text"])
# Replace only first N occurrences (count argument)
def transform(data, args, env, headers):
    return re_sub(r"\d+", "X", data["text"], 1)  # replace first match only
Available: re_search · re_match · re_findall · re_sub · re_split
And their _fancy counterparts: re_search_fancy · re_match_fancy · re_findall_fancy · re_sub_fancy · re_split_fancy
Two syntaxes for replacements
re_sub uses Python re syntax: \1, \2, \g<name>.
re_sub_fancy uses fancy-regex syntax: $1, $2, ${name}.
For all other functions (re_search, re_match, re_findall, re_split), the _fancy variants are identical; they are provided for API consistency in fancy-mode molds.
Dotpath (dp_*)
Navigate and mutate nested structures without chained dict/array accesses:
def transform(data, args, env, headers):
    city = dp_get(data, "user.address.city")
    country = dp_get(data, "user.address.country", "unknown")  # with default
    last = dp_get(data, "items.-1")  # negative index = from end
    return {"city": city, "country": country}
# dp_set returns a new deep copy; the original is unchanged
def transform(data, args, env, headers):
    data = dp_set(data, "meta.processed", True)
    data = dp_set(data, "config.db.host", "localhost")
    return data
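For intuition, the lookup side can be approximated in plain Python. The `dotpath_get` function below is a hypothetical stand-in, not fimod's implementation. It assumes numeric segments index into lists, other segments are dict keys, and a missing path yields the default (None when omitted); the real dp_get may differ in edge cases.

```python
def dotpath_get(data, path, default=None):
    """Rough pure-Python approximation of a dotpath lookup (illustrative)."""
    current = data
    for segment in path.split("."):
        if isinstance(current, list):
            # Numeric segments (including negatives like "-1") index lists
            try:
                current = current[int(segment)]
            except (ValueError, IndexError):
                return default
        elif isinstance(current, dict):
            if segment not in current:
                return default
            current = current[segment]
        else:
            # Cannot descend into a primitive
            return default
    return current


doc = {"user": {"address": {"city": "Lyon"}}, "items": ["a", "b", "c"]}
print(dotpath_get(doc, "user.address.city"))                # Lyon
print(dotpath_get(doc, "user.address.country", "unknown"))  # unknown
print(dotpath_get(doc, "items.-1"))                         # c
```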
Iteration helpers (it_*)
Convenience functions for common list/dict operations:
# Group by field name (string, not lambda!)
def transform(data, args, env, headers):
    return it_group_by(data, "department")
# Sort by field
def transform(data, args, env, headers):
    return it_sort_by(data, "age")
# Deduplicate by field (keeps first occurrence)
def transform(data, args, env, headers):
    return it_unique_by(data, "email")
# Recursive flatten: [1, [2, [3, 4]]] → [1, 2, 3, 4]
def transform(data, args, env, headers):
    return it_flatten(data["nested_lists"])
# Unique primitives
def transform(data, args, env, headers):
    return it_unique(data["tags"])
Field name, not lambda
it_group_by, it_sort_by, and it_unique_by take a field name string, not a lambda function.
Hash functions (hs_*)
# Anonymize PII
def transform(data, args, env, headers):
    for user in data:
        user["email"] = hs_sha256(user["email"])
    return data
# Stable ID from composite key
def transform(data, args, env, headers):
    for row in data:
        row["id"] = hs_md5(f"{row['name']}|{row['dob']}")
    return data
Available: hs_md5 · hs_sha1 · hs_sha256, all of which return lowercase hex strings.
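To cross-check a digest outside fimod, Python's hashlib produces the same kind of lowercase hex string. This is a sketch under the assumption that the hs_* helpers hash the UTF-8 bytes of their string argument, which this page does not state; `sha256_hex` is a hypothetical name.

```python
import hashlib


def sha256_hex(text):
    # Assumption: the string is hashed as UTF-8 bytes
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


digest = sha256_hex("ada@example.com")
print(len(digest), digest == digest.lower())  # 64 True
```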
Templating (tpl_*)
Generate any text file from data using Jinja2 templates: Dockerfiles, nginx configs, k8s manifests, reports, .env files. This extends fimod from data→data to data→text.
Inline templates with tpl_render_str are a good fit for one-liners and small molds:
def transform(data, args, env, headers):
    return tpl_render_str("""
FROM python:{{ python_version }}-slim
{% for pkg in packages %}
RUN pip install {{ pkg }}
{% endfor %}
COPY . /app
CMD {{ cmd | tojson }}
""", data)
File templates with tpl_render_from_mold: for larger templates, keep .j2 files alongside the mold for clean separation of logic and presentation:
my_mold/
├── my_mold.py          # Python logic
└── templates/
    ├── Dockerfile.j2   # Jinja2 template
    └── compose.yaml.j2
# my_mold/my_mold.py
"""Generate Dockerfile from project config."""
# fimod: output-format=txt
def transform(data, args, env, headers):
    tpl = args.get("template", "Dockerfile.j2")
    return tpl_render_from_mold(f"templates/{tpl}", data)
All Jinja2 features are available: loops, conditions, filters (upper, join, tojson, default, selectattr, …), macros, and {% break %}/{% continue %}. Dict key order is preserved.
Tip
Combine with --output-format txt (or # fimod: output-format=txt in mold defaults) so the rendered text is written as-is, without JSON quoting.
Available: tpl_render_str(template, ctx) · tpl_render_from_mold(path, ctx). See Built-ins Reference for the auto_escape option and the Quick Tour for more examples.
Message logging (msg_*)
Output diagnostic messages to stderr, which is useful for progress, warnings, and debugging without polluting stdout:
def transform(data, args, env, headers):
    msg_info(f"Processing {len(data)} records")
    for row in data:
        if not row.get("email"):
            msg_warn("Record missing email: " + str(row.get("id")))
    return data
Available: msg_print (no prefix) · msg_info ([INFO]) · msg_warn ([WARN]) · msg_error ([ERROR])
Validation gates (gk_*)
Assert conditions and fail the pipeline with a non-zero exit code:
def transform(data, args, env, headers):
    gk_assert(data.get("version"), "missing 'version' field")
    gk_warn(len(data.get("items", [])) > 0, "items list is empty")
    if data.get("coverage", 0) < 80:
        gk_fail(f"Coverage {data['coverage']}% below 80% threshold")
    return data
Available: gk_fail(msg) · gk_assert(cond, msg) · gk_warn(cond, msg). See Built-ins Reference for truthiness rules.
Sandbox-gated stdlib calls
A few stdlib calls that read host state are gated by the sandbox policy. They raise PermissionError when the policy denies them; catch it if you want a graceful fallback.
| Call | Gate | Default |
|---|---|---|
| datetime.now() | allow_clock = true | denied |
| date.today() | allow_clock = true | denied |
| os.getenv(KEY) | allow_env glob matches KEY | denied (empty list) |
| os.environ | (always) | denied |
| pathlib.Path I/O | (always) | denied |
# Clock: requires allow_clock = true
from datetime import datetime
def transform(data, **_):
    data["stamped_at"] = datetime.now().isoformat()
    return data
# Env: requires "LANG" to match an allow_env pattern
import os
def transform(data, **_):
    try:
        data["locale"] = os.getenv("LANG")
    except PermissionError:
        data["locale"] = None
    return data
Bootstrap the canonical sandbox file with fimod setup sandbox defaults --yes, then edit it to grant the clock / env keys your molds need. For ad-hoc runs, pass --sandbox-file <path> to point at a specific policy, or --sandbox-file="" to force zero-authorization.
Environment substitution
env_subst(template, dict) replaces ${VAR} placeholders using a dict:
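A short sketch of typical usage follows. So that it runs outside fimod, it defines a stand-in env_subst that only approximates the built-in: simple ${VAR} replacement, with unknown placeholders left untouched, which is an assumption. Inside a mold, delete the stand-in and call the injected helper directly. The `url_template` field and the HOST/PORT names are hypothetical.

```python
import re


# Stand-in for the injected helper so the sketch runs outside fimod.
# Assumption: unknown placeholders are left as-is; check the Built-ins
# Reference for the real behavior.
def env_subst(template, values):
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: str(values.get(m.group(1), m.group(0))),
        template,
    )


def transform(data, args, **_):
    # Fill ${VAR} placeholders from the args dict (e.g. --arg HOST=example.org)
    data["url"] = env_subst(data["url_template"], args)
    return data


result = transform(
    {"url_template": "https://${HOST}:${PORT}/api"},
    args={"HOST": "example.org", "PORT": "8443"},
)
print(result["url"])  # https://example.org:8443/api
```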
Exit control
set_exit(code) sets the process exit code without stopping execution:
When combined with --check, set_exit takes priority for the exit code; see Exit Codes.
CSV headers global
When the input is CSV with a header row, fimod injects a headers global (list of column names in file order):
def transform(data, args, env, headers):
    # headers = ["name", "age", "email"], auto-injected by fimod
    return {"columns": headers, "count": len(data)}
# Generic numeric column processing
def transform(data, args, env, headers):
    numeric_cols = [h for h in headers if h.endswith("_amount")]
    for row in data:
        row["total"] = sum(float(row[c]) for c in numeric_cols)
    return data
Note
headers is only available when the input has a header row. Not injected with --csv-no-input-header.
The pipeline parameter
pipeline gives a mold live access to the step chain it is running inside. Declare it only when you need it:
Introspection
pipeline.current_step() returns the current step as a handle. Fields are read via step.get('key'); direct attribute access (step.index) and indexing (step['index']) are not supported (single API surface).
| Key | Type | Description |
|---|---|---|
| 'index' | int | 0-based position in the chain |
| 'input_format' | str or None | Effective input format |
| 'output_format' | str or None | Output format override |
| 'input' | str or None | Input file path |
| 'output' | str or None | Output file path |
| 'in_place' | bool | --in-place flag |
| 'slurp' | bool | --slurp flag |
| 'no_input' | bool | --no-input flag |
| 'args' | dict | Merged args (CLI ∪ Step.create.args, spec wins) for the current step; spec args (or {}) for a future step |
def transform(data, pipeline, **_):
    step = pipeline.current_step()
    return {
        "step": step.get('index'),
        "total": pipeline.length(),
        "fmt": step.get('output_format'),
    }
pipeline.step(i) accesses any step by absolute index (current or future, never past):
# Middle step of a 3-step chain reads its own index
fimod s -i data.json -e data -e "pipeline.current_step().get('index')" -e data
# → 1
Unknown keys raise Step.get('<key>'): unknown field.
Mutating the current step
Use step.set(key, value) to mutate a step. Writable keys:
| Key | Value type | Effect | Where |
|---|---|---|---|
| 'exit' | int | Set process exit code | current or future |
| 'output_format' | str | Override output serialization format | current or future |
| 'input_format' | str | Force re-parse before the next step | current or future |
| 'output_file' | str | Override output file path | current or future |
| 'args' | dict | Replace the entire args block (CLI --arg merge still applied at exec) | future only |
def transform(data, pipeline, **_):
    if not data.get("valid"):
        pipeline.current_step().set('exit', 1)
    return data
This also works on future steps; the mutation is queued and applied just before the target step runs:
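A minimal sketch of mutating the next step, using only the calls documented on this page (current_step, step, length, get, set). The `FakeStep` and `FakePipeline` classes are hypothetical stand-ins so the snippet runs outside fimod; they apply the change immediately rather than queueing it, and inside a mold they are not needed.

```python
# Hypothetical stand-ins so the sketch runs outside fimod; inside a mold,
# fimod supplies the real pipeline object and these classes are not needed.
class FakeStep:
    def __init__(self, index):
        self.fields = {"index": index, "output_format": None}

    def get(self, key):
        return self.fields[key]

    def set(self, key, value):
        self.fields[key] = value


class FakePipeline:
    def __init__(self, length, current):
        self.steps = [FakeStep(i) for i in range(length)]
        self.current = current

    def current_step(self):
        return self.steps[self.current]

    def step(self, i):
        return self.steps[i]

    def length(self):
        return len(self.steps)


def transform(data, pipeline, **_):
    # Ask the step after this one (if any) to serialize as YAML
    nxt = pipeline.current_step().get('index') + 1
    if nxt < pipeline.length():
        pipeline.step(nxt).set('output_format', 'yaml')
    return data


pipe = FakePipeline(length=3, current=0)
transform({"ok": True}, pipeline=pipe)
print(pipe.step(1).get('output_format'))  # yaml
```

The bounds check against pipeline.length() keeps the mold safe when it happens to be the last step in the chain.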
Global functions still available
set_exit(), set_output_format(), set_output_file() remain available as global functions.
Injecting steps dynamically
Add new steps to the running chain with Step.create(expr=...) (inline expression) or Step.create(mold=...) (mold reference):
def transform(data, pipeline, **_):
    # Insert a step immediately after this one
    pipeline.insert_next(Step.create(expr="data[:100]"))
    pipeline.insert_next(Step.create(mold="@my-registry/sort"))
    # Append at the end of the chain
    pipeline.append(Step.create(expr="len(data)"))
    return data
Step.create() arguments:
| Arg | Required | Description |
|---|---|---|
| expr= | one of | Inline expression (like -e) |
| mold= | one of | Mold reference (like -m) |
| input_format= | no | Force input format for the injected step |
| output_format= | no | Set output format for the injected step |
| args= | no | Per-step args dict propagated to the injected mold (heterogeneous types: bool, int, nested dicts). Merged with CLI --arg at exec time; spec values win on conflict. |
pipeline.append(Step.create(
    mold="@my-registry/normalize",
    args={"strict": True, "limits": {"rows": 1000}},
))
Snapshot semantics for pipeline.length() and pipeline.step(i)
Both are computed at the start of each step. A step injected by step i
via insert_next / append is only visible from step i+1 onwards;
inside the same mold run, pipeline.length() and pipeline.step(j) still
reflect the chain as it was when the current step started. Plan injections
accordingly: you cannot read or mutate a step you've just appended in the
same transform() call.
Mold defaults
Scripts can embed default CLI options via # fimod: directives at the very top of the file:
# fimod: input-format=csv, output-format=json
# fimod: csv-delimiter=;
def transform(data, args, env, headers):
    return [{"name": row["name"], "age": int(row["age"])} for row in data]
CLI wins by default; != locks a directive
Explicit CLI arguments override mold defaults declared with =.
Use != to force a value the caller cannot override (e.g. output-format!=yaml).
See Mold Defaults for all supported directives.
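For example, a mold whose output only makes sense as YAML could lock the output format while leaving the input format overridable. The directive values are taken from the examples above; mixing = and != across two directive lines is an assumption, and the transform body is a placeholder.

```python
# fimod: input-format=json
# fimod: output-format!=yaml

def transform(data, **_):
    # Placeholder logic: pass the data through unchanged
    return data
```

With this header, a caller could presumably still switch the input format on the command line, while the output format would stay fixed to yaml.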
The args dict
--arg name=value populates the args parameter of transform(data, args, env, headers):
def transform(data, args, env, headers):
    limit = int(args["threshold"])
    prefix = args.get("prefix", "")
    return [u for u in data if u["name"].startswith(prefix) and u["age"] > limit]
When no --arg is passed, args is an empty dict {}.
Available Python features
- List/dict comprehensions
- Ternary expressions (x if cond else y)
- String methods: .upper(), .strip(), .split(), .replace(), .startswith(), ...
- Dict methods: .get(), .keys(), .values(), .items(), .pop()
- for/while loops, if/elif/else
- in/not in operators
- isinstance(), len(), int(), str(), float(), bool()
- f-strings (f"Hello {name}", f"{x:.2f}", f"{x!r}")
- Nested functions, multiple return values (tuples)
- All built-in helpers (re_*, re_*_fancy, dp_*, it_*, hs_*, tpl_*, msg_*, gk_*, env_subst, set_exit, set_input_format, set_output_format, set_output_file, Step.create(...))
Monty limitations
- import: only re, math, datetime, json, sys, typing, asyncio, pathlib, os (partial)
- del statement
- File I/O, network, system calls