Mold Scripting
Mold scripts are written in the Python subset supported by Monty, a Rust implementation of Python's core semantics. No system Python required.
The transform function
Your script must define a function named transform that accepts data and returns the result:
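A minimal sketch of such a script, assuming the input is a list of records with a name field (the field name and the sample data are illustrative, not part of fimod). The four-parameter signature is the one used by the other examples on this page:

```python
def transform(data, args, env, headers):
    # Return a new list holding only the upper-cased names.
    # "name" is a hypothetical field chosen for this sketch.
    return [item["name"].upper() for item in data]
```

Whatever the function returns becomes the output of the mold.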
Inline expressions (-e)
For quick one-liners, use -e. The expression receives data and its return value is the output:
fimod s -i data.json -e '[u for u in data if u["active"]]'
fimod s -i data.json -e '{"count": len(data)}'
Multi-statement expressions need an explicit def transform:
fimod s -i data.json -e '
def transform(data, args, env, headers):
    result = []
    for item in data:
        result.append(item["name"].upper())
    return result
'
Data types
Data arrives as standard Python types:
| Type | When |
|---|---|
| dict {} | JSON objects, YAML/TOML mappings |
| list [] | JSON arrays, CSV datasets, NDJSON lines |
| str | TXT input (raw content), Lines elements |
| int, float, bool, None | Primitives |
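Because the shape of data depends on the input format, a mold that has to accept several formats can branch on the type. A minimal sketch using only isinstance and len; the returned keys (kind, size) are made up for illustration:

```python
def transform(data, args, env, headers):
    # Describe the incoming value; the output keys are illustrative only.
    if isinstance(data, dict):
        return {"kind": "mapping", "size": len(data)}
    if isinstance(data, list):
        return {"kind": "sequence", "size": len(data)}
    if isinstance(data, str):
        return {"kind": "text", "size": len(data)}
    # Anything else is one of the primitives: int, float, bool, or None.
    return {"kind": "primitive", "size": None}
```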
Built-in functions
fimod injects a set of helpers into every mold; no import is needed. See Built-ins Reference for complete signatures.
Regex (re_*)
Powered by fancy-regex, which supports lookahead, lookbehind, backreferences, and atomic groups. This is not Python's re module; see Built-ins Reference for the full syntax differences.
# Find all email addresses
def transform(data, args, env, headers):
    return re_findall(r"\w+@\w+\.\w+", data["text"])

# Clean whitespace
def transform(data, args, env, headers):
    return {"cleaned": re_sub(r"\s+", " ", data["text"])}

# Split on multiple delimiters
def transform(data, args, env, headers):
    return re_split(r"[,;]\s*", data["tags"])

# Lookahead: extract usernames from emails
def transform(data, args, env, headers):
    return re_findall(r"\w+(?=@)", data["text"])

# Capture groups: extract structured data
def transform(data, args, env, headers):
    m = re_search(r"(?P<user>\w+)@(?P<domain>\w+)", data["email"])
    if m:
        return {"user": m["groups"][0], "domain": m["named"]["domain"]}
    return None

# Replacement with group references: Python syntax (\1, \g<name>)
def transform(data, args, env, headers):
    return re_sub(r"(\w+)@(\w+)", r"\2/\1", data["text"])

# Named group replacement
def transform(data, args, env, headers):
    return re_sub(r"(?P<user>\w+)@(?P<domain>\w+)", r"\g<domain>/\g<user>", data["text"])

# Replace only first N occurrences (count argument)
def transform(data, args, env, headers):
    return re_sub(r"\d+", "X", data["text"], 1)  # replace first match only
Available: re_search · re_match · re_findall · re_sub · re_split
And their _fancy counterparts: re_search_fancy · re_match_fancy · re_findall_fancy · re_sub_fancy · re_split_fancy
Two syntaxes for replacements
re_sub uses Python re syntax: \1, \2, \g<name>.
re_sub_fancy uses fancy-regex syntax: $1, $2, ${name}.
For all other functions (re_search, re_match, re_findall, re_split), the _fancy variants are identical and are provided for API consistency in fancy-mode molds.
Dotpath (dp_*)
Navigate and mutate nested structures without chained dict/array accesses:
def transform(data, args, env, headers):
    city = dp_get(data, "user.address.city")
    country = dp_get(data, "user.address.country", "unknown")  # with default
    last = dp_get(data, "items.-1")  # negative index counts from the end
    return {"city": city, "country": country}

# dp_set returns a new deep copy; the original is unchanged
def transform(data, args, env, headers):
    data = dp_set(data, "meta.processed", True)
    data = dp_set(data, "config.db.host", "localhost")
    return data
Iteration helpers (it_*)
Convenience functions for common list/dict operations:
# Group by field name (string, not lambda!)
def transform(data, args, env, headers):
    return it_group_by(data, "department")

# Sort by field
def transform(data, args, env, headers):
    return it_sort_by(data, "age")

# Deduplicate by field (keeps first occurrence)
def transform(data, args, env, headers):
    return it_unique_by(data, "email")

# Recursive flatten: [1, [2, [3, 4]]] -> [1, 2, 3, 4]
def transform(data, args, env, headers):
    return it_flatten(data["nested_lists"])

# Unique primitives
def transform(data, args, env, headers):
    return it_unique(data["tags"])
Field name, not lambda
it_group_by, it_sort_by, and it_unique_by take a field name string, not a lambda function.
Hash functions (hs_*)
# Anonymize PII
def transform(data, args, env, headers):
    for user in data:
        user["email"] = hs_sha256(user["email"])
    return data

# Stable ID from composite key
def transform(data, args, env, headers):
    for row in data:
        row["id"] = hs_md5(f"{row['name']}|{row['dob']}")
    return data
Available: hs_md5 · hs_sha1 · hs_sha256. All return lowercase hex strings.
Message logging (msg_*)
Output diagnostic messages to stderr. This is useful for progress, warnings, and debugging without polluting stdout:
def transform(data, args, env, headers):
    msg_info(f"Processing {len(data)} records")
    for row in data:
        if not row.get("email"):
            msg_warn("Record missing email: " + str(row.get("id")))
    return data
Available: msg_print (no prefix) · msg_info ([INFO]) · msg_warn ([WARN]) · msg_error ([ERROR])
Validation gates (gk_*)
Assert conditions and fail the pipeline with a non-zero exit code:
def transform(data, args, env, headers):
    gk_assert(data.get("version"), "missing 'version' field")
    gk_warn(len(data.get("items", [])) > 0, "items list is empty")
    if data.get("coverage", 0) < 80:
        gk_fail(f"Coverage {data['coverage']}% below 80% threshold")
    return data
Available: gk_fail(msg) · gk_assert(cond, msg) · gk_warn(cond, msg). See Built-ins Reference for truthiness rules.
Environment substitution
env_subst(template, dict) replaces ${VAR} placeholders using a dict:
Exit control
set_exit(code) sets the process exit code without stopping execution:
When combined with --check, set_exit takes priority for the exit code; see Exit Codes.
CSV headers global
When the input is CSV with a header row, fimod injects a headers global (list of column names in file order):
def transform(data, args, env, headers):
    # headers = ["name", "age", "email"] (auto-injected by fimod)
    return {"columns": headers, "count": len(data)}

# Generic numeric column processing
def transform(data, args, env, headers):
    numeric_cols = [h for h in headers if h.endswith("_amount")]
    for row in data:
        row["total"] = sum(float(row[c]) for c in numeric_cols)
    return data
Note
headers is only available when the input has a header row. Not injected with --csv-no-input-header.
Mold defaults
Scripts can embed default CLI options via # fimod: directives at the very top of the file:
# fimod: input-format=csv, output-format=json
# fimod: csv-delimiter=;
def transform(data, args, env, headers):
    return [{"name": row["name"], "age": int(row["age"])} for row in data]
CLI always wins
Explicit CLI arguments always override mold defaults.
See Mold Defaults for all supported directives.
The args dict
--arg name=value populates the args parameter of transform(data, args, env, headers):
def transform(data, args, env, headers):
    limit = int(args["threshold"])
    prefix = args.get("prefix", "")
    return [u for u in data if u["name"].startswith(prefix) and u["age"] > limit]
When no --arg is passed, args is an empty dict {}.
Available Python features
- List/dict comprehensions
- Ternary expressions (x if cond else y)
- String methods: .upper(), .strip(), .split(), .replace(), .startswith(), ...
- Dict methods: .get(), .keys(), .values(), .items(), .pop()
- for/while loops, if/elif/else
- in/not in operators
- isinstance(), len(), int(), str(), float(), bool()
- f-strings (f"Hello {name}", f"{x:.2f}", f"{x!r}")
- Nested functions, multiple return values (tuples)
- All built-in helpers (re_*, re_*_fancy, dp_*, it_*, hs_*, msg_*, gk_*, env_subst, set_exit, set_input_format, set_output_format, set_output_file)
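The sketch below combines several of the listed features (a nested function, a tuple return, a ternary expression, a string method, and f-string formatting) in one mold. The name and price fields and the threshold of 100 are invented for the example:

```python
def transform(data, args, env, headers):
    # Nested function returning a tuple: (tier label, formatted price).
    def describe(item):
        label = "premium" if item["price"] >= 100 else "standard"
        return label, f"{item['price']:.2f}"

    result = []
    for item in data:
        label, price = describe(item)
        result.append({"name": item["name"].strip(), "tier": label, "price": price})
    return result
```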
Monty limitations
- import: no stdlib, no modules
- del statement
- File I/O, network, system calls