Mold Scripting

Mold scripts are written in the Python subset supported by Monty, a Rust implementation of Python's core semantics. No system Python required.

The transform function

Your script must define a function named transform that accepts data and returns the result:

def transform(data, args, env, headers):
    # process data
    return data

Inline expressions (-e)

For quick one-liners, use -e. The expression receives data and its return value is the output:

fimod s -i data.json -e '[u for u in data if u["active"]]'
fimod s -i data.json -e '{"count": len(data)}'

Multi-statement expressions need an explicit def transform:

fimod s -i data.json -e '
def transform(data, args, env, headers):
    result = []
    for item in data:
        result.append(item["name"].upper())
    return result
'
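
As a rough mental model, the first one-liner above can be read as shorthand for a mold whose transform returns the expression. This equivalence is an assumption inferred from the description, not a statement about fimod's internals. The sample records are hypothetical, and the direct call at the end runs under ordinary Python purely for illustration; fimod normally makes that call itself.

```python
# Assumed equivalent of:
#   fimod s -i data.json -e '[u for u in data if u["active"]]'
def transform(data, args, env, headers):
    return [u for u in data if u["active"]]


# Hypothetical sample input, standing in for the contents of data.json.
sample = [
    {"name": "ada", "active": True},
    {"name": "bob", "active": False},
]

# Called by hand here for illustration only; None is a placeholder for headers.
result = transform(sample, {}, {}, None)
print(result)
```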

Data types

Data arrives as standard Python types:

Type | When
--- | ---
dict {} | JSON objects, YAML/TOML mappings
list [] | JSON arrays, CSV datasets, NDJSON lines
str | TXT input (raw content), Lines elements
int, float, bool, None | Primitives

Built-in functions

fimod injects a set of helpers into every mold, so no import is needed. See Built-ins Reference for complete signatures.

Regex (re_*)

Powered by fancy-regex, which supports lookahead, lookbehind, backreferences, and atomic groups. This is not Python's re module; see Built-ins Reference for the full syntax differences.

# Find all email addresses
def transform(data, args, env, headers):
    return re_findall(r"\w+@\w+\.\w+", data["text"])

# Clean whitespace
def transform(data, args, env, headers):
    return {"cleaned": re_sub(r"\s+", " ", data["text"])}

# Split on multiple delimiters
def transform(data, args, env, headers):
    return re_split(r"[,;]\s*", data["tags"])

# Lookahead: extract usernames from emails
def transform(data, args, env, headers):
    return re_findall(r"\w+(?=@)", data["text"])

# Capture groups: extract structured data
def transform(data, args, env, headers):
    m = re_search(r"(?P<user>\w+)@(?P<domain>\w+)", data["email"])
    if m:
        return {"user": m["groups"][0], "domain": m["named"]["domain"]}
    return None

# Replacement with group references, Python syntax (\1, \g<name>)
def transform(data, args, env, headers):
    return re_sub(r"(\w+)@(\w+)", r"\2/\1", data["text"])

# Named group replacement
def transform(data, args, env, headers):
    return re_sub(r"(?P<user>\w+)@(?P<domain>\w+)", r"\g<domain>/\g<user>", data["text"])

# Replace only the first N occurrences (count argument)
def transform(data, args, env, headers):
    return re_sub(r"\d+", "X", data["text"], 1)   # replace first match only

Available: re_search · re_match · re_findall · re_sub · re_split

And their _fancy counterparts: re_search_fancy · re_match_fancy · re_findall_fancy · re_sub_fancy · re_split_fancy

Two syntaxes for replacements

re_sub uses Python re syntax for replacements: \1, \2, \g<name>. re_sub_fancy uses fancy-regex syntax: $1, $2, ${name}. For all other functions (re_search, re_match, re_findall, re_split), the _fancy variants behave identically to the plain ones; they are provided for API consistency in fancy-mode molds.
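
The Python-style replacement syntax can be tried outside fimod with the standard re module. This is only a stand-in that runs under ordinary Python, not as a mold (molds cannot import). fimod's re_sub is backed by fancy-regex, so pattern support may differ even where the replacement syntax matches. The sample strings are hypothetical.

```python
import re

text = "alice@example bob@test"

# Numbered group references: \1 and \2.
swapped = re.sub(r"(\w+)@(\w+)", r"\2/\1", text)

# Named group references: \g<name>.
named = re.sub(r"(?P<user>\w+)@(?P<domain>\w+)", r"\g<domain>/\g<user>", text)

# count=1 replaces only the first match (passed by keyword in stdlib re).
first_only = re.sub(r"\d+", "X", "a1 b22 c333", count=1)

print(swapped)
print(named)
print(first_only)
```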

Dotpath (dp_*)

Navigate and mutate nested structures without chained dict/array accesses:

def transform(data, args, env, headers):
    city    = dp_get(data, "user.address.city")
    country = dp_get(data, "user.address.country", "unknown")  # with default
    last    = dp_get(data, "items.-1")   # negative index = from end
    return {"city": city, "country": country, "last": last}

# dp_set returns a new deep copy; the original is unchanged
def transform(data, args, env, headers):
    data = dp_set(data, "meta.processed", True)
    data = dp_set(data, "config.db.host", "localhost")
    return data
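
To make the dotpath semantics concrete, here is a plain-Python approximation of the behavior described above: dotted paths, a default for missing keys, negative list indexes, and a copy-on-write set. The names dp_get_sketch and dp_set_sketch are hypothetical, and this is not fimod's implementation. In particular, creating missing intermediate dicts in the setter is an assumption.

```python
import copy


def dp_get_sketch(data, path, default=None):
    """Hypothetical stand-in for dp_get: follow a dot-separated path."""
    current = data
    for part in path.split("."):
        if isinstance(current, dict):
            if part not in current:
                return default
            current = current[part]
        elif isinstance(current, list):
            try:
                current = current[int(part)]  # negative index counts from the end
            except (ValueError, IndexError):
                return default
        else:
            return default
    return current


def dp_set_sketch(data, path, value):
    """Hypothetical stand-in for dp_set (dict paths only): returns a deep copy."""
    result = copy.deepcopy(data)
    current = result
    parts = path.split(".")
    for part in parts[:-1]:
        # Assumption: missing intermediate keys become empty dicts.
        current = current.setdefault(part, {})
    current[parts[-1]] = value
    return result


doc = {"user": {"address": {"city": "Paris"}}, "items": [1, 2, 3]}
print(dp_get_sketch(doc, "user.address.city"))
print(dp_get_sketch(doc, "user.address.country", "unknown"))
print(dp_get_sketch(doc, "items.-1"))
updated = dp_set_sketch(doc, "meta.processed", True)
print("meta" in doc, updated["meta"])
```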

Iteration helpers (it_*)

Convenience functions for common list/dict operations:

# Group by field name (string, not lambda!)
def transform(data, args, env, headers):
    return it_group_by(data, "department")

# Sort by field
def transform(data, args, env, headers):
    return it_sort_by(data, "age")

# Deduplicate by field (keeps first occurrence)
def transform(data, args, env, headers):
    return it_unique_by(data, "email")

# Recursive flatten: [1, [2, [3, 4]]] → [1, 2, 3, 4]
def transform(data, args, env, headers):
    return it_flatten(data["nested_lists"])

# Unique primitives
def transform(data, args, env, headers):
    return it_unique(data["tags"])

Field name, not lambda

it_group_by, it_sort_by, and it_unique_by take a field name string, not a lambda function.
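
For readers who want to see what two of these helpers amount to, the sketch below approximates the documented behavior of it_unique_by (keeps the first occurrence) and it_flatten (recursive) in plain Python. The function names are hypothetical and the code is an illustration, not fimod's implementation. It also assumes field values are hashable and that only lists are flattened.

```python
def unique_by_sketch(rows, field):
    """Hypothetical stand-in for it_unique_by: keep the first row per field value."""
    seen = set()
    result = []
    for row in rows:
        key = row.get(field)  # assumes hashable values
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def flatten_sketch(value):
    """Hypothetical stand-in for it_flatten: recursively flatten nested lists."""
    result = []
    for item in value:
        if isinstance(item, list):
            result.extend(flatten_sketch(item))
        else:
            result.append(item)
    return result


users = [
    {"email": "a@x.io", "name": "first"},
    {"email": "a@x.io", "name": "second"},
    {"email": "b@x.io", "name": "third"},
]
print(unique_by_sketch(users, "email"))
print(flatten_sketch([1, [2, [3, 4]]]))
```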

Hash functions (hs_*)

# Anonymize PII
def transform(data, args, env, headers):
    for user in data:
        user["email"] = hs_sha256(user["email"])
    return data

# Stable ID from composite key
def transform(data, args, env, headers):
    for row in data:
        row["id"] = hs_md5(f"{row['name']}|{row['dob']}")
    return data

Available: hs_md5 · hs_sha1 · hs_sha256. All return lowercase hex strings.
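
To cross-check a digest outside fimod, the standard hashlib module produces lowercase hex strings of the same shape. The sketch below runs under ordinary Python, not as a mold, and assumes the helpers hash the UTF-8 bytes of the input string; that encoding is an assumption, so verify it against real fimod output before relying on it. The wrapper names are hypothetical.

```python
import hashlib


def sha256_hex(text):
    # Assumption: the string is hashed as its UTF-8 bytes.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def md5_hex(text):
    # Same assumption about UTF-8 encoding.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


digest = sha256_hex("abc")
print(digest)
print(md5_hex("abc"))
```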

Message logging (msg_*)

Output diagnostic messages to stderr. This is useful for progress, warnings, and debugging without polluting stdout:

def transform(data, args, env, headers):
    msg_info(f"Processing {len(data)} records")
    for row in data:
        if not row.get("email"):
            msg_warn("Record missing email: " + str(row.get("id")))
    return data

Available: msg_print (no prefix) · msg_info ([INFO]) · msg_warn ([WARN]) · msg_error ([ERROR])

Validation gates (gk_*)

Assert conditions and fail the pipeline with a non-zero exit code:

def transform(data, args, env, headers):
    gk_assert(data.get("version"), "missing 'version' field")
    gk_warn(len(data.get("items", [])) > 0, "items list is empty")
    if data.get("coverage", 0) < 80:
        gk_fail(f"Coverage {data['coverage']}% below 80% threshold")
    return data

Available: gk_fail(msg) · gk_assert(cond, msg) · gk_warn(cond, msg). See Built-ins Reference for truthiness rules.

Environment substitution

env_subst(template, dict) replaces ${VAR} placeholders using a dict:

def transform(data, args, env, headers):
    return env_subst("https://${HOST}:${PORT}/api", env)

fimod s -i data.json --env 'HOST,PORT' -e 'env_subst("${HOST}:${PORT}", env)' --output-format txt
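
The substitution itself can be pictured with a small plain-Python sketch. The name env_subst_sketch is hypothetical, and leaving unknown placeholders untouched is an assumption; the source does not say how the real env_subst treats missing names. The code runs under ordinary Python, not as a mold.

```python
import re


def env_subst_sketch(template, values):
    """Hypothetical stand-in for env_subst: replace ${VAR} from a dict."""

    def replace(match):
        name = match.group(1)
        # Assumption: names missing from the dict are left as-is.
        return str(values[name]) if name in values else match.group(0)

    return re.sub(r"\$\{(\w+)\}", replace, template)


url = env_subst_sketch(
    "https://${HOST}:${PORT}/api",
    {"HOST": "localhost", "PORT": "8080"},
)
print(url)
```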

Exit control

set_exit(code) sets the process exit code without stopping execution:

def transform(data, args, env, headers):
    if not data.get("valid"):
        set_exit(1)
    return data

When combined with --check, set_exit takes priority for the exit code; see Exit Codes.


CSV headers global

When the input is CSV with a header row, fimod injects a headers global (list of column names in file order):

def transform(data, args, env, headers):
    # headers = ["name", "age", "email"]  ← auto-injected by fimod
    return {"columns": headers, "count": len(data)}

# Generic numeric column processing
def transform(data, args, env, headers):
    numeric_cols = [h for h in headers if h.endswith("_amount")]
    for row in data:
        row["total"] = sum(float(row[c]) for c in numeric_cols)
    return data

Note

headers is only available when the input has a header row. It is not injected with --csv-no-input-header.
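
The numeric-column mold above can be exercised without fimod by calling transform directly with hand-built rows. The column names and values here are hypothetical, and the cell values are written as strings on the assumption that CSV fields arrive as text, which the float() conversion in the example suggests.

```python
def transform(data, args, env, headers):
    # Pick every column whose name ends in "_amount" and sum it per row.
    numeric_cols = [h for h in headers if h.endswith("_amount")]
    for row in data:
        row["total"] = sum(float(row[c]) for c in numeric_cols)
    return data


# Hypothetical stand-ins for what fimod would pass for a CSV input.
headers = ["name", "net_amount", "tax_amount"]
rows = [
    {"name": "widget", "net_amount": "10.00", "tax_amount": "2.50"},
    {"name": "gadget", "net_amount": "4.00", "tax_amount": "1.00"},
]

result = transform(rows, {}, {}, headers)
print([row["total"] for row in result])
```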


Mold defaults

Scripts can embed default CLI options via # fimod: directives at the very top of the file:

# fimod: input-format=csv, output-format=json
# fimod: csv-delimiter=;
def transform(data, args, env, headers):
    return [{"name": row["name"], "age": int(row["age"])} for row in data]

CLI always wins

Explicit CLI arguments always override mold defaults.

See Mold Defaults for all supported directives.


The args dict

--arg name=value populates the args parameter of transform(data, args, env, headers):

def transform(data, args, env, headers):
    limit  = int(args["threshold"])
    prefix = args.get("prefix", "")
    return [u for u in data if u["name"].startswith(prefix) and u["age"] > limit]

fimod s -i users.json -m filter.py --arg threshold=30 --arg prefix="A"

When no --arg is passed, args is an empty dict {}.
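
One way to picture how the --arg pairs become the args dict is the sketch below. parse_arg_pairs is a hypothetical helper, not part of fimod. Splitting on the first "=" and keeping every value as a string are assumptions, although the int(args["threshold"]) conversion in the example above points the same way.

```python
def parse_arg_pairs(pairs):
    """Hypothetical model of how --arg name=value pairs could form args."""
    args = {}
    for pair in pairs:
        name, _, value = pair.partition("=")  # split on the first "=" only
        args[name] = value  # assumption: values stay strings
    return args


args = parse_arg_pairs(["threshold=30", "prefix=A"])
limit = int(args["threshold"])  # convert explicitly, as the mold above does
print(args, limit)
```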


Available Python features

  • List/dict comprehensions
  • Ternary expressions (x if cond else y)
  • String methods: .upper(), .strip(), .split(), .replace(), .startswith(), ...
  • Dict methods: .get(), .keys(), .values(), .items(), .pop()
  • for / while loops, if / elif / else
  • in / not in operators
  • isinstance(), len(), int(), str(), float(), bool()
  • f-strings (f"Hello {name}", f"{x:.2f}", f"{x!r}")
  • Nested functions, multiple return values (tuples)
  • All built-in helpers (re_*, re_*_fancy, dp_*, it_*, hs_*, msg_*, gk_*, env_subst, set_exit, set_input_format, set_output_format, set_output_file)

โŒ Monty limitations

  • import โ€” no stdlib, no modules
  • del statement
  • File I/O, network, system calls
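
Because del is unavailable, keys can be dropped using features from the supported list instead: a dict comprehension, or .pop(). The sketch below is one possible workaround, with a hypothetical "password" field and sample record; the direct call at the end is for illustration under ordinary Python.

```python
def transform(data, args, env, headers):
    cleaned = []
    for user in data:
        # Build a copy without the unwanted key instead of: del user["password"]
        cleaned.append({k: v for k, v in user.items() if k != "password"})
    return cleaned


# Hypothetical sample input.
sample = [{"name": "ada", "password": "secret"}]
result = transform(sample, {}, {}, None)
print(result)

# Alternative that mutates in place: pop with a default never raises.
sample[0].pop("password", None)
print(sample)
```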