# Metrics
Foundry displays metrics from your experiment as live plots in the web UI. To log metrics, write JSON lines to `foundry/metrics.jsonl`:
```jsonl
{"ts": 1709012345.123, "name": "train/loss", "value": 0.342, "step": 100}
{"ts": 1709012345.123, "name": "train/lr", "value": 0.001, "step": 100}
{"ts": 1709012400.456, "name": "eval/accuracy", "value": 0.89, "step": 100}
```

## Project Root
The project root is the directory containing your `.foundryconfig` file. Foundry sets this as the working directory when running your experiment, so relative paths like `foundry/metrics.jsonl` resolve from here automatically — both on remote GPU instances and locally.
```
my-experiment/            ← project root (contains .foundryconfig)
├── .foundryconfig
├── train.py
└── foundry/
    ├── metrics.jsonl     ← write metrics here
    ├── outputs/
    └── checkpoints/
```

## Schema
Each line in `metrics.jsonl` is a JSON object with the following fields:
| Field | Required | Type | Description |
|---|---|---|---|
| `name` | yes | string | Metric name, e.g. `train/loss` |
| `value` | yes | number | Numeric value |
| `step` | no | number | Training step or iteration |
| `ts` | no | number | Unix timestamp (seconds) |
You can include any additional fields you like (e.g. `epoch`, `split`, `run`). Foundry stores them as metadata and makes them available for filtering in the UI.
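For example, a record with extra metadata fields might be built like this (a sketch; the `epoch` and `split` keys are just illustrative — any unknown field is treated as metadata):

```python
import json
import time

# Schema fields plus arbitrary extra metadata.
# "epoch" and "split" are illustrative extras, not part of the schema.
record = {
    "ts": time.time(),
    "name": "eval/accuracy",
    "value": 0.89,
    "step": 100,
    "epoch": 3,
    "split": "val",
}
line = json.dumps(record)  # one line of metrics.jsonl
```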
## Naming Convention

Use `/` to namespace your metrics. The web UI groups metrics by the prefix before the first slash:
```jsonl
{"ts": 1709012345.123, "name": "train/loss", "value": 0.34, "step": 100}
{"ts": 1709012345.123, "name": "train/lr", "value": 0.001, "step": 100}
{"ts": 1709012400.456, "name": "eval/loss", "value": 0.52, "step": 100}
{"ts": 1709012400.456, "name": "eval/accuracy", "value": 0.89, "step": 100}
{"ts": 1709012400.456, "name": "gpu/memory_gb", "value": 22.4, "step": 100}
```

This would create three panels in the UI: `train`, `eval`, and `gpu`.
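To make the grouping rule concrete, here is a small sketch of the prefix-before-first-slash logic (this mirrors the UI's behavior, it is not Foundry code; names without a slash are assumed to group under their own name):

```python
from collections import defaultdict

def group_by_panel(names):
    # Group metric names by the prefix before the first "/".
    panels = defaultdict(list)
    for name in names:
        prefix, _, _ = name.partition("/")
        panels[prefix].append(name)
    return dict(panels)

group_by_panel(["train/loss", "train/lr", "eval/accuracy", "gpu/memory_gb"])
# → {"train": [...], "eval": [...], "gpu": [...]}
```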
## Logging Metrics from Python

Here’s a minimal implementation you can drop into your training script:
```python
import json
import os
import time

os.makedirs("foundry", exist_ok=True)

def log_metric(name, value, step=None, **kwargs):
    m = {"ts": time.time(), "name": name, "value": value}
    if step is not None:
        m["step"] = step
    m.update(kwargs)
    with open("foundry/metrics.jsonl", "a") as f:
        f.write(json.dumps(m) + "\n")
```

Then call it from your training loop:
```python
for step, batch in enumerate(dataloader):
    loss = train_step(batch)
    log_metric("train/loss", loss.item(), step=step)

    if step % eval_interval == 0:
        acc = evaluate(model, val_set)
        log_metric("eval/accuracy", acc, step=step, split="val")
```
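If you want to inspect logged metrics locally (for example, to plot them yourself with matplotlib), the file is straightforward to read back. The `read_metrics` helper below is not part of Foundry — just a hypothetical sketch:

```python
import json

def read_metrics(path="foundry/metrics.jsonl"):
    # Parse metrics.jsonl into a list of dicts, skipping blank lines.
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```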