Adding a new formulation

A MILP formulation must be added to an existing optimization problem. To create a new problem, follow the instructions in Adding a new problem.

Adding a new formulation consists of writing the following files:

  • JSON file expressing every component of the formulation (e.g., variables, constraints, objective) in natural-language, math (LaTeX), and code (GurobiPy).

  • Parameter generation script that translates problem data into parameter input for the formulation.

  • Lean 4 encoding following the Formulation definition

By convention, formulation names are single letter labels (e.g., a, b). Pick the next free label x for pN and create problems/pN/formulations/x/. Next, populate the directory with the required files as outlined below.

JSON File

The formulation.json file is the core object defining the formulation. See Formulation Directory for the full schema. In order for Formulation.gen_solve_py() to generate a working solver script, all assumptions and definitions must contain code.python and all constraints and the objective must contain code.gurobipy.

Tip

Reading raw LaTeX while editing the JSON file can be cumbersome. The Problems documentation is automatically generated from the dataset and provides a nice way to view a rendering of the formulation while editing. If you’re working in the FLARE monorepo, you can run make docs-serve from packages/formulation_bench to host the docs with live-reload on http://127.0.0.1:8000.

problems/p12/formulations/a/formulation.json
{
    "valid": true,
    "parameters": {
        "n": {
            "description": "Number of cities",
            "type": "integer",
            "shape": []
        },
        "c": {
            "description": "Travel cost from city i to city j",
            "type": "continuous",
            "shape": [
                "n",
                "n"
            ]
        }
    },
    "assumptions": [
        {
            "description": "There must be at least two cities to form a tour.",
            "formulation": "n \\geq 2",
            "explicit": false,
            "code": {
                "python": "assert n >= 2"
            }
        }
    ],
    "variables": {
        "x": {
            "description": "1 if the tour goes directly from city i to city j, 0 otherwise",
            "type": "binary",
            "shape": [
                "n",
                "n"
            ]
        },
        "u": {
            "description": "MTZ position of city i in the tour",
            "type": "continuous",
            "shape": [
                "n"
            ]
        }
    },
    "constraints": [
        {
            "description": "Each city has exactly one outgoing arc in the tour.",
            "formulation": "\\sum_{j \\in V,\\, j \\neq i} x_{ij} = 1 \\quad \\forall i \\in V",
            "code": {
                "gurobipy": "model.addConstrs(gp.quicksum(x[i, j] for j in range(n) if j != i) == 1 for i in range(n))"
            },
            "explicit": true
        },
        {
            "description": "Each city has exactly one incoming arc in the tour.",
            "formulation": "\\sum_{i \\in V,\\, i \\neq j} x_{ij} = 1 \\quad \\forall j \\in V",
            "code": {
                "gurobipy": "model.addConstrs(gp.quicksum(x[i, j] for i in range(n) if i != j) == 1 for j in range(n))"
            },
            "explicit": true
        },
        {
            "description": "MTZ subtour elimination constraint.",
            "formulation": "u_i - u_j + n \\cdot x_{ij} \\leq n - 1 \\quad \\forall i, j \\in V \\setminus \\{0\\},\\; i \\neq j",
            "code": {
                "gurobipy": "model.addConstrs(u[i] - u[j] + n * x[i, j] <= n - 1 for i in range(1, n) for j in range(1, n) if i != j)"
            },
            "explicit": true
        },
        {
            "description": "Depot position is fixed to 1 to anchor the tour ordering.",
            "formulation": "u_0 = 1",
            "code": {
                "gurobipy": "model.addConstr(u[0] == 1)"
            },
            "explicit": true
        },
        {
            "description": "Lower bound on MTZ position: each non-depot city's position is at least 2.",
            "formulation": "u_i \\geq 2 \\quad \\forall i \\in V \\setminus \\{0\\}",
            "code": {
                "gurobipy": "model.addConstrs(u[i] >= 2 for i in range(1, n))"
            },
            "explicit": true
        },
        {
            "description": "Upper bound on MTZ position: each non-depot city's position is at most n.",
            "formulation": "u_i \\leq n \\quad \\forall i \\in V \\setminus \\{0\\}",
            "code": {
                "gurobipy": "model.addConstrs(u[i] <= n for i in range(1, n))"
            },
            "explicit": true
        },
        {
            "description": "No self-loops: a city cannot have an arc to itself.",
            "formulation": "x_{ii} = 0 \\quad \\forall i \\in V",
            "code": {
                "gurobipy": "model.addConstrs(x[i, i] == 0 for i in range(n))"
            },
            "explicit": false
        }
    ],
    "objective": {
        "description": "Minimize the total travel cost of the Hamiltonian cycle.",
        "formulation": "\\min \\sum_{i \\in V} \\sum_{j \\in V,\\, j \\neq i} c_{ij} \\cdot x_{ij}",
        "code": {
            "gurobipy": "model.setObjective(gp.quicksum(c[i][j] * x[i, j] for i in range(n) for j in range(n) if i != j), GRB.MINIMIZE)"
        }
    },
    "metadata": {
        "source": {
            "dataset": "EvoCut"
        },
        "notes": [
            "This is the Miller-Tucker-Zemlin (MTZ) MILP formulation given in Section F.1 of EvoCut arXiv submission `v1`."
        ]
    }
}

Parameter Generation Script

The parameter generation script gen_params.py reads the problem data instance defined in data.json and transforms it into the parameter input for the formulation (parameters.json). This allows the dataset to have a single source of data per problem. The script should include --data and --output flags for the data.json and parameters.json paths, respectively.

problems/p1/formulations/b/gen_params.py
import argparse
import json


def main(data_path: str, output_path: str) -> None:
    with open(data_path) as f:
        data = json.load(f)

    params = {
        "A": data["CashMachineProcessingRate"],
        "K": data["CardMachineProcessingRate"],
        "Y": data["CashMachinePaperRolls"],
        "W": data["CardMachinePaperRolls"],
        "U": data["MinPeopleProcessed"],
        "V": data["MaxPaperRolls"],
    }

    with open(output_path, "w") as f:
        json.dump(params, f, indent=4)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("data", help="Path to data.json")
    parser.add_argument("output", help="Path to write parameters.json")
    args = parser.parse_args()
    main(args.data, args.output)

Lean 4 Encoding

Each formulation must have a Lean 4 encoding in Formulation.lean following the Formulation definition. This file should import MILPFormulation from Common.lean and define formulation : MILPFormulation inside the PN.x namespace where N is the problem identifier and x is the formulation identifier.

The FLARE monorepo ships with the milp-formulator agent which uses the lean-milp-formulation agent skill from FLARE to automatically generate Formulation.lean.

problems/p1/formulations/b/Formulation.lean
import Common

namespace P1.b

structure Params where
  A :   -- cash machine processing rate (people/hour)
  K :   -- card machine processing rate (people/hour)
  Y :   -- paper rolls/hour for cash machine
  W :   -- paper rolls/hour for card machine
  U :   -- min people processed per hour
  V :   -- max paper rolls per hour
  -- Implicit Assumptions
  hA_nn : 0  A
  hK_nn : 0  K
  hY_nn : 0  Y
  hW_nn : 0  W
  hU_nn : 0  U
  hV_nn : 0  V

structure Vars (p : Params) where
  s :   -- number of cash machines
  r :   -- number of card machines

structure Feasible (p : Params) (v : Vars p) : Prop where
  -- Process at least U people per hour
  hpeople : p.U  p.A * (v.s : ) + p.K * (v.r : )
  -- Use at most V paper rolls per hour
  hpaper : (v.s : ) * p.Y + (v.r : ) * p.W  p.V
  -- Card machines ≤ cash machines
  hcard : v.r  v.s
  -- [Implicit Constraints]
  hs_nn : 0  v.s
  hr_nn : 0  v.r

-- Minimize the total number of machines
def obj (p : Params) (v : Vars p) :  := (v.s : ) + (v.r : )

def formulation : MILPFormulation where
  Params   := Params
  Vars     := Vars
  feasible := Feasible
  obj      := obj

end P1.b

Registering Reformulation Pairs

If your new formulation forms a (positive or negative) reformulation pair with another formulation of the same problem, add an entry to the reformulations field in dataset.json:

{
  "a": {"problem": 21, "formulation": "a"},
  "b": {"problem": 21, "formulation": "b"},
  "reformulation": true
}

For positive pairs, create a Lean proof following the Reformulation definition. This should be defined in reformulations/pN/x_y.lean where N is the common problem and the proof shows y is a reformulation of x. The file should import Common and both formulations:

import Common
import problems.pN.formulations.x.Formulation
import problems.pN.formulations.y.Formulation

The definition xYReformulation : MILPReformulation PN.x.formulation PN.y.formulation should be defined within the PN namespace.

The FLARE monorepo ships with the milp-reformulation-autoformalizer agent which uses the lean-milp-reformulation agent skill from FLARE to automatically generate the Lean proof.

reformulations/p1/a_b.lean
import Common
import problems.p1.formulations.a.Formulation
import problems.p1.formulations.b.Formulation

namespace P1

-- ============================================================================
-- § Parameter Mapping
-- ============================================================================

private def paramMap (p : P1.a.Params) : P1.b.Params :=
  { A := p.CashMachineProcessingRate
    K := p.CardMachineProcessingRate
    Y := p.CashMachinePaperRolls
    W := p.CardMachinePaperRolls
    U := p.MinPeopleProcessed
    V := p.MaxPaperRolls
    hA_nn := p.hCashMachineProcessingRate_nn
    hK_nn := p.hCardMachineProcessingRate_nn
    hY_nn := p.hCashMachinePaperRolls_nn
    hW_nn := p.hCardMachinePaperRolls_nn
    hU_nn := p.hMinPeopleProcessed_nn
    hV_nn := p.hMaxPaperRolls_nn }

-- ============================================================================
-- § Forward Mapping and Feasibility
-- ============================================================================

private def fwd (p : P1.a.Params) (v : P1.a.Vars p) : P1.b.Vars (paramMap p) :=
  { s := v.NumCashMachines
    r := v.NumCardMachines }

private lemma fwd_feas (p : P1.a.Params) (v : P1.a.Vars p)
    (h : P1.a.Feasible p v) :
    P1.b.Feasible (paramMap p) (fwd p v) :=
  { hpeople := h.hpeople
    hpaper  := h.hpaper
    hcard   := h.hcard
    hs_nn   := h.hNumCashMachines_nn
    hr_nn   := h.hNumCardMachines_nn }

-- ============================================================================
-- § Backward Mapping and Feasibility
-- ============================================================================

private def bwd (p : P1.a.Params) (v : P1.b.Vars (paramMap p)) : P1.a.Vars p :=
  { NumCashMachines := v.s
    NumCardMachines := v.r }

private lemma bwd_feas (p : P1.a.Params) (v : P1.b.Vars (paramMap p))
    (h : P1.b.Feasible (paramMap p) v) :
    P1.a.Feasible p (bwd p v) :=
  { hpeople  := h.hpeople
    hpaper   := h.hpaper
    hcard    := h.hcard
    hNumCashMachines_nn := h.hs_nn
    hNumCardMachines_nn := h.hr_nn }

-- ============================================================================
-- § Reformulation Structure
-- ============================================================================

def aBReformulation : MILPReformulation P1.a.formulation P1.b.formulation where
  paramMap    := paramMap
  fwd         := fwd
  bwd         := bwd
  fwd_feas    := fwd_feas
  bwd_feas    := bwd_feas
  objMap      := id
  objMap_mono := strictMono_id
  fwd_obj _ _ _ := rfl
  bwd_obj _ _ _ := rfl

end P1

Validating

The FLARE monorepo provides a validation script to test if the solve.py script generated by Formulation.gen_solve_py() achieves the ground-truth optimal solution specified in the problem’s solution.json file. This also verifies that formulation.json is well-structured.

python scripts/dataset/validate_solve.py -p N  # run on problem pN

Additionally, follow the instructions in Compiling Lean files to compile Formulation.lean and any reformulation proof Lean files.