The default make-vs-buy analysis is a cost comparison: model the in-house cost, get a supplier quote, pick the cheaper one. It is fast, it is defensible in a budget meeting, and it is wrong often enough to be dangerous. The decisions that come back to hurt a program are rarely the ones where the cheaper option was also the safe one. They are the ones where cost pointed one way and something the spreadsheet never captured pointed the other.
A part that is cheaper to buy but carries proprietary design IP. A flight-critical component with one qualified supplier in the market. An ITAR-controlled item that cannot be outsourced without a qualified, compliant sourcing path. A part that looks attractive to insource until you notice the program is being retired in three years and the tooling will never amortize. None of these show up in a unit-cost delta, and all of them change the right answer.
The core architectural decision was to refuse to collapse the decision into a single number. The engine scores a part on two independent layers: an Economic layer that captures whether making is cheaper and whether you have the capacity and volume to justify it, and a Strategic layer that captures criticality, IP control, supply-market risk, export status, and where the program sits in its lifecycle. Each layer is scored 0 to 100, where higher favors Make and 50 is neutral.
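As a minimal sketch of how one layer could be scored, assume each factor is already mapped to a 0 to 100 score and the layer is a weighted average of its factors. The function name `layer_score`, the factor keys, and every weight and score below are hypothetical placeholders, not the engine's actual values.

```python
def layer_score(factor_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine 0-100 factor scores into one 0-100 layer score (higher favors Make)."""
    if set(factor_scores) != set(weights):
        raise ValueError("every factor needs exactly one weight")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("layer weights must sum to 1.0")
    for name, score in factor_scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name} score {score} is outside 0-100")
    return sum(factor_scores[name] * weights[name] for name in weights)


# Hypothetical Strategic-layer weights and factor scores, for illustration only.
STRATEGIC_WEIGHTS = {
    "criticality": 0.30,
    "ip_control": 0.20,
    "supply_risk": 0.20,
    "export_status": 0.20,
    "lifecycle": 0.10,
}
example_factors = {
    "criticality": 90,
    "ip_control": 70,
    "supply_risk": 95,
    "export_status": 100,
    "lifecycle": 50,
}

strategic = layer_score(example_factors, STRATEGIC_WEIGHTS)
print(f"Strategic layer: {strategic:.0f}/100")
```

Because the weights sum to 1, a layer whose factors all sit at 50 scores exactly 50, which keeps the neutral point meaningful on both axes.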
Plotting the two scores against each other produces a 2x2 quadrant. The two diagonal cases are the easy ones: both layers favor Make, or both favor Buy. The value of the tool is in the two off-diagonal cases, where the layers disagree. A part can be more expensive to make and still be the right thing to make. A part can be cheaper to make and still be the wrong thing to commit to. Those are the decisions a cost-only model gets backwards.
The signature output is "Strategic Make": a part the economics say to buy, but criticality, IP, single-source exposure, or export control say to keep in-house. It is the recommendation a spreadsheet would never produce, and it is the one most worth getting right.
The engine is built in Python with strict separation between the deterministic core and the explanatory layer. Scoring, quadrant assignment, mode logic, and overrides are fully deterministic and reproducible: the same inputs always produce the same recommendation. Claude generates the plain-language rationale from the computed result. It never generates or modifies a score or a recommendation.
The recommendation is derived from the quadrant, then modified by an explicit precedence chain: scoring mode can tip genuine close calls, a sunsetting-program override enforces capital discipline, and an ITAR override sits at the top as a hard constraint. Every override surfaces as a visible flag. Nothing is folded silently into a number.
Economic Layer: does making this part make financial sense right now?
Strategic Layer: should this part be controlled in-house regardless of cost?
The two diagonal quadrants are agreement cases. The two off-diagonal quadrants are where the layers disagree and where a cost-only analysis fails. When both layers land inside a neutral band (45 to 55 on each axis), the tool returns Marginal — Gather More Data rather than forcing a call it cannot defend. That is the human-in-the-loop principle made literal: the tool declines to fake precision on a genuine coin flip.
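The quadrant logic described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the engine's code: the `Quadrant` enum and its member names are hypothetical (only "Strategic Make" and the marginal outcome are named in this writeup), the band edges are treated as inclusive, and a score of exactly 50 on one axis is treated as not favoring Make.

```python
from enum import Enum

NEUTRAL = 50.0
NEUTRAL_BAND = (45.0, 55.0)  # from the text; inclusive edges are an assumption


class Quadrant(Enum):
    MAKE = "both layers favor Make"
    BUY = "both layers favor Buy"
    STRATEGIC_MAKE = "economics favor Buy, strategy favors Make"
    ECONOMIC_MAKE = "economics favor Make, strategy favors Buy"  # hypothetical name
    MARGINAL = "both layers inside the neutral band"


def assign_quadrant(economic: float, strategic: float) -> Quadrant:
    """Map two 0-100 layer scores (higher favors Make) to a quadrant."""
    low, high = NEUTRAL_BAND
    # Check the neutral band first so a coin flip is never forced into a call.
    if low <= economic <= high and low <= strategic <= high:
        return Quadrant.MARGINAL
    econ_make = economic > NEUTRAL
    strat_make = strategic > NEUTRAL
    if econ_make and strat_make:
        return Quadrant.MAKE
    if not econ_make and not strat_make:
        return Quadrant.BUY
    return Quadrant.STRATEGIC_MAKE if strat_make else Quadrant.ECONOMIC_MAKE


# Economic 31 and Strategic 83 are the titanium fitting's scores from the worked example.
print(assign_quadrant(31, 83).name)
```

Note that the band must hold on both axes: a part at 48 on economics and 80 on strategy is not marginal, because the strategic signal is decisive.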
Some factors do not belong on a weighted axis because they are not tradeoffs. They are constraints. The engine handles these as overrides that fire after the quadrant assignment, in a fixed precedence order, each surfacing as a visible flag rather than being absorbed into a number.
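A sketch of what a fixed precedence chain with visible flags might look like follows. The trigger conditions, flag strings, the `Decision` dataclass, and the "(ITAR-compliant source only)" wording are all assumptions made for illustration; the source only establishes that the sunsetting override enforces capital discipline, that ITAR sits at the top as a hard constraint, and that each override surfaces as a flag.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    recommendation: str
    flags: list[str] = field(default_factory=list)


def apply_overrides(quadrant_rec: str, *, economic: float, sunsetting: bool,
                    single_source: bool, itar: bool) -> Decision:
    """Apply constraint overrides after quadrant assignment, lowest precedence first."""
    decision = Decision(quadrant_rec)

    # Sunsetting override (assumed trigger): the program is retiring and the
    # economics do not favor Make, so no tooling capital goes into it.
    if sunsetting and economic <= 50 and "Make" in decision.recommendation:
        decision.recommendation = (
            "Buy / Qualify Second Source" if single_source else "Buy"
        )
        decision.flags.append("SUNSETTING_OVERRIDE")

    # ITAR override: evaluated last so no later step can undo it.
    # It rules out an unrestricted buy; it does not force Make in this sketch.
    if itar:
        decision.flags.append("ITAR_CONTROLLED")
        if decision.recommendation.startswith("Buy"):
            decision.recommendation += " (ITAR-compliant source only)"

    return decision


# Economic score of 40 for the housing is a made-up input for the example.
housing = apply_overrides("Strategic Make", economic=40, sunsetting=True,
                          single_source=True, itar=False)
print(housing.recommendation, housing.flags)
```

Running the highest-precedence check last is one simple way to guarantee it has the final say; the flags list preserves the order in which overrides fired.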
Three scoring modes re-weight the blend between the two layers, matching how a sourcing organization actually frames a decision cycle: are we under cost pressure, or are we prioritizing supply security? The mode does not change the individual factor scores, and it cannot move a decisive case. It only tips a recommendation when the case is genuinely close: when the mode-weighted blend falls within a defined margin of neutral and the balanced read was already near the line.
This was a deliberate correction during development. An earlier version computed a blended score that moved with the mode but never actually changed the recommendation, which made the control misleading. The fix tied mode to the recommendation in close cases only, so the lens means something without letting a posture override a clear or compliance-bound decision.
| Part | What a single-lens read says | What the tool recommends, and why |
|---|---|---|
| Titanium structural fitting (flight-critical, ITAR, single-source, growing program) | Buying is roughly $134K/year cheaper all-in. A cost model says outsource. | Strategic Make. Flight-criticality, ITAR control, and a single-source market justify the cost premium and rule out an unrestricted buy. The economics lean Buy at 31/100; strategy leans Make at 83/100. This is the case a spreadsheet gets wrong. |
| Sunsetting actuator housing (flight-critical, single-source, program winding down) | Same high-risk criticality profile as the fitting, but on a winding-down program. A strategic checklist says make it. | Buy / Qualify Second Source. The program is being retired and the economics do not favor making, so the lifecycle override forbids sinking tooling capital into a dying program. But single-source exposure on a flight-critical part still needs hedging, so the buy is paired with qualifying a second source. This is the case a strategic checklist gets wrong. |
The two examples are deliberate mirror images. The first shows strategy correctly overriding cost. The second shows lifecycle discipline correctly overriding strategy. Neither answer falls out of a single number, and that is the entire point of the tool.
The AI layer translates a computed decision into a short, leadership-ready rationale. Every score, the quadrant, the recommendation, and any override are computed before the model is called. The prompt passes the final recommendation as a fixed fact and instructs the model that implying a different one is a defect. The engine recommendation is also displayed directly above the narrative, so the deterministic result and its explanation can never be confused.
If no API key is present, the layer falls back to a deterministic template, so the live tool never breaks on the AI path. The narrative is honest about mode-tipped outcomes: when a part lands on Buy despite favorable economics, the explanation says so rather than claiming both layers favored buying.
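The separation between computed result and narrative could be sketched as follows. The model call is injected as a plain callable so the example stays self-contained; `build_prompt`, `template_rationale`, `generate_rationale`, the prompt wording, and the shape of the `result` dict are all hypothetical, not the tool's actual interface.

```python
from typing import Callable, Optional


def build_prompt(result: dict) -> str:
    """Pass the already-computed decision to the model as a fixed fact."""
    flags = ", ".join(result["flags"]) or "none"
    return (
        "Explain this make-vs-buy decision to sourcing leadership.\n"
        f"FINAL RECOMMENDATION (fixed, do not change): {result['recommendation']}\n"
        f"Economic score: {result['economic']}/100. "
        f"Strategic score: {result['strategic']}/100.\n"
        f"Overrides fired: {flags}.\n"
        "Implying any other recommendation is a defect."
    )


def template_rationale(result: dict) -> str:
    """Deterministic fallback used when no model is available."""
    flags = ", ".join(result["flags"]) or "none"
    return (
        f"Recommendation: {result['recommendation']}. "
        f"Economics score {result['economic']}/100 and strategy scores "
        f"{result['strategic']}/100 (above 50 favors Make). Overrides: {flags}."
    )


def generate_rationale(result: dict,
                       call_model: Optional[Callable[[str], str]] = None) -> str:
    """Return narrative text only; the result dict is never modified."""
    if call_model is None:  # for example, no API key is configured
        return template_rationale(result)
    return call_model(build_prompt(result))


result = {"recommendation": "Strategic Make", "economic": 31,
          "strategic": 83, "flags": ["ITAR_CONTROLLED"]}
print(generate_rationale(result))
```

In a real deployment `call_model` would wrap the API client; keeping it a parameter means the narrative path can be tested with a stub and can never write back into the scores.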
The test suite enforces 121 assertions including: cost-delta direction and break-even math, every category-to-score lookup, layer-weight integrity, mode-blend correctness, quadrant assignment across all four regions plus the neutral band, the sunsetting and ITAR overrides and their precedence, mode-driven tipping on close calls, input validation bounds, determinism (identical inputs yield identical output), and a regression lock asserting each of the six sample parts produces its intended recommendation.
The test suite earned its place during development: the regression lock caught a capital-burden factor that was defaulting to a Make-favoring value when no investment was required, biasing the engine toward Make on zero-NRE parts. The fix reset it to neutral. The bug was invisible without the tests.
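A test that pins this kind of fix might look like the sketch below. The category names, the score table, and the function `capital_burden_score` are hypothetical stand-ins; the only detail taken from the source is that a part requiring no investment must score neutral rather than Make-favoring.

```python
NEUTRAL = 50

# Hypothetical category-to-score table (0-100, higher favors Make).
# Capital burden can only argue against Make, so no entry exceeds neutral.
CAPITAL_BURDEN_SCORES = {"none": NEUTRAL, "low": 45, "medium": 30, "high": 15}


def capital_burden_score(category: str) -> int:
    """Look up the capital-burden factor score for an investment category."""
    try:
        return CAPITAL_BURDEN_SCORES[category]
    except KeyError:
        raise ValueError(f"unknown capital-burden category: {category!r}") from None


def test_zero_nre_is_neutral():
    # Regression lock: no required investment means the factor has no opinion.
    assert capital_burden_score("none") == NEUTRAL


def test_capital_burden_never_favors_make():
    assert all(score <= NEUTRAL for score in CAPITAL_BURDEN_SCORES.values())


def test_scoring_is_deterministic():
    first = [capital_burden_score(c) for c in CAPITAL_BURDEN_SCORES]
    second = [capital_burden_score(c) for c in CAPITAL_BURDEN_SCORES]
    assert first == second
```

The functions follow pytest naming conventions, so a runner such as pytest would collect them automatically.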
The Make vs. Buy Decision Framework is the highest-altitude tool in the portfolio. It operates at the level a sourcing leader works at: not "what should this part cost" or "how good is this supplier," but "should we make this at all, given that cost is only one of the factors that matter." It consumes a make-cost figure that the Should-Cost Model can produce, and it treats the buy path as a strategy that the Supplier Scorecard can then evaluate vendor by vendor. Together the tools form a buy-side decision system from part prioritization through award.
Built to close the gap between "the supplier quote is lower, so we outsource" and "we have a structured, defensible view of whether a part belongs in-house across cost, control, risk, and program life." Deterministic by design. Explainable by requirement. Aerospace-specific by intent. The cheapest option is often the wrong one, and the most strategic option is not always worth the capital. The tool surfaces both. The sourcing professional decides.