
Verifying Small Molecule Optimization Results

Salt Team

Feb 19, 2026


We've been building drug design pipelines on Salt AI - first for large molecules, more recently for small molecules - and one of our latest showcases something important: the workflow itself can drive better outcomes in computational chemistry.

The pipeline uses Claude Opus 4.6 to generate SMILES strings - the text-based notation for molecular structures. We're leveraging the nondeterministic nature of a general-purpose LLM to our benefit here: variation between generations gives each round fresh candidates to score. The pipeline scores every candidate in real time against Boltz docking, Lipinski drug-likeness, ADMET, and synthesizability, assembles those scores into a report, generates structured feedback, and loops that feedback back to the beginning for the next round.
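
For readers who want the shape of that loop in code, here's a minimal sketch. Every function in it is a placeholder we're using for illustration, not an actual Salt node; in the real pipeline, the Claude call, Boltz docking, and the scoring packages sit where these stubs are.

```python
# Minimal sketch of the recursive optimization loop. The helpers are stand-ins
# for the real Salt nodes (Claude generation, Boltz docking, Lipinski/ADMET/
# synthesizability scoring); only the control flow is meant to be taken literally.

import random

def generate_candidates(seed_smiles: str, feedback: str, n: int = 10) -> list[str]:
    # Placeholder for the Claude Opus call: in the real pipeline this returns
    # new SMILES strings conditioned on the seed structure and prior feedback.
    return [seed_smiles] * n

def score_candidate(smiles: str) -> dict:
    # Placeholder for the scoring nodes (docking, drug-likeness, ADMET, SA).
    return {"docking": random.random(), "druglikeness": random.random(),
            "admet": random.random(), "synthesizability": random.random()}

def build_feedback(report: list[dict]) -> str:
    # Placeholder for the structured-feedback step fed into the next round.
    best = max(report, key=lambda r: r["docking"])
    return f"best docking score last round: {best['docking']:.2f}"

def optimize(seed_smiles: str, rounds: int = 15) -> list[dict]:
    feedback, history = "", []
    for round_idx in range(1, rounds + 1):
        candidates = generate_candidates(seed_smiles, feedback)
        report = [score_candidate(smi) | {"smiles": smi, "round": round_idx}
                  for smi in candidates]
        history.extend(report)
        feedback = build_feedback(report)  # accumulated context for the next round
    return history
```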

It's recursive. Claude Opus 4.6 is the same model in round 1 as it is in round 15. But outputs get measurably better - docking scores improve, drug-likeness tightens, synthetic feasibility climbs - because the workflow is accumulating context and applying real constraints at every step.

We wanted to know: does this actually work? So we designed a dry lab test. We started with bacampicillin, a penicillin-family prodrug, ran 15 rounds, then searched output SMILES against PubMed, PubChem, and published literature.
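
If you want to automate a check like that, here's one simple way, assuming RDKit and PubChem's PUG REST interface. The helper names are ours, and this is a sketch of the idea rather than the exact search procedure we ran against PubMed and the published literature.

```python
# Canonicalize each output SMILES with RDKit, then ask PubChem (PUG REST)
# whether an identical structure is already registered. Illustrative only;
# verify endpoint behavior against PubChem's documentation before relying on it.

import requests
from rdkit import Chem

PUG_SMILES_TO_CID = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/cids/TXT"

def canonical(smiles: str) -> str | None:
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None

def pubchem_cids(smiles: str) -> list[int]:
    """Return PubChem CIDs matching the structure; an empty list suggests novelty."""
    resp = requests.post(PUG_SMILES_TO_CID, data={"smiles": smiles}, timeout=30)
    if resp.status_code != 200:
        return []
    return [int(tok) for tok in resp.text.split() if tok.isdigit() and tok != "0"]

def flag_known(smiles_list: list[str]) -> None:
    for smi in smiles_list:
        can = canonical(smi)
        if can is None:
            print(f"invalid SMILES: {smi}")
        elif (cids := pubchem_cids(can)):
            print(f"known compound (CIDs {cids}): {can}")
        else:
            print(f"no exact PubChem match: {can}")

# flag_known(final_report_smiles)  # hypothetical list pulled from the round-15 report
```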

The pipeline independently produced ampicillin and amoxicillin - two of the most prescribed antibiotics in history - plus one other known compound and 11 novel molecules, each with a full scored profile across every dimension the pipeline measures.

This wasn't an exercise in asking whether AI can retrieve known drugs. The pipeline takes a starting structure, applies constraints, and proposes modifications. We were testing whether a recursive workflow using off-the-shelf scoring packages and a general-purpose LLM could converge on molecules with genuine pharmacological properties. The fact that it landed on drugs validated through decades of clinical use tells us the approach works.

Someone will reasonably point out these could have been in Claude's training data. But Boltz docking scores don't come from training data. Lipinski compliance is math. Synthetic accessibility is chemistry. The pipeline converged there because the physics pointed there.
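
To make the "math" point concrete, here's what a Lipinski rule-of-five check looks like with RDKit. Treat it as illustrative; it isn't necessarily the exact implementation behind the pipeline's drug-likeness node.

```python
# Lipinski's rule of five as plain arithmetic on RDKit descriptors.

from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def lipinski_violations(smiles: str) -> int | None:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    rules = [
        Descriptors.MolWt(mol) <= 500,      # molecular weight
        Descriptors.MolLogP(mol) <= 5,      # octanol-water partition coefficient
        Lipinski.NumHDonors(mol) <= 5,      # hydrogen bond donors
        Lipinski.NumHAcceptors(mol) <= 10,  # hydrogen bond acceptors
    ]
    return sum(not ok for ok in rules)

# Zero or one violation is conventionally treated as drug-like.
```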

Most of the AI drug discovery conversation is about model quality. That matters. But this suggests a parallel investment is being undervalued: workflow architecture. Recursive feedback, multi-model scoring, constraint design - these improve output quality independently of the model, and they compound as models get better.

Conner Lambden built this on Salt's visual platform - every node visible and auditable. He recorded a 2-minute walkthrough, embedded below.

We're not claiming this is a production drug discovery engine. We're showing that constrained, recursive workflows can produce pharmacologically interesting results with full transparency. How else would you test output quality from a pipeline like this?