> AI must control the proof search, not delegate to a black box.
Cannot disagree more. Take a look at modern SAT solvers, SMT solvers, and automated theorem provers like ACL2. We should _not_ be letting the AI white-knuckle the search process. AI is not going to give us completeness, even if we tack on a post-hoc soundness check. Moreover, AI can act as a heuristic, but it is not going to automatically exploit symmetries and other symbolic contours of the space. Exploiting such contours is what makes these symbolic tools so powerful.
There has been a lot of work on combining symbolic search with machine learning as a heuristic. To me, this is the most promising route. My take is that we should use the statistical models to bias, and the symbolic models to control.
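To make "statistical models bias, symbolic models control" concrete, here is a minimal sketch: a toy DPLL SAT solver where a pluggable `score` function (standing in for a learned model; here just a trivial lambda) only ranks branching variables. Soundness and completeness come from the symbolic engine, which still performs unit propagation and tries both polarities no matter what the heuristic prefers. The function names are mine, not from any particular solver.

```python
def simplify(clauses, lit):
    """Assign `lit` true: drop satisfied clauses, shrink the rest.
    Returns None on an empty clause (conflict)."""
    out = []
    for clause in clauses:
        if lit in clause:
            continue  # clause satisfied, drop it
        reduced = [l for l in clause if l != -lit]
        if not reduced:
            return None  # conflict
        out.append(reduced)
    return out

def dpll(clauses, score):
    """Complete DPLL search; `score` only biases the branching order."""
    if not clauses:
        return True  # all clauses satisfied
    # Unit propagation: forced moves the symbolic engine never delegates.
    unit = next((c[0] for c in clauses if len(c) == 1), None)
    if unit is not None:
        rest = simplify(clauses, unit)
        return rest is not None and dpll(rest, score)
    # The heuristic only *ranks* candidates; completeness comes from
    # branching on both polarities regardless of what the model says.
    var = max({abs(l) for c in clauses for l in c}, key=score)
    for choice in (var, -var):
        rest = simplify(clauses, choice)
        if rest is not None and dpll(rest, score):
            return True
    return False

print(dpll([[1, 2], [-1, 2], [-2]], lambda v: v))  # unsatisfiable: False
print(dpll([[1, 2], [-1, 2]], lambda v: v))        # satisfiable: True
```

Swapping the lambda for a learned scorer changes how fast the search converges, never whether its answers are sound.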
> Verifying software is mathematics: the same reasoning that proves a theorem in abstract algebra proves that a cryptographic library correctly implements its specification.
I also want to push back on this. If you have written a mechanical proof about software and a mechanical proof about a mathematical object, you’ll note that they feel qualitatively different.
In my experience, proofs about software have far more case splits, often involving thousands or even millions of cases. Proofs about more traditional mathematical objects tend to have far fewer case splits of this nature. The obligations in proofs about software tend to be easier to discharge automatically. Mathematical objects tend to require a lot more insight and infrastructure around them.
Of course, proofs about software are still mathematics, and the same foundations can ground both. I am merely suggesting that proofs about software may be amenable to more specialization than proofs about mathematics generally.
I will also say that finding a counterexample tends to be more important in software settings than finding a proof. This is because software tends to have bugs whereas mathematics tends to eliminate such bugs before we bother to mechanically formalize it.
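A hedged illustration of why counterexample search pays off in software: below, an 8-bit signed "average" computed with wraparound addition (as in C-like fixed-width arithmetic) is checked against the exact specification by bounded exhaustive search. All names here are hypothetical, and enumeration stands in for what a property-based tester or SMT solver would do at scale.

```python
def to_i8(x):
    # Wrap an integer to signed 8-bit two's complement.
    x &= 0xFF
    return x - 256 if x >= 128 else x

def buggy_avg(a, b):
    # The bug: the intermediate sum wraps at 8 bits before halving.
    s = to_i8(a + b)
    return s >> 1

def spec_avg(a, b):
    # Exact specification: floor of the true average.
    return (a + b) // 2

def find_counterexample():
    # Bounded exhaustive search over the whole 8-bit input space.
    for a in range(-128, 128):
        for b in range(-128, 128):
            if buggy_avg(a, b) != spec_avg(a, b):
                return (a, b)
    return None

print(find_counterexample())  # prints a witness pair where the sum wraps
```

A proof attempt would simply get stuck on the unprovable goal; the search hands back a concrete failing input in milliseconds, which is usually what a software engineer actually needs.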
> The AI community has already made its choice.
I wonder if this is the wrong choice. Lean is great for reasoning about mathematics, but I don’t think it’s the right choice for reasoning about industry-scale software. I hate the ACL2 user experience, yet I think it is fundamentally better suited to reasoning about software. I wish the AI community would take a broader look at the formal methods tools available instead of just picking up the hottest, most popular tool and shoehorning everything into it.