I dunno. I just let Claude build a python script that calls Claude code though subprocess.run().
I recently made a sort of Autoresearch with that approach. The script calls Claude Code to create a hyphotesis, then code based on that, evaluate- rinse and repeat. I am still trying to figure out if I am actually on to something or just burning tokens. Jury is still out.
A noob question: is there a tool that automatically instructs Claude Code to "continue" when the token quota is reset after 5h? I am interested in that more than some rather fancy loops.
IMO the raw Claude CLI is great for one-off interactive sessions, but as soon as you want repeatable multi-step workflows you’re either copy-pasting prompts forever or hacking your own solution manually. That’s exactly the gap these tools fill.
My take on a solution for this is https://ossature.dev — .smd spec markdown files + ossature audit / build that gives you DAG orchestration, SHA-traced increments, and tiny focused contexts.
Yeah bash scripts start clean but the sprawl kicks in quick as the workflow and project becomes more complex. Prompts get copied, deps turn manual, and maintenance of your workflow itself becomes the chore.
Ossature swaps that for structured SMDs and optional AMDs. Multiple specs build a clean DAG that drops into an editable plan.toml so everything stays traceable without the mess.
I use bash scripts. Both Claude and Vibe support all kinds of arguments if you need a prompt to “become a task”. Bash is also deterministic and easy to read and debug.
They bootstrap a workflow with a prompt then build an orchestrator off that then prompt it to be converted to an opencode plugin and then prompt a website to be generated advertising it and then prompt a tool that reviews hacker news feedback and automatically incorporates feedback into next generation of the tool. At the end of the week they go to their manager and complain they are out of tokens for the actual job they are being paid for.
Given that subagents have different thinking/effort behavior from the main agent and very limited control on that front (I’m not completely sure about this but see https://github.com/anthropics/claude-code/issues/14321 and I’ve also noticed very different behavior when the same prompt is used in the main agent or passed to a subagent), I’m not sure this skill will be the same.
Semi-on-topic: Anyone know a way to get a good alternative UI on top of Cursor?
My company’s tracking how much we use the damn thing (its autocomplete is literally less-useful than standard VSCode, only time it’s consistently good is when it sees me do one thing to a line, sees repeated similar lines after that, and suggests I do it on the next one too, one at a time, and that’s only useful to me because I’ve never actually bothered to learn how to properly use a text editor) so I can’t avoid it, but even on codebases in the hundreds of lines it’s OOM killing things on my 16GB laptop (it, plus goddamn Teams, were eating half the memory by themselves the other day… with Cursor sitting at almost 6GB alone. JFC. On the plus side if this is what software from a company that should be full of experts at using these things looks like, guess our jobs are safe from them… though not from recession and ZIRP unwinding)
Looks pretty nice. I think a lot of devs have been making similar tools, I've written my own thing that does a work review loop. I like the interface you've made. I'll probably give it a go, but I'm also reluctant to relinquish the control I have when it's my own code doing orchestration.
How heavy on tokens is this? I don't use these style workflows and am fairly new to claude code, so I assume it's better than 3x tokens when doing 3 passes?
It's not 3x because of 3 runs; can be more token, can be less.
The way of thinking it is, telling Claude to tackle the problem 3 times, each time it may or may not use different approach, fix or improve on things it did previously.
If you impl this as a backend and connect to Telegram bots, agents can just do `$ ask "Should I do this?"` for agent→human and `$ alert "this thing blocked me"` for coder→planner. That's what I'm actually doing — I have 1 manager + 3 designers + 1 researcher + 2 debugger + 1 communicator + any number of temporal coders/reviewers in my setup, all connected to taskwarrior for task-driven-dev
[0]: https://news.ycombinator.com/item?id=47262711 [1]: https://getcook.dev
I recently made a sort of Autoresearch with that approach. The script calls Claude Code to create a hyphotesis, then code based on that, evaluate- rinse and repeat. I am still trying to figure out if I am actually on to something or just burning tokens. Jury is still out.
My take on a solution for this is https://ossature.dev — .smd spec markdown files + ossature audit / build that gives you DAG orchestration, SHA-traced increments, and tiny focused contexts.
Ossature swaps that for structured SMDs and optional AMDs. Multiple specs build a clean DAG that drops into an editable plan.toml so everything stays traceable without the mess.
Feel free to check the example projects on https://github.com/ossature/ossature-examples
Then just use Python.
Was wondering if using front-matter instead of a "custom" encoding for parseble data was considered?
But in general this is meta to the CLI agent.
So if you were to use the CLI to perform a review of some code. This tool would allow you to loop the output of the code review 5 times onto itself.
Claude already does that if you ask nicely.
Where are people finding time for these sort of projects.
My take? I like it. It's concise enough for me to try it out. And I love the webpage.
[1] https://github.com/rjcorwin/cook/blob/main/no-code/SKILL.md
Might work out fine on codex.
My company’s tracking how much we use the damn thing (its autocomplete is literally less-useful than standard VSCode, only time it’s consistently good is when it sees me do one thing to a line, sees repeated similar lines after that, and suggests I do it on the next one too, one at a time, and that’s only useful to me because I’ve never actually bothered to learn how to properly use a text editor) so I can’t avoid it, but even on codebases in the hundreds of lines it’s OOM killing things on my 16GB laptop (it, plus goddamn Teams, were eating half the memory by themselves the other day… with Cursor sitting at almost 6GB alone. JFC. On the plus side if this is what software from a company that should be full of experts at using these things looks like, guess our jobs are safe from them… though not from recession and ZIRP unwinding)
The way of thinking it is, telling Claude to tackle the problem 3 times, each time it may or may not use different approach, fix or improve on things it did previously.
* coolers whirring, gpus on fire, tokens flying, investors happy, developer goes for 6th break of the day