I built something like this at work using plain Docker images. Can you help me understand your value prop a little better?
Would love to understand how you compare to other providers like Modal, Daytona, Blaxel, E2B and Vercel. I think most other agent builders will have the same question. Can you provide a feature/performance comparison matrix to make this easier?
I'm working on an article that dives deep into the differences between all of us. I think the goal of Freestyle is to be the most powerful and most EC2-like of the bunch.
I haven't played around with Blaxel personally yet.
E2B/Vercel are both great hardware-virtualized "sandboxes".
Freestyle VMs are built on feedback from our users that things they expected to be able to do on existing sandboxes didn't work. A good example: Freestyle is the only provider of the above (I haven't tested Blaxel) that gives users access to the boot disk or the ability to reboot a VM.
Fly.io sprites is the most similar to us of the bunch. They do hardware virtualization as well, have comparable start times and are full Linux. What we call snapshots they call checkpoints.
The big pros of Sprites over us are their advanced networking stack and the Fly.io ecosystem. The big con is that Sprites are incredibly bare bones: they don't have any templating utilities. I've also heard that Sprites sometimes become unavailable for extended periods of time.
The big pros of Freestyle over Sprites are fork, advanced templating, and IMO a better debugging experience because of our structure.
Thanks for the thoughtful response. I'm predominantly a self-hoster, but I think your product makes a lot of sense for a wide variety of users and businesses. I'm excited to try out Freestyle!
The technical challenges in getting memory forking to deliver those sub-second start and fork times are significant. I've seen the pain of trying to achieve that level of state transfer and rapid provisioning. While "EC2-like" gets the point across for many, going bare metal reveals the practical limits of cloud virtualization for high-performance, complex workloads like these. It shows a real understanding of where cloud abstraction helps and where it just adds overhead.
The cost argument for owning the hardware for this specific use case also makes sense, considering the scale these agent environments will demand. Also worth noting, sandboxes are effectively an open attack surface; architecting them not to be in your main VPC is a sound security decision from the start.
It doesn't seem very easy to calculate how much it would cost per month to keep a mostly-idle VM running (for example, with a personal web app). The $20/month plan from exe.dev seems more hobbyist-friendly for that. Maybe that's not the intended use, though?
Cool! I've been using your API for running sandboxed JS. Nice to see you also support VMs now.
> we mean forking the whole memory of it
How does this work? Are you copying the entire snapshot, or is this something fancy like copy-on-write memory? If it's the former, doesn't the fork time depend on the size of the machine?
We're using copy on write with the memory itself. Fork time is completely decoupled from the size of the machine.
Creating a snapshot interrupts the VM for 2-4 seconds because of the sheer IO involved, which we didn't want here.
What's especially cool about this approach is that fork time is not only O(1) with respect to machine size, it's also O(1) with respect to the number of forks.
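To make the copy-on-write idea concrete, here is a toy sketch in Python. It is not how Freestyle implements forking (the thread doesn't say); `Layer` and `ToyVM` are hypothetical names, and memory is modeled as a chain of page overlays. The point it illustrates is that a fork only creates two empty overlays, so the work does not depend on how many pages the machine already has.

```python
from __future__ import annotations


class Layer:
    """One overlay of written pages on top of an optional frozen parent layer."""

    def __init__(self, parent: Layer | None = None) -> None:
        self.parent = parent
        self.pages: dict[int, bytes] = {}  # only pages written in this layer


class ToyVM:
    """Hypothetical VM whose memory is a chain of copy-on-write layers."""

    def __init__(self, top: Layer | None = None) -> None:
        self._top = top if top is not None else Layer()

    def write(self, page_no: int, data: bytes) -> None:
        # Writes always land in this VM's private top layer.
        self._top.pages[page_no] = data

    def read(self, page_no: int) -> bytes:
        # Walk from the newest layer down to the oldest until the page is found.
        layer = self._top
        while layer is not None:
            if page_no in layer.pages:
                return layer.pages[page_no]
            layer = layer.parent
        return b""  # never-written pages read as empty

    def fork(self) -> ToyVM:
        # Freeze the current layer and give parent and child fresh empty overlays.
        # No page is copied, so the work is the same for 10 pages or 10 million.
        frozen = self._top
        self._top = Layer(parent=frozen)
        return ToyVM(Layer(parent=frozen))


parent = ToyVM()
for n in range(1000):
    parent.write(n, b"base")

child = parent.fork()
child.write(0, b"child")    # visible only in the child
parent.write(1, b"parent")  # visible only in the parent

print(child.read(0), parent.read(0))
print(child.read(1), parent.read(1))
```

One trade-off in this toy model: reads get slower as the layer chain grows, so a real system would need some way to bound or flatten the chain. That part is outside what the thread describes.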
I have so many interesting problems in AI; sandboxing isn't one of them. It's a pointless exercise, yet a disproportionate number of people love to do it. Probably because sandboxing doesn't feel as magical as agents themselves and is more like the old times of "traditional" software development.
It is a mostly pointless exercise if the goal is trying to contain negative impact of AI agents (e.g. OpenClaw).
It is a very necessary building block for many common features that can be steered in a more deterministic way, e.g. "code interpreter" feature for data analysis or file creation like commonly seen in chat web UIs.
With respect to the market, every single sandbox sucks. I'm not gonna shit talk competitors but there is not a good sandboxing platform out there yet — including me — compared to where we'll be in 6 months.
We've heard that all the platforms have persistent issues with uptime, feature completeness, networking and debugging. And on our own platform we're not 1/10th of the way through solving the requests we've gotten.
Next generation of Agents needs computers, and those computers are gonna look really different than "sandboxes" do today.
I don't think you're wrong, but if you really want to rethink the approach, building an orchestration layer for Firecracker like every other company in the space is doing is probably not it.
This is an ongoing optimization point; no perfect solution exists. Freestyle VMs work with a network namespace and a virtual ethernet cable going into them, so they all think they have the same IP.
This means that while complex protocol connections like remote Postgres can break in the forks, stuff like WebSockets just reconnects automatically.
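As a rough sketch of how an application running inside a fork might cope with an inherited connection that no longer works, the wrapper below drops a failed connection and opens a new one with a short backoff. Nothing here is a Freestyle API: `ReconnectingClient`, `FlakyConnection` and the `connect` factory are hypothetical stand-ins for whatever client library you actually use.

```python
import time
from typing import Any, Callable, Optional


class FlakyConnection:
    """Stand-in for a stateful connection; `stale=True` mimics one inherited by a fork."""

    def __init__(self, stale: bool) -> None:
        self.stale = stale

    def query(self, sql: str) -> str:
        if self.stale:
            raise ConnectionError("peer no longer recognises this session")
        return f"ok: {sql}"


class ReconnectingClient:
    """Runs operations on a connection, replacing it when it turns out to be dead."""

    def __init__(
        self,
        connect: Callable[[], Any],
        inherited: Optional[Any] = None,
        attempts: int = 3,
        base_delay: float = 0.01,
    ) -> None:
        self._connect = connect
        self._conn = inherited  # may be a connection that was open at fork time
        self._attempts = attempts
        self._base_delay = base_delay

    def run(self, operation: Callable[[Any], Any]) -> Any:
        last_error: Optional[Exception] = None
        for attempt in range(self._attempts):
            try:
                if self._conn is None:
                    self._conn = self._connect()
                return operation(self._conn)
            except ConnectionError as exc:
                last_error = exc
                self._conn = None  # discard the broken connection
                time.sleep(self._base_delay * 2 ** attempt)  # exponential backoff
        raise ConnectionError(f"gave up after {self._attempts} attempts") from last_error


client = ReconnectingClient(
    connect=lambda: FlakyConnection(stale=False),
    inherited=FlakyConnection(stale=True),
)
print(client.run(lambda conn: conn.query("SELECT 1")))
```

This only helps for operations that are safe to retry; anything mid-transaction at fork time would still need application-level handling.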
The memory forking seems like a cool technical achievement, but I don't understand how it benefits me as a user. If I'm delegating the whole thing to the AI anyway, I care more about deterministic builds so that the AI can tackle the problem.
When I’m thinking of sandboxes, I’m thinking of isolated execution environments.
What does forking sandboxes bring me? What do your sandboxes in general bring me?
Please take this in the best possible way: I’m missing a use case example that’s not abstract and/or small. What’s the end goal here?
Daytona runs on Sysbox (https://github.com/nestybox/sysbox) which is VM-like but when you run low level things it has issues.
Modal is the only provider with GPU support.
You can handroll a lot with: https://github.com/nestybox/sysbox?tab=readme-ov-file https://gvisor.dev https://github.com/containers/bubblewrap?tab=readme-ov-file
For hardware-virtualized machines it's much harder, but you can do it via: https://github.com/firecracker-microvm/firecracker/ https://github.com/cloud-hypervisor/cloud-hypervisor
Freestyle/other providers will likely provide a better debugging experience, but that's something you can probably get past for a lot of workloads.
The time when you/anyone should think about Freestyle/anyone is when you need to create hundreds of VMs in short spikes, or when you're looking for some of the more complex feature sets any given provider has built out (forks, GPUs, network boundaries, etc).
I also highly recommend self-hosting anything you do outside of your normal VPC. Sandboxes are the biggest possible attack surface, and it is a feature of ours that we're not in your cloud; if we mess up security, your app is still fine.
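As a rough illustration of the hand-rolled container route mentioned above, here is a minimal Python sketch that builds a bubblewrap (`bwrap`) command line. The helper names `bwrap_argv` and `run_sandboxed` and the example paths are assumptions for illustration; the flags are standard bubblewrap options, but treat this as a starting point rather than a hardened configuration, and note it gives namespace isolation, not hardware virtualization.

```python
import shlex
import subprocess


def bwrap_argv(command: list[str], workdir: str) -> list[str]:
    """Build a bwrap invocation that runs `command` with a read-only root and no network."""
    return [
        "bwrap",
        "--ro-bind", "/", "/",       # host filesystem visible, but read-only
        "--dev", "/dev",             # minimal private /dev
        "--proc", "/proc",           # fresh /proc for the new PID namespace
        "--tmpfs", "/tmp",           # private scratch space
        "--bind", workdir, workdir,  # the only writable host directory
        "--chdir", workdir,
        "--unshare-all",             # new namespaces, including network
        "--die-with-parent",         # kill the sandbox if the launcher exits
        *command,
    ]


def run_sandboxed(command: list[str], workdir: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run the command under bwrap; requires bubblewrap to be installed on the host."""
    return subprocess.run(
        bwrap_argv(command, workdir),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )


argv = bwrap_argv(["python3", "-c", "print('hello from the sandbox')"], "/srv/job")
print(shlex.join(argv))
```

Calling `run_sandboxed(...)` on a host with bubblewrap installed would execute the command; the snippet above only prints the command line so it can be inspected first.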
https://GitHub.com/jgbrwn/vibebin
Also I'm a huge proponent of exe.dev
Obviously your service/approach is different from exe and more like Sprites, but, as you said, it looks more targeted/opinionated toward AI coding/sandboxing tasks. Interesting space for sure!
That said, our $50 a month plan can be used as an individual for your coding agents, but I wouldn't recommend it.
We're working on a similar solution at UnixShells.com [1]. We built a VMM that forks, and boots, in < 20ms and is live, serving customers! We have a lot of great tools available, via MIT, on our github repo [2] as well!
[1] https://unixshells.com
[2] https://github.com/unixshells
But I see multiple "sandbox for agents" products a week. It's way too saturated a market.