Tinybox: offline AI device, 120B parameters

(tinygrad.org)

179 points | by albelfio 2 hours ago

30 comments

  • ivraatiems 2 hours ago
    There's some irony in the fact that this website reads as extremely NOT AI-generated, very human in the way it's designed and the tone of its writing.

    Still, this is a great idea, and one I hope takes off. I think there's a good argument that the future of AI is in locally-trained models for everyone, rather than relying on a big company's own model.

    One thought: The ability to conveniently get this onto a 240V circuit would be nice. Having to find two different 120V circuits to plug this into will be a pain for many folks.

    • Lerc 1 hour ago
      I am a little surprised that they openly solicit code contributions with "Invest with your PRs" but don't have any statement on AI contributions.

      Maybe the volume is low enough for them that well-intentioned but poor-quality PRs can be politely (or otherwise, depending on the culture) disregarded, and the method of generation doesn't matter.

      • KeplerBoy 1 hour ago
        Tinygrad has certainly shared a few opinions on AI PRs on Twitter. I believe the gist was "we have Claude Code as well; if that's all you bring, don't bother".
      • cyanydeez 36 minutes ago
        I'm starting to think that if you have an AI repo that's basically about codegen, you should just close all issues automatically, then manually (or however) reopen the ones you/maintainers actually care about. That's about the only way to fix the signal/noise ratio AIs are creating.

        Then you could focus fire on whatever issues you prefer, like the script kiddies did with DDoS in the old days.

    • wat10000 1 hour ago
      If you’re spending $65,000 on this thing, needing two circuits seems like a minor problem
    • trollbridge 1 hour ago
      A typical U.S. 240V circuit is actually just two 120V circuits. Fairly trivial to rewire for that.
      • amluto 1 hour ago
        Sometimes. 240V circuits may or may not have a neutral.
      • doubled112 1 hour ago
        I’ve actually had half of my dryer outlet fail when half of the breaker failed.

        Can confirm.

      • jcgrillo 54 minutes ago
        If you actually use two 120V circuits that way and one breaker trips, the other half will send 120V through the load back into the other circuit. So while that circuit's breaker is tripped, it is still live. Very bad. Much better to use a 240V breaker that picks up two legs in the panel.
  • paxys 6 minutes ago
    The problem with all these "AI box" startups is that the product is too expensive for hobbyists, and companies that need to run workloads at scale can always build their own servers and racks and save on the (substantial) markup. Unless someone can figure out how to get cheaper GPUs & RAM, there is really no margin left to squeeze out.
  • vessenes 2 hours ago
    The exabox is interesting. I wonder who the customer is; after watching the Vera Rubin launch, I cannot imagine deciding I wanted to compete with NVIDIA for hyperscale business right now. Maybe it's aiming at a value-conscious buyer? Maybe it's a sensible buy for a (relatively) cash-strapped ML startup; actually, I just checked prices, and it looks like Vera Rubin costs half as much for a similar amount of GPU RAM. I'm certain the interconnect will not be as good as NV's.

    I have no idea who would buy this. Maybe if you think Vera Rubin is three years out? But NV ships, man, they are shipping.

    • kulahan 43 minutes ago
      Sometimes you can compete with the big boys simply because they built their infra 5+ years ago and it's not economically viable for them to upgrade yet, since that's a multi-billion-dollar process. They can run a deficit to drive you out of business, but if you're taking less than 0.01% of their business, I doubt they'd give a crap.
    • zozbot234 2 hours ago
      > The exabox is interesting.

      Can it run Crysis?

      • bastawhiz 1 hour ago
        Probably; RDNA5 can do graphics. But it would be a huge waste, since you could probably only use one of the 720 GPUs.
      • WithinReason 1 hour ago
        Only gamers understand that reference

        -- Jensen Huang

      • dist-epoch 41 minutes ago
        Yes, it can generate Crysis with diffusion models at 60 fps.
  • bastawhiz 1 hour ago
    There's no way the red v2 is doing anything with a 120B-parameter model. I just finished building a dual-A100 AI homelab (80GB VRAM combined, with NVLink). Similar stats otherwise. 120B only fits with very heavy quantization, enough to make the model schizophrenic in my experience. And there's no room for the KV cache, so you'll OOM around 4k of context.

    I'm running a 70B model now that's okay, but it's still fairly tight. And I've got 16GB more VRAM than the red v2.

    I'm also confused why this is 12U. My whole rig is 4U.

    The green v2 has better GPUs. But for $65k, I'd expect a much better CPU and 256GB of RAM. It's not like a Threadripper 7000 is going to break the bank.

    I'm glad this exists but it's... honestly pretty perplexing
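
    Back-of-envelope, if anyone wants to sanity-check that (a rough sketch; the layer/head counts below are illustrative assumptions, not any particular 120B model's real architecture):

      # Rough memory estimate for a dense ~120B model; layer/head numbers
      # are made-up illustrative values, not a real config.
      def weights_gb(n_params, bits):
          return n_params * bits / 8 / 1e9

      def kv_gb(n_layers, n_kv_heads, head_dim, ctx):
          # K and V tensors, fp16 (2 bytes), per layer per token
          return 2 * n_layers * n_kv_heads * head_dim * ctx * 2 / 1e9

      print(weights_gb(120e9, 4))      # ~60 GB of weights at 4-bit
      print(kv_gb(96, 96, 128, 4096))  # ~19 GB of KV at 4k ctx with full MHA

    That's ~79GB on an 80GB rig, which is exactly the "OOM around 4k" wall; GQA models shrink the KV term by the heads-to-KV-heads ratio, but the weights alone leave little headroom.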

    • oceanplexian 1 hour ago
      It will work fine, but the performance isn't necessarily insane. I can run a q4 of gpt-oss-120b on my Epyc Milan box, which has similar specs, and get something like 30-50 tok/sec by splitting it across RAM and GPU.

      The thing that's less useful is the 64GB VRAM / 128GB system RAM config: even the large MoE models only need ~20B for the router, so the rest of the VRAM is essentially wasted (mixing experts between VRAM and system RAM has basically no performance benefit).
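
      For reference, the RAM/GPU split is just a knob in the runtime. A minimal sketch with the llama-cpp-python bindings (the model path and layer count are placeholders, not a tuned config):

        # Keep some layers on the GPU, leave the rest in system RAM.
        from llama_cpp import Llama

        llm = Llama(
            model_path="gpt-oss-120b-q4.gguf",  # hypothetical local GGUF file
            n_gpu_layers=20,                    # layers resident in VRAM; tune to fit
            n_ctx=8192,
        )
        out = llm("Say hi in five words.", max_tokens=16)
        print(out["choices"][0]["text"])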

      • syntaxing 19 minutes ago
        Splitting between RAM and GPU hurts more than you think. I would be surprised if the red box didn't outperform you by 2-3x for both prompt processing (PP) and token generation (TG).
    • zozbot234 1 hour ago
      > And there's no room for kv, so you'll OOM around 4k of context.

      Can't you offload the KV cache to system RAM, or even storage? That would make it possible to run with longer contexts, even with some overhead. AIUI, local AI frameworks include support for caching some of the KV in VRAM using an LRU policy, so the overhead would be tolerable.

      • tcdent 1 hour ago
        Not worth it. It is a very significant performance hit.

        With that said, people are trying to extend VRAM into system RAM or even NVMe storage, but as soon as you hit the PCIe bus with high-bandwidth layers like the KV cache, you eliminate a lot of the performance benefit of having fast memory near the GPU die (PCIe 4.0 x16 tops out around 32 GB/s, versus roughly 2 TB/s for A100-class HBM).

      • ranger_danger 1 hour ago
        I know llama.cpp can; it certainly improved performance on my VRAM-starved GPU.
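
        In the Python bindings that's the offload_kqv flag, if I remember right (a sketch with placeholder values):

          # Weights on the GPU, KV cache in system RAM: slower decode,
          # but much longer contexts fit. Path is a placeholder.
          from llama_cpp import Llama

          llm = Llama(
              model_path="model-q4.gguf",  # placeholder
              n_gpu_layers=-1,             # all layers in VRAM
              offload_kqv=False,           # KV cache stays in system RAM
              n_ctx=16384,
          )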
  • adrianwaj 16 minutes ago
    Perhaps this company should think about acting as a landlord for their hardware: you buy (or lease), but they also offer colocation hosting. They could partner with crypto miners who are transitioning to AI factories to find the space and power to do this. I wonder if the machines require added cooling, though, in what would otherwise be a crypto mining center. CoreWeave made the transition and also does colocation. The switchover is real.

    I think Tinygrad should think about recycling. Are they planning ahead in this regard? Is anyone? My thought is that if there were a central database of who owns what and where, then at least when the recycling tech becomes available, people will know where to source their specific trash (and even pay for it). Having a database like that in the first place could even fuel the industry.

  • ekropotin 1 hour ago
    IDK, I feel it's quite overpriced, even at current component prices.

    I'm almost sure it's possible to custom-build a machine as powerful as their red v2 within a $9k budget. And have a lot of fun along the way.

    • lostmsu 1 hour ago
      AMD now has the 32 GiB Radeon AI Pro 9700. Four of these (just under $2k each) would put you at 128 GiB of VRAM.
      • ekropotin 1 hour ago
        VRAM is not everything - GPU cores also matter (a lot) for inference
        • lostmsu 1 hour ago
          4x Radeons will have significantly more GPU power than, say, a Mac Studio or DGX Spark.
  • ilaksh 33 minutes ago
    I thought the most interesting thing about tinygrad was that, theoretically, you could render a model all the way into hardware, similar to Taalas (tinygrad might be where Taalas got the idea, for all I know).

    I could swear I filed a GitHub issue asking about the plans for that, but I don't see it. Anyway, I think he mentioned it when explaining tinygrad at one point, and I've wondered why it hasn't gotten more attention.

    As far as boxes go, I wish more MI355X were available for normal hourly rental. Or any at all.

  • mmoustafa 55 minutes ago
    I would love to see real-life tokens/sec figures advertised for one or more specific open-source models.

    I'm currently shopping for offline hardware, and it is very hard to estimate the performance I will get before dropping $12k. I would love a baseline I can count on, e.g. always getting at least 40 tok/s running GPT-OSS-120B using Ollama on Ubuntu out of the box.
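
    In the meantime, the usual back-of-envelope is that decode is memory-bandwidth-bound: tok/s is capped near bandwidth divided by bytes read per token (active params x bytes per weight). A sketch, where the bandwidth figures and gpt-oss-120b's ~5B active parameters are my assumptions:

      # Bandwidth-bound ceiling on decode speed; real throughput lands below this.
      def peak_tok_s(bw_gb_s, active_params_b, bits):
          bytes_per_token = active_params_b * 1e9 * bits / 8
          return bw_gb_s * 1e9 / bytes_per_token

      print(peak_tok_s(273, 5.1, 4))  # ~107 tok/s ceiling at DGX-Spark-like bandwidth
      print(peak_tok_s(800, 5.1, 4))  # ~314 tok/s ceiling at Mac-Studio-like bandwidth

    It's only a ceiling, but it at least tells you when a quoted number is implausible.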

    • hpcjoe 3 minutes ago
      Look for llmfit on GitHub; it will help with that analysis. I've found it reasonably accurate. If you already have Ollama installed, it can download the relevant models directly.
  • comrade1234 2 hours ago
    Cool that you have a dual power supply model. It says rack mountable or free standing; does that mean two form factors? $65k is more than we can afford right now, but we are definitely eventually in the market for something we can run in our own colo.

    It's funny though... we're using DeepSeek now for features in our service, and based on our customer type we thought they would be completely against sending their data to a third party. We thought we'd have to do everything locally. But they seem OK with DeepSeek, which is practically free. And the few customers that still worry about privacy may not justify such a high price point.

    • hrmtst93837 1 hour ago
      Most privacy talk folds on contact with a quote. Latency and convenience beat philosophy fast once someone wants a dashboard next week, and a lot of "data sensitivity" talk is just the corporate version of buying "organic" food until the price tag shows up.

      If private inference is actually non-negotiable, then sure, put GPUs in your colo and enjoy the infra pain, vendor weirdness, and the meeting where finance learns what those power numbers meant.

      • zozbot234 1 hour ago
        The real case for private inference is not "organic"; it's "slow food". Offering slow-but-cheap inference is an afterthought for the big model providers; e.g., OpenRouter doesn't support it, not even as a way of redirecting to existing "batched inference" offerings. This is a natural opening for local AI.
        • selectodude 1 hour ago
          But how slow is too slow (you reach that point faster than you'd think)? And even then, you're in for $25,000 for even the most basic on-premise slow LLM.
  • operatingthetan 1 hour ago
    The incremental price increases between products are funny.

    $12,000, $65,000, $10,000,000.

    • sudo_cowsay 1 hour ago
      I mean, the difference in performance is quite big too. However, the $10,000,000 is a little bit too much (imo).
    • znpy 1 hour ago
      I was more worried by the 600kW power requirement... that's 200 houses at full load (3kW) in southern Europe, which likely means 400 houses at half load.

      The town near my hometown has 650-800 houses (according to ChatGPT).

      Crazy.

      • dist-epoch 39 minutes ago
        Your hometown also has public lighting, water pumps, and probably some other stuff.
  • wongarsu 2 hours ago
    Sounds like a solid prebuilt with well-balanced components and a pretty case.

    Not revolutionary in any way, but nice. Unless I'm missing something here?

    • eurekin 2 hours ago
      It's pretty close to what people have been frankenbuilding on r/LocalLLaMA... It's nice to have a prebuilt option.
      • speedgoose 1 hour ago
        You could also order such configurations from a classic server reseller, as far as I know. The case is a bit more original here.
      • nextlevelwizard 1 hour ago
        Tiny boxes are already several years old IIRC
  • zahirbmirza 19 minutes ago
    10 mil today... 1k in 10 years. Are OpenAI and Anthropic overvalued?
  • mayukh 1 hour ago
    What’s the most effective ~$5k setup today? Interested in what people are actually running.
    • cco 10 minutes ago
      Biggest Mac Studio you can get. The DGX Spark may be better for some workflows, but since you're interested in price, the Mac will maintain its value far longer than the Spark, so you'll get more of your money out of it.
    • BobbyJo 1 hour ago
      Depends. If token speed isn't a big deal, then I think Strix Halo boxes are the meta right now, or Mac Studios. If you need speed, most people wind up with something like a gaming PC with a couple of 3090s or 4090s in it. Depending on the kinds of models you run (sparse MoE or other), one or the other may work better.
    • bensyverson 1 hour ago
      Sadly $5k is sort of a no-man's land between "can run decent small models" and "can run SOTA local models" ($10k and above). It's basically the difference between the 128GB and 512GB Mac Studio (at least, back when it was still available).
    • EliasWatson 1 hour ago
      The DGX Spark is probably the best bang for your buck at $4k. It's slower than my 4090, but 128GB of GPU-usable memory is hard to find anywhere else at that price. Its ARM processor does make it harder to install random AI projects off GitHub, because many niche Python packages don't provide ARM builds (Claude Code can usually figure out how to get things running). But all the popular local AI tools work fine out of the box, and PyTorch works great.
    • zozbot234 1 hour ago
      > What’s the most effective ~$5k setup today?

      Mac Studio or Mac Mini, depending on which gives you the highest amount of unified memory for ~$5k.

    • kristopolous 1 hour ago
      Fully aware of the DGX Spark, I've actually been looking into AMD Ryzen AI Max+ 395/392 machines. There are some interesting things here, like https://www.bee-link.com/products/beelink-gtr9-pro-amd-ryzen... and https://www.amazon.com/GMKtec-5-1GHz-LPDDR5X-8000MHz-Display... ... I haven't pulled the trigger yet, but apparently inferencing on these chips is not trash.

      Machines with the 4xx chips are coming next month, so maybe wait a week or two.

      It's soldered LPDDR5X with AMD Strix Halo... sglang and llama.cpp can handle that pretty well these days. And it's, you know, half the price, and you're not locked into the Nvidia ecosystem.

    • borissk 1 hour ago
      With $5k you have to make compromises. Which compromises you're willing to make depends on what you want to do, so the optimal setup will differ.
    • oofbey 1 hour ago
      DGX Spark is a fantastic option at this price point. You get 128GB of VRAM, which is extremely difficult to find for that money. It's also a fairly fast GPU, with stupidly fast networking: 200Gbps or 400Gbps Mellanox, if you find the coin for another one.
      • ekropotin 1 hour ago
        I'm not very well versed in this domain, but I think it's not going to be "VRAM" (GDDR) memory, but rather "unified memory", which is essentially RAM (some flavour of DDR5, I assume). These two types of memory have vastly different bandwidth.

        I'm pretty curious to see any benchmarks of inference on VRAM vs UM.

        • oofbey 1 hour ago
          I'm using VRAM as shorthand for "memory the AI chip can use", which I think is fairly common shorthand these days. For the Spark it is unified, and has lower bandwidth than almost any modern GPU (about 300 GB/s, which is comparable to an RTX 3060).

          So LLM inference is relatively slow because of that bandwidth, but you can load much bigger, smarter models than you could on any consumer GPU.

      • BobbyJo 1 hour ago
        The internet seems to think the SW support for those is bad, and that Strix Halo boxes are better ROI.
        • oofbey 1 hour ago
          Meh. DGX is ARM and CUDA; Strix is x86 and ROCm. CUDA has better support than ROCm, and x86 has better support than ARM.

          Nowadays I find most things work fine on ARM. Sometimes something needs to be built from source, which is genuinely annoying. But moving from CUDA to ROCm is often more like a rewrite than a recompile.

          • BobbyJo 1 hour ago
            CUDA != driver support. Driver support seems to be what's spotty with the DGX, and IIRC Nvidia has only committed to updates for 2 years or something.
      • borissk 1 hour ago
        You can even network 4 of these together using a pretty cheap InfiniBand switch. There is a YouTube video of a guy building and benchmarking such a setup.

        For $5k one can get a desktop PC with an RTX 5090, which has 3x more compute but 4x less VRAM, so depending on the workload it may be the better option.

        • ekropotin 1 hour ago
          VRAM vs UM is not exactly an apples-to-apples comparison.
  • andai 1 hour ago
    Can someone explain the exabox? They say it "functions as a single GPU". Does anything like that currently exist?
    • wmf 45 minutes ago
      An NVL72 rack or Helios rack also "functions as a single GPU".
    • progbits 46 minutes ago
      TPU pods
  • vlovich123 2 hours ago
    Surprising to see this with AMD GPUs, considering how George famously threw up his hands over AMD not being worth working with.
  • aabaker99 40 minutes ago
    > Can I pay with something besides wire transfer? In order to keep prices low and quality high, we don't offer any customization to the box or ordering process. Wire transfer is the only accepted form of payment.

    Sorry, what? Is this just a scam?

  • operatingthetan 1 hour ago
    Are we at the point where 2x 9070 XTs are a viable LLM platform? (I know this has 4; just wondering for myself.)
    • oceanplexian 1 hour ago
      These things don't have Flash Attention, or at best have a really hacked-together version of it. Is it viable for a hobby? Sure. Is it viable for a serious workload with all the optimizations, CUDA, etc.? Not really.
  • sudo_cowsay 1 hour ago
    I always wonder about these expensive products: does the company make them once they're ordered, or do they just make them beforehand?
  • ppap3 56 minutes ago
    I thought there was a typo in the price
  • himata4113 1 hour ago
    exabox reads as if it were making a joke of something or someone. If it's real, then it's really interesting!
  • orliesaurus 2 hours ago
    I wonder if this is on the front page right now because of the other tiiny (the names are similar) video that went viral... which, it turns out, wasn't an actual product by the tinygrad linked in this post [1]

    [1] https://x.com/ShriKaranHanda/status/2035284883384553953

  • droidjj 1 hour ago
    Adding this to my list of ~beautifully~ designed things to buy when I win the lottery.
  • heinternets 2 hours ago
    exabox:

    720x RDNA5 AT0 XL, 25,920 GB VRAM, 23,040 GB system RAM

    ~$10 million

    Who is the target market here?

    • LorenDB 1 hour ago
      I can't find sources but I think they are building it for Comma.ai (geohot's other company) so that Comma can scale up their training datacenter.
    • orochimaaru 2 hours ago
      And... what about 20k lbs and 1,360 cubic feet screams "tiny"? :)
      • smoyer 2 hours ago
        That is very close to a half-length shipping container.
    • mayukh 1 hour ago
      A non-trivial share of this market won’t show up in public data. That makes most estimates unreliable by default
    • dist-epoch 37 minutes ago
      A company that doesn't want the big LLM providers to see its prompts or data: military, health, finance, research.
    • spiderfarmer 2 hours ago
      VC funded startups
  • throwatdem12311 1 hour ago
    Finally, a computer that should be able to run Monster Hunter Wilds with decent performance.

    But let's be real: $12k is kinda pushing it. What kind of people are gonna spend $65k or even $10M (lmao WTAF) on a boutique thing like this? I don't think these kinds of things go in datacenters (happy to be corrected), and they are way too expensive (and probably way too HOT) to just go in a home or even an office "closet".

    • oofbey 1 hour ago
      It's not for individuals to buy; it's for companies. Compare it to a salary and it's cheap.
      • aziaziazi 1 hour ago
        > What's the goal of the tiny corp? To accelerate. We will commoditize the petaflop and enable AI for everyone.

        I had the same feeling as throwatdem when reading this. Your comment clarifies what they meant by "everyone".

      • throwatdem12311 1 hour ago
        What companies are buying this instead of like a Dell server or whatever?
        • flumpcakes 1 hour ago
          These specs look enormously cheaper than doing it with Dell servers. The last quote I had for a bog-standard Dell server was $50k, and only if bought within the next few days or so. The prices are going up weekly.
          • throwatdem12311 53 minutes ago
            So what’s the catch? If it seems too good to be true it probably is.
      • lostmsu 1 hour ago
        Hm, I compared my salary with $10M and it doesn't feel cheap. I guess skill issue.
        • throwatdem12311 1 hour ago
          But how will I make ad-supported YouTube videos about automating my life with OpenClaw, using a $10M boutique AI server to make a few thousand in ad revenue while burning tens of thousands per month on API costs?
  • flykespice 29 minutes ago
    "tiny" and it's 20k lbs and cost about 10k...

    Since when did our perception of tiny blow out of size in tech? Is it the influence of "hello world" eletron apps consuming 100mb of mem while idle setting the new standard? Anyway being an AI bro seems like an expensive hobby...

  • jauntywundrkind 2 hours ago
    My interest in anything associated with geohot took a colossal nosedive today after seeing this post against democracy, quoting frelling M*ncius M*ldbug: "Democracy is a Liability". https://news.ycombinator.com/item?id=47469543 https://geohot.github.io//blog/jekyll/update/2026/03/21/demo...

    There's a lot there that makes sense and needs to be considered. But a lot of it just seems to come out of the blue, included without connection, in my view. It feels like in-group messaging that I don't understand. How this gets framed as being against democracy is unclear to me, and revolting. I do think we must grapple with the world as it is, and this post is strongly in that territory, but letting fear be the dominant ruling emotion is one of the main definitions of conservatism, and its use here to scare us reads badly.

    • kelvinjps10 1 hour ago
      He was always defending democracy and freedom before, and that was his argument for the local AI thing. What changed?
    • fragmede 1 hour ago
      Damn, that's a take.
    • pencilheads 1 hour ago
      Geohot has always been an arrogant cunt who thinks he's better than everyone else. That blog post is totally on brand.
    • tadfisher 1 hour ago
      For those unaware, Mencius Moldbug is the pen name of Curtis Yarvin, thought leader for the Silicon Valley branch of right-wing technofascist weirdos which includes Peter Thiel and apparently half of a16z.
    • stale2002 1 hour ago
      Geohot's politics are fairly straightforward once you understand his background. He's the prodigy child who, at around age 16, accomplished amazing technical feats on his own.

      His politics are a derivative of Great Man Theory, and his positions on things like democracy follow from that. This idea, espoused by some of the VC/tech elite like Peter Thiel, is that singular hardworking genius individuals can change the world on their own, and that everyone not in this top 0.1% is a borderline NPC.

      They do this both because of their genius/hard work, and also because they are willing to break the rules set forth by the bottom 99.9%.

      I'm starting to call this ideology authoritarian techno-libertarianism. It's a deliberately oxymoronic name, because these "Great Men" are definitely trying to change the world, i.e., they are trying to impose their goals and values on the world without getting the buy-in of other people.

      That's the "authoritarian" part. The "libertarian" part is that they go about this imposition of their will on the world by doing it all themselves, through their own hard work.

      Think "person invents a world-changing technology that some people think is bad, and just releases it open source for anyone to use". AI models are a great example, in fact. Once that technology is out there, the genie cannot be put back in the bottle, and a ton of people are going to lose their jobs, etc.

      A disdain for democracy follows directly from this. You don't wait for people to vote to allow you to change the world by inventing something new. You just do it and watch the results.

      • SilverElfin 19 minutes ago
        What makes it "libertarianism" still? To me it feels like they're taking away freedom, control, and influence from everyone who is not them. Even the concentration of wealth itself takes away everyone else's place in the world.
  • fhn 11 minutes ago
    "but if you haven't contributed to tinygrad your application won't be considered" this company expects people to work for free?
    • paxys 7 minutes ago
      > See our bounty page to judge if you might be a good fit. Bounties pay you while judging that fit.

      Literally the line above that