Vibe-Coded Ext4 for OpenBSD

(lwn.net)

48 points | by corbet 2 hours ago

18 comments

kgeist 26 minutes ago
Binaries are copyrightable in both the US and the EU, and they are not technically produced by a human either, they're produced by a computer program. I honestly don't understand why this isn't extended to AI-generated code. Isn't it the same thing? One could argue that compilers merely transform source code into binaries "as is," while AI models have some "knowledge" baked in that they extract and paste as code. But there are compilers that also generate binaries by selecting ready-to-use binary patches authored by compiler developers and combining them into a program. One could also argue that, in the case of compilers, at least the input source code is authored by a human. But why can't we treat prompts as "source code in natural language" too? Where is the line between authorship and non-authorship, and how is the line defined? "Your prompt was too basic to constitute authorship" doesn't sound like an objective criterion.
Maybe for lawyers, AI is some kind of magical thing on its own. But having successfully created a working inference engine for Qwen3, and seeing how the core loop is just ~50 lines of very simple matrix multiplication code, I can't see LLMs as anything more than pretty simple interpreters that process "neural network bytecode," which can output code from pre-existing templates just like some compilers. And I'm not sure how this is different from transpilers or autogenerated code (like server generators based on an OpenAPI schema)
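To make the "core loop is just matrix multiplication" point concrete, here is a toy sketch of a greedy generation loop. It is not the Qwen3 engine mentioned above: the weights, the three-token vocabulary, and the function names are all made up for illustration, and the toy looks only at the last token, whereas a real model attends over the whole context.

```python
# Toy illustration only: hypothetical weights and names, not a real LLM.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def generate(embeddings, hidden, unembed, prompt, steps):
    """Greedy decoding: repeatedly map the last token to the next one."""
    tokens = list(prompt)
    for _ in range(steps):
        x = embeddings[tokens[-1]]                    # vector for the last token
        h = [max(0.0, v) for v in matvec(hidden, x)]  # linear layer + ReLU
        logits = matvec(unembed, h)                   # score every vocabulary entry
        tokens.append(argmax(logits))                 # pick the most likely token
    return tokens


# Made-up 3-token vocabulary whose "weights" encode the rule i -> (i + 1) % 3.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

print(generate(identity, shift, identity, prompt=[0], steps=4))
```

Everything the toy "knows" lives in the `shift` matrix; the loop itself is a generic interpreter over those numbers, which is the analogy being drawn.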
Sure, if an LLM was trained on GPL code, it's possible it may output GPL-licensed code verbatim, but that's a different matter from the question of whether AI-generated code is copyrightable in principle.
Interestingly, I found an opinion here [0] that binaries technically shouldn't be copyrightable, and currently they are because:
```
  the copyright office listened to software publishers, and they wanted binaries protected by copyright so they could sell them that way
```
[0] https://freesoftwaremagazine.com/articles/what_if_copyright_...
FeepingCreature 1 hour ago
> So as of today, the Copyright system does not have a way for the output of a non-human produced set of files to contain the grant of permissions which the OpenBSD project needs to perform combination and redistribution.
This seems extremely confused. The copyright system does not have a way to grant these permissions because the material is not covered under copyright! You can distribute it at will, not due to any sort of legal grant but simply because you have the ability and the law says nothing to stop you.
- plorg 38 minutes ago
  This all relies, as the article points out, on everyone looking directly at code that both looks like and works like the only extant codebase for EXT4 and nonetheless concluding that in fact the computer conjured it from the aether. If I wrote a program that zipped up the Linux kernel source, unzipped it, and grepped -v for comments it would not then be magically transformed into unattributable public domain software.
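  The hypothetical program described above could be sketched roughly as follows. The function name and the sample source text are invented for illustration; `zlib` stands in for zip/unzip, and the filter mimics `grep -v` on whole-line `//` comments only.

```python
import zlib


def launder(source: str) -> str:
    """Compress, decompress, then drop whole-line // comments (like grep -v)."""
    restored = zlib.decompress(zlib.compress(source.encode("utf-8"))).decode("utf-8")
    kept = [line for line in restored.splitlines()
            if not line.lstrip().startswith("//")]
    return "\n".join(kept)


# Hypothetical input standing in for a file with an attribution header.
original = (
    "// hypothetical file header\n"
    "int add(int a, int b) {\n"
    "    return a + b;  // sum\n"
    "}"
)

print(launder(original))
```

  The output is the original code minus its header line. Nothing new was authored along the way, which is the point of the analogy.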
- jagged-chisel 1 hour ago
  Eh … the argument will likely be that things created by Thing at the behest of Author are owned by the Author. It’ll take a few cases going through the courts, or an Act of Congress, to solidify this stuff.
  - wongarsu 1 hour ago
    Just like we settled on photographers having copyright on the works created by their camera. The same arguments seem to apply.
    The US Copyright Office has published a piece that argues otherwise, but a) unless they pass regulation their opinion doesn't really matter, and b) there is way too much money resting on the assumption code can be copyrighted despite AI involvement.
    - fragmede 1 hour ago
      It's not settled. The monkey selfie copyright dispute ruled that a monkey that pressed the button to take a selfie does not and cannot own the copyright to that photo, and neither does the photographer whose camera it was. How that extends to AI-generated code is for the courts to decide, but there are some parallels to that case.
      https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
      - wongarsu 47 minutes ago
        But with the monkey there are two levels of separation from the artist: the human makes the creative decision to hand the camera to a monkey, who presses the trigger, and the camera makes the picture. Compared to the single layer of separation of a photographer choosing framing and camera parameters, pressing the trigger and the camera taking the picture. Or the zero levels of separation when the artist paints the picture.
        A programmer writing code would be like the painter, and the programmer writing a prompt for Claude looks a lot like the photographer. The prompt is the creative work that makes it copyrightable, just like the artistic choices of the photographer make the photo copyrightable
        You could argue that the prompt is more like a technical description than a creative work. But then the same should probably be true of the code itself, and consequently copyright should not apply to code at all
        The copyright office's argument is that the AI is more like a freelancer than like a machine such as a camera. Which you might equate to the monkey, who's also a bit freelancer-like. But I have my doubts that holds up in court. Monkeys are a lot more sentient than AIs.
      - KallDrexx 45 minutes ago
        The copyright office is pretty clear on this if you read: https://www.copyright.gov/ai/Copyright-and-Artificial-Intell....
        There is case law establishing that commissioning a work from another entity doesn't give you co-authorship; the entity doing the work and making the creative decisions is the entity that gets copyright.
        In order for you to have co-authorship of the commissioned work you have to be involved and pretty much giving instruction-level detail to the real author. The opinion shows in many cases that this is not how LLM prompts work.
        The monkey selfie case is also relevant because it solidifies that non-persons cannot claim copyright. That means the LLM cannot claim copyright, and therefore it does not have a copyright that can be passed on to the LLM operator.
      - michaelmrose 43 minutes ago
        The law is whatever it needs to be to satisfy monied interests, with the degree of acceptable adaptation being a function of the unity of those interests and the political ascendancy of those in favor.
        Overwhelmingly this is in favor of treating AI as a tool like Photoshop.
        Even those against AI disagree on different matters and will overwhelmingly want a cut not a different interpretation.
      - charcircuit 48 minutes ago
        This filesystem driver was made by a human using AI, not a monkey.
  - HappySweeney 1 hour ago
    Haven't there already been a few cases, each of which found that mechanically-produced works are not copyrightable?
    - senko 1 hour ago
      no
- themafia 44 minutes ago
  Just because you can distribute something doesn't mean you aren't violating someone else's copyright. You cannot assume that just because a language model popped out some code for you that it is clear of any other claims.
  This is just lazy copyright whitewashing.
LeFantome 1 hour ago
The article is largely about the copyright concerns of LLM generated code that was almost certainly trained on the GPL original.
Also, it is essentially an ext2 filesystem as it does not support journaling.
ethin 1 hour ago
> Lacking Copyright (or similarily a Public Domain declaration by a human), we don't receive sufficient rights grants which would permit us to include it into the aggregate body of source code, without that aggregate body becoming less free than it is now.
Can someone explain this to me? I was under the impression that if a work of authorship was not copyrightable because it was AI generated and not authored by a human, it was in the public domain and therefore you could do whatever you wanted with it. Normal copyright restrictions would not apply here.
- Joel_Mckay 6 minutes ago
  Data obtained through theft of service or piracy from the web, along with "AI" users' content, is used in the model training sets, and once it is encoded its statistical salience is significant if popular content is present.
  For example, when an LLM does a vector search, there is a high probability of pirated content bleed-through and isomorphic plagiarism in the high-dimensional vector space results. Thus, often when you coincidentally type in "name a cartoon mouse", there is a higher probability Disney's "Mickey Mouse" will pop out in the output rather than "Mighty Mouse". Note that trademarks never expire if the fees are paid, and Disney can still technically sue anyone that messes with their mouse.
  Much like with em dashes "--", telling the current set of models to stop using them inappropriately often fails. Also, activation capping is used to improve the model's behavioral vector, and has nothing to do with the Anthropic CEO developing political ethics.
  LLMs are useful for context search, but can't function properly without constantly stealing from actual humans. Thus, they will often violate copyright, trademark, and patents. In a commercial context it is legally irrelevant how the output has misappropriated IP, and you can bet your wallet the lawyers won't care either. No, IP is not public domain for a long time (17 to 78 years) regardless of people's delusions, even if some kid in a place like India (no software patents) thinks it is.
  This channel offers several simplified explanations of the work being done with models, and Anthropic posts detailed research papers on its website.
  https://www.youtube.com/watch?v=YDdKiQNw80c
  https://www.youtube.com/watch?v=Xx4Tpsk_fnM
  https://www.youtube.com/watch?v=JAcwtV_bFp4
  Many YC bots are poisoning discourse -- so this thread will likely get negative karma. Some LLM users seem to develop emotional or delusional relationships with the algorithms. The internet is already >52% generated nonsense and growing. =3
joshstrange 32 minutes ago
> Who is the copyright holder in this case? It clearly draws heavily from an existing work, and it's clear the human offering the patch didn't do it. It's not the AI, because only persons can own copyright. Is it the set of people whose work was represented in the training corpus? Was it the set of people who wrote ext4 and whose work was in the training corpus? The company who own the AI who wrote the code? Someone else?
I don't love this take. Specifically:
> it's clear the human offering the patch didn't do it
I find it hard to believe that there wasn't a good bit of "blood, sweat, and tears" invested by a human directing the LLM to make this happen. Yes, LLMs can spit out full projects in 1 prompt but that's not what happened here. From his blog the work on this spanned 5 months at least. And while he probably wasn't working on it exclusively during that time, I find it hard to believe it was him sending "continue" periodically to an LLM.
Anyone who has built something large or complicated with LLM assistance knows that it takes more than just asking the LLM to accomplish your end goal, saying "it's clear the human offering the patch didn't do it" is insulting.
I've done a number of things with the help of LLMs, in all but the most contrived of cases it required knowledge, input from me, and careful guidance to accomplish. Multiple plans, multiple rollbacks, the knowledge of when we needed to step back and when to push forward. The LLM didn't bring that to the table. It brought the ability to crank out code to test a theory, to implement a plan only after we had gone 10+ rounds, or to function as grep++ or google++.
LLMs are tools, they aren't a magic "Make me ext4 for OpenBSD"-button (or at least they sure as hell aren't that today, or 5 months ago when this was started).
g0xA52A2A 1 hour ago
Wow, that thread just kept going. Whilst the LWN article covered most of the "highlights", I think this reply from Theo is pretty succinct on the topic at large [1].
[1] https://marc.info/?l=openbsd-tech&m=177425035627562&w=2
- bt1a 1 hour ago
  > Lacking Copyright (or similarily a Public Domain declaration by a human), we don't receive sufficient rights grants which would permit us to include it into the aggregate body of source code, without that aggregate body becoming less free than it is now.
  That's awesome lmao
  - raggi 1 hour ago
    that's not a statement from a lawyer, and it's confused. there is one true thing in there which is that at least under US considerations the LLM output may not be copyrightable due to insufficient human involvement, but the rest of the implications are poorly extrapolated.
    there are lots of portions of code today, prior to AI authorship, that are already not copyrightable due to the way they are produced. the existence of such code does not decimate the copyright of an overall collective work.
LeFantome 1 hour ago
Vibe coding and OpenBSD. The perfect combination.
- croes 1 hour ago
  Vibe coding and file systems are even better
  - himata4113 1 hour ago
    trying to load with linux ext4 hmm doesn't load, but it works with my version!
    Must be a bug in the linux kernel, let me git clone and build an out-of-tree module...
  - LeFantome 1 hour ago
    Kent Overstreet has already blazed that trail.
  - api 1 hour ago
    It's clearly an experiment.
- whalesalad 1 hour ago
  I vibe-configured an Edgerouter 4 as a hot-drop box that would establish a secure tunnel and create a fake WAN for some servers that had to be temporarily pulled from service but remain operational in someone's home garage. I overnight shipped it to them with two of the ports labeled, they plugged in home internet on one port, the rack on the other port, and it secure tunneled to a Linode VPS to get a public IP, circumventing all the Verizon home internet crap. I used OpenBSD. Claude did most of the work.
cachius 53 minutes ago
I'd like to see it AFL fuzzed and compared to the original. It took 2 hours to find the first bug ten years ago, in 2016.
Discussion then https://news.ycombinator.com/item?id=11469535
Mirror of the slides https://events.static.linuxfound.org/sites/events/files/slid...
throwatdem12311 1 hour ago
Can someone just copyright wash Windows already.
- wongarsu 1 hour ago
  The Windows 2000 and Windows XP sources are readily available and must have made it into the training data. But most software has dropped XP support. You really need at least some of the Win 8 and Win 10 APIs to claim compatibility with modern software, and I doubt Claude has seen those from the inside.
- greyface- 1 hour ago
  ReactOS did this without any need for an LLM.
  - ziml77 45 minutes ago
    No they didn't. It would be copyright washing if someone contributed to ReactOS who remembered large portions of the Windows code and wrote the ReactOS implementations based on that.
longislandguido 1 hour ago
~20 years ago, the Linux camp accused OpenBSD of importing GPL'd code (a wireless driver IIRC) and cried foul. The code was removed.
Fast forward to 2026, Theo says no to vibe-coded slop, prove to me your magic oracle LLM didn't ingest gobs of GPL code before spitting out an answer.
People are big mad of course, but you want me to believe Theo is the bad guy here for playing it conservatively?
- ksherlock 30 minutes ago
  The history is a bit backwards but the point is good. OpenBSD Atheros wireless code was imported into Linux, the BSD attributions were removed, and it was re-declared as GPL. That was later changed back.
ptidhomme 24 minutes ago
I liked this reply in the thread:
There's another issue surrounding developer skill atrophy or stunting that I find particularly concerning on an existential level.
If we allow people to use LLMs to write code for a given project/platform, experience in that platform will potentially atrophy or under develop as contributors increasingly rely on out sourcing their applicable skills and decisions to "AI".
Even if you believe out sourcing the minutia of coding is a net positive, the "enshitification" principal in general should give you pause; as soon as the net developer skill for a project has degraded to a point of reliance, even somewhat, I think we can be confident those AI tools will NOT get less expensive.
I'd rather be independently less productive, than dependent on some MegaCorp(TM)'s good will to rent us back access to our brains at a fair price.
- achaean
https://marc.info/?l=openbsd-tech&m=177430829313972&w=2
nurettin 1 hour ago
It is amusing to see that the only concern seems to be about a confusion around licensing, not the validity or maintainability of the code itself.
- tolciho 1 hour ago
  Eh, well, if your guns are trained on the "copyright" portion of the ship and you can sink it from there, no need to waste ammo or time trying to figure out if code bits are as explosive as the copyright bits are. Probably the code is just as sinkable, e.g. here's a recent response to some other AI slop:
```
  I didn't look closely at most of the code but one thing that caught my eye, pid is not safe for tempfile name generation, another user of the system can easily generate files that conflict with this. Functions like mktemp and mkstemp are there for a reason. Some of the other "safety" checks make no sense. If the LLM code generator is coming up with things which any competent unix sysadmin (let alone programmer) can tell are obviously wrong, it doesn't bode well for the rest.
```
  https://marc.info/?l=openbsd-ports&m=177460682403496&w=2
  The next AI winter can't come soon enough…
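  To illustrate the tempfile point from the quoted review: a minimal sketch contrasting a PID-based name with `mkstemp`. The `work.` prefix and the helper names are hypothetical and not taken from the patch under discussion; Python's `tempfile.mkstemp` is used here as a stand-in for the C `mkstemp` mentioned in the quote.

```python
import os
import tempfile


def predictable_name(directory: str) -> str:
    """Anti-pattern: a PID is easy to guess, so another local user
    can create a file or symlink at this path first."""
    return os.path.join(directory, f"work.{os.getpid()}")


def safe_tempfile(directory: str) -> tuple[int, str]:
    """mkstemp picks an unpredictable name and creates the file exclusively,
    readable and writable only by the creating user."""
    return tempfile.mkstemp(prefix="work.", dir=directory)


with tempfile.TemporaryDirectory() as workdir:
    fd, path = safe_tempfile(workdir)
    with os.fdopen(fd, "w") as handle:  # reuse the descriptor mkstemp opened
        handle.write("scratch data\n")
    print("guessable: ", predictable_name(workdir))
    print("safe:      ", path)
```

  The important part is that `mkstemp` returns an already-open descriptor, so there is no window between choosing a name and creating the file.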
- kvuj 1 hour ago
  How is that different than a human writing the code? Whether an AI or a human wrote it, I would expect the same bar of validity/maintainability.
  - nurettin 1 hour ago
    To me, SOTA is just bad at DRY, KISS, succinct, well-architected, top-down, easy-to-test code and has to be constantly steered to come close. Even the article suggests that. YMMV.
  - scuff3d 1 hour ago
    Because humans make design decisions; AI just bangs its head against the problem until it gets something that "works".
- g0xA52A2A 1 hour ago
  Is it worth the effort to review until such implications are understood?
  - nurettin 1 hour ago
    No of course not, bike shedding licenses is where it is at.
charcircuit 1 hour ago
>incorporate knowledge carrying an illiberal license.
Copyright prevents copying. It doesn't prevent using knowledge.
- bigfishrunning 38 minutes ago
  Good luck proving an LLM has "Knowledge", and isn't just a statistical model that tries to form outputs as a copy of its training data...
hypeatei 1 hour ago
> This obsession with copyrights between different free software ecosystems - who put the lawyers in charge?
This comment on the article is spot on. I don't vibe code or care about AI really, but it's so exhausting to see people playing lawyer in threads about LLM-generated code. No one knows, a ton of people are using LLMs, the companies behind these models torrented content themselves, and why would you spend your time defending copyright / use it as a tool to spread FUD? Copyright is a made up concept that exists to kill competition and protect those who suck at executing on ideas.
hulitu 9 minutes ago
> Vibe-Coded Ext4 for OpenBSD
Who wants to test it? Preferably on real hardware. /s
bitwizeshift 1 hour ago
Paywalled article on something vibe-coded? That seems like a bold strategy.
- dana321 1 hour ago
  click to continue
CodeWriter23 1 hour ago
Well this is ironic, GPL advocate(s) declaring a clean implementation based on specifications infringing due to someone/something reading specs provided under license. Didn't Oracle lose that argument in court as pertains to Android implementation of Java libraries?
- corbet 1 hour ago
  I'm not sure what you're reading; there is a distinct lack of GPL advocates in that conversation.