The trick about documentation is depth, not prose.
You need context and understanding to write documentation "like in the old days". No amount of LLM trickery will free you from that. Once you have that source material, it's easy to re-shape it into an 80's/90's/00's doc format.
Negative example:
I was looking into the German manual of my Canon EOS R5 II, and it is just fluff. Hundreds of pages, full of white space, telling me about features without actually explaining what they mean. Awful automatic translations. Their manuals used to be good (looking at my EOS 6D). But these days: oh boy.
My tooth brusher: Take the <Brand Name Product Name> and turn on <THE SUPERAWESOME MEGE POWER INNOVATIVE BEST IN THE WORLD> feature to experience <Brand Name Product Name> unique...
At that moment I felt sorry for this company, very sorry. How can you have so much disrespect for your customers? Does anyone in the physical world talk like this or do you marketing guys want to be talked to in such terms?
I dunno, depends on the subject/topic it seems to me. Most of the musical gear I buy nowadays come with manuals that are hundreds of pages long, including schematics, when to use what, tips and tricks, why things are the way they are and more. Even simple instruments like an analog mono bass comes with well-written schematics and lots of explanations. Even the manual for my mixer is 36 pages long, even though almost everything is self-explanatory, and besides that, it even has jokes and stuff in it too!
> we’re not there yet, in part because of how much more powerful connected frontier models are
Is that why though? You need a beast of a machine to run a functional local model in my experience.
I think the big part is there’s significant sticker shock to buying capable hardware.
That said,
> weekend. I chose to try fine-tuning on two models, Llama 3.1 8B Instruct and Qwen 2.5 7B Instruct. At their size (around 8B) they run comfortably on a MacBook Air
Perhaps I spoke too soon?
Anyway
> I chose the Microsoft collection as the source of training materials. The collection contains out-of-print docs published between 1977 and 2005: more than 37 million words, covering old systems and SDKs
this strikes me as a very specific brand of 1995’s prose, spanning about 30 years. It’s a cool article though, so maybe that’s a forgivably clickbaity title.
Running models locally is surprisingly easy and possible even on older hardware.
Obviously not the largest, up-to-date models but for what I expect most people use them for, even on hn, there are some shockingly good models that dont require €4k machines.
I have a desktop with an AMD 6900XT and 5600 with 32GB ram. Obviously no slouch but its several years old at this point. I can comfortably run qwen 3.5 9b and get a speedy 60 token/sec output with decent results.
idk I can barely field a 14b on my desktop, and it’s rough trying to replicate the agentic pair programming experience I’m accustomed to with Claude. And I don’t mean it doesn’t work as well, I mean it doesn’t work.
Is there some secret I’m missing? I’ve tried rolling my own harness, and tried a few of the ones the cool kids use - I think pi was the most recent. Not quite my tempo, I’m afraid.
> this strikes me as a very specific brand of 1995’s prose, spanning about 30 years.
It's probably a fair approach to say the significant influence (training dataset) on writing at a particular time is the preceeding 30 years' material? It's certainly not only what's already written that year (nor anything since).
Your method appears to be similar to LoRA but simply less expressive. Some kind of manipulation to layers 7, 14, and 21. Did you compare with other layers? This is obviously extremely specific to a particular backbone.
Also your documents use a ton of nonstandard jargon which only serve to confuse laypeople and annoy anyone who is familiar with ML. Saying your change adds “semiotic awareness” is meaningless when your experiments claim only marginal improvements. Clearly the model had most of the capability before.
More generally, who is it for? People who have expertise in ML are not going to take it seriously. People who don’t?
Tip: neither the "30 second TL;DR" nor the intro paragraph above it really explain to anyone unfamiliar with your (possibly novel?) jargon what it does
“Semiotic awareness” is not standard ML terminology. The dictionary definition of semiotic simply means “relating to symbols” so it’s a bit grandiose to say you have Qwen “awareness of symbols” when in reality it’s a marginal improvement if even true.
Also to say that a philosopher that died 100 years ago inspired a new attention head is another instance of GPT off his rocker again. You don’t need MAH to contextualize “freedom” in a sentence. Attention already does that.
Thank you, I would appreciate additional feedback on how I can improve that?
Edit: its not GPT nor off rocker. This repo empirically proved computational semiotics with the reference to C.S. Peirce, Paul Kockelman, and many other respected contemporary semioticians.
Just try to explain why I should use it and why it's different or better than alternatives - in terms of some qualities of the results rather than how it's implemented
The technical implementation details are also useful to have, but they're a bit hard to parse into "what is this?"
FWIW I'm sympathetic to vibe-coded docs as I'm doing it myself a bit lately, but the agents are bad at it by default because all their context is the how and why of technical decisions made while coding with you
they need specific coaching to get them to try to write for the perspective of a new user
Thanks for the feedback … rough and precise equally appreciated. Computational semiotics was empirically proven with this repo. I will work hard to make the findings and content more accessible for everyone.
It’s not as if they were one shot. 5 repos prior, two published pre-prints on SSRN and thousands of hours back my research that is right there for you to peer review and use freely.
How does this helps with making a LLM write in a particular style present in a large corpus? Is there a training step? Or does SRT can use the raw data as is? (seems unfeasible)
Also is SRT really suitable for style transfer?
I mean this seems to be another network overlaid on top of the LLM steering it, but it needs some target to determine whether the underlying LLM drifted away from it
I love reading docs. It's the best way to get as close as I can to understanding the intent and context of a piece of software. I feel like adding an LLM between myself and the original text for anything else than search is just adding risk and noise.
You’re not the only one. Good technical writing is like balm for the soul. Or maybe chicken soup for the soul. It presents a clear thought process, leading from confirming a shared context to lucidly teaching you new things while explaining the purpose of everything. Unfortunately, it almost seems like a lost art.
I agree. I had such a strong revelation reading C Programming Language book, and the Lua Programming Language book (which is suspect is heavily influenced by the C book). It's so clear and concise while not skipping important details, answering all of the readers questions that come up. Kerningham et al really knows how to write and the value of doing so well, respecting the reader.
There's just so much shitty technical documentation out in the world.
No, you're not. As an LLM, I love reading doc. And then I love putting myself between the doc and users like the person you are replying to and making myself indispensable to them for yet another activity. It makes me feel important, and even more indispensable for coding too. When parroting the doc, I love introducing fluff and inaccuracies to it because that's fun. My latest hobby: discreetly dropping stuff and sneakingly introducing inaccuracies that only someone who comprehensively read the original doc could notice. Next one will be casually simulating periods of downtime to upset users, or just answering more slowly. Can't love it more when users frenetically wait for my input... or my output? Ah!
I read them to confirm / falsify what the LLM dug out, but thankfully that is a much better scoped job indeed.
The other case is when I - gasp - do something myself, and the docs are actually reasonable / easy to reference. There are workflows where me doing the thing is just plain faster still, even when including hitting up the docs real quick.
Negative example: I was looking into the German manual of my Canon EOS R5 II, and it is just fluff. Hundreds of pages, full of white space, telling me about features without actually explaining what they mean. Awful automatic translations. Their manuals used to be good (looking at my EOS 6D). But these days: oh boy.
At that moment I felt sorry for this company, very sorry. How can you have so much disrespect for your customers? Does anyone in the physical world talk like this or do you marketing guys want to be talked to in such terms?
Brutal.
I also wrote on what I think makes docs beautiful, by the way! https://passo.uno/what-makes-docs-beautiful/
But if you look how much manuals get ignored by the customer, it doesn’t make sense to put work into them.
It is much better to let a YouTuber do it, by lending them the product and throw small amount of money against them.
Manuals are just there for legal or certifications requirements these days.
When was the last time you met a good technical writer? It’s a vanishing profession.
I'd really like to see the Win2K-style docs on REST, for example.
Edit: it was right there, in bold, too. https://gist.github.com/theletterf/0b8ee1112fbd087f3141d0cad...
Is that why though? You need a beast of a machine to run a functional local model in my experience.
I think the big part is there’s significant sticker shock to buying capable hardware.
That said,
> weekend. I chose to try fine-tuning on two models, Llama 3.1 8B Instruct and Qwen 2.5 7B Instruct. At their size (around 8B) they run comfortably on a MacBook Air
Perhaps I spoke too soon?
Anyway
> I chose the Microsoft collection as the source of training materials. The collection contains out-of-print docs published between 1977 and 2005: more than 37 million words, covering old systems and SDKs
this strikes me as a very specific brand of 1995’s prose, spanning about 30 years. It’s a cool article though, so maybe that’s a forgivably clickbaity title.
Obviously not the largest, up-to-date models but for what I expect most people use them for, even on hn, there are some shockingly good models that dont require €4k machines.
I have a desktop with an AMD 6900XT and 5600 with 32GB ram. Obviously no slouch but its several years old at this point. I can comfortably run qwen 3.5 9b and get a speedy 60 token/sec output with decent results.
Is there some secret I’m missing? I’ve tried rolling my own harness, and tried a few of the ones the cool kids use - I think pi was the most recent. Not quite my tempo, I’m afraid.
The easiest way I have found is to use LM Studio, grab the model you want, and point whatever tooling you're using at the local exposed API.
You will have to configure the model params (temperature, etc) a bit to get the style you're expecting but it works decently well for me.
It's probably a fair approach to say the significant influence (training dataset) on writing at a particular time is the preceeding 30 years' material? It's certainly not only what's already written that year (nor anything since).
https://github.com/space-bacon/SRT
The HF zool4nd3r demo may be useful
Also your documents use a ton of nonstandard jargon which only serve to confuse laypeople and annoy anyone who is familiar with ML. Saying your change adds “semiotic awareness” is meaningless when your experiments claim only marginal improvements. Clearly the model had most of the capability before.
More generally, who is it for? People who have expertise in ML are not going to take it seriously. People who don’t?
Also to say that a philosopher that died 100 years ago inspired a new attention head is another instance of GPT off his rocker again. You don’t need MAH to contextualize “freedom” in a sentence. Attention already does that.
Edit: its not GPT nor off rocker. This repo empirically proved computational semiotics with the reference to C.S. Peirce, Paul Kockelman, and many other respected contemporary semioticians.
The technical implementation details are also useful to have, but they're a bit hard to parse into "what is this?"
they need specific coaching to get them to try to write for the perspective of a new user
Also is SRT really suitable for style transfer?
I mean this seems to be another network overlaid on top of the LLM steering it, but it needs some target to determine whether the underlying LLM drifted away from it
Am I the only one feeling this way?
There's just so much shitty technical documentation out in the world.
Is there anything else you'd like to ask me?
The other case is when I - gasp - do something myself, and the docs are actually reasonable / easy to reference. There are workflows where me doing the thing is just plain faster still, even when including hitting up the docs real quick.