IT/Software career thread: Invert binary trees for dollars.

Khane

Got something right about marriage
21,976
16,004
To add to that, as a separate post just to talk specifically about how Claude Code operates: it does things you never asked it to do. It needs access to a repo to work, so it can pretend it also needs to write test harnesses and cases to check its own work and then refactor 2, 3, 10 fucking times for simple tasks you give it, all for the sake of "perfection". It's a huge waste of compute. I stopped using it altogether after about a month and stick to Cowork instead, because I can give it a task and it just performs that specific task.
 

Control

Golden Baronet of the Realm
5,746
15,996
I mean, as long as you're happy paying the price for what you're getting out of it, that seems reasonable enough. Isn't that how all businesses ultimately work? In my case (at least the one I outlined above), I could compare models by having them write the same thing at different thinking levels. I could probably optimize by having Opus outline, Haiku research and do the initial pass, and then test the results using different models on the different review/audit passes. Or I can let Opus do it all and end up with around 150 articles/500k words for $50 that was going to evaporate anyway, since I hadn't used my quota this week. Of course, if I optimized, I might be able to get double the output at similar quality, but the work of optimizing has a cost too. Probably worth doing if you're going to run a content mill, not so much if you're just experimenting though.
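For a rough sense of scale, the figures above can be turned into unit costs. This is only back-of-the-envelope arithmetic on the numbers quoted in the post (150 articles, 500k words, $50); the "optimized" line assumes the hypothetical doubling of output at the same spend and is not a measured result.

```python
# Figures quoted in the post; everything below is derived from them.
articles = 150
total_words = 500_000
spend_usd = 50.0

cost_per_article = spend_usd / articles
words_per_article = total_words / articles
cost_per_1k_words = spend_usd / (total_words / 1000)

# Hypothetical case from the post: double the output for the same spend.
optimized_cost_per_article = spend_usd / (articles * 2)

print(f"cost per article:         ${cost_per_article:.2f}")
print(f"words per article:        {words_per_article:.0f}")
print(f"cost per 1k words:        ${cost_per_1k_words:.2f}")
print(f"optimized (hypothetical): ${optimized_cost_per_article:.2f} per article")
```

That works out to roughly 33 cents per article, or about 10 cents per thousand words, so the doubling scenario would save about 17 cents per article before counting the time spent on optimizing.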
 

Khane

This is shortsighted, you're speaking in today's pricing terms. This is the point I am making.
 

Control

Well yeah, companies that are marrying themselves to arbitrarily priced workflows that they can't readily un-marry themselves from are playing with fire. OTOH, as I posted in the AI thread, locally run Qwen today is better than year-old Claude, so advancement just has to outrun any looming price explosion. Risky proposition though, of course.
 

Noodleface

A Mod Real Quick
39,728
18,395
The problem I have is that there's no good way to judge which model is appropriate for a task other than feel. So I just crank it up to the max.
Oh, even better: we've found the 30x token models actually perform worse than the 6x models, etc. It's a huge fucking grift.
 

ShakyJake

<Donor>
8,675
21,397
To add to that, as a separate post just to talk specifically about how Claude Code operates: it does things you never asked it to do. It needs access to a repo to work, so it can pretend it also needs to write test harnesses and cases to check its own work and then refactor 2, 3, 10 fucking times for simple tasks you give it, all for the sake of "perfection". It's a huge waste of compute. I stopped using it altogether after about a month and stick to Cowork instead, because I can give it a task and it just performs that specific task.
I can't say this has been my experience, but it may be due to my workflow.

I follow a structured process: I first have the model fetch the ADO story and write it to `story.md`. The story already includes Given-When-Then scenarios, which provide a solid foundation. It then reads the story, investigates the relevant code, and documents its findings in `investigation.md`. I review this investigation to confirm it is on the right track. Next, I ask it to create a multi-step implementation plan with human-verifiable outcomes, which it writes to `plan.md`. Only then do I have it begin executing the plan step-by-step, reviewing each step myself. This is not a massive "go do it all and I hope it’s right at the end" approach.

This method keeps the model firmly anchored to the task and prevents it from wandering off course. I am not certain how efficient the process is overall, but I suspect I could use a less capable model for the actual implementation work, since the plan already spells out exactly what needs to be done.

The key benefit is that I avoid working continuously within the same context window or conversation history. By writing everything to Markdown files, I can easily resume with a completely fresh session later.

This is just something I personally came up with. No clue if I’m "doing it wrong."
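A minimal sketch of how the file-based hand-off described above could be tracked between sessions. The three file names come from the post; `next_stage` and `build_prompt` are hypothetical helpers made up for illustration, not part of Claude Code or any other tool, and the "execute" label is an assumption for the final step.

```python
from pathlib import Path
import tempfile

# Stage order taken from the post; the file names are the ones it mentions.
STAGES = ["story.md", "investigation.md", "plan.md"]


def next_stage(workdir: Path) -> str:
    """Return the first stage file that is missing or empty.

    Returns "execute" (a label assumed here) once all three files exist.
    """
    for name in STAGES:
        f = workdir / name
        if not f.is_file() or f.stat().st_size == 0:
            return name
    return "execute"


def build_prompt(workdir: Path) -> str:
    """Assemble context for a fresh session from the files written so far."""
    parts = []
    for name in STAGES:
        f = workdir / name
        if f.is_file():
            parts.append(f"## {name}\n{f.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        print(next_stage(work))  # nothing written yet
        (work / "story.md").write_text("Given X, when Y, then Z", encoding="utf-8")
        print(next_stage(work))  # story exists, investigation does not
        print(build_prompt(work))
```

Because all of the state lives in the Markdown files, a new session only needs the output of `build_prompt` to pick up where the previous one stopped, which is the point of not leaning on one long conversation history.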
 

Khane

This is technically how we are "supposed" to do it, but that's not what I'm talking about. I'm saying that Claude Cowork will do exactly that with less token usage and less "meandering" than Claude Code will.
 

ShakyJake

Yeah, I've never used Cowork. Just Claude Code CLI and, for work, we're stuck with Copilot CLI.