I tried Claude Code for a month…

Large Language Models are a pretty fascinating piece of technology. Back in December I saw some chatter about Claude Code and I wanted to try it out myself. It is really fun, jQuery-in-2010 levels of fun. And, more concerning, it's fun even when it wastes my time.

I tried it on a few things in the past 4 weeks:

The very first thing was a Firefox browser extension. I didn't want to spend time learning the whole extension system for my tiny use case. Claude Code spat it out on the first try, I followed its instructions to set it up in my browser, and it just worked. Nice, as a first impression that's hard to beat.
Then I needed a few bash scripts to keep media files organized and optimized. It was less easy to get these working, but it managed with enough prompting. I think we need to thank the countless bash tutorials that were made over the years. My ego thought I could have done it faster; I disagree.
After that I asked it to refactor a legacy part of the code supporting my Drupal bot. It looked good, then 3 weeks later I noticed it had broken something that I didn't catch at the time (the codebase lacks tests). At least now I'm not double-fetching every core issue updated in the last 20 minutes. That's another win.
With these few quick wins I tried something more ambitious: testing how LLMs can help with Drupal credit attribution. When I commit a long issue on Drupal core, it can take up to 15 minutes to figure out who did what and evaluate the activity against the core credit guidelines. How good can an LLM be at assigning credit when you give it our guidelines and the full history of an issue? Not too bad, it turns out. That's a subject for later, when the model choice and finetuning are done.
The mass update of GitLab merge requests to point to the new main branch of Drupal core was generated by Claude Code in a few minutes. It used the same architecture as my other commands and got that working faster than I could have.
In the few hours it took to finish this article, I had Claude Code generate a working implementation of the npm CLI in PHP, to test whether we can use it to manage contrib JS dependencies without making Node a hosting requirement of Drupal. It works to manage core dependencies at least!

The LLM finetuning was the point where I hit diminishing returns. The time I spent trying to make things work with Claude Code would have been better spent learning the domain. If you already know the domain, LLMs can be useful. If you don't, you'll get lost very quickly and waste time instead of learning something.

It's happy to add duplicate code, go down rabbit holes, output a massive amount of unnecessary code, and put unrelated features in the same file. It gets messy real quick if you don't frame things early on. There are ways to mitigate this, of course, but it's yet another layer to keep in mind. If you don't have discipline, LLMs won't make things better.

A few things I liked about working with LLMs:

I have a hard time starting things. This just spits out something, and that's enough to get started and actually make progress. Less procrastinating is a win.
The generated code tends to have a more finished feel to it, like adding a --dry-run option to a bash script. I know it's by necessity, so that the tool can test its own work, but it's still nice to have.
It implements some edge cases I wouldn't have bothered with at first, not until I actually needed them.
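
To make the --dry-run point concrete, here is a minimal sketch of the pattern. It is written in Python rather than bash, it is not the script Claude Code generated for me, and the folder mapping and function names (FOLDERS, plan_moves, apply_moves) are made up for the example. The idea is simply to separate "work out what to do" from "do it", so the same plan can be printed or executed.

```python
import argparse
import shutil
from pathlib import Path

# Hypothetical mapping from file extension to target folder name.
FOLDERS = {".jpg": "images", ".png": "images", ".mp4": "videos", ".mp3": "audio"}


def plan_moves(root: Path) -> list[tuple[Path, Path]]:
    """List (source, destination) pairs without touching the disk."""
    moves = []
    for path in sorted(root.iterdir()):
        folder = FOLDERS.get(path.suffix.lower())
        if path.is_file() and folder:
            moves.append((path, root / folder / path.name))
    return moves


def apply_moves(moves, dry_run: bool) -> None:
    """Print every planned move; only perform it when dry_run is False."""
    for src, dst in moves:
        print(f"{'would move' if dry_run else 'moving'}: {src} -> {dst}")
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Sort media files into folders.")
    parser.add_argument("root", type=Path, help="directory to organize")
    parser.add_argument("--dry-run", action="store_true", help="print actions only")
    args = parser.parse_args(argv)
    apply_moves(plan_moves(args.root), dry_run=args.dry_run)
```

Calling main(["some/media/dir", "--dry-run"]) only prints the planned moves; dropping the flag performs them. Because the plan is computed the same way in both modes, what the dry run shows is what the real run will do.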

I have reservations about AI/LLMs, how they were made and trained, and the whole philosophy of the folks with the money in this space. If that's the case for you too, check out the work of the DAIR Institute. For now I think I'll keep cautiously working with this type of tooling. I'll probably move to OpenCode soon to be able to use local models when (not if) AI providers raise prices to unsustainable levels.

WordPress just released a benchmark for AI tools. Drupal could go that way too, but I don't think we have the market share to make anyone care about it. There is a risk of being overrun by AI slop code contributions. As a maintainer and community member, I don't want that to happen. What we can do right now to prevent this is to add a couple of AGENTS.md files to core (and contrib?). We can't tell people not to use a tool, but we can definitely help them help us.

A few minutes after posting this I ran into a Mastodon thread that sort of calls out LLM users for their addictive behaviors, and you know what, the whole post I wrote kinda validates their point. As with everything, proceed with care and humanity.

Second part: …and now I'm recovering