Codebase understanding and LLM-driven code

I've been maintaining the Drupal Core needs-review-queue-bot for a few years now. It started as a bunch of PHP files in a folder, got upgraded into proper Symfony console commands, and eventually ended up as a large Python codebase… I used LLMs to knock out some overdue features. At first it helped me sort out some memory issues and infrastructure problems I was having, great. Then the code started to diverge from what I was doing, and now I have a half-PHP, half-Python mess on my hands. I'm not going to mention the state of my DB; thankfully I moved all the LLM-generated data into its own DB.

The Python part crawls all of the contrib modules to find interesting pieces of data. That's where the data for the first two blog posts came from. I started working on this in mid-April, and I still do not have an intimate knowledge of my own codebase. I'm certain there are some pretty badly optimized steps that cost me a lot of time and processing power, but I can't pinpoint them since I only glanced at the code while it was being written. Now that I want to make this reliable and run it periodically, I need to get a better understanding. The LLM got me there rather quickly for a basic proof of concept, but now that accuracy and reliability are becoming important, it doesn't really scale.

Every time I use an LLM to work on something, it's the same thing. I need to remind myself that for anything I need to maintain for more than a month, relying on LLMs is going to be a liability. I feel like a lot of my LLM posts are simply confirming what's already well known; anyway, here is my data point.