Takeaways from leaning into agentic coding
I’m having an absolute blast playing the game of figuring out how to utilize coding agents to speed up software development at work and in my personal time. It’s been a wild ride witnessing this technology evolve so much over the past year, and discovering all the ways it can supercharge my own work.
Cursor has been my main driver for the past two years. Tab completion was a great introduction to what AI-assisted coding could be. Cursor's model was a big improvement over VSCode's Copilot at the time. It felt like magic when it cranked out several lines of exactly what I needed, or saved those valuable seconds on a repetitive change.
Then the agentic ability to modify code in the same file from a couple of instructions showed the real power. Fast forward a few months and agents could work across a few files, though poorly. Nowadays, with large context windows and efficient tool use, LLMs can make sweeping changes across a codebase of any size. What used to require careful manual coordination across multiple files now happens in a few dozen seconds. Building the mental muscle to reach for an agent to make a change, or to investigate how some code works, is only going to pay off as these models gain capability, better access to tools, and more autonomy to make the right changes.
Cursor has been my daily driver for both professional and personal projects, but the way I use it has hit some notable inflection points in the last few months as I've pushed myself to offload even more work to agents and find the right balance of quality versus speed. Here are several takeaways from work and personal projects, along with some powerful workflows I've been exploring.
Experience from work
Recently I've been on an absolute tear building out a contact centre product. It has been a great opportunity to learn what to do, and what not to do, when it comes to AI-driven development. I've learned some hard lessons, but the productivity gains have been worth it.
The good
Given an existing contact centre UI but no backend behind it, I had a big paint-by-numbers exercise to tackle. I started by creating the core database models with the help of AI, got a thumbs up from some colleagues, then dove into wiring up each feature one at a time. I'd give an agent a line or two of instructions and tell it to wire up some frontend component to the backend models. This worked out very well since each feature at this point was just CRUD work: an API, some database manipulation, and some React logic. I still had to manually test each feature after adding it, of course, but the implementation speed was incredible. What would have taken weeks of grunt work happened in days.
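To make that concrete, here's a minimal sketch of the kind of wiring each feature needed. The names and endpoint are hypothetical, not lifted from the real codebase: an SWR hook for reads and a small mutation helper for writes, sitting in front of a thin API route that does the database work.

```tsx
// Hypothetical example of the CRUD wiring each feature needed:
// an SWR hook for reads plus a small mutation helper for writes.
import useSWR, { mutate } from "swr";

type Contact = { id: string; name: string; email: string };

const fetcher = (url: string) => fetch(url).then((res) => res.json());

// Read: components call this hook instead of managing their own state.
export function useContacts() {
  const { data, error, isLoading } = useSWR<Contact[]>("/api/contacts", fetcher);
  return { contacts: data ?? [], error, isLoading };
}

// Write: create a contact, then revalidate the cached list.
export async function createContact(input: Omit<Contact, "id">) {
  await fetch("/api/contacts", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(input),
  });
  await mutate("/api/contacts");
}
```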
Agents are incredible at debugging and fixing things, which makes bugfixing that much easier and quicker. Several times I've had a wow moment when pasting in a link to a bug from Sentry and having the Sentry MCP fetch everything about that issue. Then the agent just goes to town investigating and solving it. For easy-to-medium-difficulty bugs, it finds the fix more often than not. Sometimes it nails the investigation but puts the fix in the wrong layer, so I tell it the proper way to fix things. What a time saver. No more context switching between Sentry, your editor, and multiple files trying to track down the issue.
Another surprise is the amount of code you can ship in a single day. 500-1000 lines a day initially felt wild, but now that's table stakes, and it doesn't even count the tests, which can easily be two to three times more. Pushing that number even higher does mean you spend less and less time looking at the generated code. Getting away with that is a combination of trusting the AI not to write a backdoor or terribly slow code, learning to prompt it better, and continually improving the AGENTS.md file.
The name of the game these days is: how much functionality can you reliably ship? This is turning out to be quite a fun challenge, and the way forward is pushing more and more of our own tasks to an agent. Agents don't just need to write the code; they can do the bug finding, the planning and investigation, and even act as a sounding board.
The bad
When building an embeddable chat widget for this help centre product, I had a similar situation: the frontend was complete and just needed a backend created for it. Right out of the gate I got agents to wire up the API endpoints and other logic the widget needed. Over time, though, the frontend's state management became an issue - data fetching and mutations weren't using the well-tested pattern of SWR and hooks. Haphazardly adding those in after the fact helped, but there was still a top-level state object being passed around to a lot of components. My own lack of attention to the choices the agents were making, and to the signs of trouble in all the bugs they kept introducing, meant that adding new features and functionality slowed down a lot.

With a fully functional support widget in hand, I went through several rounds of refactoring where a lot of that state got cleaned up. My aha! moment came when the state tracking which page the user was on became so duplicated and cumbersome that I got an agent to rip it all out and replace it with react-router. No longer was there a global state object to modify; all the necessary global state lived nicely in react-router's URL, and the rest of the widget's code could decide which component to render based on it. Everything became much simpler and easier for the agent to reason about once this refactor was complete.

In hindsight I should have decided on and added these technology choices before building out the backend for the widget, but I take all the trouble and anguish it caused me as a great learning opportunity. Agents are great at adding features quickly, but they won't necessarily tell you when your architecture is becoming a mess.
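For illustration, here's roughly the shape the widget moved towards after the refactor. The component and route names are invented, and I'm assuming a MemoryRouter since an embedded widget usually shouldn't touch the host page's URL bar - the real implementation may differ:

```tsx
// Hypothetical sketch: routing for an embeddable widget where the router's
// location, not a hand-rolled global state object, decides what renders.
import { MemoryRouter, Routes, Route, useNavigate, useParams } from "react-router-dom";

function Home() {
  const navigate = useNavigate();
  return <button onClick={() => navigate("/conversations/123")}>Open conversation</button>;
}

function Conversation() {
  // The conversation id lives in the route instead of in top-level state.
  const { conversationId } = useParams();
  return <div>Conversation {conversationId}</div>;
}

export function SupportWidget() {
  return (
    <MemoryRouter initialEntries={["/"]}>
      <Routes>
        <Route path="/" element={<Home />} />
        <Route path="/conversations/:conversationId" element={<Conversation />} />
      </Routes>
    </MemoryRouter>
  );
}
```

The design win is that "which page am I on" lives in exactly one place, the router's location, which is much easier for both humans and agents to reason about.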
Some agent-written tests are absolute trash. Existing testing-related code smells apply here too: some code just doesn't need to be tested, and some simple code is over-tested and those tests should just be deleted. Every crappy, low-value test slows your entire test suite down, and with the speed at which agents write code and tests, a slow test suite arrives much sooner. I've seen agents write tests covering the old, deprecated way of calling some code alongside the new, proper way. That made me question why the old way still exists at all when I want everything updated to use the new one. It's the agent taking shortcuts in its work; ideally it would properly refactor the codebase to keep things simple.
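Here's a hedged illustration of that smell, with invented function names - a suite that locks in both the deprecated call path and its replacement instead of finishing the migration:

```typescript
// Hypothetical example of the smell: tests pinning both the deprecated
// and the new call path, instead of the agent finishing the migration.
import { describe, expect, it } from "vitest";
import { createTicket, legacyCreateTicket } from "./tickets";

describe("ticket creation", () => {
  it("creates a ticket via the new API", async () => {
    const ticket = await createTicket({ subject: "Hello" });
    expect(ticket.subject).toBe("Hello");
  });

  // Smell: this test keeps the deprecated path alive. The better move is to
  // delete legacyCreateTicket and this test along with it.
  it("creates a ticket via the legacy API", async () => {
    const ticket = await legacyCreateTicket("Hello");
    expect(ticket.subject).toBe("Hello");
  });
});
```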
Experience from personal projects
In my personal time, I've been working on some fun projects I otherwise wouldn't have gotten to. One of them is a calculator to determine how much food to feed my dog. This project was enjoyable because it solves a real problem: feeding a growing puppy the right amount of food based on their weight. Having AI hack together the React UI and the calculations made development so much quicker. I spent a respectable amount of time double-checking the calculations, since that's the important part, and lo and behold, it did get them wrong. My partner and I still use it every day. Getting the AI to design a cool but silly user interface, something that would have taken me forever to get right in Tailwind and CSS, took just seconds. I likely would never have added those design flourishes if it weren't for the AI building them.
Another project I worked on was a server status page. I wanted a way to view all my home server's metrics and status at a glance without having to SSH in and remember all the commands. My goal was a script that would automatically generate and upload a webpage to Cloudflare Workers, secured behind a page accessible only to me. After manually fighting with Cloudflare Workers and fetching API keys, I used AI to help plan the script's development. I had it SSH into the server to identify where everything was located, what commands could be used, and which versions of software were available. Beyond spot-checking the results, I didn't look at the code at all. And it works! This is the kind of project that would have languished on my todo list forever without AI assistance.
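As a rough sketch of the serving side (the binding and key names are my own, and I'm assuming the generated HTML gets pushed into Workers KV, which may not match what my script actually does), a Worker like this is about all it takes, with the access restriction handled separately by something like Cloudflare Access:

```typescript
// Hypothetical Cloudflare Worker: serve the latest generated status page
// from a KV namespace that the upload script writes to.
// (KVNamespace comes from @cloudflare/workers-types.)
interface Env {
  STATUS_KV: KVNamespace;
}

export default {
  async fetch(_request: Request, env: Env): Promise<Response> {
    const html = await env.STATUS_KV.get("status-page");
    if (!html) {
      return new Response("No status page uploaded yet", { status: 404 });
    }
    return new Response(html, {
      headers: { "Content-Type": "text/html; charset=utf-8" },
    });
  },
};
```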
Planning out changes
Since Cursor and the other agentic development tools introduced planning modes, I've been using them regularly for features and larger changes. The agent is quite good at investigating and figuring out what to do, and presenting me with its plan before doing anything is a great opportunity to validate that it understands the problem and the right changes to make. I often ask for tweaks to the plan, like using a different technology or adjusting the proposed logic. Checklists of steps are useful for both me and the AI, but most of the time the AI either misses a step, doesn't factor in a requirement, or fails to investigate something so that a piece doesn't integrate well later - all of which means a bunch of cleanup or follow-up. If most of this can be caught while planning, ideally by the agent thinking and researching hard enough, the resulting code should come out more correct and simpler.
Recently the name of the game is how much code you can ship, since LLM costs are so reasonable for the value you get out of them. Projects with good AGENTS.md files documenting the preferred code style, technologies, and where things should be located get a massive multiplier on the quality of code coming out of their agents. Manually modifying code is becoming the antipattern: it's all about getting the AI to do more of your work, faster and more accurately, so any time spent doing things by hand is time away from directing agents and improving the agent workflow.
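For reference, here's an invented sketch of the kind of AGENTS.md content I mean - the specifics below are illustrative, not copied from my projects:

```markdown
# AGENTS.md (illustrative sketch)

## Code style
- TypeScript everywhere; no `any` unless a comment explains why.
- Prefer small, focused modules over grab-bag utils files.

## Technologies
- Frontend: React, SWR for data fetching, Tailwind for styling.
- Backend: REST endpoints, database access only through the ORM models.

## Where things live
- React components: `src/components/`
- API routes: `src/api/`
- Database models: `src/models/`

## Testing
- Test new behaviour only; never add tests for deprecated code paths.
- Run the test suite before committing.
```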
Most recently I've been getting into slash commands for things like committing, code review, bug finding, and investigating tasks. These are major speedups, especially as I move towards running multiple agents in terminals rather than one or two in the Cursor IDE. I still use Cursor for viewing full files and reviewing diffs, but less and less. cursor-agent, Cursor's agentic CLI competitor to Claude Code, is pretty decent. It's behind in a few areas where Claude Code is bleeding edge, such as plugins, hooks, and subagents, but that's okay for now since I don't need most of that.
Even better planning
One of the major productivity boosts of the past couple of months has been upgrading how I use the agent planning mode. After chatting with the agent to plan out a change or feature, instead of having it go build everything right away, I get the entire plan put into an issue tracker that multiple agents can later work from. That issue tracker is Beads, a lightweight issue tracking system that agents know how to use and actually want to use. The big benefit is that it decouples the agent from the work to be done, so the work can be picked up by multiple agents and iterated on over time instead of being lost in a large markdown file. My primary workflow with Beads is that planning flow, but I also use it to track all the other bugs and to-dos that come up while agents are building for me.
One paradigm shift that Beads doesn't explain that well is that you only interact with it by talking to an agent. There's no real need to learn the Beads commands or manually type out the issues to create - just tell an agent to create Bead issues for whatever you want, and it'll do it better and faster than you can. Then tell it to work on the next available Bead issue, and it'll start plowing through tasks very fast.
My new workflow when I have an idea for a feature is to work with an agent in planning mode to come up with a plan, then ask it to refine that plan - adding more detail, checking that things will integrate properly, and so on - so that whichever agent later picks up the work knows exactly what to do. Once the markdown plan is looking good (mostly I don't look too closely at it, just verifying that the important logic, architecture, and database models look right), I tell it to create a Beads epic for the entire plan, with each piece of work broken down into subtasks. The agent then spits everything into Beads. I ask it to keep refining those tasks: verifying each is the right solution, adding more context, figuring out dependencies between tasks so the blocked ones aren't started early, and splitting larger tasks into smaller ones.

Then the best part: spinning up a new agent and telling it to start working on that epic. That agent should have all the detail it needs and will implement things exactly as told. After it's done building, I have it check whether there's anything else to add to or update in the epic's backlog - bugs, potential issues - then commit. Rinse and repeat with a new agent for each task until the entire epic is complete and you basically have a fully working feature. I then manually test the feature to make sure everything works, try to break it, and file another handful of bugfix or tweak tasks. Lastly, I run a few prompts to perform code review and bug finding and we're good to ship. That's an easy couple hundred to a thousand lines added, even more with tests.
This workflow of bringing a rough idea to an agent, having it iterate on the plan, verifying the plan and its design, then telling it to go wild and build it is empowering. Automating these loops is clearly on a lot of people's minds and will turn many developers into unstoppable code machines. We're already seeing it, with plenty of people able to commit orders of magnitude more code a day.
I really resonated with one of Steve Yegge's recent posts where he rightly argues that treating rewrites as an antipattern is a thing of the past. AI can rewrite swaths of code so quickly that the cost is close to zero compared to adapting or refactoring inflexible code. Insisting on reviewing every line of your code is going to be seen as a slowdown too. For most things you can get by with having the AI review the code for you - as long as you're also focusing on good architecture and fixing bugs and code smells, you're golden.
Scripting these workflows
After using Beads for a few weeks, I've figured out a few useful prompts to help with the process of idea -> Beads issues -> refine -> build -> bug finding and cleanup. Each step has a specific prompt that may be iterated on several times to build enough certainty that the thing was built correctly and with the necessary quality. Manually rerunning these prompts for each step is very tedious, so I had some shell scripts built that automate calling the Cursor CLI with a prompt or a series of prompts. They're very scrappy right now, but they prove their usefulness by doing things like looping several times to fetch the next Bead issue to tackle, then building it, finding bugs, and committing. It's so simple, but it takes the tedious commands out of the process. After a loop runs and does a big chunk of work, I always go and manually test what it built, and sometimes do a quick check of the code. This means I'm reviewing larger chunks of work, and if something needs fixing, I'll either tell an agent to fix it on the spot or have it file a new Bead issue for later.
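My real scripts are scrappy shell, but the core loop is simple enough to sketch. Here's an equivalent in TypeScript/Node rather than shell; the prompts are paraphrased and the CLI flag is an assumption (check `cursor-agent --help` for the real non-interactive invocation):

```typescript
// Hypothetical sketch of the loop: grab the next Bead issue, build it,
// look for bugs, commit, repeat. Flags and prompts are assumptions.
import { execFileSync } from "node:child_process";

function runAgent(prompt: string): void {
  // Assumes `cursor-agent -p "<prompt>"` runs one non-interactive turn.
  execFileSync("cursor-agent", ["-p", prompt], { stdio: "inherit" });
}

const steps = [
  "Find the next ready Bead issue in this epic and implement it.",
  "Review the change you just made and fix any bugs you find.",
  "Update the Bead issue with anything left to follow up on, then commit.",
];

const iterations = 5; // roughly one Bead issue per iteration

for (let i = 0; i < iterations; i++) {
  for (const prompt of steps) {
    runAgent(prompt);
  }
}
```

The value isn't in the code itself; it's in never retyping the same three prompts a dozen times a day.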
Steve Yegge's multi-agent orchestration tool, Gas Town, has just been released, and it gets me excited about where AI-driven development is going. It's essentially these shell scripts that automate the tedious parts of the software development process, but supercharged. Besides his writing being very engaging, I'm looking forward to trying it out since it sounds like another large productivity boost. I don't think I've put in enough hours with Beads and multiple simultaneous agents yet though, so baby steps. Over the next few months, Gas Town gaining the ability to use other agent harnesses besides Claude Code, like the Cursor CLI or OpenAI Codex, would be a great way to drive adoption - and make it even easier for me to try it out.
What next?
If 2025 showed us anything, it's that these new agentic models and tools have taken over like wildfire, and the gains in power and productivity will only come faster. This year will likely continue that trend, and I'm excited to have a front-row seat as it redefines our world.