Drawing on my previous experiences, I started again from scratch, but this time I was more methodical with the AI coding agent.
I’ve been exploring AI-assisted coding tools and methodologies, and stumbled upon Spec Kit, a toolkit for spec-driven development with AI agents. The idea is that for every feature you want to implement, you go through a specification → planning → tasking → implementation workflow fully supported by the AI agent (read more). As an ex-Product Manager (and a geek for software development methodologies), I loved the idea and was eager to try it.
AI agents shine in this process because they can create extremely detailed specifications that stay mostly consistent as the project grows. I’ve seen how, as projects grow, maintaining specs without contradictions becomes challenging – but AI agents do this very well. They still miss things, but so do humans. And if I ask the AI agent enough questions, and use separate agents for validation, the AI agent has the upper hand here.
So, for the past two months, in between meetings and for an hour here and there, I’ve specified, tested, and reviewed the AI agent’s work toward the first part of my project – a working backgammon web application and playing server. No manual coding allowed.
GitHub Copilot has advanced rapidly since I last tried it, so I decided to use it the whole time, with the “Auto” model. I chose this simply because I believe AI should simplify my decisions, not add to them. And since it is my default at work, I also made it my default here.
This is what I’ve learned this time around:
- Roughly 60%–70% of my time was spent on manual validation. Give the AI agent testing tools, or better yet, let it generate them. The percentage of time won’t be reduced, but the total time will be.
- Following up on the first point, the most critical code to review is testing code. By “tests” I mean anything the AI agent can use after each coding session to validate its own work – unit, component, E2E tests, etc. For web apps, tools like Playwright (or similar) are a must (see the test sketch after this list).
- Many people have written about this already, but I’ll repeat it: keep the context small. Large contexts confuse AI agents.
- Use source control (this should be obvious). You will have long conversations with the AI agent, making changes along the way (because we get carried away), which is bad (see the previous point) and produces code you will have to throw away. So make sure you always commit your latest working product.
- Customize your AI agent with instructions and prompts. There are many sources of great instructions on GitHub; I’ve used cline/prompts and github/awesome-copilot for inspiration. The two instructions I can’t do without are thought logging and self-improvement – they help the AI agent understand why things were done the way they were (see the instructions sketch after this list).
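To make the testing point concrete, here is a minimal sketch of the kind of Playwright E2E test I mean – something the AI agent can generate and then run after every coding session to validate its own work. The URL, button label, and data-testid attributes below are hypothetical placeholders, not the actual app’s markup.

```typescript
// tests/new-game.spec.ts – a minimal sketch; the route, button label, and
// data-testid attributes are hypothetical, not the real app's markup.
import { test, expect } from '@playwright/test';

test('a new game rolls dice and sets up a legal starting position', async ({ page }) => {
  // Assumes a local dev server; in practice, set baseURL in playwright.config.ts.
  await page.goto('http://localhost:3000/');

  // Start a game via a hypothetical "New game" button.
  await page.getByRole('button', { name: 'New game' }).click();

  // The opening roll should show two dice, each with a value between 1 and 6.
  const dice = page.getByTestId('die');
  await expect(dice).toHaveCount(2);
  for (const die of await dice.all()) {
    const value = Number(await die.textContent());
    expect(value).toBeGreaterThanOrEqual(1);
    expect(value).toBeLessThanOrEqual(6);
  }

  // Each side starts with 15 checkers on the board.
  await expect(page.getByTestId('checker-white')).toHaveCount(15);
  await expect(page.getByTestId('checker-black')).toHaveCount(15);
});
```

A test like this is cheap to generate, but it gives the agent a feedback loop: it can run the suite after each session and catch its own regressions before I spend review time on them.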
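And for the customization point, this is roughly what such instructions can look like. The wording below is my own hypothetical sketch, not a copy of the cline/prompts or github/awesome-copilot versions; the log file path is also made up.

```markdown
<!-- .github/copilot-instructions.md (hypothetical excerpt) -->
## Thought logging
- After completing a task, append an entry to `docs/decisions.md` describing
  what changed, why, and which alternatives were rejected.
- Never rewrite or reorder existing entries; only append new ones.

## Self-improvement
- When a mistake is corrected during review, add a short "lesson learned"
  note to this file so future sessions don't repeat it.
```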
The AI agent also had a funny (and frustrating) behavior. One instruction told it to maintain a chronological changelog. It rebelled: sometimes it would add changes at the top, sometimes at the bottom, and sometimes it would update previous entries (!!!). I tweaked the instructions and tested with different models – it kept happening. Not always, but sometimes. Go figure.
Third time’s the charm: this attempt was much more successful than the previous two, and I now have a working backgammon application that actually plays by the rules (most of the time). The next step of the project is building the backgammon-playing agent, which I’m sure will be a lot of fun. So stay tuned :-).