I Let AI Write My Pull Requests for a Week: Here’s What My Team Said

I have always treated pull request descriptions as a necessary chore. You write the code, you test it, and then you stare at the blank text box trying to summarize what you did and why anyone should care. I wondered if AI could make that part faster. I had been using GitHub Copilot for code completions for months, and its suggestions sometimes felt like magic. One day I noticed a button inside the GitHub interface that offered to generate a pull request description from the diff. I hesitated, then decided to try an experiment. For an entire workweek, I would let AI write the first draft of every pull request I opened. I would not edit the descriptions before posting, only review them quickly for obvious errors. Then I would wait and see how my team reacted. This is the story of what happened, what my teammates actually said, and why I will never fully hand over the communication part of my code to a machine again.

The Setup: How the AI Drafted My Pull Requests

The tool I used was GitHub’s built-in Copilot feature that generates a pull request summary directly from the branch diff. It appeared as a small sparkle icon next to the description editor. Clicking it would produce a few paragraphs of text describing the changes, sometimes with bullet points about added files, modified functions, and the apparent intent of the code. I worked on a small engineering team of six developers. We reviewed each other’s pull requests thoroughly, often leaving comments about architecture, naming, and potential edge cases. Our culture valued clear communication. A good PR description explained the problem, the chosen solution, any tradeoffs, and how to test the change. A bad description was one that said “Fixed bug” and left the reviewer to decipher the diff. I was about to see which category the AI fell into.

On Monday morning, I finished a small refactoring task: I had extracted a duplicated validation function from two services into a shared utility module. I clicked the sparkle icon, and the AI generated a description that read, “This pull request extracts the common validation logic for user input into a new utility file. The changes remove duplicate code in UserService and OrderService and update the import paths accordingly.” It was accurate and concise. I posted the PR without touching a single word. The first response came within an hour from my teammate, Jess. She wrote, “Nice cleanup. PR description is super clear, thanks for that.” I felt a small thrill. Maybe this experiment would be a triumph.

When the AI Got It Right and Made Me Look Good

The early wins continued through Tuesday. I made a small CSS tweak to fix a misaligned button on the dashboard. The AI wrote, “This PR adjusts the padding and flex properties on the dashboard button to resolve a visual misalignment reported in issue #247. The change affects only the dashboard.css file.” The description was accurate and even referenced the issue number from my branch name, which the AI had picked up automatically. Another teammate, Marcus, approved the PR with a simple “LGTM” and no additional comments. I was starting to believe that AI-generated pull request descriptions were an unalloyed good. They saved me the mental effort of summarizing my own work, and they seemed to give reviewers exactly what they needed.

On Wednesday, I shipped a more complex change: I added a new API endpoint for exporting user data as CSV, complete with rate limiting and a new query parameter for date ranges. The AI generated a long description that listed every changed file, the new endpoint path, and the rate-limiting middleware I had used. It even attempted to describe the testing steps. I scanned it quickly. It looked impressive. I posted it. Within an hour, my team lead, David, left a comment that made my stomach drop. He wrote, “This description says the rate limiting is based on IP address, but the code actually uses the user ID from the auth token. Which is correct? Please clarify.” The AI had hallucinated a key implementation detail. It sounded confident but was wrong, and I had let it through because I had stopped reading carefully after the first few sentences. I had to reply immediately, apologize for the confusion, and explain the actual behavior. The damage was small, but the trust I had placed in the machine was now visibly cracked.

The Hallucination That Wasted a Reviewer’s Time

The low point came on Thursday. I was working on a database migration that added an index to a slow query. The AI generated a description that claimed the migration also “adds a new column for tracking last login timestamps.” No such column existed in my branch. The AI had seen other migration files in the repository that touched login timestamps and had conflated them with my change. A junior developer, Priya, spent about twenty minutes trying to find the column in the schema and asking me questions about it before I realized the description was fiction. I felt embarrassed and a little angry at the tool. Priya was gracious about it, but I could tell she was frustrated. I had wasted her time because I had outsourced the thinking to a model that had no real understanding of my code.

That incident made me examine every AI-generated description I had posted that week with fresh eyes. I went back and re-read them. Several contained small inaccuracies: file names that were slightly wrong, a mention of a “refactored helper function” that was actually deleted, and a description that said I had “improved performance” when the change was purely cosmetic. The AI was a fluent writer, but it was also a serial confabulator. Its summaries were prose, not truth. I had been lulled by the coherence of the sentences into assuming the content was correct. That was a dangerous mistake.
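Some of these slips, the wrong file names in particular, are mechanical enough to check automatically. The following is a minimal sketch of that idea, not something used during the experiment. It assumes the list of changed files is supplied by the caller (for example from `git diff --name-only`), and the function name `unmatched_file_mentions` and the `FILE_TOKEN` pattern are hypothetical, a rough heuristic for "looks like a file name".

```python
import re

# Heuristic (an assumption, not a full grammar): a run of path characters
# followed by a dot and a short alphabetic extension, e.g. "dashboard.css".
FILE_TOKEN = re.compile(r"\b[\w./-]+\.[A-Za-z]{1,5}\b")


def unmatched_file_mentions(description: str, changed_files: list[str]) -> list[str]:
    """Return file-like tokens in the description that match no changed file.

    A token counts as matched if it equals a changed path or its base name.
    """
    known = set(changed_files) | {path.rsplit("/", 1)[-1] for path in changed_files}
    unmatched: list[str] = []
    for token in FILE_TOKEN.findall(description):
        if token not in known and token not in unmatched:
            unmatched.append(token)
    return unmatched


if __name__ == "__main__":
    text = "This PR adjusts the padding in dashboard.css and updates helpers.js"
    print(unmatched_file_mentions(text, ["src/styles/dashboard.css"]))
    # prints ['helpers.js']
```

A check like this only flags names that do not appear in the diff. It cannot tell whether a sentence such as "improved performance" is true, so it supplements a careful read rather than replacing one.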

My Team’s Unfiltered Reactions

At the end of the week, I came clean to my team during our Friday retro call. I told them about the experiment and asked for their honest feedback. The reactions were mixed and instructive.

Jess, who had praised the Monday description, said she had suspected something was off because my usual writing style was more casual and included small jokes. The AI descriptions were technically correct in structure but felt impersonal. She said, “It felt like reading a changelog generated by a bot, which it was. I missed your voice.” Marcus admitted he had not really read the descriptions closely. He skimmed them, looked at the diff, and moved on, so the AI’s output neither helped nor hurt him. That was a sobering reminder that many developers barely read descriptions at all.

David, the team lead, was the most critical. He said a pull request description is a communication act, not just a summary of changes. When I let the AI write it, I was signaling that I could not be bothered to explain my own work. He said, “If you can’t take five minutes to write why you made a change, it makes me wonder if you really thought it through.” That stung, but I knew he was right. The description is a sign of respect for the reviewer’s time and intelligence. Automating it entirely felt disrespectful, even if that was not my intention.

Priya, the junior developer, said the incorrect description on Thursday had made her doubt her own understanding of the codebase. She spent time chasing a phantom column because she assumed that the PR author (me) had written the description and that it must therefore be accurate. Her trust in the PR description as a source of truth was broken. I apologized again, and she accepted, but I could see that the experiment had a real cost to team cohesion.

The Uncomfortable Ethical Question

The experiment raised a question I had not anticipated. Is it ethical to use AI to write your pull request descriptions without telling your reviewers? My team’s answer was a clear no. They expected that the description was written by the person who wrote the code, because that person is accountable for it. If I delegated that accountability to an AI, I was misrepresenting my own work. Several teammates said they would not mind if I used AI to generate a draft that I then heavily edited and verified, because then the final product was still mine. But the raw, unedited output, posted under my name, felt like a small deception. I had not thought of it that way, but once they said it, I could not unsee it. The PR description is a promise that the author has understood and can explain the change. An AI cannot make that promise.

One teammate raised a practical point. If the description is inaccurate, it creates a risk that a future developer will blame the reviewer for missing something, or that the inaccuracy will be cited in a post-mortem. The pull request description becomes part of the project’s history. Hallucinations in that history are worse than no description at all. That made me realize that using AI for documentation without careful review is a form of technical debt, one that can compound when you look back months later and cannot trust your own team’s records.

What I Would Do Differently Now

After the experiment, I changed my approach. I still click the sparkle icon to generate a rough summary of the changed files. But I treat that summary as a first draft that I must rewrite. I delete the parts that are generic or speculative, I add the actual reasoning behind the change, I mention any tradeoffs I considered, and I write the testing steps myself. The AI’s output is a prompt, not a product. That hybrid approach saves me a few minutes of typing without sacrificing the quality of communication. My teammates have noticed the difference and said my descriptions are better than ever, because I now have a clear structure to fill in rather than a blank page.

I also now tell my team when I have used AI to draft something. A small note at the bottom of the description that says “Initial draft generated by Copilot, then reviewed and edited” is a simple courtesy that maintains trust. It signals that I am still accountable, and it reminds the reviewer to apply extra scrutiny to any AI-sourced claims. The practice has been well received, and I wish I had done it from day one of the experiment.
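The two habits above, filling in a fixed structure and disclosing the AI draft, can be sketched as a small pre-posting check. This is an illustration under stated assumptions, not a tool my team actually runs: the section names in `REQUIRED_SECTIONS` are hypothetical labels for the problem, solution, tradeoffs, and testing steps we expect in a description, and the helper names are made up for the example.

```python
# Hypothetical section labels; adapt them to whatever your team expects.
REQUIRED_SECTIONS = ("Problem", "Solution", "Tradeoffs", "Testing")

# The disclosure note quoted in the post.
AI_NOTE = "Initial draft generated by Copilot, then reviewed and edited"


def missing_sections(description: str) -> list[str]:
    """List required sections that never appear as a heading line.

    A heading is any line that, after removing leading '#' characters and a
    trailing ':', equals the section name (case-insensitive).
    """
    headings = {
        line.strip().lstrip("#").strip().rstrip(":").lower()
        for line in description.splitlines()
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


def with_disclosure(description: str) -> str:
    """Append the AI disclosure note once, at the bottom of the description."""
    if AI_NOTE in description:
        return description
    return description.rstrip() + "\n\n" + AI_NOTE + "\n"


if __name__ == "__main__":
    draft = "Problem:\nSlow query.\n\nSolution:\nAdd an index.\n"
    print(missing_sections(draft))  # prints ['Tradeoffs', 'Testing']
    print(with_disclosure(draft))
```

The check only confirms that the headings exist. Whether the reasoning under each heading is accurate is still the author's job.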

The Larger Lesson About AI and Developer Communication

My week of letting AI write my pull request descriptions taught me that communication is not just about transmitting information. It is about demonstrating understanding, building trust, and showing respect for your colleagues. An AI can produce a grammatically correct summary of code changes, but it cannot explain why you chose one approach over another, or what you are uncertain about, or which part you want the reviewer to focus on most. Those are the things that make pull request reviews effective and collaborative. When I tried to automate that part of my work, I inadvertently signaled that I did not care about those things, even though I deeply do.

If you are considering using AI for your own pull request descriptions, my advice is to use it sparingly and transparently. Let it generate a scaffold, but fill that scaffold with your own voice and your own reasoning. Never let the machine’s output reach your team without your careful eyes on it. The seconds you save by skipping that review will be lost tenfold when your teammate chases a hallucination or doubts your commitment. Your code is yours. Your words to your team should be too.
