AI code provenance: how to prove you own your app's code
If you build with AI and never document the human authorship, you can't prove copyright when it matters: at acquisition diligence, in a competitor-cloning dispute, or when you want to enforce your rights against a copier. The fix is cheap if you do it from day one and very expensive to retrofit. Here's the documentation trail to build proactively.
Why provenance matters
The US Copyright Office's January 2025 guidance (Copyright and Artificial Intelligence, Part 2) clarifies that copyright requires 'sufficient human authorship.' Pure AI output — generated from a prompt with no further human modification — is not copyrightable. The implications for vibe-coded apps:
- If your code is entirely AI-generated and unmodified, you may not own it.
- If a competitor copies your code, you may have no legal grounds to stop them.
- If an acquirer asks 'is this code yours?' the honest answer requires documentation, not just intuition.
- If your investors ask the same question during due diligence, same problem.
You don't have to abandon AI assistance. The Copyright Office position is that AI tools are like any other tool — Photoshop is a tool, not an author. What matters is whether a human exercised meaningful creative control. The bar is real but achievable; you mostly need evidence that you did.
What 'sufficient human authorship' looks like
The Copyright Office hasn't published a bright-line test, but cases and rulings since 2024 cluster around the factors below. Each one is paired with the record that documents it:
- Human-authored architecture decisions — you decided what the app does, what features it has, what the data model is. Documented in a spec or product doc.
- Iterative refinement — you didn't accept the AI's first output; you iterated, rejected, adjusted, rewrote. Documented in conversation history and commit log.
- Substantial human modification — non-trivial changes to AI output. Documented by diff: what came out of the AI vs. what shipped to production.
- Selection and arrangement — even if individual components are AI-generated, your selection and integration of them shows authorship. Documented in code review notes.
- Original creative additions — anything you wrote yourself that's central to the work. Documented by attribution in commits.
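The "documented by diff" idea can be made concrete. Here is a minimal sketch, assuming you kept the raw AI output for a file alongside the version you shipped; the function name `modification_summary` is hypothetical, and the numbers it returns are evidence of modification, not a legal threshold.

```python
import difflib


def modification_summary(ai_output: str, shipped: str) -> dict:
    """Compare raw AI output with the shipped version of the same file."""
    ai_lines = ai_output.splitlines()
    shipped_lines = shipped.splitlines()
    matcher = difflib.SequenceMatcher(a=ai_lines, b=shipped_lines)

    # Count changed lines from the opcodes rather than by parsing diff text.
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1

    # Keep a human-readable diff to archive next to the summary.
    diff_text = "\n".join(
        difflib.unified_diff(
            ai_lines, shipped_lines,
            fromfile="ai_output", tofile="shipped", lineterm="",
        )
    )
    return {
        "similarity": matcher.ratio(),  # 1.0 means the files are identical
        "lines_added": added,
        "lines_removed": removed,
        "diff": diff_text,
    }
```

Archiving the diff text with the commit that shipped the file gives you the "what came out vs. what shipped" record without reconstructing it later.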
The provenance trail to build
1. Save your prompts
Every significant AI interaction should be saved somewhere durable. The chat history in Cursor / Lovable / Bolt / Claude / ChatGPT counts if it's stored on your account. Better: export weekly to your repo (a /docs/prompts/ folder or similar). This shows the human direction behind the output.
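A weekly export is easier to keep up if archiving is one command. The sketch below assumes you have already downloaded an export file from your tool; the function name `archive_export` and the per-tool subfolder layout are illustrative choices, not a standard.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def archive_export(export_file: Path, repo_root: Path, tool: str,
                   on: Optional[date] = None) -> Path:
    """Copy one exported conversation into docs/prompts/<tool>/.

    The file name combines the archive date with a content hash, so the
    same export archived twice on one day lands on the same path.
    """
    on = on or date.today()
    digest = hashlib.sha256(export_file.read_bytes()).hexdigest()
    dest_dir = repo_root / "docs" / "prompts" / tool
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{on.isoformat()}-{digest[:12]}{export_file.suffix}"
    if not dest.exists():
        shutil.copy2(export_file, dest)
    return dest
```

Committing the archived file right away gives the export a timestamp in your Git history, which is the part that is hard to recreate after the fact.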
2. Commit with attribution
Use commit messages to document what the human contributed: 'add checkout flow (architecture: human, code: ai-generated, review: human)'. Some teams use Git trailers (for example, Co-authored-by: AI <ai@example.com>) to formally distinguish AI-assisted commits. The exact format is less important than consistency.
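One way to stay consistent is to build the message from a small helper instead of typing it by hand. This is a sketch: the trailer keys `Architecture`, `Code`, and `Review` mirror the example above and are a hypothetical team convention, not something Git defines. Git only expects trailers to be `Key: value` lines in the last paragraph of the message.

```python
def commit_message(subject: str, body: str, trailers: dict) -> str:
    """Assemble a commit message with Git-style trailers at the end."""
    for key in trailers:
        # Trailer keys are single tokens; spaces would make Git treat
        # the line as ordinary prose.
        if " " in key or ":" in key:
            raise ValueError(f"invalid trailer key: {key!r}")

    parts = [subject.strip()]
    if body.strip():
        parts.append(body.strip())
    if trailers:
        parts.append("\n".join(f"{k}: {v}" for k, v in trailers.items()))
    # Paragraphs are separated by one blank line.
    return "\n\n".join(parts) + "\n"


message = commit_message(
    "add checkout flow",
    "",
    {"Architecture": "human", "Code": "ai-generated", "Review": "human"},
)
```

You can pass the result to `git commit -F <file>` after writing it to a file, or keep the helper as a template that your commit tooling fills in.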
3. Maintain an authorship log
For significant features, keep a one-paragraph note: what problem you were solving, what approach you chose, what alternatives you rejected, what the AI did. This is exactly the kind of evidence that demonstrates 'human creative control.'
4. Generate an SBOM
A Software Bill of Materials lists every dependency in your project with version, license, and source. CycloneDX and SPDX are the two standard formats. CISA published SBOM guidance in 2023, and acquirers increasingly expect an SBOM at diligence. We have a dedicated article on SBOM for AI-built apps.
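To show what "version, license, and source" look like in practice, here is a hand-built sketch of a minimal CycloneDX JSON document. It is only an illustration of the shape; a real SBOM should come from a generator that reads your lockfiles. The helper name `minimal_sbom` and the sample component are assumptions, and spec version 1.5 is picked as one published version of the format.

```python
import json


def minimal_sbom(components: list) -> str:
    """Render a minimal CycloneDX JSON document from plain dicts.

    Each input dict needs: name, version, purl (the package URL that
    records where the package came from), and an SPDX license id.
    """
    doc = {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {
                "type": "library",
                "name": c["name"],
                "version": c["version"],
                "purl": c["purl"],
                "licenses": [{"license": {"id": c["license"]}}],
            }
            for c in components
        ],
    }
    return json.dumps(doc, indent=2)


sbom_text = minimal_sbom([
    {"name": "left-pad", "version": "1.3.0",
     "purl": "pkg:npm/left-pad@1.3.0", "license": "MIT"},
])
```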
5. Document AI-specific risks
Maintain an /AI_NOTICE.md file in your repo describing: which AI tools you use, which parts of the codebase are heavily AI-assisted, any known concerns (potential copyleft contamination, etc.). This is candor — and candor at acquisition diligence converts into trust, not penalty.
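If it helps to start from a template, the sketch below renders the three sections described above into Markdown. The section headings and the function name `render_ai_notice` are assumptions; nothing prescribes a layout for this file.

```python
def render_ai_notice(tools: list, ai_heavy_paths: list, concerns: list) -> str:
    """Render the contents of AI_NOTICE.md from three plain lists."""

    def bullets(items: list) -> str:
        # An explicit "None recorded" is more candid than a missing section.
        if not items:
            return "- None recorded"
        return "\n".join(f"- {item}" for item in items)

    return (
        "# AI notice\n\n"
        "## AI tools in use\n" + bullets(tools) + "\n\n"
        "## Heavily AI-assisted areas\n" + bullets(ai_heavy_paths) + "\n\n"
        "## Known concerns\n" + bullets(concerns) + "\n"
    )


notice = render_ai_notice(
    tools=["Cursor", "Claude"],
    ai_heavy_paths=["src/checkout/"],
    concerns=[],
)
```

Writing `notice` to `AI_NOTICE.md` and updating the lists when your tooling changes keeps the file honest with very little effort.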
Acquisition diligence — what acquirers actually ask
Based on representations-and-warranties insurance underwriting questionnaires (the most rigorous form of diligence) we've seen since AI-built apps became acquisition targets:
- What percentage of the codebase was AI-generated vs hand-written?
- Which AI tools were used? On what dates? Under what license terms (most AI tools have specific IP terms)?
- Is there a chain of custody — prompts, responses, modifications — that demonstrates human authorship?
- Are you aware of any open-source code in your bundle? If so, which licenses, and is your use compliant?
- Have you scanned your bundle for copyleft contamination (AGPL, GPL, LGPL)?
- Do you have an SBOM?
- Has any third party claimed your code copies theirs? Any pending or threatened disputes?
An acquirer wants to know they're buying something defensible. A clear authorship narrative + SBOM + scan reports + clean licenses + a transparent AI_NOTICE.md beats an evasive answer with no documentation, even if the underlying code is identical. Investors fund teams that can answer the question; they don't fund teams who can't.
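If your commits carry attribution trailers, you can give a rough, evidence-backed answer to the "what percentage" question. The sketch below assumes a hypothetical `Code: ai-generated` trailer convention and takes the raw text produced by `git log --format=%B%x00`, which prints each commit message followed by a NUL byte. It measures the share of commits, not lines of code, so present it as an approximation.

```python
def trailers_of(message: str) -> dict:
    """Parse 'Key: value' trailers from the last paragraph of a message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return {}  # subject line only, so there is no trailer block
    result = {}
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(": ")
        if not sep or " " in key:
            return {}  # last paragraph is prose, not trailers
        result[key] = value
    return result


def ai_commit_share(raw_log: str, key: str = "Code",
                    value: str = "ai-generated") -> float:
    """Fraction of commits whose trailers contain key: value."""
    messages = [m for m in raw_log.split("\x00") if m.strip()]
    if not messages:
        return 0.0
    tagged = sum(1 for m in messages if trailers_of(m).get(key) == value)
    return tagged / len(messages)
```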
If you're already deep into a project with no provenance
You can backfill, partially. It's not as clean as starting fresh but it's better than nothing.
- Write a retrospective AI_NOTICE.md describing which tools you used, in what time periods, on which features. Be honest. Acquirers see through embellishment.
- Pull conversation history from your AI tools where they offer an export (ChatGPT and Claude provide account data exports; check what editor-based tools such as Cursor support). Archive it in /docs/prompts/.
- Re-review the most critical parts of the codebase yourself. Make substantive human changes — refactor, add tests, improve error handling. Document the changes in commit messages.
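- Generate a current SBOM with syft or a CycloneDX generator for your ecosystem. (cyclonedx-cli mainly converts, merges, and validates SBOMs that already exist.)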
- Run a copyleft scan against your bundle to surface any inherited licenses you weren't aware of. Comply Code does this; so do FOSSA, Snyk, and other commercial tools.
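Once you have an SBOM, a first-pass copyleft check is a few lines of code. This sketch assumes a CycloneDX JSON file and flags any component whose declared license mentions GPL, which also catches AGPL and LGPL. It only sees declared dependency licenses, so it does not replace a scanner that inspects the code itself, and a flag is a prompt to review, not a finding of non-compliance.

```python
import json


def declared_licenses(component: dict) -> list:
    """Collect license ids, names, and expressions from one component."""
    found = []
    for entry in component.get("licenses", []):
        if "expression" in entry:
            found.append(entry["expression"])
        lic = entry.get("license", {})
        label = lic.get("id") or lic.get("name")
        if label:
            found.append(label)
    return found


def copyleft_components(sbom_json: str) -> list:
    """Return (component, license) pairs that need a closer look."""
    sbom = json.loads(sbom_json)
    flagged = []
    for comp in sbom.get("components", []):
        for label in declared_licenses(comp):
            # "GPL" as a substring also matches AGPL-* and LGPL-* ids.
            if "GPL" in label.upper():
                name = f"{comp.get('name')}@{comp.get('version')}"
                flagged.append((name, label))
    return flagged
```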
Common misconceptions
- 'AI-generated code can't be copyrighted at all' — wrong. AI-assisted code with human creative control IS copyrightable per current US guidance.
- 'I should remove all AI assistance' — wrong. The bar is human creative direction, not zero AI. Use AI as a tool, document your direction.
- 'My terms of service say I own everything I generate' — partly wrong. The TOS gives you what the provider can give you. The provider can't grant you copyright that doesn't legally exist.
- 'GitHub Copilot's TOS gives me copyright' is wrong. The terms give you a license to use the output and, on business plans, IP indemnification. They don't (and can't) confer copyright that hasn't vested in a human author.
Common questions
Is my entire app uncopyrightable if I built it with AI?
Almost certainly not. Per the Copyright Office's 2025 guidance, AI-assisted work with sufficient human authorship IS copyrightable. The question is which specific elements. Your architecture decisions, your selection of features, your iterative refinement, your modifications — these are typically copyrightable contributions. The raw AI output, unmodified, may not be. The defensible posture is documenting the human contributions.
Should I tell investors I built with AI?
Yes, candidly. Investors in 2026 expect AI-assisted development; hiding it is worse than disclosing it. The conversation acquirers and investors actually want is 'how do you maintain IP defensibility?' — which is the topic this article addresses. Have the documentation ready and the conversation becomes a confidence-builder, not a red flag.
What about AI tools that claim to give you full IP rights?
They can grant you any license they have. They can't grant you copyright protection that doesn't legally vest in a human author. The 2025 Copyright Office position applies to copyright registration regardless of any agreement. The agreement is useful for contractual purposes (e.g., the provider won't claim rights against you) but doesn't create copyright where none exists.
How does this interact with GitHub Copilot's IP indemnification?
Microsoft's GitHub Copilot business plans indemnify you against IP claims from third parties for code Copilot suggests, subject to filters being enabled. This protects you from being sued by a copyright holder whose code Copilot reproduced, but it doesn't grant you ownership of your output. Infringement exposure and ownership are separate questions, and both matter.
Is there a simple test for 'sufficient human authorship'?
No bright-line test exists yet. Cases since 2024 have clustered around evidence of human creative control: prompts that show direction, iterative refinement, selection and arrangement, substantive modification. The Copyright Office's case-by-case approach means the documentation matters more than a specific percentage of human-vs-AI code.
What's the minimum viable provenance documentation?
An AI_NOTICE.md in your repo listing the tools you use, conversation-history archives where possible, commit messages that distinguish AI-assisted vs hand-written work, and an SBOM. That's about 2 hours of setup and 10 minutes per week of upkeep. Worth doing on day one.
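A small check can keep that minimum from eroding over time. The sketch below looks for the artifacts named above in a repository; the SBOM file name `sbom.cdx.json` is an assumption, so pass whatever name your generator writes. It cannot judge commit-message quality, which still needs a human look.

```python
from pathlib import Path


def provenance_status(repo_root: Path,
                      sbom_name: str = "sbom.cdx.json") -> dict:
    """Report which minimum provenance artifacts exist in a repo."""
    prompts = repo_root / "docs" / "prompts"
    return {
        "AI_NOTICE.md": (repo_root / "AI_NOTICE.md").is_file(),
        # The folder only counts if it holds at least one archive.
        "docs/prompts/": prompts.is_dir() and any(prompts.iterdir()),
        sbom_name: (repo_root / sbom_name).is_file(),
    }


def missing_items(status: dict) -> list:
    """List the artifacts that are still absent."""
    return [name for name, present in status.items() if not present]
```

Running this in CI and failing the build when `missing_items` is non-empty is one way to make the weekly upkeep hard to forget.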