A little over a month ago, I built and launched Grimoire CLI as an open source tool to give my AI agents a shared “Second Brain.” I had originally used Obsidian for this, and I still happily use Obsidian for organizing my own personal notes (mostly written by me, not agents). For my taste, though, the Obsidian MCP was a bit too slow, and it didn’t work as well as I’d like with my Claude Code skills generally. I decided this was a good excuse to build personal software tailored exactly to the feature set I wanted, and that the act of building it would give me some interesting insights into how far AI coding could be pushed. This post is mostly about what I learned from building the Grimoire.

What is the Grimoire, and how did I build it?

Grimoire CLI bootstraps a folder on your local filesystem (usually ~/.grimoire) in which agents can write Markdown files for whatever you want. I use it for Deep Research results, organizing my personal tasks/calendar/reviews, tracking my task plate, organizing all documents I write, and more. The CLI provides commands to create/update/delete metadata around these files, and ships Claude Skills that teach my agents how to use those commands. I can also search my Grimoire files by tag and by vector similarity, all managed by a local SQLite database.

To build it, I leveraged my AI Production Coding Workflow with one key change: after I finished the low level design phase (about 4 hours from ideation through completed and reviewed requirements, high level design, and low level design, ending with a generated task graph), I told Claude Code essentially to “complete this task graph, running all parallelizable tasks with /implement-task-code, and then for each batch looping /review-code in subagents and addressing feedback until code reviews pass. Then commit changes to a branch and continue until all tasks are complete.” This differs from my production workflow in that I never looked at the code, and to this day I still haven’t. Once I was done with specifications, I let Claude Opus take the wheel and trusted my pre-commit hook and skill definitions to get the job done right. The build was complete the same day I started on the design.

The final result was ~18,500 lines of Go code, of which ~5,500 lines are implementation code and ~13,000 lines are unit and integration tests.

How well did it work?

Overall, I’m actually quite happy with the quality of what I built. I’ve been using the Grimoire with my personal agents extensively at work and at home for over a month, with hundreds of indexed documents at work especially. It’s created a positive feedback loop for skills like my design agents and deep research agents, because both are encouraged to search existing Grimoire documents by tag and by vector similarity to leverage previous work. At the end of the day, it’s useful software that solved my personal use case well.

It was not, however, flawless. The installation integration tests would mess with my local environment, an issue that probably would have been caught by code review. Gaps in the specification meant large documents weren’t properly vector indexed, requiring further feature iteration - though once again, I spent my time only on new requirements and design, letting Claude Code use my production skills to fully drive the actual implementation without human review. Since that first major issue, it’s worked well.
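The post doesn’t say what the fix was, but a common reason large documents fail vector indexing is that they exceed the embedding model’s input limit, and a common remedy is splitting them into overlapping chunks and embedding each chunk separately. A hypothetical Go sketch of word-level chunking (the function name and parameters are mine, not Grimoire’s):

```go
package main

import (
	"fmt"
	"strings"
)

// chunkWords splits text into chunks of at most maxWords words, repeating
// overlap words between consecutive chunks so context isn't lost at
// chunk boundaries. Each chunk would then be embedded and indexed on its own.
func chunkWords(text string, maxWords, overlap int) []string {
	words := strings.Fields(text)
	var chunks []string
	step := maxWords - overlap
	if step < 1 {
		step = 1
	}
	for start := 0; start < len(words); start += step {
		end := start + maxWords
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break
		}
	}
	return chunks
}

func main() {
	doc := strings.Repeat("word ", 10) // a 10-word "document"
	for i, c := range chunkWords(doc, 4, 1) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

Real chunkers usually split on headings or paragraphs rather than raw word counts, but the overlap idea is the same.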

What did I learn?

Dan Shapiro (small world fun fact: my first ever boss in a tech job, when I was a freshman intern) has written about the Dark Factory pattern. Building the Grimoire was, without me explicitly thinking about it this way, temporarily living somewhere between Level 4 and Level 5 on his scale. At work, writing production code, I’m pretty clearly living at Level 3 - as discussed in my production coding workflow post, I’m still reviewing batches of code, though the batches are quickly getting bigger as models and skills improve.

Overall, living temporarily at the higher levels of this pattern went better than I expected at the outset. My expectation was that there would be far more bugs to iron out, and that code review was too important to trust to an automated review cycle. While I don’t know if I would release Grimoire CLI to the world as production software that I take money for and carry a pager for, it’s fit for purpose, and the total time spent - debugging and iterations included - was far less than I expected going in.

The Future

While this result does not have me immediately running to Dark Factory patterns at my job, I’ve seen enough to believe that this type of pattern is going to be reliable enough to use for production code at some point in the future. Given recent advances in model quality and available skills, that day could come very soon indeed.

I think everyone should give this way of building software a try. Take a personal software problem you’d like to see solved, write up the specifications and requirements in plain English, and throw a couple of 5-hour blocks of Claude Code tokens at the problem. It’s worth trying because it may well be the future, and as with AI coding in general, the best way to see your way there is to try it.