CODEX RFC/Proposal

Benjamin (Mako) Hill

Version 0.1.1 | Sun, 29 Sep 2002 16:51:28 -0400

The Problem

There is no good, free version control system for documents or documentation. Writing any piece of literature, especially one with multiple authors working on it concurrently, involves constantly adding, removing, and changing text. There is no good way to keep track of these changes in the way that there is for source code. Existing solutions are proprietary and/or kludgy.

The Consequences

Current Solutions

Current solutions seem to fall into a couple of major categories. Each has its own benefits and shortcomings. Some of these include:

My Solution

I propose a robust, free version control system specifically designed for working with documents--especially in an asynchronous collaborative environment. I'll refer to this (non-existent) system as CODEX, or the Collaborative Online Documentation (D)ifference Extractor. The software will be free software and will be distributed under the terms of the GNU GPL. The core engine will be written in either Perl or Python.

Since my software will be free software, I will seek to avoid duplicating effort wherever possible. I think that building on a system like CVS or Subversion is the logical first step. Since a diff will show every changed line, it will by default show every changed word and piece of white space. A contextual diff (which both CVS and Subversion can provide) will include even more information. Either of these programs will be able to provide information useful for resolving conflicts and will provide the ability to commit, checkout, release or watch a project. They also both provide servers with several methods of access over a LAN or the Internet. A future version of Subversion will allow for different client-side diff programs.
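To illustrate the gap between a line-based diff and what a document author wants to see, here is a minimal sketch of a word-level comparison in Python using the standard difflib module. The function name word_diff and its output format are my own assumptions for illustration; nothing here is part of CVS, Subversion, or any existing CODEX code.

```python
import difflib
from typing import List, Tuple


def word_diff(old: str, new: str) -> List[Tuple[str, str]]:
    """Return ("-", text) / ("+", text) pairs for removed and added words.

    Hypothetical helper: splits on whitespace, so changes that only
    affect white space are deliberately ignored in this sketch.
    """
    old_words = old.split()
    new_words = new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    changes: List[Tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged run of words
        if tag in ("delete", "replace"):
            changes.append(("-", " ".join(old_words[i1:i2])))
        if tag in ("insert", "replace"):
            changes.append(("+", " ".join(new_words[j1:j2])))
    return changes


if __name__ == "__main__":
    for op, text in word_diff("The quick brown fox", "The quick red fox"):
        print(op, text)
```

A line-based diff would report the whole sentence as changed; a comparison along these lines narrows the report to the words that actually differ, which is closer to what an author or editor needs.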

I do NOT want this project to involve creating a new word processor. There are more than enough of them, most of them bad. I would almost certainly create another bad one. I want my software to be able to work with many other word processors so that it might be picked up and incorporated as a back-end to existing pieces of software.

Taking this approach, my software will act as an interface between the user (or their word processing software) and the VCS.
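One way to picture this interface layer is as a small abstract back-end that a word processor could call without knowing which VCS sits behind it. The sketch below is purely hypothetical: the class names VCSBackend and InMemoryBackend and the method signatures are my own assumptions, and the in-memory store only stands in for a real CVS or Subversion repository.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class VCSBackend(ABC):
    """Hypothetical interface between a front-end and a VCS."""

    @abstractmethod
    def commit(self, path: str, text: str) -> int:
        """Store a new revision of a document and return its number."""

    @abstractmethod
    def checkout(self, path: str, revision: Optional[int] = None) -> str:
        """Return the text of a revision (the latest one if None)."""


class InMemoryBackend(VCSBackend):
    """Stand-in back-end for demonstration; keeps revisions in a dict."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}

    def commit(self, path: str, text: str) -> int:
        revisions = self._history.setdefault(path, [])
        revisions.append(text)
        return len(revisions)  # revision numbers start at 1

    def checkout(self, path: str, revision: Optional[int] = None) -> str:
        revisions = self._history[path]
        if revision is None:
            return revisions[-1]
        return revisions[revision - 1]


if __name__ == "__main__":
    backend = InMemoryBackend()
    backend.commit("chapter1.txt", "First draft.")
    backend.commit("chapter1.txt", "Second draft.")
    print(backend.checkout("chapter1.txt", revision=1))
    print(backend.checkout("chapter1.txt"))
```

A real back-end would wrap an existing VCS rather than store text itself; the point of the sketch is only that the front-end sees the same small set of operations either way.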

Along the lines I'm considering right now, the software might:

To accomplish this, my software will actually need to be two distinct pieces.

In this way, what I aim to create in this project will be more of a framework for the creation, transmission, and handling of this type of data. I will aim to define this framework and get example code written as a proof of concept. Hopefully, with this out there, other developers will be able to contribute and expand the scope and usability of the software.

This diagram shows how some of the internals of the CODEX engine might work.

In creating my proof of concept this year, I'll aim to create (in this order):

This is what I have so far, and a lot of it is right off the top of my head. This is an RFC. Please email me back at

Mako Hill
Last modified: Fri Sep 27 18:55:16 EDT 2002