To Fork Or Not To Fork: Lessons From Ubuntu and Debian

Author: Benjamin Mako Hill
Date: Thursday, 28 July 2005
Affiliation: Ubuntu Project / Debian Project


This talk was delivered (in slightly different forms) at Linuxtag 2005 in Karlsruhe, Germany, at the Libre Software Meeting in Dijon, France, and at What The Hack near Boxtel, the Netherlands.

More information on this talk and my other talks is available at



SLIDE 1: Title and Two Forks Picture

Intro Joke: The forks in that picture look like they are getting along quite nicely, right? That's the mood I'm going to try to set for the rest of the talk.

Ask for hands:

Debian people here may have specific knowledge and experience, some of it more representative than others. I want to avoid falling down rat-holes around particular issues, so talk to me afterwards.

To Debian folks: Debconf Round-Table.



SLIDE 2: Overview 1

  • Big Questions and Context
    • Why derive?
    • "Fork" is a four-letter word
    • Difficulties of forking and derivation
  • Case Studies
    • Debian and Ubuntu
    • Applicability


SLIDE 3: Overview 2

  • Approaches/Solutions
    • Derivation and Problem Analysis
    • Distributed Source Control
    • Problem Specific Tools
    • Social Solutions

Big Questions and Context


SLIDE 4: World of Debian Customizers

There are at least 115 distributions derived from Debian.

Why Derive?

  • There is a tendency in the free software development community to migrate toward super-projects.
  • The work of these communities is becoming increasingly difficult to recreate.
  • Single projects end up being asked to serve large communities with diverse needs.


SLIDE 5: One Size Does Not Fit All

There are 115 different distributions because there are 129 different needs.

Some distributions may be redundant in their implementation, but they are not redundant in the needs they serve. Derivations, in one form or another, must exist to meet the diverse needs of a large community.

The result:

  • Derivation (ironically) becomes both increasingly important and increasingly difficult to do (or at least do right).

What Is Forking?


SLIDE 6: Fork is a Four Letter Word

  • Define "four-letter word"

  • Define "Fork" (bifurcation in a project)

    Forks are not merely, or even primarily, technical.

    Forks happen on many levels (political, code, social, all of the above).

  • Examples of forks (Emacs, GCC, etc.)

Difficulties of Forking and Derivation

Historical view: "Forks are Bad"

From the Free Software Project Management HOWTO:

The short version of the fork section is, don't do them. Forks force developers to choose one project to work with, cause nasty political divisions, and redundancy of work.

In the best situations: competition, redundancy, and the added work of tracking the outside project, using poor merge tools.

In the worst (common) situations: things get dropped on the floor.

Forking has historically been so costly that the mere threat of a fork can be enough to keep one from happening.

Case Studies



SLIDE 7: Debian

Debian is, for the purpose of this discussion, very big:

  • The most packages
  • The most volunteers
  • The most derivations
    • Internal
    • External

Everyone here understands Debian, so I won't spend too much time on it.



SLIDE 8: Ubuntu

Joke: To Scale Drawing

Ubuntu is a Debian derivation. I'm not going to spend too much time explaining things.

The key points for this conversation:

  • Debian Derivation
  • Regular and predictable releases
  • An emphasis on free software that will maintain the derivability of the distribution.
  • An emphasis on usability and a consistent desktop vision.

Derivation is significant:

  • Code-level changes (mostly trivial) to 1,300 packages.

Derivation is also different.


SLIDE 9: Ubuntu Derivation Model

(Explain process.)

Mark Shuttleworth has said, "every line of code in our delta that [we] must maintain has a cost. It's in our interests to minimize this."

This means getting code into Debian or -- in whatever way -- making sure that we don't go in different directions.



SLIDE 10: Applicability

While distributions and other large projects are being forced to confront the problem of balancing the benefits of forking and collaboration first, projects of any size can harness this power to make better software right away.

Clearly, the amount of code and the number of people involved differ in scale.

Clearly, projects of radically different sizes will embrace different solutions.

I believe that in the next decade, the free software community is going to see a shift toward a development methodology where forking is not bad. Through this shift and through many other developments in the community, free software will become faster, better, and ultimately successful on a scale we can only imagine now.

The way this will happen will be different in different projects.


Derivation and Problem Analysis


SLIDE 11: Look at the Type of Problems

Break the problem down into its component parts. For deriving distributions, these can be:

  1. Selection of individual pieces of software

    main, universe, multiverse -- e.g., UserLinux

  2. Changes to the way that packages are installed or run (e.g., in a Live CD type environment or using a different installer)

    e.g., Anaconda, a Live CD -- also low impact

  3. Configuration of different pieces of software

    Configuration changes can be handled differently because they can be organized through a configuration framework (e.g., Debconf, cfengine). CDDs (Custom Debian Distributions) take this approach.

  4. Changes made to the actual software package (i.e., to the package's code)

    Most invasive.

By breaking down the problem in this way, Debian derivers have been able to focus their energy on the less intrusive problems first.

Smaller teams can limit themselves to less intrusive types of changes to be successful.
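A small team could make this ordering explicit when planning its work. The sketch below is a hypothetical illustration, not a Debian or Ubuntu tool: `ChangeKind`, `Change`, and `plan` are invented names, and ranking the four categories in the order of the slide's list is an assumption rather than a formal measure of invasiveness.

```python
from dataclasses import dataclass
from enum import IntEnum


class ChangeKind(IntEnum):
    """The four kinds of derivation change, ordered from least to most
    invasive (the ordering follows the slide and is an assumption)."""
    SELECTION = 1      # which packages ship (e.g., main/universe/multiverse)
    INSTALLATION = 2   # how packages are installed or run (installer, Live CD)
    CONFIGURATION = 3  # settings managed through e.g. debconf or cfengine
    CODE = 4           # patches to the package's own code


@dataclass
class Change:
    package: str
    kind: ChangeKind
    description: str


def plan(changes, max_kind=ChangeKind.CONFIGURATION):
    """Hypothetical helper: keep only changes no more invasive than
    max_kind and return them least invasive first (sorted() is stable,
    so changes of the same kind keep their input order)."""
    accepted = [c for c in changes if c.kind <= max_kind]
    return sorted(accepted, key=lambda c: c.kind)


# Illustrative wishlist for a small derivative (package names are examples).
wishlist = [
    Change("apache2", ChangeKind.CODE, "patch the default virtual host"),
    Change("postfix", ChangeKind.CONFIGURATION, "preseed the mail name"),
    Change("openoffice.org", ChangeKind.SELECTION, "add to the default install"),
]

for change in plan(wishlist):
    print(change.kind.name, change.package)
```

With the default cutoff, the code-level Apache patch is deferred and the team starts with package selection, then configuration; raising `max_kind` to `ChangeKind.CODE` brings the invasive change back into scope.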

Distributed Source Control


SLIDE 12: Distributed Version Control

5-minute intro to distributed version control

Distributed version control aims to solve a number of problems introduced by CVS and alluded to above by:

  • Allowing people to work disconnected from each other and to sync with each other, in whole or in part, in an arbitrary and ad-hoc fashion.
  • Allowing deltas to be maintained over time.

Recently, Linus Torvalds said:

In fact, one impact BK has had is to very fundamentally make us (and me in particular) change how we do things. That ranges from the fine-grained changeset tracking to just how I ended up trusting sub-maintainers with much bigger things, and not having to work on a patch-by-patch basis any more

Distributed systems include Arch, TLA, Bazaar, Bazaar-NG, SVK, Darcs, Monotone, BitKeeper, and others.

While Ubuntu uses distributed version control heavily to maintain its changes, and will use it more in the future, it is even more useful for small projects.

Distributed version control allows people to maintain deltas over time.
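The idea of carrying a delta forward, and of shrinking it as changes are accepted upstream (the cost Mark Shuttleworth describes), can be sketched in a few lines. This is a deliberately simplified, hypothetical model: `rebase_delta` and `build` are invented names, and real distributed version control systems track changesets and history rather than whole-file snapshots.

```python
def rebase_delta(delta, new_upstream):
    """Carry a derivative's delta forward onto a new upstream snapshot.

    Toy model (an assumption, not how Arch or Bazaar store history):
    trees and deltas are dicts mapping file paths to file contents.
    A delta entry is dropped once upstream ships identical content,
    i.e. once the change has been merged upstream.
    """
    return {path: content
            for path, content in delta.items()
            if new_upstream.get(path) != content}


def build(upstream, delta):
    """The derived tree is upstream with the delta laid on top.
    Whole-file granularity: if upstream and the delta both change a file,
    the delta simply wins; a real tool would merge or report a conflict."""
    tree = dict(upstream)
    tree.update(delta)
    return tree


delta = {
    "debian/rules": "rules with derivative tweaks",
    "src/fix.c": "bug fix",
}
upstream_v2 = {
    "debian/rules": "upstream rules",
    "src/main.c": "main v2",
    "src/fix.c": "bug fix",  # upstream has accepted the fix
}

delta = rebase_delta(delta, upstream_v2)
print(sorted(delta))  # ['debian/rules']: only the unmerged change is still carried
derived = build(upstream_v2, delta)
```

Each time a change lands upstream, the delta the derivative must maintain gets smaller, which is the incentive behind getting code back into Debian.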

Problem Specific Tools


SLIDE 13: Problem Specific Tools

Because there are a number of problems associated with branching a distribution (e.g., different patch systems, upstream vs. non-upstream changes, etc.), Canonical is building a front-end to Arch/VCS specifically designed for distributions.

It's called HCT.

Social Solutions


SLIDE 14: Social Solutions (Picture of Hotel Key)

"Technical Solution to a Social Problem" -- unknown

Things we've run into so far:

  • Keeping changelog entries
  • Working in "the right way" with projects and trying to work on their terms.
  • Maintainer field issues (giving credit but not giving too much credit)
  • Maintaining a good and open relationship with the project
  • Constructive engagement

This is the hard part, and this is where a derivation is made or broken. It is also where Ubuntu has suffered most.



SLIDE 15: Conclusions

On pragmatic grounds, Free Software succeeds because it harnesses the power of collaboration toward software production in a very deep and meaningful way.

By allowing people to share while diverging, free software will gain an advantage that proprietary competitors can't emulate.