Cooperation in Parallel: Lessons From Ubuntu and Debian

Author: Benjamin Mako Hill
Contact: mako@atdot.cc
Date: Monday, 26 Nov 2007 19:00
Affiliation: MIT / Ubuntu Project / Debian Project

Note

This talk was given at Kibepipe in Ljubljana, Slovenia.

It is based on a talk delivered (in slightly different forms) at Linuxtag 2005 in Karlsruhe, Germany, at Libre Software Meeting in Dijon, France and at What The Hack near Boxtel, the Netherlands.

More information on this talk and my other talks is available at http://mako.cc/

Introduction

Note

SLIDE 1: Title and Two Forks Picture

Ask for hands:

Overview

  • Big Questions and Context
    • Derivation: Why derive?
    • Forking: Benefits and Difficulties
  • Case Studies: Debian and Ubuntu
  • The Answer: Cooperation in Parallel: Joint work in groups working toward divergent ends.
  • Approaches/Solutions
    • Strategic Divergence
    • Distributed Source Control
    • Problem Specific Tools
    • Social Solutions

Big Questions and Context

Note

SLIDE 2: World of Debian Customizers

There are over 200 distributions derived from Debian.

Why Derive?

  • The work of these communities is becoming increasingly difficult to recreate;
  • Single projects end up being asked to serve the needs of large communities with diverse needs;

There are 200 different distributions because there are 200 different needs.

Some distributions may be redundant in their implementation but they are not redundant in their needs. Derivations, in one way or another, must exist to fit a diverse group of needs from a large group.

The result:

  • Derivation (ironically) becomes both increasingly important and increasingly difficult to do (or at least do right).

We're seeing it in distributions first, because distributions are bigger and more complex, but we're seeing it in other places as well.

What Is Forking?

Note

SLIDE 3: Fork is a Four Letter Word

  • Define 4 letter word

  • Define "Fork" (bifurcation in a project)

    Forks are not merely, or even primarily, technical;

    Forks happen on many levels (political, code, social, all of the above);

  • Examples of forks (emacs, gcc, etc)

Difficulties of forking and derivation?

Historical view: "Forks are Bad"

From the Free Software Project Management HOWTO:

The short version of the fork section is, don't do them. Forks force developers to choose one project to work with, cause nasty political divisions, and redundancy of work.

In the best situations: competition, redundancy, and the added work of tracking the outside project, all while using poor merge tools.

In the worst (common) situations: things get dropped on the floor.

Forking has historically been so bad that a threat can keep the fork from happening.

Case Studies

Debian

Note

SLIDE 4: Debian

Debian is, for the purpose of this discussion, very big:

  • The most packages
  • The most volunteers
  • The most derivations
    • Internal
    • External

Everyone here understands Debian so I won't spend too much time on it.

Ubuntu

Note

SLIDE 5: Ubuntu

Joke: To Scale Drawing

Ubuntu is a Debian derivation. I'm not going to spend too much time explaining things.

The key points for this conversation:

  • Debian Derivation
  • Regular and predictable releases
  • An emphasis on free software that will maintain the derivability of the distribution.
  • An emphasis on usability and a consistent desktop vision.

Derivation is significant:

  • Code level changes (mostly trivial) to ~1300 packages.

Derivation is also different.

Note

SLIDE 6: Ubuntu Derivation Model

(Explain process.)

Mark Shuttleworth has said, "every line of code in our delta that [we] must maintain has a cost. It's in our interests to minimize this."

This means getting code into Debian or -- in whatever way -- making sure that we don't go in different directions.

Cooperation in Parallel

Note

SLIDE 7: Cooperation in Parallel

This new model of cooperative work, cooperation in parallel (CIP), describes joint work in groups working toward divergent ends. The result is that groups working toward separate goals can collaborate and contribute to each other's projects in ways that strengthen and bolster their individual projects.

Criteria:

Note

SLIDE 8: Resonant Divergence

The goal of CIP, when done right, is what I call Resonant Divergence: people achieve much more than they would have before.

The trick, in resonant divergence, is to reduce the cost of maintaining a delta. This is done in a variety of ways, some of which we are still figuring out. These include:

Approaches/Solutions

Strategic Divergence

Note

SLIDE 9: Strategic Divergence

Break down the problem into a set of component parts. The example in deriving distributions can be:

  1. Selection of individual pieces of software

    main, universe, multiverse -- e.g., UserLinux

  2. Changes to the way that packages are installed or run (e.g., in a Live CD type environment or using a different installer)

    e.g., Anaconda, a Live CD -- also low impact

  3. Configuration of different pieces of software

    Configuration changes can be handled differently because they can be organized through a configuration system framework (e.g., Debconf, cfengine). Custom Debian Distributions (CDDs) take this approach.

  4. Changes made to the actual software package (made on the level of changes to the package's code);

    Most invasive.

By breaking down the problem in this way, Debian derivers have been able to approach derivation in ways that focus energy on the less intrusive problems first.

Smaller teams can limit themselves to less intrusive types of changes to be successful.
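
The four-part breakdown above can be sketched in code. This is only an illustration, assuming each planned change can be tagged with exactly one of the four kinds; the class names, the package names, and the `within_budget` helper are hypothetical and not part of any Debian or Ubuntu tool.

```python
from dataclasses import dataclass
from enum import IntEnum


class ChangeKind(IntEnum):
    """The four kinds of change from the list above, least invasive first."""
    SELECTION = 1      # which pieces of software are included
    INSTALLATION = 2   # how packages are installed or run
    CONFIGURATION = 3  # how packages are configured
    SOURCE = 4         # changes to a package's own code


@dataclass(frozen=True)
class Change:
    package: str  # hypothetical package name
    kind: ChangeKind
    summary: str


def least_invasive_first(changes):
    """Order planned changes so the least intrusive kinds come first."""
    return sorted(changes, key=lambda c: (c.kind, c.package))


def within_budget(changes, max_kind):
    """Keep only the kinds of change a small team has decided to take on."""
    return [c for c in changes if c.kind <= max_kind]


planned = [
    Change("example-browser", ChangeKind.SOURCE, "patch start page code"),
    Change("example-editor", ChangeKind.SELECTION, "add to default install"),
    Change("example-mailer", ChangeKind.CONFIGURATION, "preset mail settings"),
    Change("example-installer", ChangeKind.INSTALLATION, "run from a live CD"),
]

for change in least_invasive_first(planned):
    print(change.kind.name, change.package, change.summary)
```

A small team might call `within_budget(planned, ChangeKind.CONFIGURATION)` to rule out source-level changes entirely.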

Distributed Source Control

Note

SLIDE 10: Distributed Version Control

5-minute intro to distributed version control

Distributed version control aims to solve a number of problems introduced by CVS and alluded to above by:

  • Allowing people to work disconnected from each other and to sync with each other, in whole or in part, in an arbitrary and ad-hoc fashion.
  • Allowing deltas to be maintained over time.

Recently, Linus Torvalds said:

In fact, one impact BK has had is to very fundamentally make us (and me in particular) change how we do things. That ranges from the fine-grained changeset tracking to just how I ended up trusting sub-maintainers with much bigger things, and not having to work on a patch-by-patch basis any more

Distributed systems include Arch, TLA, Bazaar, Bazaar-NG, SVK, Darcs, Monotone, Bitkeeper, others.

While Ubuntu uses this heavily to maintain its changes -- and will use it more in the future -- this is even more useful for small projects.

Distributed version control allows people to maintain deltas over time.
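
A toy model may make the two properties above concrete: syncing in whole or in part, and keeping a delta over time. It is a sketch only. It assumes a repository is just a set of changesets keyed by ID, which ignores the history graphs that real systems track, and the `Repo` class and its methods are hypothetical rather than the interface of any tool named here.

```python
class Repo:
    """A toy repository: a named set of changesets, keyed by ID."""

    def __init__(self, name):
        self.name = name
        self.changesets = {}  # changeset ID -> description

    def commit(self, changeset_id, description):
        self.changesets[changeset_id] = description

    def missing_from(self, other):
        """IDs that `other` has and this repository lacks."""
        return sorted(set(other.changesets) - set(self.changesets))

    def pull(self, other, only=None):
        """Copy changesets from `other`; `only` limits the sync to some IDs."""
        wanted = self.missing_from(other)
        if only is not None:
            wanted = [cid for cid in wanted if cid in only]
        for cid in wanted:
            self.changesets[cid] = other.changesets[cid]
        return wanted

    def delta_against(self, other):
        """Changesets held here but not in `other`: the delta to maintain."""
        return sorted(set(self.changesets) - set(other.changesets))


upstream = Repo("upstream")
derived = Repo("derived")

upstream.commit("u1", "initial import")
derived.pull(upstream)                      # sync in whole
derived.commit("d1", "local branding")      # the derived project's own change

upstream.commit("u2", "security fix")
upstream.commit("u3", "new feature")
derived.pull(upstream, only={"u2"})         # sync in part

print(derived.delta_against(upstream))      # ['d1']
print(derived.missing_from(upstream))       # ['u3']
```

The derived repository keeps its own change while still taking the upstream fix, and it can see at any time both what it carries on top of upstream and what it has not yet taken.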

Merge Tools

Note

SLIDE 11: Merge Tools

  • Merging still has a high cost
  • Merge modes address this
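
One way to see where the cost of merging comes from is a three-way merge. The sketch below is a simplified assumption, not a description of any particular merge tool: it merges key/value settings rather than lines of text, treats a missing key as a deletion (so values must not be `None`), and uses made-up setting names.

```python
def three_way_merge(base, ours, theirs):
    """Merge two descendants of `base`; return (merged, conflicts)."""
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o              # both sides agree (or both deleted it)
        elif o == b:
            value = t              # only their side changed it
        elif t == b:
            value = o              # only our side changed it
        else:
            conflicts.append(key)  # both changed it, in different ways
            value = o              # keep ours for now; a person must decide
        if value is not None:
            merged[key] = value
    return merged, conflicts


# Hypothetical settings: a common ancestor and two diverged copies.
base = {"homepage": "upstream.example", "theme": "default", "splash": "off"}
ours = {"homepage": "derived.example", "theme": "brown", "splash": "off"}
theirs = {"homepage": "upstream.example", "theme": "blue", "splash": "on"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # {'homepage': 'derived.example', 'splash': 'on', 'theme': 'brown'}
print(conflicts)  # ['theme']
```

Changes made on only one side merge automatically. The remaining cost sits in the conflicts, which still need a human decision.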

Problem Specific Tools

Note

SLIDE 12: Problem Specific Tools

Because there are a number of problems associated with branching a distribution (e.g., different patch systems, upstream vs. non-upstream, etc.), Canonical is building a front-end to Arch/VCS specifically designed for distributions.

I've built my own system for documents that solves the particular problems of document management.

Social Solutions

Note

SLIDE 13: Social Solutions

"Technical Solution to a Social Problem" -- unknown

Things we've run into so far:

  • Keeping changelog entries
  • Working in "the right way" with projects and trying to work on their terms.
  • Maintainer field issues (giving credit but not giving too much credit).
  • Maintaining a good and open relationship with the project
  • Constructive engagement

This is the hard part and this is where a derivation is made or broken. It is where Ubuntu has suffered most.

Applicability

Note

SLIDE 14: Applicability

While distributions and other large projects are being forced to confront this idea of balancing the benefits of forking and collaboration first, any project of any size can harness this power to make a better distribution right away.

Clearly, the amount of code and people is on a different scale.

Clearly, the solutions that projects of radically different sizes embrace will be different.

I believe that in the next decade, the free software community is going to see a shift toward a development methodology where forking is not bad. Through this shift and through many other developments in the community, free software will be faster, better, and ultimately successful on a scale we can only imagine now.

The way this will happen will be different in different projects.

Conclusions

On pragmatic grounds, Free Software succeeds because it harnesses the power of collaboration toward software production in a very deep and meaningful way.

Through allowing people to share while diverging, free software will gain a benefit that proprietary competitors can't emulate.