My Geekhouse Bike Frame

In 2011, Mika and I bought in big at the Boston Red Bones party’s charity raffle (supporting MassBike and NEMBA) and came out huge. I won $500 off a custom frame at Geekhouse Bikes.

For years, Mika and I have been planning to ride the Tour d’Afrique route (Cape Town to Cairo), unsupported, by bike. People who do this type of ride sometimes use an expedition touring frame. I worked with Marty Walsh at Geekhouse to design a bike based on this idea. The concept was a rugged steel touring frame, built for my body and comfortable over long distances, with two quirks:

  1. It’s designed for 26 inch mountain bike wheels and mountain bike components to ensure that the bike is repairable with parts from the kinds of cheap mountain bikes that can be found almost everywhere in the world.
  2. It includes S&S torque couplers that let me split the frame in half to travel with the bike as standard luggage.

As our pan-Africa trip kept getting pushed back, so did the need for the bike. Last week, I finally picked up the finished bike from Marty’s shop in Boston. It is gorgeous. I absolutely love it.

Pictures of the Geekhouse frame.

I’m looking forward to building up the bicycle over the next couple months and I’ll post more pictures when it’s finished. I am blown away by Marty’s craftsmanship and attention to detail. I am psyched that his donation made this bike possible and that I was able to get the frame while helping cycling in Massachusetts!

“When Free Software Isn’t Better” Talk

In late October, the FSF posted this video of a talk called When Free Software Isn’t (Practically) Better that I gave at LibrePlanet earlier in the year. I noticed it was public when, out of the blue, I started getting a bunch of positive feedback about the talk, along with many people pointing out that my slides (which were rather important) were not visible in the video!

Finally, I’ve managed to edit together a version that includes the slides and posted it online and on YouTube.

The talk is very roughly based on this 2010 article and I argue that, despite our advocacy, free software isn’t always (or even often) better in practical terms. The talk moves beyond the article and tries to be more constructive by pointing to a series of inherent practical benefits grounded in software freedom principles and practice.

Most important to me, though, the talk reflects my first serious attempt to bring together some of the findings from my day job as a social scientist with my work as a free software advocate. I present some nuggets from my own research and talk about what they mean for free software and its advocates.

In related news, it also seems worth noting that I’m planning on being back at LibrePlanet this March and that the FSF annual fundraiser is currently going on.

Settling in Seattle

Seattle from the air

I defended my dissertation three months ago. Since then, it feels like everything has changed.

I’ve moved from Somerville to Seattle, moved from MIT to the University of Washington, and gone from being a graduate student to a professor. Mika and I have moved out of a multi-apartment cooperative into a small apartment we’re calling Extraordinary Least Squares. We’ve gone from a broad and deep social network to (almost) starting from scratch in a new city.

As things settle and I develop a little extra bandwidth, I am trying to take time to get connected to my community. If you’re in Seattle and know me, drop me a line! If you’re in Seattle but don’t know me yet, do the same so we can fix that!

Doctor of Philosophy

On Wednesday, I successfully defended my PhD dissertation in front of a ridiculously packed house at the MIT Media Lab. I am humbled by the support shown by the MIT Sloan, Media Lab, and Harvard communities. Earlier today, I finished up paperwork and submitted my archival copies. I’m done.

Although I’ve often heard PhDs described as emotional roller coasters, I feel enormously blessed in that I honestly can’t relate. My eight years at MIT and Harvard have been almost universally positive and I have learned and grown indescribably. As excited as I am about my next chapter at the University of Washington, I’m going to miss my life here. Deeply.

My dissertation was three essays on volunteer mobilization in peer production. Once I have a chance to catch up and recover, I’ll be posting the previously unpublished pieces. The Remixing Dilemma was included in the dissertation and is already online. The Media Lab AV team shot professional video of the talk. When I get a copy of the video, I’ll post that too.

But because I think it’s important, I’ve formatted and published the acknowledgments section of the dissertation today. Although there are too many folks to thank, I’ve highlighted the contributions of my co-authors, and friends, Aaron Shaw and Andrés Monroy-Hernández, and my almost unbelievably incredible group of advisors: Eric von Hippel, Yochai Benkler, Mitch Resnick, and Tom Malone.

The Wikipedia Gender Gap Revisited

In a new paper, recently published in the open access journal PLOS ONE, Aaron Shaw and I build on new research in survey methodology to describe a method for estimating bias in opt-in surveys of contributors to online communities. We use the technique to reevaluate the most widely cited estimate of the gender gap in Wikipedia.

A series of studies have shown that Wikipedia’s editor-base is overwhelmingly male. This extreme gender imbalance threatens to undermine Wikipedia’s capacity to produce high quality information from a full range of perspectives. For example, many articles on topics of particular interest to women tend to be under-produced or of poor quality.

Given the open and often anonymous nature of online communities, measuring contributor demographics is a challenge. Most demographic data on Wikipedia editors come from “opt-in” surveys where people respond to open, public invitations. Unfortunately, very few people answer these invitations. Results from opt-in surveys are unreliable because respondents are rarely representative of the community as a whole. The most widely cited estimate, from a large 2008 survey by the Wikimedia Foundation (WMF) and UN University in Maastricht (UNU-MERIT), suggested that only 13% of contributors were female. However, the very same survey suggested that less than 40% of Wikipedia’s readers were female. We know, from several reliable sources, that Wikipedia’s readership is evenly split by gender, which is a sign of bias in the WMF/UNU-MERIT survey.

In our paper, we combine data from a nationally representative survey of the US by the Pew Internet and American Life Project with the opt-in data from the 2008 WMF/UNU-MERIT survey to come up with revised estimates of the Wikipedia gender gap. The details of the estimation technique are in the paper, but the core steps are:

  1. We use the Pew dataset to provide baseline information about Wikipedia readers.
  2. We apply a statistical technique called “propensity scoring” to estimate the likelihood that a US adult Wikipedia reader would have volunteered to participate in the WMF/UNU-MERIT survey.
  3. We follow a process originally developed by Valliant and Dever to weight the WMF/UNU-MERIT survey to “correct” for estimated bias.
  4. We extend this weighting technique to Wikipedia editors in the WMF/UNU data to produce adjusted estimates of the demographics of their sample.
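
The four steps above can be sketched in a few lines of Python. This is only a toy illustration and not the procedure from the paper: it uses a single categorical covariate (gender), in which case weighting by the inverse of the estimated propensity roughly amounts to rescaling each group by the ratio of its share in the reference sample to its share in the opt-in sample. Every name here (`cell_weights`, `weighted_share`, `reference_readers`, `optin_readers`, `optin_editors`) and every number is invented for the example.

```python
from collections import Counter


def cell_weights(reference, optin):
    """Return a weight per cell: reference share divided by opt-in share."""
    ref_counts = Counter(reference)
    opt_counts = Counter(optin)
    return {
        cell: (ref_counts[cell] / len(reference)) / (opt_counts[cell] / len(optin))
        for cell in opt_counts
    }


def weighted_share(sample, weights, target):
    """Weighted proportion of `sample` that falls in the `target` cell."""
    total = sum(weights[cell] for cell in sample)
    hits = sum(weights[cell] for cell in sample if cell == target)
    return hits / total


# Hypothetical data, invented for illustration only.
reference_readers = ["f"] * 5 + ["m"] * 5  # stand-in for a representative survey
optin_readers = ["f"] * 2 + ["m"] * 8      # stand-in for opt-in respondents who read
optin_editors = ["f"] * 1 + ["m"] * 9      # stand-in for opt-in respondents who edit

weights = cell_weights(reference_readers, optin_readers)
raw = optin_editors.count("f") / len(optin_editors)
adjusted = weighted_share(optin_editors, weights, "f")
print(f"raw female share: {raw:.3f}, adjusted: {adjusted:.3f}")
```

In this made-up data, women are under-represented among opt-in readers relative to the reference sample, so their responses are weighted up and the adjusted share of female editors comes out higher than the raw share.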

Using this method, we estimate that the proportion of female US adult editors was 27.5% higher than the original study reported (22.7%, versus 17.8%), and that the total proportion of female editors was 26.8% higher (16.1%, versus 12.7%). These findings are consistent with other work showing that opt-in surveys tend to undercount women.
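
Note that the “higher” figures above are relative increases, not percentage-point differences. A quick check of the arithmetic, using only the four estimates quoted in the paragraph (the function name `relative_increase` is just a label for this sketch):

```python
def relative_increase(adjusted, original):
    """Percent by which `adjusted` exceeds `original`, relative to `original`."""
    return (adjusted - original) / original * 100


us_adults = relative_increase(22.7, 17.8)    # US adult editors
all_editors = relative_increase(16.1, 12.7)  # all editors
print(round(us_adults, 1), round(all_editors, 1))  # prints 27.5 26.8
```

In percentage-point terms, the adjustments are about 4.9 points for US adult editors and 3.4 points for editors overall.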

Overall, these results reinforce the basic substantive finding that women are vastly under-represented among Wikipedia editors.

Beyond Wikipedia, our paper describes a method that online communities can adopt to estimate contributor demographics from opt-in surveys in a way that is more credible than relying entirely on opt-in data. Advertising-intelligence firms like ComScore and Quantcast provide demographic data on the readership of an enormous proportion of websites. With these sources, almost any community can use our method (and source code) to replicate a similar analysis by: (1) surveying a community’s readers (or a random subset) with the same instrument used to survey contributors; (2) combining results for readers with reliable demographic data about the readership population from a credible source; (3) reweighting survey results using the method we describe.

Although our new estimates will not help us close the gender gap in Wikipedia or address its troubling implications, they give us a better picture of the problem. Additionally, our method offers an improved tool to build a clearer demographic picture of other online communities in general.

Iceowl’s Awesome New Icon

If you’re a Debian user, you are probably already familiar with some of the awesome icons for IceWeasel (rebranded Mozilla Firefox), IceDove (rebranded Mozilla Thunderbird) and IceApe (rebranded Mozilla SeaMonkey).

IceWeasel, IceDove, and IceApe icons

I was pretty ambivalent about the decision to rebrand Firefox until I saw some of the proposed IceWeasel icons which, in my humble opinion, were just too cute, and awesome, to pass up.

Older IceWeasel icon

Until very recently, however, IceOwl (rebranded Mozilla Sunbird) had no such awesome icon. Quite a while ago, I filed bug #658664 in Debian complaining that “iceowl does not include awesome icy owl icons.” I wrote:

I was extremely disappointed when I installed Iceowl and discovered that it does not ship with an awesome logo or icons showing a picture of an “IceOwl.” Instead, it seems to be represented by picture of a (boring) paper calendar which is very generic and not awesome at all.

IceWeasel, IceDove, and IceApe each include extremely awesome logos/icons that have really cool looking white illustrations of “icy” weasels, doves, and apes. IceOwl needs a similarly awesome logo to use as its icon.

This bug seems particularly egregious because owls actually live in icy climates and come in white versions! For example:

https://commons.wikimedia.org/wiki/File:Snowy_Owl_-_Schnee-Eule.jpg

While illustrators need to imagine what an “ice ape” or “ice weasel” might look like, there is no such need for imagination in the case of an ice owl!

As far as I’m concerned, this bug should be release critical. Hopefully, someone will upload a patch quickly!

Finally, after many months of all of us suffering in silence, Nick Morrott came along and fixed the bug with the creation of this new, incredibly awesome, icy owl logo!

IceOwl icon

Job Market Materials

Last year, I applied for tenure-track academic jobs at several communication departments, information schools, and HCI-focused computer science programs with a tradition of hiring social scientists.

Being “on the market” — as it is called — is both scary and time consuming. Like me, many candidates have never been on the market before. Candidates are asked to produce documents in genres — e.g., cover letters, research statements, teaching statements, diversity statements — that most candidates have never written, read, or even heard of.

Candidates often rely on their supervisors for advice. I did so and my advisors were extremely helpful. The reality, however, is that although candidates’ advisors may sit on hiring committees, most have not been on the candidates’ side of the job market themselves for years or even decades.

The Internet is full of websites, like the academic jobs wiki, Academia StackExchange, and the Chronicle of Higher Education forums for people on the market. Confused and insecure candidates ask questions of the form, “Does blank matter?” and the answer is usually, “Doing/having blank may help/hurt, but it is only one factor of many.” The result is that candidates worry about everything. Then they worry about what they should be worrying about, but are not.

The most helpful thing, for me, was to read and synthesize the material submitted by recent successful job market candidates. For example, Michael Bernstein — a friend from MIT, now at Stanford — published his research and teaching statements on his website and I found both useful as I prepared mine. That said, I was surprised by how little material like this I could find on the web. For example, I could not find any examples of recent job market cover letters from successful candidates in fields close to mine.

So to help fill this gap, I am publishing all of my job market material. I’ve posted both the PDFs of the material I submitted and the LaTeX templates I used to generate the documents in my packet. My packet included:

  • Research Statement (TeX): A description of my research to date and my current trajectory. Following a convention I have seen others follow, I “cited” my own work (but only my work) to form a curated bibliography of my own publications and working papers.
  • Teaching Statement (TeX): A two-page description of my approach to teaching, a list of my teaching experience, and a description of sample courses.
  • Diversity Statement (TeX): A description of how I think about diversity and how I have, and will, engage with it in my teaching and research.
  • Cover Letter (TeX): Each application I sent had a customized cover letter. I wrote mine on MIT letterhead. Since each letter is different, I have published the letter I sent to the department where I took the job (UW Communication). Because my new department did not request research and teaching statements, the cover letter includes material taken from both. For departments that requested separate statements, I limited myself to a shorter (1.5 pages) version of the letter with a similar structure.
  • Writing Samples: I included three or four of my papers with every application. The selection of articles changed a bit depending on the department, but I included at least one single-authored paper in each packet.
  • Letters of Recommendation: Because I didn’t write these and haven’t seen them, I can’t share them. I requested letters from my four committee members: Eric von Hippel, Yochai Benkler, Mitch Resnick, and Tom Malone.
  • Curriculum Vitae (TeX): I have tried to keep my CV up-to-date during graduate school. I keep my CV in git and have a little CGI script automatically rebuild the published version whenever an update is committed.
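
For anyone curious about the rebuild-on-commit idea mentioned for the CV, a rebuild step might look roughly like the sketch below. It is not the actual script: the checkout path `/srv/cv`, the file name `cv.tex`, the function names, and the choice of `pdflatex` are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def rebuild_commands(repo_dir, tex_file):
    """Return the commands that update the checkout and rebuild the PDF."""
    repo = Path(repo_dir)
    return [
        # Fetch the newly committed version of the CV source.
        ["git", "-C", str(repo), "pull", "--ff-only"],
        # Compile it without stopping for interactive prompts.
        [
            "pdflatex",
            "-interaction=nonstopmode",
            f"-output-directory={repo}",
            str(repo / tex_file),
        ],
    ]


def rebuild(repo_dir, tex_file):
    """Run each command in order, stopping at the first failure."""
    for command in rebuild_commands(repo_dir, tex_file):
        subprocess.run(command, check=True)


# Hypothetical checkout location and file name; only prints the commands.
commands = rebuild_commands("/srv/cv", "cv.tex")
for command in commands:
    print(" ".join(command))
```

A script triggered on each commit (for example, by a web request from the repository host) would call `rebuild` and then copy the resulting PDF to wherever the published version lives.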

I hope people going “on the market” will find these materials useful. Obviously, you should not copy or reuse the text of any of my material. It is your application, after all. That said, please do help yourself to the formatting and structure.

Finally, I would encourage anyone who builds on my material to republish their own material to help other candidates. If you do, I’d appreciate a link back or comment on this blog post so that my readers can find your improvements.

Resurrecting Debian Seattle

Seattle skyline at night and the Debian logo

When I last lived in Seattle, nearly a decade ago, I hosted the “Debian Seattle Social” email list. When I left the city, the mailing list eventually fell victim to bitrot.

When Allison Randall asked me about the list a couple months ago, I decided that moving back to Seattle was a good excuse to work with Allison and some others to revive the community. Toward that end, I’ve put up a little website and created a new mailing list. It’s hosted on Alioth this time, which will be more reliable than me. Since it has been years, we have not moved over the old subscriber list, so you’ll have to sign up again if you were on it before.

If you’re a Debian developer or user and you’d like to hear about infrequent Debian social gatherings in the Seattle area, you should sign up on the list!

London and Michigan

I’ll be spending the week after next (June 17-23) in London for the annual meeting of the International Communication Association where I’ll be presenting a paper. This will be my first ICA and I’m looking forward to connecting with many new colleagues in the discipline. If you’re one of them, reading this, and would like to meet up in London, please let me know!

Starting June 24th, I’ll be in Ann Arbor, Michigan for four weeks of the ICPSR summer program in applied statistics at the Institute for Social Research. I have been wanting to sign up for some of their advanced methods classes for years and am planning to take the opportunity this summer before I start at UW. I’ll be living with my friends and fellow Berkman Cooperation Group members Aaron Shaw and Dennis Tennen.

I would love to make connections and meet people in both places so, if you would like to meet up, please get in contact.

The Cost of Inaccessibility at the Margins of Relevance

I use RSS feeds to keep up with academic journals. Because of an undocumented and unexpected feature (bug?) in my (otherwise wonderful) free software newsreader NewsBlur, many articles published over the last year were marked as having been read before I saw them.

Over the last week, I caught up. I spent hours going through abstracts and downloading papers that looked interesting or relevant to my research. Because I did this for hundreds of articles, it gave me an unusual opportunity to reflect on my journal reading practices in a systematic way.

On a number of occasions, there were potentially interesting articles in non-open access journals that neither MIT nor Harvard subscribes to and that were otherwise not accessible to me. In several cases where the research was obviously important to my work, I made an interlibrary request, emailed the papers’ authors for copies, or tracked down a colleague at an institution with access.

Of course, articles that look potentially interesting from the title and abstract often end up being less relevant or well executed on closer inspection. I tend to cast a wide net, skim many articles, and put them aside when it’s clear that the study is not for me. This week, I downloaded many of these possibly relevant papers to, at least, give a skim. But only if I could download them easily. On three or four occasions, I found inaccessible articles at this margin of relevance. In these cases, I did not bother trying to track down the articles.

Of course, what appear to be marginally relevant articles sometimes end up being a great match for my research and I will end up citing and building on the work. I found several surprisingly interesting papers last week. The articles that were locked up have no chance at this.

When people suggest that a lack of open access hinders the spread of scholarship, a common retort is that the people who need the work have or can finagle access. For the papers we know we need, this might be true. As someone with access to two of the most well-endowed libraries in academia who routinely requests otherwise inaccessible articles through several channels, I would have told you, a week ago, that locked-down journals were unlikely to keep me from citing anybody.

So it was interesting watching myself do a personal cost calculation in a way that sidelined published scholarship, and that open access publishing would have prevented. At the margin of relevance to one’s research, open access may make a big difference.

Sounds Like a Map

Colored visualization of the puzzle.

I love maps, something that became clear to me when I was looking at the tag cloud of my bookmarks a few years back. One of my favorite blogs (now a book) is Frank Jacobs’ Strange Maps.

So it’s no coincidence that a number of my favorite MIT Mystery Hunt puzzles are map based. Trying to connect the two worlds, I sent Jacobs a write-up of the hunt and of a particularly strange sound-based map puzzle called White Noise that I worked with Don Armstrong to solve in the 2006 hunt. While I wasn’t paying attention, Jacobs did a very nice writeup of my writeup of the puzzle for Strange Maps!

The Remixing Dilemma: The Trade-off Between Generativity and Originality

This post was written with Andrés Monroy-Hernández. It is a summary of a paper just published in American Behavioral Scientist. You can also read the full paper: The remixing dilemma: The trade-off between generativity and originality. It is part of a series of papers I have written with Monroy-Hernández using data from Scratch. You can find the others on my academic website.

Remixing — the reworking and recombination of existing creative artifacts — represents a widespread, important, and controversial form of social creativity online. Proponents of remix culture often speak of remixing in terms of rich ecosystems where creative works are novel and highly generative. However, examples like this can be difficult to find. Although there is a steady stream of media being shared freely on the web, only a tiny fraction of these projects are remixed even once. On top of this, many remixes are not very different from the works they are built upon. Why is some content more attractive to remixers? Why are some projects remixed in deeper and more transformative ways?
Remix Diagram
We try to shed light on both of these questions using data from Scratch — a large online remixing community. Although we find support for several popular theories, we also present evidence in support of a persistent trade-off that has broad practical and theoretical implications. In what we call the remixing dilemma, we suggest that characteristics of projects that are associated with higher rates of remixing are also associated with simpler and less transformative types of derivatives.

Our study is focused on two interrelated research questions. First, we ask why some projects shared in remixing communities are more or less generative than others. “Generativity” — a term we borrow from Jonathan Zittrain — describes creative works that are likely to inspire follow-on work. Several scholars have offered suggestions for why some creative works might be more generative than others. We focus on three central theories:

  1. Projects that are moderately complicated are more generative. The free and open source software motto “release early and release often” suggests that simple projects will offer more obvious opportunities for contribution than more polished projects. That said, projects that are extremely simple (e.g., completely blank slates) may also be uninspiring to would-be contributors.
  2. Projects by prominent creators are more generative. The reasoning for this claim comes from the suggestion that remixing can act as a form of cultural conversation and that the work of popular creators can act like a common medium or language.
  3. Projects that are remixes themselves are more generative. The reasoning for this final claim comes from the idea that remixing thrives through the accumulation of contributions from groups of people building on each other’s work.

Our second question focuses on the originality of remixes and asks when more or less transformative remixing occurs. For example, highly generative projects may be less exciting if the projects produced based on them are all near-identical copies of antecedent projects. For a series of reasons — including the fact that increased generativity might come by attracting less interested, skilled, or motivated individuals — we suggest that each of the factors associated with generativity will also be associated with less original forms of remixing. We call this trade-off the remixing dilemma.

We answer both of our research questions using a detailed dataset from Scratch, where young people build, share, and collaborate on interactive animations and video games. The community was built to support users of the Scratch programming environment, a desktop application with functionality similar to Flash created by the Lifelong Kindergarten Group at the MIT Media Lab. Scratch is designed to allow users to build projects by integrating images, music, sound, and other media with programming code. Scratch is used by more than a million users, most of them under 18 years old.

To test our three theories about generativity, we measure whether or not, as well as how many times, Scratch projects were remixed in a dataset that includes every shared project. Although Scratch is designed as a remixing community, only around one tenth of all Scratch projects are ever remixed. Because more popular projects are remixed more frequently simply because of exposure, we control for the number of times each project is viewed.

Our analysis shows at least some support for all three theories of generativity described above. (1) Projects with moderate amounts of code are remixed more often than either very simple or very complex projects. (2) Projects by more prominent creators are more generative. (3) Remixes are more likely to attract remixers than de novo projects.

To test our theory that there is a trade-off between generativity and originality, we build a dataset that includes every Scratch remix and its antecedent. For each pair, we construct a measure of originality by comparing the remix to its antecedent and computing an “edit distance” (a concept we borrow from software engineering) to determine how much the projects differ.
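
To give a feel for what an edit distance measures, here is a minimal sketch of the standard Levenshtein distance applied to two lists of blocks. It is an illustration under assumptions, not the measure used in the paper: the block names and the two example projects are hypothetical, and the exact way projects are compared in the study may differ.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, and substitutions
    needed to turn sequence `a` into sequence `b`."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# Hypothetical block sequences for an antecedent project and a remix of it.
antecedent = ["when_flag_clicked", "move", "turn", "say"]
remix = ["when_flag_clicked", "move", "play_sound", "turn", "say", "say"]
print(edit_distance(antecedent, remix))  # prints 2
```

A remix that is an unchanged copy of its antecedent has a distance of zero, and the distance grows as more blocks are added, removed, or changed.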

We find strong evidence of a trade-off: (1) Projects of moderate complexity are remixed more lightly than more complicated projects. (2) Projects by more prominent creators tend to be remixed in less transformative ways. (3) Cumulative remixing tends to be associated with shallower and less transformative derivatives. That said, our support for (1) is qualified in that we do not find evidence of the increased originality for the simplest projects as our theory predicted.

Two plots of estimated values for prototypical projects. Panel 1 (left) displays predicted probabilities of being remixed. Panel 2 (right) displays predicted edit distances. Both panels show predicted values for both remixes and de novo projects from 0 to 1,204 blocks (99th percentile).

We feel that our results raise difficult but important challenges, especially for the designers of social media systems. For example, many social media sites track and display user prominence with leaderboards or lists of aggregate views. This technique may lead to increased generativity by emphasizing and highlighting creator prominence. That said, it may also lead to a decrease in originality of the remixes elicited. Our results regarding the relationship of complexity to generativity and originality of remixes suggest that supporting increased complexity, at least for most projects, may have fewer drawbacks.

As supporters and advocates of remixing, we feel that although highly generative works that lead to highly original derivatives may be rare and difficult for system designers to support, understanding remixing dynamics and encouraging these rare projects remain a worthwhile and important goal.

Benjamin Mako Hill, Massachusetts Institute of Technology
Andrés Monroy-Hernández, Microsoft Research

For more, see our full paper, “The remixing dilemma: The trade-off between generativity and originality,” published in American Behavioral Scientist, 57(5), pp. 643–663 (official link, paywalled).

Students for Free Culture Conference FCX2013

FCX2013 Logo

On the weekend of April 20-21, Students for Free Culture is going to be holding its annual conference, FCX2013, at New York Law School in New York City. As a long-time SFC supporter and member, I am enormously proud to be giving the opening keynote address.

Although the program for Sunday is still shaping up, the published Saturday schedule looks great. If previous years are any indication, the conference can serve as an incredible introduction to free culture, free software, wikis, remixing, copyright, patent and trademark reform, and participatory culture. For folks who are already deeply involved, FCX is among the best places I know to connect with other passionate, creative people working on free culture issues.

I’ve been closely following and involved with SFC for years and I am particularly excited about the group that is driving the organization forward this year. If you will be in or near New York that weekend — or if you can make the trip — you should definitely try to attend.

FCX2013 is pay what you can with a $15 suggested donation. You can register online now. Travel assistance — especially for members of active SFC chapters — may still be available. I hope to see you there!