In late October, the FSF posted this video of a talk called When Free Software Isn’t (Practically) Better that I gave at LibrePlanet earlier in the year. I noticed it was public when, out of the blue, I started getting both a bunch of positive feedback about the talk and many people pointing out that my slides (which were rather important) were not visible in the video!
The talk is very roughly based on this 2010 article and I argue that, despite our advocacy, free software isn’t always (or even often) better in practical terms. The talk moves beyond the article and tries to be more constructive by pointing to a series of inherent practical benefits grounded in software freedom principles and practice.
Most important to me, though, the talk reflects my first serious attempt to bring together some of the findings from my day job as a social scientist with my work as a free software advocate. I present some nuggets from my own research and talk about what they mean for free software and its advocates.
As things settle and I develop a little extra bandwidth, I am trying to take time to get connected to my community. If you’re in Seattle and know me, drop me a line! If you’re in Seattle but don’t know me yet, do the same so we can fix that!
On Wednesday, I successfully defended my PhD dissertation in front of a ridiculously packed house at the MIT Media Lab. I am humbled by the support shown by the MIT Sloan, Media Lab, and Harvard communities. Earlier today, I finished up paperwork and submitted my archival copies. I’m done.
My dissertation was three essays on volunteer mobilization in peer production. Once I have a chance to catch up and recover, I’ll be posting the previously unpublished pieces. The Remixing Dilemma was included in the dissertation and is already online. The Media Lab AV team shot professional video of the talk. When I get a copy of the video, I’ll post that too.
In a new paper, recently published in the open access journal PLOSONE, Aaron Shaw and I build on new research in survey methodology to describe a method for estimating bias in opt-in surveys of contributors to online communities. We use the technique to reevaluate the most widely cited estimate of the gender gap in Wikipedia.
A series of studies have shown that Wikipedia’s editor-base is overwhelmingly male. This extreme gender imbalance threatens to undermine Wikipedia’s capacity to produce high quality information from a full range of perspectives. For example, many articles on topics of particular interest to women tend to be under-produced or of poor quality.
Given the open and often anonymous nature of online communities, measuring contributor demographics is a challenge. Most demographic data on Wikipedia editors come from “opt-in” surveys where people respond to open, public invitations. Unfortunately, very few people answer these invitations. Results from opt-in surveys are unreliable because respondents are rarely representative of the community as a whole. The most widely-cited estimate, from a large 2008 survey by the Wikimedia Foundation (WMF) and the United Nations University in Maastricht (UNU-MERIT), suggested that only 13% of contributors were female. However, the very same survey suggested that less than 40% of Wikipedia’s readers were female. We know, from several reliable sources, that Wikipedia’s readership is evenly split by gender — a sign of bias in the WMF/UNU-MERIT survey.
We use a nationally representative Pew survey of US adults to provide baseline information about Wikipedia readers.
We apply a statistical technique called “propensity scoring” to estimate the likelihood that a US adult Wikipedia reader would have volunteered to participate in the WMF/UNU-MERIT survey.
We follow a process originally developed by Valliant and Dever to weight the WMF/UNU-MERIT survey to “correct” for estimated bias.
We extend this weighting technique to Wikipedia editors in the WMF/UNU data to produce adjusted estimates of the demographics of their sample.
Using this method, we estimate that the proportion of female US adult editors was 27.5% higher than the original study reported (22.7%, versus 17.8%), and that the total proportion of female editors was 26.8% higher (16.1%, versus 12.7%). These findings are consistent with other work showing that opt-in surveys tend to undercount women.
Overall, these results reinforce the basic substantive finding that women are vastly under-represented among Wikipedia editors.
Beyond Wikipedia, our paper describes a method that online communities can adopt to estimate contributor demographics using opt-in surveys in a way that is more credible than relying on opt-in data alone. Advertising-intelligence firms like comScore and Quantcast provide demographic data on the readership of an enormous proportion of websites. With these sources, almost any community can use our method (and source code) to replicate a similar analysis by: (1) surveying the community’s readers (or a random subset) with the same instrument used to survey contributors; (2) combining the results for readers with reliable demographic data about the readership population from a credible source; (3) reweighting the survey results using the method we describe.
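To make the mechanics concrete, here is a minimal sketch of those steps in Python. Everything in it (file names, column names, the covariate list, and the inverse-odds weighting choice) is a simplified assumption for illustration, not the code we actually published:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Covariates asked identically in both surveys (hypothetical list).
covariates = ["age", "education", "hours_online"]

# A reliable probability sample of readers (e.g., Pew) and the
# community's opt-in survey (hypothetical files and columns).
reference = pd.read_csv("reference_readers.csv")
optin = pd.read_csv("optin_survey.csv")

# Steps 1-2: stack the samples and model each respondent's "propensity"
# to appear in the opt-in survey rather than the reference sample.
reference["in_optin"] = 0
optin["in_optin"] = 1
stacked = pd.concat([reference, optin], ignore_index=True)
model = LogisticRegression().fit(stacked[covariates], stacked["in_optin"])

# Step 3: weight opt-in respondents by their inverse selection odds so
# that the kinds of people the survey over-recruited count for less.
p = model.predict_proba(optin[covariates])[:, 1]
optin["weight"] = (1 - p) / p

# Recompute the quantity of interest with weights, e.g., the estimated
# proportion of editors who are women (`female` is a 0/1 indicator).
editors = optin[optin["is_editor"] == 1]
adjusted = np.average(editors["female"], weights=editors["weight"])
print(f"adjusted proportion female: {adjusted:.3f}")
```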
Although our new estimates will not help us close the gender gap in Wikipedia or address its troubling implications, they give us a better picture of the problem. Additionally, our method offers an improved tool for building a clearer demographic picture of online communities in general.
I was extremely disappointed when I installed Iceowl and discovered that it does not ship with an awesome logo or icons showing a picture of an “IceOwl.” Instead, it seems to be represented by a picture of a (boring) paper calendar which is very generic and not awesome at all.
IceWeasel, IceDove, and IceApe each include extremely awesome logos/icons that have really cool looking white illustrations of “icy” weasels, doves, and apes. IceOwl needs a similarly awesome logo to use as its icon.
This bug seems particularly egregious because owls actually live in icy climates and come in white versions! For example, the snowy owl.
Last year, I applied for academic, tenure track, jobs at several communication departments, information schools, and in HCI-focused computer science programs with a tradition of hiring social scientists.
Being “on the market” — as it is called — is both scary and time-consuming. Like me, many candidates have never been on the market before. Candidates are asked to produce documents in genres — e.g., cover letters, research statements, teaching statements, diversity statements — that most candidates have never written, read, or even heard of.
Candidates often rely on their supervisors for advice. I did so and my advisors were extremely helpful. The reality, however, is that although candidates’ advisors may sit on hiring committees, most have not been on the candidate’s side of the job market themselves for years or even decades.
The Internet is full of websites for people on the market, like the academic jobs wiki, Academia StackExchange, and the Chronicle of Higher Education forums. Confused and insecure candidates ask questions of the form, “Does blank matter?” and the answer is usually, “Doing/having blank may help/hurt, but it is only one factor of many.” The result is that candidates worry about everything. Then they worry about the things they should be worrying about but are not.
The most helpful thing, for me, was to read and synthesize the material submitted by recent successful job market candidates. For example, Michael Bernstein — a friend from MIT, now at Stanford — published his research and teaching statements on his website and I found both useful as I prepared mine. That said, I was surprised by how little material like this I could find on the web. For example, I could not find any examples of recent job market cover letters from successful candidates in fields close to mine.
So to help fill this gap, I am publishing all of my job market material. I’ve posted both the PDFs of the material I submitted as well as the LaTeX templates I used to generate the documents in my packet. My packet included:
Research Statement (TeX) — A description of my research to date and my current trajectory. Following a convention I have seen others follow, I “cited” my own work (but only my work) to form a curated bibliography of my own publications and working papers.
Teaching Statement (TeX) — A two-page description of my approach to teaching, a list of my teaching experience, and a description of sample courses.
Diversity Statement (TeX) — A description of how I think about diversity and how I have, and will, engage with it in my teaching and research.
Cover Letter (TeX) — Each application I sent had a customized cover letter. I wrote mine on MIT letterhead. Since each letter is different, I have published the letter I sent to the department where I took the job (UW Communication). Because my new department did not request research and teaching statements, this cover letter includes material taken from both. For departments that requested separate statements, I limited myself to a shorter (1.5-page) version of the letter with a similar structure.
Writing Samples — I included three or four of my papers with every application. The selection of articles changed a bit depending on the department, but I included at least one single-authored paper in each packet.
Curriculum Vitae (TeX) — I have tried to keep my CV up to date during graduate school. I keep my CV in git and have a little CGI script automatically rebuild the published version whenever an update is committed (a rough sketch of this kind of setup follows).
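One simple way to wire up that kind of automatic rebuild is a git post-receive hook that checks out the latest commit, runs latexmk, and copies the PDF into the web root. The sketch below is illustrative (the paths and branch name are made up), not my actual script:

```python
#!/usr/bin/env python3
"""post-receive hook: rebuild and publish a LaTeX CV after every push."""
import shutil
import subprocess
import tempfile

GIT_DIR = "/home/mako/repos/cv.git"  # bare repository (made-up path)
PUBLISH_TO = "/var/www/cv/cv.pdf"    # web-served location (made-up path)

with tempfile.TemporaryDirectory() as workdir:
    # Check the pushed commit out into a scratch working tree.
    subprocess.run(["git", "--git-dir", GIT_DIR, "--work-tree", workdir,
                    "checkout", "-f", "master"], check=True)
    # latexmk reruns pdflatex/bibtex as many times as needed.
    subprocess.run(["latexmk", "-pdf", "cv.tex"], cwd=workdir, check=True)
    # Publish the finished PDF.
    shutil.copy(f"{workdir}/cv.pdf", PUBLISH_TO)
```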
I hope people going “on the market” will find these materials useful. Obviously, you should not copy or reuse the text of any of my material. It is your application, after all. That said, please do help yourself to the formatting and structure.
Finally, I would encourage anyone who builds on my material to republish their own material to help other candidates. If you do, I’d appreciate a link back or comment on this blog post so that my readers can find your improvements.
When I last lived in Seattle, nearly a decade ago, I hosted the “Debian Seattle Social” email list. When I left the city, the mailing list eventually fell victim to bitrot.
When Allison Randal asked me about the list a couple of months ago, I decided that moving back to Seattle was a good excuse to work with Allison and some others to revive the community. Toward that end, I’ve put up a little website and created a new mailing list. It’s hosted on Alioth this time, which will be more reliable than me. Since it has been years, we have not moved over the old subscriber list, so you’ll have to sign up again if you were on it before.
If you’re a Debian developer or user and you’d like to hear about infrequent Debian social gatherings in the Seattle area, you should sign up on the list!
I’ll be spending the week after next (June 17-23) in London for the annual meeting of the International Communication Association where I’ll be presenting a paper. This will be my first ICA and I’m looking forward to connecting with many new colleagues in the discipline. If you’re one of them, reading this, and would like to meet up in London, please let me know!
I use RSS feeds to keep up with academic journals. Because of an undocumented and unexpected feature (bug?) in my (otherwise wonderful) free software newsreader NewsBlur, many articles published over the last year were marked as having been read before I saw them.
Over the last week, I caught up. I spent hours going through abstracts and downloading papers that looked interesting or relevant to my research. Because I did this for hundreds of articles, it gave me an unusual opportunity to reflect on my journal reading practices in a systematic way.
On a number of occasions, there were potentially interesting articles in non-open access journals that neither MIT nor Harvard subscribes to and that were otherwise not accessible to me. In several cases where the research was obviously important to my work, I made an interlibrary request, emailed the papers’ authors for copies, or tracked down a colleague at an institution with access.
Of course, articles that look potentially interesting from the title and abstract often end up being less relevant or well executed on closer inspection. I tend to cast a wide net, skim many articles, and put them aside when it’s clear that the study is not for me. This week, I downloaded many of these possibly relevant papers to, at least, give a skim. But only if I could download them easily. On three or four occasions, I found inaccessible articles at this margin of relevance. In these cases, I did not bother trying to track down the articles.
Of course, what appear to be marginally relevant articles sometimes end up being a great match for my research and I will end up citing and building on the work. I found several surprisingly interesting papers last week. The articles that were locked up had no chance of this.
When people suggest that open access hinders the spread of scholarship, a common retort is that the people who need the work have or can finagle access. For the papers we know we need, this might be true. As someone with access to two of the most well endowed libraries in academia who routinely requests otherwise inaccessible articles through several channels, I would have told you, a week ago, that locked-down journals were unlikely to keep me from citing anybody.
So it was interesting watching myself do a personal cost calculation in a way that sidelined published scholarship — and that open access publishing would have prevented. At the margin of relevance to one’s research, open access may make a big difference.
Remixing — the reworking and recombination of existing creative artifacts — represents a widespread, important, and controversial form of social creativity online. Proponents of remix culture often speak of remixing in terms of rich ecosystems where creative works are novel and highly generative. However, examples like this can be difficult to find. Although there is a steady stream of media being shared freely on the web, only a tiny fraction of these projects are remixed even once. On top of this, many remixes are not very different from the works they are built upon. Why is some content more attractive to remixers? Why are some projects remixed in deeper and more transformative ways?
We try to shed light on both of these questions using data from Scratch — a large online remixing community. Although we find support for several popular theories, we also present evidence in support of a persistent trade-off that has broad practical and theoretical implications. In what we call the remixing dilemma, we suggest that characteristics of projects that are associated with higher rates of remixing are also associated with simpler and less transformative types of derivatives.
Our study is focused on two interrelated research questions. First, we ask why some projects shared in remixing communities are more or less generative than others. “Generativity” — a term we borrow from Jonathan Zittrain — describes creative works that are likely to inspire follow-on work. Several scholars have offered suggestions for why some creative works might be more generative than others. We focus on three central theories:
Projects that are moderately complicated are more generative. The free and open source software motto “release early and release often” suggests that simple projects will offer more obvious opportunities for contribution than more polished projects. That said, projects that are extremely simple (e.g., completely blank slates) may also be uninspiring to would-be contributors.
Projects by prominent creators are more generative. The reasoning for this claim comes from the suggestion that remixing can act as a form of cultural conversation and that the work of popular creators can act like a common medium or language.
Projects that are remixes themselves are more generative. The reasoning for this final claim comes from the idea that remixing thrives through the accumulation of contributions from groups of people building on each other’s work.
Our second question focuses on the originality of remixes and asks when more or less transformative remixing occurs. For example, highly generative projects may be less exciting if the projects produced based on them are all near-identical copies of antecedent projects. For a series of reasons — including the fact that increased generativity might come by attracting less interested, skilled, or motivated individuals — we suggest that each of the factors associated with generativity will also be associated with less original forms of remixing. We call this trade-off the remixing dilemma.
We answer both of our research questions using a detailed dataset from Scratch, where young people build, share, and collaborate on interactive animations and video games. The community was built to support users of the Scratch programming environment, a desktop application with functionality similar to Flash created by the Lifelong Kindergarten Group at the MIT Media Lab. Scratch is designed to allow users to build projects by integrating images, music, sound, and other media with programming code. Scratch is used by more than a million users, most of them under 18 years old.
To test our three theories about generativity, we measure whether or not, as well as how many times, Scratch projects were remixed in a dataset that includes every shared project. Although Scratch is designed as a remixing community, only around one tenth of all Scratch projects are ever remixed. Because more popular projects are remixed more frequently simply because of exposure, we control for the number of times each project is viewed.
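As a rough illustration of what such a model can look like — the dataset, the column names, and this simple logistic specification are stand-ins for illustration, not our actual models — one might fit:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

projects = pd.read_csv("scratch_projects.csv")  # hypothetical dataset

# Logistic regression: was a project ever remixed? A quadratic term in
# `blocks` (a measure of code size) lets remixing peak at moderate
# complexity, and log-viewership controls for exposure.
model = smf.logit(
    "remixed ~ blocks + I(blocks ** 2) + creator_followers + is_remix"
    " + np.log(views + 1)",
    data=projects,
).fit()
print(model.summary())
```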
Our analysis shows at least some support for all three theories of generativity described above. (1) Projects with moderate amounts of code are remixed more often than either very simple or very complex projects. (2) Projects by more prominent creators are more generative. (3) Remixes are more likely to attract remixers than de novo projects.
To test our theory that there is a trade-off between generativity and originality, we build a dataset that includes every Scratch remix and its antecedent. For each pair, we construct a measure of originality by comparing the remix to its antecedent and computing an “edit distance” (a concept we borrow from software engineering) to determine how much the projects differ.
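To make the idea concrete, here is a minimal sketch of an edit distance over sequences of code blocks. It is a generic Levenshtein distance with made-up example data — a simplification for illustration, not our exact measure:

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute x -> y
        prev = curr
    return prev[-1]

# A remix identical to its antecedent scores 0; more transformative
# remixes score higher. (Hypothetical block sequences.)
antecedent = ["move", "turn", "say"]
remix = ["move", "wait", "say", "play-sound"]
print(edit_distance(antecedent, remix))  # -> 2
```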
We find strong evidence of a trade-off: (1) Projects of moderate complexity are remixed more lightly than more complicated projects. (2) Projects by more prominent creators tend to be remixed in less transformative ways. (3) Cumulative remixing tends to be associated with shallower and less transformative derivatives. That said, our support for (1) is qualified in that we do not find evidence of the increased originality for the simplest projects that our theory predicted.
We feel that our results raise difficult but important challenges, especially for the designers of social media systems. For example, many social media sites track and display user prominence with leaderboards or lists of aggregate views. This technique may lead to increased generativity by emphasizing and highlighting creator prominence. That said, it may also lead to a decrease in originality of the remixes elicited. Our results regarding the relationship of complexity to generativity and originality of remixes suggest that supporting increased complexity, at least for most projects, may have fewer drawbacks.
As supporters and advocates of remixing, we feel that although highly generative works that lead to highly original derivatives may be rare and difficult for system designers to support, understanding remixing dynamics and encouraging these rare projects remain a worthwhile and important goal.
On the weekend of April 20-21, Students for Free Culture is going to be holding its annual conference, FCX2013, at New York Law School in New York City. As a long-time SFC supporter and member, I am enormously proud to be giving the opening keynote address.
Although the program for Sunday is still shaping up, the published Saturday schedule looks great. If previous years are any indication, the conference can serve as an incredible introduction to free culture, free software, wikis, remixing, copyright, patent and trademark reform, and participatory culture. For folks that are already deeply involved, FCX is among the best places I know to connect with other passionate, creative, people working on free culture issues.
I’ve been closely following and involved with SFC for years and I am particularly excited about the group that is driving the organization forward this year. If you will be in or near New York that weekend — or if you can make the trip — you should definitely try to attend.
A few months late, perhaps, but I wanted to mention that my team (Codex) competed, once again, in the MIT Mystery Hunt. The prize for winning is the responsibility of writing the hunt the next year. After being on the 2012 writing team, I have mixed feelings about the fact that we did not win again.
Although I did not walk away with another coin, I did manage to make an appearance in a multimedia story about the hunt shot by the MIT Tech, which provides a nice introduction for those who are not familiar with it.
It was fun to reflect a little bit on why I find the hunt so fun. I said:
In my day job, I work on a lot of problems that maybe don’t have answers. It’s really awesome to work in a space where if you put in 1-5 hours of mental energy, you will have a solution. Someone has test-solved for it, you know that it’s solvable; you know that there is an answer and that there was designed to be an answer. There’s something really satisfying about being able to do that.
If you’re looking for a team to hunt on next year, feel free to get in contact — Codex is generally quite open to new members.