Tag Archives: academic

Community Data Science Workshops in Seattle

Photo from the Boston Python Workshop – a similar workshop run in Boston that has inspired and provided a template for the CDSW.

On three Saturdays in April and May, I will be helping run three day-long project-based workshops at the University of Washington in Seattle. The workshops are for anyone interested in learning how to use programming and data science tools to ask and answer questions about online communities like Wikipedia, Twitter, free and open source software, and civic media.

The workshops are for people with no previous programming experience and the goal is to bring together researchers as well as participants and leaders in online communities.  The workshops will all be free of charge and open to the public given availability of space.

Our goal is that, after the three workshops, participants will be able to use data to produce numbers, hypothesis tests, tables, and graphical visualizations to answer questions like:

  • Are new contributors to an article in Wikipedia sticking around longer or contributing more than people who joined last year?
  • Who are the most active or influential users of a particular Twitter hashtag?
  • Are people who participated in a Wikipedia outreach event staying involved? How do they compare to people that joined the project outside of the event?

If you are interested in participating, fill out our registration form here. The deadline to register is Wednesday March 26th.  We will let participants know if we have room for them by Saturday March 29th. Space is limited and will depend on how many mentors we can recruit for the sessions.

If you already have experience with Python, please consider helping out at the sessions as a mentor. Being a mentor will involve working with participants and talking them through the challenges they encounter in programming. No special preparation is required.  If you’re interested, send me an email.

Doctor of Philosophy

On Wednesday, I successfully defended my PhD dissertation in front of a ridiculously packed house at the MIT Media Lab. I am humbled by the support shown by the MIT Sloan, Media Lab, and Harvard communities. Earlier today, I finished up paperwork and submitted my archival copies. I’m done.

Although I’ve often heard PhDs described as emotional roller coasters, I feel enormously blessed in that I honestly can’t relate. My eight years at MIT and Harvard have been almost universally positive and I have learned and grown indescribably. As excited as I am about my next chapter at the University of Washington, I’m going to miss my life here. Deeply.

My dissertation was three essays on volunteer mobilization in peer production. Once I have a chance to catch up and recover, I’ll be posting the previously unpublished pieces. The Remixing Dilemma was included in the dissertation and is already online. The Media Lab AV team shot professional video of the talk. When I get a copy of the video, I’ll post that too.

But because I think it’s important, I’ve formatted and published the acknowledgments section of the dissertation today. Although there are too many folks to thank, I’ve highlighted the contributions of my co-authors, and friends, Aaron Shaw and Andrés Monroy Hernández and my almost unbelievably incredible group of advisors: Eric von Hippel, Yochai Benkler, Mitch Resnick, and Tom Malone.

The Wikipedia Gender Gap Revisited

In a new paper, recently published in the open access journal PLOS ONE, Aaron Shaw and I build on new research in survey methodology to describe a method for estimating bias in opt-in surveys of contributors to online communities. We use the technique to reevaluate the most widely cited estimate of the gender gap in Wikipedia.

A series of studies have shown that Wikipedia’s editor-base is overwhelmingly male. This extreme gender imbalance threatens to undermine Wikipedia’s capacity to produce high quality information from a full range of perspectives. For example, many articles on topics of particular interest to women tend to be under-produced or of poor quality.

Given the open and often anonymous nature of online communities, measuring contributor demographics is a challenge. Most demographic data on Wikipedia editors come from “opt-in” surveys where people respond to open, public invitations. Unfortunately, very few people answer these invitations. Results from opt-in surveys are unreliable because respondents are rarely representative of the community as a whole. The most widely cited estimate, from a large 2008 survey by the Wikimedia Foundation (WMF) and UN University in Maastricht (UNU-MERIT), suggested that only 13% of contributors were female. However, the very same survey suggested that less than 40% of Wikipedia’s readers were female. We know, from several reliable sources, that Wikipedia’s readership is evenly split by gender — a sign of bias in the WMF/UNU-MERIT survey.

In our paper, we combine data from a nationally representative survey of the US by the Pew Internet and American Life Project with the opt-in data from the 2008 WMF/UNU-MERIT survey to come up with revised estimates of the Wikipedia gender gap. The details of the estimation technique are in the paper, but the core steps are:

  1. We use the Pew dataset to provide baseline information about Wikipedia readers.
  2. We apply a statistical technique called “propensity scoring” to estimate the likelihood that a US adult Wikipedia reader would have volunteered to participate in the WMF/UNU-MERIT survey.
  3. We follow a process originally developed by Valliant and Dever to weight the WMF/UNU-MERIT survey to “correct” for estimated bias.
  4. We extend this weighting technique to Wikipedia editors in the WMF/UNU data to produce adjusted estimates of the demographics of their sample.
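
To give a feel for the four steps above, here is a minimal Python sketch. It is not the estimation procedure from the paper: instead of a fitted propensity model, it assumes a simple cell-based estimate in which each demographic cell is weighted by its share in the representative survey divided by its share in the opt-in survey. The function names and all of the counts are made up for illustration, and real cells would combine several covariates rather than gender alone.

```python
from collections import Counter


def cell_shares(cells):
    """Share of observations falling in each demographic cell."""
    counts = Counter(cells)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def propensity_weights(reference_readers, optin_readers):
    """Steps 1-3: weight = reference share / opt-in share for each cell.

    Cells that are over-represented among opt-in respondents get a
    weight below 1; under-represented cells get a weight above 1.
    """
    reference = cell_shares(reference_readers)
    optin = cell_shares(optin_readers)
    return {cell: reference[cell] / optin[cell] for cell in optin}


def weighted_share(cells, weights, target):
    """Step 4: weighted proportion of respondents in the target cell."""
    total = sum(weights[cell] for cell in cells)
    hits = sum(weights[cell] for cell in cells if cell == target)
    return hits / total


# Made-up data (not from either survey): one label per respondent.
reference_readers = ["female"] * 50 + ["male"] * 50  # representative survey
optin_readers = ["female"] * 40 + ["male"] * 60      # opt-in survey, readers
optin_editors = ["female"] * 10 + ["male"] * 90      # opt-in survey, editors

weights = propensity_weights(reference_readers, optin_readers)
unadjusted = optin_editors.count("female") / len(optin_editors)
adjusted = weighted_share(optin_editors, weights, "female")
print(f"unadjusted: {unadjusted:.3f}, adjusted: {adjusted:.3f}")
```

With these made-up counts, women are under-represented among opt-in readers, so their responses are weighted up and the adjusted share of female editors rises from 10% to about 14%.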

Using this method, we estimate that the proportion of female US adult editors was 27.5% higher than the original study reported (22.7%, versus 17.8%), and that the total proportion of female editors was 26.8% higher (16.1%, versus 12.7%). These findings are consistent with other work showing that opt-in surveys tend to undercount women.
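
The “higher” figures above are relative increases, not percentage-point differences. A quick check in Python, using only the numbers quoted in the paragraph above (the helper name is just for illustration):

```python
def relative_increase(adjusted, original):
    """Relative change of adjusted over original, in percent."""
    return (adjusted - original) / original * 100


# Adjusted versus originally reported shares of female editors (percent).
us_adults = relative_increase(22.7, 17.8)
all_editors = relative_increase(16.1, 12.7)
print(round(us_adults, 1), round(all_editors, 1))
```

In percentage-point terms, the adjustments are 4.9 points for US adults and 3.4 points for all editors.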

Overall, these results reinforce the basic substantive finding that women are vastly under-represented among Wikipedia editors.

Beyond Wikipedia, our paper describes a method that online communities can adopt to estimate contributor demographics from opt-in surveys in a way that is more credible than relying entirely on opt-in data. Advertising-intelligence firms like ComScore and Quantcast provide demographic data on the readership of an enormous proportion of websites. With these sources, almost any community can use our method (and source code) to run a similar analysis by: (1) surveying a community’s readers (or a random subset) with the same instrument used to survey contributors; (2) combining results for readers with reliable demographic data about the readership population from a credible source; (3) reweighting survey results using the method we describe.

Although our new estimates will not help us close the gender gap in Wikipedia or address its troubling implications, they give us a better picture of the problem. Additionally, our method offers an improved tool to build a clearer demographic picture of other online communities in general.

Job Market Materials

Last year, I applied for tenure-track academic jobs in several communication departments, information schools, and HCI-focused computer science programs with a tradition of hiring social scientists.

Being “on the market” — as it is called — is both scary and time consuming. Like me, many candidates have never been on the market before. Candidates are asked to produce documents in genres — e.g., cover letters, research statements, teaching statements, diversity statements — that most candidates have never written, read, or even heard of.

Candidates often rely on their supervisors for advice. I did so and my advisors were extremely helpful. The reality, however, is that although candidates’ advisors may sit on hiring committees, most have not been on the candidates’ side of the job market themselves for years or even decades.

The Internet is full of websites, like the academic jobs wiki, Academia StackExchange, and the Chronicle of Higher Education forums for people on the market. Confused and insecure candidates ask questions of the form, “Does blank matter?” and the answer is usually, “Doing/having blank may help/hurt, but it is only one factor of many.” The result is that candidates worry about everything. Then they worry about what they should be worrying about, but are not.

The most helpful thing, for me, was to read and synthesize the material submitted by recent successful job market candidates. For example, Michael Bernstein — a friend from MIT, now at Stanford — published his research and teaching statements on his website and I found both useful as I prepared mine. That said, I was surprised by how little material like this I could find on the web. For example, I could not find any examples of recent job market cover letters from successful candidates in fields close to mine.

So to help fill this gap, I am publishing all of my job market material. I’ve posted both the PDFs of the material I submitted as well as the LaTeX templates I used to generate the documents in my packet. My packet included:

  • Research Statement (TeX) — A description of my research to date and my current trajectory. Following a convention I have seen others follow, I “cited” my own work (but only my work) to form a curated bibliography of my own publications and working papers.
  • Teaching Statement (TeX) — A two-page description of my approach to teaching, a list of my teaching experience, and a description of sample courses.
  • Diversity Statement (TeX) — A description of how I think about diversity and how I have, and will, engage with it in my teaching and research.
  • Cover Letter (TeX) — Each application I sent had a customized cover letter. I wrote mine on MIT letterhead. Since each letter is different, I have published the letter I sent to the department that I took the job in (UW Communication). Because my new department did not request research and teaching statements, the cover letter includes material taken from both. For departments that requested separate statements, I limited myself to a shorter (1.5 pages) version of the letter with a similar structure.
  • Writing Samples — I included three or four of my papers with every application. The selection of articles changed a bit depending on the department but I included at least one single-authored paper in each packet.
  • Letters of Recommendation — Because I didn’t write these and haven’t seen them, I can’t share them. I requested letters from my four committee members: Eric von Hippel, Yochai Benkler, Mitch Resnick, and Tom Malone.
  • Curriculum Vitae (TeX) — I have tried to keep my CV up-to-date during graduate school. I keep my CV in git and have a little CGI script automatically rebuild the published version whenever an update is committed.
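
For anyone curious about the CV setup in the last item, here is a rough Python sketch of what a rebuild step can look like. It is not the script I use: the paths, the file name, and the function names are hypothetical, and it assumes `git` and `pdflatex` are installed on the server.

```python
import subprocess


def rebuild_commands(repo_dir, tex_file, publish_dir):
    """Commands that update the checkout and then rebuild the PDF."""
    return [
        # Fetch the latest commit; refuse anything but a fast-forward.
        ["git", "-C", repo_dir, "pull", "--ff-only"],
        # Compile without stopping for input; write the PDF to publish_dir.
        ["pdflatex", "-interaction=nonstopmode",
         f"-output-directory={publish_dir}", tex_file],
    ]


def rebuild(repo_dir, tex_file, publish_dir, run=subprocess.run):
    """Run each command inside the repository, stopping on the first failure."""
    for command in rebuild_commands(repo_dir, tex_file, publish_dir):
        run(command, cwd=repo_dir, check=True)


# Hypothetical usage, e.g. from a CGI script or a git hook:
# rebuild("/srv/cv", "cv.tex", "/var/www/cv")
```

Passing `run` as a parameter makes it easy to dry-run the script by swapping in a function that only logs the commands.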

I hope people going “on the market” will find these materials useful. Obviously, you should not copy or reuse the text of any of my material. It is your application, after all. That said, please do help yourself to the formatting and structure.

Finally, I would encourage anyone who builds on my material to republish their own material to help other candidates. If you do, I’d appreciate a link back or comment on this blog post so that my readers can find your improvements.

London and Michigan

I’ll be spending the week after next (June 17-23) in London for the annual meeting of the International Communication Association where I’ll be presenting a paper. This will be my first ICA and I’m looking forward to connecting with many new colleagues in the discipline. If you’re one of them, reading this, and would like to meet up in London, please let me know!

Starting June 24th, I’ll be in Ann Arbor, Michigan for four weeks of the ICPSR summer program in applied statistics at the Institute for Social Research. I have been wanting to sign up for some of their advanced methods classes for years and am planning to take the opportunity this summer before I start at UW. I’ll be living with my friends and fellow Berkman Cooperation Group members Aaron Shaw and Dennis Tennen.

I would love to make connections and meet people in both places so, if you would like to meet up, please get in contact.

The Cost of Inaccessibility at the Margins of Relevance

I use RSS feeds to keep up with academic journals. Because of an undocumented and unexpected feature (bug?) in my (otherwise wonderful) free software newsreader NewsBlur, many articles published over the last year were marked as having been read before I saw them.

Over the last week, I caught up. I spent hours going through abstracts and downloading papers that looked interesting or relevant to my research. Because I did this for hundreds of articles, it gave me an unusual opportunity to reflect on my journal reading practices in a systematic way.

On a number of occasions, there were potentially interesting articles in non-open access journals that neither MIT nor Harvard subscribes to and that were otherwise not accessible to me. In several cases where the research was obviously important to my work, I made an interlibrary request, emailed the papers’ authors for copies, or tracked down a colleague at an institution with access.

Of course, articles that look potentially interesting from the title and abstract often end up being less relevant or well executed on closer inspection. I tend to cast a wide net, skim many articles, and put them aside when it’s clear that the study is not for me. This week, I downloaded many of these possibly relevant papers to, at least, give a skim. But only if I could download them easily. On three or four occasions, I found inaccessible articles at this margin of relevance. In these cases, I did not bother trying to track down the articles.

Of course, what appear to be marginally relevant articles sometimes end up being a great match for my research and I will end up citing and building on the work. I found several surprisingly interesting papers last week. The articles that were locked up have no chance at this.

When people suggest that locked-down journals hinder the spread of scholarship, a common retort is that the people who need the work have, or can finagle, access. For the papers we know we need, this might be true. As someone with access to two of the best-endowed libraries in academia who routinely requests otherwise inaccessible articles through several channels, I would have told you, a week ago, that locked-down journals were unlikely to keep me from citing anybody.

So it was interesting watching myself do a personal cost calculation in a way that sidelined published scholarship — and that open access publishing would have prevented. At the margin of relevance to one’s research, open access may make a big difference.

The Remixing Dilemma: The Trade-off Between Generativity and Originality

This post was written with Andrés Monroy-Hernández. It is a summary of a paper just published in American Behavioral Scientist. You can also read the full paper: The remixing dilemma: The trade-off between generativity and originality. It is part of a series of papers I have written with Monroy-Hernández using data from Scratch. You can find the others on my academic website.

Remixing — the reworking and recombination of existing creative artifacts — represents a widespread, important, and controversial form of social creativity online. Proponents of remix culture often speak of remixing in terms of rich ecosystems where creative works are novel and highly generative. However, examples like this can be difficult to find. Although there is a steady stream of media being shared freely on the web, only a tiny fraction of these projects are remixed even once. On top of this, many remixes are not very different from the works they are built upon. Why is some content more attractive to remixers? Why are some projects remixed in deeper and more transformative ways?

Remix Diagram

We try to shed light on both of these questions using data from Scratch — a large online remixing community. Although we find support for several popular theories, we also present evidence in support of a persistent trade-off that has broad practical and theoretical implications. In what we call the remixing dilemma, we suggest that characteristics of projects that are associated with higher rates of remixing are also associated with simpler and less transformative types of derivatives.

Our study is focused on two interrelated research questions. First, we ask why some projects shared in remixing communities are more or less generative than others. “Generativity” — a term we borrow from Jonathan Zittrain — describes creative works that are likely to inspire follow-on work. Several scholars have offered suggestions for why some creative works might be more generative than others. We focus on three central theories:

  1. Projects that are moderately complicated are more generative. The free and open source software motto “release early and release often” suggests that simple projects will offer more obvious opportunities for contribution than more polished projects. That said, projects that are extremely simple (e.g., completely blank slates) may also be uninspiring to would-be contributors.
  2. Projects by prominent creators are more generative. The reasoning for this claim comes from the suggestion that remixing can act as a form of cultural conversation and that the work of popular creators can act like a common medium or language.
  3. Projects that are remixes themselves are more generative. The reasoning for this final claim comes from the idea that remixing thrives through the accumulation of contributions from groups of people building on each other’s work.

Our second question focuses on the originality of remixes and asks when more or less transformative remixing occurs. For example, highly generative projects may be less exciting if the projects produced based on them are all near-identical copies of antecedent projects. For a series of reasons — including the fact that increased generativity might come by attracting less interested, skilled, or motivated individuals — we suggest that each of the factors associated with generativity will also be associated with less original forms of remixing. We call this trade-off the remixing dilemma.

We answer both of our research questions using a detailed dataset from Scratch, where young people build, share, and collaborate on interactive animations and video games. The community was built to support users of the Scratch programming environment, a desktop application with functionality similar to Flash created by the Lifelong Kindergarten Group at the MIT Media Lab. Scratch is designed to allow users to build projects by integrating images, music, sound, and other media with programming code. Scratch is used by more than a million users, most of them under 18 years old.

To test our three theories about generativity, we measure whether or not, as well as how many times, Scratch projects were remixed in a dataset that includes every shared project. Although Scratch is designed as a remixing community, only around one tenth of all Scratch projects are ever remixed. Because more popular projects are remixed more frequently simply because of exposure, we control for the number of times each project is viewed.

Our analysis shows at least some support for all three theories of generativity described above. (1) Projects with moderate amounts of code are remixed more often than either very simple or very complex projects. (2) Projects by more prominent creators are more generative. (3) Remixes are more likely to attract remixers than de novo projects.

To test our theory that there is a trade-off between generativity and originality, we build a dataset that includes every Scratch remix and its antecedent. For each pair, we construct a measure of originality by comparing the remix to its antecedent and computing an “edit distance” (a concept we borrow from software engineering) to determine how much the projects differ.
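
For readers who have not run into edit distance before, here is a minimal Python sketch of the classic Levenshtein version: the smallest number of insertions, deletions, and substitutions needed to turn one sequence into another. This is an illustration of the general concept and not necessarily the exact measure used in the paper. Treating a project as a flat list of block names, and the block names themselves, are assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    # previous[j] holds the distance between the prefix of a handled so far
    # and the first j items of b.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# Hypothetical projects, each reduced to a sequence of block names.
antecedent = ["when_flag_clicked", "move", "turn", "say"]
remix = ["when_flag_clicked", "move", "play_sound", "say", "wait"]
distance = edit_distance(antecedent, remix)
print(distance)
```

Here the remix swaps one block and appends another, so the distance is 2. A near-identical copy would score close to 0, while a heavily reworked remix would score much higher.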

We find strong evidence of a trade-off: (1) Projects of moderate complexity are remixed more lightly than more complicated projects. (2) Projects by more prominent creators tend to be remixed in less transformative ways. (3) Cumulative remixing tends to be associated with shallower and less transformative derivatives. That said, our support for (1) is qualified in that we do not find evidence of the increased originality for the simplest projects as our theory predicted.

Two plots of estimated values for prototypical projects. Panel 1 (left) displays predicted probabilities of being remixed. Panel 2 (right) displays predicted edit distances. Both panels show predicted values for both remixes and de novo projects from 0 to 1,204 blocks (99th percentile).

We feel that our results raise difficult but important challenges, especially for the designers of social media systems. For example, many social media sites track and display user prominence with leaderboards or lists of aggregate views. This technique may lead to increased generativity by emphasizing and highlighting creator prominence. That said, it may also lead to a decrease in originality of the remixes elicited. Our results regarding the relationship of complexity to generativity and originality of remixes suggest that supporting increased complexity, at least for most projects, may have fewer drawbacks.

As supporters and advocates of remixing, we feel that although highly generative works that lead to highly original derivatives may be rare and difficult for system designers to support, understanding remixing dynamics and encouraging these rare projects remain a worthwhile and important goal.

Benjamin Mako Hill, Massachusetts Institute of Technology
Andrés Monroy-Hernández, Microsoft Research

For more, see our full paper, “The remixing dilemma: The trade-off between generativity and originality,” published in American Behavioral Scientist 57(5), pp. 643-663. (Official Link, Pay-Walled).

MIT LaTeX Stationery

Color MIT LetterHead Example

The MIT graphic identity website provides downloadable stationery templates for letterhead and envelopes. They provide both Microsoft Word and LaTeX templates. But although they provide both black and white and color templates for Word, they only provide the monochrome templates for LaTeX. When writing cover letters for the job market this year, I was not particularly interested in compromising on color and was completely unwilling to compromise on TeX.

As a result, I ended up modifying each of the three templates to include color. In the process, I fixed a few bugs and documented one tricky issue. I’ve published a git repository with my changes. It includes branches for each of the three “old” black and white templates as well as my three new color templates. I hope others at MIT find it useful. I’ve tried to keep the changes minimal.

I’ve emailed the folks at MIT Communication Production Services to see if they want to publish my modified versions. Until then, anyone interested can help themselves to the git repository. LaTeX user that you are, you probably prefer that anyway.

Conversation on Freedom and Openness in Learning

On Monday, I was a visitor and guest speaker in a session on “Open Learning” in a class on Learning Creative Learning which aims to offer “a course for designers, technologists, and educators.” The class is being offered publicly by the combination — surprising but very close to my heart — of Peer 2 Peer University and the MIT Media Lab.

The hour-long session was facilitated by Philipp Schmidt and was mostly structured around a conversation with Audrey Watters and myself. The rest of the course materials and other video lectures are on the course website.

You can watch the video on YouTube or below. I thought it was a thought-provoking conversation!

If you’re interested in alternative approaches to learning and free software philosophy, I would also urge you to check out an essay I wrote in 2002: The Geek Shall Inherit the Earth: My Story of Unlearning. Keep in mind that the essay is probably the most personal thing I have ever published and I wrote it more than a decade ago as a twenty-one year old undergraduate at Hampshire College. Although I’ve grown and learned enormously in the last ten years, and although I would not write the same document today, I am still proud of it.

The Cost of Collaboration for Code and Art

This post was written with Andrés Monroy-Hernández for the Follow the Crowd Research Blog. The post is a summary of a paper forthcoming in Computer-Supported Cooperative Work 2013. You can also read the full paper: The Cost of Collaboration for Code and Art: Evidence from Remixing. It is part of a series of papers I have written with Monroy-Hernández using data from Scratch. You can find the others on my academic website.

Does collaboration result in higher quality creative works than individuals working alone? Is working in groups better for functional works like code than for creative works like art? Although these questions lie at the heart of conversations about collaborative production on the Internet and peer production, it can be hard to find research settings where you can compare across both individual and group work and across both code and art. We set out to tackle these questions in the context of a very large remixing community.

Example of a remix in the Scratch online community, and the project it is based off. The orange arrows indicate pieces which were present in the original and reused in the remix.

Remixing platforms provide an ideal setting to answer these questions. Most support the sharing, and collaborative rating, of both individually and collaboratively authored creative works. They also frequently combine code with artistic media like sound and graphics.

We know that increased collaboration often leads to higher quality products. For example, studies of Wikipedia have suggested that vandalism is detected and removed within minutes, and that high quality articles in Wikipedia, by several measures, tend to be produced by more collaboration. That said, we also know that collaborative work is not always better — for example, that brainstorming results in fewer good ideas when done in groups. We attempt to answer this broad question, asked many times before, in the context of remixing: Which is the better description, “the wisdom of crowds” or “too many cooks spoil the broth”? That, fundamentally, forms our paper’s first research question: Are remixes, on average, higher quality than single-authored works?

A number of critics of peer production, and some fans, have suggested that mass collaboration on the Internet might work much better for certain kinds of works. The argument is that free software and Wikipedia can be built by a crowd because they are functional. But more creative works — like music, a novel, or a drawing — might benefit less, or even be hurt by, participation by a crowd. Our second research question tries to get at this possibility: Are code-intensive remixes higher quality than media-intensive remixes?

We try to answer these questions using a detailed dataset from Scratch – a large online remixing community where young people build, share, and collaborate on interactive animations and video games. The community was built to support users of the Scratch programming environment: a desktop application with functionality similar to Flash created by the Lifelong Kindergarten Group at the MIT Media Lab. Scratch is designed to allow users to build projects by integrating images, music, sound and other media with programming code. Scratch is used by more than a million, mostly young, users.

Measuring quality is tricky and we acknowledge that there are many ways to do it. In the paper, we rely most heavily on a measure of peer ratings in Scratch called loveits — very similar to “likes” on Facebook. We find similar results with several other metrics and we control for the number of views a project receives.

In answering our first research question, we find that remixes are, on average, rated as being of lower quality than works of single authorship. This finding was surprising to us but holds up across a number of alternative tests and robustness checks.

In answering our second question, we find rough support for the common wisdom that remixing tends to be more effective for functional works than for artistic media. The more code-intensive a project is, on average, the smaller the gap is between a remix and a work of single authorship. But the more media-intensive a project is, the bigger the gap. You can see the relationships that our model predicts in the graph below.

Two plots of estimated values for prototypical projects showing the predicted number of loveits using our estimates. In the left panel, the x-axis varies number of blocks while holding media intensity at the sample median. The right panel varies the number of media elements while holding the number of blocks at the sample median. Ranges for each are from 0 to the 90th percentile.

Both of us are supporters and advocates of remixing. As a result, we were initially a little troubled by our result in this paper. We think the finding suggests an important limit to the broadest claims of the benefit of collaboration in remixing and peer production.

That said, we also reject the blind repetition of the mantra that collaboration is always better — for every definition of “better,” and for every type of work. We think it’s crucial to learn and understand the limitations and challenges associated with remixing and we’re optimistic that this work can influence the design of social media and collaboration systems to help remixing and peer production thrive.

For more, see our full paper, The Cost of Collaboration for Code and Art: Evidence from Remixing.

Heading West

University of Washington Quad in Cherry Blossom Season

This week, I accepted a job on the faculty of the University of Washington Department of Communication. I’ve arranged for a post-doc during the 2013-2014 academic year which I will spend at UW as an Acting Assistant Professor. I’ll start the tenure-track Assistant Professor position in September 2014. The hire is part of a “big data” push across UW. I will be setting up a lab and research projects, as well as easing into a teaching program, over the next couple of years.

I’m not going to try to list all the great people in the department, but UW Communication has an incredible faculty with a strong background in studying the effect of communication technology on society, looking at political communication, engagement, and collective action, and tracing out the implications of new communication technologies — in addition to very strong work in other areas. Years ago, I nearly joined the department as a graduate student. I am unbelievably happy that their faculty has invited me to join as a colleague.

Outside of my new department, the University of Washington has a superb group of folks working across the school on issues of quantitative and computational social science, human-computer interaction, and computer-supported cooperative work. They are hiring a whole bunch of folks, across the university, who specialize in data-driven social science. I already have a bunch of relationships with UW faculty and students and am looking forward to expanding and deepening those.

On a personal level, Mika and I are also very excited to return to Seattle. I grew up in the city and I’ve missed it, deeply, since I left — now nearly half my lifetime ago! It will be wonderful to be much closer to many of my family members.

But I know that I will miss the community of friends and colleagues that I’ve built in Boston over the last 7+ years just as deeply. I’m going to miss the intellectual resources, and the intellectual community, that folks in Cambridge get to take for granted. That said, I plan to maintain affiliations and collaborations with folks at Harvard and MIT and will have resources that let me spend time in Boston doing that.

If you are curious what I’m going to be up to — and what the future is likely to hold in terms of my research — you should check the material I’ve put online as part of the job market this year. I’ve posted just about everything on my academic website. This includes a little four page research statement which describes the work I’ve done and the directions I’ve been thinking about taking it.

The academic job market is challenging and confusing. But it’s given me a lot of opportunity to reflect, at length, on both the substance of my research and the academy and its structures and processes. I’ve got a list of blog topics queued up based on that thinking. I’ll be posting them here on my blog over the next few months.

A Model of Free Software Success

Last week I helped organize the Open and User Innovation Conference at Harvard Business School. One of many interesting papers presented there was an essay on Institutional Change and Information Production by Fabio Landini from the University of Siena.

At the core of the paper is an economic model of the relationship between rights protection and technologies that affect the way that cognitive labor can be divided and aggregated. Although that may sound very abstract (and it is in the paper), it is basically a theory that tries to explain the growth of free software.

The old story about free software and free culture (at least among economists and many other academics) is that the movements surged to prominence over the last decade because improvements in communication technology made new forms of mass-collaboration — like GNU/Linux and Wikipedia — possible. "Possible", for these types of models, usually means profit-maximizing for rational, profit-seeking, actors like capitalist firms. You can basically think of these attempts as trying to explain why open source claims that free licensing leads to "better quality, higher reliability, more flexibility, lower cost" are correct: new technology makes possible an open development process which leads to collaboration which leads to higher quality work which leads to profit.

Landini suggests there are problems with this story. One problem is that it treats technology as being taken for granted and technological changes as effectively being dropped in from outside (i.e., exogenous). Landini points out that software businesses build an enormous amount of technology to help organize their work and to help themselves succeed in what they see as their ideal property rights regime. The key feature of Landini’s alternate model is that it considers this possibility. What comes out the other end of the model is a prediction for a multiple equilibrium system — a situation where there are several strategies that can be stable and profitable. This can help explain why, although free software has succeeded in some areas, its success has hardly been total and usually has not led to change within existing proprietary software firms. After all, there are still plenty of companies selling proprietary software. In Landini’s model, free is just one of several winning options.

But Landini’s model raises what might be an even bigger question. If free software can be as efficient as proprietary software, how would anybody ever find out? If all the successful software companies out there are doing proprietary software, which greedy capitalist is going to take the risk of seeing if they could also be successful by throwing exclusive rights out the window? In the early days, new paths are always unclear, unsure, and unproven.

Landini suggests that ethically motivated free software hackers provide what he calls a "cultural subsidy." Essentially, a few hackers are motivated enough by the ethical principles behind free software that they are willing to contribute to it even when it isn’t clearly better than proprietary alternatives. And in fact, historically speaking, many free software hackers were willing to contribute to free software even when they thought it was likely less profitable than the proprietary alternative models. As Landini suggests, this group was able to build technological platforms and find new social and business arrangements where the free model actually is competitive.

I think that the idea of a "cultural subsidy" is a nice way to think about the important role that ethical arguments play in movements like free software and free culture. "Open source" style efficiency arguments persuade a lot of people, especially when they are true. But those arguments are only ever true because a group of ethically motivated people fought to find a way to make them true. Free software didn’t start out as competitive with proprietary software. It became so only because a bunch of ethically motivated hackers were willing to "subsidize" the movement with their failed, and successful, attempts at free software and free culture projects and businesses.

Of course, the folks attracted by "open source" style superiority arguments can find the ethically motivated folks shrill, off-putting, and annoying. The ethically motivated folks often think the "efficiency" group is shortsighted and mercenary. But as awkward as this marriage might be, it has some huge upsides. In Landini’s model, the ethical folks can build their better world without convincing everyone else that they are right and by relying, at least in part, on the self-interest of others who don’t share their principles. Just as the free software movement has done.

I think that Landini’s paper is a good description of the critically important role that the free software movement, and the FSF in particular, can play. The influence and importance of individuals motivated by principles can go far beyond the groups of people who take an ethical stand. They can make involvement possible for large groups of people who do not think that taking a stand on a particular ethical issue is even a good idea.

User Innovation on NPR Radio

I was invited onto NPR in Boston this week for a segment on user innovation alongside Eric von Hippel (my advisor at MIT) and Carliss Baldwin from Harvard Business School.

I talked about innovation that has happened on the CHDK platform (a cool firmware hack for Canon cameras that I use as an example in some of my teaching), plus a little bit about free software, the democratization of development and design tools, and the user communities that LEGO has cultivated.

I would have liked the conversation and terminology to do more to emphasize user freedom and free software, but I’m otherwise pretty happy with the result. The segment will be aired again on NPR in Boston this weekend and is available on the WGBH website.

Wiki Conferencing

I am in Berlin for the Wikipedia Academy, a very cool hybrid free culture community plus refereed academic conference organized, in part, by Wikimedia Deutschland. On Friday, I was very excited to have been invited to give the conference’s opening keynote based on my own hybrid take on learning from failures in peer production and incorporating a bunch of my own research. Today, I was on a panel at the conference about free culture and sharing practices. I’ll post talk materials and videos when the conference puts them online.

I will be in Berlin for the next week or so before I head directly to Washington, DC for Wikimania between the 11th and 15th. I’ll be giving three talks there:

Between then and now, I’m taking the next week in Berlin to catch up on work, and with friends. If you’re in either place and want to meet up, please get in touch and let’s try to arrange something.

Advice for Prospective Doctoral Students

There is tons of advice on the Internet (e.g., on the academic blogs I read) for prospective doctoral students. I am very happy with my own graduate school choices but I feel that I basically got lucky. Few people are saying the two things I really wish someone had told me before I made the decision to get a PhD:

  • Most people getting doctorates would probably be better off doing something else.
  • Evaluating potential programs can basically be done by looking at and talking with a program’s recent graduates.

Most People Getting Doctorates Probably Shouldn’t

In most fields, the only thing you need a PhD for is to become a professor, and even this requirement can be flexible. You can have almost any job in any company or non-profit without a PhD. You can teach without a PhD. You can write books without a PhD. You can do research and work in think tanks without a PhD. You don’t even always need a PhD to grant PhDs to other people: two of my advisors at the Media Lab supervised PhD work but did not have doctorates themselves! Becoming a tenured professor is more difficult without a doctorate, but it is not impossible. There are grants and jobs outside of universities that require doctorates, but not nearly as many as most people applying for PhD programs think.

Getting a doctorate can even hurt: If you want to work in a company or non-profit, you are usually better off with 4-6 years of experience doing the kind of work you want to do than with the doctorate and the less relevant experience of getting one. Starting salaries for people with doctorates are often higher than for people with master’s degrees. But salaries for people with master’s degrees and 5 years of experience are even higher, and that’s before you take into account the opportunity costs of working for relatively low graduate student wages for half a decade.

PhDs take an enormous amount of time and, in most programs, you spend a huge amount of this time doing academic busy work, teaching, applying for grants or fellowships, and writing academic papers that very few people read. These are skills you’ll need to be a successful professor. They are useful skills for other jobs too, but not as useful as the experience of actually doing those other jobs for the time it takes to get the degree.

Evaluating Graduate Programs

If you are still convinced you need a doctorate, or any graduate degree for that matter, you will need to pick a program. Plenty of people will offer advice on how to pick the right program and trying to balance all the complicated and contradictory advice can be difficult. Although I love my program and advisors, I’ve known many less happy students. Toward that end, there are two pieces of meta-advice that I wish everybody was told before they applied:

  1. Find recent graduates of the program you are considering, and the faculty advisor(s) you are planning on working with, and look at where they are now. Are these ex-students doing the kind of work that you want to do? Are they at great programs at great universities?

    Chances are good that a PhD program and its faculty will prepare future students to be like, and do work like, the students they have trained in the past. Programs that consistently make good placements are preparing their students well, supporting them, making sure they have the resources necessary to do good work, and helping their students when they are on the job market. A program whose students do poorly, or just end up doing work that isn’t like the kind you want to do, will probably fail you too.

  2. If recent graduates seem to be generally successful and doing the kind of work you want to do, find one who looks most like the kind of academic you want to become and talk to them about their experience. Chances are, your faculty advisors will overlap with theirs and your experience will be similar. Ex-students can tell you the strengths and weaknesses of the program you are considering and what to watch out for. If they had a horrible experience, there’s a decent chance you will too, and they will tell you so.

Doing these two things means you don’t have to worry about trying to think of all the axes on which you want to evaluate a program or pore through admissions material which is only tangentially connected to the reality you’ll live for a long time. What matters most is the outcomes, of course, because you’ll be living the rest of your life for a lot longer than you’ll be in the PhD program.