The Hidden Costs of Requiring Accounts

Should online communities require people to create accounts before participating?

For decades, this question has been a source of disagreement among people who start or manage online communities. Requiring accounts makes some sense since users contributing without accounts are a common source of vandalism, harassment, and low quality content. In theory, requiring accounts can deter these kinds of attacks while still making it quick and easy for newcomers to join. An account requirement also seems unlikely to affect contributors who already have accounts and who are typically the source of most valuable contributions. Creating accounts might even help community members build deeper relationships and commitments to the group in ways that lead them to stick around longer and contribute more.

In a new paper published in Communication Research, I worked with Aaron Shaw to provide an answer. We analyze data from “natural experiments” that occurred when 136 wikis on Fandom.com started requiring user accounts. Although we find strong evidence that the account requirements deterred low quality contributions, this came at a substantial (and usually hidden) cost: a much larger decrease in high quality contributions. Surprisingly, the cost includes “lost” contributions from community members who had accounts already, but whose activity appears to have been catalyzed by the (often low quality) contributions from those without accounts.


A version of this post was first posted on the Community Data Science blog.

The full citation for the paper is: Hill, Benjamin Mako, and Aaron Shaw. 2020. “The Hidden Costs of Requiring Accounts: Quasi-Experimental Evidence from Peer Production.” Communication Research, 48 (6): 771–95. https://doi.org/10.1177/0093650220910345.

If you do not have access to the paywalled journal, please check out this pre-print or get in touch with us. We have also released replication materials for the paper, including all the data and code used to conduct the analysis and compile the paper itself.

Q&A about doing a PhD with my research group

Ever considered doing research about online communities, free culture/software, and peer production full time? It’s PhD admission season and my research group—the Community Data Science Collective—is doing an open-to-anyone Q&A about PhD admissions this Friday November 5th. We’ve got room in the session and it’s not too late to sign up to join us!

The session will be a good opportunity to hear from and talk to faculty recruiting students to our various programs at the University of Washington, Purdue, and Northwestern and to talk with current and previous students in the group.

I am hoping to admit at least one new PhD advisee to the Department of Communication at UW this year (maybe more) and am currently co-advising (and/or have previously co-advised) students in UW’s Allen School of Computer Science & Engineering, Department of Human-Centered Design & Engineering, and Information School.

One thing to keep in mind is that my primary/home department—Communication—has a deadline for PhD applications of November 15th this year.

The registration deadline for the Q&A session is listed as today but we’ll do what we can to sneak you in even if you register late. That said, please do register ASAP so we can get you the link to the session!

OpenSym 2017 Program Postmortem

The International Symposium on Open Collaboration (OpenSym, formerly WikiSym) is the premier academic venue exclusively focused on scholarly research into open collaboration. OpenSym is an ACM conference which means that, like conferences in computer science, it’s really more like a journal that gets published once a year than it is like most social science conferences. The “journal”, in this case, is called the Proceedings of the International Symposium on Open Collaboration and it consists of final copies of papers which are typically also presented at the conference. Like journal articles, papers that are published in the proceedings are not typically published elsewhere.

Along with Claudia Müller-Birn from the Freie Universität Berlin, I served as the Program Chair for OpenSym 2017. For the social scientists reading this, the role of program chair is similar to being an editor for a journal. My job was not to organize keynotes or logistics at the conference—that is the job of the General Chair. Indeed, in the end I didn’t even attend the conference! Along with Claudia, my role as Program Chair was to recruit submissions, recruit reviewers, coordinate and manage the review process, make final decisions on papers, and ensure that everything makes it into the published proceedings in good shape.

In OpenSym 2017, we made several changes to the way the conference has been run:

  • In previous years, OpenSym had tracks on topics like free/open source software, wikis, open innovation, open education, and so on. In 2017, we used a single track model.
  • Because we eliminated tracks, we also eliminated track-level chairs. Instead, we appointed Associate Chairs or ACs.
  • We eliminated page limits and the distinction between full papers and notes.
  • We allowed authors to write rebuttals before reviews were finalized. Reviewers and ACs were allowed to modify their reviews and decisions based on rebuttals.
  • To assist in assigning papers to ACs and reviewers, we made extensive use of bidding. This means we had to recruit the pool of reviewers before papers were submitted.

Although each of these things has been tried in other conferences, or even piloted within individual tracks in OpenSym, all were new to OpenSym in general.

Overview

Statistics

  • Papers submitted: 44
  • Papers accepted: 20
  • Acceptance rate: 45%
  • Posters submitted: 2
  • Posters presented: 9
  • Associate Chairs: 8
  • PC Members: 59
  • Authors: 108
  • Author countries: 20

The program was similar in size to the ones in the last 2-3 years in terms of the number of submissions. OpenSym is a small but mature and stable venue for research on open collaboration. This year was also similar, although slightly more competitive, in terms of the conference acceptance rate (45%—it had been slightly above 50% in previous years).

As in recent years, there were more posters presented than submitted because the PC found that some rejected work, although not ready to be published in the proceedings, was promising and advanced enough to be presented as a poster at the conference. Authors of posters submitted 4-page extended abstracts for their projects which were published in a “Companion to the Proceedings.”

Topics

Over the years, OpenSym has established a clear set of niches. Although we eliminated tracks, we asked authors to choose from a set of categories when submitting their work. These categories are similar to the tracks at OpenSym 2016. Interestingly, a number of authors selected more than one category. This would have led to difficult decisions in the old track-based system.

[Figure: distribution of papers across topics, with a breakdown by accept/poster/reject]

The figure above shows a breakdown of papers in terms of these categories as well as indicators of how many papers in each group were accepted. Papers in multiple categories are counted multiple times. Research on FLOSS and Wikimedia/Wikipedia continues to make up a sizable chunk of OpenSym’s submissions and publications. That said, these now make up a minority of total submissions. Although Wikipedia and Wikimedia research made up a smaller proportion of the submission pool, it was accepted at a higher rate. Also notable is the fact that 2017 saw an uptick in the number of papers on open innovation. I suspect this was due, at least in part, to the involvement of the General Chair, Lorraine Morgan, who specializes in that area. Somewhat surprisingly to me, we had a number of submissions about Bitcoin and blockchains. These are natural areas of growth for OpenSym but have never been a big part of work in our community in the past.

Scores and Reviews

As in previous years, review was single blind: reviewers’ identities were hidden but authors’ identities were not. Each paper received between 3 and 4 reviews plus a metareview by the Associate Chair assigned to the paper. All papers received 3 reviews, but ACs were encouraged to call in a 4th reviewer at any point in the process. In addition to the text of the reviews, we used a -3 to +3 scoring system in which borderline papers were scored as 0. Reviewers scored papers using full-point increments.

[Figure: scores for each paper submitted to OpenSym 2017: average, distribution, etc.]

The figure above shows scores for each paper submitted. The vertical grey lines reflect the distribution of scores where the minimum and maximum scores for each paper are the ends of the lines. The colored dots show the arithmetic mean for each score (unweighted by reviewer confidence). Colors show whether the papers were accepted, rejected, or presented as a poster. It’s important to keep in mind that two papers were submitted as posters.

Although Associate Chairs made the final decisions on a case-by-case basis, every paper that had an average score of less than 0 (the horizontal orange line) was rejected or presented as a poster and most (but not all) papers with positive average scores were accepted. Although a positive average score seemed to be a requirement for publication, negative individual scores weren’t necessarily showstoppers. We accepted 6 papers with at least one negative score. We ultimately accepted 20 papers—45% of those submitted.
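To make the decision rule concrete, here is a toy sketch (with invented scores, not the actual OpenSym review data) of how unweighted mean scores relate to the zero threshold described above:

```python
# Illustrative sketch: unweighted mean review scores and the zero
# threshold. Scores run from -3 to +3 in whole-point increments.
from statistics import mean

# hypothetical review scores for three papers (not real submissions)
reviews = {
    "paper-01": [2, 1, 0],
    "paper-02": [-1, 0, 1, 2],   # a negative score isn't a showstopper
    "paper-03": [-2, -1, 0],
}

def summarize(reviews):
    """Return each paper's unweighted mean score and whether it clears 0."""
    return {
        pid: {"mean": mean(scores), "above_threshold": mean(scores) > 0}
        for pid, scores in reviews.items()
    }

for pid, summary in summarize(reviews).items():
    print(pid, summary)
```

As in our process, a paper with one negative score (paper-02) can still clear the threshold on average.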

Rebuttals

This was the first time that OpenSym used a rebuttal or author response and we are thrilled with how it went. Although they were entirely optional, almost every team of authors used it! Authors of 40 of our 46 submissions (87%!) submitted rebuttals.

Lower  Unchanged  Higher
    6         24      10

The table above shows how average scores changed after authors submitted rebuttals. The table shows that rebuttals’ effect was typically neutral or positive. Most average scores stayed the same but nearly two times as many average scores increased as decreased in the post-rebuttal period. We hope that this made the process feel more fair for authors and I feel, having read them all, that it led to improvements in the quality of final papers.

Page Lengths

In previous years, OpenSym followed most other venues in computer science by allowing submission of two kinds of papers: full papers which could be up to 10 pages long and short papers which could be up to 4. Following some other conferences, we eliminated page limits altogether. This is the text we used in the OpenSym 2017 CFP:

There is no minimum or maximum length for submitted papers. Rather, reviewers will be instructed to weigh the contribution of a paper relative to its length. Papers should report research thoroughly but succinctly: brevity is a virtue. A typical length of a “long research paper” is 10 pages (formerly the maximum length limit and the limit on OpenSym tracks), but may be shorter if the contribution can be described and supported in fewer pages — shorter, more focused papers (called “short research papers” previously) are encouraged and will be reviewed like any other paper. While we will review papers longer than 10 pages, the contribution must warrant the extra length. Reviewers will be instructed to reject papers whose length is incommensurate with the size of their contribution.

The following graph shows the distribution of page lengths across papers in our final program.

[Figure: histogram of paper lengths for final accepted papers]

In the end, 3 of 20 published papers (15%) were over 10 pages. More surprisingly, 11 of the accepted papers (55%) were below the old 10-page limit. Fears that some have expressed that page limits are the only thing keeping OpenSym from publishing enormous rambling manuscripts seem to be unwarranted—at least so far.

Bidding

Although I won’t post any analysis or graphs, bidding worked well. With only two exceptions, every single assigned review was to someone who had bid “yes” or “maybe” for the paper in question and the vast majority went to people that had bid “yes.” However, this comes with one major proviso: people that did not bid at all were marked as “maybe” for every single paper.
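For readers unfamiliar with bidding, here is a minimal sketch (my own illustration, not EasyChair’s actual algorithm) of how bid-based assignment typically works, including the default that treats non-bidders as having bid “maybe” on everything:

```python
# A sketch of bid-based review assignment: prefer reviewers who bid "yes",
# fall back to "maybe", skip "no", and spread load evenly. All names and
# data below are invented for illustration.
def assign_reviewers(papers, bids, reviewers, per_paper=3):
    """bids maps reviewer -> {paper: "yes"|"maybe"|"no"}; missing bids count as "maybe"."""
    load = {r: 0 for r in reviewers}
    assignment = {}
    for paper in papers:
        def preference(r):
            bid = bids.get(r, {}).get(paper, "maybe")  # non-bidders default to "maybe"
            rank = {"yes": 0, "maybe": 1, "no": 2}[bid]
            return (rank, load[r])  # prefer "yes" bids, then the least-loaded reviewer
        eligible = [r for r in reviewers
                    if bids.get(r, {}).get(paper, "maybe") != "no"]
        chosen = sorted(eligible, key=preference)[:per_paper]
        for r in chosen:
            load[r] += 1
        assignment[paper] = chosen
    return assignment

papers = ["p1", "p2"]
reviewers = ["ana", "bo", "cai", "dee"]
bids = {"ana": {"p1": "yes", "p2": "no"},
        "bo":  {"p1": "no",  "p2": "yes"},
        "dee": {"p1": "yes", "p2": "maybe"}}
# "cai" never bid, so they count as "maybe" on everything
print(assign_reviewers(papers, bids, reviewers, per_paper=2))
```

Notice how the non-bidder gets pulled in wherever “yes” bids run out, which is exactly where our problem reviews came from.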

Given a reviewer pool whose diversity of expertise matches that in your pool of authors, bidding works fantastically. But everybody needs to bid. The only problems with reviewers we had were with people that had failed to bid. It might be that reviewers who don’t bid are less committed to the conference, more overextended, more likely to drop things in general, etc. It might also be that reviewers who fail to bid get poor matches which cause them to become less interested, willing, or able to do their reviews well and on time.

Having used bidding twice as chair or track-chair, my sense is that bidding is a fantastic thing to incorporate into any conference review process. The major limitations are that you need to build a program committee (PC) before the conference (rather than finding the perfect reviewers for specific papers) and you have to find ways to incentivize or communicate the importance of getting your PC members to bid.

Conclusions

The final results were a fantastic collection of published papers. Of course, it wouldn’t have been possible without the huge collection of conference chairs, associate chairs, program committee members, external reviewers, and staff supporters.

Although we tried quite a lot of new things, my sense is that nothing we changed made things worse and many changes made things smoother or better. Although I’m not directly involved in organizing OpenSym 2018, I am on the OpenSym steering committee. My sense is that most of the changes we made are going to be carried over this year.

Finally, it’s also been announced that OpenSym 2018 will be in Paris on August 22-24. The call for papers should be out soon and the OpenSym 2018 paper deadline has already been announced as March 15, 2018. You should consider submitting! I hope to see you in Paris!

This Analysis

OpenSym used the gratis version of EasyChair to manage the conference which doesn’t allow chairs to export data. As a result, data used in this postmortem was scraped from EasyChair using two Python scripts. Numbers and graphs were created using a knitr file that combines R visualization and analysis code with markdown to create the HTML directly from the datasets. I’ve made all the code I used to produce this analysis available in this git repository. I hope someone else finds it useful. Because the data contains sensitive information on the review process, I’m not publishing the data.
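The scraping scripts themselves are in the repository; as a rough, stdlib-only illustration of the approach (not the actual scripts, which target EasyChair’s specific page layout), extracting rows from an HTML table looks something like this:

```python
# A sketch of scraping tabular data (e.g., a submission list) out of HTML
# so it can be written to CSV for analysis. Uses only the standard library.
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False
    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._row.append("")
    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False
    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

# a stand-in for a page fetched from the conference system
html = "<table><tr><th>ID</th><th>Title</th></tr><tr><td>12</td><td>Some paper</td></tr></table>"
scraper = TableScraper()
scraper.feed(html)
print(scraper.rows)  # [['ID', 'Title'], ['12', 'Some paper']]
```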


This blog post was originally posted on the Community Data Science Collective blog.

The Wikipedia Adventure

I recently finished a paper that presents a novel social computing system called the Wikipedia Adventure. The system was a gamified tutorial for new Wikipedia editors. Working with the tutorial creators, we conducted both a survey of its users and a randomized field experiment testing its effectiveness in encouraging subsequent contributions. We found that although users loved it, it did not affect subsequent participation rates.

Start screen for the Wikipedia Adventure.

A major concern that many online communities face is how to attract and retain new contributors. Despite its success, Wikipedia is no different. In fact, researchers have shown that after experiencing a massive initial surge in activity, the number of active editors on Wikipedia has been in slow decline since 2007.

The number of active, registered editors (≥5 edits per month) to Wikipedia over time. From Halfaker, Geiger, and Morgan 2012.

Research has attributed a large part of this decline to the hostile environment that newcomers experience when they begin contributing. New editors often attempt to make contributions which are subsequently reverted by more experienced editors for not following Wikipedia’s increasingly long list of rules and guidelines for effective participation.

This problem has led many researchers and Wikipedians to wonder how to more effectively onboard newcomers to the community. How do you ensure that new editors to Wikipedia quickly gain the knowledge they need in order to make contributions that are in line with community norms?

To this end, Jake Orlowitz and Jonathan Morgan from the Wikimedia Foundation worked with a team of Wikipedians to create a structured, interactive tutorial called The Wikipedia Adventure. The idea behind this system was that new editors would be invited to use it shortly after creating a new account on Wikipedia, and it would provide a step-by-step overview of the basics of editing.

The Wikipedia Adventure was designed to address issues that new editors frequently encountered while learning how to contribute to Wikipedia. It is structured into different ‘missions’ that guide users through various aspects of participation on Wikipedia, including how to communicate with other editors, how to cite sources, and how to ensure that edits present a neutral point of view. The sequence of the missions gives newbies an overview of what they need to know instead of having to figure everything out themselves. Additionally, the theme and tone of the tutorial sought to engage new users, rather than just redirecting them to the troves of policy pages.

Those who play the tutorial receive automated badges on their user page for every mission they complete. This signals to veteran editors that the user is acting in good faith by attempting to learn the norms of Wikipedia.

An example of a badge that a user receives after demonstrating the skills to communicate with other users on Wikipedia.

Once the system was built, we were interested in knowing whether people enjoyed using it and found it helpful. So we conducted a survey asking editors who played the Wikipedia Adventure a number of questions about its design and educational effectiveness. Overall, we found that users had a very favorable opinion of the system and found it useful.

Survey responses about how users felt about TWA.

Survey responses about what users learned through TWA.

We were heartened by these results. We’d sought to build an orientation system that was engaging and educational, and our survey responses suggested that we succeeded on that front. This led us to ask the question – could an intervention like the Wikipedia Adventure help reverse the trend of a declining editor base on Wikipedia? In particular, would exposing new editors to the Wikipedia Adventure lead them to make more contributions to the community?

To find out, we conducted a field experiment on a population of new editors on Wikipedia. We identified 1,967 newly created accounts that passed a basic test of making good-faith edits. We then randomly invited 1,751 of these users via their talk page to play the Wikipedia Adventure. The rest were sent no invitation. Out of those who were invited, 386 completed at least some portion of the tutorial.

We were interested in knowing whether those we invited to play the tutorial (our treatment group) and those we didn’t (our control group) contributed differently in the first six months after they created accounts on Wikipedia. Specifically, we wanted to know whether there was a difference in the total number of edits they made to Wikipedia, the number of edits they made to talk pages, and the average quality of their edits as measured by content persistence.

We conducted two kinds of analyses on our dataset. First, we estimated the effect of inviting users to play the Wikipedia Adventure on our three outcomes of interest. Second, we estimated the effect of playing the Wikipedia Adventure, conditional on having been invited to do so, on those same outcomes.
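For intuition, here is a toy sketch with made-up numbers (our paper’s actual models were more involved) of the logic behind the two estimates: a simple difference in means for the effect of the invitation, and a Wald-style instrumental-variable estimate for the effect of playing among those who took up the invitation.

```python
# Toy sketch of intent-to-treat (ITT) and instrumental-variable estimates.
# All numbers below are invented; they are not our experimental data.
from statistics import mean

invited = [4, 0, 2, 5, 1, 3]   # edit counts for hypothetical invited users
control = [3, 1, 2, 4, 0, 2]   # edit counts for hypothetical uninvited users
played  = [1, 0, 0, 1, 0, 1]   # did each invited user play the tutorial?

itt = mean(invited) - mean(control)   # effect of receiving the invitation
compliance = mean(played)             # share of invitees who actually played
wald = itt / compliance               # effect of playing, per complier

print(itt, compliance, wald)
```

The second quantity scales the invitation effect by the fraction of invitees who played, which is the standard way to recover the effect of treatment among the treated when take-up is partial.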

To our surprise, we found that in both cases there were no significant effects on any of the outcomes of interest. Being invited to play the Wikipedia Adventure therefore had no effect on new users’ volume of participation either on Wikipedia in general, or on talk pages specifically, nor did it have any effect on the average quality of edits made by the users in our study. Despite the very positive feedback that the system received in the survey evaluation stage, it did not produce a significant change in newcomer contribution behavior. We concluded that the system by itself could not reverse the trend of newcomer attrition on Wikipedia.

Why would a system that was received so positively ultimately produce no aggregate effect on newcomer participation? We’ve identified a few possible reasons. One is that perhaps a tutorial by itself would not be sufficient to counter hostile behavior that newcomers might experience from experienced editors. Indeed, the friendly, welcoming tone of the Wikipedia Adventure might contrast with strongly worded messages that new editors receive from veteran editors or bots. Another explanation might be that users enjoyed playing the Wikipedia Adventure, but did not enjoy editing Wikipedia. After all, the two activities draw on different kinds of motivations. Finally, the system required new users to choose to play the tutorial. Maybe people who chose to play would have gone on to edit in similar ways without the tutorial.

Ultimately, this work shows us the importance of testing systems outside of lab studies. The Wikipedia Adventure was built by community members to address known gaps in the onboarding process, and our survey showed that users responded well to its design.

While it would have been easy to declare victory at that stage, the field deployment study painted a different picture. Systems like the Wikipedia Adventure may inform the design of future orientation systems. That said, more profound changes to the interface or modes of interaction between editors might also be needed to increase contributions from newcomers.

This blog post, and the open access paper that it describes, is a collaborative project with Sneha Narayan, Jake Orlowitz, Jonathan Morgan, and Aaron Shaw. Financial support came from the US National Science Foundation (grants IIS-1617129 and IIS-1617468), Northwestern University, and the University of Washington. We also published all the data and code necessary to reproduce our analysis in a repository in the Harvard Dataverse. Sneha posted the material in this blog post over on the Community Data Science Collective Blog.

RomancR: The Future of the Sharing-Your-Bed Economy


Today, Aaron Shaw and I are pleased to announce a new startup. The startup is based around an app we are building called RomancR that will bring the sharing economy directly into your bedrooms and romantic lives.

When launched, RomancR will bring the kind of market-driven convenience and efficiency that Uber has brought to ride sharing, and that AirBnB has brought to room sharing, directly into the most frustrating and inefficient domain of our personal lives. RomancR is Uber for romance and sex.

Here’s how it will work:

  • Users will view profiles of nearby RomancR users that match any number of user-specified criteria for romantic matches (e.g., sexual orientation, gender, age, etc).
  • When a user finds a nearby match who they are interested in meeting, they can send a request to meet in person. If they choose, users initiating these requests can attach an optional monetary donation to their request.
  • When a user receives a request, they can accept or reject the request with a simple swipe to the left or right. Of course, they can take the donation offer into account when making this decision or “counter-offer” with a request for a higher donation. Larger donations will increase the likelihood of an affirmative answer.
  • If a user agrees to meet in person, and if the couple then subsequently spends the night together — RomancR will measure this automatically by ensuring that the geolocation of both users’ phones match the same physical space for at least 8 hours — the donation will be transferred from the requester to the user who responded affirmatively.
  • Users will be able to rate each other in ways that are similar to other sharing economy platforms.
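In the same tongue-in-cheek spirit, here is a sketch of how the app might implement the co-location check in the fourth bullet (all names, thresholds, and logic here are invented for the joke):

```python
# Hypothetical check that two users' phones reported matching locations
# spanning at least 8 hours before a donation is transferred.
from datetime import datetime, timedelta

def donation_owed(pings_a, pings_b, max_km=0.05, hours=8):
    """pings_* are lists of (timestamp, lat, lon). Crude check that the
    two phones were together over a span of at least `hours`."""
    together = [t for (t, lat, lon) in pings_a
                for (t2, lat2, lon2) in pings_b
                if abs((t - t2).total_seconds()) < 300    # near-simultaneous pings
                and abs(lat - lat2) < max_km / 111        # ~111 km per degree
                and abs(lon - lon2) < max_km / 111]
    return bool(together) and max(together) - min(together) >= timedelta(hours=hours)

t0 = datetime(2017, 1, 1, 22, 0)
stay = [(t0, 47.60, -122.33), (t0 + timedelta(hours=9), 47.60, -122.33)]
print(donation_owed(stay, stay))  # True: same place over a 9-hour span
```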

Of course, there are many existing applications like Tinder and Grindr that help facilitate romance, dating, and hookups. Unfortunately, each of these still relies on old-fashioned “intrinsic” ways of motivating people to participate in romantic endeavors. The sharing economy has shown us that systems that rely on these non-monetary motivations are ineffective and limiting! For example, many altruistic and socially-driven ride-sharing systems existed on platforms like Craigslist or Ridejoy before Uber. Similarly, volunteer-based communities like Couchsurfing and Hospitality Club existed for many years before AirBnB. None of those older systems took off in the way that their sharing economy counterparts were able to!

The reason that Uber and AirBnB exploded where previous efforts stalled is that this new generation of sharing economy startups brings the power of markets to bear on the problems they are trying to solve. Money both encourages more people to participate in providing a service and also makes it socially easier for people to take that service up without feeling like they are socially “in debt” to the person providing the service for free. The result has been more reliable and effective systems for providing rides and rooms! The reason that the sharing economy works, fundamentally, is that it has nothing to do with sharing at all! Systems that rely on people’s social desire to share without money — projects like Couchsurfing — are relics of the previous century.

RomancR, which we plan to launch later this year, will bring the power and efficiency of markets to our romantic lives. You will leave your pitiful dating life where it belongs in the dustbin of history! Go beyond antiquated non-market systems for finding lovers. Why should we rely on people’s fickle sense of taste and attractiveness, their complicated ideas of interpersonal compatibility, or their sense of altruism, when we can rely on the power of prices? With RomancR, we won’t have to!

Note: Thanks to Yochai Benkler whose example of how leaving a $100 bill on the bedside table of a person with whom you spent the night can change the nature of a romantic interaction inspired the idea for this startup.

Consider the Redirect

In wikis, redirects are special pages that silently take readers from the page they are visiting to another page. Although their presence is noted in tiny gray text (see the image below), most people use them all the time and never know they exist. Redirects exist to make linking between pages easier, they populate Wikipedia’s search autocomplete list, and they are generally helpful in organizing information. In the English Wikipedia, redirects make up more than half of all article pages.

[Image: seattle_redirect]

Over the years, I’ve spent some time contributing to Redirects for Discussion (RfD). I think of RfD as an ultra-low-stakes version of Articles for Deletion, where Wikipedians decide whether to delete or keep articles. If a redirect is deleted, viewers are taken to a search results page and almost nobody notices. That said, because redirects are almost never viewed directly, almost nobody notices if a redirect is kept either!

I’ve told people that if they want to understand the soul of a Wikipedian, they should spend time participating in RfD. When you understand why arguing about and working hard to come to consensus solutions for how Wikipedia should handle individual redirects is an enjoyable way to spend your spare time — where any outcome is invisible — you understand what it means to be a Wikipedian.

That said, wiki researchers rarely take redirects into account. For years, I’ve suspected that accounting for redirects was important for Wikipedia research and that several classes of findings were noisy or misleading because most people haven’t done so. As a result, I worked with my colleague Aaron Shaw at Northwestern earlier this year to build a longitudinal dataset of redirects that can capture the dynamic nature of redirects. Our work was published as a short paper at OpenSym several months ago.

It turns out that taking redirects into account correctly (especially if you are looking at activity over time) is tricky because redirects are stored as normal pages by MediaWiki except that they happen to start with special redirect text. Like other pages, redirects can be updated and changed over time, and frequently are. As a result, taking redirects into account for any study that looks at activity over time requires looking at the text of every revision of every page.
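Concretely, on English-language wikis a redirect revision begins with the “#REDIRECT” keyword followed by a wikilink, so the per-revision check looks roughly like this (a simplified sketch; other language editions use localized keywords and MediaWiki’s own parsing handles more edge cases):

```python
# A minimal sketch of per-revision redirect detection. Classifying pages
# over time means running a check like this on every revision's text.
import re

# matches e.g. "#REDIRECT [[Seattle]]"; case-insensitive per MediaWiki
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)

def redirect_target(revision_text):
    """Return the redirect target, or None if the revision isn't a redirect."""
    match = REDIRECT_RE.match(revision_text)
    return match.group(1).strip() if match else None

print(redirect_target("#REDIRECT [[Seattle]]"))     # Seattle
print(redirect_target("Seattle is a city in ..."))  # None
```

Because a page can flip between being a redirect and being an article across revisions, the classification has to be stored per revision rather than per page, which is what our longitudinal dataset does.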

Using our dataset, Aaron and I showed that the distribution of edits across pages in English Wikipedia (a relationship that is used in many research projects) looks pretty close to log normal when we remove redirects and very different when we don’t. After all, half of all articles are really just redirects, and because they are just redirects, these “articles” are almost never edited.

[Figure: edits_over_pages]

Another puzzling finding that’s been reported in a few places — and that I have repeated several times myself — is that edits and views are surprisingly uncorrelated. I’ll write more about this later but the short version is that we found that a big chunk of this can, in fact, be explained by considering redirects.

We’ve published our code and data and the article itself is online because we paid the ACM’s open access fee to ransom the article.

Community Data Science Workshops Post-Mortem

Earlier this year, I helped plan and run the Community Data Science Workshops: a series of three (and a half) day-long workshops designed to help people learn basic programming and data science tools in order to ask and answer questions about online communities like Wikipedia and Twitter. You can read our initial announcement for more about the vision.

The workshops were organized by myself, Jonathan Morgan from the Wikimedia Foundation, long-time Software Carpentry teacher Tommy Guy, and a group of 15 volunteer “mentors” who taught project-based afternoon sessions and worked one-on-one with more than 50 participants. Interest was overwhelming, and we were ultimately constrained by the number of mentors who volunteered. Unfortunately, this meant that we had to turn away most of the people who applied. Although it was not emphasized in recruiting or used as a selection criterion, a majority of the participants were women.

The workshops were all free of charge and sponsored by the UW Department of Communication, who provided space, and the eScience Institute, who provided food.

[Image: cdsw_combo_images-1]

The curriculum for all four sessions is online:

The workshops were designed for people with no previous programming experience. Although most of our participants were from the University of Washington, we had non-UW participants from as far away as Vancouver, BC.

Feedback we collected suggests that the sessions were a huge success, that participants learned enormously, and that the workshops filled a real need in the Seattle community. Between workshops, participants organized meet-ups to practice their programming skills.

Most excitingly, just as we based our curriculum for the first session on the Boston Python Workshop’s, others have been building off our curriculum. Elana Hashman, who was a mentor at the CDSW, is coordinating a set of Python Workshops for Beginners with a group at the University of Waterloo and with sponsorship from the Python Software Foundation using curriculum based on ours. I also know of two university classes that are tentatively being planned around the curriculum.

Because a growing number of groups have been contacting us about running their own events based on the CDSW — and because we are currently making plans to run another round of workshops in Seattle late this fall — I coordinated with a number of other mentors to go over participant feedback and to put together a long write-up of our reflections in the form of a post-mortem. Although our emphasis is on things we might do differently, we provide a broad range of information that might be useful to people running a CDSW (e.g., our budget). Please let me know if you are planning to run an event so we can coordinate going forward.

Community Data Science Workshops in Seattle

Photo from the Boston Python Workshop – a similar workshop run in Boston that has inspired and provided a template for the CDSW.

On three Saturdays in April and May, I will be helping run three day-long project-based workshops at the University of Washington in Seattle. The workshops are for anyone interested in learning how to use programming and data science tools to ask and answer questions about online communities like Wikipedia, Twitter, free  and open source software, and civic media.

The workshops are for people with no previous programming experience, and the goal is to bring together researchers as well as participants and leaders in online communities. The workshops will all be free of charge and open to the public, as space allows.

Our goal is that, after the three workshops, participants will be able to use data to produce numbers, hypothesis tests, tables, and graphical visualizations to answer questions like:

  • Are new contributors to an article in Wikipedia sticking around longer or contributing more than people who joined last year?
  • Who are the most active or influential users of a particular Twitter hashtag?
  • Are people who participated in a Wikipedia outreach event staying involved? How do they compare to people who joined the project outside of the event?

If you are interested in participating, fill out our registration form here. The deadline to register is Wednesday March 26th.  We will let participants know if we have room for them by Saturday March 29th. Space is limited and will depend on how many mentors we can recruit for the sessions.

If you already have experience with Python, please consider helping out at the sessions as a mentor. Being a mentor will involve working with participants and talking them through the challenges they encounter in programming. No special preparation is required.  If you’re interested,  send me an email.

V-Day

My friend Noah mentioned the game VVVVVV. I was confused because I thought he was talking about the visual programming language vvvv. I went to Wikipedia to clear up my confusion but ended up on the article on VVVVV, which is about the Latin phrase “vi veri universum vivus vici” meaning, “by the power of truth, I, while living, have conquered the universe”.

There is no Wikipedia article on VVVVVVV. That would be ridiculous.

“When Free Software Isn’t Better” Talk

In late October, the FSF posted this video of a talk called When Free Software Isn’t (Practically) Better that I gave at LibrePlanet earlier in the year. I noticed it was public when, out of the blue, I started getting both a bunch of positive feedback about the talk as well as many people pointing out that my slides (which were rather important) were not visible in the video!

Finally, I’ve managed to edit together a version that includes the slides and posted it online and on YouTube.

The talk is very roughly based on this 2010 article and I argue that, despite our advocacy, free software isn’t always (or even often) better in practical terms. The talk moves beyond the article and tries to be more constructive by pointing to a series of inherent practical benefits grounded in software freedom principles and practice.

Most important to me though, the talk reflects my first serious attempt to bring together some of the findings from my day job as a social scientist with my work as a free software advocate. I present some nuggets from my own research and talk about what they mean for free software and its advocates.

In related news, it also seems worth noting that I’m planning on being back at LibrePlanet this March and that the FSF annual fundraiser is currently going on.

Settling in Seattle

I defended my dissertation three months ago. Since then, it feels like everything has changed.

I’ve moved from Somerville to Seattle, moved from MIT to the University of Washington, and gone from being a graduate student to a professor. Mika and I have moved out of a multi-apartment cooperative into a small apartment we’re calling Extraordinary Least Squares. We’ve gone from a broad and deep social network to (almost) starting from scratch in a new city.

As things settle and I develop a little extra bandwidth, I am trying to take time to get connected to my community. If you’re in Seattle and know me, drop me a line! If you’re in Seattle but don’t know me yet, do the same so we can fix that!

Students for Free Culture Conference FCX2013


On the weekend of April 20-21, Students for Free Culture is going to be holding its annual conference, FCX2013, at New York Law School in New York City. As a long-time SFC supporter and member, I am enormously proud to be giving the opening keynote address.

Although the program for Sunday is still shaping up, the published Saturday schedule looks great. If previous years are any indication, the conference can serve as an incredible introduction to free culture, free software, wikis, remixing, copyright, patent and trademark reform, and participatory culture. For folks who are already deeply involved, FCX is among the best places I know to connect with other passionate, creative people working on free culture issues.

I’ve been closely following and involved with SFC for years and I am particularly excited about the group that is driving the organization forward this year. If you will be in or near New York that weekend — or if you can make the trip — you should definitely try to attend.

FCX2013 is pay what you can with a $15 suggested donation. You can register online now. Travel assistance — especially for members of active SFC chapters — may still be available. I hope to see you there!

The Institute for Cultural Diplomacy and Wikipedia

A month ago, Mark Donfried from the Institute for Cultural Diplomacy (ICD) — an organization dedicated to promoting open dialogue — sent me this letter threatening me with legal action because of contributions I’ve made to Wikipedia. Yesterday, he sent me this followup threat.

According to the letters, Donfried has threatened me with legal action because I participated in a discussion on Wikipedia that resulted in his organization’s article being deleted. It is not anything I wrote in any Wikipedia article that made Donfried so upset — although Donfried is also unhappy about at least one off-hand comment I made during the deletion discussion on a now-deleted Wikipedia process page. Donfried is unhappy that my actions, in small part, have resulted in his organization not having an article in Wikipedia. He is able to threaten me personally because — unlike many people — I edit Wikipedia using my real, full name.

Donfried’s letter is the latest step in a saga that has been ongoing since last June. It has been a frustrating learning experience for me that has made me worried about Wikipedia, its processes, and its future.

In Wikipedia, debates can be won by stamina. If you care more and argue longer, you will tend to get your way. The result, very often, is that individuals and organizations with a very strong interest in having Wikipedia say a particular thing tend to win out over other editors who just want the encyclopedia to be solid, neutral, and reliable. These less-committed editors simply have less at stake and their attention is more distributed.

The ICD is a non-profit organization based in Berlin. According to its own website, a large part of the organization’s activities are based around arranging conferences. Its goals — peace, cultural interchange, human rights — are admirable and close to my heart. Its advisors and affiliates are impressive.

I had never heard of the ICD before their founder, Mark Donfried, emailed me in April 2012 asking me to give a keynote address at their conference, “The 2012 International Symposium on Cultural Diplomacy & Human Rights.” I was interested, but puzzled, because my own research seems very far afield of both “cultural diplomacy” (which I had never heard of) and human rights. I replied saying:

What would you like me to talk about — I ask because I don’t consider myself an expert in (or even particularly knowledgeable about) cultural diplomacy. Did someone else refer you to me?

Donfried replied with a long message — seemingly copy and pasted — thanking me for considering attending and asking me for details of my talk. I replied again repeating text from my previous email and asking why he was interested in me. Donfried suggested a phone call to talk about details. But by this point, I had looked around the web for information about the ICD and had decided to decline the invitation.

Among the things I found was a blog post by my friend James Grimmelmann suggesting that, at least in his case, the ICD had a history of sending unsolicited email and an apparent inability to take folks off their email lists even after repeated requests.

I also read the Wikipedia article about the ICD. Although the Wikipedia article was long and detailed, it sent off some internal Wikipedian-alarm-bells for me. The page read, to me, like an advertisement or something written by the organization being described; it simply did not read — to me — like an encyclopedia article written by a neutral third-party.

I looked through the history of the article and found that the article had been created by a user called Icd_berlin who had made no other substantive edits to the encyclopedia. Upon further examination, I found that almost all other significant content contributions were from a series of anonymous editors with IP addresses associated with Berlin. I also found that a couple edits had removed criticism when it had been added to the article. The criticism was removed by an anonymous editor from Berlin.

Criticisms on the article included links to a website called “Inside the ICD” which was a website that mostly consisted of comments by anonymous people claiming to be former interns of the ICD complaining about the working conditions at the organization. There were also many very positive descriptions of work at the ICD. A wide array of pseudonymous users on the site accused the negative commenters of being liars and detractors and the positive commenters of being ICD insiders.

I also found that there had been evidence on Wikipedia — also removed without discussion by an anonymous IP from Berlin — of an effort launched by the youth wing of ver.di, one of the largest trade unions in Germany, to “campaign for good internships at the ICD.” Although details of the original campaign have been removed from ver.di’s website, the campaign ended after the union came to an agreement with the ICD that made explicit a set of expectations and created an Intern Council.

Although the article about ICD on Wikipedia had many citations, many were to the ICD’s own website. Most of the rest were to articles that only tangentially mentioned the ICD. Many were about people with ICD connections but did not mention the ICD at all.

As a Wikipedia editor, I was worried that Wikipedia’s policies on conflict of interest, advertising, neutrality, and notability were not being served by the article in its current state. But as someone with no real experience or knowledge of the ICD, I wasn’t sure what to do. I posted a request for help on Wikipedia asking for others to get involved and offer their opinions.

It turns out, there were several editors who had tried to improve the article in the past and had been met by pro-ICD editors reverting their changes. Eventually, those editors lost patience or simply moved on to other topics.

By raising the issue again, I kicked off a round of discussion about the article. At the termination of that discussion, the article was proposed for deletion under Wikipedia’s Articles for Deletion policy. A new Wikipedia editor began working enthusiastically to keep the article by adding links and by arguing that the article should stay. The new user edited the Wikipedia article about me to accuse me of slander and defamation although they removed that claim after I tried to explain that I was only trying to help. I spent quite a bit of time trying to rewrite and improve the article during the deletion discussion and I went — link by link — through the many dozens of citations.

During the deletion discussion, Mark Donfried contacted me over email and explained that his representatives had told him that I was working against the ICD in Wikipedia. He suggested that we meet. We had a tentative plan to meet in Berlin on an afternoon last July but, in the end, I was too busy trying to submit my thesis proposal and neither of us followed up to confirm a particular time within the time window we had set. I have still never met him.

My feeling, toward the end of the deletion discussion on Wikipedia, was mostly exasperation. Somewhat reluctantly, I voted to delete the article saying:

Delete – This AFD is a complete mess for all the reasons that the article itself is. Basically: there are a small number of people who seem to have a very strong interest in the Institute for Cultural Diplomacy having an article in Wikipedia and, from what I can tell, very little else. Hessin fahem, like all the major contributors to the page, joined Wikipedia in order to participate in this issue.

This article has serious problems. I have posted a detailed list of my problems on the article talk page: primary sources, conflict of interest for nearly all substantive contributions and reading like an advert are the biggest issues. My efforts to list these problems were reverted without discussion by an anonymous editor from Berlin.

I have seen no evidence that the Institute for Cultural Diplomacy satisfies WP:ORG but I agree that it is possible that it does. I strongly agree with Arxiloxos that articles should always be fixed, and not deleted, if they are fixable. But I also know that Wikipedia does not deserve this article, that I don’t know to fix it, and that despite my efforts to address these issues (and I’ll keep trying), the old patterns of editing have continued and the article is only getting worse.

This ICD seems almost entirely based around a model that involves organizing conferences and then calling and emailing to recruit speakers and attendees. A large number of people will visit this Wikipedia article to find out more about the organization before deciding to pay for a conference or to join to do an internship. What Wikipedia shows them reads like an advert, links almost exclusively to pages on the organization’s websites and seems very likely to have been written by the organization itself. We are doing an enormous disservice to our readers by keeping this page in its current form.

If somebody wants to make a serious effort to improve the article, I will help and will happily reconsider my !vote. But after quite a bit of time trying to raise interest and to get this fixed, I’m skeptical this can be addressed and my decision reflects this fact. —mako 05:18, 12 June 2012 (UTC)

I concluded that although the organization might be notable according to Wikipedia’s policies and although the Wikipedia article about it might be fixable, the pattern of editing gave me no faith that it could be fixed until something changed.

When the article was deleted, things became quiet. Several months later a new article was created — again, by an anonymous user with no other edit history. Although recreations of previously deleted pages tend to attract close scrutiny, this page was created under a slightly different name: “The Institute of Cultural Diplomacy.” It was not noticed.

Deleted Wikipedia articles are only supposed to be recreated after they go through a process called deletion review. Because the article was recreated out of this process, I nominated it for what is called speedy deletion under a policy specifically dealing with recreated articles. It was deleted again. Once again, things were quiet.

In January, it seems, the “Inside the ICD website” was threatened with a lawsuit by the ICD and the maintainers of the site took it down with the following message:

Apparently, the ICD is considering filing a lawsuit against this blog and it will now be taken down. We completely forgot about this blog. Let’s hope no one is being sued. Farewell.

On February 25, the Wikipedia article on ICD was recreated — once again out of process and by a user with almost no previous edit history. The next day, I received an email from Mark Donfried. In the message, Donfried said:

Please note that the ICD is completely in favor of fostering open dialogue and discussions, even critical ones, however some of your activities are raising serious questions about the motives behind your actions and some even seem to be motives of sabotage, since they resulted in ICD not having any Wikipedia page at all.

We are deeply concerned regarding these actions of yours, which are causing us considerable damages. As the person who initiated these actions with Wikipedia and member of the board of Wikipedia [1], we would therefore request your answer regarding our questions below within the next 10 days (by March 6th). If we do not receive your response we will unfortunately have to consider taking further legal actions with these regards against you and other anonymous editors.

I responded to Donfried to say that I did not think it was prudent to speak with him while he was threatening me. Meanwhile, other Wikipedia editors nominated the ICD article for deletion once again and unanimously decided to delete it. And although I did not participate in the discussion, Donfried emailed again with more threats of legal action hours after the ICD article was deleted:

[A]s the case of the ICD and its presentation on the Wikipedia has seriously worsened in recent days, we see no alternative but to forward this case (including all relevant visible and anonymous contributors) to our legal representatives in both USA and Europe/Germany as well as to the authorities and other corresponded organizations in order to find a remedy to this case.

Donfried has made it very clear that his organization really wants a Wikipedia article and that they believe they are being damaged without one. But the fact that he wants one doesn’t mean that Wikipedia’s policies say he should have one. Anonymous editors in Berlin and in unknown locations have made it clear that they really want a Wikipedia article about the ICD that does not include criticism. Not only do Wikipedia’s policies and principles not guarantee them this, but Wikipedia might be hurt as a project when it happens.

The ICD claims to want to foster open dialogue and criticism. I think they sound like a pretty nice group working toward issues I care about personally. I wish them success.

But there seems to be a disconnect between their goals and the actions of both their leader and proponents. Because I used my real name and was skeptical about the organization on discussion pages on Wikipedia, I was tracked down and threatened. Donfried insinuated that I was motivated to “sabotage” his organization and threatened legal action if I did not answer his questions. The timing of his first letter — the day after the ICD page was recreated — meant that I was unwilling to act on my commitment to Wikipedia and its policies.

I have no problem with the ICD and I deeply regret being dragged into this whole mess simply because I wanted to improve Wikipedia. That said, Donfried’s threat has scared me off from attempts to improve the ICD articles. I suspect I will not edit ICD pages in Wikipedia in the future.

The saddest part for me is that I recognize that what is in effect bullying is working. There are currently Wikipedia articles about the ICD in many languages. For several years, ICD has had an article on English Wikipedia. For almost all of that period, that article has consisted entirely of universally positive text, without criticism, and has been written almost entirely by anonymous editors who have only contributed to articles related to the ICD.

In terms of the ICD and its article on Wikipedia, I still have hope. I encourage Donfried and his “representatives” to create accounts on Wikipedia with their full names — just like I have. I encourage them to engage in open dialogue in public on the wiki. I encourage them to go through deletion review, make any conflicts of interest they have unambiguously clear, and work with demonstrably non-conflicted editors on Wikipedia to establish notability under Wikipedia’s policies. The result still can be an awesome, neutral article about their organization. I have offered both advice on how to do this and help in that process in the past. I have faith this can happen and I will be thrilled when it does.

But the general case still worries me deeply. If I can be scared off by threats like these, anybody can. After all, I have friends at the Wikimedia Foundation, a position at Harvard Law School, and am close friends with many of the world’s greatest lawyer-experts on both wikis and cyberlaw. And even I am intimidated into not improving the encyclopedia.

I am concerned by what I believe is the more common case — where those with skin in the game will fight harder and longer than a random Wikipedian. The fact that it’s usually not me on the end of the threat gives me lots of reasons to worry about Wikipedia at a time when its importance and readership continues to grow as its editor-base remains stagnant.

[1] It’s a minor mistake but worth pointing out that I am not on the “board of Wikipedia”; I am on its advisory board, which carries no power or responsibility within the organization. Sometimes, the foundation asks for my advice and I happily give it.

1-800-INTERNET.COM

I just returned home from Aaron Swartz’s funeral in Chicago. Aaron was a good friend. The home I’ve returned to is an apartment that was Aaron’s before it was mine, that I have lived in with Aaron during several stints, and that I still share with many of his old books and posters. Although I’ve spent what feels like most of the last five days reading things that people have written about Aaron, I’m still processing and digesting myself. Right now, I’m very sad and at a loss for words.

While I reflect, I wanted to share this video recently put online by Finne Boonen. The video was made in 2006 at a Web 1.0 Elevator Pitch Competition held at Wikimania 2006 — about a year after both Aaron and I moved to Cambridge and met. The goal of the contest was to pitch Web 1.0 DotCom business ideas to a team of real Web 1.0 investors like it was still 1999.

Aaron and I formed a team along with SJ Klein (who I traveled to the funeral with this week), and Wikimania general counsel and interim executive director Brad Patrick. The video shows how — as Danny O’Brien has reminded us — Aaron was funny. He came up with many of our team’s best lines in addition to checking our Web 1.0 boxes for “tech guru” and “Stanford dropout.” Our pitch — for 1-800-INTERNET.COM — is in the video below. The transcript was done by Phoebe Ayers in Facebook and the video is also available in WebM.

SJ: You know, Mako and I had some pretty good ideas for improving connectivity to the internet, and we think we can reach 90% of the world’s population.

So think about this. You’re sitting in a Starbucks, and you need to connect to the internet. But you can’t, because there’s no internet. But what is there, near every Starbucks? There’s a payphone! You pick up the payphone, and you call…. 1-800-INTERNET. You can connect to our bank of researchers on our fast T1 connections and get any information you need!

So, we don’t actually have 1-800-INTERNET yet, we have 1-800-225-3224, so the first thing we need to do is buy the number.

So here’s Mako, who is our web designer from UC Santa Cruz and Bradford, our financial guru, and Aaron, who’s handling all of our technical implementation. But Mako, you should explain the earballs.

Mako: So, so, so yeah, so most people on the Internet are going for the eyeballs, but they’ve just left all of these … earballs. So I have some experience in web design, and it’s true that this isn’t really a website, but we still need good web design. So, so, I’ve actually got a really experienced team, we can go into later, and we have some really great earcons … not icons, but earcons..

And it’s going to be all together, not apart like some of the websites. It’s going to be together.

Brad: so how does this work technically?

Aaron: Well, I mean, so I only spent one year at Stanford but that’s OK, because there are new developmental technologies, we’re going to throw away all that old stuff, we’re going to use really reliable and efficient well-designed code that everyone can clearly understand, and write the whole thing in Perl. I know this is a risk, but I am confident that Perl is going to destroy those old C websites. No one will write websites in C anymore once we do this, it’s going to be so much faster, and so dynamic, everything’s going to be like, on top of everything. It’s going to be great.

Bradford: So here’s the business model. It’s really really simple, and it’s a really really great idea. It’s all about the licensing. Because what we’re going to have are these underlying audio ads, While you’re on the phone you’re going to hear this subliminal advertising message. And the way it works is really really cool, because it’s really really low volume, it’s high impact! And it’s even better, because we license it, and the way it works is when a caller calls 1-800-Internet, they’re hearing the ad, but so is the representative, so we get to bill ’em twice!

So that’s it:

All: 1-800-INTERNET.COM

We did not win and I still believe that we were robbed.

Goodbye PyBlosxom, Hello WordPress

Since 2004, I’ve used the blogging software PyBlosxom. Over that time, the software has served me well and I have even written a series of patches and plugins.

PyBlosxom is blog software designed for hackers. It assumes you already have a text editor you love and relies on features of a POSIX filesystem instead of a relational database. It’s designed so you can keep your blog under revision control (since 2004 I’ve used GNU Arch, baz, bzr, and now git). It is also hackers’ software in the sense that you should expect to write code to use it (e.g., the configuration is pure Python). I love it.

What PyBlosxom does not have is a large community. This summer, the most recent long-time maintainer of the project, Will Kahn-Greene, stepped down. Although the project eventually found a new maintainer, the reality is that the project entered maintenance mode years ago and many features available in more popular blogging platforms are unlikely to make it into PyBlosxom. The situation with comment spam is particularly dire. I’ve written several antispam plugins, but time has shown that I don’t have either the expertise or the time to make them as awesome as they need to be to really work in today’s web.

So, after many months of weighing, waffling, and planning, I’ve switched to WordPress — a great piece of free software with an enormous and established community. As you’ll know if you’ve read my interview on The Setup, I think a lot about the technology I surround myself with. I considered WordPress when I started my blog back in 2004 and rejected it soundly. Eight years ago, I would have laughed at you if you told me I’d be using it today; if PyBlosxom is for hackers, WordPress is designed for everyone else. But the way I evaluate software has changed over that period.

In the nineties, I used to download every new version of the Linux kernel to compile it — it took hours! — to try out the latest features. Configurability, hackability, and the ability to write my own features were — after a point — more important than the features the software came with. Today, I’m much more aware of the fact that for all the freedom that my software gives me, I simply do not have the time, energy, or inclination to take advantage of that freedom to hack very often.

Today, I give much more value to software that is not just free, but that is maintained by a community of people who can do all the work that I would do if I had unlimited time. Although I don’t have the time or experience to make WordPress do everything I would like, the collective of all WordPress users does. And they’ve done a lot of it already!

The flip side matters as well: Today, I give more value to other people using my software. When WordPress doesn’t do something and I write a plugin or patch, there are tons of people ready to pick it up and use it and perhaps even to collaborate on it with me. Want to guess how many patches my PyBlosxom plugins have received? None, if my memory serves me.

In the past, I’ve written about how free software is a victory even when it doesn’t build a community. I still believe that. But the large communities at the heart of the most successful free software communities (the promise of “open source”) are deeply important in a way that I increasingly value.

In that spirit: If you want to make the jump from PyBlosxom to WordPress, I’ve shared a Git repository with the scripts I used and wrote for the transition.
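If you’re curious what the core of such a conversion involves, here’s a toy sketch — not the actual scripts from that repository — of one step: rendering a single flat-file post as an RSS-style item, the rough shape that WordPress importers consume. The element names here are illustrative; the real scripts also carry over dates, tags, and comments.

```python
from xml.sax.saxutils import escape

# Toy sketch of one step in a flat-file-to-WordPress migration:
# wrap a post's title and body in an RSS-style <item>. The title is
# XML-escaped; the HTML body is passed through inside a CDATA block.
def to_item(title, body):
    return (
        "<item>\n"
        f"  <title>{escape(title)}</title>\n"
        f"  <content:encoded><![CDATA[{body}]]></content:encoded>\n"
        "</item>"
    )

print(to_item("Hello & Goodbye", "<p>My first post</p>"))
```

Loop that over every entry file, add the surrounding feed envelope, and you have something WordPress’s import tools can digest.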