Community Data Science Workshops in Seattle

Photo from the Boston Python Workshop – a similar workshop run in Boston that has inspired and provided a template for the CDSW.

On three Saturdays in April and May, I will be helping run three day-long project-based workshops at the University of Washington in Seattle. The workshops are for anyone interested in learning how to use programming and data science tools to ask and answer questions about online communities like Wikipedia, Twitter, free and open source software, and civic media.

The workshops are for people with no previous programming experience, and the goal is to bring together researchers as well as participants and leaders in online communities. The workshops will all be free of charge and open to the public, space permitting.

Our goal is that, after the three workshops, participants will be able to use data to produce numbers, hypothesis tests, tables, and graphical visualizations to answer questions like:

  • Are new contributors to an article in Wikipedia sticking around longer or contributing more than people who joined last year?
  • Who are the most active or influential users of a particular Twitter hashtag?
  • Are people who participated in a Wikipedia outreach event staying involved? How do they compare to people who joined the project outside of the event?

If you are interested in participating, fill out our registration form here. The deadline to register is Wednesday, March 26th. We will let participants know if we have room for them by Saturday, March 29th. Space is limited and will depend on how many mentors we can recruit for the sessions.

If you already have experience with Python, please consider helping out at the sessions as a mentor. Being a mentor will involve working with participants and talking them through the challenges they encounter in programming. No special preparation is required. If you’re interested, send me an email.

V-Day

My friend Noah mentioned the game VVVVVV. I was confused because I thought he was talking about the visual programming language vvvv. I went to Wikipedia to clear up my confusion but ended up on the article on VVVVV, which is about the Latin phrase “vi veri universum vivus vici” meaning, “by the power of truth, I, while living, have conquered the universe”.

There is no Wikipedia article on VVVVVVV. That would be ridiculous.

Aaron Swartz — A Year Later

My friend Aaron Swartz died a little more than a year ago. This time last year, I was spending much of my time speaking with journalists and reading what they were writing about Aaron.

Since the anniversary of his death, I have tried to take time to remember Aaron. I’ve returned to the things I wrote and the things I said including this short article — published last year in Red Pepper — that SJ Klein and I wrote together but that I forgot to mention on my blog.

I’m also excited to see that a documentary film about Aaron premiered at the Sundance Film Festival last week. I was interviewed for the film but am not in it.

As I said last year at a memorial for Aaron, I think about Aaron frequently and often think about my own decisions in terms of what Aaron would have done. I continue to be optimistic about the potential for Aaron-inspired action.

My Geekhouse Bike Frame

In 2011, Mika and I bought in big at the Boston Red Bones party’s charity raffle (supporting MassBike and NEMBA) and came out huge. I won $500 off a custom frame at Geekhouse Bikes.

For years, Mika and I have been planning to do the Tour d’Afrique route (Cape Town to Cairo), unsupported, on bike. People who do this type of ride sometimes use an expedition touring frame. I worked with Marty Walsh at Geekhouse to design a bike based on this idea. The concept was a rugged steel touring frame, built for my body and comfortable over long distances, with two quirks:

  1. It’s designed for 26 inch mountain bike wheels and mountain bike components to ensure that the bike is repairable with parts from the kinds of cheap mountain bikes that can be found almost everywhere in the world.
  2. It includes S&S torque couplers that let me split the frame in half to travel with the bike as standard luggage.

As our pan-Africa trip kept getting pushed back, so did the need for the bike. Last week, I finally picked up the finished bike from Marty’s shop in Boston. It is gorgeous. I absolutely love it.

Four pictures of the Geekhouse frame.

I’m looking forward to building up the bicycle over the next couple months and I’ll post more pictures when it’s finished. I am blown away by Marty’s craftsmanship and attention to detail. I am psyched that his donation made this bike possible and that I was able to get the frame while helping cycling in Massachusetts!

“When Free Software Isn’t Better” Talk

In late October, the FSF posted this video of a talk called When Free Software Isn’t (Practically) Better that I gave at LibrePlanet earlier in the year. I noticed it was public when, out of the blue, I started getting a bunch of positive feedback about the talk, along with many people pointing out that my slides (which were rather important) were not visible in the video!

Finally, I’ve managed to edit together a version that includes the slides and posted it online and on YouTube.

The talk is very roughly based on this 2010 article and I argue that, despite our advocacy, free software isn’t always (or even often) better in practical terms. The talk moves beyond the article and tries to be more constructive by pointing to a series of inherent practical benefits grounded in software freedom principles and practice.

Most important to me though, the talk reflects my first serious attempt to bring together some of the findings from my day job as a social scientist with my work as a free software advocate. I present some nuggets from my own research and talk about what they mean for free software and its advocates.

In related news, it also seems worth noting that I’m planning on being back at LibrePlanet this March and that the FSF annual fundraiser is currently going on.

Settling in Seattle

Seattle from the air

I defended my dissertation three months ago. Since then, it feels like everything has changed.

I’ve moved from Somerville to Seattle, moved from MIT to the University of Washington, and gone from being a graduate student to a professor. Mika and I have moved out of a multi-apartment cooperative into a small apartment we’re calling Extraordinary Least Squares. We’ve gone from a broad and deep social network to (almost) starting from scratch in a new city.

As things settle and I develop a little extra bandwidth, I am trying to take time to get connected to my community. If you’re in Seattle and know me, drop me a line! If you’re in Seattle but don’t know me yet, do the same so we can fix that!

Doctor of Philosophy

On Wednesday, I successfully defended my PhD dissertation in front of a ridiculously packed house at the MIT Media Lab. I am humbled by the support shown by the MIT Sloan, Media Lab, and Harvard communities. Earlier today, I finished up paperwork and submitted my archival copies. I’m done.

Although I’ve often heard PhDs described as emotional roller coasters, I feel enormously blessed in that I honestly can’t relate. My eight years at MIT and Harvard have been almost universally positive and I have learned and grown indescribably. As excited as I am about my next chapter at the University of Washington, I’m going to miss my life here. Deeply.

My dissertation was three essays on volunteer mobilization in peer production. Once I have a chance to catch up and recover, I’ll be posting the previously unpublished pieces. The Remixing Dilemma was included in the dissertation and is already online. The Media Lab AV team shot professional video of the talk. When I get a copy of the video, I’ll post that too.

But because I think it’s important, I’ve formatted and published the acknowledgments section of the dissertation today. Although there are too many folks to thank, I’ve highlighted the contributions of my co-authors, and friends, Aaron Shaw and Andrés Monroy Hernández and my almost unbelievably incredible group of advisors: Eric von Hippel, Yochai Benkler, Mitch Resnick, and Tom Malone.

The Wikipedia Gender Gap Revisited

In a new paper, recently published in the open access journal PLOS ONE, Aaron Shaw and I build on new research in survey methodology to describe a method for estimating bias in opt-in surveys of contributors to online communities. We use the technique to reevaluate the most widely cited estimate of the gender gap in Wikipedia.

A series of studies have shown that Wikipedia’s editor-base is overwhelmingly male. This extreme gender imbalance threatens to undermine Wikipedia’s capacity to produce high quality information from a full range of perspectives. For example, many articles on topics of particular interest to women tend to be under-produced or of poor quality.

Given the open and often anonymous nature of online communities, measuring contributor demographics is a challenge. Most demographic data on Wikipedia editors come from “opt-in” surveys where people respond to open, public invitations. Unfortunately, very few people answer these invitations. Results from opt-in surveys are unreliable because respondents are rarely representative of the community as a whole. The most widely cited estimate, from a large 2008 survey by the Wikimedia Foundation (WMF) and UN University in Maastricht (UNU-MERIT), suggested that only 13% of contributors were female. However, the very same survey suggested that less than 40% of Wikipedia’s readers were female. We know, from several reliable sources, that Wikipedia’s readership is evenly split by gender, which is a sign of bias in the WMF/UNU-MERIT survey.

In our paper, we combine data from a nationally representative survey of the US by the Pew Internet and American Life Project with the opt-in data from the 2008 WMF/UNU-MERIT survey to come up with revised estimates of the Wikipedia gender gap. The details of the estimation technique are in the paper, but the core steps are:

  1. We use the Pew dataset to provide baseline information about Wikipedia readers.
  2. We apply a statistical technique called “propensity scoring” to estimate the likelihood that a US adult Wikipedia reader would have volunteered to participate in the WMF/UNU-MERIT survey.
  3. We follow a process originally developed by Valliant and Dever to weight the WMF/UNU-MERIT survey to “correct” for estimated bias.
  4. We extend this weighting technique to Wikipedia editors in the WMF/UNU data to produce adjusted estimates of the demographics of their sample.
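To make steps 2 through 4 concrete, here is a minimal sketch in Python. It is not the code from the paper: it assumes a single categorical covariate, estimates the propensity as a simple cell proportion rather than fitting a model, and uses inverse-odds weights. The function names (`cell_weights`, `weighted_share`) and all of the counts are made up for illustration.

```python
from collections import Counter


def cell_weights(reference_cells, optin_cells):
    """Return one inverse-odds weight per covariate cell seen in the opt-in sample."""
    ref = Counter(reference_cells)
    opt = Counter(optin_cells)
    weights = {}
    for cell, n_opt in opt.items():
        # Estimated propensity: share of the pooled cases in this cell
        # that came from the opt-in sample.
        p = n_opt / (n_opt + ref[cell])
        # Cells that are over-represented among opt-in respondents
        # receive smaller weights than under-represented cells.
        weights[cell] = (1 - p) / p
    return weights


def weighted_share(cells, weights, target):
    """Weighted proportion of respondents whose cell equals `target`."""
    total = sum(weights[c] for c in cells)
    return sum(weights[c] for c in cells if c == target) / total


# Made-up toy data, NOT figures from either survey.
reference_readers = ["female"] * 50 + ["male"] * 50  # stand-in for the representative sample
optin_readers = ["female"] * 20 + ["male"] * 80      # stand-in for opt-in readers
optin_editors = ["female"] * 10 + ["male"] * 90      # stand-in for opt-in editors

# Steps 2 and 3: estimate propensities from readers and turn them into weights.
weights = cell_weights(reference_readers, optin_readers)
adjusted_reader_share = weighted_share(optin_readers, weights, "female")

# Step 4: reuse the same weights for the editors in the opt-in sample.
adjusted_editor_share = weighted_share(optin_editors, weights, "female")

print(adjusted_reader_share, adjusted_editor_share)
```

In this toy example the weighted opt-in readers match the even split of the reference sample, and applying the same weights to editors moves their female share above the raw 10%. A real analysis would use several covariates and a fitted propensity model, and would need to handle cells that appear among editors but not among readers.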

Using this method, we estimate that the proportion of female US adult editors was 27.5% higher than the original study reported (22.7%, versus 17.8%), and that the total proportion of female editors was 26.8% higher (16.1%, versus 12.7%). These findings are consistent with other work showing that opt-in surveys tend to undercount women.
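These percentages are relative increases over the original estimates rather than percentage-point differences. As a quick check using the rounded figures quoted above:

```latex
\[
\frac{22.7 - 17.8}{17.8} \approx 0.275,
\qquad
\frac{16.1 - 12.7}{12.7} \approx 0.268
\]
```

In percentage points, the adjusted estimates are 4.9 and 3.4 points higher, respectively.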

Overall, these results reinforce the basic substantive finding that women are vastly under-represented among Wikipedia editors.

Beyond Wikipedia, our paper describes a method that online communities can adopt to estimate contributor demographics from opt-in surveys in a way that is more credible than relying on opt-in data alone. Advertising-intelligence firms like ComScore and Quantcast provide demographic data on the readership of an enormous proportion of websites. With these sources, almost any community can use our method (and source code) to carry out a similar analysis by: (1) surveying a community’s readers (or a random subset) with the same instrument used to survey contributors; (2) combining results for readers with reliable demographic data about the readership population from a credible source; (3) reweighting survey results using the method we describe.

Although our new estimates will not help us close the gender gap in Wikipedia or address its troubling implications, they give us a better picture of the problem. Additionally, our method offers an improved tool to build a clearer demographic picture of other online communities in general.

Iceowl’s Awesome New Icon

If you’re a Debian user, you are probably already familiar with some of the awesome icons for IceWeasel (rebranded Mozilla Firefox), IceDove (rebranded Mozilla Thunderbird) and IceApe (rebranded Mozilla SeaMonkey).

Icons for IceWeasel, IceDove, and IceApe.

I was pretty ambivalent about the decision to rebrand Firefox until I saw some of the proposed IceWeasel icons which, in my humble opinion, were just too cute, and awesome, to pass up.

An older IceWeasel icon.

Until very recently, however, IceOwl (rebranded Mozilla Sunbird) had no such awesome icon. Quite a while ago, I filed bug #658664 in Debian complaining that “iceowl does not include awesome icy owl icons.” I wrote:

I was extremely disappointed when I installed Iceowl and discovered that it does not ship with an awesome logo or icons showing a picture of an “IceOwl.” Instead, it seems to be represented by picture of a (boring) paper calendar which is very generic and not awesome at all.

IceWeasel, IceDove, and IceApe each include extremely awesome logos/icons that have really cool looking white illustrations of “icy” weasels, doves, and apes. IceOwl needs a similarly awesome logo to use as its icon.

This bug seems particularly egregious because owls actually live in icy climates and come in white versions! For example:

https://commons.wikimedia.org/wiki/File:Snowy_Owl_-_Schnee-Eule.jpg

While illustrators need to imagine what an “ice ape” or “ice weasel” might look like, there is no such need for imagination in the case of an ice owl!

As far as I’m concerned, this bug should be release critical. Hopefully, someone will upload a patch quickly!

Finally, after many months of all of us suffering in silence, Nick Morrott came along and fixed the bug with the creation of this new, incredibly awesome, icy owl logo!

The new IceOwl icon.

Job Market Materials

Last year, I applied for tenure-track academic jobs in several communication departments, information schools, and HCI-focused computer science programs with a tradition of hiring social scientists.

Being “on the market” — as it is called — is both scary and time consuming. Like me, many candidates have never been on the market before. Candidates are asked to produce documents in genres — e.g., cover letters, research statements, teaching statements, diversity statements — that most candidates have never written, read, or even heard of.

Candidates often rely on their supervisors for advice. I did so and my advisors were extremely helpful. The reality, however, is that although candidates’ advisors may sit on hiring committees, most have not been on the candidates’ side of the job market themselves for years or even decades.

The Internet is full of websites for people on the market, like the academic jobs wiki, Academia StackExchange, and the Chronicle of Higher Education forums. Confused and insecure candidates ask questions of the form, “Does blank matter?” and the answer is usually, “Doing/having blank may help/hurt, but it is only one factor of many.” The result is that candidates worry about everything. Then they worry about what they should be worrying about, but are not.

The most helpful thing, for me, was to read and synthesize the material submitted by recent successful job market candidates. For example, Michael Bernstein — a friend from MIT, now at Stanford — published his research and teaching statements on his website and I found both useful as I prepared mine. That said, I was surprised by how little material like this I could find on the web. For example, I could not find any examples of recent job market cover letters from successful candidates in fields close to mine.

So to help fill this gap, I am publishing all of my job market material. I’ve posted both the PDFs of the material I submitted as well as the LaTeX templates I used to generate the documents in my packet. My packet included:

  • Research Statement (TeX): A description of my research to date and my current trajectory. Following a convention I have seen others follow, I “cited” my own work (but only my work) to form a curated bibliography of my own publications and working papers.
  • Teaching Statement (TeX): A two-page description of my approach to teaching, a list of my teaching experience, and a description of sample courses.
  • Diversity Statement (TeX): A description of how I think about diversity and how I have engaged, and will engage, with it in my teaching and research.
  • Cover Letter (TeX): Each application I sent had a customized cover letter. I wrote mine on MIT letterhead. Since each letter is different, I have published the letter I sent to the department where I took a job (UW Communication). Because my new department did not request research and teaching statements, the cover letter includes material taken from both. For departments that requested separate statements, I limited myself to a shorter (1.5 pages) version of the letter with a similar structure.
  • Writing Samples: I included three or four of my papers with every application. The selection of articles changed a bit depending on the department but I included at least one single-authored paper in each packet.
  • Letters of Recommendation: Because I didn’t write these and haven’t seen them, I can’t share them. I requested letters from my four committee members: Eric von Hippel, Yochai Benkler, Mitch Resnick, and Tom Malone.
  • Curriculum Vitae (TeX): I have tried to keep my CV up-to-date during graduate school. I keep my CV in git and have a little CGI script automatically rebuild the published version whenever an update is committed.

I hope people going “on the market” will find these materials useful. Obviously, you should not copy or reuse the text of any of my material. It is your application, after all. That said, please do help yourself to the formatting and structure.

Finally, I would encourage anyone who builds on my material to republish their own material to help other candidates. If you do, I’d appreciate a link back or comment on this blog post so that my readers can find your improvements.

Resurrecting Debian Seattle

When I last lived in Seattle, nearly a decade ago, I hosted the “Debian Seattle Social” email list. When I left the city, the mailing list eventually fell victim to bitrot.

When Allison Randall asked me about the list a couple months ago, I decided that moving back to Seattle was a good excuse to work with Allison and some others to revive the community. Toward that end, I’ve put up a little website and created a new mailing list. It’s hosted on Alioth this time, which will be more reliable than me. Since it has been years, we have not moved over the old subscriber list, so you’ll have to sign up again if you were on it before.

If you’re a Debian developer or user and you’d like to hear about infrequent Debian social gatherings in the Seattle area, you should sign up on the list!

London and Michigan

I’ll be spending the week after next (June 17-23) in London for the annual meeting of the International Communication Association where I’ll be presenting a paper. This will be my first ICA and I’m looking forward to connecting with many new colleagues in the discipline. If you’re one of them, reading this, and would like to meet up in London, please let me know!

Starting June 24th, I’ll be in Ann Arbor, Michigan for four weeks of the ICPSR summer program in applied statistics at the Institute for Social Research. I have been wanting to sign up for some of their advanced methods classes for years and am planning to take the opportunity this summer before I start at UW. I’ll be living with my friends and fellow Berkman Cooperation Group members Aaron Shaw and Dennis Tennen.

I would love to make connections and meet people in both places so, if you would like to meet up, please get in contact.