Studying the relationship between remixing & learning

With more than 10 million users, the Scratch online community is the largest online community where kids learn to program. Since it was created, a central goal of the community has been to promote “remixing” — the reworking and recombination of existing creative artifacts. As the video above shows, remixing programming projects in the current web-based version of Scratch is as easy is as clicking on the “see inside” button in a project web-page, and then clicking on the “remix” button in the web-based code editor. Today, close to 30% of projects on Scratch are remixes.

Remixing plays such a central role in Scratch because its designers believed that remixing can play an important role in learning. After all, Scratch was designed first and foremost as a learning community with its roots in the Constructionist framework developed at MIT by Seymour Papert and his colleagues. The design of the Scratch online community was inspired by Papert’s vision of a learning community similar to Brazilian Samba schools (Henry Jenkins writes about his experience of Samba schools in the context of Papert’s vision here), and a comment Marvin Minsky made in 1984:

Adults worry a lot these days. Especially, they worry about how to make other people learn more about computers. They want to make us all “computer-literate.” Literacy means both reading and writing, but most books and courses about computers only tell you about writing programs. Worse, they only tell about commands and instructions and programming-language grammar rules. They hardly ever give examples. But real languages are more than words and grammar rules. There’s also literature – what people use the language for. No one ever learns a language from being told its grammar rules. We always start with stories about things that interest us.

In a new paper — titled “Remixing as a pathway to Computational Thinking” — that was recently published at the ACM Conference on Computer Supported Collaborative Work and Social Computing (CSCW) conference, we used a series of quantitative measures of online behavior to try to uncover evidence that might support the theory that remixing in Scratch is positively associated with learning.

scratchblocksOf course, because Scratch is an informal environment with no set path for users, no lesson plan, and no quizzes, measuring learning is an open problem. In our study, we built on two different approaches to measure learning in Scratch. The first approach considers the number of distinct types of programming blocks available in Scratch that a user has used over her lifetime in Scratch (there are 120 in total) — something that can be thought of as a block repertoire or vocabulary. This measure has been used to model informal learning in Scratch in an earlier study. Using this approach, we hypothesized that users who remix more will have a faster rate of growth for their code vocabulary.

Controlling for a number of factors (e.g. age of user, the general level of activity) we found evidence of a small, but positive relationship between the number of remixes a user has shared and her block vocabulary as measured by the unique blocks she used in her non-remix projects. Intriguingly, we also found a strong association between the number of downloads by a user and her vocabulary growth. One interpretation is that this learning might also be associated with less active forms of appropriation, like the process of reading source code described by Minksy.

The second approach we used considered specific concepts in programming, such as loops, or event-handling. To measure this, we utilized a mapping of Scratch blocks to key programming concepts found in this paper by Karen Brennan and Mitchel Resnick. For example, in the image below are all the Scratch blocks mapped to the concept of “loop”.

scratchblocksctWe looked at six concepts in total (conditionals, data, events, loops, operators, and parallelism). In each case, we hypothesized that if someone has had never used a given concept before, they would be more likely to use that concept after encountering it while remixing an existing project.

Using this second approach, we found that users who had never used a concept were more likely to do so if they had been exposed to the concept through remixing. Although some concepts were more widely used than others, we found a positive relationship between concept use and exposure through remixing for each of the six concepts. We found that this relationship was true even if we ignored obvious examples of cutting and pasting of blocks of code. In all of these models, we found what we believe is evidence of learning through remixing.

Of course, there are many limitations in this work. What we found are all positive correlations — we do not know if these relationships are causal. Moreover, our measures do not really tell us whether someone has “understood” the usage of a given block or programming concept.However, even with these limitations, we are excited by the results of our work, and we plan to build on what we have. Our next steps include developing and utilizing better measures of learning, as well as looking at other methods of appropriation like viewing the source code of a project.

This blog post and the paper it describes are collaborative work with Sayamindu Dasgupta, Andrés Monroy-Hernández, and William Hale. The paper is released as open access so anyone can read the entire paper here. This blog post was also posted on Sayamindu Dasgupta’s blog and on Medium by the MIT Media Lab.

Unhappy Birthday Suspended

More than 10 years ago, I launched Unhappy Birthday in a fit of copyrighteous exuberance. In the last decade, I have been interviewed on the CBC show WireTap and have received an unrelenting stream of hate mail from random strangers.

With a recently announced settlement suggesting that “Happy Birthday” is on its way into the public domain, it’s not possible for even the highest-protectionist in me to justify the continuation of the campaign in its original form. As a result, I’ve suspended the campaign while I plan my next move. Here’s the full text of the notice I posted on the Unhappy Birthday website:

Unfortunately, a series of recent legal rulings have forced us to suspend our campaign. In 2015, Time Warner’s copyright claim to “Happy Birthday” was declared invalid. In 2016, a settlement was announced that calls for a judge to officially declare that the song is in the public domain.

This is horrible news for the future of music. It is horrible news for anybody who cares that creators, their heirs, etc., are fairly remunerated when their work is performed. What incentive will there be for anybody to pen the next “Happy Birthday” knowing that less than a century after their deaths — their estates and the large multinational companies that buy their estates — might not be able to reap the financial rewards from their hard work and creativity?

We are currently planning a campaign to push for a retroactive extension of copyright law to place “Happy Birthday,” and other works, back into the private domain where they belong! We believe this is a winnable fight. After all, copyright has been retroactively extended before! Stay tuned! In the meantime, we’ll keep this page here for historical purposes.

—“Copyrighteous“ Benjamin Mako Hill (2016-02-11)

Welcome Back Poster

My office door is on the second floor in front the major staircase in my building. I work with my door open so that my colleagues and my students know when I’m in. The only time I consider deviating from this policy is the first week of the quarter when I’m faced with a stream of students, usually lost on their way to class and that, embarrassingly, I am usually unable to help.

I made this poster so that these conversations can, in a way, continue even when I am not in the office.



Celebrate Aaron Swartz in Seattle (or Atlanta, Chicago, Dallas, NYC, SF)

I’m organizing an event at the University of Washington in Seattle that involves a reading, the screening of a documentary film, and a Q&A about Aaron Swartz. The event coincides with the third anniversary of Aaron’s death and the release of a new book of Swartz’s writing that I contributed to.


The event is free and open the public and details are below:

WHEN: Wednesday, January 13 at 6:30-9:30 p.m.

WHERE: Communications Building (CMU) 120, University of Washington

We invite you to celebrate the life and activism efforts of Aaron Swartz, hosted by UW Communication professor Benjamin Mako Hill. The event is next week and will consist of a short book reading, a screening of a documentary about Aaron’s life, and a Q&A with Mako who knew Aaron well – details are below. No RSVP required; we hope you can join us.

Aaron Swartz was a programming prodigy, entrepreneur, and information activist who contributed to the core Internet protocol RSS and co-founded Reddit, among other groundbreaking work. However, it was his efforts in social justice and political organizing combined with his aggressive approach to promoting increased access to information that entangled him in a two-year legal nightmare that ended with the taking of his own life at the age of 26.

January 11, 2016 marks the third anniversary of his death. Join us two days later for a reading from a new posthumous collection of Swartz’s writing published by New Press, a showing of “The Internet’s Own Boy” (a documentary about his life), and a Q&A with UW Communication professor Benjamin Mako Hill – a former roommate and friend of Swartz and a contributor to and co-editor of the first section of the new book.

If you’re not in Seattle, there are events with similar programs being organized in Atlanta, Chicago, Dallas, New York, and San Francisco.  All of these other events will be on Monday January 11 and registration is required for all of them. I will be speaking at the event in San Francisco.

The Boy Who Could Change the World: The Writings of Aaron Swartz

The New Press has published a new collection of Aaron Swartz’s writing called The Boy Who Could Change the World: The Writings of Aaron Swartz. I worked with Seth Schoen to introduce and help edit the opening section of book that includes Aaron’s writings on free culture, access to information and knowledge, and copyright. Seth and I have put our introduction online under an appropriately free license (CC BY-SA).

aaronsw_book_coverOver the last week, I’ve read the whole book again. I think the book really is a wonderful snapshot of Aaron’s thought and personality. It’s got bits that make me roll my eyes, bits that make me want to shout in support, and bits that continue to challenge me. It all makes me miss Aaron terribly. I strongly recommend the book.

Because the publication is post-humous, it’s meant that folks like me are doing media work for the book. In honor of naming the book their “progressive pick” of the week, Truthout has also published an interview with me about Aaron and the book.

Other folks who introduced and/or edited topical sections in the book are David Auerbach (Computers), David Segal (Politics), Cory Doctorow (Media), James Grimmelmann (Books and Culture), and Astra Taylor (Unschool). The book is introduced by Larry Lessig.

Access Without Empowerment (LibrePlanet 2015 Keynote)

At LibrePlanet 2015 (the FSF’s annual conference), I gave a talk called “Access Without Empowerment” as one of the conference keynote addresses. As I did for my 2013 LibrePlanet talk, I’ve edited together a version that includes the slides and I’ve posted it online in WebM and on YouTube.

Here’s the summary written up in the LibrePlanet program:

The free software movement has twin goals: promoting access to software through users’ freedom to share, and empowering users by giving them control over their technology. For all our movement’s success, we have been much more successful at the former. I will use data from free software and from several related movements to explain why promoting empowerment is systematically more difficult than promoting access and I will explore how our movement might address the second challenge in the future.

In related news, registration is open for LibrePlanet 2016 and that it’s free for FSF members. If you’re not an FSF member, the FSF annual fundraiser is currently going on so now would be a great time to join.

Trust your technolust.

"Trust your technolust." from Hackers (1995)If you’ve ever lusted for a “Trust your technolust.” poster like the one seen in background of the climactic sequence in the 1995 film Hackers, you’re in luck. Just print this PDF template (also an SVG) onto a piece of yellow US letter paper.

Although I’m not even the first person I know to reproduce the poster, I did spend some time making sure that I got the typeface, kerning, wordspacing, and placement on the page just right. I figured I would share.

TheSetup ChangeLog

Several years ago, I did a long interview with TheSetup — a fantastic website that posts of interviews with nerdy people that ask the same four questions:

  1. Who are you, and what do you do?
  2. What hardware are you using?
  3. And what software?
  4. What would be your dream setup?

Because I have a very carefully considered — but admittedly quite idiosyncratic — setup, I spent a lot of time preparing my answers. Many people have told me that they found my write-up useful. I recently spoke with several students who said it had been assigned in one of their classes!

Of course, my setup has changed since 2012. Although the vast majority is still the same, there is a growing list of modifications and additions. To address this, I’ve been keeping a changelog on my wiki where I detail every major change and addition I’ve made to the setup that I described in the original interview.

Understanding Hydroplane Races for the New Seattleite

It’s Seafair weekend in Seattle. As always, the centerpiece is the H1 Unlimited hydroplane races on Lake Washington.

EllstromManufacturingHydroplaneIn my social circle, I’m nearly the only person I know who grew up in area. None of the newcomers I know had heard of hydroplane racing before moving to Seattle. Even after I explain it to them — i.e., boats with 3,000+ horse power airplane engines that fly just above the water at more than 320kph (200mph) leaving 10m+ (30ft) wakes behind them! — most people seem more puzzled than interested.

I grew up near the shore of Lake Washington and could see (and hear!) the races from my house. I don’t follow hydroplane racing throughout the year but I do enjoy watching the races at Seafair. Here’s my attempt to explain and make the case for the races to new Seattleites.

Before Microsoft, Amazon, Starbucks, etc., there were basically three major Seattle industries: (1) logging and lumber based industries like paper manufacturing; (2) maritime industries like fishing, shipbuilding, shipping, and the navy; (3) aerospace (i.e., Boeing). Vintage hydroplane racing represented the Seattle trifecta: Wooden boats with airplane engines!

The wooden U-60 Miss Thriftway circa 1955 (Thriftway is a Washinton-based supermarket that nobody outside has heard of) below is a picture of old-Seattle awesomeness. Modern hydroplanes are now made of fiberglass but two out of three isn’t bad.

miss_thriftwayAlthough the boats are racing this year in events in Indiana, San Diego, and Detroit in addition to the two races in Washington, hydroplane racing retains deep ties to the region. Most of the drivers are from the Seattle area. Many or most of the teams and boats are based in Washington throughout the year. Many of the sponsors are unknown outside of the state. This parochialness itself cultivates a certain kind of appeal among locals.

In addition to old-Seattle/new-Seattle cultural divide, there’s a class divide that I think is also worth challenging. Although the demographics of hydro-racing fans is surprisingly broad, it can seem like Formula One or NASCAR on the water. It seems safe to suggest that many of the demographic groups moving to Seattle for jobs in the tech industry are not big into motorsports. Although I’m no follower of motorsports in general, I’ve written before cultivated disinterest in professional sports, and it remains something that I believe is worth taking on.

It’s not all great. In particular, the close relationship between Seafair and the military makes me very uneasy. That said, even with the military-heavy airshow, I enjoy the way that Seafair weekend provides a little pocket of old-Seattle that remains effectively unchanged from when I was a kid. I’d encourage others to enjoy it as well!

DRM on Streaming Services

For the 2015 International Day Against DRM, I wrote a short essay on DRM for streaming services posted on the Defective by Design website. I’m republishing it here.

Between 2003 and 2009, most music purchased through Apple’s iTunes store was locked using Apple’s FairPlay digital restrictions management (DRM) software, which is designed to prevent users from copying music they purchased. Apple did not seem particularly concerned by the fact that FairPlay was never effective at stopping unauthorized distribution and was easily removed with publicly available tools. After all, FairPlay was effective at preventing most users from playing their purchased music on devices that were not made by Apple.

No user ever requested FairPlay. Apple did not build the system because music buyers complained that CDs purchased from Sony would play on Panasonic players or that discs could be played on an unlimited number of devices (FairPlay allowed five). Like all DRM systems, FairPlay was forced on users by a recording industry paranoid about file sharing and, perhaps more importantly, by technology companies like Apple, who were eager to control the digital infrastructure of music distribution and consumption. In 2007, Apple began charging users 30 percent extra for music files not processed with FairPlay. In 2009, after lawsuits were filed in Europe and the US, and after several years of protests, Apple capitulated to their customers’ complaints and removed DRM from the vast majority of the iTunes music catalog.

Fundamentally, DRM for downloaded music failed because it is what I’ve called an antifeature. Like features, antifeatures are functionality created at enormous cost to technology developers. That said, unlike features which users clamor to pay extra for, users pay to have antifeatures removed. You can think of antifeatures as a technological mob protection racket. Apple charges more for music without DRM and independent music distributors often use “DRM-free” as a primary selling point for their products.

Unfortunately, after being defeated a half-decade ago, DRM for digital music is becoming the norm again through the growth of music streaming services like Pandora and Spotify, which nearly all use DRM. Impressed by the convenience of these services, many people have forgotten the lessons we learned in the fight against FairPlay. Once again, the justification for DRM is both familiar and similarly disingenuous. Although the stated goal is still to prevent unauthorized copying, tools for “stripping” DRM from services continue to be widely available. Of course, the very need for DRM on these services is reduced because users don’t normally store copies of music and because the same music is now available for download without DRM on services like iTunes.

We should remember that, like ten years ago, the real effect of DRM is to allow technology companies to capture value by creating dependence in their customers and by blocking innovation and competition. For example, DRM in streaming services blocks third-party apps from playing music from services, just as FairPlay ensured that iTunes music would only play on Apple devices. DRM in streaming services means that listening to music requires one to use special proprietary clients. For example, even with a premium account, a subscriber cannot listen to music from their catalog using an alternative or modified music player. It means that their television, car, or mobile device manufacturer must cut deals with their service to allow each paying customer to play the catalog they have subscribed to. Although streaming services are able to capture and control value more effectively, this comes at the cost of reduced freedom, choice, and flexibility for users and at higher prices paid by subscribers.

A decade ago, arguments against DRM for downloaded music focused on the claim that users should have control over the music they purchase. Although these arguments may not seem to apply to subscription services, it is worth remembering that DRM is fundamentally a problem because it means that we do not have control of the technology we use to play our music, and because the firms aiming to control us are using DRM to push antifeatures, raise prices, and block innovation. In all of these senses, DRM in streaming services is exactly as bad as FairPlay, and we should continue to demand better.

RomancR: The Future of the Sharing-Your-Bed Economy


Today, Aaron Shaw and I are pleased to announce a new startup. The startup is based around an app we are building called RomancR that will bring the sharing economy directly into your bedrooms and romantic lives.

When launched, RomancR will bring the kind of market-driven convenience and efficiency that Uber has brought to ride sharing, and that AirBnB has brought to room sharing, directly into the most frustrating and inefficient domain of our personal lives. RomancR is Uber for romance and sex.

Here’s how it will work:

  • Users will view profiles of nearby RomancR users that match any number of user-specified criteria for romantic matches (e.g., sexual orientation, gender, age, etc).
  • When a user finds a nearby match who they are interested in meeting, they can send a request to meet in person. If they choose, users initiating these requests can attach an optional monetary donation to their request.
  • When a user receives a request, they can accept or reject the request with a simple swipe to the left or right. Of course, they can take the donation offer into account when making this decision or “counter-offer” with a request for a higher donation. Larger donations will increase the likelihood of an affirmative answer.
  • If a user agrees to meet in person, and if the couple then subsequently spends the night together — RomancR will measure this automatically by ensuring that the geolocation of both users’ phones match the same physical space for at least 8 hours — the donation will be transferred from the requester to the user who responded affirmatively.
  • Users will be able to rate each other in ways that are similar to other sharing economy platforms.

Of course, there are many existing applications like Tinder and Grindr that help facilitate romance, dating, and hookups. Unfortunately, each of these still relies on old-fashion “intrinsic” ways of motivating people to participate in romantic endeavors. The sharing economy has shown us that systems that rely on these non-monetary motivations are ineffective and limiting! For example, many altruistic and socially-driven ride-sharing systems existed on platforms like Craigslist or Ridejoy before Uber. Similarly, volunteer-based communities like Couchsurfing and Hospitality Club existed for many years before AirBnB. None of those older systems took off in the way that their sharing economy counterparts were able to!

The reason that Uber and AirBnB exploded where previous efforts stalled is that this new generation of sharing economy startups brings the power of markets to bear on the problems they are trying to solve. Money both encourages more people to participate in providing a service and also makes it socially easier for people to take that service up without feeling like they are socially “in debt” to the person providing the service for free. The result has been more reliable and effective systems for proving rides and rooms! The reason that the sharing economy works, fundamentally, is that it has nothing to do with sharing at all! Systems that rely on people’s social desire to share without money — projects like Couchsurfing — are relics of the previous century.

RomancR, which we plan to launch later this year, will bring the power and efficiency of markets to our romantic lives. You will leave your pitiful dating life where it belongs in the dustbin of history! Go beyond antiquated non-market systems for finding lovers. Why should we rely on people’s fickle sense of taste and attractiveness, their complicated ideas of interpersonal compatibility, or their sense of altruism, when we can rely on the power of prices? With RomancR, we won’t have to!

Note: Thanks to Yochai Benkler whose example of how leaving a $100 bill on the bedside table of a person with whom you spent the night can change the nature of the a romantic interaction inspired the idea for this startup.

More Community Data Science Workshops

Pictures from the CDSW sessions in Spring 2014
Pictures from the CDSW sessions in Spring 2014

After two successful rounds in 2014, I’m helping put on another round of the Community Data Science Workshops. Last year, our 40+ volunteer mentorss taught more than 150 absolute beginners the basics of programming in Python, data collection from web APIs, and tools for data analysis and visualization and we’re still in the process of improving our curriculum and scaling up.

Once again, the workshops will be totally free of charge and open to anybody. Once again, they will be possible through the generous participation of a small army of volunteer mentors.

We’ll be meeting for four sessions over three weekends:

  • Setup and Programming Tutorial (April 10 evening)
  • Introduction to Programming (April 11)
  • Importing Data from web APIs (April 25)
  • Data Analysis and Visualization (May 9)

If you’re interested in attending, or interested in volunteering as mentor, you can go to the information and registration page for the current round of workshops and sign up before April 3rd.

Kuchisake-onna Decision Tree

Mika recently brought up the Japanese modern legend of Kuchisake-onna (口裂け女). For background, I turned to the English Wikipedia article on Kuchisake-onna which had the following to say about the figure (the description matches Mika’s memory):

According to the legend, children walking alone at night may encounter a woman wearing a surgical mask, which is not an unusual sight in Japan as people wear them to protect others from their colds or sickness.

The woman will stop the child and ask, “Am I pretty?” If the child answers no, the child is killed with a pair of scissors which the woman carries. If the child answers yes, the woman pulls away the mask, revealing that her mouth is slit from ear to ear, and asks “How about now?” If the child answers no, he/she will be cut in half. If the child answers yes, then she will slit his/her mouth like hers. It is impossible to run away from her, as she will simply reappear in front of the victim.

To help anyone who is not only frightened, but also confused, Mika and I made the following decision tree of possible conversations with Kuchisake-onna and their universally unfortunate outcomes.

Decision tree of conversations with Kuchisake-onna.
Decision tree of conversations with Kuchisake-onna.

Of course, we uploaded the SVG source for the diagram to Wikimedia Commons and used the diagram to illustrate the Wikipedia article.

Consider the Redirect

In wikis, redirects are special pages that silently take readers from the page they are visiting to another page. Although their presence is noted in tiny gray text (see the image below) most people use them all the time and never know they exist. Redirects exist to make linking between pages easier, they populate Wikipedia’s search autocomplete list, and are generally helpful in organizing information. In the English Wikipedia, redirects make up more than half of all article pages.

seattle_redirectOver the years, I’ve spent some time contributing to to Redirects for Discussion (RfD). I think of RfD as like an ultra-low stakes version of Articles for Deletion where Wikipedians decide whether to delete or keep articles. If a redirect is deleted, viewers are taken to a search results page and almost nobody notices. That said, because redirects are almost never viewed directly, almost nobody notices if a redirect is kept either!

I’ve told people that if they want to understand the soul of a Wikipedian, they should spend time participating in RfD. When you understand why arguing about and working hard to come to consensus solutions for how Wikipedia should handle individual redirects is an enjoyable way to spend your spare time — where any outcome is invisible — you understand what it means to be a Wikipedian.

That said, wiki researchers rarely take redirects into account. For years, I’ve suspected that accounting for redirects was important for Wikipedia research and that several classes of findings were noisy or misleading because most people haven’t done so. As a result, I worked with my colleague Aaron Shaw at Northwestern earlier this year to build a longitudinal dataset of redirects that can capture the dynamic nature of redirects. Our work was published as a short paper at OpenSym several months ago.

It turns out, taking redirects into account correctly (especially if you are looking at activity over time) is tricky because redirects are stored as normal pages by MediaWiki except that they happen to start with special redirect text. Like other pages, redirects can be updated and changed over time are frequently are. As a result, taking redirects into account for any study that looks at activity over time requires looking at the text of every revision of every page.

Using our dataset, Aaron and I showed that the distribution of edits across pages in English Wikipedia (a relationships that is used in many research projects) looks pretty close to log normal when we remove redirects and very different when you don’t. After all, half of articles are really just redirects and, and because they are just redirects, these “articles” are almost never edited.

edits_over_pagesAnother puzzling finding that’s been reported in a few places — and that I repeated myself several times — is that edits and views are surprisingly uncorrelated. I’ll write more about this later but the short version is that we found that a big chunk of this can, in fact, be explained by considering redirects.

We’ve published our code and data and the article itself is online because we paid the ACM’s open access fee to ransom the article.