Wizards of OS Wrap-up

My joint workshop with Lawrence Lessig at Wizards of OS went, in my opinion, extremely well. The worst hitch was an unfortunate series of events that conspired to keep Vera Franz from attending and moderating the session as planned. Paul Keller, who was supposed to participate in the panel, graciously (and capably) moderated in her place.

The panel allowed Lessig and me to talk openly and publicly about our disagreements for the first time while also highlighting the many places where we speak with one voice. The conversation managed to be both positive and productive without papering over issues.

I usually like to post talk notes and slides after each speaking engagement. However, our WOS meeting was a "workshop" so I have nothing prepared to present here. I have, however, seen two write-ups in the press:

If someone has a recording, tell me how I might get a copy.

One small note: I am quoted in IP watch as saying that most CC works are under the most restrictive licenses and that there has been no shift toward less restrictive licenses with time. Mia Garlick has pointed out that the latest license usage statistics, based on admittedly imprecise linkback data, show a several percentage point decrease in the usage of licenses that block commercial use and derivatives — when expressed as a fraction of the total number of works under CC licenses. The restrictive licenses are still the most popular but it was incorrect to say that there is no evidence of any progress whatsoever toward more free licenses.

Tomorrow, I will post a summary and response to one of the points that Lessig and I talked most about.

PyBlosxom and Comment Spam

Over the past few months, I’ve dealt with something of a blog spam nightmare on Copyrighteous: my blog running the PyBlosxom weblog software.

Wiser and less stubborn individuals might have given up on either PyBlosxom or the ability to receive public comments. However, I find PyBlosxom unique in its flexibility and great reStructuredText support and am always frustrated with others’ blogs that don’t accept comments. At the end of the day, I couldn’t bring myself to part with either.

Historically, my blogspam protection has been to use a simple weak CAPTCHA and to have my blog software email me each time a comment is successfully submitted so that I can (with a built-in macro in mutt) delete each spam comment that slips in. This has worked well for the last couple years.

This summer, Mika pointed out that my blog was full of Chinese link spam that I had not noticed or been notified about. Around the same time, I realized that my website had been dealt a massive spam penalty by Google and was basically not showing up in any search results.

I have spent a significant amount of time over the last month repairing the damage and working to prevent it from recurring. I’m documenting this process here in the hopes that it might save others time and energy.

Upon reflection, the situation could have been prevented in three relatively easy ways, all of which I have now implemented.

  • Had the PyBlosxom comment.py plugin’s mail function been working properly, I would have known that I was being spammed.
  • Had PyBlosxom been configured to only make blog entries (and not comments) indexable by search engines, Google and others never would have seen the link spam.
  • Had I installed a stronger CAPTCHA, I might have blocked the spam from being submitted in the first place (although at the expense of the participation in comments by visually impaired users).

A month or so, hours of work, and a Google reinclusion request later, my website is beginning to show up in search results again. Hopefully this message will help save others from a similar fate.

Fixing PyBlosxom’s Comment Notification

The most critical problem was a bug in PyBlosxom’s contributed comment.py plugin and its comment notification system. In short, the email-based comment notification system failed silently if the body of the email (which included the full text of the comment) included any non-ASCII UTF-8 encoded text.

I’ve filed a bug against PyBlosxom and included a patch that fixes this issue. However, since this is a rather critical problem and because PyBlosxom releases tend to be few and far between, it might be worth patching your system now. My patch is against version 1.3 but can easily be modified and applied to version 1.2.
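For anyone curious about the general shape of this kind of fix, here is a minimal sketch. It is not the patch mentioned above and not comment.py's actual code; it only illustrates building a notification email whose body may contain non-ASCII text. It is written against the Python 3 standard library, and the function name `build_notification` and the example addresses are hypothetical.

```python
from email.header import Header
from email.mime.text import MIMEText


def build_notification(comment_text, author, sender, recipient):
    """Build a comment-notification email that tolerates non-ASCII text.

    Hypothetical helper for illustration only; not part of PyBlosxom.
    """
    # Declaring the charset explicitly makes the body get encoded for
    # transport instead of being treated as plain ASCII.
    msg = MIMEText(comment_text, "plain", "utf-8")
    # Header values can contain non-ASCII names too, so encode the subject.
    msg["Subject"] = Header("New comment from %s" % author, "utf-8")
    msg["From"] = sender
    msg["To"] = recipient
    return msg


if __name__ == "__main__":
    message = build_notification(
        "你好, cheap links here", "Zoë", "blog@example.org", "me@example.org"
    )
    print(message.as_string())
```

The resulting message could then be handed to whatever mail transport the blog already uses. Whatever the encoding approach, the other half of the lesson is to log or surface any exception raised while sending rather than swallowing it, since the silent failure is what let the spam go unnoticed.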

Hiding Comments from Search Engines

The major reason that the successful spam became a problem was that it triggered Google’s abuse detector, resulted in a spam penalty, and made all of the (non-spam) material on my website more difficult for others to find. A simple way to prevent this is to hide all comments from the search engines.

I’ve done this by creating a new PyBlosxom flavor that shows comments (and allows them to be submitted) but is not indexed by search engines, and by removing comments altogether from the default indexable flavors.

To do this, I removed all of the comment-* templates from the html flavor and created a new flavor called comment.flav that included the comment templates. I also had to make the comment submit action point to the new flavor and change the "Comments: N" link to point to the .comment flavor rather than the .html one. The rest of the flavor is simply symbolic links to the HTML templates.
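The symlinking step can be scripted. The following is a rough sketch rather than the exact procedure used here: it assumes that templates are files named `<template>.<flavor>` (for example `head.html`) inside directories such as `html.flav` and `comment.flav`, and that symbolic links are available on the filesystem. The function name `link_shared_templates` is made up for this example.

```python
import os


def link_shared_templates(html_dir, comment_dir,
                          src_ext=".html", dst_ext=".comment"):
    """Symlink every template in html_dir into comment_dir under the new
    flavor extension, skipping names that already exist there.

    Hypothetical helper; directory layout and file naming are assumptions.
    """
    os.makedirs(comment_dir, exist_ok=True)
    created = []
    for name in sorted(os.listdir(html_dir)):
        stem, ext = os.path.splitext(name)
        if ext != src_ext:
            continue  # ignore files that are not templates of the source flavor
        link_name = stem + dst_ext
        link_path = os.path.join(comment_dir, link_name)
        if os.path.lexists(link_path):
            continue  # keep templates written specifically for the comment flavor
        os.symlink(os.path.abspath(os.path.join(html_dir, name)), link_path)
        created.append(link_name)
    return created
```

Because existing files are never overwritten, any template edited by hand for the comment flavor (such as one whose links point at .comment pages) stays in place, and only the untouched templates become links.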

The next step is to ensure that the comment flavor is not indexed by search engines. I found two ways of doing this and did both. The first was to add a "no index" meta tag to the header of each .comment page. It looked like this:

 <meta name="robots" content="noindex, nofollow"> 

This is necessary because the robots.txt standard, the normal way to tell search engines not to index a page, does not support wildcards.

Luckily, Google (and others, I imagine) supports an extension to robots.txt that allows you to use wildcards. To take advantage of this, I created a robots.txt for mako.cc that blocks indexing of the entire comment flavor. The following robots.txt did the trick for me:

 User-agent: Googlebot
 Disallow: /copyrighteous/*.comment$
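To sanity-check which URLs a rule like this covers, the pattern can be translated into a regular expression. This is only a sketch under the assumption of Google-style semantics, where `*` matches any run of characters and a trailing `$` anchors the end of the path. Python's built-in `urllib.robotparser` does not implement these wildcards, which is why the matching is done by hand here. The function names and example paths are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a wildcard robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin with ".*" where "*" appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Without a trailing "$", the rule is a prefix match on the path.
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, pattern="/copyrighteous/*.comment$"):
    """Return True if the path would be covered by the Disallow pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for path in ("/copyrighteous/2006/entry.comment",
                 "/copyrighteous/2006/entry.html"):
        print(path, is_blocked(path))
```

Under these assumptions, only paths ending in .comment beneath /copyrighteous/ are covered, so the .html pages with the entries themselves remain indexable.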

An Improved CAPTCHA

Ultimately, the best solution would be to keep the spam from showing up on the blog at all.

The only decent PyBlosxom CAPTCHA is the "nospam" plugin by Steven Armstrong. It is a simple image-based CAPTCHA and I was running it when I was spammed. It uses PIL but generates purely number-based strings and does some minimal obfuscation. Basically, spambots were able to break the CAPTCHA and, toward the end, I was receiving thousands of pieces of blog spam a day.

I’ve incorporated the PIL image generation code from MediaWiki’s ConfirmEdit/FancyCaptcha extension into nospam.py with this patch, which I have also sent to Steven Armstrong, nospam.py’s original author. It’s much stronger.

Apologies, of course, go to all of my vision-impaired users. Image-based CAPTCHAs really are evil. In this situation, though, with many thousands of attempts a day, the alternative is that I will turn off comments altogether: the standard (and poor) lesser of two evils argument.

Ultimately, I will write a Python implementation of a new strong text-based CAPTCHA I’ve invented that uses commonsense knowledge and pulls off some cool data acquisition in the process. I presented this project at the Wikimania Hacking Days and at a Media Lab open house for AAAI 2006 where I got universally positive and useful feedback. CAPTCHA inventor and recent genius-grantee Luis von Ahn seemed to like the idea too. I’ll write more about this on another day though.

Wizards of OS 4

I’m in Berlin for just over 48 hours to give a workshop at Wizards of OS 4.

The workshop is Free Content Licensing: Success, Challenges and the Way Forward and will be a conversation among Lawrence Lessig, Paul Keller from Waag Society and Creative Commons Netherlands, and me.

When I first published Toward a Standard of Freedom (my first article that was critical of Creative Commons) a couple years ago, I received an email from someone at Creative Commons within two hours of posting the note. The email pointed out that I had incorrectly licensed my work as the CC license I applied to my essay had the old mailing address for CC. I thanked the mail’s author for pointing out my mistake but asked if, perhaps, she or someone else at CC had anything to say about the content of the article itself which was, after all, about her organization’s work. I never received a reply.

To date, I have not been able to engage in meaningful public discussion of my criticism of CC with CC, although I have tried several times.

I’m thrilled that Volker Grassmuck and the Wizards of OS organizers have been able to put together this opportunity to start what I hope will be a longer conversation with people at CC about what some of us perceive as tactical shortcomings of the CC approach. It can only make our movements stronger.

It may someday become useful…

/copyrighteous/images/crowd_robot.jpg

My philosopher Mika says:

These are robots that can walk through crowds without bumping into people. I almost feel that humans need to develop that skill rather than robots.

But nonetheless, it may someday become useful to have such robots.

Planet Debian Upgrade

I’ve upgraded Planet (the software that runs Planet Debian) to version 2.0. It’s been a while since I touched the planet software so many issues that had annoyed users of planet should now be remedied by this upgrade. I think people will be very happy with the upgrade.

In the process, I really screwed up planet for the moment. New posts from the last day or so may not be showing up. Other people may appear to have flooded planet when in fact they’ve done nothing at all. This is not their fault but I’ve had a hard time going through and picking up the pieces left by the upgrade. Please just bear with me.

If you notice a little bit of funkiness in Planet Debian in the next day or so, please wait a day or two (or a post or two) before contacting me to debug the problem. Thanks for your patience everyone.

If you have questions, don’t hesitate to get in contact with me.

Controversy

Several days ago, I got a message from David G. Reichert, my representative in the U.S. House of Representatives (and the incumbent candidate in one of the New York Times’s "Races to Watch"). His letter started out:

As your Representative in Congress, I want to share with you some of the work I have been doing to assist orphans in underdeveloped countries.

I grew up in a family that adopted several orphans from underdeveloped countries so I’m glad to see this happening — I really am.

But what really makes me happy is that I get to hear from my elected representative unsolicited — for the first time, no less — advertising his work on such a controversial subject. He seems perfectly willing to stand up for what he believes in, even if it means that he loses the crucial anti-developing-nation-orphan vote.

Rhyme Time

Who would design consumer electronic products around technological necessity when they could design them around clever bits of word play?

I came up with the idea for an "iPod Tripod" — if you will, a "TriPod" — and was thrilled to see that someone else had already (a) stumbled upon the same little rhyme and (b) followed through on the idea and was already selling a product!

The review I read seemed to indicate that the execution was not quite as good as the name. But then again, how could it be?

TV-B-Gone

I bought a TV-B-Gone a few years back. It’s been fun. I make a point of never turning off a TV that anyone is obviously watching and have only once had anyone turn a television back on. Most people find it easier to just talk to the person sitting across the table than to turn the TV back on.

The only problem is that I can never really tell in advance when I’ll need my TV-B-Gone and so frequently end up somewhere wishing I had one when I’ve left it at home. What I really need is a TV-B-Gone-B-Here.

New Creative Commons Licenses

In the last couple years, I’ve earned something of a reputation for giving Creative Commons a hard time. This fact hit home a few weeks ago when a reporter for the San Antonio Current called me up to get "the other side" on a story he was doing on CC. Apparently, the journalist had found my name in the criticism section of Wikipedia’s Creative Commons article.

Now, while I’m not happy with CC’s reluctance to take a normative stance of any kind and I’m not thrilled with many CC licenses that don’t respect what I believe are essential freedoms, I should give credit to CC where credit is due.

Over the past half year or so, I’ve had the pleasure of helping represent Debian in conversations between a Debian team and folks at CC to help iron out a number of nits with the CC licenses that seemed to be (unnecessarily) creating barriers to Debian blessing some more permissive licenses as DFSG free. Throughout this process, folks at CC have been helpful, responsive, flexible, and seriously willing to make changes based on our suggestions.

The first and hardest stage of this work culminated with CC’s release of the discussion draft of their 3.0 licenses. Evan Prodromou published a great in-depth report on the talks between Debian and CC that helped shape these drafts. While we didn’t get 100% of what we were asking for, I’m personally quite confident that we have or will get all of what is necessary to ensure that the licenses are DFSG free both in letter and spirit.

Not only does CC build several great licenses, they are also willing to work with the community in difficult, meaningful ways. When we build a real social movement around calls for essential freedom of culture and content, we’ll be lucky to have CC writing some of the licenses that help make it happen.

Acronym Expansion

XM Radio claims that it’s "Beyond AM. Beyond FM." I’m sure that’s the case. I’m a little less clear on what it is, outside of the first letter in the acronym, they are modulating.

Musical Beds

During Wikimania, I was explaining to someone that Aziz Ridouan (Audionautes) was staying at Elizabeth Stark’s apartment, that Elizabeth Stark was staying at Jean-Baptiste Soufron’s apartment, and that Jean-Baptiste Soufron was staying at my apartment.

In fact, all four of us have slept at least one night at both the Acetarium and at Jean-Baptiste’s apartment in the last month and a half.

The Official Ubuntu Book

Any Day Now, The Official Ubuntu Book will show up in stores. I have a rubber-banded-together copy of the folded and gathered sheets and the first batch of books should be bound (or being bound) right now. Those who have pre-ordered it from Amazon or elsewhere should have it in their hands quickly.

In addition to my own name, the author page lists (future Ubuntu Community Manager) Jono Bacon, Corey Burger, Jonathan Jesse, and Ivan Krstić. Many more members of the Ubuntu community and many editors at Prentice Hall deserve credit as well.

I’m proud of the book. I sense that it’s more consistent, better organized, and of a higher overall quality than my last book. Even better, though, is the fact that the book is released under a Creative Commons Attribution-ShareAlike license. Several chapters are already being shipped by default on the Ubuntu desktop and several translations are underway.

You can read more about the book on the publisher’s site and order it from any number of places online. Books under such licenses are economically risky for publishers so please support the project by buying it if you end up finding the text useful!