Supporting children in doing data science

As children use digital media to learn and socialize, others are collecting and analyzing data about these activities. In school and at play, these children find that they are the subjects of data science. As people who believe in the power of data analysis, we think that this approach falls short of data science’s potential to promote innovation, learning, and power.

Motivated by this, we have spent the last three years as part of a team at the MIT Media Lab and the University of Washington designing and building a system that attempts to support an alternative vision: children as data scientists. The system is described in a new paper, “Scratch Community Blocks: Supporting Children as Data Scientists,” which will be published in the proceedings of CHI 2017.

Our system is built on top of Scratch, a visual, block-based programming language designed for children and youth. Scratch is also an online community with over 15 million registered members who share their Scratch projects, remix each other’s work, have conversations, provide feedback, bookmark or “love” projects they like, follow other users, and more. Over the last decade, researchers, including us, have used the Scratch online community’s database to study the young people who use Scratch. With Scratch Community Blocks, we attempt to put the power to analyze these data programmatically into the hands of the users themselves.

To do so, our system adds a set of new programming primitives (blocks) to Scratch so that users can access public data from the Scratch website from inside Scratch. The new blocks give users access to project and user metadata, information about social interaction, and data about what types of code are used in projects. The full palette of blocks for accessing each category of data is shown below.

Project metadata
User metadata
Site-wide statistics

The new blocks allow users to programmatically access, filter, and analyze data about their own participation in the community. For example, with the simple script below, we can find out whether any of our followers on Scratch report themselves to be from Spain, and what their usernames are.

Simple demonstration of Scratch Community Blocks
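Because the Scratch script itself is visual, here is a rough text-based analogue of the same filtering logic. This is a minimal sketch only: the `ScratcherProfile` record, the `followers_from` helper, and the sample usernames are all hypothetical and made up for illustration. They are not part of Scratch Community Blocks or any Scratch API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScratcherProfile:
    """Hypothetical stand-in for the public profile data a block might return."""
    username: str
    country: str  # self-reported country, as shown on a public profile


def followers_from(followers: List[ScratcherProfile], country: str) -> List[str]:
    """Return the usernames of followers whose self-reported country matches."""
    return [f.username for f in followers if f.country == country]


# Made-up sample data for illustration only.
followers = [
    ScratcherProfile("cat_coder", "Spain"),
    ScratcherProfile("pixel_fox", "Canada"),
    ScratcherProfile("sprite_maker", "Spain"),
]

spanish = followers_from(followers, "Spain")
if spanish:
    print("Followers from Spain:", ", ".join(spanish))
else:
    print("No followers from Spain")
```

The script follows the same two steps as the Scratch version. It fetches a list of followers, then keeps only the ones that match a condition.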

In designing the system, we had two primary motivations. First, we wanted to support avenues through which children can engage in curiosity-driven, creative explorations of public Scratch data. Second, we wanted to foster self-reflection with data. As children looked back on their own participation and coding activity in Scratch through the projects they and their peers made, we wanted them to reflect on their behavior and learning in ways that would shape their future behavior and encourage exploration.

After designing and building the system in 2014 and 2015, we invited a group of active Scratch users to beta test it in early 2016. Over four months, 700 users created more than 1,600 projects. The diversity and depth of users’ creativity with the new blocks surprised us. Children created projects that gave viewers a personalized doughnut-chart visualization of their coding vocabulary on Scratch, rendered the viewer’s number of followers as scoops of ice cream on a cone, attempted to find out whether “love-its” are more common on Scratch than “favorites,” and told users how “talkative” they were by counting the cumulative string length of their project titles and descriptions.
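Two of these project ideas reduce to simple aggregations. The sketch below is a rough text-based analogue of the computations. It assumes each project is a hypothetical dictionary with `title`, `description`, `loves`, and `favorites` fields. Those field names and the sample values are made up for illustration and do not come from the Scratch data model. The children's actual projects were built from Scratch blocks, not Python.

```python
from typing import Dict, List, Union

Project = Dict[str, Union[str, int]]  # hypothetical project record


def talkativeness(projects: List[Project]) -> int:
    """Sum the string lengths of every project's title and description."""
    return sum(
        len(str(p.get("title", ""))) + len(str(p.get("description", "")))
        for p in projects
    )


def more_loves_than_favorites(projects: List[Project]) -> bool:
    """Compare total love-its with total favorites across the given projects."""
    total_loves = sum(int(p.get("loves", 0)) for p in projects)
    total_favorites = sum(int(p.get("favorites", 0)) for p in projects)
    return total_loves > total_favorites


# Made-up sample data for illustration only.
my_projects: List[Project] = [
    {"title": "Ice Cream Followers", "description": "Shows followers as scoops.",
     "loves": 12, "favorites": 7},
    {"title": "Doll", "description": "", "loves": 3, "favorites": 5},
]

print("Talkativeness score:", talkativeness(my_projects))
print("More loves than favorites?", more_loves_than_favorites(my_projects))
```

This version looks only at one user's own projects. The beta-test project that compared love-its and favorites asked a site-wide question, which would require data about many projects rather than just a handful.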

We found that children, rather than making canonical visualizations such as pie charts or bar graphs, frequently made information representations that spoke to their own identities and aesthetic sensibilities. A 13-year-old girl made a virtual doll dress-up game in which the player’s ability to buy virtual clothes and accessories for the doll was determined by their level of activity in the Scratch community. When we asked about her motivation for making the project, she said:

I was trying to think of something that somebody hadn’t done yet, and I didn’t see that. And also I really like to do art on Scratch and that was a good opportunity to use that and mix the two [art and data] together.

We also found at least some evidence that the system supported self-reflection with data. For example, after seeing a project that showed its viewers a visualization of their past coding vocabulary, a 15-year-old realized that he does not do much programming with the pen-related primitives in Scratch, and wrote in a comment, “epic! looks like we need to use more pen blocks. :D.”

Doughnut visualization
Ice-cream visualization
Data-driven doll dress up

Additionally, we noted that as children made and interacted with projects built with Scratch Community Blocks, they started to think critically about the implications of data collection and analysis. These conversations are the subject of another paper (also being published in CHI 2017).

In a 1971 article called “Teaching Children to be Mathematicians vs. Teaching About Mathematics,” Seymour Papert argued that children should do mathematics rather than merely learn about it. He showed how Logo, the programming language he was developing at the time with his colleagues, could offer children a space to use and engage with mathematical ideas in creative and personally motivated ways. This, he argued, enabled children to go beyond knowing about mathematics to “doing” mathematics, as a mathematician would.

Scratch Community Blocks has not yet been launched for all Scratch users and has several important limitations that we discuss in the paper. That said, we feel that the projects created by children in our beta test demonstrate the real potential for children to do data science, and not just to know about it, provide data for it, and have their behavior nudged and shaped by it.

This blog post and the paper it describes are collaborative work with Sayamindu Dasgupta. We have also received support and feedback from members of the Scratch team at MIT (especially Mitch Resnick and Natalie Rusk), as well as from Hal Abelson. Financial support came from the US National Science Foundation. The paper itself is open access, so anyone can read the entire paper here. This blog post was also posted on Sayamindu Dasgupta’s blog, on the Community Data Science Collective blog, and in several other places.

3 Replies to “Supporting children in doing data science”

  1. What is data science?

    From the perspective of my university, it seems like they treat data science as computer science for dummies. They removed functional programming, logic, and theoretical CS, and add efficient algorithms and machine learning courses (which you could choose in CS yourself – and most people do, because they are easier than the alternatives). They also switched some stochastic courses (matching previous CS courses), and added optimization and “basics of advanced math” – whatever that might be (and obviously you can choose those in CS too).

    But all in all, it basically seems to be a fallback for someone who passes all the practical CS modules, but fails the theory. Instead of leaving without a degree, they can get one of the data science ones.

    Do I want people to learn that? Hell no. They can learn real computer science instead of the dumbed down version.

    1. When I say data science, I’m simply referring to people asking and answering questions using data and a deductive scientific approach. I’m not talking about your university’s major.

      I think there are two other important responses to your comment:

      1. Don’t condemn the concept based on your university’s implementation. There’s no consensus on what data science is or what it should include, and different data science programs teach very different things. Perhaps your university is just teaching an easier version of computer science, but lots of other universities are taking different and interesting approaches. Many data science programs require classes on statistics, experimental design, and observational scientific research methods. These can be critical skills for many professional data scientists, and they are rarely part of a computer scientist’s or engineer’s toolbox.
      2. Not everybody needs to be a computer scientist. Being able to ask and answer questions using data seems like a valuable thing in general. I strongly believe that there’s value in non-computer scientists learning the tools of computer science. If you really think that everybody should get a degree in computer science and become a computer scientist, I suspect we’re just going to have to agree to disagree. If you don’t think that, why do you object to people learning a subset of computer science?

      If you’re interested in understanding how people define data science and what goes into data science curricula, there’s a great article coming out soon by Jason Portenoy and Jevin West (there’s a presentation version of it here) or you can read this article I wrote with a few others that includes some discussion of this in the background.

  2. Mr. Papert’s LOGO paper reminded me of Paul Lockhart’s essay “A Mathematician’s Lament,” and I wholeheartedly agree that teaching math for the sake of math is a very easy way for students to lose motivation.

    By the way, the link to your full Scratch paper seems to be broken, the correct one should be
