The End of GSoC 2013

This article has been sitting on my desktop for quite some time. I try to reason that the procrastination was because schoolwork suddenly became overwhelming when holiday was over. Well, that's probably not true. Maybe I just didn't want it to end.

What I have done

I have created Plotrb, a Ruby plotting library that is based on D3.js and Vega.

The original plan was to write a D3 wrapper to replace the current Rubyvis, which is based on Protovis, the predecessor of D3.

When the project actually started, I realized some problems with the approach, and wrote my first lengthy explanationn on why I chose to create a wrapper for Vega, instead of D3 directly.

Over the three months, I have learned a great deal of things, and even "rewrote" Plotrb at least twice. So, have I achieved the goal I set back in June? Yes. Plortb now has its own set of (simple) grammar, that allows you to create Vega specifications in Ruby with relatively small amount of code.

However, am I satisfied with what I have accomplished and where Plotrb currently stands? Honestly, no and no.

Therefore, tonight I forced myself to sit down in front of this article, not trying to boast what a wonderful thing I've done or how much it would benefit the Ruby community, but rather to tell people who care to read this, that I see problems, and I'm still seeking solutions.

Problems

1. Where does (should) Plortb fit in?

We already have many excellent plotting libraries - matplotlib for Python, ggplot2 for R, D3.js for javascript, you name it. However, if you compare these three big names, you will find that D3 is somewhat different. D3 stands for "Data Driven Documentation", not "Damn Difficult Drawer" or something. To quote from its official introduction, D3 is for "manipulating documents based on data". It's more than a plotting library.

When I read scientific papers and see plots created by ggplot2 or matplotlib, I would nod and say "Hmm this looks very professional." But when I play with the interactive data visualization created by Mike Bostock on The New York Times, I'm like "Man, that's some kickass cool stuff!"

Now back to the question. Where should Plotrb fit in? It seems the culture of the Ruby community has been influenced quite a lot by Rails framework, and nowadays more and more things are happening in the browser. We chose D3 as the foundation of Plotrb because we wanted to elegantly handle data, visualize them and share them, just as D3 does, but not in javascript. We especially covet the interactivity brought by D3 that is not found in either ggplot2 or matplotlib.

However, we have to realize that such interactivity is a result of the combination of D3, browser technology and web standards. Ruby is not designed to be a client-side programming language, so there is naturally a barrier to use Ruby to manipulate CSS, SVG, DOM, etc. Of course we can leave the job to D3, but that would essentially mean we have to devise a way to translate Ruby code into D3's javascript. I am not aware of any mature solution for this yet.

This is a problem we can't avoid. From its current stand, it seems Plotrb has run into a field that Ruby is not particularly good at.

2. Didn't I choose Vega precisely because of that?

Yes, I did. The whole point is to let javascript people handle javascript stuff. The only bridge connecting the javascript world and the Ruby world is the JSON specification, which can be handled graciously by both sides.

However, this solution is still far from perfect. Vega itself is a new library, even during the course of Plotrb's development, Vega has had a few major updates with new components (such as Legends that is not in Plotrb yet because it came out very recently). The specification does not have a formal standard (such as JSON schema) for reference. Many of the rules seem quite arbitrary and inconsistent, and they caused a lot of pain in the last few weeks of developing Plotrb. I have to hack Plotrb to workaround these inconsistency of Vega, and this considerably reduced the maintainability of Plotrb.

Most importantly, the one single feature we wanted the most - interactivity - is largely missing in Vega. This is understandable because writing interactive visualization with only JSON but not proper javascript would be quite a stretch.

I wouldn't say it was a bad decision to choose Vega in the first place, as working with Vega has provided me some valuable insights on how to build an abstraction on top of the low-level D3 grammar. This however rekindled the idea of building Plotrb on top of D3 directly, albeit in a less "brute" way (for example not trying to translate Ruby to javascript bluntly), drawing inspiration from Vega's approach.

This is still just an idea not yet clear enough to be actionable. But my hunch tells me it is the way to go.

3. Why plotting tools are hard to design, and especially so for Ruby?

The following paragraphs are very opinionated. Please take with a grain of salt.

Creating a plotting tool is very hard, because it just seems impossible to cover both ends of the spectrum. People use it for all kinds of things, from plotting a few sets of experiment data, to visualizing enormous DNA sequence in a single graph. A tool catered for the former would be almost useless for the later; and a tool that is powerful enough for the later would be a huge overkill for the former.

You simply don't know what kind of plots the user is expecting to produce with the tool, so it's really difficult, if not impossible, to design a grammar that works for both ends.

Why is it especially hard to do in Ruby? Because I think Ruby developers are the most artistic among all. No other programming language community emphasizes on the elegance of code as much as we do. This leads to the harder question - how to design a plotting grammar/DSL that is succinct, elegant, and yet flexible enough to handle more than trivial use cases?

Just a few days back, I came across ggplot for python. I think it's a brilliant attempt to use ggplot2's grammar based on matplotlib, and I'm sure many people, especially if they are familiar with R, would love it.

But look at the code,

from ggplot import *

ggplot(aes(x='date', y='beef'), data=meat) + \
    geom_point(color='lightblue') + \
    geom_line(alpha=0.25) + \
    stat_smooth(span=.05, color='black') + \
    ggtitle("Beef: It's What's for Dinner") + \
    xlab("Date") + \
    ylab("Head of Cattle Slaughtered")

What? WHAT?

Is it just me or are those brackets, pluses and backslashes really, really, really, frustratingly ugly?

So...

It's already 4 am and I should probably stop. I'm not sure if it's a proper endnote for this year's GSoC. As I said, I see problems, and I won't stop searching for answers. So in that sense, it is not the end, no, far from it.


comments powered by Disqus