Every now and then someone says something which is blindingly obvious, but still well worth saying. Twice in the last couple of months I've read articles which have made me sit up and say, "hang on, that's common sense - so why isn't the world like that?"
One was a recent post by Hal Daumé on his natural language processing blog, the other was Ted Pedersen's 'Last Words' piece in September's Computational Linguistics (Vol 34, Issue 3, pp. 465-470).
To summarise very, very briefly for anyone who doesn't want to read both articles (or who doesn't have access to the CL back-catalogue):
- Hal talks about competitiveness in research - making the observation that researchers tend to be quite competitive and protective of their work, despite the fact that we're all nominally working towards the same goal, and all progress is good progress.
- Ted talks about the failings of computational linguistics as a science, namely that we don't share enough of our code - or enough detail about the corpora used - to make our experiments truly repeatable and our results verifiable.
Ted's article really struck a chord with me because I was slightly floored the first time I peer-reviewed papers - it felt weird to be assessing someone's write-up of the results from their shiny new program, but without being able to run or even read their code. I could go on for hours about scientific method (or lack thereof) in areas of linguistics such as pragmatics - maybe that's a post for another day - but I'd assumed that in computational linguistics (part of computer science, after all) there would be no such issues.
Competition and co-operation don't go very well together, of course, and maybe some of these problems would melt away if there were enough funding to go round. I went to Slimbridge wetlands centre at the weekend and was discussing with friends how such a huge number of birds were living there, in fairly cramped conditions, and yet we saw no fighting - we wondered whether that was because there was essentially unlimited food provided by the centre. No competition for resources, no need to fight?
I'm generally a pretty co-operative person by nature - any number of personality profiles will tell you that I'm all about people and teams and working together - but even I get a little defensive when it comes to the topic of my PhD. Why? Because I'm studying part-time, and six years is a long time in politics, and if someone else scooped up my topic and did it full-time they could finish before me... at which point I'd no longer be doing original research and I don't fancy starting the whole thing over again. But on the other hand I will certainly be putting my code on my website - once I have code, rather than just a lengthy literature review and a few ideas. I'd prefer my work to feed into others' research rather than just disappear into the ether.
I've found it hugely frustrating to read about techniques and programs in the journals, think to myself "yes, that would be useful," but then discover that the code isn't available anywhere to download and build on. Reinventing the wheel seems to be a job requirement... and it hurts. Mathematicians must have the same competitive drivers (i.e. funding), but they publish their proofs for others to verify and build upon, and that must speed up progress overall.
So, why is there such an ingrained cultural problem in computational linguistics? Is it just us, or is the rest of computer science just as bad? And what can we do to 'fix' it? Answers on a postcard...