Monday, 8 December 2008

Common Sense vs. The Status Quo

Every now and then someone says something which is blindingly obvious, but still well worth saying. Twice in the last couple of months I've read articles which have made me sit up and say, "hang on, that's common sense - so why isn't the world like that?"

One was a recent post by Hal Daumé on his natural language processing blog, the other was Ted Pedersen's 'Last Words' piece in September's Computational Linguistics (Vol 34, Issue 3, pp465-470).

To summarise very, very briefly for anyone who doesn't want to read both articles (or who doesn't have access to the CL back-catalogue):
  • Hal talks about competitiveness in research - making the observation that researchers tend to be quite competitive and protective of their work, despite the fact that we're all nominally working towards the same goal, and all progress is good progress.
  • Ted talks about the failings of computational linguistics as a science, namely that we don't share enough of our code - or detail of corpora used - to make our experiments truly repeatable and results verifiable.
So they are making different points, but there's a lot of common ground. I'm sure that part of the reason researchers (and departments) don't give away all their code to download is that they're afraid someone will use it to do the same thing, but better.

Ted's article really struck a chord with me because I was slightly floored the first time I peer-reviewed papers - it felt weird to be assessing someone's write-up of the results from their shiny new programme without being able to run, or even read, their code. I could go on for hours about scientific method (or lack thereof) in areas of linguistics such as pragmatics - maybe that's a post for another day - but I'd assumed that in computational linguistics (part of computer science, after all) there would be no such issues.

Competition and co-operation don't go very well together, of course, and maybe some of these problems would melt away if there was enough funding to go round. I went to Slimbridge wetlands centre at the weekend and was discussing with friends how such a huge number of birds were living there, in fairly cramped conditions, and yet we saw no fighting - we wondered whether that was because there was essentially unlimited food provided by the centre. No competition for resources, no need to fight?

I'm generally a pretty co-operative person by nature - any number of personality profiles will tell you that I'm all about people and teams and working together - but even I get a little defensive when it comes to the topic of my PhD. Why? Because I'm studying part-time, and six years is a long time in politics, and if someone else scooped up my topic and did it full-time they could finish before me... at which point I'd no longer be doing original research and I don't fancy starting the whole thing over again. But on the other hand I will certainly be putting my code on my website - once I have code, rather than just a lengthy literature review and a few ideas. I'd prefer my work to feed into others' research rather than just disappear into the ether.

I've found it hugely frustrating to read about techniques and programmes in the journals, think to myself "yes, that would be useful," and then discover that the code is nowhere to be found for download and building on. Reinventing the wheel seems to be a job requirement... and it hurts. Mathematicians must have the same competitive drivers (i.e. funding), but they publish their proofs for others to verify and build upon, and that must speed up progress overall.

So, why is there such an ingrained cultural problem in computational linguistics? Is it just us, or is the rest of computer science just as bad? And what can we do to 'fix' it? Answers on a postcard...
