Friday, 6 August 2010

Crowdsourcing My PhD



I'm working towards a PhD in computational linguistics. Many of you have asked me for more detail about my studies... and now seems like a good time, because I'm having some problems that you can help with.

I'm a linguist at heart, with a background in pragmatics: the study of what people really mean by what they say. It was almost by accident that I ended up in a computer science department for my doctorate, but one major advantage of a computational approach is that it's comparatively easy to study a vast amount of data in a short length of time. And there's a nice dataset of email, in the form of the Enron corpus (thousands of emails subpoenaed when the company was investigated, and subsequently released into the public domain for research).

However, it's really hard to find good-quality, human-annotated language data. For what I'm interested in looking at, the data simply doesn't exist yet. And yet without hand-crafted examples of data, it's hard to "teach" a computer how to process something, and even harder to assess how well it performs.

Hence, a new kind of experiment (for me, at least).

I'd like to harness the power of the internet to pull together some sample answers. If you're a fluent English speaker with a few minutes to spare, please help me to categorise some questions. I'll then compare the answers given by different people, and try to find some kind of underlying "truth". If I get enough good answers, this will be the basis for training a computer model to perform the same task.
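
For the curious, here's a rough sketch of one simple way to compare answers from different people: take a majority vote on each question and note how strongly people agreed. It's only an illustration, not necessarily the method I'll end up using. The function names and the category labels ("request", "rhetorical") are made up for the example.

```python
from collections import Counter


def majority_label(labels):
    """Return (label, share) for the most common label.

    If two or more labels tie for first place, the label is None,
    because there is no clear majority answer for that question.
    """
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    share = top_count / len(labels)
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, share
    return top_label, share


def aggregate(annotations):
    """Group (question_id, label) pairs by question and vote on each."""
    by_question = {}
    for question_id, label in annotations:
        by_question.setdefault(question_id, []).append(label)
    return {q: majority_label(labels) for q, labels in by_question.items()}


# Hypothetical answers from three people on q1 and two people on q2.
answers = [
    ("q1", "request"), ("q1", "request"), ("q1", "rhetorical"),
    ("q2", "request"), ("q2", "rhetorical"),
]
for question_id, (label, share) in aggregate(answers).items():
    print(question_id, label, round(share, 2))
```

Questions where the vote is tied, or where the winning share is low, are the ones most worth a second look: they may be genuinely ambiguous rather than badly answered.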

If you can spare even ten minutes to help out, I'd really appreciate it. Please pass the link to your friends, too. And I promise to let you know how it goes.

13 comments:

lakeviewer said...

Great idea to ask your readers to help: wide sample, across nationalities of English speakers, across age and educational status. Good luck.

Anything Fits A Naked Man said...

I completed 50 questions, Rachel! Hope that helps, good luck!!

ScoMan said...

I'm willing to help but not sure what I have to do. I guess step one is to follow that link.

Away I go!

Christine said...

I had a go. Interestingly, I felt my responses changing in relation to a developing and subjective perception of the place, the people, and the story that I may have been creating in my mind about what seemed to be unfolding in these emails. Can a computer reflect these subjective minutiae?

Louiz said...

I did some, will do more this evening if I get a chance, and will post to my Facebook later for you.

Stephanie V said...

I did a few this morning and will come back. I agree with Christine that one begins to try to make some sense out of the stream of questions.

Writing Without Periods! said...

I answered a few questions. Very interesting.
mary

Dave King said...

Great idea. I've answered a few. Might like to return and try some more. Allowed?

christine said...

I spent about an hour working through it. I hope that will be useful, though it took me a few minutes to get the hang of the categorisations :(

Deirdre said...

I answered around 40 this morning. I'll try to come back to it as soon as I can.

:)

Natasha said...

I completed 30 questions. I'm here from Mama's Little Nestwork!

ladyfi said...

Fascinating! I'll try and help you out.

Damien Hall said...

Sorry, Rachel, but I think this survey may have been hacked! I've been doing some this lunchtime, and I got through about 10, then clicking on a response for the next question sent me to a page of fake-looking Google results for floorcleaner under the domain

custom404error dot com

This happened twice. The other time, the page of fake-looking Google results was at the same domain but for Jean Paul Gaultier perfume.

I hope you can get it back on track. It looks like a great, and important, study!

Damien Hall
@hall_damien

