Tuesday, 10 May 2011
A few people have asked me for more detail about my PhD, and since I've been thinking about it a lot at the moment, I thought I'd have a go at writing something here.
In language processing, there's a sub-field known as Question Answering (QA). Typically, work in this area is focused on the kinds of short, factual questions you might type into a search engine: "What's the capital of France?" or "Who is the Queen of England?"
In my work I tend to look at written dialogue (I'm most interested in language as it expresses human interaction), and as part of my PhD I'm studying questions in a dialogue context, specifically in the Enron corpus of email. So I thought it might be nice to apply some QA techniques to the questions people actually ask one another.
The first practical stage of QA is question classification: given this question, what type of answer is expected? For the examples above, "What's the capital of France?" demands a place (a city, to be precise) while "Who is the Queen of England?" requires a person's name.
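To make that concrete, here is a toy sketch of question classification in Python. It is only an illustration under loud assumptions: the hand-written rules, the function name `expected_answer_type` and the labels (`LOCATION`, `PERSON` and so on) are all invented for this example. Real QA systems typically use much richer type sets and trained classifiers rather than a handful of regular expressions.

```python
import re

# Hypothetical, hand-written rules for illustration only: each pattern maps
# the opening words of a question to the type of answer it expects.
RULES = [
    (re.compile(r"^(what|which)('s| is| was)? the capital\b", re.I), "LOCATION"),
    (re.compile(r"^who\b", re.I), "PERSON"),
    (re.compile(r"^where\b", re.I), "LOCATION"),
    (re.compile(r"^when\b", re.I), "DATE"),
    (re.compile(r"^how (many|much)\b", re.I), "NUMBER"),
]


def expected_answer_type(question: str) -> str:
    """Return the expected answer type, or "UNKNOWN" if no rule matches."""
    text = question.strip()
    for pattern, answer_type in RULES:
        if pattern.search(text):
            return answer_type
    return "UNKNOWN"


for q in ["What's the capital of France?", "Who is the Queen of England?"]:
    print(q, "->", expected_answer_type(q))
```

With these rules the first question is labelled `LOCATION` and the second `PERSON`. Anything the rules don't recognise falls through to `UNKNOWN`.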
But this is where I hit my first problem. Most of the time, people aren't asking one another these short, factual questions. They're asking things like "Do you want to go for coffee?" or "Please can you read my report?" or "What the ****?!"
These questions don't need facts to answer them. They need a quick yes/no, or a physical action, or (for rhetorical questions) no response at all. My first published paper, in January, showed that only about a quarter of the questions in the Enron corpus are of the "factual" kind, so clearly there's a lot of work for me to do with the other question types.
Meanwhile, if you're interested in the most headline-grabbing application of QA, search Google for "Watson plays Jeopardy"...