Can computers understand online conversations?

February 15, 2012

New software being developed at Oxford University may be able to instantly measure the emotions and reactions of large populations by evaluating the words we use on the internet. Investors seem to think this idea will pay off.

So, why do we care? Well, this software, called TheySay, uses corpus linguistics, which basically means taking a whole bunch of text (whether Dickens’ novels or Facebook posts) and analyzing what the words mean in relation to each other. This approach could make it possible to analyze language on an unprecedented scale. For example, one could gather all the public tweets posted on Twitter in a given month and use text-based analysis to measure the general sentiment of that period. A group of mathematicians used a variation of this approach on a language-wide scale.
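
To make the "sentiment of a time period" idea concrete, here is a minimal sketch of the aggregation step only. It assumes each post has already been given a sentiment score between -1.0 (negative) and 1.0 (positive) by some classifier; the sample posts, their scores and the function name `monthly_sentiment` are all hypothetical and are not taken from TheySay.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical sample data: (date posted, sentiment score from -1.0 to 1.0).
# In practice the scores would come from a text classifier; these are made up.
posts = [
    (date(2012, 1, 3), 0.6),
    (date(2012, 1, 17), -0.2),
    (date(2012, 2, 5), 0.1),
    (date(2012, 2, 6), -0.7),
    (date(2012, 2, 14), 0.3),
]


def monthly_sentiment(scored_posts):
    """Average the sentiment scores of posts within each calendar month."""
    buckets = defaultdict(list)
    for posted_on, score in scored_posts:
        # Group posts by (year, month) so each month gets its own bucket.
        buckets[(posted_on.year, posted_on.month)].append(score)
    # Return months in chronological order with their mean score.
    return {month: mean(scores) for month, scores in sorted(buckets.items())}


for (year, month), avg in monthly_sentiment(posts).items():
    print(f"{year}-{month:02d}: {avg:+.2f}")
```

With these made-up numbers, January averages about +0.20 and February about -0.10. A simple mean is the crudest possible summary; a real system would likely also weight posts or report how many posts fell in each period.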

A tool like this could work like a heart monitor, except that instead of showing whether your pulse is faster or slower, it would show whether the public mood is happier or sadder. As the mood shifts, those changes could be correlated with public events like an election, the Super Bowl or the Grammys.

Last week, Steve Lohr at the New York Times discussed the growing amount and usefulness of immense collections of raw data. As he put it, “…the computer tools for gleaning knowledge and insights from the Internet era’s vast trove of unstructured data are fast gaining ground. At the forefront are the rapidly advancing techniques of artificial intelligence like natural-language processing, pattern recognition and machine learning.” Companies hope to use these developing technologies to predict and measure public health outbreaks or fluctuations in the housing market, among other applications.

TheySay aims to gauge how people perceive a particular product or company, helping businesses track their reputations through the immediate feedback that language analysis provides. As one of the professors behind the program told The Engineer: “We have a very large database of words annotated by hand along several dimensions for the emotional meaning they carry, and we also evaluate the grammatical context in which these words occur, taking account of the effects of negation and other constructs that change meaning. A word such as ‘progress’ is generally perceived as positive, but not when it is in a context such as ‘fail to progress’, or ‘little progress’.” Pretty cool, huh?
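
Here is a toy illustration of the idea in that quote: a hand-annotated word list plus a rule that context words like "fail" or "little" reverse a word's usual polarity. This is only a sketch under loud assumptions. The tiny `LEXICON`, the `SHIFTERS` set and the two-word look-back `WINDOW` are all invented for illustration, and a fixed word window is a crude stand-in for the real grammatical analysis the professor describes; it is not TheySay's method.

```python
import re

# Hypothetical hand-annotated lexicon: word -> polarity (-1.0 to 1.0).
# These entries and values are illustrative only, not TheySay's data.
LEXICON = {
    "progress": 1.0,
    "improve": 1.0,
    "good": 0.8,
    "bad": -0.8,
    "decline": -1.0,
}

# Hypothetical "shifter" words that reverse the polarity of a nearby
# sentiment word, e.g. "fail to progress" or "little progress".
SHIFTERS = {"not", "no", "never", "fail", "fails", "failed", "little"}

WINDOW = 2  # how many preceding tokens to check for a shifter


def score(text):
    """Sum lexicon polarities, flipping any word preceded by a shifter."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token)
        if polarity is None:
            continue  # word carries no annotated emotional meaning
        preceding = tokens[max(0, i - WINDOW):i]
        if any(word in SHIFTERS for word in preceding):
            polarity = -polarity  # context reverses the usual meaning
        total += polarity
    return total


for sentence in ["We made progress.", "They fail to progress.", "There was little progress."]:
    print(f"{sentence!r}: {score(sentence):+.1f}")
```

The three sentences score +1.0, -1.0 and -1.0, so "progress" only counts as positive when nothing nearby changes its meaning. Treating "little" as a simple flip is a simplification; real systems tend to distinguish negation from weakening words and rely on a parse rather than word distance.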

Do you think this software will be able to interpret human language?