AI-Complete
Google is your friend
2005-02-08 21:19:01
After a long hiatus, I'm going to start blogging again with this paper on "Automatic Meaning Discovery using Google" by Paul Vitanyi (the Kolmogorov complexity guy) and Rudi Cilibrasi. It has generated a flurry of interest and even sparked a Slashdot discussion. The essence of the paper is this: use the page counts returned by the Google search engine to define a distribution over words and word pairs, then use that distribution to automatically extract meaning from the World Wide Web. For example, the page count for the query "horse"+"rider" (about 2,710,000) versus "hoarse"+"rider" (31,400) says something about the semantic associations between these words. In this way they are trying to exploit the huge but low-quality information source that is the web to generate a lexicon and an ontology. Compare this with Cyc, for example, which is building a hand-crafted knowledge base.

The idea has been suggested before, but this is the first realistic attempt that I know of. Their approach is interesting for two reasons. First, it has a strong theoretical justification: an argument based on Kolmogorov complexity and optimal string encodings. Basically, the metric they use, called the Normalized Google Distance (NGD), is universal with respect to the Google Distance (GD) of individual authors. That is, the NGD of any two words is within a linear factor of the GD of those words computed over the web documents originating from any one source.
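For reference, the paper defines the NGD of two terms x and y as NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}), where f(x) is the number of pages containing x, f(x, y) is the number containing both, and N is a normalizing constant (roughly the number of pages indexed). Below is a minimal sketch of that formula in Python. The function name `ngd` and the choice to return infinity when any count is zero are my own assumptions, not something taken from the paper's code. Because NGD is a ratio of log differences, the base of the logarithm cancels out.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance computed from raw page counts.

    fx, fy: number of pages containing term x and term y respectively
    fxy:    number of pages containing both x and y
    n:      normalizing constant, roughly the total number of pages indexed
            (assumed larger than fx and fy, so the denominator is positive)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Assumption: terms with no hits, or that never co-occur,
        # are treated as infinitely far apart.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

To apply this to the "horse"/"hoarse" example above, you would also need the single-term counts f("horse"), f("hoarse") and f("rider") and a value for N, which are not quoted here. With those in hand, a larger co-occurrence count yields a smaller distance.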

Second, they have impressive experimental results, especially one involving the hierarchical classification of a set of numbers and colours. Another set of experiments uses the NGD between an instance word and a set of "anchor" words to define a set of features that is used as input to an SVM. By choosing the right set of anchor words, they were able to classify "electrical" terms with 100% accuracy.
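The anchor-word feature construction can be sketched roughly as follows. This is only my illustration of the idea, not the authors' code: `anchor_features`, `build_dataset` and the `distance` callback are hypothetical names. In the paper's setup `distance` would be the NGD computed from page counts, and the resulting rows would be fed to an SVM; the SVM training step itself is not shown.

```python
from typing import Callable, List, Sequence


def anchor_features(
    word: str,
    anchors: Sequence[str],
    distance: Callable[[str, str], float],
) -> List[float]:
    """Map a word to a fixed-length vector: its distance to each anchor word.

    `distance` is assumed to be NGD computed from search-engine page counts;
    any word-to-word distance with the same signature can be plugged in.
    """
    return [distance(word, anchor) for anchor in anchors]


def build_dataset(
    words: Sequence[str],
    anchors: Sequence[str],
    distance: Callable[[str, str], float],
) -> List[List[float]]:
    """One feature row per word, in the order given, ready for a classifier such as an SVM."""
    return [anchor_features(w, anchors, distance) for w in words]
```

Passing the distance in as a function keeps the feature code independent of how the page counts are fetched, which also makes it easy to test with a toy distance.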
Deepak