Semantic intelligence from Expert System
I had an interesting conversation with Luca Scagliarini of Expert System, a company that doesn't make expert systems per se. They are a semantic technology firm with an interesting set of products, most of them built for their customers and a few that face the public.
Their biggest public foray is the beta of AskWiki, a project that uses Wikipedia as a source of knowledge, with their semantic engine in the background, to let people ask natural-language questions and get answers to things like "How tall is Pikes Peak?" (14,115 ft). Interestingly, it has difficulty with "Who is the president of the United States?", giving me an answer that describes the role but not the person. I think this isn't the best demonstration of their technology, as what I saw in our discussion was much better. (Several people commented about AskWiki last week: Gary O. Grimm and Glyn Moody.)
The core of their ability to do semantic analysis is a deep knowledge of the semantics of a given language. Any semantic analysis technology needs to be able to take text and pull out the meaning. And the way you can get meaning is by understanding the language, which means understanding how sentences are constructed AND what the words mean within that structure. To understand the words, the system needs to understand how each word can be used (definitions and concepts) and map against the sentence structures.
The underlying network here contains ~1.5 million connections. In the semantic analysis, the tool maps a specific text onto the semantic network. The result is the specific instantiation of the semantic network for the text, whether that is one sentence or a book. Once you have that mapping, you can do all sorts of interesting things. You can answer basic questions, a la AskWiki. You can look at the semantic concepts associated with the text.
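To make the idea of "mapping a text onto the semantic network" a little more concrete, here is a toy sketch in Python. It is not how Cogito works: the network, the lexicon, and the function name `instantiate` are all invented for illustration, and the sketch ignores sentence structure entirely, which the real analysis depends on.

```python
# Toy semantic network: concept -> related concepts.
# Every entry here is invented for illustration only.
NETWORK = {
    "mountain": {"height", "place"},
    "height": {"measurement"},
    "president": {"person", "role"},
}

# Toy lexicon: surface word -> concept it can express (also invented).
LEXICON = {
    "peak": "mountain",
    "mountain": "mountain",
    "tall": "height",
    "president": "president",
}


def instantiate(text):
    """Return the part of NETWORK touched by the words in `text`."""
    # Crude tokenization: split on whitespace, strip punctuation, lowercase.
    words = [w.strip(".,?!\"'").lower() for w in text.split()]
    concepts = {LEXICON[w] for w in words if w in LEXICON}
    # The "instantiation" is just the sub-network for the concepts found.
    return {c: NETWORK.get(c, set()) for c in concepts}


print(instantiate("How tall is Pikes Peak?"))
```

The result is a small sub-network for that one question. The same function applied to a whole book would simply return a larger slice of the network.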
This big hairy beast is called Cogito, which has several variations, depending on the specific application: semantic search, categorization (content interpretation, automatic tagging), text mining, trend analysis, and even some customer-relationship-management applications.
Luca grabbed a current story and ran it through the analyzer (which processes about 50 KB of text per second), and the tool gave us the top concepts in the article, including concepts that weren't necessarily keywords in the article itself. It extracted proper nouns and relevant information about those proper nouns (were they people, organizations, places, etc.). And it did some nice disambiguation of those proper nouns: "Jack Vinson" is the same as "Vinson," and he is referred to as "doctor," so he must have an advanced degree.
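The "Jack Vinson" / "Vinson" merging can be imitated with a very rough heuristic. The sketch below is my own simplification, not Expert System's method: it assumes that two mentions refer to the same person whenever they share a final token (the surname), and `merge_mentions` is a hypothetical name. A real system would use far more context, such as titles like "doctor."

```python
def merge_mentions(mentions):
    """Group name mentions so a bare surname joins a fuller name."""
    entities = []  # each entity: {"name": longest form seen, "aliases": set}
    for mention in mentions:
        tokens = mention.split()
        match = None
        for entity in entities:
            # Assumption: same final token (surname) means same person.
            if entity["name"].split()[-1] == tokens[-1]:
                match = entity
                break
        if match is None:
            entities.append({"name": mention, "aliases": {mention}})
        else:
            match["aliases"].add(mention)
            # Keep the longest form as the canonical name.
            if len(tokens) > len(match["name"].split()):
                match["name"] = mention
    return entities


for entity in merge_mentions(["Jack Vinson", "Vinson", "Luca Scagliarini"]):
    print(entity["name"], sorted(entity["aliases"]))
```

The obvious weakness is that two different people with the same surname would be merged, which is exactly the kind of ambiguity a semantic engine is supposed to resolve.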
Market intelligence / trend analysis across web-based content (e.g. blogs). Look at the general impression of articles about a certain brand: what has that impression been over the last time period, and how has it changed? What are the trends compared to other brands? What are the trends in the market, generally? This particular tool has been developed for a customer, but will become a product in its own right next year.
Text mining for intelligence purposes. Look for trends of terms / people / organizations being mentioned in a corpus of text (the example looked at newspapers). With the analysis done, you can pull back lots of interesting data. What has Condoleezza Rice "done" in the last month (as reported in the NY Times)? When she "talked," to whom did she talk? About what? And then you can always go back to the source article where she talked to the prime minister of Freedonia.
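Once the analysis has been reduced to who-did-what-to-whom records, questions like the ones above become simple filters. The sketch below assumes a made-up record shape (`Fact`) and a hypothetical helper (`actions_of`). The records, article IDs, and dates are invented placeholders, not output from the tool.

```python
from collections import namedtuple
from datetime import date

# Hypothetical record: one extracted action, linked back to its source.
Fact = namedtuple("Fact", "subject verb obj topic source published")

# Invented placeholder data, for illustration only.
FACTS = [
    Fact("Condoleezza Rice", "talked", "prime minister of Freedonia",
         "trade", "article-001", date(2000, 1, 20)),
    Fact("Condoleezza Rice", "visited", "Freedonia",
         None, "article-002", date(2000, 1, 25)),
    Fact("Condoleezza Rice", "talked", "ambassador of Freedonia",
         "visas", "article-003", date(1999, 11, 2)),
]


def actions_of(facts, subject, since, verb=None):
    """Return facts about `subject` published on or after `since`."""
    return [
        f for f in facts
        if f.subject == subject
        and f.published >= since
        and (verb is None or f.verb == verb)
    ]


# "When she talked, to whom did she talk? About what?"
for f in actions_of(FACTS, "Condoleezza Rice", date(2000, 1, 1), verb="talked"):
    print(f.obj, "|", f.topic, "|", f.source)
```

Keeping the `source` field on every record is what makes it possible to jump back to the original article.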