Next step in text mining

Glenn Fannick of Factiva was the featured speaker at the KM Chicago meeting this past Tuesday, speaking on "Using Text Mining and Visualization to Make Sense of Content Chaos."

Rather than diving directly into a technology discussion, Glenn provided some motivation around why text mining might be interesting. He discussed how important it is for companies to pay attention to what is being said about them out in the world. And as a corollary, he discussed the importance of companies engaging their customers, rather than just broadcasting at them. And, yes, he did reference The Cluetrain Manifesto.

In both of these cases, a technology like text mining can help find important connections. Companies like Factiva can help monitor newspapers and other media for references to your company. One can do the same in the blogosphere, and in much of the online world, with various blog and web search engines. Services like these offer various subscription models, so one can get periodic reports on what's happening. And the sooner a company knows that people are talking, the sooner it can decide if and how to respond.

So, what's coming in advanced text mining and visualization? 

  • Enhanced ability to understand how words and terms are related to one another: does one company name appear frequently near another company name?  Or is there a phrase that appears near a company name?  Text miners have massive archives of proper nouns they can monitor, as well as place and region names.  They are beginning to understand how people are related to a chunk of text (about, author, quoted, interviewed, etc.) and can apply the same kinds of questions. 
  • Is a given combination of terms happening more or less frequently in a given time period? For example, what does the occurrence of Apple + Motorola + iTunes look like in relation to the discussion of the ROKR phone? What about commentary on the success or failure of the venture?
  • Is a particular class of topics gaining / losing currency over time?
  • Re-engineering search results. Rather than simply returning a list of results, apply text mining to those results to pull out key concepts, companies, and names, helping users focus their energies and giving wider context on what the results contain. I got the feeling that this is similar to what Technorati is doing by adding Flickr, Furl and del.icio.us matches to its search results. (I understand that search and mining are very different activities.)
  • One interesting use of visualization is to take the time data and show the frequency of occurrence over time, whether for a single term or for term combinations.
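To make a couple of these ideas concrete, here is a minimal Python sketch of my own, not anything Glenn showed. It counts which words appear near a company name and how often a combination of terms shows up per month. The four-document corpus, the function names and the window size are all invented for illustration; a real text mining service would work from a large archive with proper entity recognition rather than simple word matching.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical mini-corpus of (publication date, text); invented for illustration.
DOCS = [
    (date(2005, 9, 7), "Apple and Motorola unveil the ROKR phone with iTunes."),
    (date(2005, 9, 20), "Reviewers say the Motorola ROKR limits iTunes to 100 songs."),
    (date(2005, 10, 3), "Apple promotes the iPod nano; Motorola is not mentioned by Apple."),
    (date(2005, 11, 15), "Motorola plans new phones; Apple and iTunes sales keep growing."),
]


def tokens(text):
    """Lowercase the text and split it into simple alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near_counts(docs, anchor, window=5):
    """Count the tokens that appear within `window` tokens of `anchor`."""
    counts = Counter()
    for _, text in docs:
        toks = tokens(text)
        for i, tok in enumerate(toks):
            if tok == anchor:
                start = max(0, i - window)
                # Neighbours on both sides, excluding the anchor itself.
                counts.update(toks[start:i] + toks[i + 1:i + 1 + window])
    return counts


def monthly_hits(docs, terms):
    """Count documents per (year, month) that contain every term in `terms`."""
    wanted = {t.lower() for t in terms}
    hits = Counter()
    for day, text in docs:
        if wanted <= set(tokens(text)):
            hits[(day.year, day.month)] += 1
    return dict(sorted(hits.items()))


if __name__ == "__main__":
    print(near_counts(DOCS, "motorola").most_common(5))
    # A crude text "visualization": one bar per month for the term combination.
    for (year, month), n in monthly_hits(DOCS, ["Apple", "Motorola", "iTunes"]).items():
        print(f"{year}-{month:02d} {'#' * n}")
```

The monthly counts are the raw material for the time-based charts mentioned in the last bullet: plot them for one term or for a combination, and you can see whether a topic is gaining or losing currency.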

4 Comment(s)

Jack,
Thanks for taking the time to post on this. A very nice summary. It was a pleasure speaking to the group, even if I couldn't be there to enjoy the holiday festivities afterward. We're all very excited about this area of our business. We see it as one with significant growth, as our clients have told us time and time again that they need to find ways to add more value to the information they already have in "raw" form.
Glenn

Ronald Kostoff said:

There are other applications of text mining that have the potential for enormous payoff. I have appended an announcement notice, sent earlier this year, that describes our approaches to text mining for discovery and innovation.

Additionally, since your group appears to have an interest in knowledge management, I published a chapter earlier this year on Science and Technology Knowledge Management in the book "New Frontiers of Knowledge Management" (Ed. Kevin DeSouza; Palgrave Macmillan, United Kingdom). This chapter places text mining in its larger context of knowledge management, and presents some problems and potential payoffs associated with text mining today.

APPENDIX - ANNOUNCEMENT

FROM: DR. RONALD N. KOSTOFF (Office of Naval Research)

TO: TECHNICAL DISTRIBUTION (BCC:)

SUBJECT: ACCELERATING RADICAL DISCOVERY AND INNOVATION

A recent report (1) describes a new family of approaches for systematically identifying and accelerating potentially radical discovery and innovation in science and technology. This systematic capability is applicable to all phases of the science and technology development cycle (planning, investment/ selection, execution, review, publication/ dissemination, transition). The report should be of interest to research managers, performers, administrators, investors, and journal editors who might benefit strongly from using such a systematic discovery and innovation approach in their work.

“Systematic Acceleration of Radical Discovery and Innovation in Science and Technology” (1) presents new approaches for solving problems in target technical disciplines by systematically extrapolating insights and principles from disparate technical disciplines. The Appendix to this e-mail briefly summarizes the approach fundamentals (separate forthcoming journal publications will describe the successful applications of the technique). Anyone interested in implementing the discovery and innovation approach should obtain the report (1) for the extra details.

Specifically, the technique is useful to the following members of the science and technology community:

1) Sponsoring agencies, for identifying radically new technical directions to support. For example, based on demonstration, an enormous increase in numbers of proposal responses to funding competition announcements, both in absolute numbers of proposals and especially proposals from disparate technical disciplines, is possible, using one of the discovery variants described in (1);

2) Researchers, for identifying potentially high-payoff research avenues to pursue. For example, based on demonstration, up to an order of magnitude (or more) increase in technically innovative approaches to solve problems can be hypothesized for further lab and field testing;

3) Journal Editors, for proactively using Special Issues as engines of radical innovation. For example, journal Special Issues featuring papers written by experts from very disparate disciplines could identify radically innovative approaches for solving problems related to the Special Issue’s theme;

4) Administrators, when restructuring organizations and teams to maximize innovation and discovery potential. For example, an improved mix of people representing very diverse relevant technical disciplines could be structured around core technologies to increase the potential of radical discovery and innovation;

5) Forecasters and Planners. For example, the possibility of identifying potentially radical solutions for technical problems could be estimated, and emerging technologies that could enable radical problem solving for many different core technologies could be identified. This capability would be of particular value to technology venture capital organizations;

6) Selected Researchers in informatics, bioinformatics, innovation management, science and technology management, interdisciplinary and multidisciplinary studies, and information retrieval. For example, representatives of these disciplines could apply their expertise to enhancing the approaches described in this note, and in turn their disciplines could benefit from such approaches.

RNK

APPENDIX – SUMMARY OF APPROACH

Radical discovery and innovation are systematically identified by extrapolating principles and insights from relevant disparate technical disciplines to solve problems of interest. To access these disparate technical disciplines, the following steps are necessary.

a) A query is developed to retrieve a core literature set (the literature describing the existing technology base for solving the problems of interest) from raw literature data sources (e.g., Science Citation Index, Medline).

b) The core literature is retrieved and characterized (technical structure, infrastructure).

c) The query used for generating the core literature is generalized/ expanded to retrieve a much broader literature set with direct and indirect connections to the core literature.

d) Potential discovery and innovation candidates (concepts) from this expanded literature are identified by technical experts.

e) The principles and insights from these potential discovery and innovation concepts are extrapolated to solve problems related to the core literature theme by either the authors of the expanded literature documents, or by independent analysts examining the contents of these expanded literature documents.

In summary, the key elements of the process are 1) developing the expanded query, 2) identifying potential discovery and innovation candidates, and 3) drawing the connections between the potential discovery and innovation candidates and solutions to the prevalent problems of interest.

For example, assume the problem of interest is to identify ‘improved’ water purification alternatives to existing practices, where ‘improved’ could reflect any combination of lower cost, lower energy use, lower maintenance, higher system reliability, lighter system weight, or improved system modularity for faster assembly. Then, the core literature would consist of the existing published water purification techniques. These publications could either be water purification science and technology output documents (papers, reports, patents, etc.), or funding agency narrative descriptions of water purification projects. Then, assume the core literature could be divided into two fundamental technical categories: mass separation (e.g., distillation, membrane filtering) and disinfection (e.g. ozonation, chlorination). The core literature would then be generalized/ expanded to contain all conceivable mass separation and disinfection techniques.

Radical discovery and innovation (i.e., extrapolation of insights and principles from these disparate literatures to solving the water purification problem) would be identified either by the authors associated with the expanded literature documents (through grant competitions and/ or workshops and/ or other multi-disciplinary group activities) or through independent analyses of these documents by technical experts. The authors of the expanded literature documents would also provide links to experts representing the disparate technical disciplines who may not be associated with the expanded literature documents. Membership in common professional societies, employment in same institution divisions, and attendance at same conferences and workshops, etc., would provide the linkages between experts accessed through the expanded literature documents, and those not accessed. This would ensure that all relevant expertise is accessed for solving the problem of interest.

RNK

REFERENCES
1). Kostoff, R.N. Systematic Acceleration of Radical Discovery and Innovation in Science and Technology. DTIC Technical Report Number ADA430720 (http://www.dtic.mil/). Defense Technical Information Center. Fort Belvoir, VA. 2005.

An MSWord version of the report’s final draft is available in downloadable form at http://www.onr.navy.mil/sci_tech/special/354/technowatch/textmine.asp (Press CTRL + click to follow link; GO TO FIRST PUBLICATION LISTED; click on Word Version.)

2). DISCLAIMER-The views in this e-mail and the referenced report are solely those of the author, and do not necessarily represent the views of the U.S. Department of the Navy, or any of its components.

Most of the predictions that were flying around at the end of last year concerned company acquisitions, foresight into the boom or bust of various technologies and the maturation of others. Few, however, made any attempt at predicting where major Read More

» Doing something useful with information from Knowledge Jolt with Jack

Ron Friedmann saw an interesting product demo from LexisNexis, which spawned some thoughts about the next life for search technology. This sounds like what Glenn Fannick of Factiva discussed at a KM Chicago event in December 2005. Read More

About this Entry

This entry was published on December 16, 2005 12:17 AM and has 4 comment(s).
