How much structure do you carry
Seth Grimes talks about the claim that "80% of business-related information resides in unstructured form, primarily text." I remember this being an important element of discussions of information management (and on into knowledge management) as I was getting into the topic.
BridgePoint Experts' Corner: Unstructured Data and the 80 Percent Rule.
[snip] It does seem obvious that a very high proportion of data is unstructured: How much of your workday is spent reading or writing e-mails, reports, or articles and the like, in conversations, or listening to live or recorded audio? And in making the case for tapping unstructured sources, a very important asset in fields ranging from customer experience management to counter-terrorism, it’s helpful to be able to quantify the proportion, to put a number on it.
I like that he's taken the time to explore the source of this claim, which appears to be more-or-less correct. But even more important, why is it interesting?
As I read through Seth's discovery, what struck me is that at one point ALL data is "unstructured," because we can only add structure to it when we build a narrative around it. Of course, I know that databases are "structured" in that I can tell a phone number from a fax number, if the fields are labeled properly. But the connections that I can draw from that data can really only be expressed in words and language. And this is when I add my own structure to the data, even as I unstructure it from the formal rows and columns of a data cube.
[Yes, I know I am playing fast and loose with information architecture concepts. Please forgive.]
1 Comment(s)

In the current age of Google, it is the structured data that is a problem. Google (and other search engines) have proven that they can handle unstructured data and return it in a meaningful way.
The structured systems need to make content available for search by unstructured search engines. These separate silos of information act as barriers to the distribution of information.
If you search for "Doug's phone number," you can't find it. You need to go into the contact system, search for Doug, and then pull the number out of that system.