I introduced Enterprise Information Management (EIM) in my last posting as a discipline that supports leveraging information as a strategic asset. This discipline requires companies to fuse skill sets and technologies that are traditionally focused on specific types of digital content. Examples include:
- Data management – Data management is generally focused on structured data, i.e. data that conforms to a rigid, fixed schema.
- Document management – Document management is focused on managing containers for unstructured text and increasingly other types of content such as graphics, voice annotations and hyperlinks.
- Content management – Content management is focused on managing unstructured or semi-structured content, including Web content, images and documents.
- Event management – Event management is focused on generating or capturing structured data representing events. These events are then used to trigger other events.
- Metadata management – Metadata management is focused on managing data that describes other data. Standalone metadata management systems are often based on repository technology rather than database technology, which provides better support for complex objects, complex relationships and behavior.
- Master Data Management (MDM) – Master data is a specific type of metadata that describes core business entities in a consistent manner across systems and silos. MDM can be thought of as canonical modeling for core business entities.
The boundaries between these silos are getting fuzzier. Relational databases were originally designed for structured data contained in tables, but they now support binary objects, XML hierarchical data, and unstructured text; content management systems now support XML content and documents; and so on. The result is a proliferation of overlapping information silos, where each silo gradually acquires more capabilities. Ultimately, the net result is a lot of silos that are still more-or-less focused on specific types of digital content and on functional capabilities specific to that content.
However, there’s a growing recognition of the need for an overarching discipline that makes sense of all these silos through convergence and/or integration, and that links different types of silo content to business drivers, including business goals, business strategies, and business value. The central focus of EIM is to support this goal by leveraging data and information as a strategic asset, which requires links to business drivers.
EIM impacts applications, services, reporting, and business intelligence systems that access different types of digital content from different silos. EIM capabilities must extend across organizational and technology silos since business value often depends on integrating data and information from multiple silos. EIM serves all types of consumers, including people, applications, services, processes, and devices, which must be provided with the ability to discover, access, integrate, transform, distribute, and present different forms of information consistent with a governance framework. To perform EIM effectively, companies need a governance framework that defines and enforces information controls and characteristics, including security, quality, compliance, and risk management policies.
Let’s take a look at each one of these EIM capabilities with a particular emphasis on semantic issues:
- Discovery – Discovery capabilities include browsing and search.
Many types of search are available in commercial products, including products that provide one or more of the following capabilities:
- Key word search – Key word search is one of the simplest forms of search. It’s language dependent, works only with text, and tends to have poor recall and precision even when the target space is restricted to a single language.
- Recall – Recall is the number of relevant documents retrieved divided by the total number of relevant documents in the target space. Key word search has poor recall because slight variations in word forms and synonyms do not produce “hits.”
- Precision – Precision is the number of relevant documents retrieved divided by the total number of documents retrieved. Key word search has poor precision because it produces many irrelevant “hits.”
- Grammar – Grammar-based semantic modeling helps improve search results by using grammar templates to represent concepts in a language-specific way.
- Ontologies – Ontological modeling can be used to produce “smarter” search results. It’s typically based on fairly simple, industry-specific ontologies (e.g. banking) that might contain a few hundred terms. This type of search is language dependent.
- N-grams – Probabilistic N-gram modeling helps improve search results by building “n-grams” in advance, which are similar to indices. This approach is language independent and works for text but not multimedia.
- Concepts – Bayesian inference supports smart search by automatically identifying concepts (i.e. “meaning”) in arbitrary digital content. Bayesian inference is a branch of mathematical probability theory used to model uncertainty about the world and outcomes of interest by combining common-sense knowledge and observational evidence. This approach to search is particularly interesting since it’s language independent and works with multimedia, including images, voice and video.
- Access – Access refers to retrieving information that you know exists. You also generally know where it’s located and what access protocols should be used, and you have the necessary access privileges. SQL access to structured data in a relational database is a classic example.
Discovery is an important EIM capability because it can be used to find information you may not even know exists. Alternatively, you may not know where it is or how many copies exist, which could have major impacts on risk management and regulatory compliance. The beneficial side effects of having discovery capabilities are also important: they give the business the ability to automatically generate hyperlinks (explicit relationships), summaries, and taxonomies that generally make information assets more accessible, usable, and valuable.
- Integrate – Integrate refers to combining digital content from multiple sources in a meaningful way. Doing this may or may not require metadata from silos other than those supplying the content that’s being integrated.
- Transform – Transform refers to changing the structure or other properties of digital content.
- Distribute – Distribute refers to moving digital content to new location(s).
- Present – Present refers to displaying digital content on one or more types of devices.
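To make the recall and precision definitions given under Discovery more concrete, here is a minimal Python sketch of a naive key word search scored against a hand-labeled set of relevant documents. Everything in it is an assumption made for illustration: the four-document corpus, the choice of which documents count as "relevant" to a banking query, and the function names `keyword_search`, `recall`, and `precision`. It is not taken from any particular search product.

```python
def keyword_search(docs, query):
    """Return the ids of documents that contain the exact query word.

    Matching is case-insensitive but otherwise literal, so word-form
    variations ("banks") and synonyms ("financial institution") are missed.
    """
    q = query.lower()
    return {doc_id for doc_id, text in docs.items() if q in text.lower().split()}


def recall(retrieved, relevant):
    """Relevant documents retrieved / all relevant documents in the target space."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Relevant documents retrieved / all documents retrieved."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical corpus: documents 1-3 are about banking, document 4 is not.
DOCS = {
    1: "the bank approved the loan",
    2: "banks are lending again",
    3: "a financial institution raised rates",
    4: "we walked along the river bank",
}
RELEVANT = {1, 2, 3}  # hand-labeled for this illustration

retrieved = keyword_search(DOCS, "bank")
print("retrieved:", sorted(retrieved))
print("recall:", recall(retrieved, RELEVANT))
print("precision:", precision(retrieved, RELEVANT))
```

In this toy corpus the query "bank" misses the plural form in document 2 and the synonym in document 3, which lowers recall, and it matches the river bank in document 4, which lowers precision. These are the two weaknesses of key word search described above.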
In my next posting, I'll discuss another important topic, Information-as-a-Service (IaaS), and its relationship to EIM.