Now that we’ve set the stage, let’s look at how to get started with semantic integration in typical enterprise IT environments. We first need to understand the current state of information management (the as-is state), which in most organizations is characterized by:
- Disparate information management disciplines, responsible for islands of structured data, semi-structured data, unstructured content, metadata, and rich media, which together we’ll refer to as “information.” Even within the realm of relational databases that mostly manage structured data, there are islands represented by different database platforms such as Oracle and DB2 with associated, platform-specific skill sets. Furthermore, different disciplines exist for managing transactional databases vs. analytically oriented data warehouses.
- Disparate information management and software development disciplines. Different groups are generally responsible for developing applications and services vs. managing information. Since the purpose of an application or service is to acquire or process information in some fashion, improving the alignment between these two groups could enable improved information management capabilities.
- Information islands. Object-relational database management technology adds support for additional data types, which is helping to reduce or eliminate the necessity for islands of information. However, for a variety of reasons, islands still exist. While low-level mechanisms such as SQL and federated databases exist to integrate these islands, most organizations still lack the abstraction mechanisms required to support true virtualization (see below), which is required for cost-effective integration. Capability deficiencies typically include:
- Models: Model-driven abstractions and transformations support cost-effective data and information integration in complex environments involving multiple data stores, data types, and access methods.
- Schema-driven behavior: The ability to read and derive implied schemas (introspection) from structured and unstructured sources supports cost-effective data and information integration in complex environments involving multiple data stores, data types, and access methods.
- Limited virtualization: Information virtualization facilitates the use (from single sources) and integration and use (from multiple sources) of information by insulating users from implementation details, including location and the impact of physical storage structures on access. Canonicalization and Master Data Management (MDM) are two traditional forms of virtualization that few organizations have mastered:
  - Lack of canonical models: Canonical information models are independent of any specific organizational unit, business process, application, service or platform. Canonical models provide two important advantages:
    - Reuse: Reusing information across organizational, process, application, service and platform silos is easier. Reusing information across service silos, for example, would mean that different service domains (groupings of services governed and managed separately, possibly on different repository and/or registry platforms) would be able to share information without requiring onerous integration efforts.
    - Business alignment: Since canonical models are independent of specific organizational units, business processes, applications, services and platforms, they must use something else as a frame of reference. This “something else” is the business. Canonical models are naturally aligned with business requirements.
  - Fragmented Master Data Management (MDM): MDM is the discipline of describing core business entities in a consistent manner across organizational, process, application and platform silos. MDM therefore represents canonical modeling applied to core business entities. Fragmented or non-existent MDM disciplines, tools, and methodologies negatively impact the organization's ability to manage information.
- Syntax-based processing: Most organizations process information based on syntax, not semantics. Syntax-based processing relies on superficial structural rules, which can convey a limited amount of “meaning” but fall short of what is required to establish an enterprise information management (EIM) discipline.
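To make the canonical-model idea from the list above a little more concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration rather than a prescribed design: the `CanonicalCustomer` entity, the two source silos ("CRM" and "billing"), and all of their field names are invented for the example. The point is only that each silo keeps its own physical structure, while consumers work against a single business-aligned representation.

```python
from dataclasses import dataclass


# Hypothetical canonical model: described in business terms and not tied
# to any one application, service, or platform.
@dataclass(frozen=True)
class CanonicalCustomer:
    customer_id: str
    full_name: str
    country: str


# Hypothetical adapter for a flat, relational-style CRM record.
# The column names are assumptions made up for this sketch.
def from_crm(row: dict) -> CanonicalCustomer:
    return CanonicalCustomer(
        customer_id=str(row["CUST_NO"]),
        full_name=f'{row["FIRST_NM"]} {row["LAST_NM"]}'.strip(),
        country=row["CTRY_CD"].upper(),
    )


# Hypothetical adapter for a nested, document-style billing record.
def from_billing(doc: dict) -> CanonicalCustomer:
    return CanonicalCustomer(
        customer_id=str(doc["account"]["id"]),
        full_name=doc["account"]["holder"].strip(),
        country=doc["address"]["country"].upper(),
    )


crm_row = {"CUST_NO": 1042, "FIRST_NM": "Ada", "LAST_NM": "Lovelace", "CTRY_CD": "gb"}
billing_doc = {
    "account": {"id": "1042", "holder": "Ada Lovelace "},
    "address": {"country": "GB"},
}

# Both silos describe the same business entity, so the canonical forms match
# even though the source structures differ.
same_entity = from_crm(crm_row) == from_billing(billing_doc)
print(same_entity)
```

In this sketch the adapters are the only code that knows about each silo's layout, so a consumer of `CanonicalCustomer` is insulated from where and how the data is physically stored. A real environment would of course need far more than field mapping (identity resolution, governance, and so on), which this example does not attempt.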
So, how do we move from the as-is state to the to-be state, and what is the to-be state? The answer lies in EIM, a discipline that supports leveraging information as a strategic asset by all types of consumers, including people, applications, services, processes, and devices. As such, EIM defines the principles required to deliver business benefits, which address all of the deficiencies associated with legacy information management environments. EIM enables information processing that’s strategically aligned with business requirements, including business goals, business strategies, and business implementation plans.
I’ll drill a little deeper into this proposition in my next posting, which will discuss EIM in more detail and introduce related concepts, such as Information-as-a-Service (IaaS), Rich Internet Applications (RIAs), classification hierarchies and concept hierarchies.