An open UNICOM Intelligence Data Model for the MR Industry... the challenge and the promise
Introduction
Historically, the survey research industry and its computer software suppliers have been forced to accept barriers to the interchange of survey data between software systems and operating environments. These barriers, which arise in part from disparate formats for handling the unique requirements of survey data, have been addressed primarily through import and export programs of limited effectiveness, often at the cost of substantial production delays and expense. In addition, each software supplier or division tends to recreate a core set of common functionality in parallel, which drives up software costs and slows adaptation to change.
The fundamental need is not only for a medium of interchange for study information and respondent data, but for a means of simplifying the creation and linking together of application programs across a diverse range of functionality.
While significant headway has been made against the basic interchange problem by standards groups such as the Triple-S Committee, the challenge is in the meantime evolving. The survey research industry is beginning to see growing client demand for ever more complex analysis and data mining, for customer relationship management, for web-based publishing of interactive databases, and for executive information systems and marketing information portals that integrate multiple data streams. As we rise to meet these needs, we encounter fundamental inefficiencies arising from the difficulty of integrating applications from multiple vendors to create comprehensive solutions for clients.
The challenges include both logistical and industry issues: on the one hand, the need to handle the requirements of survey data, including practical issues such as hierarchical data structures and the tracking of changes over time in longitudinal survey instruments; on the other, the need to achieve consensus among industry players and software providers that a lingua franca, allowing applications from competing suppliers to share information simply and openly, benefits everyone.
This paper provides an overview of an effort by UNICOM Intelligence to create an open UNICOM Intelligence Data Model that provides an interchange medium, a published Application Programming Interface (API), and extensive functionality to simplify the development and integration of applications for survey research. The paper reviews some of the challenges; the potential benefits for agencies and their clients, systems suppliers, and the industry as a whole; and lessons learned to date from this extensive undertaking.
Design goals and benefits
With these challenges at hand, with a perceived industry need for a shared, open solution to alleviate the frustrations described, and with a desire to facilitate the creation of larger, more complex next-generation applications, the case seemed clear for a technical solution that would meet at least the following design goals:
To develop market research (MR) applications independent of data storage formats
To make it easy to read and write MR data and metadata
To speed the development of new MR applications
To make it easy to integrate UNICOM Systems, Inc. products end-to-end
To make it easy to leverage or integrate third party tools
To do this, the UNICOM Intelligence Data Model must be more than just a repository for survey data. It must provide functionality to allow applications to handle unique aspects of MR data, including multiple response questions, hierarchical data sets, and tracking of study versions, simply and quickly, so that these wheels need not be reinvented for each study. And it must provide an API that simplifies application programming.
The key benefit for the survey research community would lie beyond simple interchange of data, in customization of systems, products and business processes:
The opportunity to re-engineer and broaden the business
The control to customize the research process
The independence to differentiate through innovation
The power (speed, quality, and efficiency) to reap quantum productivity gains
An early, simple real-world example might help to illustrate this last point. The construction of paper questionnaires for scanning often involves some fairly extensive formatting, as well as a time-consuming step in which fields on the paper are mapped into the scanning software. We've combined third-party software from multiple vendors with our Data Model and a proprietary application to create a solution that automates much of the formatting--and reduces the mapping process from one that often requires a day or more, depending on the questionnaire, to a matter of seconds. Further, changes to the questionnaire are automatically handled, both in the formatting and in the mapping to the scanning software. This saves multiple iterations of the manual mapping process, and allows agencies to be far more flexible and responsive.
A research agency, using these standard tools, could do the same thing.
Challenges
In addressing this ambitious agenda, the design team faced some key challenges:
1 Implementing unique MR case data structures while taking advantage of standard database environments and applications.
2 Fully allowing in metadata for the complexities introduced by factors such as tracking and versioning, global research, and hierarchical data sets.
3 Accommodating the widest possible range of data implementations, including legacy storage formats.
4 Designing simple-to-use APIs that would facilitate complex functionality.
5 Choosing industry-standard technologies that would:
Facilitate the use of the widest range of new technologies, by us and others.
Allow the use of best-of-breed software “building blocks”, so that we and others develop only what's unique to our industry or application.
Provide a compatible solution for the widest number of user organizations and industry platforms.
Leave us prepared for future technology change.
6 Making the benefits of industry adoption compelling, and the barriers minimal.
7 Overcoming tactical thinking in favor of strategic thinking, both inside and outside UNICOM Systems, Inc.--for example, choosing open, industry-standard, published approaches over proprietary, closed ones.
Meeting the challenge: The UNICOM Intelligence Data Model
This graphic is described in the surrounding text.
Fundamentally, the UNICOM Intelligence Data Model is a set of data/metadata APIs to enable applications in a survey research environment to:
Author
Collect
Analyze
Display
Print
Import/export
The UNICOM Intelligence Data Model hides the implementation details of the underlying, package-specific storage architectures, so that application programmers need not even be aware of them. Each user program is written against a standard interface, insulating the application from changes to the underlying data storage format, software environment, and even hardware platform. Further, the UNICOM Intelligence Data Model provides functions to handle common tasks such as simple aggregation and the comparison and manipulation of multiple response data, freeing the programmer to focus on the unique aspects of the application.
The UNICOM Intelligence Data Model APIs are based on Microsoft technology, and make it straightforward to create custom applications in the Windows/ASP environment.
Case data
This graphic is described in the surrounding text.
Case (respondent) data is read and written using the standard ADO interface, using a subset of standard SQL commands, extended with a set of functions which provide such features as the ability to access and manipulate multiple response (“multipunch”) category variables. Lower-level OLE DB interfaces are also planned.
Regardless of the underlying data storage format, the ADO provider exposes case data to the consumer application in a very simple virtual relational table. (Additional tables are used when representing hierarchical, or “levels” data.) The columns presented in this “VDATA” table correspond to the variables (for example, questions) requested in the application's SQL query, and the rows correspond to respondent records. The results of a query are presented by the provider as a rowset of the selected data. For example, the following query:
SELECT tested, color, advert, advert_other, product from VDATA
produces the following table:
tested   color   advert   advert_other   product
{2}      {3}     {1}                     {2}
{3}      {2}     {1}                     {2}
{3}      {1,3}   {1}                     {1}
{1}      {2}     {1}                     {2}
{2}      {2}     {1}                     {2}
{3}      {3,1}   {2}                     {10}
{1}      {4}     {7}      Poster         {10}
In this example:
Each row contains response data for a single respondent
The tested and product columns contain the data from single response categorical variables, which the provider presents as integer values.
The color column contains the data from a multiple response categorical variable, indicated by the comma-separated values. These values can be presented in order of mention by the respondent.
The advert and advert_other columns contain the data from a single response categorical variable, advert. The last row includes an associated open-ended response, and the text for this Other Specify is recorded in the advert_other column.
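In practice, reading this data from a consumer application takes only a few lines of standard ADO code. The following Visual Basic sketch is illustrative only: the provider name and connection-string properties shown are assumptions, and the actual values depend on the DSC and metadata document for your study.
' Sketch: read VDATA through ADO from Visual Basic. The connection
' string is illustrative -- the actual Provider and Data Source values
' depend on the DSC and metadata document in use.
Dim conn As New ADODB.Connection
Dim rs As New ADODB.Recordset

conn.Open "Provider=mrOleDB.Provider; Data Source=<your DSC>; " & _
          "Initial Catalog=MyMetadata.mdd"   ' assumed connection properties
rs.Open "SELECT tested, color, advert, advert_other, product FROM VDATA", conn

Do While Not rs.EOF
    Debug.Print rs!tested, rs!color, rs!product   ' one row per respondent
    rs.MoveNext
Loop

rs.Close
conn.Close
The same pattern extends to the multiple response functions mentioned above, which might, for example, appear in the WHERE clause of a query to filter on specific categories.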
Metadata
The comprehensive metadata object model is designed to accommodate all required project information other than the case data--that is, information about the study, rather than the respondents, for example:
The text of questions that were asked and the names of variables used in analysis.
Other source material used in a survey, such as multimedia presentations.
Versioning data that allows the reconstruction of all historical versions of a project.
Alternative texts for use in different contexts (for example, data collection and analytic environments).
Translations of texts into any number of languages.
The structure of the data, such as loops, grids, routing, and derived variables.
Hierarchical data structures.
Any other details about the case data that you might want to record.
This information is stored in an XML file. The UNICOM Intelligence Data Model interacts with the XML document through another Microsoft standard, the Document Object Model (DOM), and provides a straightforward API for application programs.
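Ordinarily an application would use the Metadata Model API described below, but because the metadata document is standard XML, it can also be opened with any DOM implementation. The following sketch assumes only the Microsoft MSXML library and makes no assumptions about the document's internal schema; it simply lists the top-level nodes of a metadata file.
' Sketch: inspect the raw metadata XML through the Microsoft DOM (MSXML).
' Nothing here assumes the internal schema, only that the file is XML.
Dim dom As New MSXML2.DOMDocument
Dim node As MSXML2.IXMLDOMNode

dom.async = False
If dom.Load("MyMetadata.mdd") Then
    For Each node In dom.documentElement.childNodes
        Debug.Print node.nodeName   ' list the document's top-level sections
    Next
End If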
This graphic is described in the surrounding text.
In the preceding chart:
Each Field can be thought of as mirroring a single RDB field. A single response question might have one Field for the categorical response and another for an “all other” verbatim.
Each Variable contains the detailed properties for a question or other variable, and can be a container for survey questions or groups of questions, such as grids and rotation groups.
Elements are members of response lists.
Types are shared response lists, which might be used by multiple questions.
Labels hold any other text, and are stored in a multidimensional array according to multiple attributes: user context (for example, question prompts, interviewer instructions), language, and application context (for example, data collection vs. analytic database).
Custom Properties are user-defined fields used to extend the metadata content. These can be used, for instance, to hold column locations, tabulation instruction tags--or whatever unique information your authoring tool, questionnaire database, or other application requires (see the sketch following this list).
Routings currently hold question order for each version (for example, CATI, Web, WAP) and simple questionnaire logic such as skips and filters. Full, “CATI-style” questionnaire logic is planned for future implementation.
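As a sketch of how a custom property might be read, the following hypothetical Visual Basic fragment extends the documented example shown later in this paper. The Properties collection and the "ColumnStart" property name are assumptions for illustration only, not documented API.
' Hypothetical sketch: read a user-defined custom property from each
' variable. The Properties collection and the "ColumnStart" name are
' illustrative assumptions.
Dim Document As New MDMLib.Document
Dim V As MDMLib.VariableInstance

Document.Load "MyMetadata.mdd"

For Each V In Document.Variables
    Debug.Print V.Name, V.Properties("ColumnStart")   ' assumed collection
Next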
Information on questionnaire versions is also maintained by the Metadata Model. Once a version number is specified by the end-user application, the current view shown to the application will be of that version alone. This permits applications that must be aware of changes to questionnaires over time, such as tabulation databases and OLAP environments, to access exactly what was asked in each wave, for instance, of a tracking study. A synchronization process, initiated as each version is finalized and “locked in”, allows each version to store new, unique information separately, while incorporating information common to multiple versions by reference.
How easy to use is it, really?
The following example shows the six lines of Visual Basic code required to open a metadata object, query it and print out the names of the question fields. Compare this with, say, the time required to write a program to extract the same information from a Triple-S metadata file--or from a closed architecture such as Quanvert, where it can't be done at all.
' VB snippet to print variable names from an MDM document
Dim Document As New MDMLib.Document
Dim V As MDMLib.VariableInstance

Document.Load "MyMetadata.mdd"

For Each V In Document.Variables
    Debug.Print V.Name   ' print the name of each VariableInstance
Next
Handling diverse data formats
The abstraction layer that makes diverse data and metadata storage formats accessible through the UNICOM Intelligence Data Model APIs in real time, at the individual transaction level, is provided by data source components, or DSCs. These are programs (COM objects), individually written for each storage format, which serve as adapters or translators. Where the high level of integration provided by data source components is not required--where batch conversions are sufficient, for instance--simpler programs using the API can be written to import studies from proprietary formats into the UNICOM Intelligence Data Model, and thus into the many data formats the UNICOM Intelligence Data Model supports.
Within the UNICOM Systems, Inc. product line, DSCs have been or will be made available for Quanvert data, IBM SPSS Statistics .sav files, UNICOM Intelligence Interviewer Server's relational database, and so on. A software developer's library is being made available to assist other software suppliers, research agencies with proprietary databases, and others in creating these components and in using the API. Once a DSC is complete, the agency or supplier will have the opportunity, over time, to integrate proprietary applications at the transaction level with the full line of UNICOM Systems, Inc. products and with those of other suppliers who have written DSCs--and import and export programs for these packages should become a thing of the past.
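To illustrate this format independence, the sketch below runs the same query against two different storage formats by changing nothing but the DSC named in the connection string. The DSC names and connection properties shown are illustrative assumptions.
' Sketch: the same application SQL runs against different storage
' formats; only the DSC named in the connection string changes.
' All connection-string values here are illustrative assumptions.
Const SQL As String = "SELECT tested, color, product FROM VDATA"

Dim conn As New ADODB.Connection
Dim rs As New ADODB.Recordset

' Against an IBM SPSS Statistics .sav file...
conn.Open "Provider=mrOleDB.Provider; Data Source=mrSavDsc; " & _
          "Location=MyStudy.sav"
rs.Open SQL, conn

' ...or, with no change to the query, against a Quanvert database:
' conn.Open "Provider=mrOleDB.Provider; Data Source=mrQvDsc; " & _
'           "Location=MyStudy"
' rs.Open SQL, conn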
Open architecture
The UNICOM Intelligence Data Model is “open” on two levels: first, it is built around open, industry-standard technologies; and second, it provides an open, published interface (API) for the use of one and all. The first allows DM-enabled programs to interact smoothly with a large body of other standards-based software, allowing them to be combined easily as components of more complex systems. The second allows others outside UNICOM Systems, Inc.--clients, partner companies and even competitors--to easily take advantage of the benefits offered by the UNICOM Intelligence Data Model.
Component technology
The UNICOM Intelligence Data Model is built around industry-standard, open software technologies. From a practical standpoint, the UNICOM Intelligence Data Model's functionality is provided by a set of COM components, implemented as dynamic-link libraries (DLLs). COM is Microsoft's Component Object Model, which allows programs to expose a program interface to other programs. This enables Windows applications to open and control sessions in other programs such as Excel, Word, and PowerPoint. Many survey research agencies use this functionality in Microsoft Office to automate tasks such as report and graphics production. The COM standard facilitates the combining of “component” programs such as the UNICOM Intelligence Data Model, the Office products and many others to build applications.
Your proprietary applications can control the entire process in a straightforward way, using Visual Basic or any of the languages that work with these standards. Thus, our products and others can be combined and extended to create your own proprietary applications--you can add your own unique value, and your own look and feel, combining and adding onto these building blocks. You need not go to the expense and trouble of recreating core functionality--your investment can be focused on those specific areas where your products break new ground.
The use of open standards provides a tremendous efficiency benefit at every level--whether you're simply loading a data set into Excel with a few lines of VBA, or you're creating a complex tracking environment or a marketing information portal. Moreover, because you can work with survey data independently of the underlying storage format, you are not tied to any one product or software supplier. The result is greater flexibility and improved business processes.
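For instance, a few lines of VBA along these lines could pull case data straight into a worksheet; as before, the connection string is an illustrative assumption.
' VBA sketch: load case data into an Excel worksheet through the
' Data Model's ADO provider. Connection string values are assumptions.
Dim conn As New ADODB.Connection
Dim rs As New ADODB.Recordset

conn.Open "Provider=mrOleDB.Provider; Data Source=<your DSC>; " & _
          "Initial Catalog=MyMetadata.mdd"
rs.Open "SELECT tested, color, product FROM VDATA", conn

ActiveSheet.Range("A2").CopyFromRecordset rs   ' dump the rowset into the sheet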
Publication
In order to encourage adoption of the UNICOM Intelligence Data Model, the APIs are being published and a developer's kit is being made available to the industry. The date of this conference happens to coincide exactly with the public release of the UNICOM Intelligence Developer Documentation Library. Only if the UNICOM Intelligence Data Model--or something like it--is adopted broadly, however, will it yield all of its potential benefits.
To this end, UNICOM Systems, Inc. is supporting industry standards efforts wherever possible. In the MR area, our director of development has been working with the Triple-S committee, and will contribute in any practical way to the convergence of the UNICOM Intelligence Data Model with this published standard.
On another front, the Object Management Group (OMG) has published the Common Warehouse Metamodel (CWM), a specification for metadata interchange among data warehousing, business intelligence, knowledge management and portal technologies.
This is a comprehensive standard. It includes relational database structures, OLAP storage, XML structures, data transformation specifications, application of analytic techniques, specification of presentation formats for information, and operational parameters and metrics for data warehouse operation (see CWM development). UNICOM Systems, Inc. has business intelligence products in areas such as analytical CRM, business metrics, and data mining, which currently utilize proprietary, XML-based metadata formats. UNICOM Systems, Inc. is currently working on implementation of a metadata repository that supports CWM-based XMI interchange, which would bridge existing internal standards. While much existing metadata maps fairly easily to the CWM standard, there are places where the standard's definition of variables would have to be extended to support required functionality in the UNICOM Systems, Inc. products. The tentative plan is to adopt this CWM-based repository when it becomes available. If this does in fact happen, then the MR division intends either to adopt the same metadata model or to link with it transparently.
Conclusion: strategic thinking
Our goal at UNICOM Systems, Inc. is to build on the deep, long-term partnerships we have with our clients. We must earn and sustain their trust--they must know that we as a company are out to help them deliver the most effective possible solutions for their customers. UNICOM Systems, Inc. has a clear-cut commitment to open standards and systems, so as to allow clients maximum flexibility to extend and combine our products to create differentiated solutions that add value in turn for their customers. More and more, UNICOM Systems, Inc. products are offered both as turnkey solutions and as components, developers' tools that support our clients in this endeavor.
This isn't the traditional thinking in this industry. Companies have felt that protecting their secrets meant protecting their client bases. The problems that result have become painfully clear to UNICOM Systems, Inc. as we've had to rationalize, integrate, and maintain within our own product line several parallel lines of traditional MR software that didn't really talk with one another, and somehow to integrate them with our wide array of analytic, business intelligence, graphics, and other applications.
Today, though, it is important to leave this kind of tactical thinking behind. As the MR industry changes, as it faces threats from consultancies and from the do-it-yourselfers, as demand emerges for more and more sophisticated data warehousing, analytic, and marketing information applications, the stakes are far too high. Our clients will be--and should be--intolerant of company policy that “traps” them within a product line.
It is important to all of us, as an industry, to think strategically about open systems and to support industry standards. By adopting standard technologies, by publishing our APIs, and by working with industry standards organizations, we can as an industry help our clients to compete effectively in this changing marketplace. And when our clients win, we win.
Everyone wins...

UNICOM Intelligence:
Smooth, end-to-end integration of UNICOM Systems, Inc. products
Open standards benefit our clients
Fewer barriers to adoption of our high-end tools
Happier clients

Competitor:
Reduced costs for import/export features
Interoperability with multiple third party products through a single data model
More credible multivendor solutions
Real-time native solutions
Better integrated solutions
Happier clients

Research Agency:
Best of breed solutions
Efficient development of new applications
Customization
Efficiency gains
More options
Relief for “all eggs in one basket” concerns
Happier clients
CWM development
The CWM was developed by a group of companies that included IBM, Oracle, and Hyperion. This standard might be merged with another standard from the Meta Data Coalition, until recently a separate group that included Informatica, Microsoft, and SAS. If and when the work of merging these standards is complete, the resulting specification will be issued by the OMG as the next version of the CWM. A single standard should allow users to exchange metadata between different products from different vendors with relative freedom.
See also
The UNICOM Intelligence Data Model: An introduction for survey research executives
UNICOM Intelligence Data Model
Available DSCs
White papers: Introduction