Data Management performance guidelines white paper > Data Source Components (DSCs) > Relational Database (RDB) CDSC
 
Relational Database (RDB) CDSC
Introduction
The Relational Database (RDB) CDSC reads and writes case data to a relational Microsoft SQL Server database. A version of the CDSC is also available for Oracle databases.
The RDB CDSC has the following features:
Feature
UNICOM Intelligence database
Storage type
Database
Read
Yes
Write/update
Yes
Unbounded loops (levels)
Yes
Native WHERE clause support
Yes
Native expression support includes:
Respondent.Serial system variable compared to numeric values (for example, Respondent.Serial >= 10000).
Long or categorical variable is not null (for example, Age IS NOT NULL).
Compressed format
No
Multiple-user read
Yes
Multiple-user write
Yes
Data storage
The RDB CDSC uses a fixed schema where each response is written as an individual row in the Responses2 and, depending on the variable type, to OtherData2.
The fixed schema has the following advantages:
The number of schema tables is fixed so that no new tables need to be created.
The number of columns in each of the tables is fixed and relatively small (the byte limit is unlikely to be reached).
Multiple projects can be stored in the same database, reducing database administrative overhead (for example, user permissions need to be granted only once).
The schema stores case data on a per-response basis, rather than allocating space for variables that might never be used.
Data is written with an INSERT statement more often than an UPDATE statement, which reduces overhead.
The fixed schema has the following disadvantages:
The schema makes it more difficult to implement native WHERE clause evaluation because the data for a single respondent record is stored as multiple rows in the database. The schema also makes it more difficult for third-party applications to directly access the data.
An entire respondent record cannot be added through a single INSERT statement; each response needs to be added with individual statements. However, the logical respondent record can still be committed through a single transaction.
Fast access of individual variables relies on the VariableID index on the Responses2 and OtherData2 tables. Because the VariableID index is automatically created, it can become fragmented over time, which reduces read performance.
The following table provides the overall storage file size (in megabytes) for the RDB CDSC:
RDB CDSC storage file size
skidemo
UNICOM Intelligence database
UNICOM Intelligence file
Quantum
Quanvert
SPSS
Total size
19
2.18
0.9
3.25
1.7
Backup
18.1
0.65
*
2
*
The total size of 19 MB was achieved after shrinking the database and truncating the transaction log. Before truncating the log and shrinking the database, the total file size was over 32 MB. The file size can be even larger, depending on fragmentation and fill factor size.
The storage size for the RDB CDSC is much larger than the other DSCs, due to the size of the indexes. The indexes account for approximately 50% of the total storage that is used by the CDSC.
The size of a full backup is not less than the database size after shrinking. Data cannot be opened directly from the backup. As such, the backup in only useful for data transfers and recovery through the SQL Server management tools.
Read performance
The following table provides the RDB CDSC read performance (measured in seconds):
RDB CDSC read performance
skidemo
UNICOM Intelligence database
UNICOM Intelligence file
Quantum
Quanvert
SPSS
1 variable
0.062
0.062
0.109
0.031
0.062
5 variables
0.11
0.063
0.204
0.078
0.078
All variables
1.078
0.562
0.843
0.672
0.61
The RDB CDSC has acceptable single variable read performance, providing acceptable response times for ad hoc tabulation scenarios.
Read performance for the entire data set is not as good as other DSCs, making the RDB CDSC less suited for data management activities.
Performance is acceptable for fast data export.
Read performance for a single row, when using a WHERE clause that can be evaluated natively in the RDB CDSC, is typically an order of magnitude faster than evaluating the WHERE clause in the data model. The following table shows read performance when querying a single row by serial number.
One row queried via a Serial WHERE clause
skidemo
UNICOM Intelligence database
UNICOM Intelligence file
Quantum
Quanvert
SPSS
All variables
0.01
0.156
0.359
0.167
0.09
The benefit of evaluating WHERE clauses natively reduces as more rows are returned. Native WHERE clause support provides better performance for scenarios where a small subset of the data set is being returned or updated.
Note Unbounded loops (levels) support was implemented in the RDB CDSC starting with UNICOM Intelligence 6.0.
Unbounded loops
Household
UNICOM Intelligence database
UNICOM Intelligence file
200K respondents
174.08
139.24
The RDB CDSC has acceptable unbounded loop read performance.
Write performance
The following table provides the RDB CDSC write performance (measured in records per second):
RDB CDSC write performance
skidemo
UNICOM Intelligence database
UNICOM Intelligence file
Quantum
Quanvert
SPSS
Records per second
80
1897
1523
*
1971
In comparison to the other DSCs, performance is poor when entire records are transferred to the RDB CDSC.
If the RDB DSC is to be used for offline analysis, it is not recommended that the data be transferred through the CDSC. Alternatively, SQL Server data management services must be used to transfer the data (for example, Replication or Integration Services).
The following table provides the update performance (measured in seconds) for every value in a single weight column:
RDB CDSC update performance
skidemo
UNICOM Intelligence database
UNICOM Intelligence file
Quantum
Quanvert
SPSS
1 variable
0.719
0.219
0.282
*
0.578
Usage recommendations
The recommended RDB CDSC usage is as follows:
UNICOM Intelligence. The RDB CDSC was designed for Web data collection. It is the only UNICOM Intelligence DSC that was tested for multiple-user write access and, as such, it is the only CDSC that should be used for Web or CATI data collection.
Ad hoc data analysis. The CDSC is suited for top-line analysis of live data collection, including the creation and update of weighting variables.
Data export to other formats. The RDB CDSC has acceptable read performance for the entire data set when exporting to other DSCs.
Web-based tabulation/analysis. The RDB CDSC is a suitable DSC for web-based tabulation, either through UNICOM Intelligence Reporter - Survey Tabulation or UNICOM Intelligence Reporter.
WHERE clauses evaluated natively. While the RDB CDSC has limited native WHERE clause support, WHERE clauses evaluated by the CDSC provide a significant performance benefit in the following scenarios:
Update of individual rows through UPDATE statements.
Filtering a small subset of data, either for tabulation purposes or for data export.
The RDB CDSC is not suited in the following scenarios:
Data management. Copying to the RDB CDSC is slow in comparison to the other DSCs. It is not recommended that the RDB CDSC be used as the output DSC for any data management steps (unless it is the last data management step before making the data set available for server based analysis). The only exception to this rule is when native WHERE clauses can be used to improve performance.
Offline tabulation/analysis. Using the RDB CDSC for offline tabulation requires Microsoft SQL Server to be installed on each client machine.
Portable data format. The RDB CDSC is a poor choice for distributing data. The database backup file is almost as large as the database (which is at least twice the size for other DSCs). Also, the backup must be manually restored as the CDSC does not have a feature for automatically restoring backups.
The UNICOM Intelligence Data File CDSC supports native WHERE clause support. Native WHERE clause support avoids the need to use the RDB CDSC for data management steps, which improves performance when database filters are evaluated.
Known performance issues
The following issues are known to impact RDB CDSC performance:
Natively evaluated WHERE clauses. Using WHERE clauses that can be evaluated by the CDSC will significantly improve performance.
Index fragmentation. Index fragmentation has a significant impact on read performance.
For large data sets that are collected using RDB CDSC, create a maintenance plan: see Reorganizing indexes.
For static databases that are used for analysis, you can rebuild the indexes by running these commands against the SQL Server database:
ALTERINDEX ALL ON Responses2 REBUILD ;
ALTERINDEX ALL ON OtherData2 REBUILD ;
ALTERINDEX ALL ON Variables REBUILD ;
Index statistics. Updating the index statistics can improve read performance when transferring a data set to the RDB CDSC. For example:
UPDATESTATISTICS Responses2 ;
UPDATESTATISTICS OtherData2 ;
UPDATESTATISTICS Variables ;
FavorSpeed versus FavorMemory. The MR Init Custom connection property can be used to specify whether the connection should favor memory use or speed. The valid settings include:
FavorMemory. Uses server-side cursors so that a single SQL Server connection can be reused across multiple commands. This mode is slower, but uses less memory (and fewer SQL Server connections) as the number of concurrent commands increases. This option can be used connections are created for data collection.
FavorSpeed. Uses default cursors that require a new connection for each command. This mode is considerably faster than the FavorMemory mode, but does use more memory when there are multiple commands. The FavorSpeed option also optimizes binding look-up by creating a look-up vector. FavorSpeed is the default setting and can be used by connections that are created for analysis.
See
Data Source Components (DSCs)