IBM InfoSphere CDC for solidDB® system requirements Disk space requirements
Disk space
IBM InfoSphere CDC source system:
100 GB—Default value for the Staging Store Disk Quota for each instance of IBM InfoSphere CDC. Use the IBM InfoSphere CDC configuration tool to configure disk space for this quota.
5 GB—For installation files, data queues, and log files.
Global disk quota—Disk space is required on your source system for this quota which is used to store in-scope change data that has not been committed in your database. The amount of disk space required is determined by your replication environment and the workload of your source database. Use the mirror_global_disk_quota_gb system parameter to configure the amount of disk space used by this quota.
IBM InfoSphere CDC target system:
1 GB—The minimum amount of disk space allowed for the Staging Store Disk Quota for each instance of IBM InfoSphere CDC. The minimum value for this quota is sufficient for all instances created on your target system. Use the IBM InfoSphere CDC configuration tool to configure the disk space for this quota.
5 GB—For installation files, data queues, and log files.
Global disk quota—Disk space is required on your target system for this quota which is used to store LOB data received from your IBM InfoSphere CDC source system. The amount of disk space required is determined by your replication environment and the amount of LOB data you are replicating. To improve performance, IBM InfoSphere CDC will only persist LOB data to disk if RAM is not available on your target system. Use the mirror_global_disk_quota_gb system parameter to configure the amount of disk space used by this quota.
IBM InfoSphere CDC may require additional disk space in the following situations:
You are running large batch transactions in the database on your source system.
You are configuring multiple subscriptions and one of your subscriptions is latent. In this type of scenario, IBM InfoSphere CDC on your source system may persist transaction queues to disk if RAM is not available.
You are replicating large LOB data types.
You are replicating “wide” tables that have hundreds of columns.
You are performing regular backups of your metadata with the dmbackupmd command-line utility.
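The baseline figures listed above can be combined into a rough per-system disk budget. The following Python sketch is illustrative only: the function name and the `global_quota_gb` input are hypothetical, and the sketch assumes the 5 GB figure applies once per system, which the text does not state explicitly. The global disk quota has no fixed size here, so it must come from your own workload estimate, and the situations listed above may call for more space.

```python
# Baseline figures taken from the requirements listed above (in GB).
STAGING_STORE_DEFAULT_GB = 100   # source system default, per instance
STAGING_STORE_MINIMUM_GB = 1     # target system minimum, per instance
INSTALL_AND_LOGS_GB = 5          # installation files, data queues, log files


def baseline_disk_gb(role, instances, global_quota_gb):
    """Return a rough baseline disk budget in GB for one system.

    role is "source" or "target". global_quota_gb is your own estimate
    (a hypothetical input, not a product default).
    """
    if role == "source":
        staging = STAGING_STORE_DEFAULT_GB * instances
    elif role == "target":
        staging = STAGING_STORE_MINIMUM_GB * instances
    else:
        raise ValueError("role must be 'source' or 'target'")
    return staging + INSTALL_AND_LOGS_GB + global_quota_gb
```

For example, `baseline_disk_gb("source", 1, 20)` adds the 100 GB staging store default, the 5 GB for installation files, and an assumed 20 GB global quota.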
RAM requirements
RAM
Each instance of IBM InfoSphere CDC requires memory for the Java Virtual Machine (JVM). The following default values for memory are assigned:
1024 MB of RAM—Default value for each 64-bit instance of IBM InfoSphere CDC.
512 MB of RAM—Default value for each 32-bit instance of IBM InfoSphere CDC. Use the IBM InfoSphere CDC configuration tool to configure the memory for each instance of IBM InfoSphere CDC.
Note: IBM InfoSphere CDC is predominantly a Java-based application. However, some portions of it are written in C. These portions of IBM InfoSphere CDC are not subject to the memory limits specified for the JVM.
Although IBM InfoSphere CDC memory requirements will fluctuate, you must work with your system administrator to ensure the allocated memory for each instance of the product is available at all times. This may involve deployment planning since other applications with memory requirements may be installed on the same server with IBM InfoSphere CDC. Using values other than the defaults or allocating more RAM than is physically available on your server should only be undertaken after considering the impacts on product performance.
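As a planning aid, the default JVM allocations can be summed across instances and compared with the physical RAM on the server. The sketch below is a hypothetical helper, not part of the product: the function names and the `other_apps_mb` input are assumptions, and it does not account for the portions written in C, which fall outside the JVM limits.

```python
# Default JVM memory per instance, in MB, keyed by instance bitness.
DEFAULT_JVM_MB = {64: 1024, 32: 512}


def total_jvm_mb(instance_bitness):
    """Sum the default JVM memory for a list of instances (64 or 32)."""
    return sum(DEFAULT_JVM_MB[bits] for bits in instance_bitness)


def fits_in_ram(instance_bitness, physical_ram_mb, other_apps_mb=0):
    """Return True if the default allocations plus the memory needed by
    other applications fit within the physical RAM."""
    return total_jvm_mb(instance_bitness) + other_apps_mb <= physical_ram_mb
```

Two 64-bit instances and one 32-bit instance need 2560 MB with the defaults, so they fit on a 4096 MB server that reserves 1024 MB for other applications, but not on a 3072 MB server with the same reservation.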
IBM InfoSphere CDC source deployments may require additional RAM in the following scenarios:
You are replicating large LOB data types with your IBM InfoSphere CDC source deployment. These data types are sent to the target while being retrieved from the source database. The target waits until all LOBs (for each record) are received before applying a row. LOBs are stored in memory as long as there is adequate RAM; otherwise, they are written to disk on the target.
You are replicating “wide” tables with hundreds of columns.
You are performing large batch transactions in your source database rather than online transaction processing (OLTP).
Port requirements
IBM InfoSphere CDC requires that you allocate a set of ports for communications with other components in the replication environment. The ports must be accessible through firewalls, although you do not require access to the internet.
Protocol: TCP
Default port: 11101
Purpose: Accepts connections from Management Console, other installations of IBM InfoSphere CDC as a source of replication, and command line utilities.
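Whether the default port is reachable through a firewall can be checked from another machine with a plain TCP connection attempt. This is a minimal sketch using Python's standard `socket` module; the function name is hypothetical, and a successful connection only shows that something is listening on the port, not that it is IBM InfoSphere CDC.

```python
import socket


def port_is_reachable(host, port=11101, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout, or
        # name resolution failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Call it with the host name of the server running the instance, for example `port_is_reachable("cdc-source.example.com")`, where the host name is a placeholder.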
Assessing disk space and memory requirements
IBM InfoSphere CDC requires disk space and memory when it processes change data from your source database. To process change data efficiently and replicate the changes to your target system, IBM InfoSphere CDC must have adequate disk space and memory for each of the components described in this section.
Disk space requirements for the staging store
The IBM InfoSphere CDC staging store is located on your source system and is a cache of change data read from the database logs. The size of the staging store increases as the product accumulates change data, so you must plan your source environment, particularly its disk space, accordingly.
The disk space allocated to the staging store is controlled by the Staging Store Disk Quota value that is set when you create an instance with the IBM InfoSphere CDC configuration tool. In most cases, the default value is appropriate for IBM InfoSphere CDC source systems. Since the staging store is only used on source systems, you can reduce this value to the minimum of 1 GB if you are configuring a target instance of IBM InfoSphere CDC.
Note: You can also allocate disk space to the staging store with the staging_store_disk_quota_gb system parameter in Management Console.
Memory requirements for the JVM (Java Virtual Machine)
As a Java-based product, IBM InfoSphere CDC requires you to allocate the maximum amount of memory (RAM) to be used by the Java Virtual Machine (JVM). This prevents IBM InfoSphere CDC from using all of the available memory on the system where it is installed. The Maximum Memory Allowed value is set on a per-instance basis for each instance you create for your source or target database. In most cases the default values are appropriate for 32-bit and 64-bit instances. However, if your database is processing an extremely heavy workload, you may have to adjust the default values. The RAM allocated must be physically available on your system.
Disk space requirements for the global disk quota
The global disk quota on your source and target systems is used for all capture components, including temporary files, transaction queues, and LOBs, which are staged on the target before being applied. IBM InfoSphere CDC will manage disk space utilization across all components as required.
Most databases have a mechanism that allows you to roll back or undo changes to your database by storing uncommitted changes. Similarly, IBM InfoSphere CDC uses this disk quota to store in-scope change data that has not been committed in your database. Once the database transaction is committed, the disk space used by the transaction is released. Long running open transactions will contribute to the amount of disk space used.
You can configure the amount of disk space that is allocated to this quota with the mirror_global_disk_quota_gb system parameter. The default setting of this system parameter is such that IBM InfoSphere CDC will only stop replicating after this disk quota exhausts all available disk space on your system. If you would prefer IBM InfoSphere CDC to stop replicating after it uses a specific amount of disk space, you can specify the value with this system parameter in Management Console.
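If you decide to cap the quota rather than let it consume all available space, one way to pick a value is to start from the free space on the volume and subtract a reserve for everything else. The sketch below is a hypothetical planning helper that uses Python's standard `shutil.disk_usage`; the function name, the `reserve_gb` input, and the choice of 1024-based gigabytes are assumptions. It only suggests a number; the value itself is still set with mirror_global_disk_quota_gb in Management Console.

```python
import shutil

BYTES_PER_GB = 1024 ** 3  # assumption: 1024-based gigabytes


def suggest_quota_gb(path, reserve_gb):
    """Suggest a quota in whole GB that leaves reserve_gb free on the
    volume that contains path. Returns 0 if the reserve cannot be met."""
    free_gb = shutil.disk_usage(path).free // BYTES_PER_GB
    return max(0, free_gb - reserve_gb)
```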
Sizing considerations for the staging store
This topic outlines scenarios that will increase the disk requirements for the staging store on your source system. All of these scenarios should be kept in mind when you are planning the disk space requirements for your replication environment.
Latent subscriptions
The amount of data within the staging store is related to the latency of your subscriptions. IBM InfoSphere CDC measures latency as the amount of time that passes between when data changes on a source table and when it changes on the target table. For example, if an application inserts and commits a row into the source table at 10:00 and IBM InfoSphere CDC applies that row to the target table at 10:15, then the latency for the subscription is 15 minutes.
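The latency definition above amounts to a simple time difference. As a minimal Python sketch (the function name and the calendar date are arbitrary choices for illustration):

```python
from datetime import datetime


def latency_minutes(source_commit, target_apply):
    """Latency in minutes between the commit on the source table and
    the apply on the target table."""
    return (target_apply - source_commit).total_seconds() / 60


# The 10:00 commit and 10:15 apply from the example above.
commit_time = datetime(2024, 1, 1, 10, 0)
apply_time = datetime(2024, 1, 1, 10, 15)
print(latency_minutes(commit_time, apply_time))  # 15.0
```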
When all of your subscriptions are mirroring and have very little latency, the volume of data that needs to be kept in the staging store will be relatively small. If all of your subscriptions are mirroring but some are latent, the staging store will contain all the data generated by the logs for the latent subscriptions during the entire time they are mirroring. For example, if the difference in latency between the least latent subscription and the most latent subscription is 3 hours, and your database generates 100 GB of log data per hour, the staging store will require approximately 300 GB of disk storage space.
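The arithmetic in this example can be written as a short Python sketch. The function names are hypothetical, and the estimate assumes a constant log generation rate, which real workloads rarely have, so treat the result as an approximation.

```python
def staging_store_estimate_gb(latency_spread_hours, log_rate_gb_per_hour):
    """Approximate staging store size: the latency difference between
    the least and most latent subscriptions, multiplied by the rate at
    which the database generates log data."""
    return latency_spread_hours * log_rate_gb_per_hour


def exceeds_quota(estimate_gb, quota_gb=100):
    """Compare an estimate with a Staging Store Disk Quota value
    (100 GB is the source system default)."""
    return estimate_gb > quota_gb


estimate = staging_store_estimate_gb(3, 100)
print(estimate)                 # 300
print(exceeds_quota(estimate))  # True
```

With a 3-hour spread and 100 GB of log data per hour, the estimate is three times the default quota, so the quota would need to be raised or the latency reduced.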
Inactive subscriptions
An inactive (not currently replicating) subscription that contains tables with a replication method of Mirror will continue to accumulate change data in the staging store from the current point back to the point where mirroring was stopped. For this reason, you should delete subscriptions that are no longer required, or change the replication method of all tables in the subscription to Refresh to prevent the accumulation of change data in the staging store on your source system.
Continuous Capture
Continuous Capture is designed to accommodate those replication environments in which it is necessary to separate the reading of the database logs from the transmission of the logical database operations. This is useful when you want to continue processing log data even if replication and your subscriptions stop due to issues such as network communication failures over a fragile network, target server maintenance, or some other issue. You can enable or disable Continuous Capture without stopping subscriptions.
Continuous Capture results in additional disk utilization on the source machine because change data from the database log files accumulates while it is not being replicated to the target machine. This change data is stored in the staging store. The additional disk utilization due to the accumulation of change data in the staging store should be evaluated and understood before deciding to use this feature in your replication environment.