Monitoring using performance counters
Performance counters provide important information in the tuning phase for a new cluster. Once the cluster is tuned, a subset of counters can be used to monitor ongoing stability and performance. There are additional counters that might be required when investigating an issue on the cluster.
Customers often ask how monitoring performance counters affects the cluster. Performance counters are very lightweight and are updated in memory whether they are monitored or not. Monitoring is also lightweight as long as a suitable sampling interval is used; a sampling interval of one minute should be sufficient in most cases. However, continuous sampling consumes large amounts of disk space, so set up the monitoring tool to use a circular log with the option to overwrite older entries. Size the log so that the system administration team has time to investigate, or to back up the log, when an alert occurs (this might take some experimentation).
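As an illustration of the circular-log setup described above, the following Python sketch assembles a `logman create counter` command line with a one-minute sample interval and a circular binary log. The collector name `ClusterBaseline`, the host name `WEB01`, and the 500 MB size cap are hypothetical values chosen for the example, not recommendations from this documentation. The sketch only builds the argument list; verify the options against `logman create counter /?` on your Windows version before running it.

```python
from typing import List, Optional


def counter_path(obj: str, counter: str, instance: Optional[str] = None,
                 computer: Optional[str] = None) -> str:
    """Build a Windows counter path: \\\\Computer\\Object(Instance)\\Counter."""
    prefix = f"\\\\{computer}" if computer else ""
    inst = f"({instance})" if instance else ""
    return f"{prefix}\\{obj}{inst}\\{counter}"


def build_logman_command(name: str, counter_paths: List[str],
                         sample_interval: str = "00:01:00",
                         max_size_mb: int = 500) -> List[str]:
    """Return an argument list for a circular (overwrite-oldest) binary counter log.

    max_size_mb is an assumed starting point; tune it by experiment.
    """
    cmd = ["logman", "create", "counter", name,
           "-si", sample_interval,   # sampling interval, hh:mm:ss
           "-f", "bincirc",          # circular binary log format
           "-max", str(max_size_mb), # maximum log size in MB
           "-c"]
    return cmd + counter_paths


# Hypothetical cluster computer, sampled from a separate monitoring host.
paths = [
    counter_path("Memory", "Available MBytes", computer="WEB01"),
    counter_path("Processor", "% Processor Time", instance="_Total", computer="WEB01"),
]
command = build_logman_command("ClusterBaseline", paths)
print(command)
```

Prefixing each counter path with the computer name keeps the collector itself on the monitoring host, which matches the advice to monitor from a remote computer rather than from the cluster computers.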
It is also recommended that monitoring be done from a remote computer rather than directly from the cluster computers. Refer to the following table for information on each counter. The counters can be used to investigate the most frequent bottlenecks.
The mrIWeb and mrIEngWS cluster monitoring average response time counters have been updated to average over the last interval rather than over the time since started. They are now PERF_AVERAGE_TIMER type counters instead of PERF_COUNTER_RAWCOUNT type counters.
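The practical effect of this change can be shown with a small calculation. The following Python sketch is a simplified model, not the Windows implementation: it assumes a counter is backed by two cumulative values (total time spent and total requests served) and compares an average taken since start with an average taken over the last sampling interval. The `Sample` type and the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    total_time_ms: float   # cumulative time spent serving requests since start
    total_requests: int    # cumulative number of requests served since start


def lifetime_average(sample: Sample) -> float:
    """Average response time since the process started."""
    if sample.total_requests == 0:
        return 0.0
    return sample.total_time_ms / sample.total_requests


def interval_average(prev: Sample, curr: Sample) -> float:
    """Average response time for requests completed between two samples."""
    requests = curr.total_requests - prev.total_requests
    if requests <= 0:
        return 0.0
    return (curr.total_time_ms - prev.total_time_ms) / requests


# 1,000 requests at 100 ms each, then 50 slow requests taking 30 seconds in total.
prev = Sample(total_time_ms=100_000.0, total_requests=1_000)
curr = Sample(total_time_ms=130_000.0, total_requests=1_050)

print(round(lifetime_average(curr), 1))  # 123.8: the slowdown is diluted by history
print(interval_average(prev, curr))      # 600.0: the slowdown is clearly visible
```

An interval-based average reacts to a recent slowdown immediately, which is why it is better suited to alert thresholds such as ">= 500ms".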
| Group | When to monitor? | Alert at | Counter | Servers |
|---|---|---|---|---|
| Memory | When issues arise | | % Committed Bytes In Use | All UNICOM Intelligence |
| Memory | Always | Consistently < 20% of installed RAM indicates insufficient memory; if alerted investigate which process is using the memory. | Available MBytes | All UNICOM Intelligence |
| Memory | When issues arise | Sustained > 5 | Pages/sec | All UNICOM Intelligence |
| NetworkInterface | Benchmarking | Sustained > 80% of bandwidth | Bytes Total/sec | All UNICOM Intelligence |
| NetworkInterface | When issues arise | | Output Queue Length | All UNICOM Intelligence |
| NetworkInterface | When issues arise | | Packets/sec | All UNICOM Intelligence |
| Processor(_Total) | Always | >= 80% for more than 1 minute; if alerted investigate which process is using the CPU. | % Processor Time | All UNICOM Intelligence |
| System | When issues arise | | Context Switches/sec | All UNICOM Intelligence |
| System | Benchmarking | Average value > 2 | Processor Queue Length | All UNICOM Intelligence |
| ASP.NET | When issues arise | | Active Threads | All UNICOM Intelligence |
| ASP.NET | When issues arise | | Request Execution Time | All UNICOM Intelligence |
| ASP.NET | When issues arise | | Request Wait Time | All UNICOM Intelligence |
| ASP.NET | Always | >= 0 for more than 1 minute | Requests Queued | All UNICOM Intelligence |
| Interview Web [For all web tier instances] | Always | >= 30 per sec | Active Threads | Accessories, Web |
| Interview Web [For all web tier instances] | Always | >= 500ms | Average Response Time | Accessories, Web |
| Interview Web [For all web tier instances] | Always | >= 0 for more than 1 minute | Current Queued Requests | Accessories, Web |
| Interview Web [For all web tier instances] | Always | At each increase | Engines Failed | Accessories, Web |
| Interview Web [For all web tier instances] | Always | | Failed Transfers/sec | Accessories, Web |
| Interview Web [For all web tier instances] | Always | >= 50 per sec | Server Requests/sec | Accessories, Web |
| Interview Web [For all web tier instances] | Always | | Total Cache Files | Accessories, Web |
| Interview Web [For all web tier instances] | Always | | Total Failed Transfers | Accessories, Web |
| Process [For all w3wp processes] | Benchmarking | >= 80% | % Processor Time | All UNICOM Intelligence |
| Process [For all w3wp processes] | Always | >= 30 per sec | Thread Count | All UNICOM Intelligence |
| Process [For all w3wp processes] | When issues arise | Increasing without leveling off | Virtual Bytes | All UNICOM Intelligence |
| Process [For all w3wp processes] | When issues arise | | Virtual Bytes Peak | All UNICOM Intelligence |
| Process [For all w3wp processes] | When issues arise | | Working Set | All UNICOM Intelligence |
| Process [For all w3wp processes] | When issues arise | | Working Set Peak | All UNICOM Intelligence |
| APP_POOL_WAS(_Total) | APP_POOL_WAS(_Total) | At each increase | Total Worker Process Failures | Interviewing |
| APP_POOL_WAS(_Total) | APP_POOL_WAS(_Total) | Log only | Total Application Pool Recycles | Interviewing |
| Interview Engine [For all interview engines] | Always | >= 30 per sec | Active Threads | Interviewing |
| Interview Engine [For all interview engines] | Always | >= 100ms | Average Response Time | Interviewing |
| Interview Engine [For all interview engines] | Always | Log only | Completes/sec | Interviewing |
| Interview Engine [For all interview engines] | Always | Log only; ConnectionLimit will stop engines from overloading; use PercentLoaded for alerting. | Current Interviews | Interviewing |
| Interview Engine [For all interview engines] | Always | >= 0 for more than 1 minute | Current Queued Requests | X |
| Interview Engine [For all interview engines] | Always | >= 80% | Percent Loaded | Interviewing |
| Interview Engine [For all interview engines] | Always | >= 50 per sec | Server Requests/sec | Interviewing |
| Interview Engine [For all interview engines] | When issues arise | | Total Interviews | Interviewing |
| PhysicalDisk(_Total) | Always | >= 50% | % Disk Time | Database, FMRoot |
| PhysicalDisk(_Total) | Always | >= 2 | Avg. Disk Read Queue Length | Database, FMRoot |
| PhysicalDisk(_Total) | Always | >= 2 | Avg. Disk Write Queue Length | Database, FMRoot |
| Process(sqlservr) | Always | >= 80% for more than 1 minute; if alerted investigate which process is using the CPU. | % Processor Time | Database |
| Process(sqlservr) | When issues arise | | Thread Count | Database |
| Process(sqlservr) | When issues arise | | Virtual Bytes | Database |
| Process(sqlservr) | When issues arise | | Virtual Bytes Peak | Database |
| Process(sqlservr) | When issues arise | | Working Set | Database |
| Process(sqlservr) | When issues arise | | Working Set Peak | Database |
| SQLServer:Access Methods | When issues arise | | Full Scans/sec | Database |
| SQLServer:Buffer Manager | When issues arise | | Buffer cache hit ratio | Database |
| SQLServer:General Statistics | When issues arise | | Logical Connections | Database |
| SQLServer:General Statistics | When issues arise | | User Connections | Database |
| SQLServer:Locks(_Total) | Benchmarking | | Lock Requests/sec | Database |
| SQLServer:Locks(_Total) | Benchmarking | | Lock Waits/sec | Database |
| SQLServer:Locks(_Total) | When issues arise | | Number of Deadlocks/sec | Database |
| SQLServer:Memory Manager | When issues arise | | Target Server Memory (KB) | Database |
| SQLServer:Memory Manager | When issues arise | | Total Server Memory (KB) | Database |
| SQLServer:SQL Statistics | When issues arise | | Batch Requests/sec | Database |
Use the following recommendations when monitoring counters:
Monitor the counters marked as “Always” at all times. The “Alert at” column documents a level for the counter that is considered to be too high.
Monitor the counters marked as “Benchmarking” in addition to the counters marked as “Always” when benchmarking. Look for levels greater than the “Alert at” values when tuning.
The counters marked as “When issues arise” might be useful when investigating issues.
Note that the suggested alerts do not always indicate that a cluster is about to fail. The alerts do indicate a cluster that has started to perform less optimally and might impact user experience. Therefore, you might want to adjust alerts taking into account your acceptable level of temporary performance degradation and your cluster support staff availability.
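Several of the "Alert at" rules above depend on how long a level is held ("for more than 1 minute", "Sustained") or on a cumulative counter going up ("At each increase"). The following Python sketch shows one way such rules might be evaluated against a series of samples. The function names and the sample data are hypothetical, and a real monitoring tool will have its own alert configuration.

```python
from typing import List, Optional, Tuple

# Each sample is (timestamp_in_seconds, counter_value), ordered by time.
Samples = List[Tuple[float, float]]


def sustained_breach(samples: Samples, threshold: float, min_duration_s: float) -> bool:
    """True if the value stays at or above threshold for more than min_duration_s."""
    breach_start: Optional[float] = None
    for timestamp, value in samples:
        if value >= threshold:
            if breach_start is None:
                breach_start = timestamp      # start of a new breach
            if timestamp - breach_start > min_duration_s:
                return True
        else:
            breach_start = None               # level dropped; reset the timer
    return False


def increased(values: List[float]) -> bool:
    """True if a cumulative counter (for example, Engines Failed) ever went up."""
    return any(later > earlier for earlier, later in zip(values, values[1:]))


# "% Processor Time >= 80% for more than 1 minute", sampled every 30 seconds.
cpu = [(0, 85.0), (30, 90.0), (60, 88.0), (90, 91.0)]
print(sustained_breach(cpu, threshold=80.0, min_duration_s=60))    # True

# A brief spike that drops below the threshold does not raise an alert.
spike = [(0, 85.0), (30, 40.0), (60, 88.0), (90, 91.0)]
print(sustained_breach(spike, threshold=80.0, min_duration_s=60))  # False

print(increased([3, 3, 4]))  # True
```

Resetting the timer whenever the value drops below the threshold is a design choice that suppresses alerts on short spikes; loosen or tighten the duration to match your acceptable level of temporary performance degradation.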
For a list of all UNICOM Intelligence Web and Interviewer Session Engine counters, see the UNICOM Intelligence Developer Documentation Library under “Performance counters for the UNICOM Intelligence Interviewer Web service” and “Performance counters for the UNICOM Intelligence Interviewer Session Engine”.
It might be useful to create performance counter templates for each of these monitoring requirements so that they are available when required.
Some counters are created only when the process is created. For example, the engine counters for a particular engine are created only when the engine is created. This makes it more difficult to create a template in Microsoft Performance Monitor for a production environment. However, it is possible to monitor transient counters using systems management and monitoring tools (such as IBM Tivoli or Microsoft Systems Manager).
See also
Cluster tuning
Project monitoring
The stability and performance of a UNICOM Intelligence cluster can vary greatly depending on the active projects. Large projects with lots of variable instances require more memory. Projects with complex routing require more CPU resources. Projects that use external COM objects (using CreateObject) can be impacted by the requirements of the external objects. For example, some objects might limit the number of simultaneous accesses, resulting in contention issues in a server environment. Other objects might load files into memory that increase the memory requirements. ADO objects could impact the load on the database server.
You can monitor and track many of these issues by using the performance counters (see Monitoring using performance counters). To monitor the actual performance of each project, and therefore the performance seen by the respondents (or interviewers), use the “per project” counters.
Monitoring per project performance counters becomes more of a challenge as active projects change over time. Companies with programmatic activation processes have an advantage in that the counters can be programmatically added into the monitoring tool. Alternatively, you can monitor transient counters by using tools such as IBM Tivoli or Microsoft Systems Manager.
Available project counters
| Group | When to monitor | Alert at | Counter | Servers |
|---|---|---|---|---|
| Interview project | Always | > 1 second | Web - Average time to start interview | Interviewing |
| Interview project | Always | > 4 seconds | Web - Maximum time to start interview | Interviewing |
| Interview project | Always | > 1 second | Web - Average time page-to-page | Interviewing |
| Interview project | Always | > 2 seconds | Web - Maximum time page-to-page | Interviewing |
| Interview project | Always | > 1 second | Telephone - Average time to start interview | Interviewing |
| Interview project | Always | > 4 seconds | Telephone - Maximum time to start interview | Interviewing |
| Interview project | Always | > 1 second | Telephone - Average time page-to-page | Interviewing |
| Interview project | Always | > 2 seconds | Telephone - Maximum time page-to-page | Interviewing |
For a list of all UNICOM Intelligence project counters, see Performance counters for UNICOM Intelligence Interviewer projects.
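Where projects are activated programmatically, the set of per-project counters to watch can be maintained in the same step. The following Python sketch keeps a watch list that is updated when a project is activated or deactivated. It assumes that the counter object is named "Interview project" (taken from the Group column above), that each project appears as a counter instance, and that the counter names match the table exactly; none of this is confirmed here, so check the names shown in Performance Monitor. The project name `SURVEY01` is hypothetical.

```python
from typing import Dict, List

OBJECT_NAME = "Interview project"  # assumption: taken from the Group column

# Alert thresholds in seconds, copied from the "Alert at" column.
ALERT_SECONDS: Dict[str, float] = {
    "Average time to start interview": 1.0,
    "Maximum time to start interview": 4.0,
    "Average time page-to-page": 1.0,
    "Maximum time page-to-page": 2.0,
}
MODES = ("Web", "Telephone")


def project_counter_paths(project: str) -> List[str]:
    """Counter paths for one project, assuming the project is the instance name."""
    return [
        f"\\{OBJECT_NAME}({project})\\{mode} - {metric}"
        for mode in MODES
        for metric in ALERT_SECONDS
    ]


class ProjectWatchList:
    """Tracks which per-project counters should currently be monitored."""

    def __init__(self) -> None:
        self._paths: Dict[str, List[str]] = {}

    def activate(self, project: str) -> None:
        self._paths[project] = project_counter_paths(project)

    def deactivate(self, project: str) -> None:
        self._paths.pop(project, None)

    def paths(self) -> List[str]:
        return [path for paths in self._paths.values() for path in paths]


watch = ProjectWatchList()
watch.activate("SURVEY01")    # hypothetical project name
print(len(watch.paths()))     # 8: four metrics for each of the two modes
watch.deactivate("SURVEY01")
print(watch.paths())          # []
```

The resulting paths could then be handed to whatever monitoring tool is in use, which avoids editing a static template each time the set of active projects changes.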
See also
Monitoring using performance counters