 
Monitoring using performance counters
Performance counters provide important information during the tuning phase of a new cluster. Once the cluster is tuned, a subset of the counters can be used to monitor ongoing stability and performance. Additional counters may be required when investigating an issue on the cluster.
Customers often ask how monitoring performance counters affects the cluster. Performance counters are very lightweight and are updated in memory whether or not they are monitored. Monitoring is also lightweight as long as a suitable sampling interval is used; an interval of one minute should be sufficient in most cases. However, continuous sampling consumes a large amount of disk space over time, so it is suggested that the monitoring tool be set up to use a circular log with the option to overwrite older logs. Set the log size so that, when an alert occurs, the system administration team has time to investigate or to back up the log (finding the right size may take some experimentation).
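To choose a circular log size, it can help to estimate how much data a given sampling interval produces. The sketch below assumes a fixed, hypothetical cost per logged value (`bytes_per_value`); the real figure depends on the log format and the counter paths, so measure an actual log file and adjust it. The function names are illustrative only.

```python
"""Rough sizing of a circular performance counter log (a sketch, not a product formula)."""

SECONDS_PER_DAY = 86_400


def samples_per_day(interval_seconds: int) -> int:
    """Number of samples collected in one day at the given sampling interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return SECONDS_PER_DAY // interval_seconds


def circular_log_size_mb(counter_count: int, interval_seconds: int,
                         retention_days: float, bytes_per_value: int = 200) -> float:
    """Estimate the log size in MB that holds `retention_days` of samples.

    bytes_per_value is a hypothetical average cost of one logged counter value;
    measure a real log file and adjust it.
    """
    total_bytes = (counter_count * bytes_per_value
                   * samples_per_day(interval_seconds) * retention_days)
    return total_bytes / (1024 * 1024)


if __name__ == "__main__":
    # 40 counters sampled once a minute, keeping three days before overwriting.
    print(f"{circular_log_size_mb(40, 60, 3):.1f} MB")
    # The same counters sampled every second need 60 times the space.
    print(f"{circular_log_size_mb(40, 1, 3):.1f} MB")
```

The comparison shows why the sampling interval matters more than the counters themselves: moving from one-second to one-minute sampling cuts the log size by a factor of 60.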
It is also recommended that monitoring be done from a remote computer rather than directly on the cluster computers. The following table describes each counter. Together, the counters can be used to investigate the most frequent bottlenecks.
Note: The mrIWeb and mrIEngWS average response time counters used for cluster monitoring have been updated to average over the last sampling interval rather than over the time since the process started. They are now PERF_AVERAGE_TIMER counters instead of PERF_COUNTER_RAWCOUNT counters.
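The change in averaging matters for alerting: an average taken since the process started hides a recent slowdown behind a long, healthy history. The sketch below applies the Windows formula for PERF_AVERAGE_TIMER counters, ((N1 - N0) / F) / (B1 - B0), where N is the accumulated time in ticks, B is the accumulated operation count, and F is the tick frequency. The sample values and the 1,000 ticks-per-second frequency are made up for illustration.

```python
"""Since-start average versus last-interval (PERF_AVERAGE_TIMER) average."""
from dataclasses import dataclass


@dataclass
class Sample:
    total_ticks: int  # N: accumulated response time, in timer ticks
    requests: int     # B: accumulated count of completed requests (the counter base)


def since_start_average(sample: Sample, ticks_per_second: int) -> float:
    """Average response time in seconds over the whole life of the counter."""
    if sample.requests == 0:
        return 0.0
    return (sample.total_ticks / ticks_per_second) / sample.requests


def interval_average(previous: Sample, current: Sample, ticks_per_second: int) -> float:
    """PERF_AVERAGE_TIMER: ((N1 - N0) / F) / (B1 - B0), in seconds."""
    completed = current.requests - previous.requests
    if completed == 0:
        return 0.0  # no requests completed during the interval
    return ((current.total_ticks - previous.total_ticks) / ticks_per_second) / completed


if __name__ == "__main__":
    F = 1_000  # hypothetical timer frequency: 1,000 ticks per second
    before = Sample(total_ticks=1_000_000, requests=10_000)  # 10,000 requests at 100 ms
    after = Sample(total_ticks=1_090_000, requests=10_100)   # then 100 requests at 900 ms
    print(f"since start: {since_start_average(after, F) * 1000:.0f} ms")        # 108 ms
    print(f"last interval: {interval_average(before, after, F) * 1000:.0f} ms")  # 900 ms
```

With the since-start average, a burst of 900 ms responses barely moves the counter (to about 108 ms) and stays well below a 500 ms alert level; the interval average reports the slowdown directly.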
Group | When to monitor? | Alert at | Counter | Servers
Memory | When issues arise | | % Committed Bytes In Use | All UNICOM Intelligence
Memory | Always | Consistently < 20% of installed RAM indicates insufficient memory; if alerted, investigate which process is using the memory. | Available MBytes | All UNICOM Intelligence
Memory | When issues arise | Sustained > 5 | Pages/sec | All UNICOM Intelligence
Network Interface | Benchmarking | Sustained > 80% of bandwidth | Bytes Total/sec | All UNICOM Intelligence
Network Interface | When issues arise | | Output Queue Length | All UNICOM Intelligence
Network Interface | When issues arise | | Packets/sec | All UNICOM Intelligence
Processor(_Total) | Always | >= 80% for more than 1 minute; if alerted, investigate which process is using the CPU. | % Processor Time | All UNICOM Intelligence
System | When issues arise | | Context Switches/sec | All UNICOM Intelligence
System | Benchmarking | Average value > 2 | Processor Queue Length | All UNICOM Intelligence
ASP.NET | When issues arise | | Active Threads | All UNICOM Intelligence
ASP.NET | When issues arise | | Request Execution Time | All UNICOM Intelligence
ASP.NET | When issues arise | | Request Wait Time | All UNICOM Intelligence
ASP.NET | Always | > 0 for more than 1 minute | Requests Queued | All UNICOM Intelligence
Interview Web [For all web tier instances] | Always | >= 30 | Active Threads | Accessories, Web
Interview Web [For all web tier instances] | Always | >= 500 ms | Average Response Time | Accessories, Web
Interview Web [For all web tier instances] | Always | > 0 for more than 1 minute | Current Queued Requests | Accessories, Web
Interview Web [For all web tier instances] | Always | At each increase | Engines Failed | Accessories, Web
Interview Web [For all web tier instances] | Always | | Failed Transfers/sec | Accessories, Web
Interview Web [For all web tier instances] | Always | >= 50 per sec | Server Requests/sec | Accessories, Web
Interview Web [For all web tier instances] | Always | | Total Cache Files | Accessories, Web
Interview Web [For all web tier instances] | Always | | Total Failed Transfers | Accessories, Web
Process [For all w3wp processes] | Benchmarking | >= 80% | % Processor Time | All UNICOM Intelligence
Process [For all w3wp processes] | Always | >= 30 | Thread Count | All UNICOM Intelligence
Process [For all w3wp processes] | When issues arise | Increasing without leveling off | Virtual Bytes | All UNICOM Intelligence
Process [For all w3wp processes] | When issues arise | | Virtual Bytes Peak | All UNICOM Intelligence
Process [For all w3wp processes] | When issues arise | | Working Set | All UNICOM Intelligence
Process [For all w3wp processes] | When issues arise | | Working Set Peak | All UNICOM Intelligence
APP_POOL_WAS(_Total) | Always | At each increase | Total Worker Process Failures | Interviewing
APP_POOL_WAS(_Total) | Always | Log only | Total Application Pool Recycles | Interviewing
Interview Engine [For all interview engines] | Always | >= 30 | Active Threads | Interviewing
Interview Engine [For all interview engines] | Always | >= 100 ms | Average Response Time | Interviewing
Interview Engine [For all interview engines] | Always | Log only | Completes/sec | Interviewing
Interview Engine [For all interview engines] | Always | Log only; ConnectionLimit stops engines from overloading, so use Percent Loaded for alerting. | Current Interviews | Interviewing
Interview Engine [For all interview engines] | Always | > 0 for more than 1 minute | Current Queued Requests | Interviewing
Interview Engine [For all interview engines] | Always | >= 80% | Percent Loaded | Interviewing
Interview Engine [For all interview engines] | Always | >= 50 per sec | Server Requests/sec | Interviewing
Interview Engine [For all interview engines] | When issues arise | | Total Interviews | Interviewing
PhysicalDisk(_Total) | Always | >= 50% | % Disk Time | Database, FMRoot
PhysicalDisk(_Total) | Always | >= 2 | Avg. Disk Read Queue Length | Database, FMRoot
PhysicalDisk(_Total) | Always | >= 2 | Avg. Disk Write Queue Length | Database, FMRoot
Process(sqlservr) | Always | >= 80% for more than 1 minute; if alerted, investigate which process is using the CPU. | % Processor Time | Database
Process(sqlservr) | When issues arise | | Thread Count | Database
Process(sqlservr) | When issues arise | | Virtual Bytes | Database
Process(sqlservr) | When issues arise | | Virtual Bytes Peak | Database
Process(sqlservr) | When issues arise | | Working Set | Database
Process(sqlservr) | When issues arise | | Working Set Peak | Database
SQLServer:Access Methods | When issues arise | | Full Scans/sec | Database
SQLServer:Buffer Manager | When issues arise | | Buffer cache hit ratio | Database
SQLServer:General Statistics | When issues arise | | Logical Connections | Database
SQLServer:General Statistics | When issues arise | | User Connections | Database
SQLServer:Locks(_Total) | Benchmarking | | Lock Requests/sec | Database
SQLServer:Locks(_Total) | Benchmarking | | Lock Waits/sec | Database
SQLServer:Locks(_Total) | When issues arise | | Number of Deadlocks/sec | Database
SQLServer:Memory Manager | When issues arise | | Target Server Memory (KB) | Database
SQLServer:Memory Manager | When issues arise | | Total Server Memory (KB) | Database
SQLServer:SQL Statistics | When issues arise | | Batch Requests/sec | Database
Use the following recommendations when monitoring counters:
Monitor the counters marked as “Always” at all times. The “Alert at” column gives the level at which the counter value indicates a problem.
Monitor the counters marked as “Benchmarking” in addition to the counters marked as “Always” when benchmarking. Look for levels beyond the “Alert at” values when tuning.
The counters marked as “When issues arise” may be useful when investigating issues.
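The three recommendations amount to choosing counter sets by activity. The sketch below encodes a few rows of the table and selects the counters for an activity. The activity names are hypothetical, the SQL Server paths assume a default instance (named instances use a different object prefix), and treating “investigating” as the “Always” set plus the “When issues arise” set is our reading of the text rather than a product rule.

```python
"""Pick which counters to collect for a given monitoring activity (sketch)."""
from enum import Enum


class Level(Enum):
    ALWAYS = "Always"
    BENCHMARKING = "Benchmarking"
    WHEN_ISSUES_ARISE = "When issues arise"


# A few rows from the counter table: (counter path, "When to monitor?" value).
COUNTERS = [
    (r"\Memory\Available MBytes", Level.ALWAYS),
    (r"\Memory\Pages/sec", Level.WHEN_ISSUES_ARISE),
    (r"\Processor(_Total)\% Processor Time", Level.ALWAYS),
    (r"\System\Processor Queue Length", Level.BENCHMARKING),
    (r"\ASP.NET\Requests Queued", Level.ALWAYS),
    (r"\SQLServer:Locks(_Total)\Lock Waits/sec", Level.BENCHMARKING),
    (r"\SQLServer:Locks(_Total)\Number of Deadlocks/sec", Level.WHEN_ISSUES_ARISE),
]

# Hypothetical activity names mapped to the levels each one collects.
ACTIVITY_LEVELS = {
    "production": {Level.ALWAYS},
    "benchmarking": {Level.ALWAYS, Level.BENCHMARKING},
    "investigating": {Level.ALWAYS, Level.WHEN_ISSUES_ARISE},
}


def counters_for(activity: str) -> list[str]:
    """Counter paths to collect for the activity, in table order."""
    levels = ACTIVITY_LEVELS[activity]
    return [path for path, level in COUNTERS if level in levels]


if __name__ == "__main__":
    for activity in ACTIVITY_LEVELS:
        print(activity, counters_for(activity))
```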
Note that the suggested alerts do not always indicate that a cluster is about to fail. They do indicate that the cluster has started to perform less than optimally, which may affect the user experience. Therefore, you may want to adjust the alerts to take into account your acceptable level of temporary performance degradation and the availability of your cluster support staff.
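Several “Alert at” values apply only when a level is sustained, for example “>= 80% for more than 1 minute”. One way to express this, and to relax it to match your tolerance for temporary degradation, is to make the breach duration a parameter. The class below is a hypothetical sketch of that rule, not a feature of any monitoring tool.

```python
"""Alert only when a threshold is breached continuously for longer than a set time (sketch)."""
from typing import Callable, Optional


class SustainedAlert:
    """Fires once the counter has breached its threshold for more than `duration_s` seconds.

    `breached` is a predicate such as `lambda v: v >= 80`. Raise `duration_s`
    (60 s matches "for more than 1 minute") to tolerate longer temporary peaks.
    """

    def __init__(self, breached: Callable[[float], bool], duration_s: float) -> None:
        self.breached = breached
        self.duration_s = duration_s
        self._breach_start: Optional[float] = None  # timestamp of the first breaching sample

    def add(self, timestamp_s: float, value: float) -> bool:
        """Record one sample; return True while the breach has lasted long enough."""
        if not self.breached(value):
            self._breach_start = None  # any good sample resets the timer
            return False
        if self._breach_start is None:
            self._breach_start = timestamp_s
        return timestamp_s - self._breach_start > self.duration_s


if __name__ == "__main__":
    cpu = SustainedAlert(lambda v: v >= 80, duration_s=60)  # % Processor Time row
    readings = [(0, 85), (15, 90), (30, 70), (45, 88), (60, 92),
                (75, 95), (90, 97), (105, 99), (120, 96)]
    for t, v in readings:
        if cpu.add(t, v):
            print(f"alert at t={t}s")  # prints: alert at t=120s
```

The dip to 70% at t=30 resets the timer, so the alert fires only at t=120. Note that with one-minute sampling, a breach “for more than 1 minute” needs three consecutive breaching samples under this rule. A queued-requests rule would use `lambda v: v > 0` with the same duration.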
For a list of all UNICOM Intelligence Web and Interviewer Session Engine counters, see the UNICOM Intelligence Developer Documentation Library under “Performance counters for the UNICOM Intelligence Interviewer Web service” and “Performance counters for the UNICOM Intelligence Interviewer Session Engine”.
It may be useful to create a performance counter template for each of these monitoring requirements so that it is available when required.
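On Windows, one way to keep such templates is as counter files (one counter path per line) that the built-in `logman create counter` command reads with `-cf`; `-f bincirc` and `-max` give the circular, size-limited log recommended earlier. The sketch below writes the files and builds the command arguments. The template contents, the `UI_` collector names, and the 500 MB limit are hypothetical; check `logman create counter /?` on your system before relying on the options.

```python
"""Write one counter file per monitoring template and build a logman command (sketch)."""
import tempfile
from pathlib import Path

# Hypothetical templates holding a few counter paths from the counter table.
TEMPLATES = {
    "always": [
        r"\Memory\Available MBytes",
        r"\Processor(_Total)\% Processor Time",
        r"\ASP.NET\Requests Queued",
        r"\PhysicalDisk(_Total)\% Disk Time",
    ],
    "benchmarking": [
        r"\System\Processor Queue Length",
        r"\Network Interface(*)\Bytes Total/sec",
    ],
}


def write_counter_file(name: str, counters: list[str], folder: Path) -> Path:
    """Write one counter path per line, the layout that `logman -cf` reads."""
    path = folder / f"{name}.txt"
    path.write_text("\n".join(counters) + "\n", encoding="utf-8")
    return path


def logman_command(name: str, counter_file: Path,
                   interval: str = "00:01:00", max_mb: int = 500) -> list[str]:
    """Arguments for a binary circular log (-f bincirc) capped at max_mb megabytes."""
    return ["logman", "create", "counter", f"UI_{name}",
            "-cf", str(counter_file), "-si", interval,
            "-f", "bincirc", "-max", str(max_mb)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for name, counters in TEMPLATES.items():
            counter_file = write_counter_file(name, counters, Path(folder))
            # On Windows, this list could be passed to subprocess.run(..., check=True)
            # while the counter file still exists.
            print(" ".join(logman_command(name, counter_file)))
```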
Some counters are created only when their process is created. For example, the engine counters for a particular engine are created only when that engine is created. This makes it more difficult to create a template in Microsoft Performance Monitor for a production environment. However, it is possible to monitor these transient counters by using systems management and monitoring tools (such as IBM Tivoli or Microsoft Systems Manager).
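The core idea behind monitoring transient counters is to re-resolve instance names on every poll instead of fixing them in a template. The sketch below matches instance names against a wildcard and reports which instances appeared or disappeared between two polls. The instance lists are hypothetical; in practice a monitoring tool supplies them. Windows names repeated process instances `w3wp`, `w3wp#1`, and so on, which is why the example uses the pattern `w3wp*`.

```python
"""Re-resolve transient counter instances on every poll instead of fixing them in a template (sketch)."""
from fnmatch import fnmatchcase


def matching_instances(pattern: str, instances: list[str]) -> set[str]:
    """Instance names that match a wildcard such as 'w3wp*'."""
    return {name for name in instances if fnmatchcase(name, pattern)}


def diff_instances(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (started, stopped): instances that appeared or disappeared since the last poll."""
    return current - previous, previous - current


if __name__ == "__main__":
    # Instance lists a monitoring tool might report on two consecutive polls.
    poll_1 = ["w3wp", "w3wp#1", "svchost"]
    poll_2 = ["w3wp", "w3wp#1", "w3wp#2", "svchost"]
    seen = matching_instances("w3wp*", poll_1)
    now = matching_instances("w3wp*", poll_2)
    started, stopped = diff_instances(seen, now)
    print("start collecting:", sorted(started))  # ['w3wp#2']
    print("stop collecting:", sorted(stopped))   # []
```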
See
Project monitoring
See also
Cluster tuning