Predefined Monitoring Templates

SysKit comes with predefined monitoring templates created by the SysKit team according to each technology’s best practices. They can be used to monitor specific servers out of the box, or modified to meet specific requirements.

The following monitoring templates are included in SysKit and come with the installation:

IIS Monitoring Template tracks the most critical performance counters for web server functionality.

| Name | Critical | Warning | Description |
| --- | --- | --- | --- |
| Web Service: Current Connections: * | | | Number of current client connections to the W3Svc Service. |
| Web Service: Get Requests/sec: * | | | The number of HTTP requests that are using the GET method per second. |
| Web Service: Bytes Total/sec: * | | | The sum of Bytes Sent/sec and Bytes Received/sec. This is the total rate of bytes transferred by the W3Svc Service. |
| Web Service: Post Requests/sec: * | | | Shows the rate at which HTTP requests using the POST method are made. POST requests are used for forms or gateway requests. |
| ASP.NET: Requests Queued | >1000 | >350 | The number of requests waiting for service from the queue. This number should be small, except during heavy traffic periods. |
| ASP.NET: Requests Wait Time | >1200ms | >800ms | Number of milliseconds that the most recent ASP request was waiting in the queue. |
| ASP.NET: Application Restarts | | | The number of times that an application has been restarted during the Web server’s lifetime. |
| ASP.NET Applications: Requests/sec: __Total__ | | | The number of requests executed per second. This represents the current throughput of the application. |
| ASP.NET: Worker Processes Restarts | >5 | >1 | The number of times a worker process has been restarted on the server computer. Investigate if this counter increases unexpectedly. |
| ASP.NET Applications: Errors Total: __Total__ | | | The total number of errors that occur during the execution of HTTP requests, including any parser, compilation, or run-time errors. |
| Web Service Cache: File Cache Hits % | | | The ratio of user-mode file cache hits to total cache requests that have been made since the WWW service started up. |
| Web Service Cache: Kernel: URI Cache Hits % | | | The ratio of Kernel: URI Cache Hits to total cache requests since the WWW service started. |
| Web Service Cache: File Cache Misses | | | The number of unsuccessful lookups in the user-mode file cache that have been made since the WWW service started. |
| Web Service Cache: Kernel: URI Cache Misses | | | The number of unsuccessful lookups in the kernel URI cache that have occurred since the WWW service started. |
| Active Server Pages: Request Execution Time | >5000ms | >2000ms | The number of milliseconds that the last ASP request took to execute. This value can be somewhat misleading because it is not an average. |
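
The hit-ratio counters above relate cache hits to total cache requests. As a rough illustration, the sketch below computes such a percentage from a hit count and a miss count, assuming that total requests equal hits plus misses. The function name `cache_hit_ratio` is hypothetical and is not part of SysKit or IIS.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Return the cache hit percentage.

    Assumption: total cache requests = hits + misses.
    """
    if hits < 0 or misses < 0:
        raise ValueError("hits and misses must be non-negative")
    total = hits + misses
    if total == 0:
        # No lookups yet, so report 0 instead of dividing by zero.
        return 0.0
    return 100.0 * hits / total
```

For example, 900 hits and 100 misses give a hit ratio of 90 percent.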

SQL Monitoring Template tracks the most critical performance counters for SQL Server functionality.

| Name | Critical | Warning | Description |
| --- | --- | --- | --- |
| SQLServer: General Statistics: User Connections | | | Counts the number of users currently connected to SQL Server. |
| SQLServer: Buffer Manager: Page life expectancy | <250 | <350 | Indicates the number of seconds a page will stay in the buffer pool without references. |
| SQLServer: SQL Statistics: Batch Requests/sec | >1500 | >1000 | Number of Transact-SQL command batches received per second. This statistic is affected by all constraints (such as I/O, number of users, cache size, and complexity of requests). High batch requests mean good throughput. |
| SQLServer: SQL Statistics: SQL Compilations/sec | >150 | >100 | Number of SQL compilations per second. Indicates the number of times the compile code path is entered. Includes compiles caused by statement-level recompilations. |
| SQLServer: SQL Statistics: SQL Re-Compilations/sec | >2 | >1 | Number of statement recompiles per second. Counts the number of times statement recompiles are triggered. Generally, you want the recompiles to be low. |
| SQLServer: Buffer Manager: Lazy writes/sec | >20 | >15 | Indicates the number of buffers written per second by the buffer manager’s lazy writer. The lazy writer eliminates the need to perform frequent checkpoints to create available buffers. |
| SQLServer: Access Methods: Page Splits/sec | >300 | >200 | Number of page splits per second that occur as the result of overflowing index pages. |
| SQLServer: Buffer Manager: Buffer cache hit ratio | <80 | <90 | Indicates the percentage of pages found in the buffer cache without having to read from disk. The ratio is the total of cache hits divided by the total of cache lookups over the last few thousand page accesses. |
| SQLServer: Locks: Lock Waits/sec: _Total | >2 | >1 | Number of lock requests per second that required the caller to wait. |
| SQLServer: Latches: Latch Waits/sec | | | Number of latch requests that could not be granted immediately. |
| SQLServer: Latches: Total Latch Wait Time (ms) | | | Total latch wait time (in milliseconds) for latch requests in the last second. |
| SQLServer: Access Methods: Full Scans/sec | | | Number of unrestricted full scans per second. These can be either base-table or full-index scans. |
| SQLServer: Databases: Transactions/sec | | | Number of transactions started for the database per second. Transactions/sec does not count XTP-only transactions. |
| SQLServer: Databases: Log Flushes/sec | | | Number of log flushes per second. |
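
Some thresholds above are upper bounds (for example, Batch Requests/sec >1500) and others are lower bounds (for example, Page life expectancy <250). The following is a minimal sketch of how a sampled value could be classified against a Critical and a Warning threshold written in that notation. It is not SysKit's implementation: the names `breaches` and `classify` are hypothetical, and the sketch assumes plain numeric thresholds without unit suffixes.

```python
import operator

# Map the comparison prefix used in the tables to a Python comparison.
_COMPARATORS = {">": operator.gt, "<": operator.lt}


def breaches(value: float, threshold: str) -> bool:
    """Return True if value violates a threshold written like '>1500' or '<250'.

    Assumption: the threshold is a comparison sign followed by a plain number.
    """
    compare = _COMPARATORS[threshold[0]]
    return compare(value, float(threshold[1:]))


def classify(value: float, critical: str, warning: str) -> str:
    """Check the critical threshold first, then the warning threshold."""
    if breaches(value, critical):
        return "critical"
    if breaches(value, warning):
        return "warning"
    return "ok"
```

With the Page life expectancy thresholds, `classify(300, "<250", "<350")` returns `"warning"` and `classify(200, "<250", "<350")` returns `"critical"`.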

SharePoint Monitoring Template tracks the most critical performance counters for SharePoint farm functionality.

Name Critical Warning Description
SharePoint Publishing Cache: Total Number of cache Compactions >6 >2 Represents the number of cache compactions. If this number is frequently high, the cache size is too small for the data being requested.
SharePoint Publishing Cache: Publishing cache hit ratio 1 <1 A low ratio can indicate that unpublished items are being requested, and these cannot be cached. If this is a portal site, the site might be set to require check-out, or many users have items checked out.
SharePoint Publishing Cache: Publishing cache flushes/sec The counter indicates that site owners might be performing actions on the sites that are causing the cache to be flushed.
ASP.NET Applications: Cache API hit ratio The cache hit-to-miss ratio when accessed through the external cache APIs. This counter does not track use of the cache by the ASP.NET page framework.
ASP.NET Applications: Cache API trims >1 This component monitor returns the number of cache items that have been removed due to a memory limit being hit. Ideally, this number should be very low or zero.
ASP.NET: Requests Queued >1000 >350 The number of requests waiting for service from the queue. When this number starts to increment linearly with client load, the Web server has reached the limit of concurrent requests it can process.
ASP.NET: Requests Rejected >5 >2 The total number of requests not executed because of insufficient server resources to process them. This counter represents the number of requests that return a 503 HTTP status code.
Memory: Pages/sec >80 >20 Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults.
ASP.NET: Worker Processes Restarts >5 >1 The number of times a worker process has been restarted on the server computer. If this counter increases unexpectedly, you should investigate as soon as possible.
Web Service: Current Anonymous Users: _Total Shows the number of users who currently have an anonymous connection using the Web service.
Web Service: Current Connections: _Total Shows the current number of connections established with the Web service.
Web Service: Anonymous Users/sec: _Total Shows the rate at which users are making anonymous connections using the Web service.

The monitoring templates below are located in Acceleratio’s repository and are ready for you to download and use.

Hyper-V monitoring template tracks the most critical Hyper-V counters for virtual machine management.

| Name | Warning | Description |
| --- | --- | --- |
| Hyper-V Hypervisor Logical Processor: Guest Run Time % | >50% for >5 min | The percentage of time spent by the processor in guest mode. This is the percentage of time guest mode is running on a logical processor (LP) or, for _Total, the average percentage across all LPs. |
| Hyper-V Hypervisor Logical Processor: Hypervisor Run Time % | >50% for >5 min | The percentage of time spent by the processor in hypervisor mode. This is the percentage of time the hypervisor is running on an LP or, for _Total, the average percentage across all LPs. |
| Hyper-V Hypervisor Logical Processor: Idle Time % | >50% for >5 min | The percentage of time spent by the processor in an idle state. This is the percentage of time the LP is waiting for work or, for _Total, the average percentage across all LPs. |
| Hyper-V Hypervisor Logical Processor: Total Run Time % | >101% | The percentage of time spent by the processor in guest and hypervisor mode. This is the sum of % Guest Run Time and % Hypervisor Run Time. This counter can go slightly over 100% (by less than 0.5%). |
| Hyper-V Virtual Machine Health Summary: Health Critical | >1 | The number of virtual machines that have critical health. If anything is critical, it means some resource has been exhausted or some other unrecoverable error has occurred. |
| Cluster CSV File System(*): Read Latency | >15ms | Bad latency is the most common cause of non-optimal virtual machine performance. It significantly helps to know what the normal range is in a specific environment before troubleshooting. |
| Cluster CSV File System(*): Write Latency | >15ms | Bad latency is the most common cause of non-optimal virtual machine performance. It significantly helps to know what the normal range is in a specific environment before troubleshooting. |
| Hyper-V Virtual Storage Device(*): Read Operations/sec | | This counter shows which virtual machines are generating the most storage read operations. |
| Hyper-V Virtual Storage Device(*): Write Operations/sec | | This counter shows which virtual machines are generating the most storage write operations. |
| Network Interface(*): Bytes Total/sec | | The percentage of network utilization is calculated by multiplying Bytes Total/sec by 8 to convert it to bits, multiplying the result by 100, and then dividing by the network adapter’s current bandwidth. |
| Network Interface(*): Output Queue Length | 1-2 | The output queue length measures the number of threads waiting on the network adapter. If there are more than 2 threads waiting on the network adapter, then the network may be a bottleneck. Common causes of this are poor network latency and/or high collision rates on the network. |
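
The network utilization calculation described for Bytes Total/sec can be written out as a short function. This is only a sketch of that arithmetic: the function and parameter names are hypothetical, and it assumes the adapter's current bandwidth is given in bits per second.

```python
def network_utilization_percent(bytes_total_per_sec: float,
                                bandwidth_bits_per_sec: float) -> float:
    """Return network utilization as a percentage of the adapter's bandwidth.

    Assumption: bandwidth_bits_per_sec is the adapter's current bandwidth
    expressed in bits per second.
    """
    if bandwidth_bits_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    bits_per_sec = bytes_total_per_sec * 8  # convert bytes to bits
    return bits_per_sec * 100 / bandwidth_bits_per_sec
```

For example, 12,500,000 bytes per second on a 1 Gbps adapter (1,000,000,000 bits per second) works out to 10 percent utilization.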

Citrix monitoring template tracks the most critical Citrix performance counters for XenApp and XenDesktop functionality.

| Name | Warning | Description |
| --- | --- | --- |
| Citrix Broker Service: Brokered Sessions | | The number of virtual desktop sessions brokered by the Citrix Broker Service. |
| Citrix Broker Service: Database Transaction Errors/sec | >0 | The rate at which database transactions are failing when executed from Citrix Broker Service. High values may indicate connectivity issues between the Broker Service and the XenDesktop database. |
| CitrixCPUUtilizationMgmtUser: CPU Entitlement | | The percentage of CPU resource that Citrix CPU Utilization Management makes available to a user at a given time. |
| CitrixCPUUtilizationMgmtUser: CPU Reservation | | The percentage of total computer CPU resource reserved for a user, should that user require it. |
| CitrixCPUUtilizationMgmtUser: CPU Shares | | The proportion of CPU resource assigned to a user. |
| CitrixCPUUtilizationMgmtUser: CPU Usage | | The percentage of CPU resource consumed by a user at a given time, averaged over a few seconds. |
| CitrixIMANetworking: Bytes Received/sec | | The number of inbound bytes per second on the active IMA network. |
| CitrixIMANetworking: Bytes Sent/sec | | The number of outbound bytes per second on the active IMA network. |
| CitrixIMANetworking: Network Connections | | The number of active IMA network connections to other IMA servers. |
| CitrixLicensing: Average License Check-In Response Time | | The average license check-in response time in milliseconds. |
| CitrixLicensing: Average License Check-Out Response Time | | The average license check-out response time in milliseconds. |
| CitrixLicensing: License Server Connection Failure | | The number of minutes that the XenApp server has been disconnected from the License Server. |

Distributed Cache monitoring template tracks the most critical SharePoint performance counters for caching functionality.
The Distributed Cache service, which is built on Windows Server AppFabric Cache, provides caching functionality that lets SharePoint features retrieve data quickly without depending on databases stored in SQL Server, because everything is held in memory.

Because SharePoint performance relies on these caching functions, it is essential to monitor the performance counters related to SharePoint Distributed Cache. Windows Server AppFabric performance counters allow you to monitor and troubleshoot caching. There are three counter categories related to the caching features.
The Distributed Cache service should also be monitored for memory management, resiliency, and availability issues.
Monitoring these areas helps ensure that the large amounts of information on the SharePoint Server stay fresh and readily available to end users.

This template does not have defined thresholds as they greatly depend on the SharePoint farm.

Name Warning
SharePoint Distributed Cache Counters: Cache Data Transferred/sec
SharePoint Distributed Cache Counters: Database Transaction Errors/sec
SharePoint Distributed Cache Counters: Cache Hit Count
SharePoint Distributed Cache Counters: Cache Hit Ratio
SharePoint Distributed Cache Counters: Cache Miss Count
SharePoint Distributed Cache Counters: Cache Read Requests/sec
SharePoint Distributed Cache Counters: Cache Write Requests/sec
SharePoint Distributed Cache Counters: Total Cache Read Requests
SharePoint Distributed Cache Counters: Total Cache Write Requests
AppFabric Caching/Cache: Cache Miss Percentage
AppFabric Caching/Cache: Total Read-Through Misses
AppFabric Caching/Cache: Total Read-Through Errors
AppFabric Caching/Cache: Total Write-Behind Items Dropped
AppFabric Caching/Host: Available Memory Percentage
AppFabric Caching/Host: Cache Miss Percentage
AppFabric Caching/Host: Gateway Failure Percentage
AppFabric Caching/Host: Request Processing Error Percentage
AppFabric Caching/Host: Total Available Memory Bytes
AppFabric Caching/Host: Total Failure Exceptions
AppFabric Caching/Secondary Host: Total Replication Retries
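
Because this template has no predefined thresholds, one possible starting point is to derive a normal range from values observed on your own farm. The sketch below is an illustration only, not a SysKit feature: it assumes a list of baseline samples for a single counter and treats the mean plus or minus `k` standard deviations as the normal range. The function name `baseline_range` and the default `k=2.0` are hypothetical choices.

```python
import statistics


def baseline_range(samples: list[float], k: float = 2.0) -> tuple[float, float]:
    """Return (low, high) as mean +/- k standard deviations of baseline samples.

    Assumption: samples were collected while the farm was behaving normally.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are needed")
    mean = statistics.fmean(samples)
    spread = statistics.stdev(samples)  # sample standard deviation
    return mean - k * spread, mean + k * spread
```

Values that fall outside the returned range could then be reviewed as candidates for a warning threshold on that counter.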
 
See the Monitoring Templates article to learn more about monitoring the custom performance counters and important services. There, you can also find instructions and useful tips on how to download, import, and apply monitoring templates to computers or computer groups.