Useful queries
There is no recipe for log file investigation. However, there are some queries that prove to be useful or provide useful templates.
All: Finding the log file overlap
There are many instances when an incomplete set of log files is delivered. This query lists the start and end times for the log files in each directory, thereby showing the period of time where log files are available for all computers.
Once the log file overlap is defined, these settings can be used to filter the log file (as discussed in the previous section). It is important to setup the WHERE clause to filter for the log file types that are important to the investigation (each log file type overwrites at a different speed).
This example lists the minimum and maximum time in the IVW logs. The parameter for the call to Left depends on the log directory structure. The parameter should be set to include the directory that indicates the server.
SELECT
Min(DateTime), Max(DateTime), LogFile.Left(68)
FROM VDATA
GROUP BY LogFile.Left(68)
All: Finding errors
Use the following query to search for errors.
SELECT
Mid(LogFile, 54, 14), DateTime, Milliseconds, LogScope, LogLevel, Project, RespondentId, InterviewId, LogEntry
FROM VDATA
WHERE LogLevel * {error, fatal}
Use the following query to search for errors and warnings.
SELECT
Mid(LogFile, 54, 14), DateTime, Milliseconds, LogScope, LogLevel, Project, RespondentId, InterviewId, Mid(LogEntry, 35, 23)
FROM VDATA
WHERE LogLevel * {warning, error, fatal}
Extraneous errors and warnings are typically found in log files. In particular, there are often project errors that are ignored by scriptwriters. These may include missing template errors that show up on the web tier, or missing sample fields showing up on the interview tier. It is best to correct these errors, but when monitoring for stability and performance, you may want to create a query that filters out those persistent errors.
Web tier: Counting the number of IsActive timeouts
IsActive timeouts can indicate that engines are spending too long processing script logic. This can be an indication of high CPU, excessive script logic, or contention on an object used by the script.
The following example groups the IsActive timeouts by hour and minute, allowing for the identification of spikes.
SELECT
DatePart(DateTime, 'h') As Hour, DatePart(DateTime, 'n') As Minute, COUNT(*) As IsActiveTimeouts
FROM VDATA
WHERE Logentry.Find('IsActive') >= 0 AND LogLevel = {warning}
GROUP BY DatePart(DateTime, 'h'), DatePart(DateTime, 'n')
It may be useful to see which engines are causing the spikes. This query lists all the IsActive entries with the web computer reporting the timeout and the engine that caused the issue.
SELECT
Mid(LogFile, 54, 14), DateTime, Milliseconds, LogScope, Project, RespondentId, InterviewId, Mid(LogEntry, 35, 23)
FROM VDATA
WHERE LogEntry.Find('IsActive') >= 0 AND LogLevel = {warning}
If there are many IsActive timeouts, this query can be used to group IsActive timeouts into 15 minute intervals to compare with values from other queries.
SELECT
DateTime.DateOnly() As Day,CText(DateTime.DatePart('h'))+':'+CText(DateTime.DatePart('n')/15 * 15) As QuarterHour,
COUNT(*) As IsActiveErrors
FROM VDATA
WHERE LogEntry.Find('IsActive') >= 0 AND LogLevel = {warning}
GROUP BY
DateTime.DateOnly(),
DateTime.DatePart('h'),
DateTime.DatePart('n')/15
ORDER BY
DateTime.DateOnly(),
DateTime.DatePart('h'),
DateTime.DatePart('n')/15
Interview tier: Checking for engine shutdowns or starts
The following query lists engine shutdowns and starts, allowing them to be compared with times of other issues.
SELECT
Mid(LogFile, 54, 14), DateTime, Milliseconds, LogScope, Project, RespondentId, InterviewId, LogEntry
FROM VDATA
WHERE LogEntry.Find('Session Engine s') >= 0
If there are many shutdowns and restarts, check for spikes using the following query that groups engine shutdowns and starts. The query includes the log file directory and provides information about which computer the engine is on.
SELECT
Mid(LogFile, 54, 14),
DatePart(DateTime, 'h'),
DatePart(DateTime, 'n'),
SUM(LogEntry.Find('Interviewer Session Engine shutdown') = 0) As EngineShutdowns,
SUM(LogEntry.Find('Interviewer Session Engine start') = 0) As EngineStarts
FROM VDATA
WHERE Logentry.Find('Interviewer Session Engine s') >= 0
GROUP BY
Mid(LogFile, 54, 14),
DatePart(DateTime, 'h'),
DatePart(DateTime, 'n')
Use the following query to group engine shutdown/starts into 15 minute intervals in order to compare values from other queries.
SELECT
DateTime.DateOnly() As Day,
CText(DateTime.DatePart('h'))+':'+CText(DateTime.DatePart('n')/15 * 15) As QuarterHour,
SUM(LogEntry.Find('Interviewer Session Engine shutdown') = 0) As EngineShutdowns,
SUM(LogEntry.Find('Interviewer Session Engine start') = 0) As EngineStarts
FROM VDATA
WHERE Logentry.Find('Interviewer Session Engine s') = 0
GROUP BY
DateTime.DateOnly(),
DateTime.DatePart('h'),
DateTime.DatePart('n')/15
ORDER BY
DateTime.DateOnly(),
DateTime.DatePart('h'),
DateTime.DatePart('n')/15
Interview tier: Counting the number of interview starts and restarts
The following query counts the number of interview starts or restarts (in 15 minute intervals).
SELECT
DateTime.DateOnly() As Day,
CText(DateTime.DatePart('h'))+':'+CText(DateTime.DatePart('n')/15 * 15) As QuarterHour,
SUM(LogEntry.Find('Interview start') = 0) As InterviewStarts,
SUM(LogEntry.Find('Interview restart') = 0) As InterviewRestarts
FROM VDATA
WHERE LogLevel = {metric} AND (LogEntry.Find('Interview start') = 0 OR LogEntry.Find('Interview restart') = 0)
GROUP BY
DateTime.DateOnly(),
DateTime.DatePart('h'),
DateTime.DatePart('n')/15
ORDER BY
DateTime.DateOnly(),
DateTime.DatePart('h'),
DateTime.DatePart('n')/15
Web tier: Counting the number of timeouts per question
When a cluster is experiencing failover due to IsActive timeouts, it is possible that there are particular projects that are taking more time than expected. The following query can be used to find specific projects and questions that experience timeous, allowing the projects and questions to be further investigated. The query should be run at a time before the majority of the issues are encountered in order to determine which projects are taking excessive CPU (causing the entire cluster to become slower and possibly causing failovers).
The following query counts the number of timeouts by project and question type; timeouts that occur frequently can be investigated further.
SELECT
LogEntry.Mid(LogEntry.Find('I.Project') + 10, 8) As Project,
LogEntry.Mid(LogEntry.Find('HTTP Client')+ 12, 20) As Error,
IIf(LogEntry.Find('CreateInterview') = -1, IIf(LogEntry.Find('I.SavePoint') >= 0,
LogEntry.Mid(LogEntry.Find('I.SavePoint') + 12, LogEntry.Find('&', LogEntry.Find('I.SavePoint')) - LogEntry.Find('I.SavePoint')-12), '[Post]'), '[CreateInterview]') As SavePoint,
COUNT(*)
FROM VDATA WHERE LogEntry.Find('http') = 0 and LogLevel = {error}
GROUP BY
LogEntry.Mid(LogEntry.Find('I.Project') + 10, 8),
LogEntry.Mid(LogEntry.Find('HTTP Client')+ 12),
IIf(LogEntry.find('CreateInterview') = -1, IIf(LogEntry.Find('I.SavePoint') >= 0,
LogEntry.Mid(LogEntry.Find('I.SavePoint') + 12, LogEntry.Find('&', LogEntry.find('I.SavePoint')) - LogEntry.Find('I.SavePoint')-12), '[Post]'), '[CreateInterview]')
Interview tier: Counting the number of projects cached
The following query counts the number of projects cached in 15 minute intervals.
SELECT
DateTime.DateOnly() As Day,
CText(DateTime.DatePart('h'))+':'+CText(DateTime.DatePart('n')/15 * 15) As QuarterHour,
SUM(LogEntry.Find('cached') >= 0) As CachedProjects
FROM VDATA
WHERE Logentry.Find('cached') >= 0
GROUP BY
DateTime.DateOnly(),
DateTime.DatePart('h'),
DateTime.DatePart('n')/15
ORDER BY
DateTime.DateOnly(),
DateTime.DatePart('h'),
DateTime.DatePart('n')/15
Interview tier: Finding respondents who restart
The following query lists the respondents that restart an interview.
SELECT InterviewId, COUNT (*)FROM VDATA WHERE LogLevel={metric} GROUP BY InterviewId HAVING COUNT(InterviewId)<>2
Interview tier: Investigating a particular respondent
Use the following query when a particular respondent experiences an error.
SELECT Mid (LogFile, 54), DateTime, Milliseconds, LogScope, Project, RespondentId, InterviewId, LogEntry FROM VDATA WHERE RespondentId='1066724411' ORDER BY DateTime
See also