Tuesday, January 14, 2014

Using SQL Server to Report on your IIS Log Data

I shared this post on a SQL Tuesday but am going to modify it a bit so folks better understand what it's all about.  In this post I'm going to share how you can set up a very cost effective process to report on your IIS log data.

Where are your IIS logs?
The first thing you need to do is locate the correct log directory for the website you are interested in.  If you open IIS and click on the Sites, you will see associated ID for that site.  Remember that, then in Windows Explorer browse to C:\inetpub\logs\LogFiles\W3SVC<ID>.  So if the ID of your website is 2, you would find your logs in the W3SVC2 folder (I will use this as the example going forward).

Copy the logs
The next thing I did was to create a batch file and place it in the same folder as the logs.  You don't have to do it this way, but I chose to.  There will be a single line in this batch file:

robocopy C:\inetpub\logs\LogFiles\W3SVC2\ \\DB_SERVER\SHARE\IISLogs\ /maxage:1

I used Windows Scheduler to kick off this batch file every 10 minutes, and since we roll our logs over nightly this will get the current and previous days (maxage:1 means one day).  The reason for the frequency may vary depending on how accurate you want your data.  If our websites have an issue, I prefer to have pretty current data to dive into versus having to open the log itself.

Create the database
Now you'll want to set up your IISLogs database - the schema for the table is as follows:
CREATE TABLE [dbo].[LogData](
[LogFilename] [varchar](255) NOT NULL,
[RowNumber] [int] NOT NULL,
[date] [datetime] NULL,
[time] [datetime] NULL,
[sIp] [varchar](255) NULL,
[csMethod] [varchar](255) NULL,
[csUriStem] [varchar](255) NULL,
[csUriQuery] [varchar](255) NULL,
[sPort] [int] NULL,
[csUsername] [varchar](255) NULL,
[cIp] [varchar](255) NULL,
[csUserAgent] [varchar](255) NULL,
[scStatus] [int] NULL,
[scSubstatus] [int] NULL,
[scWin32Status] [int] NULL,
[timeTaken] [int] NULL,
 CONSTRAINT [PK_LogData] PRIMARY KEY CLUSTERED 
(
[LogFilename] ASC,
[RowNumber] ASC
)

CREATE TABLE [dbo].[IISDashboard](
[LogFilename] [varchar](255) NOT NULL,
[LogDate] [date] NOT NULL,
[Server] [varchar](10) NOT NULL,
[Records] [int] NOT NULL,
[CallsAbandoned] [int] NOT NULL,
[CallsAnswered] [int] NOT NULL,
[ID30Plus] [int] NOT NULL,
[ID10Plus] [int] NOT NULL,
[ID2Minus] [int] NOT NULL,
[ID200Errors] [int] NOT NULL,
[ID500Errors] [int] NOT NULL,
[GoogleBotHits] [int] NOT NULL,
[GoogleBotLoadTime] [int] NOT NULL,
[BingBotHits] [int] NOT NULL,
[BingBotLoadTime] [int] NOT NULL,
[MSNBotHits] [int] NOT NULL,
[MSNBotLoadTime] [int] NOT NULL,
[BotHits] [int] NOT NULL,
[BotLoadTime] [int] NOT NULL,
 CONSTRAINT [PK_IISDashboard] PRIMARY KEY CLUSTERED 
(
[LogFilename] ASC,
[LogDate] ASC,
[Server] ASC
)

Import the logs
Now that the logs are over on the database server, it's time to import them.  I use logparser from Microsoft to do this.  You'll want to create a SQL Server Agent job and run it every 10 minutes or so, making sure you don't bump heads with the copy job from the IIS server.  Stagger them accordingly.  You'll note the script dynamically creates the file name for me - the rest is pretty straight forward.  Yes, I have xp_cmdshell enabled - if you don't allow the use of xp_cmdshell, you may have to cook up an alternative way to import the data.  You'll also note I clear out the current day and reload it all - that's much easier than trying to figure out what's already been imported from a text file and what has not.  Shortly after midnight I reload the entire previous day (the reason I copy two logs each time) - that ensures I have the entire day of data loaded for that log file.

DECLARE @TodaysDate VARCHAR(6)
   ,@LogFileName VARCHAR(100)
   ,@LogParserCommand VARCHAR(1000)
   ,@FileCheckCommand VARCHAR(1000)
   ,@FileCheck TINYINT

SELECT  @TodaysDate = CONVERT(VARCHAR(6),GETDATE(),12)
SET @LogFileName = 'I:\PathTo\IISLogs\u_ex' + @TodaysDate + '.log'

DELETE  FROM IISLogs.dbo.LogData
WHERE   LogFilename = @LogFileName
            
SET @FileCheckCommand = 'DIR "' + @LogFileName + '" /B'
EXEC @FileCheck = master.dbo.xp_cmdshell @FileCheckCommand,NO_OUTPUT
IF @FileCheck = 1 
    PRINT 'File does not exist.'
ELSE 
    BEGIN
        PRINT 'Importing ' + @LogFileName
        SET @LogParserCommand = 'C:\"Program Files (x86)"\"Log Parser 2.2"\LogParser "SELECT * INTO LogData FROM '
            + @LogFileName
            + '" -i:W3C -o:SQL -server:DB_SERVER\INSTANCE -database:IISLogs -driver:"SQL Server" -username:<username> -password:<password>'
        EXEC master..xp_cmdshell @LogParserCommand
    END

-- CLEANUP
EXEC master.dbo.xp_cmdshell 'del I:\PathTo\IISLogs\*.* /F /Q'

Report on the data
Well, the hard part is over.  The data is in SQL Server, and now the fun begins - doing something with it.  Personally I set up an SSRS report that I receive and review each morning.  It takes me a few seconds to identify if there are any performance issues that require investigation.  The primary report is below, as is the query.  I actually roll up the data each iteration of the scheduled job so the dashboard is quick (versus plowing through tons of data each refresh).  You'll note I have two web servers (whited out under Page Hits) load balanced, but this process is very repeatable regardless of how many servers you have.  I also make some pieces of data appear to be hyperlinks, but it's really just blue font and underlined so people know they can further dive into the data.  The call data in the report comes from our I3 phone system - I have removed those from the query to avoid confusion.  If you wish to have a copy of the RDL, feel free to contact me.









DECLARE @Today DATE = GETDATE()
   
DELETE  FROM IISLogs.dbo.IISDashboard
WHERE   LogDate = @Today

INSERT  INTO IISLogs.dbo.IISDashboard
        ( 
         LogFilename
        ,LogDate
        ,Server
        ,Records
        ,CallsAbandoned
        ,CallsAnswered
        ,ID30Plus
        ,ID10Plus
        ,ID2Minus
        ,ID200Errors
        ,ID500Errors
        ,GoogleBotHits
        ,GoogleBotLoadTime
        ,BingBotHits
        ,BingBotLoadTime
        ,MSNBotHits
        ,MSNBotLoadTime
        ,BotHits
        ,BotLoadTime
        )
        SELECT  LEFT(LogFilename,16) AS LogFileName
               ,CONVERT(DATE,DATEADD(hh,DATEDIFF(hh,GETUTCDATE(),GETDATE()),
                                     [Date] + CAST(CONVERT(TIME,[Time]) AS DATETIME))) AS 'LogDate'
               ,SUBSTRING(LogFilename,CHARINDEX('SERVER_PREFIX',LogFilename),9) AS 'Server'
               ,COUNT(*) AS 'Records'
               ,SUM(CASE WHEN timeTaken > 30000 THEN 1
                         ELSE 0
                    END) AS '30Plus'
               ,SUM(CASE WHEN timeTaken > 10000 THEN 1
                         ELSE 0
                    ENDAS '10Plus'
               ,SUM(CASE WHEN timeTaken < 2000 THEN 1
                         ELSE 0
                    ENDAS '2Minus'
               ,SUM(CASE scStatus
                      WHEN 200 THEN 1
                      ELSE 0
                    ENDAS '200Errors'
               ,SUM(CASE scStatus
                      WHEN 500 THEN 1
                      ELSE 0
                    ENDAS '500Errors'
               ,SUM(CASE WHEN csUserAgent LIKE '%googlebot%' THEN 1
                         ELSE 0
                    ENDAS 'GoogleBotHits'
               ,AVG(CASE WHEN csUserAgent LIKE '%googlebot%' THEN timeTaken
                         ELSE 0
                    ENDAS 'GoogleBotLoadTime'
               ,SUM(CASE WHEN csUserAgent LIKE '%bingbot%' THEN 1
                         ELSE 0
                    ENDAS 'BingBotHits'
               ,AVG(CASE WHEN csUserAgent LIKE '%bingbot%' THEN timeTaken
                         ELSE 0
                    ENDAS 'BingBotLoadTime'
               ,SUM(CASE WHEN csUserAgent LIKE '%msnbot%' THEN 1
                         ELSE 0
                    ENDAS 'MSNBotHits'
               ,AVG(CASE WHEN csUserAgent LIKE '%msnbot%' THEN timeTaken
                         ELSE 0
                    ENDAS 'MSNBotLoadTime'
               ,SUM(CASE WHEN csUserAgent LIKE '%bot%' THEN 1
                         ELSE 0
                    ENDAS 'BotHits'
               ,AVG(CASE WHEN csUserAgent LIKE '%bot%' THEN timeTaken
                         ELSE 0
                    ENDAS 'BotLoadTime'
        FROM    IISLogs.dbo.LogData
        WHERE   CONVERT(DATE,DATEADD(hh,DATEDIFF(hh,GETUTCDATE(),GETDATE()),
                                     [Date] + CAST(CONVERT(TIME,[Time]) AS DATETIME))) = @Today
                AND csUriStem LIKE '%.aspx%'
                AND DATEPART(dw, CONVERT(DATE,DATEADD(hh,DATEDIFF(hh,GETUTCDATE(),GETDATE()),
                                                  [Date] + CAST(CONVERT(TIME,[Time]) AS DATETIME)))) BETWEEN 2 AND 6
        GROUP BY LEFT(LogFilename,16)
               ,CONVERT(DATE,DATEADD(hh,DATEDIFF(hh,GETUTCDATE(),GETDATE()),
                                     [Date] + CAST(CONVERT(TIME,[Time]) AS DATETIME)))
               ,SUBSTRING(LogFilename,CHARINDEX('SERVER_PREFIX',LogFilename),9)