The following question came up today: “The activity chart for live user sessions is showing a large number of one-hit sessions and a very large number of ‘Non-pages’. How can I figure out what is going on?”
You are assuming that because the Portal’s report says “non-Pages (.gifs and .jpgs)”, the 1-hit sessions really are GIFs and JPGs. In fact, this activity statistics report is misleading, and I have never been able to get a good explanation of “non-pages”. It has something to do with the Content-Type (one of the HTTP response headers), the file suffix, and some arcane algorithm the PCA uses to decide whether a hit is a page or a “non-page”. All this chart is telling you is that there are 20,000 hits that are considered “non-pages”.
You will have to use RTV to get a better understanding. Look first in the Portal at the “Sessions per hour” activity chart. Find the hours (probably the small hours of the night) where the session count is smallest and the curve is “flattest”. Then use RTV to search for sessions with a hit count = 1, over a 30-minute period right in the middle of the quietest time. With so little real user traffic at that hour, most of the sessions returned should come from “whatever” is generating these 1-hit non-page sessions.
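If you can get the hourly session counts out of the chart, picking the search window can be sketched in a few lines of Python. This is only an illustration: the counts below are made up, and `quietest_window` is a hypothetical helper, not something the Portal or RTV provides.

```python
from datetime import datetime, timedelta

def quietest_window(hourly_counts, minutes=30):
    """Return (start, end) of a window centred in the hour with the fewest sessions.

    hourly_counts maps the start of each hour (a datetime) to its session count.
    """
    # The quietest hour is simply the one with the smallest session count.
    quiet_hour = min(hourly_counts, key=hourly_counts.get)
    # Centre the search window on the middle of that hour.
    centre = quiet_hour + timedelta(minutes=30)
    half = timedelta(minutes=minutes / 2)
    return centre - half, centre + half

# Made-up counts read off a "Sessions per hour" chart (assumption).
counts = {
    datetime(2024, 1, 15, 1): 410,
    datetime(2024, 1, 15, 2): 385,
    datetime(2024, 1, 15, 3): 372,
    datetime(2024, 1, 15, 4): 390,
}

start, end = quietest_window(counts)
print(start, "to", end)
```

The resulting start and end times are what you would enter as the time range for the RTV search.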
Once you have the list of 100 such sessions downloaded to RTV, customize the results pane (the lower pane) to include the following columns: REMOTE_ADDR, HTTP_USER_AGENT, env\CONTENT_TYPE and env\StatusCode. Those additional columns, along with the URL column already present, should give you a good start towards finding out who or what is hitting the site all day long and generating 1-hit non-page sessions.
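With those columns in view, the pattern is usually obvious by eye, but the same grouping can be done in a short script if you have the rows in a file. The sketch below assumes a hypothetical CSV copy of the results with one row per session and the column names shown; the sample rows are invented, and RTV's real export format may differ.

```python
import csv
import io
from collections import Counter

# Invented sample rows standing in for a CSV copy of the results pane (assumption).
SAMPLE_EXPORT = """URL,REMOTE_ADDR,HTTP_USER_AGENT,CONTENT_TYPE,StatusCode
/,10.0.0.5,,,0
/,10.0.0.6,,,0
/,10.0.0.5,,,0
/favicon.ico,203.0.113.9,Mozilla/5.0,image/x-icon,200
/,10.0.0.5,,,0
"""

def top_sources(rows, limit=5):
    """Count sessions per (address, user agent, content type, status) combination."""
    counts = Counter(
        (r["REMOTE_ADDR"], r["HTTP_USER_AGENT"], r["CONTENT_TYPE"], r["StatusCode"])
        for r in rows
    )
    return counts.most_common(limit)

rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
for key, n in top_sources(rows):
    print(n, key)
```

A single address (or a small handful) that accounts for most of the rows, with an empty or unusual user agent, is a strong hint that the traffic is automated.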
One of the most common sources of “non-page” hits is the BIG-IP load balancer from F5. By default it uses port 80 for “keep-alive” connections to the web servers that it is balancing, and it seems to send a lot of them (maybe once per second?) to each web server. If you suspect this, look at the Request block in RTV: it will be absolutely minimal, because the F5 is just doing a connect, waiting for the ACK, and then disconnecting. You will see ReqCancelled=Client and StatusCode=0, and the REMOTE_ADDR will be the IP address of the F5 load balancer (probably two similar but different REMOTE_ADDR values, because these load balancers are normally deployed in active/standby pairs).
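The signature described above is simple enough to check mechanically. Here is a minimal sketch, assuming each hit is available as a dictionary with `REMOTE_ADDR`, `ReqCancelled` and `StatusCode` keys; the function names and the sample data are made up for illustration.

```python
def looks_like_probe(hit):
    """True if a hit matches the connect/disconnect pattern: cancelled by client, status 0."""
    return hit.get("ReqCancelled") == "Client" and str(hit.get("StatusCode")) == "0"

def probe_addresses(hits):
    """Return the distinct source addresses of all probe-like hits, sorted."""
    return sorted({h["REMOTE_ADDR"] for h in hits if looks_like_probe(h)})

# Invented sample hits (assumption): two internal addresses plus one real browser.
hits = [
    {"REMOTE_ADDR": "10.0.0.5", "ReqCancelled": "Client", "StatusCode": "0"},
    {"REMOTE_ADDR": "10.0.0.6", "ReqCancelled": "Client", "StatusCode": "0"},
    {"REMOTE_ADDR": "203.0.113.9", "ReqCancelled": "", "StatusCode": "200"},
    {"REMOTE_ADDR": "10.0.0.5", "ReqCancelled": "Client", "StatusCode": "0"},
]

print(probe_addresses(hits))
```

If the output is one or two internal addresses, compare them with the addresses of your load balancer pair before drawing any conclusions.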
Great article. We have exactly the same issue at my organisation, and this sort of information has really helped us out.