Monday, March 21, 2011

Why do I see multiple similar sessions when displaying search results in RTV?

This is caused by session fragmentation, combined with Automerge. Session fragmentation occurs in the persistent data storage (the Long Term Canister, or LTC) because users walk away from their browser for long periods. They may return over an hour later without having closed their browser window.

After a period of inactivity, Tealeaf is configured to move the user's hits from RAM (the Short Term Canister, or STC) over to the LTC. These hits are given a FragmentID (AKA Session ID), an integer with values that range from “1” up to around 2 billion. Every hit in this fragment also carries a copy of the TLTSID, and all hits in the fragment have the same TLTSID value. When the user returns after an hour and starts sending hits again, Tealeaf holds these hits in the STC (RAM) with a different FragmentID, but every hit still has the same TLTSID value as the hits of the previous fragment. This may happen four or five times over the course of a work day. As long as the browser remains open, all hits have the same TLTSID value, but different groups of hits will have different FragmentIDs.

Now when you search the LTC for a session, the system returns to RTV the FragmentID where the search terms match. RTV then asks the Tealeaf system for all of the hits in that fragment, AND if Automerge is ON, RTV also asks for all of the hits of all the fragments within a 6-hour window on either side of the fragment having the match.

In the RTV search results panes, each line is a session that the search found. The left column is the Session Identifier. If Automerge succeeded, the value shown in this column is the 32-character TLTSID, then a dash, then the FragmentID. If Automerge didn't find any other fragments, the column shows only the FragmentID integer. But why does RTV sometimes show two or more rows with nearly identical Session Identifier values?
If the search matched in two or more fragments of the same session, then Automerge runs for each fragment that matches. If the session lasts long enough, some fragments are outside of the six-hour window on either side of the fragment that matches. So Automerge is going to reassemble different sets of fragments depending on which fragment the search term was found in. You will see a row for each session fragment that matched, and the Session Identifier column for each row will have the same 32-character TLTSID, but different FragmentIDs! The other columns of each row show how many hits were assembled on each side of the matching fragment, the timestamp of the first fragment, and the duration of the merged fragments. And this is why you may see multiple rows which look a whole lot alike. When you replay these, you are seeing the same session, just different portions of it, with a lot of overlap between each merged set of fragments.
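
To make the overlap concrete, here is a rough Python sketch of the idea. It is only a toy model, not how RTV or the Canister actually implement Automerge: the `Fragment` class, the `automerge` function, the FragmentID values, and the choice to measure the window from fragment start times are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed for illustration: the window is measured between fragment start times.
WINDOW = timedelta(hours=6)


@dataclass(frozen=True)
class Fragment:
    fragment_id: int   # per-Canister integer (hypothetical values below)
    tltsid: str        # same for every fragment of one browser session
    start: datetime    # timestamp of the fragment's first hit


def automerge(match, fragments, window=WINDOW):
    """Return the FragmentIDs merged around one matching fragment."""
    return sorted(
        f.fragment_id
        for f in fragments
        if f.tltsid == match.tltsid and abs(f.start - match.start) <= window
    )


day = datetime(2011, 3, 21)
sid = "A" * 32  # stand-in for a 32-character TLTSID
fragments = [
    Fragment(101, sid, day + timedelta(hours=8)),
    Fragment(102, sid, day + timedelta(hours=11)),
    Fragment(103, sid, day + timedelta(hours=15)),
    Fragment(104, sid, day + timedelta(hours=19)),
]

# Suppose the search terms matched in fragments 101 and 103.
rows = {m.fragment_id: automerge(m, fragments) for m in (fragments[0], fragments[2])}
for fragment_id, merged in rows.items():
    print(f"{sid}-{fragment_id}: merged fragments {merged}")
```

In this toy data the row for fragment 101 merges fragments 101 and 102, while the row for fragment 103 merges 102, 103 and 104. Both rows carry the same TLTSID, and fragment 102 is replayed in both.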

Another thing you should be aware of regarding FragmentIDs, AKA Session IDs: these integers are assigned on a per-Canister basis, start at “1” with a brand new Canister, and simply increment. If your system has multiple Canister servers, these integers will increment at different rates, depending on how the traffic is sent to each Canister. If a new Canister is added, the FragmentID for that new Canister starts at “1”, regardless of the values in any other Canister. Tealeaf Portal and RTV both let you search for a “Session ID”, which searches the value of the FragmentID. But I’d suggest you avoid using this search term: you can end up finding totally disparate sessions if the same integer appears in different Canisters for completely unrelated session fragments from different users.
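
Here is a small Python sketch of why that search can collide. The `Canister` class and `search_by_session_id` function are made up for this illustration and do not correspond to any Tealeaf API; the only behavior modeled is that each Canister counts up from 1 on its own.

```python
from itertools import count


class Canister:
    """Toy model: each Canister hands out its own incrementing FragmentIDs."""

    def __init__(self, name):
        self.name = name
        self._next_id = count(1)   # a brand new Canister starts at 1
        self.fragments = {}        # FragmentID -> TLTSID

    def store_fragment(self, tltsid):
        fragment_id = next(self._next_id)
        self.fragments[fragment_id] = tltsid
        return fragment_id


def search_by_session_id(canisters, fragment_id):
    """Return (canister name, TLTSID) for every Canister holding that integer."""
    return [
        (c.name, c.fragments[fragment_id])
        for c in canisters
        if fragment_id in c.fragments
    ]


old = Canister("canister-A")
for user in ("alice-tltsid", "bob-tltsid", "carol-tltsid"):
    old.store_fragment(user)

new = Canister("canister-B")       # added later, counts from 1 again
new.store_fragment("dave-tltsid")

hits = search_by_session_id([old, new], 1)
print(hits)
```

Searching for “Session ID” 1 here comes back with one fragment from each Canister, belonging to two unrelated users.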

2 comments:

  1. So we have an event that fires when a user hits a page and is set to count on first visit only. If the user does not close the browser, and comes back to the page within the 6-hour window, will that event fire again? Additionally, this particular page has a timeout that kicks you off the page after 20 minutes, and we feel that we are seeing inflated numbers in Tealeaf because this double counting occurs after a user is kicked out of the page and returns within that 6-hour window. Thoughts?

  2. The six-hour window is solely an RTV artifact: that's how far on either side of a found session that RTV will search for fragments of that session. That six-hour window has nothing at all to do with event evaluation. Event evaluation takes place on the Canister, and happens within a single session fragment.

    Now, you probably have a session timeout in Tealeaf that matches your 20-minute inactivity timer. When 20 minutes expire, the hits of the current fragment are written to disk, and Tealeaf forgets all about that session in RAM (because as far as your web server and Tealeaf know, that user left your site). The next time the user (still in the same window) sends in any page, a new session fragment is started in RAM. If the user sends in that particular page again, that event again fires the first time the page is received in the new session fragment.

    So you are seeing an event counted for each time they return to that page after having timed out. But should you really consider this inflated? After all, they went at least 20 minutes with no activity.
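
A quick Python sketch of that counting behavior, for anyone who wants to check it against their own numbers. This is a simplified model and not Tealeaf's event engine: the function name, the URLs, and the assumption that a gap longer than the timeout always starts a new fragment are illustrative only.

```python
from datetime import datetime, timedelta

# Assumed to match the 20-minute inactivity timer discussed above.
TIMEOUT = timedelta(minutes=20)


def count_first_visit_events(hits, page, timeout=TIMEOUT):
    """Count a 'first visit only' event once per session fragment.

    hits: (timestamp, url) pairs for one browser window, sorted by time.
    """
    fired = 0
    seen_in_fragment = False
    last_hit = None
    for timestamp, url in hits:
        if last_hit is not None and timestamp - last_hit > timeout:
            seen_in_fragment = False   # inactivity gap: a new fragment starts
        if url == page and not seen_in_fragment:
            fired += 1
            seen_in_fragment = True
        last_hit = timestamp
    return fired


day = datetime(2011, 3, 21)
hits = [
    (day.replace(hour=9, minute=0), "/account"),
    (day.replace(hour=9, minute=5), "/account"),    # same fragment, no new event
    (day.replace(hour=9, minute=40), "/account"),   # 35 minutes idle, new fragment
    (day.replace(hour=9, minute=45), "/home"),
]

print(count_first_visit_events(hits, "/account"))
```

With this sample data the event is counted twice: once in the first fragment, and once more after the 35-minute gap.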
