
[Edit 06/05/2014: A PDF version of this drawing can be found here.]
Below are descriptions of each block.
Network Components
This block encompasses all of the non-tealeaf servers that are involved with the duplication and transmission of the data packets.
Packet Duplication
Any network component (Tap, Switch Span, Load Balancer) that performs the actual duplication of TCP/IP packets. SSL decryption may be performed at this layer, or lower at the Passive Capture Appliances (PCAs) layer.
Packet Transmission
The network component (direct-connect crossover cable, switch, Gigamon) that connects the duplicated packet stream to the Passive Capture Appliances (PCAs). These devices may also connect the duplicated packet stream to other devices, such as intrusion-detection systems that need the data.
Passive Capture Appliances (PCAs)
Redundant Linux servers connected to the duplicated packet stream that reassemble the packets into request blocks and response blocks, and these into hits. SSL decryption may be performed here, or higher at the Packet Duplication layer. PCI data blocking and/or masking should all be performed at this layer. The TLTSID sessionization value will be inserted here if not already present in an HTTP cookie from the request or response block. PCA servers should be treated as PCI critical components. Below this layer, no PCI data is present. PCA servers are managed via SSH and/or a tealeaf web console (GUI) that may only be accessed by tealeaf admins from the Health-Based Routers (HBRs).
Health-Based Routers (HBRs)
Redundant Windows servers connected to the PCAs whose primary purpose is to distribute the traffic stream to multiple Processing/Canister servers. Distribution is session-sticky (using the TLTSID cookie), and normally done with a statistical even-distribution algorithm to send roughly the same number of sessions to each Processing server.
Overall, this diagram is for a production tealeaf system, but a good corporate tealeaf implementation includes both development and QA tealeaf systems as well (much smaller, of course). Developers of tealeaf events need a small stream of production data sent to the development tealeaf system so that they have data against which to create new events. The HBR servers include the capability to extract specific sessions or a statistical random fraction of sessions from the production stream and send those sessions to the development tealeaf system, as indicated in the diagram.
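To make the "statistical random fraction of sessions" idea concrete, here is a minimal Python sketch. It is only an illustration: the function name `in_dev_sample` and the hash-based approach are my assumptions, not how the HBR actually implements sampling. The point it shows is that the decision should be made per session (keyed on the TLTSID) rather than per hit, so that a sampled session arrives at the development system complete.

```python
import hashlib

def in_dev_sample(tltsid: str, fraction: float) -> bool:
    """Return True if this session falls inside the sampled fraction.

    Hypothetical helper: hashing the session ID (instead of rolling a die
    per hit) gives the same answer for every hit in the same session.
    """
    digest = hashlib.sha256(tltsid.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Sessions whose bucket lands below the threshold are sampled.
    return bucket < fraction * 2**64
```

With `fraction=0.01`, roughly one session in a hundred would be copied to development, and every hit carrying that session's TLTSID would get the same answer.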
The HBR servers monitor all of the Processing/Canister servers, and if any Processing server stops responding, the HBR servers take that Processing server out of rotation and redistribute the traffic to the other servers. If a Processing server comes back alive, the HBR servers begin sending traffic back to that Processing server. Processing servers are cycled and self-checked every night, at different times, and the HBR servers must take each Processing server out of rotation, allow sufficient time for most sessions on that server to end, recognize that the Processing server has stopped responding (while it self-checks), recognize when it comes back on-line, and begin sending it data again.
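The combination of session-sticky distribution and health-based rotation described above can be sketched in a few lines of Python. This is a toy model under my own assumptions (the class name `StickyRouter`, the hash-modulo placement, and the in-memory assignment table are all hypothetical); the real HBR logic is certainly more involved.

```python
import hashlib

class StickyRouter:
    """Toy session-sticky distributor with health-based rotation."""

    def __init__(self, servers):
        self.servers = list(servers)      # all configured Processing servers
        self.healthy = set(self.servers)  # servers currently in rotation
        self.assigned = {}                # TLTSID -> server (grows unbounded in this toy)

    def mark_down(self, server):
        """Take a server out of rotation (e.g. it stopped responding)."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Put a known server back into rotation."""
        if server in self.servers:
            self.healthy.add(server)

    def route(self, tltsid):
        """Return the server for this session, keeping sessions sticky."""
        server = self.assigned.get(tltsid)
        if server in self.healthy:
            return server
        candidates = sorted(self.healthy)
        if not candidates:
            raise RuntimeError("no Processing servers in rotation")
        # Hashing the session ID spreads new sessions roughly evenly.
        digest = hashlib.sha256(tltsid.encode("utf-8")).digest()
        server = candidates[int.from_bytes(digest[:8], "big") % len(candidates)]
        self.assigned[tltsid] = server
        return server
```

A session stays on its server until that server leaves rotation, at which point it is reassigned to one of the remaining servers and stays there even after the original server returns.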
HBR servers do no data storage, but operate on the data stream. Examples include robot identification (User-Agent or IP based); deletion of hits based on IP address or URL or any combination of request or response patterns; rewriting the Remote IP address using the latest HTTP_X_FORWARDED_FOR value; copying cookie values like a SID to the appdata section for indexing; condensing the referrer domain to provide meaningful referrers; extracting price and currency information from a page; normalizing page URLs to remove locale country codes; and many other operations.
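Two of these stream operations are easy to illustrate. The Python sketch below is not tealeaf pipeline configuration; it only shows the kind of transformation involved. I am assuming a hit can be modeled as a dictionary, that the Remote IP lives in a field named `REMOTE_ADDR`, that "latest" means the last entry in the comma-separated forwarded-for list, and that locale codes appear as a leading path segment such as `/en-us`. All of those are assumptions for the example.

```python
import re

# Assumed locale shape: a leading path segment such as /en-us or /fr_CA.
LOCALE_PREFIX = re.compile(r"^/[a-z]{2}[-_][a-z]{2}(?=/|$)", re.IGNORECASE)

def rewrite_remote_ip(hit: dict) -> dict:
    """Return a copy of the hit with REMOTE_ADDR taken from the forwarded-for list.

    Assumption: the "latest" value is the last address in the header.
    Hits without the header are returned unchanged.
    """
    forwarded = hit.get("HTTP_X_FORWARDED_FOR", "")
    addresses = [a.strip() for a in forwarded.split(",") if a.strip()]
    if addresses:
        hit = {**hit, "REMOTE_ADDR": addresses[-1]}
    return hit

def normalize_url(path: str) -> str:
    """Strip a leading locale segment so /en-us/cart and /fr-fr/cart both become /cart."""
    stripped = LOCALE_PREFIX.sub("", path, count=1)
    return stripped or "/"
```

Normalizing this way lets reports count one logical page instead of one page per locale.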
Processing/Canister servers
Redundant Windows servers connected to the HBRs that provide the storage location for hits and process these hits looking for patterns. The duration that sessions persist for replay and analysis is a function of traffic density and the amount of disk space the Canister servers provide. Sessions are extracted from the canisters on demand to replay in the Replay Server or RealiTea viewer.
Processing of hits is shorthand verbiage for the complex pattern recognition performed by the tealeaf Events system. Pattern recognition is done against individual hits, against session metadata, and by combinatorial logic looking at all events that have occurred in the entire session. In addition to looking for patterns, match groups can be defined to extract substrings from the data, and the number of times each substring occurs can be recorded. Metadata such as the “time-into” a session that a pattern occurs can be extracted. Extracted data can be grouped into sets. Event processing and data extraction performed in the Processing/Canister servers can be very complex.
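As a rough picture of what "match groups", occurrence counts, and "time-into" mean, here is a small Python sketch. It is not the tealeaf Events engine; the function name `scan_session`, the tuple layout of a hit, and the use of a regular expression with exactly one capture group are assumptions made for the example.

```python
import re
from collections import Counter

def scan_session(hits, pattern):
    """Scan a session's hits for a pattern with one capture group.

    hits: list of (timestamp_seconds, response_text) tuples in time order.
    Returns (counts, first_seen): how many times each captured substring
    occurred, and the time-into-session (seconds) of its first occurrence.
    """
    regex = re.compile(pattern)
    counts = Counter()
    first_seen = {}
    if not hits:
        return counts, first_seen
    session_start = hits[0][0]
    for timestamp, text in hits:
        for match in regex.finditer(text):
            value = match.group(1)  # the match group, e.g. an error code
            counts[value] += 1
            # Record the time-into only for the first occurrence.
            first_seen.setdefault(value, timestamp - session_start)
    return counts, first_seen
```

For example, scanning with the pattern `Error code: (E\d+)` would tally each distinct error code seen in the session and note how many seconds into the session each one first appeared.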
If the optional cxConnect Real-time data extractor is installed, the Processing servers create messages for each event configured for real-time extraction, along with information on each hit the event occurs on, and these messages are delivered via a tealeaf pipeline to the cxConnect server.
Reporting Server
Non-redundant Windows server connected to the Processing/Canister servers that polls the Processing servers for their traffic and event counts, and provides the primary web-based GUI for users to see reports on this data. Traffic and Event data is collected from the Processing servers, and stored in the Tealeaf SQL Reporting database. Dimension aggregates are calculated and stored in the database during each collection run. Daily, Weekly and Monthly aggregates are also calculated periodically. The Reporting GUI provides an interface to create and view reports on this data. Every tealeaf PCA, HBR, and Processing server reports its raw traffic statistics to the Reporting server, which stores this information in the Tealeaf SQL Statistical database, and the information is available in the Reporting GUI.
Tealeaf SQL Database server
Non-redundant Windows server running Microsoft SQL Server, or a corporate server running Microsoft SQL Server. Tealeaf creates and uses three SQL databases (System, Reporting, and Statistics). The SQL server is usually managed by the corporate DBA team, and not the tealeaf admins. The tealeaf admin team works closely with the DBAs to install, update, maintain, and back up these databases.
Replay Server
Usually a service running on the Reporting server, the Replay server may be configured as a stand-alone Windows server. The Replay server is used to create visual replay of user sessions from the data stream of hits (request/response pairs) stored in the Processing/Canister servers. The Replay is web-based, and does not require any executable to be installed on a tealeaf user’s computer. See also the RealiTea Viewer section for an alternative method of replay.
cxOverstat
A feature that can be turned on that allows the Replay server to display the heatmaps and other functions provided by the cxOverstat feature.
Archive Servers
Optional Windows server(s) whose purpose is to extract a subset of sessions from all Canister servers that meet selection criteria (most often, a purchase session or a trade session), and store these sessions for a much longer period than the Canister servers do. For example, if Canister servers are ALL storing sessions for 30 days, the Archive servers may store purchase sessions for two years. These can also be configured as “non-tamperable”, which provides a hash-based mechanism to prove that a session replayed from the Archive server is the same session that was originally captured from the web site.
TLI servers
Optional Windows server(s) that store and make available for replay certain static content of the web site. During replay, the images and JS files are normally loaded from the live web site. With a TLI server, these files are stored each day, and during replay of a session from, say, three weeks ago, the images and JS files as they were three weeks ago are used in the replay. This provides better fidelity of replay that is closer to the actual historical session. The drawback is the amount of storage needed to keep historical static content.
cxConnect Servers
Optional Windows server(s) whose purpose is to provide the interface that extracts data from the tealeaf Processing/Canister servers and makes that data available to external systems. There are two separate data feeds available. The real-time data feed is a set of (configurable) selected event messages with parameters that are sent by the Processing/Canister servers to the cxConnect servers using a tealeaf transport pipeline. The cxConnect servers have two distribution choices for this data: log to file and/or send to a TCP/IP listener on an external system. The real-time feed is most often used to feed a Complex Event Processing (CEP) listener system. In turn, these systems drive real-time decisioning systems that modify the web application’s responses to the user based on their past actions in the session.
The other data feed available from the cxConnect systems is a scheduled (typically hourly or daily) batch extract of detailed information regarding users, sessions, hits, parameters, events, and dimensions. The information is stored in flat files that conform to the Microsoft SQL Bulk Load (BCP) format. Tealeaf provides an example schema for creating a relational SQL database, and script jobs for loading the corresponding BCP files into these tables. This is the reference mechanism provided by tealeaf for putting the data into a fully relational database.
The following three pieces of tealeaf code are implemented on the web sites and native mobile applications.
Cookie Injector
Very small piece of code running on the web servers that adds the three tealeaf cookies. The TLTSID is a non-persistent cookie whose value does not change as long as the browser window remains open. The TLTHID cookie is a unique identifier assigned to each hit. The TLTUID is a persistent cookie left on the browser, whose value is sent each time the user revisits the site. All three cookies are 32-character GUIDs.
Client-Side Recording (CUI/SDK) Library
JavaScript (JS) library added to the web site and called from the web pages. This library implements the recording of the DOM events on the web page, and transmits the page’s DOM event information (mouse actions, keyboard actions, page rendering time, etc.) back to the tealeaf system. Includes a target page to be added to the web site that the library will call back to. The CUI/SDK is a very important piece to implement properly for sites that use anything similar to the “one-page” technique. The implementation of the CUI/SDK provides a much higher fidelity replay of the session.
Mobile Library
JavaScript (JS) library added to the Mobile web site and to the mobile device native application (iOS or Android) called by the device. This library implements the recording of the user’s interactions with the device and the page. Screen click, key-press, swipe, pinch, rotation and other page interactions are recorded and transmitted back to the tealeaf system. The implementation of the CUI/SDK provides a much higher fidelity replay of the session.
The following executable is installed onto the tealeaf user’s computer for replay of sessions.
RealiTea Viewer
This is an executable program that can be installed onto a tealeaf user’s computer to allow for replay of user sessions. In addition to just replay, it allows multiple sessions to be downloaded to the user’s computer, and provides searching and analysis capabilities for patterns across these downloaded sessions. It includes the ability to customize the view panes and data fields displayed for both hits and for session metadata.
The following systems are not part of the tealeaf ecosystem, but if they exist in the company, they may be fed with data from the optional cxConnect servers.
SQL Relational Database Server for tealeaf events
Non-redundant Windows server running Microsoft SQL Server, or a corporate server running Microsoft SQL Server. Tealeaf provides a reference schema for a relational database that links together sessions, hits, query parameters, events, and dimensions. The cxConnect data extractor populates these tables. These tables provide a very rich source for analytics against the user behaviors.
Real-Time decisioning systems
A system that modifies the web page contents based on the user’s past actions. Usually some kind of Complex Event Processing system tied into the web servers.
The following software constructs are resident in the corporate Active Directory structures.
Active Directory Security groups for tealeaf
At a minimum there are two AD security groups. One enumerates the userids which are allowed to use the tealeaf reporting GUI and allowed to access the session data stored in the Processing/Canister servers for replay. The other enumerates the userids given access to tealeaf at an administrative level. These may be Global groups or Domain-local, meaning that userids from multiple forests are supported if desired. Additional AD security groups may be created for teams such as fraud investigation teams, which are given access to encrypted PCI data, should the system be configured to encrypt certain fields instead of blocking them. Only members of these specific AD groups will be able to see the PCI data in clear text. Normal tealeaf users and tealeaf administrators see encrypted gibberish. PCI fields that are blocked instead of encrypted are always replaced with ‘X’ for all users.
Conclusion
I hope this overview is useful in understanding at a very high level all the components of the tealeaf ecosystem. As with all large complex computer systems, the features and components are evolving, so this post will eventually become obsolete. However, as of tealeaf version 8.8 in the spring of 2014, this should be a pretty complete picture.
Feedback, comments, and questions are always welcome. Happy Tealeaf-ing!
Very helpful and well written post Bill! Tealeaf CX is anything but simple to break down to a digestible level, but you've done a fantastic job at providing the grand overview here. Would you by chance be able to share a higher resolution of your architecture diagram? Thanks, and keep up the great work!
I've uploaded a PDF of this diagram, and edited the post to include a link to the PDF.
Golden breakdown. I've been tardy to your blog lately and you've posted some damn good information!