I recently got a comment asking for clarification of the Tealeaf privacy rules. These are the critically important part of Tealeaf data processing that eliminates or masks Personal Confidential Information (PCI), things like credit card numbers and passwords. I looked at the IBM FAQs and was surprised to find very little information there. Years ago I wrote a technical explanation of how to block PCI data in a value attribute and posted it on the Tealeaf community site, but it appears that post did not make it from viaTeaLeaf to the IBM site, so I'll rewrite and expand upon it here.
Where PCI Blocking occurs
There are three different places in the Tealeaf system where you need to make configuration changes to effectively block PCI data: the hit's request block, its response block, and the data recorded client-side by the Client User Interface (CUI) library. Blocking PCI data in the request and response is accomplished in the Tealeaf pipeline. The CUI data blocking is done in the CUI/SDK configuration file. In the pipeline, the privacy session agents (Privacy and PrivacyEx) can block or mask PCI data. These two terms mean significantly different things to information security teams. Blocking PCI data means it is destroyed in the data stream, and there is no way to recover it. Masking means encrypting the PCI data in such a way that only authorized users can see it. Blocking is easier to implement, and I'll use the term 'block' throughout most of this post. Masking the data requires more implementation steps, and I'll devote a section to that later in the post.
I always urge clients to implement all the PCI data blocking in the pipeline at the PCA tier. Doing it here keeps PCI data off all the downstream servers, so only the PCA servers have to be made PCI compliant and audited by their company's information security team.
PCI Blocking in the Pipeline
Privacy rules are implemented in the privacy session agents and can be used for much more than blocking or masking PCI data. They aren't really very hard: at heart they are just search and replace (or extract) patterns. This blog post focuses narrowly on their use for PCI data. The privacy session agent can be put into the Tealeaf data pipeline at any tier (PCA, HBR, or Processing server), but for PCI blocking I strongly urge that this be done in the PCA.
The privacy session agent reads the privacy.cfg file for its search and replace patterns. Since the PCA servers are Linux boxes and the HBR and Processing servers are Windows boxes, the path names to the privacy files will of course be different. But the contents of the file are identical. In its simplest format, it is a list of [rules]. The PCA has a visual GUI for editing the privacy rules on the PCA. It’s instructive to try different privacy rule formulations in the GUI and see how it affects the privacy.cfg file, but for this post, I’ll be old-school and focus on just the contents of privacy.cfg. You can edit this file with any text editor, and you can put this file under source code control to track version changes.
In a few weeks I hope to post details on how to protect this file from changes, to reduce the possibility that an unscrupulous system admin makes changes to allow harvesting of PCI information.
Blocking the data in the Request block
The first and easiest place you need to block data is in the request block of the hit. Your web application is going to put up a form, and there are going to be <input> tags that define text fields where the users enter passwords, credit card numbers, CVV numbers, answers to security questions, old and new passwords, and other PCI data. Every HTML input tag has either a name= or an id= attribute in the tag. When the page posts (or 'gets'), the input data, along with the name or id, is passed either as a query parameter or as part of the request body. For privacy, you don't have to care whether the page posts or gets; Tealeaf automatically blocks both. To block specific data fields in the request block, you specify a list of input field names (or ids). It takes just one rule and one action block. The rule is always enabled and names an action block, and that action block specifies the Action 'Block', the Section 'urlfield', and a ValueName holding the delimited list of fields to block. Together, they look like this:
[Rule1]
Enabled=True
Actions=A_TextBlockURLFields
[A_TextBlockURLFields]
Action=Block
Section=urlfield
ValueName=CreditCard|CardNumber|NewPassword|SecurityAnswer1
I’ve managed ValueName lists that are pretty long. Often the web application is developed with .Net or other frameworks, and the framework assigns the control names. .Net applications in particular have very long control names, like ctl00$ContentInfo$CreditCard$CreditCardNumber$txtCCNum, and ctl00$ContentInfo$SecurityQuestionAnswer$AnswerReType$txtSecurity. You do have to fully specify (no wild cards) the full name of every field to block.
If the web application has accrued lots of pages and lots of PCI-impacting fields from lots of developers over the years, managing the list of PCI-impacting field names can be a pain. But a little creative Excel spread sheeting makes it possible to manage even very long lists of field names to block. I’ve a couple of Excel formulas that can help – comment on this post if you are interested.
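As an alternative to a spreadsheet, the field list can be kept one name per line and turned into the ValueName line with a few lines of script. The sketch below is in Python and is only an illustration: the function name build_value_name is made up here, and the pipe separator is an assumption copied from the example action block above, so adjust it to whatever your privacy.cfg actually expects.

```python
def build_value_name(field_names, separator="|"):
    """Return a 'ValueName=...' line from an iterable of field names.

    Blank entries are dropped and exact duplicates are removed,
    keeping the order in which names were first seen.
    The separator is an assumption taken from the example in this post.
    """
    seen = set()
    unique = []
    for name in field_names:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            unique.append(name)
    return "ValueName=" + separator.join(unique)


fields = [
    "CreditCard",
    "CardNumber",
    "NewPassword",
    "SecurityAnswer1",
    "CardNumber",  # duplicate, dropped
    "ctl00$ContentInfo$CreditCard$CreditCardNumber$txtCCNum",
]
print(build_value_name(fields))
```

Keeping the raw list in a text file under source control also gives you a change history for the list itself.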
Blocking the data in the Response block
The next place to block PCI data is in the response block. This takes a bit more work, and you have to know your application. In particular, where does your application display credit card numbers, passwords, answers to security questions, etc.? Usually, there are far fewer places where your application echoes PCI data back to the browser. But you will need to look at example pages where, for example, a credit card number is displayed, or a list of security questions and answers is displayed. What you need to do is write one or more rules that match a preamble, then the sensitive data, then a postamble. The privacy rules support most Regular Expression constructs, so it's pretty easy to write the rules once you have found some example pages. You don't have to match every possible combination in one expression; it's fine to have multiple actions, each matching one or more places where PCI data may appear in the response. Nor do you need to specify the page name (URL) where the PCI data lives. In fact, I never specify a URL when blocking sensitive data, because it's too easy for a URL to get changed. I always write the rules to look for the data patterns, and let the PCA processors look for those patterns in every page that comes through. As long as the PCA CPUs are not breaking a sweat, there's no problem with letting them inspect every page. Modern servers have plenty of CPU cycles. Just keep an eye on all the Tealeaf pipeline processes on a PCA during a typical peak busy hour, and keep their CPU utilization under 60% or so, just to be safe.
Back to the regular expressions for data blocking… Earlier we had a rule and an action for blocking the request data. We can simply add another action to the rule for blocking the response data. The action will execute on every hit. Again, as long as the PCA's CPUs are not heavily loaded, there's no problem with that. What this particular rule is going to do is block the value attribute of an HTML input tag. Have you ever entered a credit card number on a page, submitted it, had a mistake somewhere on that page, and had the web site helpfully echo the credit card number back in its input field? This is usually accomplished with the value attribute, so blocking the value attribute prevents Tealeaf from recording the PCI data that the web site echoes back in the input field.
The rule now has two additional comma-separated actions:
[Rule1]
Enabled=True
Actions=A_TextBlockURLFields , A_BlockCCInResponse1, A_BlockCCInResponse2
And the privacy.cfg file has two added action blocks.
[A_BlockCCInResponse1]
Action=Block
Section=response
Field=body
StartPatternRE=(?-s)<[^>]*?\sname\s*=\s*(["']{0,1})(CreditCard|CardNumber|NewPassword|SecurityAnswer1)\1\s+[^>]*?value\s*=\s*(["']{0,1}).*?\3[\/\s>]
Inclusive=True
BlockingMask=value\s*=\s*["']{0,1}([^"']*)["']{0,1}[\/\s>]
The tealeaf manual section on privacy rules explains the Action, Section, Field, StartPatternRE, Inclusive and BlockingMask parameters, so we will just focus on the StartPatternRE and the BlockingMask values.
Regular Expressions are your friend!
Regular Expressions, or RegExs, tell a computer how to match a pattern. RegExs are a computer’s native language, and can be made very efficient. There are plenty of tutorials on the web for constructing regular expressions, so I’m going to just explain the RegEx above that will mask the value attribute.
The string to be found in the response will look like <input OptionalStuffWeCanIgnore name = "FieldName" OptionalStuffWeCanIgnore value = '123456789123456' OptionalStuffWeCanIgnore >, where the spaces around the = characters are optional and the string may use double quotes or single quotes (apostrophes). Developers (and development tools) are free to construct their HTML any way they like, as long as it conforms to the W3C standard, so our RegEx needs to be sophisticated enough to match any standard formulation.
First, the alternation construct is (a|b|c), which says to match "a or b or c". Our RegEx uses an alternation construct to list all of the input field names, like (password|newpassword|oldpassword|creditcardnumber|cvv). Wrapping the alternation in plain parentheses also creates a match group that records which alternative occurred; in PCRE-style engines you would write (?:a|b|c) to group without capturing. The (?-s) at the start of our RegEx is something different. It is an inline option that turns off "single-line" (dot-all) mode, so the . character will not match a newline unless the pattern turns the mode back on with (?s). That keeps a .*? from running across line breaks.
Next we have character classes. ["'] says to match either the double quote or the apostrophe character, and the length modifier {0,1} says to match anything in the character class 0 or 1 times. Together, ["']{0,1} says to match 0 or 1 quote or apostrophe characters. Another character class we use is [^>], which says to match any character except >. Two more length modifiers we use are *, which says to match the preceding item 0 or more times, and +, which says to match the preceding item 1 or more times.
Any character (other than a newline, unless single-line mode is on) is matched by the . character. Whitespace (spaces, tabs, and line breaks) is matched by the special sequence \s. Greedy matching is the default for a RegEx, which means that if you have the string "a b c d e f" and you say a.*\s, it will match the longest substring, "a b c d e ". If you want to match the shortest substring, add the ? character after the *, so a.*?\s will match "a ".
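The greedy versus non-greedy difference is easy to check for yourself. This is a small sketch in Python rather than in the PCA's own engine, so treat it as an illustration of the general RegEx behavior only. Note that \s* on its own can only consume whitespace, so the sketch uses .* followed by \s to show the longest and shortest matches.

```python
import re

text = "a b c d e f"

# Greedy: .* grabs as much as it can, then backs off to the last whitespace.
greedy = re.match(r"a.*\s", text).group(0)

# Non-greedy: .*? grabs as little as it can, stopping at the first whitespace.
lazy = re.match(r"a.*?\s", text).group(0)

# \s* alone only consumes the whitespace directly after the "a".
whitespace_only = re.match(r"a\s*", text).group(0)

print(repr(greedy))           # 'a b c d e '
print(repr(lazy))             # 'a '
print(repr(whitespace_only))  # 'a '
```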
To match the character / itself, it needs to be “escaped”, so \/ matches the / character.
Our final construct is the backreference. Within a RegEx, we can refer to something that matched earlier within the RegEx if we put what we want to reference in a () grouping. Every pair of plain () creates a numbered match group, counted by opening parenthesis: whatever matches within the first pair of () is referenced later in the RegEx as \1, the second pair as \2, etc. When the string has quotes around a value, the W3C standard says the same character (double quote or apostrophe) has to open and close it. So (["']{0,1})(a|b|c)\1 says to match a or b or c either with no quotes around it (that's the 0), or with a matching pair of double quotes or a matching pair of apostrophes around it.
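Here is a short sketch of the backreference in action, again in Python as a stand-in for whatever engine you test with. The helper name is_consistently_quoted is invented for this example. It also shows that each pair of plain parentheses is a numbered group: the quote is group 1 and the alternation is group 2.

```python
import re

# Group 1: an optional opening quote. Group 2: the alternation.
# \1 then requires the same quote character (or nothing) to close.
QUOTED = re.compile(r"""(["']{0,1})(a|b|c)\1""")


def is_consistently_quoted(s):
    """True if s is a, b or c, bare or wrapped in one matching quote pair."""
    return QUOTED.fullmatch(s) is not None


for sample in ['"a"', "'b'", "c", "\"a'"]:
    print(repr(sample), is_consistently_quoted(sample))

# The groups are numbered by their opening parenthesis.
print(QUOTED.fullmatch("'b'").groups())
```

The last sample, which opens with a double quote and closes with an apostrophe, is the one that fails.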
Putting some of these together into short substrings, \s+[^>]*? says to match a whitespace character occurring one or more times, then any character that is not > occurring 0 or more times (non-greedy).
Here is the StartPatternRE, and an explanation of it.
(?-s)<[^>]*?\sname\s*=\s*(["']{0,1})(CreditCard|CardNumber|NewPassword|SecurityAnswer1)\1\s+[^>]*?value\s*=\s*(["']{0,1}).*?\3[\/\s>]
(?-s)<[^>]*?\sname : Turn off single-line mode so that . does not match newlines. Start matching at a < character, ignore anything until "whitespace name", and if a > appears before the string name, stop trying to match.
name\s*=\s* : Look for the string name, then 0 or more whitespace, the = character, and zero or more whitespace.
(["']{0,1}) : Look for either the double quote character or the apostrophe character, occurring 0 or 1 time. Group this to create a backreference. Since this is the first pair of () characters, this backreference can be referred to as \1.
(CreditCard|CardNumber|NewPassword|SecurityAnswer1) : This is a sample of the alternation that lists the exact field names of your application that need to be blocked. Only fields in which the application will echo PCI data using a value attribute need to be listed, separated by the vertical pipe | character. This is the second pair of () characters.
\1\s+[^>]*?value : Match whatever the first backreference matched, then one or more whitespace, then any string of characters that is not the > character (non-greedy, that is, the shortest substring) followed by the string value.
\s*=\s*(["']{0,1}) : 0 or more whitespace, the = character, then zero or more whitespace, then either the double quote character or the apostrophe character, occurring 0 or 1 time. Group this to create a backreference. Since this is the third pair of () characters, this backreference can be referred to as \3.
.*?\3[\/\s>] : Any character occurring 0 or more times (non-greedy), then the third backreference. After the third backreference matches, it must be followed by a whitespace character, or the > character, or the / character. Closing a tag in HTML can be either > or />, so after the value attribute, the input tag either continues with more attributes (a whitespace character will follow the backreference), or the input tag will close, so we need to match either > or /.
Whew! That was a lot of explanation. I hope you followed all of that, but if not, there are tools to help you visualize how all of this works.
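One offline way to see the StartPatternRE at work is to run it against a few sample tags. The sketch below uses Python's re module, which is an assumption made for illustration and is not the engine the PCA uses. Python does not accept a bare leading (?-s), so the prefix is dropped here; Python's . already stops at a newline by default. The sample HTML and the helper name find_echoed_field are made up.

```python
import re

# The StartPatternRE from this post, minus the leading (?-s) prefix.
START_PATTERN = re.compile(
    r"""<[^>]*?\sname\s*=\s*(["']{0,1})"""
    r"""(CreditCard|CardNumber|NewPassword|SecurityAnswer1)\1"""
    r"""\s+[^>]*?value\s*=\s*(["']{0,1}).*?\3[\/\s>]"""
)


def find_echoed_field(html):
    """Return the listed field name whose value is echoed, or None."""
    match = START_PATTERN.search(html)
    return match.group(2) if match else None


samples = [
    '<input type="text" name="CreditCard" value="4111111111111111" />',
    "<input name = 'NewPassword' class=\"x\" value = 'hunter2'>",
    '<input name="Email" value="someone@example.com">',    # not a listed field
    '<input value="4111111111111111" name="CreditCard">',  # value comes first
]
for html in samples:
    print(find_echoed_field(html), "<-", html)
```

The last sample is not matched because the value attribute comes before the name attribute, which this pattern does not cover.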
Javascript Regular Expression Engines and Testing tools
There are web sites and online tools to help you construct Regular Expressions and test them. You paste in the string to be tested (a cut-and-paste of a web page snippet that contains the input tag with the value attribute) and the RegEx, run the test, and the tool will tell you if the RegEx matched. Good tools even break down each piece of the RegEx and tell you where it matches the string. But be careful: online tools use different RegEx engines! You need to carefully validate that the online tool you use produces the same result as the PCA.
Here are my two favorite online tools for testing a RegEx.
http://regexpal.com/ This one is good, but basic.
http://myregextester.com/ My personal favorite. In particular, it has an 'explain' function that breaks down your RegEx piece by piece and shows you why/how it matches. But watch out: the tool doesn't work well in the Chrome browser. I use IE when I'm on this site.
These test tools are especially good for testing RegExs you might use in an Advanced Mode event. The Event Processing engine is written in JavaScript and uses the Google JavaScript engine. Both of the tools above use a JavaScript engine (I have no idea which engine). Make sure your testing tool, whatever you select, uses a JavaScript engine, because there are a few differences between the .Net engine, the Perl Compatible Regular Expressions (PCRE) engine, and JavaScript engines. The biggest difference to watch out for is use of the '.' character in multi-line matches. If you want to match 'any character' and the substring crosses a line boundary (CR,LF character pair), the .* construct won't work in JavaScript. But [\S\s]* will work. If you don't follow that after studying the RegEx rules, post a comment, and I'll go into detail if anybody asks…
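The line-boundary behavior can be seen in a few lines. The sketch below is in Python, used here only as a stand-in; its default . also refuses to match a line feed, so the effect is the same as the JavaScript case described above. The snippet of HTML is made up.

```python
import re

# A value split across lines with CR,LF pairs, as a server might emit it.
text = "<td>\r\n4111111111111111\r\n</td>"

# . will not cross the line feed, so this finds nothing.
dot_match = re.search(r"<td>.*</td>", text)

# [\S\s] means "whitespace or non-whitespace", i.e. truly any character.
any_char_match = re.search(r"<td>[\S\s]*</td>", text)

print(dot_match)
print(any_char_match is not None)
```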
The Blocking Mask
The section above described the StartPatternRE portion of the rule. This identifies and isolates the input field that contains a specific name attribute and contains a value attribute. But we have not yet told the Action what to block. We use the BlockingMask for that. Whatever matches the first () group in the BlockingMask will be replaced with the StrikeCharacter. The default StrikeCharacter is 'X'.
BlockingMask=value\s*=\s*["']{0,1}([^"']*)["']{0,1}[\/\s>]
With the blocking mask as specified above, the response block will be modified to become value = “XXXXXXXXXX”
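To make the two-step idea concrete, here is a rough emulation in Python. It is only a sketch of how StartPatternRE and BlockingMask appear to cooperate based on the description in this post, not Tealeaf's actual implementation. The function name block_values is invented, the leading (?-s) is dropped because Python does not accept it in that form, and replacing one strike character per original character is an assumption.

```python
import re

START_PATTERN = re.compile(
    r"""<[^>]*?\sname\s*=\s*(["']{0,1})"""
    r"""(CreditCard|CardNumber|NewPassword|SecurityAnswer1)\1"""
    r"""\s+[^>]*?value\s*=\s*(["']{0,1}).*?\3[\/\s>]"""
)
BLOCKING_MASK = re.compile(
    r"""value\s*=\s*["']{0,1}([^"']*)["']{0,1}[\/\s>]"""
)


def block_values(html, strike="X"):
    """Strike out group 1 of BLOCKING_MASK inside each START_PATTERN match."""

    def strike_value(start_match):
        segment = start_match.group(0)
        mask = BLOCKING_MASK.search(segment)
        if mask is None:
            return segment  # nothing to strike in this segment
        begin, end = mask.span(1)
        return segment[:begin] + strike * (end - begin) + segment[end:]

    return START_PATTERN.sub(strike_value, html)


page = '<input type="text" name="CreditCard" value="4111111111111111" />'
print(block_values(page))
```

Text outside the StartPatternRE match, such as the value of a field that is not in the list, is left untouched.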
The second action for blocking a value attribute
Earlier we discussed adding two actions to our rules, both A_BlockCCInResponse1 and A_BlockCCInResponse2. Why a second action? Because we don’t know if the developer (or framework) will put the name attribute first in the input tag, or the value attribute. So we need a variation on our StartPatternRE. The second action is identical to our first action, with the only difference being in the StartPatternRE.
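StartPatternRE=(?-s)<[^>]*?\svalue\s*=\s*(["']{0,1}).*?\1\s+[^>]*?name\s*=\s*(["']{0,1})(CreditCard|CardNumber|NewPassword|SecurityAnswer1)\2[\/\s>]
Note that with this ordering the quote in front of the field name is the second pair of (), so the closing quote is matched with \2. In our second action, everything else, including the BlockingMask, stays the same.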
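It is worth testing the value-first pattern on its own, because the group numbers shift when the attributes swap places. In this ordering the quote before the value is group 1, the quote before the field name is group 2, and the field-name alternation is group 3, so the closing quote after the field name has to be \2. The Python sketch below (an illustration only, with the (?-s) prefix dropped and a made-up sample tag) compares a \3 version with a \2 version.

```python
import re

NAMES = r"(CreditCard|CardNumber|NewPassword|SecurityAnswer1)"
PREFIX = r"""<[^>]*?\svalue\s*=\s*(["']{0,1}).*?\1\s+[^>]*?name\s*=\s*(["']{0,1})"""

# \3 refers back to the field name itself, so it would need the name twice.
with_backref_3 = re.compile(PREFIX + NAMES + r"\3[\/\s>]")

# \2 refers back to the quote that opened the name attribute.
with_backref_2 = re.compile(PREFIX + NAMES + r"\2[\/\s>]")

sample = '<input type="text" value="4111111111111111" name="CreditCard">'

print(with_backref_3.search(sample))
print(with_backref_2.search(sample).group(3))
```

On this sample the \3 form finds nothing, while the \2 form matches and reports the field name.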
Blocking Data in the Response block that is not a value attribute
The section above discussed how to block a value attribute in an input tag. Occasionally you might find PCI data echoed in the body of the response. Examples I've seen are sites that echo back both a bank routing number and an account number, sites that display stored credit card numbers for editing, and similar cases. When you have a site that does this, you need to find an example, cut the HTML code around the PCI data, paste it into a text editor, then modify the real PCI data to be something 'fake'. Then use the online testing tools to develop a RegEx that will block the PCI portion of the data.
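As a sketch of the preamble, sensitive data, postamble approach, suppose the page shows an account number inside a span. Everything specific here is hypothetical: the acct-number class, the sample markup, and the function name strike_account_numbers are invented for the example, and the code is Python rather than a privacy.cfg action. Your own pattern has to come from real (but faked-out) page snippets.

```python
import re

# Group 1 is the preamble, group 2 the sensitive data, group 3 the postamble.
ACCOUNT_PATTERN = re.compile(
    r'(<span\s+class="acct-number">)([^<]*)(</span>)'
)


def strike_account_numbers(html, strike="X"):
    """Replace the text between the preamble and postamble with strike characters."""
    return ACCOUNT_PATTERN.sub(
        lambda m: m.group(1) + strike * len(m.group(2)) + m.group(3),
        html,
    )


page = '<p>Account: <span class="acct-number">000123456789</span></p>'
print(strike_account_numbers(page))
```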
Blocking Data in the CUI/SDK library
The CUI/SDK library is JavaScript that runs in the user's browser, records DOM events on the page (it only sees your web application, nothing else), and sends information about each DOM event back to the Tealeaf system. Most customers want to record the keystrokes the user enters, so they can tell if the page design causes problems in data entry, and correct it if so. So, we need to make sure that PCI data is NOT recorded and sent back to Tealeaf. Yes, we could write blocking rules to block the data when it is received, but it is much easier to properly configure the CUI/SDK library to exclude certain fields. In the CUI/SDK, we can use wildcards to match the field names.
As of Version 8.7, the file you need to look at is the tealeaf.js file. The location may change in the future, so you should search for the string tlFieldBlock. You will find a nested structure by this name, and one of the structure's members will be "name" : followed by a pipe-delimited | string. The string will have the names of the input fields to block. The list of names should match the list you have in the request and response blocking rules. An example is
tlFieldBlock: [
{"name": "CreditCard|CardNumber|NewPassword|SecurityAnswer1", "caseinsensitive": true, "exclude": false, "mask": function () { return TeaLeaf.Client.PreserveMask.apply(this, arguments); } }
],
One of the very nice things about the CUI/SDK blocking is that the list of alternative field names is a true regular expression. password will match password, oldpassword, newpassword, passwordchanged, etc. You can use the ^ at the start of an alternative, or $ at the end of an alternative, to anchor the alternative string to the start or end of the field name (go look at the RegEx help online at the testing tools if you don't know what start and end anchors are all about).
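The effect of the anchors is easy to try out. The sketch below mimics the matching in Python with a case-insensitive search, which is an assumption based on the "caseinsensitive": true setting and the substring behavior described above; it is not the CUI/SDK code itself, and the helper name is_blocked is invented.

```python
import re


def is_blocked(field_name, name_pattern, case_insensitive=True):
    """True if name_pattern is found anywhere in field_name."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.search(name_pattern, field_name, flags) is not None


# Unanchored: matches anywhere in the field name.
print(is_blocked("oldpassword", "password"))       # True
print(is_blocked("PasswordChanged", "password"))   # True

# ^ anchors an alternative to the start of the field name.
print(is_blocked("oldpassword", "^password"))      # False

# $ anchors an alternative to the end of the field name.
print(is_blocked("passwordchanged", "password$"))  # False

# Anchors apply per alternative in a pipe-delimited list.
print(is_blocked("cvv2", "^cvv$|creditcard"))      # False
```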
The best way I’ve ever seen this managed, one customer had a fellow whose job included making fixes to the web site. A really sharp guy, he was not a member of development group, but of operations, and his job was to make fixes to the web site when something was broke, and send those fixes back to the development team for incorporation into subsequent releases. This person was given the responsibility to add the tealeaf CUI/SDK libraries to the web pages. As is usually the case, it was done within a page template (and in fact, was incorporated as part of a tag/library management solution). But since he could make changes to the web pages, he made sure all PCI input fields had clear names which included password or creditcard or cvv or similar. Then it was very easy for him to make sure the CUI/SDK blocking rule names matched these input field names. I hope that your company makes a similar decision, and lets the CUI/SDK implementer change input field names when necessary. It will certainly reduce maintenance costs!
Blocking data in XML Web Services and “one page” designs
Blocking PCI data in an XML web service or in a JSON update to a page's DOM is no different than blocking data in a classic web application. We are still dealing with the Request and Response blocks that make up a hit. During development of the PCI privacy rules, find examples of Tealeaf sessions that have hits from the web service or one-page JSON DOM update, inspect the request and response blocks with the Replay tool (RTV or Replay server), then write and test RegExs to block the data. If the XML service is answering requests on just a very small number of URLs, you should consider making the Rule look at just those specific URLs, to keep the rule efficient. No use inspecting other pages for patterns that will never exist, nyeh?
Encrypting data instead of blocking it
Instead of blocking PCI data, the data can be encrypted so that only authorized users are able to see the encrypted data in clear-text. Securing the access to the clear-text is accomplished by using specific Active Directory security groups. Details how to accomplish this are provided in the IBM Tealeaf CX Configuration Manual, in the sub-section Encrypting Data Filter under the section Privacy Session Agent. Please refer to this document for details. A summary of the steps is as follows:
- Create an AD security group and populate it with authorized users
- Use TMS to configure the Search Server Authentication
- Add the AD security group to the Search Server Authentication dialog
- Create a privacy key and assign it to the AD Security Group. Copy the key to the clipboard
- Edit the privacy.cfg file
- Add the privacy key to the bottom of the file in the [Keys] section with a new identifier, e.g. [Key03]
- in the privacy blocking action for the fields you want to encrypt, change the Action from Block to Encrypt
- Add the line Key=Key03 to the Action block
With this, an authorized user will see the clear-text value of a field during replay. Field values are stored in the Canister using the encrypted value, and indexed using the encrypted value, so searching for a clear-text value will not work.
Maintaining the blocking rules
One of the unfortunate facts of life in a Tealeaf administrator's job is that the web site will change over time. New PCI fields will appear in the application. The Tealeaf privacy rules will need to be maintained and updated to account for these changes. The hardest part of maintenance is getting notified that PCI-impacting changes are part of a new release. Here's a partial list of some methods I've seen used to keep up to speed on this:
- Attend new feature design and release meetings. Try to make sure somebody from the tealeaf admin team attends meetings where new features are being discussed, and they keep an eye (or ear) out for changes that discuss new credit card features or new password features. If a new PCI impacting feature is discussed, make sure the tealeaf team sees QA/Staging versions of the application, and implements new PCI rules BEFORE the application is released to production.
- Set up a Tealeaf event that looks for 15+ consecutive digits. This event will trigger if a credit card number comes through in clear-text. Of course, depending on the web application, it may also trigger on any other string of 15+ digits, but Tealeaf admins for the site should become familiar with the 'normal' volume, and keeping an eye on this event may catch new places where CCNums appear.
- Regularly (weekly or bi-weekly) do a full-text search of the Canister data for the strings password, passwd, passwrd, and even pass?*. Review the matches to see if new password fields have appeared.
- Publish a regular (monthly) list of PCI-impacting fields you are blocking, and e-mail these to development teams or their management. Reminding these teams on a regular basis exactly what is PCI-impacting can help keep developers, new and experienced both, cognizant of the fact that they need to communicate PCI-impacting changes to the tealeaf team.
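The "15+ consecutive digits" check from the list above comes down to a one-line pattern. The sketch below shows it in Python purely to illustrate the pattern; in Tealeaf the pattern would live in an event definition, and the function name find_long_digit_runs and the sample strings are made up.

```python
import re

# 15 or more ASCII digits in a row.
DIGIT_RUN = re.compile(r"[0-9]{15,}")


def find_long_digit_runs(text):
    """Return every run of 15 or more consecutive digits found in text."""
    return DIGIT_RUN.findall(text)


print(find_long_digit_runs("card=4111111111111111&zip=78701"))
print(find_long_digit_runs("order=123456789012"))
```

Card numbers typed with spaces or dashes between the digit groups would not be caught by this pattern, which is one more reason to treat the event as a safety net rather than as the blocking mechanism.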
Conclusion
Blocking or encrypting PCI data to prevent it from being available to unauthorized tealeaf users is one of the most important tasks of the tealeaf admin staff. I hope you find this blog post useful for blocking the confidential information in your tealeaf capture. Please post feedback, errata or further questions in the comments. Happy TeaLeaf-ing!
This is a great reference on how to write effective rules to block sensitive information in Tealeaf! One other testing tool that I use is the Tealeaf Archive Reader installed locally on my PC. See this article on LinkedIn: https://www.linkedin.com/groupItem?view=&gid=64587&type=member&item=222105609 . It is an easy way to test blocking rules against an existing session.
Joe Tooman
Sr. Web Analyst
Dell, Inc.
Great article Bill! Tons of useful info here for Tealeaf Admins facing the challenge of blocking PII or PCI.