
Regular expressions capture textual patterns for extraction and masking. Masking protects privacy, supporting compliance with data privacy laws and safeguarding PII (Personally Identifiable Information). Data extraction enables reports, analytics, and insights into human and machine activity, trends, and future behavior.
Introducing you to the instructor, Andrew Landen, and the course, Efficient Regular Expressions with applications in Splunk.
An introduction to how Splunk (big data, schema-on-the-fly) uses regex.
An introduction to the basics of regex using regex101.
Learn to use unique anchor characters with basic regex to quickly extract data. Regex anchors of "^" at the start of the line and "$" at the end of the line are a different topic for a later lesson, and unrelated to this use of the word "anchor".
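As a minimal sketch of the idea, the Python snippet below uses a literal string as the "anchor" in front of the value to extract. The log line and its field names are hypothetical, and Python's re module stands in for Splunk only for illustration.

```python
import re

# Hypothetical log line; the field names are illustrative only.
line = "2024-01-15 10:32:07 action=login user=alice src=10.0.0.5 status=ok"

# The literal "user=" is the unique anchor: it appears once in the line,
# so the capture that follows it is unambiguous.
anchored = re.compile(r"user=(\w+)")

match = anchored.search(line)
user = match.group(1) if match else None
print(user)
```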
Learn how to use regex in sed to replace patterns for masking and with lookaheads to continue capturing only if certain other patterns are also seen.
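A rough illustration of masking with a lookahead, written in Python rather than sed: re.sub plays the role of a sed-style substitution, and the lookahead lets a digit group match only when another digit group follows it. The event text and card number are made up for the example.

```python
import re

# Hypothetical event; the card number is invented for illustration.
event = "card=4111-1111-1111-1234 amount=25"

# Match a four-digit group and its dash only if four more digits follow.
# The lookahead (?=\d{4}) checks for those digits without consuming them,
# so the final group is never matched and stays readable.
masked = re.sub(r"\d{4}-(?=\d{4})", "XXXX-", event)
print(masked)
```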
Learn how to use optional and non-capture groups when you need to specify a more complex pattern that may or may not exist and does not need to be captured. Capturing always takes more resources.
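The sketch below shows one way an optional non-capture group can look in practice, using Python for illustration. The "user=" field and the optional domain prefix are assumptions chosen for the example, not data from the course.

```python
import re

# (?:\w+\\)? is an optional non-capture group: it matches a domain prefix
# such as CORP\ when present, but does not store it.
# (\w+) is the only capture group, holding the user name itself.
pattern = re.compile(r"user=(?:\w+\\)?(\w+)")

with_domain = pattern.search(r"user=CORP\alice").group(1)
without_domain = pattern.search("user=bob").group(1)

print(with_domain, without_domain)
print(pattern.groups)  # number of capture groups in the pattern
```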
Learn to use back references and lazy quantifiers.
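To make the two terms concrete, here is a small Python sketch with invented sample strings. The back reference \1 requires the closing quote to be the same character as the opening quote, and the lazy quantifier .*? stops at the first place the rest of the pattern can match.

```python
import re

# Hypothetical text containing both double- and single-quoted values.
text = """msg="it's fine" note='ok'"""

# (["']) captures the opening quote; \1 demands the same closing quote.
# .*? is lazy, so it expands only as far as needed to reach that quote.
quoted = re.compile(r"""(["'])(.*?)\1""")
values = [m.group(2) for m in quoted.finditer(text)]
print(values)

# Greedy versus lazy on the same input.
pair = 'a="1" b="2"'
greedy = re.search(r'".*"', pair).group()
lazy = re.search(r'".*?"', pair).group()
print(greedy)
print(lazy)
```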
More work with back references, iterations and look arounds.
Introductions to and applications of character classes, their shortcuts, and their negatives.
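A brief Python sketch of character class shortcuts and their negatives follows. The sample strings are invented for illustration.

```python
import re

sample = "a1b22c333"

# \d is the shortcut for a digit; \D is its negative (any non-digit).
digits = re.findall(r"\d+", sample)
non_digits = re.findall(r"\D+", sample)

# [^,] is a negated character class: any character except a comma.
fields = re.findall(r"[^,]+", "x,y z,w")

# \w matches word characters; \S matches any non-whitespace character.
pairs = re.findall(r"(\w+)=(\S+)", "id=42 name=Ann-Marie")

print(digits, non_digits, fields, pairs)
```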
Using branch reset groups to reuse the same capture group number across different alternatives.
Explosive quantifiers can easily cause catastrophic backtracking, where matching appears never to finish. Here we will learn how to use possessive quantifiers and atomic groups to prevent backtracking into the group and keep the number of steps down. We also cover how to spot issues in alternation and other groupings related to explosive quantifiers.
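The effect of an atomic group can be sketched in Python as follows. Python's re module only gained atomic groups and possessive quantifiers in version 3.11, so this example emulates one with a lookahead plus a back reference, which is a swapped-in technique rather than the syntax taught in the lesson. The input string is arbitrary.

```python
import re

# Backtracking version: \d+ first takes all digits, then gives one back
# so that the trailing \d can match.
backtracking = re.compile(r"\d+\d")

# Emulated atomic group: the lookahead captures the whole digit run once,
# and \1 consumes exactly that text. The engine cannot re-enter the
# lookahead to shrink the run, so the trailing \d has nothing to match.
atomic_like = re.compile(r"(?=(\d+))\1\d")

backtracking_found = backtracking.search("123") is not None
atomic_found = atomic_like.search("123") is not None
print(backtracking_found, atomic_found)
```

Refusing to give characters back is what keeps the step count down: a failed match fails quickly instead of retrying every possible way of splitting the same text.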
Application of the explosive quantifier lesson
Learn how to identify a position based on a pattern immediately before and/or after without moving position of the engine so that parts of those patterns can also be matched as needed.
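As an illustration in Python, the sketch below uses a lookbehind and a lookahead to pin down a position by what comes immediately before and after it, without including that context in the match. The sentence and currency labels are invented.

```python
import re

text = "paid $19.99 USD and $5 CAD"

# (?<=\$) requires a dollar sign just before the match;
# (?= USD) requires " USD" just after it.
# Neither assertion consumes characters, so only the number is returned.
usd_amounts = re.findall(r"(?<=\$)\d+(?:\.\d+)?(?= USD)", text)
print(usd_amounts)
```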
The rex command extracts fields from _raw or from other fields; setting max_match=0 enables multi-value matches on the SPL line.
Adding mode=sed to rex enables masking with sed expressions.
The regex command enables pattern-based filtering of events by matching against either _raw or fields.
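As a loose analogy only, and not how Splunk implements these commands, the Python sketch below mimics two of the behaviours described above: collecting every match per event (multi-value extraction) and keeping only the events that match a pattern (filtering). The events and field names are hypothetical.

```python
import re

# Hypothetical raw events.
events = [
    "src=10.0.0.5 dst=10.0.0.9 action=allow",
    "src=10.0.0.7 action=deny",
    "heartbeat ok",
]

# Simple IPv4-shaped pattern; it does not validate octet ranges.
ip = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Multi-value extraction: keep every match per event, not just the first.
all_ips = [ip.findall(event) for event in events]

# Filtering: keep only the events whose text matches the pattern.
denied = [event for event in events if re.search(r"action=deny", event)]

print(all_ips)
print(denied)
```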
While the auto-extract creates bad regex, it lets you see your regex applied directly to the raw data, and it offers a quick way to add the extraction with the correct permissions to the correct sourcetype so that it is loaded into memory for fast application.
Character substitution may be more interesting than useful and it does follow the related sed discussion to a limited degree, but it is always good to have another tool in the belt.
Here we set up automatic extraction of multiple fields with a single regex in Splunk.
Matching multiple optional fields in a single regex in Splunk
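One possible shape for such a pattern is sketched below in Python. The field names (user, src, port) are hypothetical. Python writes named groups as (?P<name>...), whereas Splunk's PCRE-style syntax uses (?<name>...).

```python
import re

# Each optional field sits in its own optional non-capture group,
# with a named capture group inside it for the value.
pattern = re.compile(
    r"user=(?P<user>\w+)"
    r"(?:\s+src=(?P<src>\S+))?"
    r"(?:\s+port=(?P<port>\d+))?"
)

full = pattern.search("user=alice src=10.0.0.5 port=22").groupdict()
partial = pattern.search("user=bob port=8080").groupdict()

print(full)
print(partial)  # src is None because that optional group did not match
```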
Using (?R), (?1), and \1 for pattern recursion and capture matching of palindromes and nested parentheses.
Extracting fields from si_conf for dependence relationships mapping
In this course, you will learn to apply regular expressions to search, filter, extract and mask data efficiently and effectively in Splunk following a workshop format on real data.
Regular expressions, when well crafted, enable very efficient and effective parsing of text for patterns. The most important regex skills are pattern recognition, mastery of regex techniques, and simplicity to minimize "steps". A simpler regex with solid leading anchors tends to be more efficient. Deeper regex understanding gives access to more effective techniques for keeping it simple. Pattern recognition is what connects regex code to the problem being solved.
We will rely on the regex101 website to assist in crafting, verifying, and explaining regex. Splunk will be used to showcase practical applications with big data. A test after the main course section covers the more basic levels of understanding.
Because it is so easy to craft terrible regular expressions, the goal of this course is to shine a light on the efficiency of different regex techniques so that you can track the progression and efficiency of your skills. Textual pattern matching can be a very interesting and complicated subject, but a foundation in efficiency and quality control can greatly improve the speed and effectiveness of your regex.