Yes. When looking for all items that contain a particular pattern of text, wildcards are supported, in the form of regular expressions. These are a very powerful and flexible form of wildcard and, in certain situations, they can make a huge difference to your productivity.
More detailed information about regular expressions can be found on regular‑expressions.info, including a good introduction in the quick-start page. In this FAQ, however, we'll attempt to give a simple example of how you can use regular expressions.
Typically, searches and filters in TransOmics™ Informatics don't use regular expressions. To use regular expressions, start your search term with the prefix regex: (be sure to include the space after the colon). Anything you enter after that prefix is part of the regular expression; we'll refer to that text as the search term.
Regular expression search terms typically define a sequence of elements that must be found within the text we're searching, e.g. in the names of runs. For example, you may want to find runs that contain the word Control, followed by a whole number and then an underscore.
As with all regular expression searches, we begin by entering the prefix:
regex:
We then enter the first bit of text that we want to find explicitly:
regex: Control
Next, we need to specify that a number follows it. This may sound strange, but we need to consider how a whole number is composed in text. In terms of the text, it is represented by one or more digits, each of which is in the range 0 to 9.
To express this in our search term, we make use of characters that have special meanings in regular expressions. First, a pair of square brackets allows us to specify that we want to find a single character in a given set or range:
regex: Control[0-9]
In this case, we're saying that the next character after the word Control must be in the range 0 to 9. The hyphen indicates that we're looking at a range, just as you'd normally type in an email or document. An alternative representation would be to enter [0123456789]; with no hyphen, we are matching any single character in the listed set.
At this point, our search expression will find runs whose name contains any of Control0, Control1, Control2 and so on, up to Control9. Even if we have Control10 or above, that will still be found, as we're only being explicit about the first character after Control.
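The behaviour described above can be tried outside the application. The following is a minimal sketch using Python's re module as a stand-in; the regular expression engine used by TransOmics™ Informatics is not stated here, so this assumes it treats these simple patterns the same way. The run names are hypothetical, and the regex: prefix is omitted because it belongs to the search box, not to the expression itself.

```python
import re

# Hypothetical run names, invented purely for illustration.
run_names = ["Control1_A", "Control10_B", "ControlA_1", "Sample3_A"]

# The range form and the explicit set form describe the same ten digits.
range_form = re.compile(r"Control[0-9]")
set_form = re.compile(r"Control[0123456789]")

# search() looks for the pattern anywhere in the name, like a "contains" search.
range_hits = [name for name in run_names if range_form.search(name)]
set_hits = [name for name in run_names if set_form.search(name)]

print(range_hits)              # ['Control1_A', 'Control10_B']
print(set_hits == range_hits)  # True
```

Control10_B is found because only the first character after Control is constrained; ControlA_1 is not, because that character is a letter.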
It might, therefore, seem natural to say that the final part of our search expression is just to put an underscore on the end, like so:
regex: Control[0-9]_
However, this will only find Control0_, Control1_, Control2_ and so on, up to Control9_; it will not find Control10_. We need to specify that a digit can appear one or more times before the underscore. We do this by using another character that has a special meaning in regular expressions, the plus sign (+):
regex: Control[0-9]+_
The plus indicates that the preceding element, here any character in the range 0 to 9, must appear one or more times.
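To see the difference the plus makes, here is a small sketch that compares the two expressions side by side. It again uses Python's re module as an assumed stand-in for the application's own engine, with hypothetical run names.

```python
import re

# Hypothetical run names, invented purely for illustration.
run_names = ["Control1_A", "Control10_B", "Control_C", "Control7"]

# Exactly one digit before the underscore.
single_digit = re.compile(r"Control[0-9]_")
# One or more digits before the underscore.
one_or_more = re.compile(r"Control[0-9]+_")

single_hits = [name for name in run_names if single_digit.search(name)]
plus_hits = [name for name in run_names if one_or_more.search(name)]

print(single_hits)  # ['Control1_A']
print(plus_hits)    # ['Control1_A', 'Control10_B']
```

Control_C is rejected by both expressions because at least one digit is required, and Control7 is rejected because it has no underscore after the number.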
So, what have we learned? Many characters, such as the alphabetic characters in Control, can appear in a regular expression just as they would in any normal search term. Other characters, like the square brackets and the plus symbol, have a special meaning in regular expressions.
While all numbers and letters can be used explicitly, there are many punctuation characters that have a special meaning:
The above is far from an exhaustive list, but it should be enough to cope with most regular expression needs.
So far, the most common example usage we've encountered — other than finding runs by naming convention — is to find peptides in our compound identification results. For example, METLIN gives amino acid names in their three-letter form, as seen here in the Review Compounds screen:
Therefore, the following regular expression could be used to find those peptides:
regex: [A-Z][a-z]{2}( [A-Z][a-z]{2})+
Note: the curly brackets specify how many times the preceding part of the regular expression must repeat. In this case, [a-z]{2} means that there should be exactly 2 lowercase letters. Also note that the space immediately inside the round brackets is critical; without it you'd be finding, as an example, ProIso instead of Pro Iso.
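The effect of that space can be checked with a short sketch. As before, Python's re module stands in for the application's engine, which is an assumption, and the compound descriptions are hypothetical examples, not real METLIN output.

```python
import re

# Hypothetical compound descriptions, invented purely for illustration.
descriptions = ["Pro Ile", "Gly Ala Val", "ProIle", "Caffeine"]

# Three-letter residues separated by single spaces, two or more in a row.
with_space = re.compile(r"[A-Z][a-z]{2}( [A-Z][a-z]{2})+")
# The same expression with the space inside the round brackets removed.
without_space = re.compile(r"[A-Z][a-z]{2}([A-Z][a-z]{2})+")

spaced_hits = [d for d in descriptions if with_space.search(d)]
unspaced_hits = [d for d in descriptions if without_space.search(d)]

print(spaced_hits)    # ['Pro Ile', 'Gly Ala Val']
print(unspaced_hits)  # ['ProIle']
```

Because the search looks for the pattern anywhere in the text, a longer description that merely contains two such space-separated three-letter groups would also be found.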
As we encounter more common uses for regular expressions, we'll list them here.