Can I use wildcards when filtering in Progenesis QI for proteomics?

Can I use wildcards when filtering my runs and peptides?

Yes. When looking for all items that contain a particular pattern of text, wildcards are supported, in the form of regular expressions. These are a very powerful and flexible form of wildcards and, in certain situations, they can make a huge difference to your productivity.

More detailed information about regular expressions can be found on regular-expressions.info, including a good introduction in the quick-start page. In this FAQ, however, we'll attempt to give a simple example of how you can use regular expressions.

Using regular expressions in Progenesis

By default, searches and filters in Progenesis don't use regular expressions. To use one, start your search text with the prefix regex: followed by a space (be sure to include the space after the colon). Anything you enter after that prefix is treated as the regular expression; we'll refer to that text as the search term.
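To make the prefix rule concrete, here is a minimal Python sketch of how a filter box might treat its input. The names matches_filter and REGEX_PREFIX are hypothetical, chosen for illustration; this is not Progenesis's actual code. It also assumes that plain (non-regex) searches are case-sensitive "contains" matches, which may not be exactly how the product behaves.

```python
import re

# Hypothetical model of a filter box; NOT Progenesis's real implementation.
REGEX_PREFIX = "regex: "  # prefix described in this FAQ, including the trailing space


def matches_filter(filter_text: str, item_name: str) -> bool:
    """Return True if item_name passes the filter (assumed 'contains' semantics)."""
    if filter_text.startswith(REGEX_PREFIX):
        search_term = filter_text[len(REGEX_PREFIX):]
        # re.search looks for the pattern anywhere in the name, not only at the start
        return re.search(search_term, item_name) is not None
    # Without the prefix, the text is treated as a plain substring (assumed case-sensitive)
    return filter_text in item_name


print(matches_filter("Control[0-9]", "Control[0-9]_run"))     # plain text: True
print(matches_filter("Control[0-9]", "Control3_run"))         # plain text: False
print(matches_filter("regex: Control[0-9]", "Control3_run"))  # regular expression: True
```

Without the prefix, the square brackets are just characters to look for; with it, they take on their special meaning.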

How do I compose the search term?

A regular-expression search term typically defines a sequence of elements that must be found within the text being searched (for example, the names of runs). Suppose you want to find runs whose names contain the word Control, followed by a whole number and then an underscore.

As with every regular-expression search, we begin by entering the prefix:

regex:

We then enter the first bit of text that we want to find explicitly:

regex: Control

Next, we need to specify that a number follows it. This may sound strange, but we need to think about how a whole number appears in text: as one or more digits, each in the range 0 to 9.

To express this in our search term, we make use of characters that have special meanings in regular expressions. First, a pair of square brackets allows us to specify that we want to find a single character in a given set or range:

regex: Control[0-9]

In this case, we're saying that the next character after the word Control must be in the range 0 to 9. The hyphen indicates a range, just as it would in an email or document. Alternatively, you could enter [0123456789]; without a hyphen, the brackets match any single character from the listed set.
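If you'd like to check that the range and the explicit set behave the same way, the sketch below uses Python's re module as a stand-in for the regular-expression engine in Progenesis (whose exact engine isn't documented here, though this basic bracket syntax is widely shared). The run names are invented for illustration.

```python
import re

names = ["Control0_A", "Control5_B", "Control9_C", "ControlX_D", "Treated1_E"]

range_class = re.compile(r"Control[0-9]")       # hyphen: a range of characters
set_class = re.compile(r"Control[0123456789]")  # no hyphen: an explicit set

# search() succeeds if the pattern occurs anywhere in the name
range_hits = [n for n in names if range_class.search(n)]
set_hits = [n for n in names if set_class.search(n)]

print(range_hits)              # ['Control0_A', 'Control5_B', 'Control9_C']
print(range_hits == set_hits)  # True
```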

At this point, our search expression will find runs whose names contain any of Control0, Control1, Control2 and so on, up to Control9. Names containing Control10 or above will also be found, because we're only constraining the first character after Control.
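The following Python sketch (again using the re module as a stand-in, with made-up run names) prints the part of each name that the pattern actually matched. It shows why Control10 still qualifies: the pattern only needs the 1 and ignores whatever follows.

```python
import re

pattern = re.compile(r"Control[0-9]")

for name in ["Control7_Run", "Control10_Run", "Control_Run"]:
    match = pattern.search(name)
    # match.group() is the portion of the name that satisfied the pattern
    print(name, "->", match.group() if match else None)

# Control7_Run -> Control7
# Control10_Run -> Control1
# Control_Run -> None
```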

It might therefore seem natural to finish the search expression by simply adding an underscore to the end, like so:

regex: Control[0-9]_

However, this will only find Control0_, Control1_, Control2_ and so on, up to Control9_; it will not find Control10_. We need to specify that a digit can appear one or more times before the underscore. We do this using another character that has a special meaning in regular expressions, the plus symbol (+):

regex: Control[0-9]+_

The plus indicates that the preceding element (here, the bracketed range [0-9], meaning any single digit) must appear one or more times.
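To see the difference the plus makes, this Python sketch runs both versions of the search term over a handful of invented run names. As before, Python's re module stands in for the engine Progenesis uses, and "contains" matching is assumed.

```python
import re

names = ["Control1_A", "Control10_B", "Control250_C", "Control_D", "Control3B_E"]

single_digit = re.compile(r"Control[0-9]_")  # exactly one digit before the underscore
one_or_more = re.compile(r"Control[0-9]+_")  # one or more digits before the underscore

single_hits = [n for n in names if single_digit.search(n)]
multi_hits = [n for n in names if one_or_more.search(n)]

print(single_hits)  # ['Control1_A']
print(multi_hits)   # ['Control1_A', 'Control10_B', 'Control250_C']
```

Note that Control_D is still excluded (the plus requires at least one digit), and Control3B_E is excluded because the digits must be followed directly by the underscore.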

So, what have we learned? Many characters, such as the alphabetic characters in Control, can appear in a regular expression just as they would in any normal search term. Other characters, like the square brackets and the plus symbol, have a special meaning in regular expressions.

Which other characters have special meanings?

Letters and digits always match themselves literally, but many punctuation characters have special meanings:

. a period represents any single character.
* an asterisk represents zero or more occurrences of the preceding character (unlike the plus symbol, which represents one or more).
? a question mark represents zero or one occurrence of the preceding character.
^ and $ caret and dollar symbols are known as anchors; they don't represent a character in the text being searched, but instead represent the start or end of the text. For example, o$ would match Hello (because the o is at the end), but it wouldn't match World, as the o is in the middle of the word.
^ a caret, when placed immediately inside square brackets, transforms their meaning to be any character not in the square brackets. For example, o[^n] would find gold, but it wouldn't find bond.
\ the backslash indicates that the following character should be treated literally. For example, including \[ in your search expression will look for text values that actually contain an opening square bracket.
( and ) round brackets allow you to group parts of your search term and create back-references, but these are advanced features that you probably won't need. To search for a literal round bracket, prefix it with a backslash: \( or \).
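The sketch below tries each of the special characters in the list above against a short piece of text, using Python's re module as a stand-in for the engine in Progenesis (the basic syntax shown is common to most engines, but details may differ). The sample texts are invented for illustration.

```python
import re

# (pattern, text) pairs; the comment gives the expected outcome
examples = [
    (r"C.t", "Cat"),             # . matches any single character: match
    (r"Run_0*1", "Run_1"),       # * allows zero occurrences of 0: match
    (r"Run_0+1", "Run_1"),       # + needs at least one 0: no match
    (r"colou?r", "color"),       # ? makes the u optional: match
    (r"o$", "Hello"),            # $ anchors to the end of the text: match
    (r"o$", "World"),            # the o is not at the end: no match
    (r"^Ctrl", "Ctrl_A"),        # ^ anchors to the start of the text: match
    (r"o[^n]", "gold"),          # o followed by anything except n: match
    (r"o[^n]", "bond"),          # the only o is followed by n: no match
    (r"\[QC\]", "Run [QC] 2"),   # \[ and \] match literal square brackets: match
    (r"\(2\)", "Sample (2)"),    # \( and \) match literal round brackets: match
]

for pattern, text in examples:
    # re.search succeeds if the pattern occurs anywhere in the text
    found = re.search(pattern, text) is not None
    print(f"{pattern:12} {text:12} {'match' if found else 'no match'}")
```

The r"..." prefix is Python syntax that passes backslashes through unchanged; in Progenesis you would type only the pattern itself after the prefix, for example regex: \[QC\].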

The above is far from an exhaustive list, but it should be enough to cope with most regular expression needs.

An offer of help

While regular expressions are extremely powerful, we realise they're not that easy to use. If you're struggling to build the expression you need, please get in touch and our engineers will be happy to help. If we're asked for the same thing a lot, we'll even add it to this web page as a reference.