Waters | Nonlinear Dynamics

Progenesis LC-MS

A unique approach for label-free LC-MS data analysis
Quantify and identify the significant proteins in your experiment…

Step 4. Identifying proteins

The Protein View screen

When we searched for peptide identifications for our features, it's likely that some of our features were matched to more than one peptide. Other features may have been assigned to the same peptide from different proteins. These ambiguous peptide assignments are referred to as ‘conflicts’.

To confidently identify our proteins, we need to resolve the ambiguity in our peptides' origins. The Protein View gives us the information we need to explore and resolve the conflicts.

Before we start, we'll focus the screen on the proteins with the most conflicts:

  1. Click the Protein Resolution tab in the lower half of the screen.
  2. Sort the Proteins table (top-left) on its Conflicts column by clicking the column header twice.

This will put the proteins with the most conflicting peptides at the top (see the image above).
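To make the idea of a conflict count concrete, here is a minimal Python sketch of how conflicts could be tallied per protein and then sorted, most conflicts first. It is not how Progenesis LC‑MS computes the Conflicts column internally; the data layout, the function names and the example rows are all hypothetical, and we assume a feature counts as conflicting whenever it is assigned to more than one protein.

```python
from collections import defaultdict

# Hypothetical identifications: (feature id, peptide sequence, protein accession).
# These rows are illustrative only and are not taken from the tutorial data set.
identifications = [
    (1, "PEPTIDEA", "protA"),
    (1, "PEPTIDEA", "protB"),  # same peptide, second protein -> conflict
    (2, "PEPTIDEB", "protA"),
    (3, "PEPTIDEC", "protB"),
    (3, "PEPTIDED", "protC"),  # same feature, different peptide -> conflict
    (4, "PEPTIDEE", "protC"),
]


def count_conflicts(rows):
    """Return {protein: number of its features also assigned to another protein}."""
    proteins_by_feature = defaultdict(set)
    for feature, _peptide, protein in rows:
        proteins_by_feature[feature].add(protein)

    # Start every protein at zero so conflict-free proteins still appear.
    conflicts = {protein: 0 for _feature, _peptide, protein in rows}
    for proteins in proteins_by_feature.values():
        if len(proteins) > 1:
            for protein in proteins:
                conflicts[protein] += 1
    return conflicts


def sort_by_conflicts(conflicts):
    """Sort proteins by descending conflict count, then by accession."""
    return sorted(conflicts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for protein, n_conflicts in sort_by_conflicts(count_conflicts(identifications)):
        print(protein, n_conflicts)
```

Sorting in descending order mirrors clicking the Conflicts column header twice: the proteins needing the most attention come first.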

Grouping similar proteins

Often, the results we get from peptide searches will contain proteins whose peptide hits are a subset of those from another protein. Generally, in this kind of situation, we should regard those proteins' hits as all coming from the protein with the greater number of peptide matches. Progenesis LC‑MS gives us the power to do this automatically.
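The subset rule described above can be sketched in a few lines of Python. This is only an illustration of the principle, under the assumption that each protein is represented by the set of its matched peptides; the function name, the accessions and the peptide labels are hypothetical, and the software's own grouping logic may differ in its details.

```python
def group_similar_proteins(peptides_by_protein):
    """Group each protein whose peptide set is a subset of a lead protein's set.

    Returns {lead protein: [proteins grouped under it]}.
    """
    # Visit proteins with the most peptides first, so that potential leads
    # are established before the proteins that may be grouped under them.
    ordered = sorted(
        peptides_by_protein,
        key=lambda protein: (-len(peptides_by_protein[protein]), protein),
    )
    groups = {}
    for protein in ordered:
        peptides = peptides_by_protein[protein]
        for lead in groups:
            if peptides <= peptides_by_protein[lead]:  # subset test
                groups[lead].append(protein)
                break
        else:
            # Not a subset of any existing lead: it becomes a lead itself.
            groups[protein] = []
    return groups


# Hypothetical example data, not taken from the tutorial data set.
peptides_by_protein = {
    "protA": {"pep1", "pep2", "pep3"},
    "protB": {"pep1", "pep2"},  # every hit is also a hit for protA
    "protC": {"pep3", "pep4"},  # shares pep3 with protA but is not a subset
}
groups = group_similar_proteins(peptides_by_protein)

if __name__ == "__main__":
    print(groups)
```

Here protB is grouped under protA, while protC stays separate because it has a peptide (pep4) that protA does not explain. Its shared peptide would remain a conflict to be resolved manually.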

To see the protein grouping in action, let's first look at a pair of proteins that will be grouped. Start by clicking on the first protein in the Proteins list; it has the accession number gi|5668937. The list at the top-right of the screen updates to show us the 12 peptides from this protein that our peptide search found. The bold text below the list also shows us the description of the protein; in this case, it is flagellin [Clostridium difficile].

The protein list and the list of peptides for the selected protein

When we highlight one of these peptides by clicking on it, the lower half of the screen updates to show the other proteins that include it, i.e. the conflicts. If there are no conflicts, the lists in the bottom half of the screen remain empty.

Now, one at a time, click on each of the peptides in the top-right list. You will see that all conflicts are due to the peptide appearing in one other protein: flagellin subunit [Clostridium difficile 630] (accession number gi|126697810). Even the names of these proteins hint at their relationship.

To group proteins such as these, we need to:

  1. Click the Protein options button at the bottom of the screen.
  2. In the Edit protein building options window, select the Group Similar Proteins option.
  3. Click the OK button to apply the grouping.

Now, when we look at the top-left list of proteins, we can see that gi|5668937 is no longer at the top:

The grouped proteins, now showing no conflicts

This is because all of its conflicts have been resolved by the grouping with gi|126697810. Further down the list, we can see that gi|5668937 now has a (+1) after its accession number to indicate that another protein identification has been grouped under it.

Manually resolving a conflict

We'll now look at a simple example of manually resolving a conflict. In the Proteins list at the top-left, find and select the protein with accession number gi|126699140. This protein has a single conflicting feature. Select the feature that has the conflict, being careful not to untick it at this stage:

The conflicting peptide in protein gi|126699140

The list at the lower left of the screen now shows all identified proteins that contain the conflicting feature. In this case, there are only 2 such proteins: the one we selected (gi|126699140); and gi|126698718. By looking at the two proteins' lists of peptides, we can see that this conflict is, in fact, the only conflict that each protein has.

We now have to decide which peptide identification is more likely to be the true identification for the feature. As well as basing our decision on some of the numbers presented to us, we often also need to use our knowledge of the proteins themselves and their functions.

In this case, however, it's a relatively simple decision:

  1. there's greater evidence for the presence of gi|126698718, as it has matched 5 features, compared to the 2 features matched by gi|126699140
  2. the score returned from Mascot for the peptide in gi|126698718 is significantly higher than that returned for the peptide in gi|126699140 (67.5 vs. 45.7)
  3. the mass error value for the peptide in gi|126698718 is significantly lower than that in gi|126699140

With these 3 pieces of evidence to back us up, we can reject the peptide identification from gi|126699140. To do so, untick the identification for feature #849 in gi|126699140:

The conflicting peptide in protein gi|126699140 has been rejected
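The three pieces of evidence used above can be written down as a simple comparison. The following Python sketch is only an aid to thinking, not a feature of Progenesis LC‑MS: the class and function names are hypothetical, and because the tutorial does not state the actual mass error values, the two mass errors below are placeholders chosen purely to reflect "lower for gi|126698718". The feature counts and Mascot scores are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    accession: str
    matched_features: int  # features matched to the protein
    peptide_score: float   # search engine score for the conflicting peptide
    mass_error: float      # PLACEHOLDER values below; units not given in the tutorial


def evidence_votes(a, b):
    """Count how many of the three criteria favour a, and how many favour b."""
    comparisons = [
        (a.matched_features, b.matched_features, True),   # more is better
        (a.peptide_score, b.peptide_score, True),         # higher is better
        (abs(a.mass_error), abs(b.mass_error), False),    # lower is better
    ]
    votes_a = votes_b = 0
    for value_a, value_b, higher_is_better in comparisons:
        if value_a == value_b:
            continue  # a tie gives no evidence either way
        if (value_a > value_b) == higher_is_better:
            votes_a += 1
        else:
            votes_b += 1
    return votes_a, votes_b


def suggest_rejection(a, b):
    """Suggest which accession to reject, but only if the evidence is unanimous."""
    votes_a, votes_b = evidence_votes(a, b)
    if votes_a > 0 and votes_b == 0:
        return b.accession
    if votes_b > 0 and votes_a == 0:
        return a.accession
    return None  # mixed evidence: leave the decision to the analyst


stronger = Candidate("gi|126698718", 5, 67.5, mass_error=1.0)  # mass error is a placeholder
weaker = Candidate("gi|126699140", 2, 45.7, mass_error=5.0)    # mass error is a placeholder

if __name__ == "__main__":
    print(evidence_votes(stronger, weaker))
    print(suggest_rejection(stronger, weaker))
```

Returning no suggestion when the criteria disagree is deliberate: in those cases we need our knowledge of the proteins themselves, which a simple tally like this cannot capture.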

To complete the analysis, we would continue resolving peptide conflicts like this, one protein at a time. As well as looking at search engine scores and the number of identified peptides, we can look at other properties such as individual peptide sequence lengths to help us decide which way a conflict should be resolved. We may even reject peptides from both proteins if the data is not decisive enough.

However, such a thorough analysis is beyond the scope of this tutorial. Instead, the following page summarises the analysis process and offers links to more in-depth guides for Progenesis LC‑MS…