How does the theoretical fragmentation work?
Progenesis QI uses the method described in the following paper, with adjustments made for multiple adduct support and also so that scoring can be compared between different compounds:
Wolf S, Schmidt S, Müller-Hannemann M, Neumann S. In silico fragmentation for computer assisted identification of metabolite mass spectra. BMC Bioinformatics. 2010 Mar 22;11:148.
Our source code is public domain and available here:
https://github.com/NonlinearDynamics/MetFrag.NET
How does the MetFrag algorithm work?
Once Progenesis QI has matched a compound from the configured database or databases by mass and, where applicable, retention time or CCS, it performs in-silico fragmentation for each of the possible identifications of each compound, provided you have requested theoretical fragmentation.
For each possible identification, its structure (as defined in your compound database) is passed to MetFrag to perform theoretical fragmentation.
MetFrag takes this compound structure and iteratively breaks bonds to produce a list of possible fragments. When it encounters a ring, it splits each possible combination of two bonds in that ring, since splitting a single ring bond would not produce any new fragments. It also incorporates some performance enhancements: it iterates the fragmentation only a limited number of times, and it stops splitting a fragment once its mass is lower than that of every experimental peak.
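The bond-breaking step can be illustrated with a minimal Python sketch. This is not the MetFrag.NET code: the graph representation, the function names (components, split_once, fragment) and the parameters (masses, min_peak_mass, max_depth) are all assumptions made for illustration. Real chemistry such as bond orders and hydrogen handling is ignored. As a simplification, a bond counts as a ring bond whenever removing it leaves the structure connected, and every pair of such bonds is tried.

```python
from itertools import combinations


def components(atoms, bonds):
    """Connected components of the graph, each as a frozenset of atoms."""
    adjacency = {atom: set() for atom in atoms}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, parts = set(), []
    for start in atoms:
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            node = stack.pop()
            if node not in part:
                part.add(node)
                stack.extend(adjacency[node] - part)
        seen |= part
        parts.append(frozenset(part))
    return parts


def split_once(atoms, bonds):
    """Fragments produced by one cleavage step of a connected structure."""
    bonds = list(bonds)
    fragments, ring_bonds = set(), []
    for bond in bonds:
        parts = components(atoms, [b for b in bonds if b != bond])
        if len(parts) > 1:
            fragments.update(parts)   # a chain bond: one break is enough
        else:
            ring_bonds.append(bond)   # still connected, so the bond is in a ring
    for pair in combinations(ring_bonds, 2):
        parts = components(atoms, [b for b in bonds if b not in pair])
        if len(parts) > 1:
            fragments.update(parts)   # two ring bonds broken together
    return fragments


def fragment(atoms, bonds, masses, min_peak_mass, max_depth=2):
    """Iteratively split a structure, with a depth limit and a mass cut-off."""
    found = set()
    frontier = {frozenset(atoms)}
    for _ in range(max_depth):
        next_frontier = set()
        for parent in frontier:
            inner = [(a, b) for a, b in bonds if a in parent and b in parent]
            for child in split_once(parent, inner):
                if child in found:
                    continue
                found.add(child)
                # Assumption: light fragments are kept but not split further.
                if sum(masses[atom] for atom in child) >= min_peak_mass:
                    next_frontier.add(child)
        frontier = next_frontier
    return found
```

For a toy structure of a three-membered ring (atoms 0, 1, 2) with a single atom 3 attached to atom 0, one pass yields eight fragments: the ring and the lone atom from breaking the chain bond, plus six more from breaking pairs of ring bonds.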
Each of the possible fragments is then compared to each peak in your experimental fragmentation trace. If the in-silico fragment mass matches the mass of an observed peak (within the search tolerance specified), that peak is labelled with the fragment.
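As a rough illustration of this matching step, the sketch below labels each peak with every fragment whose mass falls within a tolerance. The function name label_peaks and the use of an absolute tolerance in Da are assumptions; the tolerance you configure in the software may be expressed differently (for example in ppm).

```python
def label_peaks(fragment_masses, peak_masses, tolerance):
    """Map each peak mass to the names of fragments within the tolerance.

    fragment_masses: dict of fragment name -> theoretical mass (Da)
    peak_masses:     iterable of experimental peak masses (Da)
    tolerance:       absolute mass tolerance in Da (an assumption here)
    """
    labels = {}
    for peak in peak_masses:
        hits = [
            name
            for name, mass in fragment_masses.items()
            if abs(mass - peak) <= tolerance
        ]
        if hits:
            labels[peak] = hits  # peaks with no match stay unlabelled
    return labels
```

Calling label_peaks({"F1": 70.05, "F2": 120.2}, [70.0, 85.0], 0.1) labels the 70.0 peak with F1 and leaves the 85.0 peak unlabelled.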
How does the Progenesis QI implementation differ from the original MetFrag paper?
Progenesis QI uses the core algorithm described in the paper above, and is based on the open source code provided by its authors. Some modifications have been made, however, which are described below:
Multiple adduct support
The original MetFrag algorithm assumed (in most cases) that the charge carrier on a fragment was a proton (i.e. H+). Thus all theoretical fragment masses would be increased/decreased by the mass of a proton (for positive/negative polarity experiments respectively), before being matched to experimental peaks.
However, since Progenesis QI handles multiple adduct forms, a fragment might also inherit the charge carrier of the precursor adducted form. To handle this, Progenesis QI produces a copy of your experimental fragmentation trace, but shifts all peaks by the mass of the adduct associated with the trace. This copy represents the neutral masses of all the fragments, assuming they all retained the adduct of the precursor.
Progenesis QI also produces a second copy of your experimental trace, which is similarly shifted, except that it is shifted by the mass of an electron. This copy represents the neutral masses of the fragments, assuming they all have an electron as their charge carrier. These two copies are combined into a single trace, which is sent to MetFrag for comparison with theoretical fragment masses.
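The construction of the combined trace can be sketched as follows. This is an illustration under stated assumptions, not the Progenesis QI implementation: the function name build_combined_trace is hypothetical, ions are assumed to be singly charged, and the sign of each shift is supplied by the caller because it depends on the adduct and the polarity. Each entry keeps its original m/z and the assumed charge carrier, so that it can be traced back later.

```python
ELECTRON_MASS = 0.000548579909  # electron mass in Da


def build_combined_trace(peaks_mz, adduct_shift, electron_shift):
    """Return sorted (neutral_mass, original_mz, carrier) tuples.

    adduct_shift:   mass removed from each peak under the assumption that
                    the fragment kept the precursor's adduct
    electron_shift: mass removed from each peak under the assumption that
                    an electron is the charge carrier (signed, caller's choice)
    """
    combined = []
    for mz in peaks_mz:
        combined.append((mz - adduct_shift, mz, "adduct"))
        combined.append((mz - electron_shift, mz, "electron"))
    return sorted(combined)
```

For instance, a positive-polarity ion that has lost one electron is lighter than its neutral form by one electron mass, so under that assumption a caller would pass electron_shift=-ELECTRON_MASS.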
Our modified version of MetFrag thus removes the assumption of a proton charge carrier, and it no longer modifies the mass of theoretical fragments before comparison with experimental peaks. In effect, MetFrag is comparing “like with like”: the neutral masses of the theoretical fragments against the hypothetical neutral masses derived from the experimental peaks.
After MetFrag matches theoretical fragments to peaks, Progenesis QI takes the copied, shifted peaks and shifts them back to their original experimental m/z values. Depending on which peak the fragments are matched to, it can then determine which adducted form of the given fragment that peak represents.
For example, suppose your observed trace contains a peak at 100.0 Da, and the precursor is an adducted form M+X (where X has a mass of 30.0 Da). Progenesis QI will shift this peak to 70.0 Da when it sends it to MetFrag. Then suppose that MetFrag generates an in-silico fragment Y of mass 70.05 Da, which it matches to this shifted peak. Progenesis QI knows that the peak at 70.0 Da was formed by shifting the observed 100.0 Da peak by the mass of X, so it labels the peak at 100.0 Da as Y+X (with a mass error of 0.05 Da).
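The arithmetic in this example can be reproduced with a few lines of Python. The function name label_original_peak, its parameters, and the 0.1 Da tolerance used in the call are hypothetical and chosen only to mirror the numbers above.

```python
def label_original_peak(observed_mz, adduct_name, adduct_mass,
                        fragment_name, fragment_mass, tolerance):
    """Shift a peak, test the match, then label the original peak.

    Returns (observed_mz, label, mass_error) or None if there is no match.
    """
    shifted = observed_mz - adduct_mass   # 100.0 - 30.0 = 70.0
    error = fragment_mass - shifted       # 70.05 - 70.0
    if abs(error) > tolerance:
        return None
    # The label is attached to the peak at its original m/z value.
    label = f"{fragment_name}+{adduct_name}"
    return observed_mz, label, round(error, 4)


result = label_original_peak(100.0, "X", 30.0, "Y", 70.05, tolerance=0.1)
print(result)  # (100.0, 'Y+X', 0.05)
```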
Scoring
The scoring method in the original MetFrag paper calculated a score based on weighted peak intensities, combined this with a score based on energy of the bonds broken, and normalised the results so that the best identification for a given compound has a score of 1.
In Progenesis QI, you often have multiple compounds which you are identifying simultaneously, all of which may have many possible identifications. With the original MetFrag scoring algorithm, the top identification for each compound scored 1, so it was not possible to compare scores between compounds.
The algorithm in our modified version of MetFrag is based upon the same score calculated from weighted peak intensities, but we have modified it so that scores are comparable across compounds.
This is achieved by dividing the sum of weighted peak intensities across all matched peaks by the sum of weighted peak intensities across all peaks, matched or not. This gives us a relative score of how well the observed fragmentation trace can be explained by a given possible identification: one that can explain 90% of the peaks is preferred to one that can explain only 40% of the peaks. (In practice this is a simplification, because the peaks are weighted by intensity and mass rather than simply counted, but it suffices for the explanation.)
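The ratio described here can be written as a short Python sketch. The function names and the weighting function are assumptions: placeholder_weight (mass multiplied by intensity) is a stand-in chosen for illustration only, and the weighting actually used is the one in the Scorer.cs source file.

```python
def placeholder_weight(mz, intensity):
    """Hypothetical weighting; the real one combines mass and intensity differently."""
    return mz * intensity


def explained_fraction(peaks, matched_mz, weight=placeholder_weight):
    """Weighted share of the trace explained by one possible identification.

    peaks:      list of (mz, intensity) tuples for the whole trace
    matched_mz: set of m/z values that were matched to a fragment
    """
    total = sum(weight(mz, intensity) for mz, intensity in peaks)
    if total == 0:
        return 0.0  # empty or all-zero trace: nothing to explain
    explained = sum(
        weight(mz, intensity) for mz, intensity in peaks if mz in matched_mz
    )
    return explained / total
```

Because the denominator depends only on the observed trace, the result always lies between 0 and 1 and is not rescaled so that the best candidate scores 1, which is what makes scores comparable between compounds.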
If you’re comfortable reading code, the scoring algorithm is fairly simple and can be seen here: https://github.com/NonlinearDynamics/MetFrag.NET/blob/master/MetFragNET/Scoring/Scorer.cs.