Progenesis QI for proteomics

What are data compression and peak modelling and why do you do this?

Online LC-MS can generate very large datasets, which limits how many samples you can run per experiment before the analysis can no longer complete in a realistic time. To overcome this and allow large numbers of samples to be handled, we have developed two approaches that help you analyse many large data files: data compression and peak modelling.

Data compression removes the background data, substituting those values with zero. There is nothing of interest in these regions; they are irrelevant to the analysis, and removing them dramatically reduces the size of the data we are dealing with.
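The exact background model Progenesis uses is not described here, but the principle can be sketched in a few lines of Python: values below a (hypothetical) noise threshold are zeroed, and a sparse representation then stores only the non-zero points.

```python
import numpy as np
from scipy import sparse

def compress_scan(intensities: np.ndarray, noise_threshold: float) -> sparse.csr_matrix:
    """Zero out background signal and store the scan sparsely.

    `noise_threshold` is an illustrative per-scan background estimate,
    not Progenesis's actual background model.
    """
    cleaned = np.where(intensities < noise_threshold, 0.0, intensities)
    # A sparse matrix stores only the non-zero values, so zeroing the
    # background directly shrinks the stored data.
    return sparse.csr_matrix(cleaned)

# Simulated scan: mostly low-level noise with a few real peaks.
rng = np.random.default_rng(0)
scan = rng.exponential(5.0, size=100_000)
scan[[10_000, 50_000, 90_000]] = [5_000.0, 12_000.0, 8_000.0]

compressed = compress_scan(scan.reshape(1, -1), noise_threshold=50.0)
print(f"kept {compressed.nnz} of {scan.size} points")
```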

Peak modelling turns each peak, which is made up of many data points, into just 3 points using an intelligent peak-modelling algorithm that can reduce data files by an order of magnitude. Using a wavelet-based approach, peak models are created that retain all relevant quantitation and positional information. The information needed to record each peak is therefore massively reduced, yet no information relevant to the analysis is lost, which is an important point to stress. The original raw data is discarded, as storing it would require too much space. The simplified peak models also make peptide detection feasible; it would have proved too difficult using the many data points present before modelling.
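The wavelet algorithm itself is not published here, but the idea of collapsing a many-point peak into three values can be illustrated by fitting a simple Gaussian model, where position, height, and width together preserve both the peak's location and its area (quantitation). This is a sketch of the concept, not the Progenesis implementation:

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, height, centre, width):
    return height * np.exp(-((x - centre) ** 2) / (2 * width ** 2))

# A raw chromatographic peak sampled at many points (simulated).
rt = np.linspace(0.0, 10.0, 500)
raw = gaussian(rt, height=1e6, centre=5.2, width=0.3)
raw += np.random.default_rng(1).normal(0, 1e3, rt.size)

# Fitting reduces ~500 stored points to 3 parameters while keeping
# the peak's position (centre) and its quantitation (area).
params, _ = curve_fit(gaussian, rt, raw, p0=[raw.max(), rt[raw.argmax()], 0.5])
height, centre, width = params
area = height * width * np.sqrt(2 * np.pi)  # closed-form area of a Gaussian
print(f"centre = {centre:.2f} min, area = {area:.3g}")
```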

As an example, an experiment run by Boston University School of Medicine produced 20 files of LC-MS data. The original RAW files were about 2 GB each, the mzXML files were 1.7 GB each, and the resulting mznld files were just 25 MB each.
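Working through those figures shows the scale of the reduction:

```python
# Per-file sizes from the example above, in MB.
raw_mb, mzxml_mb, mznld_mb = 2 * 1024, 1.7 * 1024, 25

print(f"RAW   -> mznld: ~{raw_mb / mznld_mb:.0f}x smaller")    # ~82x
print(f"mzXML -> mznld: ~{mzxml_mb / mznld_mb:.0f}x smaller")  # ~70x
print(f"20 files: {20 * mznld_mb} MB of mznld vs {20 * raw_mb / 1024:.0f} GB of RAW")
```

In other words, the whole 20-file experiment fits in roughly 500 MB of mznld files, compared with around 40 GB of original RAW data.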