What are data compression and peak modelling and why do you do these?

Online LC-MS can generate very large datasets, which can limit how many samples you can run per experiment before the analysis no longer completes in a realistic time. To overcome this limitation, we have developed two approaches, data compression and peak modelling, that help you to analyse large numbers of large data files.

Data compression is the removal of background data: values in background regions are replaced with zero. “Background” in this sense refers to areas containing no meaningful data at all, and removing them dramatically reduces file size.
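To illustrate why replacing background values with zero reduces file size, here is a minimal Python sketch. It is not the Progenesis QI implementation: the fixed intensity threshold used to decide what counts as background, the function names and the run-length encoding used for storage are all assumptions made for illustration only.

```python
def zero_background(intensities, threshold):
    """Replace every value below `threshold` with zero.

    The fixed threshold is an assumed, simplified background rule.
    """
    return [v if v >= threshold else 0.0 for v in intensities]


def run_length_encode(values):
    """Collapse runs of repeated values into [value, count] pairs.

    This is an assumed storage scheme, used only to show why zeros are cheap.
    """
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded


# Made-up intensities: low-level noise surrounding one real peak.
scan = [1.2, 0.8, 1.1, 250.0, 900.0, 310.0, 0.9, 1.3, 1.0, 0.7]

cleaned = zero_background(scan, threshold=5.0)
encoded = run_length_encode(cleaned)

print(cleaned)
print(len(scan), "values stored as", len(encoded), "runs")
```

Once the noise values become identical zeros, long stretches of background collapse into a single stored entry, while the peak itself is left untouched.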

Peak modelling reduces each peak, originally made up of many data points, to 3 points using an intelligent peak-modelling algorithm, which can shrink data files by an order of magnitude. Using a wavelet-based approach, peak models are created that retain all relevant quantitation and positional information. The information needed to record each peak is therefore massively reduced, but no quantitative information is lost (an important point to emphasise). The original data are discarded for storage reasons. The simplified peak model also greatly speeds up peptide ion detection, which would otherwise be impractical to complete in a reasonable time frame using the many data points present before peak modelling.
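The idea of describing a peak with 3 points instead of many can be sketched in Python as below. This is a deliberately naive stand-in and not the wavelet-based algorithm used by the software: it assumes the three points are the left edge, the apex and the right edge of the peak, which is not stated above, and it makes no attempt to reproduce how the real peak models preserve quantitation. The function name and the data are hypothetical.

```python
def three_point_model(points):
    """Reduce one peak's (position, intensity) points to three points.

    Assumed choice of points: first non-zero point, apex, last non-zero point.
    """
    nonzero = [i for i, (_, intensity) in enumerate(points) if intensity > 0]
    if not nonzero:
        raise ValueError("no signal in this region")
    apex = max(nonzero, key=lambda i: points[i][1])
    return [points[nonzero[0]], points[apex], points[nonzero[-1]]]


# Made-up profile data for a single peak, already background-subtracted.
peak = [
    (500.00, 0.0),
    (500.01, 120.0),
    (500.02, 480.0),
    (500.03, 900.0),
    (500.04, 510.0),
    (500.05, 130.0),
    (500.06, 0.0),
]

model = three_point_model(peak)
print(len(peak), "points reduced to", len(model))
print(model)
```

Even in this toy case, seven stored points become three. Real profile peaks contain many more points, so the saving per peak is correspondingly larger.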

As a representative example of file size reduction, our HDMS^E tutorial data set was prepared from raw folders with an average size of approximately 725 MB per run. The mznld analysis files generated by Progenesis QI from these (two per run, since the runs contain both low and high energy MS^E data) were up to 6.38 MB in size for low energy (LE) data and up to 3.20 MB for high energy (HE) data, giving every run a combined processed size of below 10 MB.
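The scale of that reduction can be checked with a few lines of Python arithmetic. The figures are the ones quoted above; because the raw size is an approximate average and the processed sizes are upper limits, the result is only a rough lower bound on the reduction for a typical run.

```python
raw_mb = 725.0  # approximate average raw folder size per run
le_mb = 6.38    # largest low energy (LE) analysis file
he_mb = 3.20    # largest high energy (HE) analysis file

# Combined size of the two analysis files for the largest run.
processed_mb = le_mb + he_mb
reduction_factor = raw_mb / processed_mb

print(f"{processed_mb:.2f} MB per run, about {reduction_factor:.0f}-fold smaller")
```

With these numbers the largest run occupies 9.58 MB after processing, roughly 76 times smaller than the average raw folder.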