## How does database fragmentation scoring work?

To score database fragmentation matches, we use an algorithm based on the well adopted cosine similarity method. A similar method is, for example, implemented by MassBank [pdf].

### Cosine similarity method

The dot product of two 2-dimensional vectors, and is:

It can also be expressed as:

Where is the angle between the two vectors, and .

By equating these two formulae, the "similarity" between the two vectors is given by the cosine of the angle between them, which has the nice property that it ranges from 0 to 1 when all co-efficients are positive:

This method can also be expanded to n-dimensional vectors:

A similarity of 1 means the two vectors are identical, and a similarity of 0 means they are orthogonal and independent of each other.

### Cosine similarity method applied to ms/ms scoring

We apply this method to scoring of ms/ms database matches as follows.

We create two vectors and , where each element of the vector is a weighted peak intensity given by:

We combine all m/z's of peaks from the experimental and database spectra, and go through them in ascending m/z order. For each m/z, there are 3 possibilities:

- There is an experimental peak at the given m/z, but no matching database peak.
- There is a database peak at the given m/z, but no matching experimental peak.
- There is an experimental peak at the given m/z, and a database peak at the same m/z (to within a threshold).

For each of these scenarios, we add elements to the vectors and as follows:

- We add the weighted experimental peak intensity to and a 0 to .
- We add a 0 to and the weighted database peak intensity to .
- We add the weighted experimental peak intensity to and the weighted database peak intensity to .

Finally, we calculate the similarity metric on and as defined above. To obtain a score between 0 and 100, we multiply this result by 100.

### Example

To illustrate this method, suppose we have the following experimental and database spectra:

In this case, the two vectors produced are as follows (where is the weighted intensity function):

The similarity metric is then:

So these two spectra will be given a fragmentation score of **~93** - they are fairly well matched, but there are a few peaks which are either not matched, or not expected to be present, lowering its score.