Expert System for Computer-assisted Annotation of MS/MS Spectra*

Nadin Neuhauser‡¶, Annette Michalski‡¶, Jürgen Cox‡, and Matthias Mann‡§

An important step in mass spectrometry (MS)-based proteomics is the identification of peptides by their fragment spectra. Regardless of the identification score achieved, almost all tandem-MS (MS/MS) spectra contain remaining peaks that are not assigned by the search engine. These peaks may be explainable by human experts, but the scale of modern proteomics experiments makes manual annotation impractical. In computer science, Expert Systems are a mature technology to implement a list of rules generated by interviews with practitioners. We here develop such an Expert System, making use of literature knowledge as well as a large body of high mass accuracy and pure fragmentation spectra. Interestingly, we find that even with high mass accuracy data, rule sets can quickly become too complex, leading to over-annotation. Therefore we establish a rigorous false discovery rate, calculated by random insertion of peaks from a large collection of other MS/MS spectra, and use it to develop an optimized knowledge base. This rule set correctly annotates almost all peaks of medium or high abundance. For high resolution HCD data, median intensity coverage of fragment peaks in MS/MS spectra increases from 58% by search engine annotation alone to 86%. The resulting annotation performance surpasses that of a human expert, especially on complex spectra such as those of larger phosphorylated peptides. Our system is also applicable to high resolution collision-induced dissociation data. It is available both as a part of MaxQuant and via a webserver that only requires an MS/MS spectrum and the corresponding peptide sequence, and which outputs publication-quality, annotated MS/MS spectra (www.biochem.mpg.de/mann/tools/). It provides expert knowledge to beginners in the field of MS-based proteomics and helps advanced users to focus on unusual and possibly novel types of fragment ions.
Molecular & Cellular Proteomics 11: 10.1074/mcp.M112.020271, 1500–1509, 2012.

In MS-based proteomics, peptides are matched to peptide sequences in databases using search engines (1–3). Statistical criteria are established for accepted versus rejected peptide-spectrum matches based on the search engine score, and usually a 99% certainty is required for reported peptides. The search engines typically only take sequence-specific backbone fragmentation into account (i.e. a, b, and y ions) and some of their neutral losses. However, tandem mass spectra, especially of larger peptides, can be quite complex and contain a number of medium or even high abundance peptide fragments that are not annotated by the search engine. This can create uncertainty for the user, especially if only relatively few peaks are annotated, because it may reflect an incorrect identification. However, the most common cause of unlabeled peaks is that another peptide was present in the precursor selection window and was cofragmented. This has variously been termed “chimeric spectra” (4–6), or the problem of low precursor ion fraction (PIF)1 (7). Such spectra may still be identifiable with high confidence. The Andromeda search engine in MaxQuant, for instance, attempts to identify a second peptide in such cases (8, 9). However, even “pure” spectra (those with a high PIF) often still contain many unassigned peaks. These can be caused by different fragment types, such as internal ions, single or combined neutral losses, as well as immonium and other ion types in the low mass region. A mass spectrometry expert can assign many or all of these peaks, based on expert knowledge of fragmentation and manual calculation of fragment masses, resulting in a higher degree of confidence in the identification. However, there are more and more practitioners of proteomics without in-depth training or experience in annotating MS/MS spectra, and such annotation would in any case be prohibitive for hundreds of thousands of spectra. Furthermore, even human experts may wrongly annotate a given peak, especially in low mass accuracy tandem mass spectra, or fail to consider every possibility that could have resulted in a given fragment mass.

Given the desirability of annotating fragment peaks to the highest degree possible, we turned to “Expert Systems,” a well-established technology in computer science. Expert Systems achieved prominence in the 1970s and 1980s and were meant to solve complex problems by reasoning about knowledge (10, 11). Interestingly, one of the first examples was developed by Nobel Prize winner Joshua Lederberg more than 40 years ago, and dealt with the interpretation of mass spectrometric data. The program’s name was Heuristic DENDRAL (12), and it was capable of interpreting the mass spectra of aliphatic ethers and their fragments. The hypotheses produced by the program described molecular structures that are plausible explanations of the data. To infer these explanations from the data, the program incorporated a theory of chemical stability that provided limiting constraints as well as heuristic rules.

From the ‡Department of Proteomics and Signal Transduction,
Max-Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152
Martinsried, Germany

Received May 5, 2012, and in revised form, July 19, 2012
Author’s Choice—Final version full access.

Published, MCP Papers in Press, August 10, 2012, DOI
10.1074/mcp.M112.020271

1 The abbreviations used are: PIF, Precursor Intensity Fraction; FDR, False Discovery Rate; MS/MS, Tandem mass spectrometry; HCD, Higher Energy Collisional Dissociation; PEP, Posterior Error Probability; PDF, Portable Document Format; IM, immonium ion; SC, side chain fragment ion; Th, Thomson.

Technological Innovation and Resources

Author’s Choice © 2012 by The American Society for Biochemistry and Molecular Biology, Inc.
This paper is available on line at http://www.mcponline.org

1500 Molecular & Cellular Proteomics 11.11

In general, the aim of an Expert System is to encode knowledge extracted from professionals in the field in question. This knowledge then powers a rule-based system that can be applied broadly and in an automated manner. A rule-based Expert System represents the information obtained from human specialists in the form of IF-THEN rules. These are used to perform operations on input data to reach an appropriate conclusion. A generic Expert System is essentially a computer program that provides a framework for performing a large number of inferences in a predictable way, using forward or backward chaining, backtracking, and other mechanisms (13). Therefore, in contrast to statistics-based learning, the “expert program” does not derive what it knows from the raw volume of facts in the computer’s memory. Instead, like a human expert, it relies on a reasoning-like process of applying an empirically derived set of rules to the data.
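To make the IF-THEN idea concrete, the following is a minimal forward-chaining sketch in Python. It is not the authors' implementation (which was written in C# on a .NET rule engine); the `Rule` class, the `forward_chain` function, and the rule names and facts are hypothetical and only illustrate how the conclusion of one rule can enable another until nothing new can be derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """IF every condition is a known fact THEN add the conclusion as a new fact."""
    name: str
    conditions: frozenset
    conclusion: str


def forward_chain(rules, facts):
    """Fire rules repeatedly until a fixed point is reached (no rule adds a new fact)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conditions <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                changed = True
    return known


# Hypothetical toy rules. The dependent rule is listed first on purpose:
# it cannot fire on the first pass, but chaining derives it on the next pass.
RULES = [
    Rule("combined_loss",
         frozenset({"b3-H2O allowed", "b3 contains K/R/N/Q"}),
         "b3-H2O-NH3 allowed"),
    Rule("water_loss",
         frozenset({"b3 observed", "b3 contains S/T/E/D"}),
         "b3-H2O allowed"),
]

initial = {"b3 observed", "b3 contains S/T/E/D", "b3 contains K/R/N/Q"}
derived = forward_chain(RULES, initial)
print(sorted(derived - initial))
```

The rule engine stays generic while all domain knowledge lives in the rule list, which mirrors the separation between knowledge base and rule engine described here.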

Here we implemented an Expert System for the interpretation of high mass accuracy tandem mass spectrometry data of peptides. It was developed in an iterative manner together with human experts on peptide fragmentation, using the published literature on fragmentation pathways as well as large data sets of higher-energy collisional dissociation (HCD) (14) and collision-induced dissociation (CID) based peptide identifications. Our goal was to achieve an annotation performance similar to or better than that of experienced mass spectrometrists (15), thus making comprehensively annotated peptide spectra available in large scale proteomics.

EXPERIMENTAL PROCEDURES

The benchmark data set is from Michalski et al.2 Briefly, E. coli, yeast and HeLa proteomes were separated by 1D gel electrophoresis and in-gel digested (16). Resulting peptides were analyzed by liquid chromatography (LC) MS/MS on a linear ion trap-Orbitrap instrument (LTQ Velos (17) or ELITE (18), Thermo Fisher Scientific). Peptides were fragmented by HCD (14) or by CID, but in either case fragments were transferred to the Orbitrap analyzer to obtain high resolution tandem mass spectra (resolution 7,500 at m/z 400). Tandem mass spectra were acquired starting at m/z 80 to capture immonium ions as completely as possible. Data analysis was performed by MaxQuant using the Andromeda search engine (8, 9). Maximum initial mass deviation for precursor peaks was 6 ppm, and maximum deviation for fragment ions was 20 ppm for both the search engine and the Expert System. MaxQuant preprocessed the spectra to be annotated by the Expert System in the same way as it does for the Andromeda search engine: peaks were filtered to the 10 most abundant ones in a sliding 100 m/z window, de-isotoped, and shifted to charge one where possible. From these data, sequence-spectrum pairs were selected that had a certainty of identification of 99.99%, PIF values (7) larger than 95%, and unique sequences (more than 16,000 peptides).
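The peak filtering and the 20 ppm fragment tolerance can be sketched as follows. This is an illustration under stated assumptions, not MaxQuant's code: the sliding window is interpreted here as "keep a peak if fewer than 10 more intense peaks lie within ±50 m/z of it", the helper names are hypothetical, and de-isotoping and charge reduction are not shown. The ppm deviation uses the (calculated − measured) sign convention given for the webserver.

```python
def ppm_deviation(measured_mz: float, calculated_mz: float) -> float:
    """Mass deviation in ppm, using the (calculated - measured) sign convention."""
    return (calculated_mz - measured_mz) / calculated_mz * 1e6


def within_tolerance(measured_mz, calculated_mz, tol_ppm=20.0):
    """True if the measured m/z lies within tol_ppm of the calculated m/z."""
    return abs(ppm_deviation(measured_mz, calculated_mz)) <= tol_ppm


def top_n_per_window(peaks, n=10, window=100.0):
    """Keep a peak if fewer than n strictly more intense peaks lie within +/- window/2.

    `peaks` is a list of (mz, intensity) tuples. Peaks of equal intensity are not
    counted as 'more intense', so a crowded window may retain slightly more than n.
    """
    half = window / 2.0
    kept = []
    for mz, intensity in peaks:
        stronger = sum(
            1 for other_mz, other_int in peaks
            if abs(other_mz - mz) <= half and other_int > intensity
        )
        if stronger < n:
            kept.append((mz, intensity))
    return kept


# Usage: twelve peaks crowded into one window; the two weakest are dropped.
spectrum = [(500.0 + k, float(k + 1)) for k in range(12)]
filtered = top_n_per_window(spectrum)
print(len(filtered), within_tolerance(1000.0, 1000.02))
```

The quadratic scan keeps the logic transparent; a production implementation would sort by m/z and use a moving window instead.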

The Expert System was written in the programming language C#, using the Microsoft .NET framework version 3.5 and the Workflow.Activities library, which contains a rule engine to implement an Expert System (Microsoft Corporation, Redmond, WA).

MaxQuant contains the Expert System as an integrated option in its Viewer, the component that allows visualization of raw and annotated MS data. MaxQuant can be downloaded free of charge from www.maxquant.org. It requires Microsoft .NET 3.5, which is either already installed with Microsoft Windows or can be installed as a free Windows update. In our group we have implemented the Expert System both on a Windows cluster and in a desktop version. Additionally, we provide an Expert System web server, which can be accessed at www.biochem.mpg.de/mann/tools/. Although MaxQuant allows Expert System annotation of arbitrary numbers of MS/MS spectra, the webserver is currently limited to the submission of one MS/MS spectrum at a time. After upload of a list of peaks with their m/z values and intensities, together with the corresponding peptide sequence, the spectrum with all annotations is displayed. This can then be exported in different graphical formats.

2 Michalski, A., Neuhauser, N., Cox, J., and Mann, M., unpublished data.

FIG. 1. Basic concept of the Expert System. A, An Expert System is constructed by interviewing an expert in the domain (here peptide fragmentation and the accumulated literature) and devising a set of rules with associated priority and dependence on each other. The knowledge base contains the rules whereas the rule engine is generic and applies the rules to the data. B, Data are automatically processed following the steps depicted.

Expert System for Annotation of MS/MS Spectra

RESULTS AND DISCUSSION

Construction of the Expert System—Human experts perform a generic set of tasks when solving problems such as the interpretation of an MS/MS spectrum. These tasks have to be codified in the Expert System, mainly in the form of a series of IF-THEN rules. Fig. 1 shows the major steps involved in building and using the Expert System. It is important to acquire all relevant rules to interpret MS/MS spectra as comprehensively as possible. However, to avoid over-annotation leading to false positives (see below), the number of rules and their interactions should not become too large. This balance was struck by evaluating the performance of different sets of rules on large data sets in conjunction with human experts.

Rules were encoded in a table-like structure, where they could be activated, deactivated or modified. To create the knowledge base, the extent of interactions between the rules also had to be determined, for instance, which combinations of neutral losses to allow. After iterative construction of the knowledge base, the rule engine applied the encoded knowledge to MS/MS spectra and displayed the result to the user (Fig. 1A). The processing steps that are performed on the raw MS and MS/MS spectra are shown in Fig. 1B (see also EXPERIMENTAL PROCEDURES). Note that the workflow is entirely automated and that user interaction is possible but not required. Arbitrary numbers of annotated spectra of peptides of interest can be produced as interactive screen images or high resolution, printable PDF files. The Expert System is very fast: 16,000 spectra can be annotated in less than four hours on a desktop system.

The IF-THEN constraints of our Expert System can be divided into four major parts (Fig. 2). First, the Expert System calculates the sequence-specific backbone fragments (a, b, and y-ion series), the charged precursor ion, the immonium ions, as well as side chain fragments in the low-mass region, and places them into a queue. In the second part of the workflow every element in this queue is filtered with respect to the actual MS/MS spectrum. Even if there is a peak corresponding to a calculated item in the queue, it may still be filtered out (symbolized by missing annotations after the filter in Fig. 2). For instance, a b1 ion is only allowed in very restricted circumstances.
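Steps 1 and 2 can be sketched in Python as follows, assuming singly charged ions, standard monoisotopic residue masses, unmodified cysteine, and a 20 ppm tolerance. The function names are hypothetical, and the b1 rule is a deliberate simplification: the text only says b1 is allowed in restricted circumstances, so this sketch suppresses it entirely. Side chain fragments and internal ions are omitted.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
CO = 27.994915


def candidate_ions(seq):
    """Step 1: singly charged [M+H]+, a/b/y ions and immonium ions for a sequence."""
    masses = [RESIDUE[aa] for aa in seq]
    cands = [("[M+H]+", sum(masses) + WATER + PROTON)]
    for i in range(1, len(seq)):
        b = sum(masses[:i]) + PROTON
        cands.append((f"b{i}", b))
        cands.append((f"a{i}", b - CO))
        cands.append((f"y{i}", sum(masses[-i:]) + WATER + PROTON))
    for aa in sorted(set(seq)):
        cands.append((f"Im({aa})", RESIDUE[aa] - CO + PROTON))
    return cands


def passes_filter(label):
    """Step 2 filter (HYPOTHETICAL placeholder): b1 is simply suppressed here."""
    return label != "b1"


def annotate(seq, peaks, tol_ppm=20.0):
    """Map each observed m/z to all filtered candidate labels within tol_ppm."""
    found = {}
    for label, calc in candidate_ions(seq):
        if not passes_filter(label):
            continue
        for mz, _intensity in peaks:
            if abs(calc - mz) / calc * 1e6 <= tol_ppm:
                found.setdefault(mz, []).append(label)
    return found


peaks = [(58.0287, 10.0), (129.0659, 100.0), (147.1128, 80.0), (300.0, 5.0)]
annotations = annotate("GAK", peaks)
print(annotations)
```

Here the peak at m/z 58.0287 matches b1 of GAK in mass but stays unlabeled because the filter rejects it, which is the behavior the text describes for step 2.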

In the third step, neutral losses and internal fragments for
the filtered values are calculated and added to the queue.
They are then subjected to the same filtering rules as in step
2. Step 3 is iterative, as several subsequent neutral losses
may be allowed.
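The iterative neutral-loss generation of step 3 can be sketched as below. The loss table is a simplified ASSUMPTION (water loss enabled by S, T, E, D; ammonia loss enabled by R, K, N, Q) and not the published rule set in supplemental Table S1; internal fragments are not generated here. Enumerating combinations up to `max_losses` is equivalent to applying one more loss per iteration, and it avoids producing H2O-NH3 and NH3-H2O as separate candidates. The resulting masses would then be matched against the spectrum with the same filter as in step 2.

```python
from collections import Counter
from itertools import combinations_with_replacement

# ASSUMED, simplified loss rules: loss name -> (mass in Da, enabling residues).
LOSSES = {
    "H2O": (18.010565, "STED"),
    "NH3": (17.026549, "RKNQ"),
}


def neutral_loss_candidates(label, mz, fragment_seq, max_losses=2):
    """Step 3: derive up to `max_losses` combined neutral losses from one annotated ion.

    A combination is allowed only if the fragment contains at least as many
    enabling residues as the number of times that loss is applied.
    """
    out = []
    for k in range(1, max_losses + 1):
        for combo in combinations_with_replacement(sorted(LOSSES), k):
            needed = Counter(combo)
            allowed = all(
                sum(fragment_seq.count(aa) for aa in LOSSES[name][1]) >= times
                for name, times in needed.items()
            )
            if allowed:
                delta = sum(LOSSES[name][0] for name in combo)
                out.append((label + "".join(f"-{name}" for name in combo), mz - delta))
    return out


# Usage: the singly charged y2 ion of the fragment "SK" (m/z 234.1448).
for lab, m in neutral_loss_candidates("y2", 234.1448, "SK"):
    print(lab, round(m, 4))
```

Capping `max_losses` matters: the FDR analysis later in the text found that long chains of subsequent neutral losses were the main source of false annotations.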

In the fourth and last step each potential annotation is given a priority. If there is more than one possible annotation, the one with the highest priority is chosen (i.e. the one that triggered the rules with higher priority). In this case, however, the Expert System provides a pop-up (or “tool-tip”) listing the other possibilities when the mouse hovers over the peak. (Such conflicts can still occur when the FDR is properly controlled; they are then typically caused by two different chemical designations for the same ion, or by different ions with the same chemical composition, such as small internal fragments with different sequences but the same amino acids.)
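Priority-based conflict resolution (step 4) amounts to choosing the highest-priority label per peak while keeping the others for the tool-tip. The sketch below assumes integer priorities where a larger value wins; the m/z value and the priority numbers are hypothetical and only reproduce the Fig. 2 situation in which the b2 ion outranks the internal fragment ‘CA’.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str
    priority: int  # higher value wins a conflict (assumed convention)


def resolve_conflicts(candidates_by_peak):
    """Step 4: pick the highest-priority label per peak, keep the rest as alternatives.

    Returns {mz: (chosen_label, [alternative labels])}. sorted() is stable even
    with reverse=True, so candidates of equal priority keep their input order.
    """
    resolved = {}
    for mz, cands in candidates_by_peak.items():
        ranked = sorted(cands, key=lambda c: c.priority, reverse=True)
        resolved[mz] = (ranked[0].label, [c.label for c in ranked[1:]])
    return resolved


# Mirrors Fig. 2: internal fragment "CA" and b2 explain the same peak.
conflicts = {175.0: [Candidate("internal CA", 1), Candidate("b2", 3)]}
print(resolve_conflicts(conflicts))
```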

FIG. 2. Work flow of the Expert System. ➀ From the database sequence of the peptide identified by the search engine, a list of possible fragment ions is created. ➁ Peaks from the measured spectrum are compared with the possible fragments and preliminarily annotated if they pass the rules of the Expert System. ➂ Neutral losses and internal fragments are generated from the candidate, annotated peaks and exposed to the Expert System rules. ➃ Potential conflicts are resolved via the priority of the annotations and peaks are labeled. Note that possible internal fragment ‘CA’ is crossed out because the b2 ion has the higher priority.

FIG. 3. Calculation of false discovery rate for peak annotations. A, The upper panels represent a large number of identified MS/MS spectra from which annotated peaks are drawn to form a large peak collection of possible fragment masses. Into each identified spectrum in the data set, 10 random fragments are inserted and the number of annotations by the Expert System is counted. This process is repeated 500 times for each peptide. B, Median FDR as determined in A as a function of peptide length, shown separately for different maximum mass deviations between fragment ion and theoretical mass. The FDR for peak annotation rises with peptide length and is strongly dependent on the allowed mass deviation. The box plot at the bottom shows that 50% of the peptides were between 12 and 18 amino acids long. The box plots on the right summarize the range of FDR values regardless of peptide length. C, Median FDR as a function of peptide length, separated by intensity class of the falsely annotated fragment peaks. Most false positives come from low abundance peaks (blue) rather than from medium (green) or high abundance fragment peaks (yellow). D, Same plot as above but differentiated by the fragment ion type of the false positives. Regular fragment annotations (blue) yield fewer false positives than internal fragment (green) and neutral loss annotations (yellow).

Determining a False Discovery Rate for Peak Annotation—Use of a very high threshold for peptide identification (99.99%) ensured that virtually none of the peptides in our collection should be misidentified. However, when building the Expert System, we noticed that it was still possible to over-interpret the MS/MS spectra. This was initially surprising to us because our large scale data set had good signal to noise, and peaks were only candidates for annotation when their calculated mass was less than 20 ppm from the observed mass. The over-interpretation became apparent through conflicting annotations for the same peak, and was typically caused by a combination of rules, such as several neutral losses from major sequence-specific backbone or internal ions. Because conflicting or wrong annotations would undermine the entire rationale for the Expert System, we devised a scheme to stringently control the false discovery rate for peak annotation.

FIG. 4. Example spectra before and after Expert System annotation. A, Based on the search engine result, 34% of the fragments by peak intensities and 24% by peak number are explained, whereas the Expert System almost completely annotates the spectrum (for further explanation see main text). The Posterior Error Probability (PEP) is a statistical expectation value for peptide identification in Andromeda. Apart from the large fraction of a-, b-, and y-ions (pale blue/dark blue/red) and ions with neutral losses (orange), one can find internal fragment ions (purple) and, in the low mass region, one immonium ion of isoleucine (green) and a side chain loss from arginine (turquoise). B, Expert System annotation of a phosphorylated peptide. Apart from the internal ions, several phosphorylation-related fragment ions were found. The asterisk (*) denotes loss of H3O4P with a delta mass of 97.9768 from the phosphorylated fragment ion.

The false discovery rate (FDR) is meant to represent the percent probability that a fragment peak is annotated by chance because its mass fits one of the Expert System rules for the peptide sequence. To calculate a proper FDR, we therefore needed to provide a set of background peaks that would represent false positives when they are labeled by the Expert System. Producing realistic background peaks turned out to be far from trivial because they need to have masses that can in principle be generated from peptide sequences, and they need to be independent of the sequence of the peptide in question. The principle of our solution to this problem is shown in Fig. 3A. From the large data set underlying this study, we collected the m/z values of all annotated peaks, except those coming from immonium or side chain ions. They were stored in a large peak collection of several million entries, together with the respective peptide sequences and the relative intensity of the peak. For each spectrum in which we wanted to determine the FDR, we then inserted a random set of 10 peaks from the collection, after checking that the sequence from which the selected peaks originated was independent of the sequence of the current spectrum. If one of the inserted peaks overlapped with an existing peak, it was discarded. By definition these 10 peaks represent possible peptide fragments and, because they are chosen randomly from millions of other peaks, they collectively represent a good approximation to a true background set. This would not be the case for permutation of the sequence of the precursor in question, for instance, because many fragment peaks of permuted sequences are identical to those of the original sequence. Whenever the Expert System annotated one of these peaks, it was counted as a false positive. To find the number of repeats necessary to obtain a stable FDR for this procedure, we chose a set of spectra and ran the simulation a thousand times on each one. We found that the FDR was constant after 500 iterations. For the final FDR calculation, we added a different set of 10 random peaks from the collection to each spectrum and repeated this 500 times. This was then applied to each of the more than 16,000 pure (high PIF) spectra in the large scale data set.
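The random-insertion procedure can be sketched as follows. This is a simplified illustration, not the authors' code. Two points are ASSUMPTIONS: the independence check is reduced to excluding peaks that came from the same peptide sequence, and the FDR is reported as the fraction of inserted decoy peaks that received an annotation, since the exact normalization is not spelled out in the text. `is_annotated` stands for the rule set under test; the toy rule in the usage example is hypothetical.

```python
import random


def estimate_annotation_fdr(sequence, spectrum_mz, collection, is_annotated,
                            n_insert=10, n_repeats=500, tol_ppm=20.0, seed=0):
    """Estimate a peak-annotation FDR by inserting random decoy fragment peaks.

    collection   : list of (mz, source_sequence) pairs harvested from other spectra
    is_annotated : callable(sequence, mz) -> bool, the rule set under test
    Returns the fraction of inserted decoy peaks that received an annotation.
    """
    rng = random.Random(seed)
    # Simplified independence check (ASSUMPTION): drop peaks from the same peptide.
    pool = [mz for mz, src in collection if src != sequence]
    if len(pool) < n_insert:
        raise ValueError("peak collection too small for the requested insertion size")
    hits = inserted = 0
    for _ in range(n_repeats):
        for mz in rng.sample(pool, n_insert):
            # Discard a decoy that overlaps a peak already present in the spectrum.
            if any(abs(mz - obs) / obs * 1e6 <= tol_ppm for obs in spectrum_mz):
                continue
            inserted += 1
            hits += int(is_annotated(sequence, mz))
    return hits / inserted if inserted else 0.0


def toy_rules(seq, mz):
    """HYPOTHETICAL stand-in rule set: everything below m/z 200 is 'explained'."""
    return mz < 200.0


collection = ([(100.0 + k, "OTHERPEPTIDE") for k in range(50)]
              + [(400.0 + k, "OTHERPEPTIDE") for k in range(50)])
fdr = estimate_annotation_fdr("PEPTIDEK", [1000.0], collection, toy_rules)
print(f"estimated FDR: {fdr:.1%}")
```

Because the decoys are real fragment masses of unrelated peptides, any rule that labels them is matching by chance, which is what makes the count a false positive estimate.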

Beyond providing a solid FDR estimate for each rule set, this procedure also allowed us to identify the rules or rule combinations that were responsible for misannotation, i.e. the rules that falsely annotated the inserted peaks. These mostly turned out to be chains of subsequent neutral losses. In conjunction with detailed evaluation of the frequency of ion types, we iteratively designed an optimal rule set (supplemental Table S1). For instance, neutral losses from a particular amino acid were allowed if they occurred in more than five percent of the fragment sequences that contained that amino acid. Likewise, of a set of about 42 possible neutral side chain losses, only six were sufficiently important to retain them in the Expert System. Figs. 3B–3D show the median FDR as a function of peptide length for this final rule set. The overall FDR, indicated in red, is the same in all plots and shows that the number of false positives clearly grows with peptide length. For small peptides of 12 amino acids or less, the FDR was less than 2.1%, and all peptides in the range investigated had a peak annotation FDR of less than 5%. With these settings, the annotations are correct in more than 97% of the cases for the vast majority of MS/MS spectra. The Expert System could of course be pruned to provide a lower FDR by narrowing the mass tolerance window; however, this would come at the expense of discarding correct annotations. To explore the influence of mass accuracy on potential false positive annotations, we repeated these calculations with required mass deviations no larger than 10 ppm or no larger than 5 ppm. As can be seen in Fig. 3B, this further reduced possible errors to less than 1% or less than 0.3%, respectively. This highlights the value of high mass accuracy in unambiguously establishing fragment identity.
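The frequency criterion for neutral losses (keep a loss for an amino acid only if it is seen in more than five percent of fragments containing that amino acid) can be expressed as a small filter. The input layout and all counts in the usage example are HYPOTHETICAL; the sketch only shows how such a threshold prunes candidate rules.

```python
def select_loss_rules(observations, candidate_rules, threshold=0.05):
    """Keep an (amino acid, loss) rule if the loss is observed in more than
    `threshold` of the annotated fragments that contain that amino acid.

    observations    : list of (fragment_sequence, set_of_observed_loss_names)
    candidate_rules : list of (amino_acid, loss_name) pairs to evaluate
    Returns a list of (amino_acid, loss_name, frequency) for retained rules.
    """
    kept = []
    for aa, loss in candidate_rules:
        relevant = [losses for frag, losses in observations if aa in frag]
        if not relevant:
            continue  # no evidence either way; the rule is not retained
        freq = sum(loss in losses for losses in relevant) / len(relevant)
        if freq > threshold:
            kept.append((aa, loss, freq))
    return kept


# HYPOTHETICAL counts: CH4SO (oxidized M) seen in 3/10 M-containing fragments;
# NH3 seen in only 1/21 W-containing fragments, i.e. below the 5% cut-off.
obs = ([("M", {"CH4SO"})] * 3 + [("MA", set())] * 7
       + [("W", set())] * 20 + [("WA", {"NH3"})])
print(select_loss_rules(obs, [("M", "CH4SO"), ("W", "NH3")]))
```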

Furthermore, peaks with a low signal to noise ratio are more likely to be misannotated than more intense peaks. In Fig. 3C we sorted the false positives into three intensity classes. The median FDR of peaks with high or medium abundance is only 0.1% or 0.5%, respectively. For low abundance peaks it is higher, but still with a median of no more than 2.1%.

Next we separately investigated the FDR as a function of peptide length for the different fragment ion types. As can be seen in Fig. 3D, regular ions and internal fragments contribute very little to overall false annotation (0.4% and 0.5%), whereas neutral loss ions are wrongly annotated in 1.8% of the cases or even more.

FIG. 5. Expert System performance on a large data set. Median coverage of the summed fragment ion intensity is plotted as a function of identification score. Statistics are based on more than 16,000 spectra. For every identification score, the Expert System adds a large proportion of explainable peaks. The box plot below the graph indicates that 50% of peptides in the set have an Andromeda score between 98 and 140. Box plots on the right indicate the range of values for the intensity coverage for standard and Expert System annotation.

http://www.mcponline.org/cgi/content/full/M112.020271/DC1


Performance of the Expert System—Fig. 4A shows an illustrative example of an HCD-fragmented peptide before and after Expert System evaluation. The peptide was identified with an Andromeda score of 136 and a posterior error probability (PEP) of 1.1E-21 (the corresponding Mascot score was 83). The spectrum features an uninterrupted b-ion series from b2 to b9 and an uninterrupted y-ion series from y1 to y12, together covering the entire peptide sequence. Despite this unambiguous identification, the peaks used by the search engine to identify the peptide only accounted for 35% of the summed intensity of the peaks in the fragmentation spectrum. Coverage by number of explained peaks was even lower at 24% (allowing up to 10 peaks per 100 Th in the measured spectrum; see EXPERIMENTAL PROCEDURES). There is a series of high abundance, high m/z fragments as well as a large number of low abundance peaks in the low and medium m/z range that are unexplained by the search engine. After annotation by the Expert System, this situation changes entirely. The high m/z series is revealed to be a prominent loss of CH4SO from oxidized methionine. The low mass ions are neutral losses, internal fragments and combinations of them, and they were unambiguously and correctly assigned. Altogether, the Expert System accounted for almost all prominent ions and explained a total of 88% of the ion current. Manual annotation of this spectrum would have been possible but very time consuming.

FIG. 6. Web interface for the Expert System. A, Text field to paste the spectrum in text format (m/z value; intensity in arbitrary units). B, Form to enter the peptide sequence, modifications and their positions. C, Detected backbone fragments and their neutral losses are indicated in the peptide logo. The scalable spectrum is annotated by the Expert System. Note that neutral loss peaks are very small compared with the major backbone fragments. The spectrum can be downloaded with the desired resolution and in the desired graphical format.

Interpretation of phosphorylated peptides, especially large ones, is more difficult than that of unmodified peptides. Furthermore, accurate placement of the phosphorylation site can be challenging. We used literature knowledge (19, 20) and the results of a large-scale investigation into the fragmentation of phosphorylated peptides to derive suitable fragmentation rules for the Expert System. This led to an additional six rules, which were easily integrated, illustrating the extensibility of the Expert System. Fig. 4B depicts an example annotation of the relatively complex fragmentation spectra typical of phosphorylated peptides. The large ion series from the low mass range to about m/z 1000 is caused by an extensive and uninterrupted internal ion series starting from the proline in the second position of the peptide sequence. As these internal fragments contain several glutamines, they lead to additional water and ammonia losses. There are also newly annotated fragments that combine other neutral losses with the loss of phosphoric acid (H3PO4). Moreover, the neutral loss of HPO3 is annotated.
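The neutral-loss masses used in such rules follow directly from elemental monoisotopic masses. The sketch below computes the phosphate-related losses (H3PO4, about 97.977 Da; HPO3, about 79.966 Da) and the CH4SO loss from oxidized methionine mentioned for Fig. 4A. The `formula_mass` helper is hypothetical and handles only simple formulas without brackets or charges; the elemental masses are standard monoisotopic values.

```python
import re

# Monoisotopic masses of the most abundant isotopes (Da).
MONOISOTOPIC = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}


def formula_mass(formula):
    """Monoisotopic mass of a simple formula such as 'H3PO4' (no brackets or charges)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


for loss in ("H3PO4", "HPO3", "CH4SO", "H2O", "NH3"):
    print(f"{loss:6s} {formula_mass(loss):.4f}")
```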

Large-scale Evaluation of the Performance of the Expert System—We used the population of 16,000 spectra with high PIF, identified with a false discovery rate of 0.01% by the search engine, and annotated them automatically using the Expert System. For each spectrum we calculated the intensity coverage obtained by the fragments used by the search engine and by the fragments explained by the Expert System. Higher scoring fragmentation spectra would be expected to have a larger fraction of their ion current annotatable than lower scoring ones. Fig. 5 shows a plot of the median of these values for all search engine scores. A total of 95% of these Andromeda scores are within a range of 96 to 138. Here the median intensity coverage by standard annotation varies from 55% at a score of 96 to 64% at 138. The Expert System, in contrast, annotated between 86 and 89% of the total ion current in the fragment spectra of the same peptides. This represents an average increase of 28 percentage points. Only a small percentage of peptides scored lower than 96, and for these the increase in annotated ion current due to the Expert System was even larger (34 percentage points). Interestingly, even very high scoring HCD fragment spectra still contain many peaks not directly annotated by the search engine. For these, the average increase of annotated ion current due to the Expert System was still 23 percentage points.
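The two coverage measures used in this section and in Fig. 4A (fraction of summed intensity and fraction of peaks that carry an annotation) can be computed as below, with medians taken across spectra as in Fig. 5. The function name and the toy spectrum are hypothetical; the peak list is assumed to have already been filtered to the top 10 peaks per 100 Th.

```python
from statistics import median


def coverage(peaks, annotated_mz):
    """Return (fraction of summed intensity, fraction of peaks) that are annotated.

    peaks        : list of (mz, intensity) after top-N filtering
    annotated_mz : set of m/z values carrying at least one annotation
    """
    if not peaks:
        return 0.0, 0.0
    total = sum(intensity for _, intensity in peaks)
    explained = sum(intensity for mz, intensity in peaks if mz in annotated_mz)
    n_explained = sum(1 for mz, _ in peaks if mz in annotated_mz)
    by_intensity = explained / total if total else 0.0
    return by_intensity, n_explained / len(peaks)


spectrum = [(100.0, 50.0), (200.0, 30.0), (300.0, 20.0)]
by_intensity, by_count = coverage(spectrum, {100.0, 300.0})
print(by_intensity, by_count)

# Median intensity coverage over several (toy) annotation results.
per_spectrum = [coverage(spectrum, ann)[0] for ann in ({100.0, 300.0}, {100.0})]
print(median(per_spectrum))
```

Intensity coverage weights each peak by its abundance, which is why it can be much higher than coverage by peak count when the unexplained peaks are small.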

The rule set of the Expert System was derived from HCD data. However, HCD and CID appear to produce similar ion types, although with different abundances. We therefore tested whether the derived rule set was also applicable to high resolution CID data. This was indeed the case: a total of 85% of the ion current in high resolution CID spectra was explained by the Expert System, although in CID spectra a higher percentage (79%) of the peaks is already accounted for by standard ion types. Therefore we conclude that the Expert System can be used equally well for high resolution HCD and CID data, although the benefits for CID are not as large as they are for HCD.

Webserver for Expert System Annotation of Spectra—The Expert System is now part of the Viewer component of MaxQuant, which is freely available at www.maxquant.org. In this environment, the Expert System can annotate arbitrarily large data sets of identified peptides and visualize and export them in different graphical formats such as PDF. Additionally, we established a webserver to make the Expert System available to any proteomics scientist, regardless of the computational workflow that he or she is using. The webserver is located at http://www.biochem.mpg.de/mann/tools/ and its graphical interface is shown in Fig. 6. The user needs to supply a mass spectrum in the form of an m/z and peak intensity list as well as the sequence of the identified peptide (Figs. 6A, 6B). Common modifications and their positions in the sequence can also be specified. The webserver then provides an annotation of the spectrum within the stated mass tolerance, as shown in Fig. 6C. The graph is scalable to enable detailed study of complex fragmentation spectra. Mass deviations in ppm (calculated mass – measured mass) can also be depicted. The annotated spectrum can be downloaded in a number of graphical formats for use in publications.
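Before pasting a spectrum into a form such as the one in Fig. 6A, it can help to normalize the peak list locally. The exact input syntax accepted by the webserver is not specified here, so the following parser is only a sketch under an ASSUMED format: one peak per line, m/z first and intensity second, separated by whitespace, a comma, or a semicolon, with blank lines and lines starting with `#` ignored. The function name is hypothetical.

```python
import re


def parse_peak_list(text):
    """Parse pasted 'm/z intensity' lines into (mz, intensity) tuples sorted by m/z.

    ASSUMED format: separators are whitespace, ',' or ';'; blank lines and
    lines starting with '#' are skipped. Decimal commas are not supported.
    """
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = [f for f in re.split(r"[\s,;]+", line) if f]
        if len(fields) < 2:
            raise ValueError(f"expected 'm/z intensity', got: {line!r}")
        peaks.append((float(fields[0]), float(fields[1])))
    return sorted(peaks)


pasted = "147.1128; 80\n129.0659 100\n\n# comment line\n300.0,5\n"
print(parse_peak_list(pasted))
```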

CONCLUSION AND OUTLOOK

Here we have made use of Expert Systems, a well-known technology in computer science, to automatically but accurately interpret the fragmentation spectra of identified peptides. We have shown that the Expert System performs very well on high mass accuracy data, annotating the large majority of medium to high abundance peaks. For HCD spectra it explains on average 28 percentage points more of the peak intensity than the search engine results alone. We derived a rigorous false positive rate, ensuring that fewer than 5% of peaks are misannotated; this rate is even lower for spectra with at least median scores and for fragment ions of at least moderate abundance. The rule set was derived by iterative interpretation of a large HCD data set, but we show that the Expert System is equally applicable to high resolution CID spectra.

We envision different uses for the Expert System: For be-
ginners in MS-based proteomics, it enables efficient training
in the interpretation of MS/MS spectra without requiring much
input from a specialist. For advanced users, it allows focusing
on unusual and potentially novel types of fragments. One
caveat is that the Expert System currently cannot explain
fragment peaks that belong to cofragmented precursors; a
very common occurrence that we deliberately avoided here
by selecting only pure MS/MS spectra. This limitation can be
addressed if both precursors are identified and communi-
cated to the Expert System. Such a feature might be partic-
ularly useful for instruments that allow deliberate multiplexing
of precursors, which leads to complex MS/MS spectra (21).
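How annotation could be extended to cofragmented precursors once both peptides are identified can be sketched as follows. This is a hypothetical illustration, not a feature of the current Expert System. It generates only singly charged b and y ions from standard monoisotopic residue masses, ignores modifications and all other ion types, and reports every peptide whose fragment falls within an assumed 20 ppm tolerance, so that peaks explained by both precursors remain visibly ambiguous.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

def by_ions(sequence: str) -> list[tuple[str, float]]:
    """Singly charged b and y ions of an unmodified peptide."""
    masses = [RESIDUE[aa] for aa in sequence]
    ions = []
    for i in range(1, len(sequence)):
        ions.append((f"b{i}", sum(masses[:i]) + PROTON))
        ions.append((f"y{i}", sum(masses[-i:]) + WATER + PROTON))
    return ions

def annotate_chimeric(
    peaks: list[tuple[float, float]],                        # (measured m/z, intensity)
    fragments_by_peptide: dict[str, list[tuple[str, float]]],
    tol_ppm: float = 20.0,                                   # assumed tolerance
) -> list[tuple[float, float, list[tuple[str, str]]]]:
    """List every (peptide, fragment) whose calculated m/z lies within tol_ppm of each peak."""
    result = []
    for mz, intensity in peaks:
        matches = [
            (peptide, label)
            for peptide, fragments in fragments_by_peptide.items()
            for label, calc in fragments
            if abs(calc - mz) / calc * 1e6 <= tol_ppm
        ]
        result.append((mz, intensity, matches))
    return result
```

Keeping all matches, rather than only the closest, is a deliberate choice in this sketch: a peak shared by both precursors should be flagged as such rather than silently assigned to one of them.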

The Expert System has been in routine use in our laboratory
for a number of months. During this time we have found that
it provides helpful confirmation of the identification of the
peptide and the identity of the previously unlabeled fragment
ions. This is particularly welcome for complicated spectra of
important peptides, such as those regulated in the biological
process under study. Compared with a human expert, the
principal advantages of the Expert System are its speed, its
ability to apply all supplied rules consistently, and its
rigorously controlled false positive rate.
Obviously, the Expert System is limited to the knowledge
supplied whereas an experienced mass spectrometrist can go
beyond these rules and discover the origin of novel fragmen-
tation mechanisms.
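The consistency advantage mentioned above comes from separating the knowledge base (the rules) from the engine that applies them. A minimal Python sketch of that separation follows. The two neutral-loss rules and their residue conditions are simplified illustrations, not the paper's optimized rule set, and all names are hypothetical.

```python
from typing import Callable, Iterable, NamedTuple

class Fragment(NamedTuple):
    label: str      # e.g. "y3"
    mz: float       # singly charged m/z
    residues: str   # amino acids contained in the fragment

Rule = Callable[[Fragment], Iterable[Fragment]]

H2O = 18.010565   # monoisotopic mass of water
NH3 = 17.026549   # monoisotopic mass of ammonia

def water_loss(f: Fragment) -> Iterable[Fragment]:
    # Illustrative condition: only fragments containing S, T, E or D lose water.
    if any(aa in "STED" for aa in f.residues):
        yield Fragment(f"{f.label}-H2O", f.mz - H2O, f.residues)

def ammonia_loss(f: Fragment) -> Iterable[Fragment]:
    # Illustrative condition: only fragments containing R, K, N or Q lose ammonia.
    if any(aa in "RKNQ" for aa in f.residues):
        yield Fragment(f"{f.label}-NH3", f.mz - NH3, f.residues)

def apply_rules(base: list[Fragment], rules: list[Rule]) -> list[Fragment]:
    """Apply every rule to every base fragment in a fixed order, so that
    identical inputs always yield identical candidate lists."""
    candidates = list(base)
    for frag in base:
        for rule in rules:
            candidates.extend(rule(frag))
    return candidates
```

Adding a rule means appending a function to the list; the engine itself does not change. The same ease of extension is why each new rule should be checked against the false positive rate, since every additional candidate mass also gives random peaks another chance to be annotated.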

As we have shown here, Expert Systems can readily be
applied to problems in computational proteomics. Given their
relative ease of implementation, they may become useful in
other areas in MS-based proteomics, too.

Acknowledgments—We thank Forest White for critical comments
on this manuscript.

* This work was supported by funding from the European Union 7th Framework project PROSPECTS (Proteomics Specification in Time and Space, grant HEALTH-F4-2008-201645).

□S This article contains supplemental Table S1.
¶ These authors contributed equally.
§ To whom correspondence should be addressed: Department of Proteomics and Signal Transduction, Max-Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany. E-mail: mmann@biochem.mpg.de.

REFERENCES

1. Steen, H., and Mann, M. (2004) The ABC’s (and XYZ’s) of peptide sequencing. Nat. Rev. Mol. Cell Biol. 5, 699–711

2. Nesvizhskii, A. I., Vitek, O., and Aebersold, R. (2007) Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat. Methods 4, 787–797

3. Granholm, V., and Käll, L. (2011) Quality assessments of peptide-spectrum matches in shotgun proteomics. Proteomics 11, 1086–1093

4. Houel, S., Abernathy, R., Renganathan, K., Meyer-Arendt, K., Ahn, N. G., and Old, W. M. (2010) Quantifying the impact of chimera MS/MS spectra on peptide identification in large-scale proteomics studies. J. Proteome Res. 9, 4152–4160

5. Zhang, N., Li, X. J., Ye, M., Pan, S., Schwikowski, B., and Aebersold, R. (2005) ProbIDtree: an automated software program capable of identifying multiple peptides from a single collision-induced dissociation spectrum collected by a tandem mass spectrometer. Proteomics 5, 4096–4106

6. Bern, M., Finney, G., Hoopmann, M. R., Merrihew, G., Toth, M. J., and MacCoss, M. J. (2010) Deconvolution of mixture spectra from ion-trap data-independent-acquisition tandem mass spectrometry. Anal. Chem. 82, 833–841

7. Michalski, A., Cox, J., and Mann, M. (2011) More than 100,000 detectable peptide species elute in single shotgun proteomics runs but the majority is inaccessible to data-dependent LC-MS/MS. J. Proteome Res. 10, 1785–1793

8. Cox, J., and Mann, M. (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372

9. Cox, J., Neuhauser, N., Michalski, A., Scheltema, R. A., Olsen, J. V., and Mann, M. (2011) Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10, 1794–1805

10. Giarratano, J. C., and Riley, G. (2005) Expert systems: principles and programming. PWS Pub. Co., Boston

11. Liao, S. H. (2005) Expert system methodologies and applications - a decade review from 1995 to 2004. Expert Syst. Appl. 28, 93–103

12. Schroll, G., Duffield, A. M., Djerassi, C., Buchanan, B. G., Sutherland, G. L., Feigenbaum, E. A., and Lederberg, J. (1969) Applications of artificial intelligence for chemical inference. III. Aliphatic ethers diagnosed by their low-resolution mass spectra and nuclear magnetic resonance data. J. Am. Chem. Soc. 91, 7440–7445

13. Russell, S. J., Norvig, P., and Davis, E. (2010) Artificial intelligence: a modern approach. Prentice Hall, Upper Saddle River, NJ

14. Olsen, J. V., Macek, B., Lange, O., Makarov, A., Horning, S., and Mann, M. (2007) Higher-energy C-trap dissociation for peptide modification analysis. Nat. Methods 4, 709–712

15. Ma, B., and Johnson, R. (2012) De novo sequencing and homology searching. Mol. Cell. Proteomics 11, O111.014902

16. Shevchenko, A., Tomas, H., Havlis, J., Olsen, J. V., and Mann, M. (2006) In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat. Protoc. 1, 2856–2860

17. Olsen, J. V., Schwartz, J. C., Griep-Raming, J., Nielsen, M. L., Damoc, E., Denisov, E., Lange, O., Remes, P., Taylor, D., Splendore, M., Wouters, E. R., Senko, M., Makarov, A., Mann, M., and Horning, S. (2009) A dual pressure linear ion trap Orbitrap instrument with very high sequencing speed. Mol. Cell. Proteomics 8, 2759–2769

18. Michalski, A., Damoc, E., Lange, O., Denisov, E., Nolting, D., Muller, M., Viner, R., Schwartz, J., Remes, P., Belford, M., Dunyach, J. J., Cox, J., Horning, S., Mann, M., and Makarov, A. (2012) Ultra high resolution linear ion trap Orbitrap mass spectrometer (Orbitrap Elite) facilitates top down LC MS/MS and versatile peptide fragmentation modes. Mol. Cell. Proteomics 11, 10.1074/mcp.O111.013698

19. Boersema, P. J., Mohammed, S., and Heck, A. J. (2009) Phosphopeptide fragmentation and analysis by mass spectrometry. J. Mass Spectrom. 44, 861–878

20. Kelstrup, C. D., Hekmat, O., Francavilla, C., and Olsen, J. V. (2011) Pinpointing phosphorylation sites: Quantitative filtering and a novel site-specific x-ion fragment. J. Proteome Res. 10, 2937–2948

21. Michalski, A., Damoc, E., Hauschild, J. P., Lange, O., Wieghaus, A., Makarov, A., Nagaraj, N., Cox, J., Mann, M., and Horning, S. (2011) Mass spectrometry-based proteomics using Q Exactive, a high-performance benchtop quadrupole Orbitrap mass spectrometer. Mol. Cell. Proteomics 10, 10.1074/mcp.M111.011015

Expert System for Annotation of MS/MS Spectra

Molecular & Cellular Proteomics 11.11 1509

http://www.mcponline.org/cgi/content/full/M112.020271/DC1