In Search of Computer Music Analysis: Music Information Retrieval, Optimization, and Machine Learning from 2000-2016
Felicia Nafeeza Persaud
Thesis submitted to the
Faculty of Graduate and Postdoctoral Studies
In partial fulfillment of the requirements
For the MA degree in Music
Department of Music
Faculty of Arts
University of Ottawa
© Felicia Nafeeza Persaud, Ottawa, Canada, 2018
Table of Contents
Abstract ........................................................................................................................................ vii
Acknowledgements .................................................................................................................... viii
Glossary ......................................................................................................................................... ix
Chapter 1- Introduction and Literature Review........................................................................ 1
1.1.1 General Mission Statement 2
1.1.2 A Critical Overview of Computer Music Analysis: Music Information Retrieval,
Optimization, and Machine Learning ...................................................................................... 3
1.1.3 Persaud’s Five Critical Issues ......................................................................................... 4
1.2 A Sketch of the Relationship between Computers and Music 9
1.2.1 Composition and Performance ....................................................................................... 9
1.2.2 Applications in Music Theory and Analysis ................................................................ 12
1.2.2.1 Recurrent features: Databases ................................................................................... 12
1.2.2.2 Structural Models: Analysis and Counterpoint ......................................................... 14
1.2.3 Music Information Retrieval Versus Optimization ...................................................... 15
1.3 Literature Review 17
1.3.1 David Temperley The Cognition of Basic Musical Structures (2001) ......................... 17
1.3.2 David Temperley and Christopher Bartlette “Parallelism as a Factor in Metrical
Analysis” (2002) .................................................................................................................... 19
1.3.3 David Temperley Music and Probability (2007) ......................................................... 20
1.3.4 David Huron “Tone and Voice: A Derivation of the Rules of Voice-Leading from
Perceptual Principles” (2001) ................................................................................................ 21
1.3.5 Darrell Conklin and Ian H. Witten “Multiple Viewpoint Systems for Music Prediction”
(1995)..................................................................................................................................... 22
1.4 Conclusion 23
Chapter 2- Music Information Retrieval .................................................................................. 25
2.1 Introduction 25
2.1.1 MIR Overview and Applications .................................................................................. 26
2.2 The MIR Tools 30
2.2.1 VocalSearch .................................................................................................................. 30
2.2.2 SIMSSA ........................................................................................................................ 32
2.2.3 Donnelly and Sheppard Bayesian Network Algorithm ................................................ 37
2.3 Critical Analysis 38
2.3.1 VocalSearch .................................................................................................................. 38
2.3.2 SIMSSA ........................................................................................................................ 39
2.3.3 Bayesian Networks ....................................................................................................... 41
Chapter 3-Optimization ............................................................................................................. 42
3.1 Preference Rules 45
3.1.1 Metrical Structure ......................................................................................................... 45
3.1.2 Contrapuntal Structure .................................................................................................. 49
3.1.3 Tonal-Pitch Class Representation and Harmonic Structure ......................................... 51
3.1.4 Melodic Phrase Structure.............................................................................................. 53
3.1.5 Parallelism .................................................................................................................... 54
3.2 Probabilistic and Statistical models 55
3.2.1 Introduction .................................................................................................................. 55
3.2.2 David Temperley’s use of Bayesian Probability .......................................................... 57
3.2.3 Statistics and Harmonic Vectors ................................................................................... 61
3.2.4 Distinctive Patterns using Bioinformatics and Probability ........................................... 63
3.3 Critical Analysis: Optimization 65
3.3.1 Preference rules: Metrical Structure ............................................................................. 66
3.3.2 Preference Rules: Counterpoint .................................................................................... 66
3.3.3 Preference Rules: Tonal-Class representation and Harmony ....................................... 67
3.3.4 Melodic Phrase Structure and Parallelism .................................................................... 68
3.3.5 Probability and Statistics .............................................................................................. 69
Chapter 4-Machine Learning .................................................................................................... 71
4.1 Introduction to Machine Learning 71
4.2 Outline of Selected Tools 72
4.2.1 Ornamentation in Jazz Guitar ....................................................................................... 72
4.2.2 Melodic Analysis with segment classes ....................................................................... 73
4.2.3 Chord sequence generation with semiotic patterns ...................................................... 74
4.2.4 Analysis of analysis ...................................................................................................... 75
4.3 Summary 78
Chapter 5- Conclusion ................................................................................................................ 80
5.1 Further Temperley Research and Probability 80
5.2 Machine Learning as a means to an end 81
5.3 CompMusic as an example of Intersection 82
5.4 Five general areas for improvement in the field 83
5.5 Persaud’s Five Critical Issues with Solutions 86
Bibliography ................................................................................................................................ 89
Table of Figures
Figure 1 Graphic representation of the five critical issues ............................................................................ 8
Figure 2 Graphic representation of MIR ..................................................................................... 29
Figure 3 Graphic Representation of Optimization ..................................................................................... 44
Figure 4 Beat Hierarchy .............................................................................................................................. 46
Figure 5 Graphic of five critical issues with solutions ................................................................................ 87
Abstract
My thesis aims to critically examine three methods in the current state of Computer Music
Analysis. I will concentrate on Music Information Retrieval, Optimization, and Machine
Learning. My goal is to describe and critically analyze each method, then examine the
intersection of all three. I will start by looking at David Temperley’s The Cognition of Basic
Musical Structures (2001) which offers an outline of major accomplishments before the turn of
the 21st century. This outline will provide a method of organization for a large portion of the
thesis. I will conclude by explaining the most recent developments in terms of the three methods
cited. Following trends in these developments, I can hypothesize the direction of the field.
Acknowledgements
I have appreciated all the help I have had in this thesis writing process. From professors,
to friends, to family, everyone deserves a thank you.
Firstly, I must thank my thesis supervisor Dr. P. Murray Dineen who has guided me
throughout this process. His feedback and support have helped me immensely to improve as a
writer. I am grateful that Dr. Dineen has helped me to gain invaluable skills over the last two
years in my Master of Arts.
I would also like to thank my committee members, Dr. Roxanne Prevost and Dr. Jada
Watson, who have provided amazing feedback and discussion. They have helped me greatly in
creating the final thesis. I would like to thank Dr. Julie Pedneault-Deslauriers as well for serving
as a member of the committee for the thesis proposal.
I am grateful to the rest of my professors and colleagues at the University of Ottawa for everything I have learned there. It has helped to guide me in creating this thesis and in improving myself.
My friends and family also deserve a thank you for going through sections and drafts
throughout this process. A special thank you to my dad, sister and fiancé who went through my
first draft. It has come a long way since then.
Glossary
Algorithm: a set of steps followed in calculations or problem-solving operations to achieve
some end result.
Computer Music Analysis: analysis of music using computer software or algorithms. This is a ‘catch-all’ term referring to all of the smaller fields that use computers for music analysis, including Music Information Retrieval (MIR), Optimization, and Machine Learning. According to a 2016 book entitled Computational Music Analysis, edited by David Meredith, a general definition is “using mathematics and computing to advance our understanding of music […] and how music is understood” (Meredith 2016).
Machine Learning: the teaching of a computer to analyse data and find features, so as to gain knowledge of musical conventions. Machine Learning is a route that runs parallel to MIR, Preference Rule Systems (PRSs), and probabilistic models. Like a human learner, “a computer learns to perform a task by studying a training set of examples” (Louridas and Ebert 2016). A different set of examples is then given, and the effectiveness is measured in several ways depending on the task.
Music Information Retrieval (MIR): research concerned with making all aspects of a music file (melody, instrumentation, form, etc.) searchable. The long-term goal of MIR research is a search engine for music.
Optimization: a term from calculus and business that refers to maximizing the use of space or resources. Resources remain important in the musical sense, but here they refer to time and energy, saved through accessibility and more efficient computer tools and algorithms. Examples are given below to show that it is possible to optimize analysis by integrating more mathematics and computer tools.
Piano-roll input: a graphic representation of a score with notes on the vertical axis and timing in milliseconds on the horizontal.
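As a concrete illustration of this entry (my own sketch in Python, not drawn from any tool discussed in the thesis), a piano roll can be stored as (onset, duration, pitch) triples, with pitches given as MIDI numbers (60 = middle C), and rendered onto a time grid:

```python
# A minimal piano-roll sketch: each note is (onset ms, duration ms, MIDI pitch).
notes = [
    (0,    500,  60),   # C4 starting at 0 ms
    (500,  500,  64),   # E4
    (1000, 1000, 67),   # G4
]

def to_grid(notes, step_ms=250):
    """Render the notes onto a time grid: one entry per time step,
    holding the set of pitches sounding at that step."""
    end = max(onset + dur for onset, dur, _ in notes)
    grid = []
    for t in range(0, end, step_ms):
        sounding = {p for onset, dur, p in notes if onset <= t < onset + dur}
        grid.append(sounding)
    return grid

print(to_grid(notes))
# [{60}, {60}, {64}, {64}, {67}, {67}, {67}, {67}]
```

Varying step_ms changes the temporal resolution of the grid, a parameter that must be calibrated by the researcher.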
Preference rule system (PRS): a hierarchical set of instructions for a computer. Multiple rule sets, themselves arranged in a hierarchy, can be combined into “criteria for evaluating possible analysis of a piece” (Preface, Temperley and Bartlette 2002). This is known as a rule-based grammar in Manning et al 2001.
Parallelism rule (as a type of preference rule): the idea that the similar construction of a musical element be regarded as important in a PRS: “Prefer beat intervals of a certain distance to the extent that repetition occurs at that distance in the vicinity” (Temperley and Bartlette 2002, 134).
Probabilistic Methods: methods of analysis based in probability. “Probabilistic” means based on or adapted to a theory of probability; the term encompasses even distant uses of probability in computer models. Temperley uses it to refer to any computational method that uses probability.
Chapter 1- Introduction and Literature Review
1.1 Overview
My interest in Computer Music Analysis stems from my fascination with
interdisciplinarity in music analysis. Computer Music Analysis intersects with mathematics,
computer science, psychology, and, of course, music. My thesis will take a small sampling of
interdisciplinary tools in Computer Music Analysis from Music Information Retrieval (MIR),
Optimization, and Machine Learning. MIR aims to make music searchable, primarily through
online databases. Optimization encompasses many different tools with the eventual goal of understanding human perception of music. Machine Learning, on the other hand, teaches the
machine, often a computer, to perform a task, making the tool itself the end goal.
For this thesis, I preface my work with Peter Manning’s entry, entitled “Computers and
Music,” in the Grove Dictionary of Music and Musicians as a way to understand the existing
conventions and uses of computers in music prior to the year 2000. Manning does not offer a
specific definition, but instead discusses the common uses and devices of the computer as it
relates to music. He states, “Computers have been used for all manner of applications, from the
synthesis of new sounds and the analysis of music in notated form to desktop music publishing
and studies in music psychology; from analysing the ways in which we respond to musical
stimuli to the processes of music performance itself.” (Manning et al 2001). This quote
exemplifies how interdisciplinary Computer Music Analysis is. Manning’s work touches on composition, performance, and analysis, and it raises a key critical issue: human error. No matter the application, a computer is only as useful as its human programmer. With every new application of the computer, or tool, there are more issues and limitations. For example, a tool
that identifies duple metrical structures cannot identify compound meter and has a margin of
error. The idea of the human creation of a computer model, and its limitations, is the focus of my
thesis and is explored in three branches of Computer Music Analysis: Music Information
Retrieval (MIR), Optimization, and Machine Learning. Manning’s entry coupled with the
Literature Review provide a foundation on which I build this thesis.
1.1.1 General Mission Statement
This thesis aims to critically examine specific tools in Music Information Retrieval
(MIR), Optimization (a term referring to improvements in Preference Rule Systems and Probabilistic Models), and Machine Learning individually. The exploration of MIR,
Optimization, and Machine Learning will do two things: act as a survey of the literature and
show trends within these subfields. In the conclusion, I show how the three aspects can interact.
Most branches in Computer Music Analysis run in parallel (Meredith 2016), and few researchers
take inspiration from the parallel branches. It is not my intent to show that there is no interaction,
but merely to show opportunities for more interaction.
To survey the literature, I first look at developments prior to the turn of the 21st century, the period when the field of Computer Music Analysis was born. The background comes primarily from David Temperley’s book The Cognition of Basic Musical Structures (2001)
as well as from works covered in the Literature Review and the Sketch of the Computer-Music
Relationship sections. To explore current trends, I restrict myself primarily to the literature from
2000 to 2016. These texts build from the turn of the century and show how researchers utilize
new technology to push the field further. This area constitutes the body of the thesis and shows
where the field has gone and where it is going. Additionally, using a critical examination of the
literature, I explore recent trends in Computer Music Analysis and offer points of entry for new
research. I use models drawn from World Music Research. I concentrate on the three areas of the
field, as mentioned above. This research can be applied to other similar areas like Mathematical
Music Theory, which represents basic musical structures in a mathematical form, or
Computational Musicology, which investigates the simulation of computer models in music.
1.1.2 A Critical Overview of Computer Music Analysis: Music Information Retrieval,
Optimization, and Machine Learning
The current state of the field in Computer Music Analysis sees a shifting of positions
among the three areas: Music Information Retrieval (MIR), Machine Learning, and
Optimization. Music Information Retrieval is the most rapidly evolving field of the three, due in large part to developments in and the spread of computers and the Internet, specifically an increase in computing capacity. The second field is Machine Learning; its growth is similarly due to computing capacity and the Internet, but also to its widespread use in other disciplines, on which music researchers are drawing to an ever greater extent. The third field is Optimization, which has stagnated. However, Optimization borrows from other disciplines and contributes to the advances made by MIR and Machine Learning. As such, we can see that Optimization is still evolving, even if the other two fields are moving at a much greater pace.
To sketch in greater detail, there are crucial differences and overlapping areas between
the three fields that explain their current situations. Machine Learning is a precise endeavor that
aims to create specific tools to meet well-defined goals or serve finite tasks. MIR, on the other
hand, works with large bodies of data and serves goals that are often ill-defined if not undefined.
Conversely, Optimization is presently in a state of coming together, in fields other than music, and, therefore, would appear not to be advancing as quickly. But, in fact, Optimization in its
current state is laying a framework for major developments.
Though there is overlap between MIR, Optimization, and Machine Learning, it is limited
to a few researchers and projects. Examples include the following: Darrell Conklin using
probability and bioinformatics in conjunction with Machine Learning; Giraud et al, who are
creating a tool for MIR and Optimization; and, most notably, CompMusic—a database for six
subsets of World Music—that uses both Optimization and Machine Learning to create an MIR
database. These will be discussed in the later parts of the thesis.
1.1.3 Persaud’s Five Critical Issues
From the critical perspective adopted in this thesis, several issues arise, some of which have been addressed in the literature surveyed. Unfortunately, they have not been brought
together in such a fashion to yield an overall critical perspective of the current field. To this end,
I have isolated five central critical issues, which I address here. During the remainder of the
thesis, I make reference to these from time to time, by means of a numbered list set out below
and in Fig. 1. I refer to these as Persaud Critical Issues since, to my knowledge, they have not
been catalogued in this fashion.
Persaud’s Critical Issue 1. Human Error.
Firstly, data entry is still largely human-dependent, and with large amounts of data, as in an MIR database, a person will often make mistakes. This was discussed both by Peter Manning in his definition and by David Huron with respect to the Humdrum Toolkit. As Huron and Manning
explain, the machine is limited by the programmers themselves. Outside of research, artificial
intelligence (AI) is being used to complete simple tasks and can learn, by itself, various other
tasks. Similarly, quantum computers, which move beyond simple binary code, are becoming more common. Both of these technologies are making their way into day-to-day life and eventually will end up
in multidisciplinary research. In terms of what is being used currently, data entry could be
improved by the application of Machine Learning. Certain parameters could be handled by
machine input rather than human input. These advances are being made elsewhere but have not
been seen in the area of music research, except in world music database creation [see conclusion
of the thesis]. We need to see more inroads made by Machine Learning in the analysis of
Western music and Ancient music.
Human limitations are not only evident in data entry but also in setting parameters, in
annotations, and in the creation of algorithms in general (Huron 1988). Setting parameters is a
vital aspect of Optimization. It enables the most accurate analysis of the data provided and, therefore, generates more accurate outcomes. Because the parameters are calibrated by humans, there is an implicit limitation. This is similar to the annotation of pieces in MIR databases and the
creation and application of an algorithm in Machine Learning.
Persaud’s Critical Issue 2. Input Specification
Input modes are often not defined clearly enough by researchers to be easily understood. To a certain
extent, this is a problem of writing and communication, one that arises from research silos. This
could be resolved by creating common standards and modes of discourse for describing
computer research in music, and specifically the modes of input involved. Complementary to
input specification problems arising from research silos, input modes change from genre to genre. For
example, popular music is not often scored, while ancient music is not performed in its original
form. As such, the input for popular music would most likely be an audio file, while for ancient
music, an image of a score is more likely. Furthermore, the input could differ from a full form,
such as all tracks on a song, to a simpler form, such as main melody only. This further
complicates the situation.
In addition to genre, input modes depend upon translation into computer compatible
formats. Though the MP3 audio format is widely available, it is not easily readable for analytical use. As a workaround, researchers either use a MIDI format or break the input down further into tracks. In the study of ancient music, image data cannot be read directly by a computer and must undergo multiple passes of analysis using computer-based algorithms and processes, but this
method still yields errors.
Persaud’s Critical Issue 3. No Consistent Mode of Evaluation for Non-MIR Tools
The Music Information Retrieval Evaluation eXchange (MIREX) is a method of formally evaluating MIR systems and algorithms. Nothing comparable exists for other branches of Computer Music Analysis such as Optimization and Machine Learning. This absence of standards for algorithms and tools results in end-products that may have no further use beyond their creation. Furthermore, without widespread knowledge of the tools and algorithms, they cannot be used for MIR or other branches of Computer Music Analysis simply because they remain unknown.
Persaud’s Critical Issue 4. The Interdisciplinary Problem (Downie 2003)
The Interdisciplinary Problem is one that is examined and discussed by Stephen J.
Downie in his article “Music Information Retrieval.” Though this is an issue in MIR specifically,
it extends to other branches of Computer Music Analysis such as Optimization and Machine
Learning. It simply refers to the lack of coordination between researchers and research fields
when it comes to creating a tool, and to the different uses of the same terminology. Some tools and systems are overly difficult to use for someone without programming knowledge, even though
the outcomes of the tool would be useful to them.
Persaud’s Critical Issue 5. “What’s the point?” Lack of Defined Goals and Frameworks
Research in Computer Music Analysis often comes as small creations and discoveries
rather than as a large finished tool. As Computer Music Analysis often concentrates on the method leading to an output, these smaller steps cannot be used by another researcher until the whole is completed. Furthermore, the specific usage of an individual step is unknown or has very few applications, if any, so the “What’s the point?” argument returns. This argument also fails to take into account the full potential of each field, and it stems from a lack of understanding of the goals of each
branch in Computer Music Analysis.
Figure 1 Graphic representation of the five critical issues
1. Human Error: data entry; human limitations
2. Input Specification: undefined; varies by genre; computer compatibility
3. Consistent Evaluative Principles: lacking for branches other than MIR
4. The Interdisciplinary Problem: lack of coordination; terms used differently
5. “What’s the point?”: undefined goals and framework
1.2 A Sketch of the Relationship between Computers and Music
1.2.1 Composition and Performance
Music and computers have a lengthy history that touches on three fields: composition,
performance, and music research. To understand the current state of Computer Music Analysis,
the history needs to be discussed. In fundamental terms, the above-mentioned disciplines helped
shape Computer Music Analysis.
In terms of composition, computer music was one of the principal areas of early research.
One main source for understanding this research was the Computer Music Journal, founded in
1977. This journal examines crossroads between computers and music such as composition with
computers, MIDI, synthesizer theory, and analytical models using the computer (Computer
Music Journal). Though the material is broad, there have been specific issues that address
analytical models included in this thesis. This publication includes articles about CompMusic—
an organization committed to database creation for World Music—, which I will return to in my
conclusion. The publication also includes Donnelly and Sheppard’s “Classification of Timbre
Using Bayesian Networks” which is one of the few instances of cross-branch research.
While the original inroads made into computer music composition were slow and
burdened by clumsy and awkward hardware, this situation soon changed. Curtis Roads is a
composer of electronic music and an author. His 1985 book, Composers and the Computer, is
interview-based, presenting the composers’ perspectives. According to Appleton’s review, Roads’s main point is that art and science are drawing closer together to create new music (Appleton 1986). Furthermore, Appleton explains that understanding the means of music creation and the method of computer usage is vital for listening to computer music compositions: “If […]
the principles of serial technique are necessary to an intelligent hearing of the works of Webern,
Carter, Babbitt, or Boulez, then surely an appreciation of the principles of algorithmic
compositional techniques and the possibilities of digital sound synthesis are required for the thorough audition of works by Xenakis, Chowning, Risset, and Dodge” (Appleton 1986, 124).
This quote situates the importance of method in music and how the new computer capabilities
enhance the composition process.
In 1986, a symposium on computer music composition was held and a review was
written in the Computer Music Journal. This symposium was a “product of a questionnaire sent
in 1982, 1983, and 1984, to over 30 composers experienced in the computer medium” (Roads et al 1986, 40). The review examines, in a manner similar to Roads’s book, what brought the composer to
the computer and how they choose to use it. The review states that “articles in Computer Music
Journal and other publications point to the broad application of computers in musical tasks,
especially to sound synthesis, live performance, and algorithmic or procedural composition”
(Roads et al 1986, 40).
Music Representation Languages (MRLs) are another important milestone in the history
of Computer Music Analysis. An MRL is a type of format that the computer can understand
(Downie 2003), and these are vital to composition. An example of this is Musical Instrument
Digital Interface commonly known as MIDI. MIDI revolutionized sound processing by enabling
the user to store real input, such as playing on a synthesizer, into movable and changeable blocks
of sound easily understood by the computer. It offers two-way flexibility: a player performs on an external synthesizer, and the producer can then move and change the resulting blocks of sound after the fact (Manning et al 2001). It gives all parties more control over the end result, and MIDI is now widely used.
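The “movable and changeable blocks of sound” can be pictured at the byte level. The sketch below uses raw MIDI 1.0 channel messages (note-on status 0x90 and note-off status 0x80, per the MIDI specification); the track layout and helper function are my own illustration, and practical work would normally go through a library such as mido:

```python
NOTE_ON, NOTE_OFF = 0x90, 0x80

def note_event(status, channel, pitch, velocity):
    """Build one three-byte MIDI channel message."""
    return bytes([status | channel, pitch, velocity])

# A captured performance as editable events: (delta-time in ticks, message).
track = [
    (0,   note_event(NOTE_ON,  0, 60, 100)),  # C4 pressed
    (480, note_event(NOTE_OFF, 0, 60, 0)),    # C4 released one beat later
]

# Because the data is symbolic, the producer can edit it after the player
# has played -- here, transposing every note up a whole tone:
transposed = [(t, bytes([m[0], m[1] + 2, m[2]])) for t, m in track]
print(transposed[0][1][1])  # prints 62 (D4)
```

Editing the symbolic events, rather than a recorded waveform, is precisely the two-way flexibility described above.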
Another significant creation in computer music composition is music notation software.
Software of this kind, such as Finale, often includes MIDI playback. According to Manning, “it quickly
became apparent that major composition and performance possibilities could be opened up by
extending MIDI control facilities to personal computers” (Manning et al 2001, 169). This new
MIDI playback on music notation software gave the composer the ability to create music
digitally with the option to hear what it would sound like.
Computer music composition, of course, continues today. Recent developments include
ChucK, a programming language designed specifically for music and prevalent in laptop orchestra use (Wang et al 2015), and melodic idea generation and evaluation, that is, the creation of a motive and its assessment (Ponce de Leon et al 2016). Both tools are used for the creation
of musical ideas. ChucK, for example, can create a complete piece in real time. Though
computer music composition is important to the relationship between computers and music, it
will not be further discussed in this thesis. The field of Computer Music Analysis has, at this point, moved far enough away from composition to be treated as a separate endeavour.
It should be noted that composition with computers is only one aspect of computer assisted
musical creation. According to Manning’s “Computers and Music”, the uses of computers in
music can be separated into two branches: performance and music theory. For performance,
MIDI is highlighted as a major development, but more performer-oriented methods are being developed, such as DARMS (Manning et al 2001). DARMS is a “comprehensive coding system
[…] which has the capacity to handle almost every conceivable score detail” (Manning et al
2001, 176). In current performance, laptop orchestras are becoming more prevalent at universities. Though computer use in performance is important, I will not be concentrating on it.
1.2.2 Applications in Music Theory and Analysis
Music research uses for computers are more complex and have been based around two facets:
1. The first is identification of recurrent features. Recurrent features are an important aspect
of analysis, as they can show that a set of items is a pattern rather than a coincidence. “One
of the earliest uses of the computer as a tool for analysis […] involves the identification
of recurrent features that can usually be subjected to statistical analysis.” (Manning et al
2001, 174). Statistical analysis further strengthens a pattern by utilizing quantitative
measures. Statistical analysis is still present today and will be discussed in Chapter 3.
2. The second concerns the application of two kinds of “rule-based analysis”: analysis used for generative purposes, and analysis used in and of itself as an analytic method.
As Manning describes rule-based analysis in general: “rule-based analysis methods
presuppose that the processes if composition are bound by underlying structural
principles that can be described in algorithmic terms. […] At this level it becomes
possible to establish links with computer-based research into musical meaning” (Manning
et al, 174).
Now I will present examples of both facets. Both show, in a simple fashion, the above two ideas
and also demonstrate the main sources of error and limitation in Computer Music Analysis.
1.2.2.1 Recurrent features: Databases
A major piece of database software for computer music research is the Humdrum Toolkit,
created by David Huron; its files were last revised in 2001. Huron is based at the Ohio State
University School of Music and commonly researches music cognition, computational
musicology, and systematic musicology. The Humdrum Toolkit runs using UNIX software
tools, but it is compatible with previous versions of the Windows and Mac platforms. This database
gives the public access to information on scores, and re-notates scores in a format that is usable
with the Humdrum Toolkit. It is also possible to import or export files from the Finale software for
scores that are not available in the database. “Humdrum” itself is composed of the Humdrum
Syntax and the Humdrum Toolkit. The syntax, like other programming languages, enables the user to
search for files and other elements using the Humdrum Toolkit. This programming language,
however, must be learned to use the software adequately.
The Humdrum Toolkit is used for recurrent features because of its capabilities, which
include searching between sets of pieces for motives, syncopation,
harmonic progression, dynamics, pitch, and meter. These elements of music can be searched by
genre, by composer, and by any other grouping for an overarching and statistical analysis;
this use for computers in music therefore aligns with Manning’s definition in Grove. However,
some of the above-mentioned elements are more easily found using the Humdrum Toolkit
software than others. Firstly, this is due to “the interdisciplinary problem,” since some queries
need a complex search using a programming language, and programming knowledge is
not consistent between all database users. Secondly, human error is always a possibility with a
completely human-made database. Like all tools, this one is imperfect. Huron identified three
sources of error when using Humdrum (Huron 1988). They are as follows:
1. Errors in actual score
2. Errors in transcription of score
3. Errors by program
These errors, according to Huron, are human (paraphrased from Huron 1988, 254).
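The kind of recurrent-feature search described above can be illustrated with a short Python sketch (in Python rather than Humdrum syntax, and with invented pitch data): counting how often a given melodic-interval pattern occurs across a set of pieces encoded as MIDI pitch numbers.

```python
# Illustrative recurrent-feature search: count occurrences of an interval
# pattern across pieces. The pieces and the motive are invented examples;
# this is not Humdrum's actual syntax or implementation.

pieces = {
    "Piece 1": [60, 62, 64, 60, 62, 64, 65],
    "Piece 2": [55, 57, 59, 62, 60, 58],
}
motive = [2, 2]  # the interval pattern to search for (two ascending whole steps)

def count_pattern(pitches, pattern):
    # Reduce the pitch list to successive intervals, then slide a window.
    iv = [b - a for a, b in zip(pitches, pitches[1:])]
    return sum(iv[i:i + len(pattern)] == pattern
               for i in range(len(iv) - len(pattern) + 1))

counts = {name: count_pattern(p, motive) for name, p in pieces.items()}
print(counts)  # → {'Piece 1': 2, 'Piece 2': 1}
```

A count well above chance across a repertoire is the kind of quantitative evidence that distinguishes a pattern from a coincidence.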
1.2.2.2 Structural Models: Analysis and Counterpoint
In 1978, P. Howard Patrick used computers for the analysis of suspensions in the Masses of
Josquin Des Prez. Patrick made an important distinction between music theory for the
composition student and music theory for the computer rule-based structural model: music
theory is often a description, but a computer needs a set of steps to follow. To get the computer
to parse and identify the data properly, Patrick looked at the errors and changed criteria as
needed (Alphonce 1988).
Arthur Mendel inspired Patrick’s study in a seminar by looking for the criteria of
structure in Josquin’s work. Patrick outlined the goal of this project as getting computer
programs to print a reduction of a score by, first, going through a succession of tests and then
finding the “most consonant pitch” (Patrick 1974, 325). Patrick tested three randomly selected
texts to outline the problems that he described as “Non-Suspensions” (Patrick 1974, 326) and
“Problem Suspensions” (Patrick 1974, 328). These errors were due to the computer’s now
‘preconceived’ notion of what a suspension is, but the largest source of error, as explained by
Patrick, is the questions that people ask the computer.
A criticism of this type of analysis is that it only yields results that could be found by a
person doing the research by hand, and it is thus susceptible to the same kinds of errors humans
might make. As stated by Patrick, “The limitations of the computer are overshadowed by the
inherent limitations of the user” (Patrick 1974, 321). This means that the computer can find any
solution, but only if that solution can be fathomed by the user. Some larger-scale problems are too difficult
to solve without help from another source, such as a computer. In this sense, Patrick thought the
computer-aided analysis route was the most useful. This set the groundwork for developments in
Computer Music Analysis that do not mimic “research by hand.”
1.2.3 Music Information Retrieval Versus Optimization
Music Information Retrieval (MIR) is interdisciplinary, due to its computer-based
information, and originated from the same point as Optimization, but the two fields have
different goals.
By Music Information Retrieval, I mean the sector of Computer Music Analysis that aims to
create a database, either analytical or non-analytical, drawn from characteristics of a musical
document such as a score, so as to further research. MIR aims to look into musical documents to
find features or commonalities between different works of music. MIR approaches recurrent
features by creating a database with annotations, or another searchable method, so a user can
search for a specific feature.
Optimization, which concerns itself with preference rules, probability, and statistical models,
does not detach itself from the human experience. The following quotation demonstrates the
distinctiveness of Optimization from MIR: “Computational research in music cognition tends to
focus on models of the human mind, whereas MIR prefers the best‐performing models regardless
of their cognitive plausibility” (Burgoyne et al 2016, 214). In summary, Optimization is tied to
music cognition (Burgoyne et al 2016) while MIR is not.
MIR has turned into an ever-growing and prevalent field due to the internet (Fujinaga and
Weiss) and is present in commonly used items like Google Books (Helsen et al 2014), but it
originally came from a comparatively small field of research. According to Burgoyne et al, in
1907, C.S. Myers studied Western folksong using an MIR-like method, which required tabulating
by hand the intervals present in folksongs. A year earlier, in 1906, a similar method was used in
ethnomusicology to find features in non-Western music that differentiate it from Western
music (Burgoyne et al 2016). The practice of “finding features” has become a standard use for
Computer Music Analysis. These are the earliest examples of Music Information Retrieval, even
though the term itself was not used until the 1960s.
From 1907 to the 1960s, Music Information Retrieval was largely dormant, but then “interest grew in
computerized analysis of music” (Burgoyne et al, 215) because of the prevalence and
accessibility of computers. Early MIR concentrated on methods to input music into
the computer (Burgoyne et al), such as notation software or standardized file formats like
MP3 and MIDI (Fujinaga and Weiss). This made it possible for the computer to ‘understand’ the
musical items. These methods grew into more complex software applications like Humdrum,
which was discussed in section 1.2.2.1.
This history of MIR is brief, but it gives a basic outline of the developments that are
important to this thesis. Since this field re-emerged because of the internet and the increasing
availability of computers, the tabulations could be done using software instead of by hand.
After creating a form of music that can be understood by a computer, databases like Humdrum
were more easily produced. Creating a database of music recognizable by a computer, according
to Andrew Hankinson—a digital humanities and medieval music researcher—is the first step
in a large retrieval system (Helsen et al 2014). Large databases of different varieties will be
further discussed in Chapter 2.
1.3 Literature Review
In this literature review, I explore the major works I use for this thesis. The order
mirrors the order of the thesis: first Optimization, then Machine Learning. MIR has a more
complicated literature base, so I discuss it in Chapter 2. I commence with David Temperley’s
works in chronological order because I incorporate their organizational tools and major ideas into
Chapter 3. Parallelism is highlighted because it grows from a single-line preference rule to a
multi-level set of ideas. Since perception is key to Optimization, I include David Huron for the
link from computers to perception. Huron’s paper examines voice-leading rules, which are
common knowledge and vital to music theorists, and thus acts as a stable starting point. The final
work is Darrell Conklin and Ian Witten’s paper investigating the multiple-viewpoint system.
This article is one of the first to examine Machine Learning in music and should therefore be
included.
1.3.1 David Temperley The Cognition of Basic Musical Structures (2001)
David Temperley is based at the Eastman School of Music and writes extensively on
music theory and music cognition. I will concentrate on specific sections of his book The
Cognition of Basic Musical Structures (2001) that explain Preference Rule Systems, or
computational models. In the first half of the book, Temperley outlines six Preference Rule
Systems—Metrical Structure, Melodic Phrase Structure, Contrapuntal Structure, Tonal-
Pitch-Class Representation, Harmonic Structure, and Key Structure—and the second half explores
the expectation of the listener, rock music, African music, composition, and recomposition. The
first half of the book is where I will concentrate this review. Temperley states that the goal of the
book is to explore the “’infrastructural’ levels of music,” meaning the basic building blocks of
music perception, because there is very little research on the subject.
Before presenting each Preference Rule System (PRS), Temperley outlines previous
research on musical structure as it relates to each section. For example, Temperley describes at
length the Desain and Honing model for beat induction in the chapter on Metrical Structure. The
specificities of each section are discussed in Chapter 3 of this thesis. He notes that each PRS is
based on a piano-roll input for the computer. The PRS itself is a group of rules the computer
follows to narrow a set of possible choices. Within each rule there is a preference—hence the
name preference rule. The end choice is the analysis that best satisfies the preferred rules in a
specific hierarchy.
After presenting the Preference Rule Systems, Temperley describes the tests he goes through
to ensure well-functioning systems. Meter, unlike the others, has had plenty of research
concerning theoretical and computational models. Temperley builds upon Lerdahl and
Jackendoff’s Generative Theory of Tonal Music (1983) by adapting it for a preference rule
approach. The meter section takes the well-formedness definition from Lerdahl and Jackendoff,
where grouping and hierarchy are most important, and Temperley explains it as “every event
onset must be marked by a beat [and] that a beat at one level must be [a beat] at all lower levels”
(Temperley 2001, 30). This is used in all successive PRSs. Similarly, for Key Structure there is
sufficient research from music cognition and computational methods to improve upon.
Temperley uses the Krumhansl-Schmuckler Key-Finding Algorithm and discusses its problems and
solutions.
The other four PRSs take a list of rules, and within each there is a list of preferences in a
specific order, so the computer knows which item is the most important or most common. For
example, the Phrase Structure Preference Rules (Temperley 2001, Melodic Phrase Structure
chapter, pp. 68-70) comprise three rules:
1. Gap Rule: Prefer to locate phrase boundaries at
a. Large inter-onset intervals and
b. Large offset-to-onset intervals
2. Phrase Length Rule: Prefer phrases to have roughly 8 notes
3. Metrical Parallelism Rule: Prefer to begin successive groups at parallel points in the
metrical structure
This applies only to monophonic melodies that are well formed by the previously mentioned
definition. To implement each of these rules, a formula, score, or other quantification is applied;
the best “score” is the best analysis for a melody.
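This scoring procedure can be sketched in a few lines of Python. The note data, the weights, and the simplification to two of the three rules (the Metrical Parallelism Rule is omitted) are my own inventions for illustration; this is not Temperley’s implementation.

```python
# Toy scoring of candidate phrase boundaries in a monophonic melody,
# loosely modelled on Temperley's Gap Rule and Phrase Length Rule.
# Each note is (onset_time, offset_time) in beats; data is invented.
notes = [(0, 0.5), (0.5, 1), (1, 2), (2.5, 3), (3, 3.5), (3.5, 4), (4, 6)]

W_GAP = 4  # invented weight balancing the two rules

def gap_score(i):
    """Gap Rule: reward large inter-onset and offset-to-onset intervals
    between note i and note i+1."""
    ioi = notes[i + 1][0] - notes[i][0]  # inter-onset interval
    oti = notes[i + 1][0] - notes[i][1]  # offset-to-onset interval (rest)
    return ioi + oti

def length_score(n_notes):
    """Phrase Length Rule: prefer phrases of roughly 8 notes."""
    return -abs(n_notes - 8)

# Score a boundary after each note i; the best-scoring candidate wins.
scores = {i: W_GAP * gap_score(i) + length_score(i + 1)
          for i in range(len(notes) - 1)}
best = max(scores, key=scores.get)
print(f"best phrase boundary: after note {best}")  # → after note 2
```

Here the rest and long note after the third note outweigh the length penalty, so the boundary falls there, which is the intended behaviour of the Gap Rule.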
Temperley’s Preference Rule Systems give me multiple examples of how the computer
evaluates different problems, which I can then relate to other models for evaluation. In this
regard, Temperley’s 2001 book acts as a springboard for my thesis. It gives important
background information on Computer Music Analysis and shows how Temperley’s
subsequent work has built upon it. The book will be further discussed in Chapter 3:
Optimization.
1.3.2 David Temperley and Christopher Bartlette “Parallelism as a Factor in Metrical Analysis”
(2002)
This text builds upon the previous Temperley book by adding further information to the
“Metrical Parallelism Rule” (Temperley 2001, 70). The “well-formedness rule,” as mentioned in
Temperley 2001, still applies in this article, as does the need for monophony. The goal of this
article is to build upon the book for clarity, accuracy, and precision when dealing with
parallelism.
Temperley and Bartlette examined the effect of parallelism and realized that the definition
must be modified. Parallelism is defined as a repetition either of the exact sequence or of the
contour. The Parallelism Rule is redefined to “prefer beat intervals of a certain distance to
the extent that repetition occurs at that distance in the vicinity” (Temperley and Bartlette 2002,
134). This is useful to the thesis because it gives a more inclusive definition of parallelism as a
term and a rule, and also because of the influence it had on the later treatment of parallelism.
1.3.3 David Temperley Music and Probability (2007)
Though Temperley was content with the 2001 book, it seemed that more should be added
to the approach because preference rule models could not be applied to “linguistics or vision”
(Temperley 2007, ix). The goal of the 2007 book is to use a specific Bayesian probability tool as a
link between perception and style. In the perception of linguistics and vision, Bayesian
probability techniques, such as the probability of one event following another, are more common in
computer analytic tools. To quote Temperley, “I realized that Bayesian models provided the
answer to my problems with preference rule models. In fact, preference rule models were very
similar to Bayesian models” (Temperley 2007, x), meaning that the existing PRSs can easily be
turned into Bayesian models.
The book shows a new trend in computer music research: probability. It uses the Essen
Corpus, also known as the Essen Folksong Collection (a set of folksongs from Germany, China,
France, Russia, and more, collected by Helmut Schaffrath; http://essen.themefinder.org/), to test
for the central distribution of the aspects of music, and relies on a method of representation
created by Lerdahl and Jackendoff in 1983, which, by this point, was familiar to music theorists.
The book itself touches on rhythm, pitch, key, style, composition, and, like the first computer
music analytic tools, error detection in its main chapters.
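The correspondence between preference rules and Bayesian models can be illustrated with a toy computation: choose the structure that maximizes P(structure) × P(surface | structure). The candidate structures, priors, and likelihoods below are invented for illustration and do not come from Temperley’s book.

```python
# Toy Bayesian structure choice: the preferred analysis is the one with
# the highest (unnormalized) posterior. All numbers are invented.

priors = {"duple meter": 0.7, "triple meter": 0.3}       # P(structure)
likelihoods = {"duple meter": 0.2, "triple meter": 0.5}  # P(surface | structure)

# Bayes' rule up to a constant: P(structure | surface) ∝ prior * likelihood.
posterior = {s: priors[s] * likelihoods[s] for s in priors}
best = max(posterior, key=posterior.get)
print(best)  # → triple meter (0.3 * 0.5 = 0.15 beats 0.7 * 0.2 = 0.14)
```

The analogy to a preference rule system is direct: the prior plays the role of a general preference, and the likelihood plays the role of how well each candidate fits the musical surface.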
1.3.4 David Huron “Tone and Voice: A Derivation of the Rules of Voice-Leading from
Perceptual Principles” (2001)
I have included this work in the literature review because we must remember that all
computer models must tie back to perception, in some way, to be correct. It should be noted that
Huron’s text is also referenced in Temperley’s work because the psychological principles
behind musical aspects make computational modelling difficult.
Huron’s 2001 work shows the relationship between voice-leading and auditory
perception. The article presents a set of voice-leading rules, then derives them from perceptual
principles, and finally ties them to genre. Each voice-leading rule is scrutinized under three
questions:
1. What goal is served by the following rule?
2. Is the goal worthwhile?
3. Is the rule an effective way of achieving the purported goal? (Huron 2001, 1)
Huron brings up the important concept of culture. It remains unknown whether
these principles of auditory perception are inherent in all people or whether they are created by
cultures. However, Huron notes that “perceptual principles can be used to account for a number
of aspects of musical organization, at least with respect to Western music” (Huron 2001, 1) and
concludes that six principles of perception account for most voice-leading rules in Western
music.
Another important aspect brought up is compositional goals, because the composer
plays with the perception of the listener. For example, Huron mentions that “Bach gradually changes
his compositional strategy. For works employing just two parts, Bach endeavors to keep the parts
active (few rests of short duration) and to boost the textural density through pseudo-polyphonic
writing. For works having four or more nominal voices, Bach reverses this strategy” (Huron
2001, 47). This deceives the listener because a four-voice work may sound more sparse while a
two-voice work sounds more dense, making these voice-leading rules more like compositional
options.
1.3.5 Darrell Conklin and Ian H. Witten “Multiple Viewpoint Systems for Music Prediction”
(1995)
Darrell Conklin concentrates on research in Machine Learning and music at the
University of the Basque Country in Spain. This article has been cited in Temperley’s works, such as
The Cognition of Basic Musical Structures (2001). The paper takes an “empirical induction
approach to generative theory” (Conklin and Witten 1995, 52) by exploring previous
compositions for style and patterns. More specifically, the article uses Bach chorales as a starting
point for choral music.
Conklin and Witten describe Machine Learning, applied to music research, as follows:
“Machine learning is concerned with improving performance as a specific task. Here the task is
music prediction” (Conklin and Witten 1995, 55). Much of Machine Learning uses
context models, but these require exact matches, and music does not always offer exact matches,
because similarity is enough for auditory perception; Conklin and Witten therefore take a
multiple-viewpoint approach. Each viewpoint is an aspect of music, used to derive musical
ideas that take style into account.
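The idea of a context model can be sketched in Python as a simple n-gram predictor: count how often each continuation follows a given context in a training melody, and predict the most frequent one. The training melody and the two-note context length are invented for illustration; Conklin and Witten’s actual multiple-viewpoint system is far richer than this single-viewpoint sketch.

```python
# Minimal context-model (n-gram) music prediction. A context is the last
# two pitch names; the model predicts the most frequent continuation seen
# in training. Training data is an invented melody.
from collections import Counter, defaultdict

training_melody = ["C", "D", "E", "C", "D", "E", "F", "E", "D", "C", "D", "E"]

model = defaultdict(Counter)
for a, b, c in zip(training_melody, training_melody[1:], training_melody[2:]):
    model[(a, b)][c] += 1  # count each continuation of a two-note context

def predict(context):
    """Return the most frequent continuation of a two-note context."""
    continuations = model[tuple(context)]
    return continuations.most_common(1)[0][0] if continuations else None

print(predict(["C", "D"]))  # → "E"
```

The limitation mentioned above is visible here: an unseen context returns nothing, because an exact match is required; the multiple-viewpoint system addresses this by combining predictions from several musical dimensions.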
Conklin and Witten describe the next steps in this field as:
1. Research on the prediction and entropy of music
2. The creation of “a general-purpose machine learning tool for music” (Conklin and
Witten 1995, 71) for all musical genres
Their work adds to the thesis by providing the beginnings of Machine Learning. From this, the
rest of the accomplishments in Machine Learning and music can be put into perspective.
1.4 Conclusion
In this introductory chapter, I have described my goal: to critically examine
aspects of Music Information Retrieval (MIR), Optimization, and Machine Learning. MIR and
Optimization share a common starting point, but they differ in goal. MIR aims to create one or
more databases for further analysis, while Optimization uses a computer model to understand the
human perception of a musical structure. Machine Learning is different from the other two
since it concentrates on the creation of a tool and not necessarily its uses.
I have surveyed specific literature in the field of Computer Music Analysis as
background and as an inroad to the research from 2000 to 2016. For historical context, I have
brought in Manning’s multi-faceted explanation of the relationship between computers and
music. This covers composition, performance, and analysis, and displays the many important
developments prior to the turn of the century. The developments include Music Representation
Languages (MRLs)—like MIDI—and notation software, because they created widespread
usage. This literature touches on MIR, Optimization, and Machine Learning and also exposes
some critical issues in Computer Music Analysis.
I have set out a list of five critical issues that I use to gain critical perspective on the
field. The first issue is Human Error, which refers to human limitations and the capacity to make
mistakes; this was brought up by both Peter Manning and David Huron. Second is Input
Specification, a recurring issue since articles do not specify what input is used for a tool; the
input is largely genre-based due to availability. Third, Consistent Evaluative Principles are needed
for all branches of Computer Music Analysis, so that there is a reliable set of algorithms and
methods to be drawn upon. Fourth, the Interdisciplinary Problem is an issue of term usage and level
differences in tool creation, evident throughout the authors in the literature review, because
each author uses their own set of terms based on their usual field of research. Finally, “What’s
the Point?” refers to the lack of a stated purpose for a specific tool: for a branch like Optimization,
the tools work towards understanding human perception, so a specific tool may
not have a specific usage at its inception. Using this chapter as a basis, I begin my analysis of
specific tools in each of the three subfields, starting with Music Information Retrieval.
Chapter 2- Music Information Retrieval
2.1 Introduction
Music Information Retrieval (MIR) is a subsection of Computer Music Analysis that is
growing exponentially because of current technology. MIR is concerned with examining music,
either by locating or by analysing, and often aims to make music searchable. The locating branch
is often aimed at examining the metadata of a large set of works. The analytic/production branch
concerns itself with a smaller number of pieces but goes into much greater detail (Downie 2003),
as Downie states: “Analytic/Production systems usually contain the most complete
representation of music information” (Downie 2003, 308). Databases created for MIR can be
accessed through the internet, so they can be used by any researcher with the needed background
knowledge.
The goal of this chapter is to begin a critical comparison of tools and problem-solving
methods in MIR. This will be accomplished by discussing three projects: a large completed tool,
a large tool in progress, and a small tool. These tools are just the “tip of the iceberg” when it
comes to MIR, but they have been chosen to show different stages within the evolution of a tool.
The large completed tool is VocalSearch, where song lyrics can be searched to identify their
presence in a song. The in-progress tool is a research project called the Single Interface for
Music Score Searching and Analysis (SIMSSA). The small milestone studied here is Patrick
Donnelly and John Sheppard’s approach to timbre identification using probability. Donnelly
and Sheppard’s project provides a solution to a specific problem, which in turn can help a larger
database. This final milestone will show how smaller projects in Computer Music Analysis can
help solve larger problems and thus move the field forward.
2.1.1 MIR Overview and Applications
The purpose of this section is to give a description of major terms in Music Information
Retrieval (MIR) and to show the different systems at work in MIR. I will not be going into depth
about all systems, but I would like to show the complexity of MIR. I will first explain the two
main types of MIR systems: locating and analytic/production. Then I will outline the different
types of data. Finally, I will explain how the different types of musical information fit into each of
the data categories and systems.
MIR examines multiple facets of music information in many different forms. According to J.
Stephen Downie—the creator of MIREX, who specializes in information sciences at the
University of Illinois—there are two different types of MIR systems: locating and
analytic/production (Downie 2003), as mentioned in the introduction.
The locating systems are used by people searching for music either as a consumer on a
website or as a researcher in a recordings database. A locating system looks at many works, but
does not go in depth, and often locates information on the title, composer, performer, etc.
type of information is called metadata. An analytic/production system generally looks at a small
number of works, but in much greater detail. These systems, for example, can look at audio
recordings, pictures of scores, and/or symbolic forms of scores. (I will not go into detail about
specific systems at this point since they will be discussed later in the chapter.)
The different types of possible data in music, as mentioned above, are metadata, audio,
symbolic, and image. Metadata is simply data about data; in music, this is information about
the performers or pieces performed, such as title, composer, etc. Audio data is a recording. Most
commonly, MP3 files are used for audio data because they are easily read by computers, and this
is often the data used for popular music. In certain regards, image and symbolic forms are
similar: image data refers to images of scores, while symbolic data is a format that a computer
can understand, such as a score notated in Finale or some other notation file. These different
types of data have specific limits and uses. For example, metadata, which was explained above,
is used in all search engines that look through bibliographic data. Audio, on the other hand, is not
as easy to search but is very easy to obtain in standard MP3 format.
According to Burgoyne et al in Chapter 15 of the 2016 A New Companion to Digital
Humanities, audio data is difficult for feature extraction—when a user aims to identify a
particular query—because it comes in the form of large files. Historically, “query-by-humming”
(Burgoyne et al 2016) has been a popular MIR application for feature extraction when the audio
has been properly annotated. For query-by-humming, a user hums a tune into a microphone and
the tune is matched with a piece. This, however, is by no means a complete picture of what audio
can be used for. If an audio recording could be transferred to symbolic data, it would be more
useful to MIR (Burgoyne et al 2016).
Symbolic data, often in the form of MIDI or a readable score format, is easily recognizable
by a computer and is used for information retrieval, classification, music performance, and music
analysis. A symbolic form can retrieve sets of pitches (together making themes), rhythms,
harmonic progressions, and more. Classification using symbolic formats identifies stylistic
“emblems,” such as a specific harmonic progression or the usage of specific intervals; such an
emblem is a defining characteristic. In terms of music performance, symbolic data is also used
for expressive timing studies. Finally, for music analysis, symbolic format is used for automated
analysis (this also overlaps with Optimization) and for pitch spelling when MIDI is used
(Burgoyne et al 2016).
Image data, like audio data, is difficult for a computer to recognize, and at present there is no
consistent recognized form for sharing it. A score itself can be transcribed or turned into a MIDI
format, but that is time consuming. Optical Music Recognition (OMR) was created to solve this
issue. OMR is a tool that can identify musical characters, much as Optical Character
Recognition can identify letters in typed images. This renders score images readable by
computers (it will be further discussed in the section on SIMSSA, the Single Interface for Music
Score Searching and Analysis).
MIR is a multifaceted, multicultural, multidisciplinary tool. There are also seven facets of
music information (Downie 2003):
1. Pitch
2. Temporal
3. Harmonic
4. Timbral
5. Editorial
6. Textual
7. Bibliographic
In the following graphic, I have given a representation of the overall shape of MIR as it
currently stands. The reader will note the breakdown into two large parts, locating and
analytic/production, as discussed above, and within these, the reader will find the various facets
described above.
Figure 2. The hierarchy of MIR systems, data types, and facets. The second row and the last set
of facets are the two categories of MIR system explained by J. Stephen Downie in his 2003
article; the four types of data are from Chapter 15 by Burgoyne et al 2016.

Music Information Retrieval
- Locating
  - Metadata: Bibliographic
- Analytical/Production
  - Audio: Pitch, Temporal, Harmonic, Timbral
  - Image: Editorial, Textual, Pitch, Temporal, Harmonic
  - Symbolic: Editorial, Textual, Pitch, Temporal, Harmonic

Though the graphic looks as if it represents a concrete situation, these lines are blurring
due to changes since the turn of the century. These changes are being examined by ISMIR, the
International Society for Music Information Retrieval, and MIREX, the Music Information
Retrieval Evaluation eXchange (Burgoyne et al 2016), but, as stated in their names, they only
look at MIR tools (this is one of my five critical issues). This graphic representation has been
included as a comparison point for the rest of Chapter Two, so I will be referring to these types of
data (Metadata, Audio, Image, Symbolic), facets (Pitch, Temporal, Harmonic, Timbral, Editorial,
Textual, Bibliographic), and systems (Locating, Analytical/Production).
2.2 The MIR Tools
In this part of the thesis I shall look at several tools in MIR, some of which are intended
for researchers in MIR and others for layperson use. First, I start with VocalSearch, which is now
unavailable online but gives valuable information to the thesis. Next, I discuss three Single
Interface for Music Score Searching and Analysis (SIMSSA) tools: Search the Liber Usualis,
Cantus Ultimus, and the Electronic Locator of Vertical Interval Successions (ELVIS). Finally, I
examine a smaller tool, Donnelly and Sheppard’s Bayesian network algorithm for timbre
identification.
2.2.1 VocalSearch
VocalSearch is a web-based tool that was available to everyone and is used to identify
unknown songs without metadata (Pardo et al 2008). Metadata is the information about the song,
such as title, artist, album, etc. (Burgoyne et al 2016), and without it, it is difficult to identify a
song (Orio 2008). VocalSearch was created by teams from the University of Michigan and Carnegie
Mellon University (Birmingham, Dannenberg, and Pardo 2006). I have chosen to include it as a
tool that is ‘complete’—as research grows this project may change, but it is a complete database
when compared to the tools that follow in my discussion. This tool lets the user search—by
humming a segment, by providing music notation, and by providing lyrics—using Melodic
Music Indexing and query-by-humming technology.
Melodic Music Indexing is a way for the computer to understand the melodic content of a
song. A song is annotated with its melodic content; often this is done through MIDI sequencing.
MIDI is easily understood by a computer because it gives both pitch and duration. When a query
is hummed, the computer matches it to the corresponding song. Song matching is problematic:
often, when a query-by-humming platform does not work, it is because the user did not hum the
melody clearly or chose a different song layer, perhaps another instrument or vocal line
(Dannenberg et al 2007). The tool must also equalize and understand the query, and, for
VocalSearch, this is done using a probability algorithm (Birmingham, Dannenberg, and Pardo 2006).
The approach measures the similarity between the MIDI and the sung query across the large
database.
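The matching step can be illustrated with a simple Python sketch. The melodies, the query, and the use of edit distance over interval sequences are my own illustrative assumptions; VocalSearch’s actual probability algorithm is different.

```python
# Illustrative query-by-humming matching: reduce the sung query and each
# database melody to pitch intervals (so transposition does not matter),
# then rank songs by edit distance. All pitch data is invented MIDI notes.

def intervals(pitches):
    return [b - a for a, b in zip(pitches, pitches[1:])]

def edit_distance(x, y):
    """Classic dynamic-programming edit distance between two sequences."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(y) + 1)]
         for i in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + (x[i - 1] != y[j - 1]))  # substitution
    return d[-1][-1]

database = {
    "Song A": [60, 62, 64, 65, 64, 62, 60],
    "Song B": [60, 64, 67, 64, 60, 55, 60],
}
query = [67, 69, 71, 72, 71]  # same contour as Song A, transposed up

best = min(database,
           key=lambda s: edit_distance(intervals(query), intervals(database[s])))
print(best)  # → "Song A"
```

Because intervals rather than absolute pitches are compared, the query matches Song A even though the user hummed it in a different register; edit distance also tolerates a few wrong or missing notes, which addresses the unclear-humming problem mentioned above.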
Within MIR, VocalSearch builds upon existing audio data recognition and locating
systems. It lets a specific song or set of songs be located using various queries, recognized
both through a typed search and a hummed audio search. VocalSearch uses the usual metadata
searches if needed but seems to be more useful for unusual queries, like humming or notational
search. The database itself is used for music with lyrical content, hence the name, but the site is
now unavailable, so the data from a user’s perspective is limited. A common issue with a
database is that music is constantly being created, but this database will keep growing
because a user can add songs (Pardo et al 2008).
2.2.2 SIMSSA
As I mentioned above, the in-progress tool is a research project called the Single
Interface for Music Score Searching and Analysis (SIMSSA). In this section of the thesis, I
describe three SIMSSA projects: “Search the Liber Usualis,” “Cantus Ultimus,” and “ELVIS.”
These all have different goals and technologies, so including all three gives a well-rounded view
of what goes into a tool.
2.2.2.1 Search the Liber Usualis
The Liber Usualis contains valuable information for those working on early music. The
text is over 2000 pages, so it is difficult to locate the needed information. To solve this problem,
SIMSSA decided to render its contents searchable and make it all available online. This tool lets
researchers search the text for pitch sequences (either transposed or exact), neumes, contour,
intervals, and, of course, text (the Search the Liber Usualis website is located at liber.simssa.ca). To
do so, SIMSSA has used Optical Text Recognition (OTR), sometimes referred to as Optical
Character Recognition (OCR), and Optical Music Recognition (OMR).
OMR, as previously mentioned, is a computer method involved in “turning musical
notation represented in a digital image in a computer-manipulable symbolic notation format”
(Vigliensoni et al 2011, 423). Using OMR with neumes, or square-note notation, is difficult
because this notation is a precursor to standard musical notation: since no standard notation
software exists for it, the tool must translate the square-note notation into the standardized one.
The translation to standard notation requires the computer to understand eleven neume types. SIMSSA
decided to use the ‘Music Staves Gamera Toolkit’ as a bank of algorithms to perform an analysis
on 40 test pages of the Liber Usualis. The test pages were manually classified and annotated to
double-check the output of the algorithms. The algorithms performed the following tasks: they
created the staff lines, removed the staff, added ledger lines, and classified the types of neumes. When
classifying neumes, the algorithm was not 100% accurate, so the final version was examined by a
human to ensure correctness. These algorithms, however, do not tackle clef recognition and note
identification.
Note identification was made possible using horizontal projection of neumes, but this
only worked for a subset of the eleven neumes. In conjunction with the algorithms used earlier to
determine the types and placement of neumes relative to the staff, the starting pitch of a neume
was identified using the average size of the neume and its “center of mass” (Vigliensoni et al
2011, 426). The clef was then identified and each neume was given a pitch relative to the clef.
This was possible because the clef is always the first neume-like image in the line. The
remaining neumes often have multiple pitches, so they were treated as exceptions to the
above-mentioned method. These neumes were first split so the resulting output would correctly
identify the multiple pitches. In the end, a different algorithm from the Music Staves Gamera
Toolkit was used for each of the procedures, but, together, the algorithms rendered the scores
from the entire book searchable.
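The center-of-mass idea can be illustrated with a small sketch. This is not the Gamera toolkit’s code: it simply maps the mean vertical position of a neume’s black pixels onto the nearest staff line or space, with invented pixel coordinates and staff-line positions.

```python
def center_of_mass_y(pixels):
    """Mean y-coordinate of a neume's black pixels ((x, y) tuples)."""
    return sum(y for _, y in pixels) / len(pixels)

def staff_position(com_y, line_ys):
    """Map a vertical center of mass to the nearest line or space.

    Positions count half-staff-spaces from the top: 0 = top line,
    1 = first space, 2 = second line, and so on.
    """
    spacing = (line_ys[-1] - line_ys[0]) / (len(line_ys) - 1)
    return round((com_y - line_ys[0]) / (spacing / 2))

# Invented four-line staff (square notation uses four lines), 10 px apart.
lines = [100, 110, 120, 130]

# A blob of pixels centred around y = 115 sits in the second space.
blob = [(x, y) for x in range(5) for y in (114, 115, 116)]
print(staff_position(center_of_mass_y(blob), lines))  # 3
```

Once a position on the staff is known, the clef fixes which pitch that position represents.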
Once the scores were made searchable using these algorithms, the text was made searchable
through OTR technology in a simpler fashion than the scores. The “Search the Liber Usualis” project fits into
the MIR chart above both as an analytical tool and as a tool for locating scores and text. It is analytical
because it takes an image of a text and looks at contour and interval, these being elements of
analysis; it is a locating tool because it finds specific ideas based on the searched criteria. This is
possible because of the computer’s ability to ‘read the music’ once the algorithms translate it.
2.2.2.2 Cantus Ultimus
The “Search the Liber Usualis” can be seen as an initial test, laying the groundwork for
the Cantus Ultimus. Their goals, however, are different. For the Liber, the goal was to make it
searchable and make it easy for researchers to use the book. With the Cantus Ultimus, the aim is
to preserve the ancient manuscripts digitally before they deteriorate further. The database shows
images of the searched score, with typed lyrics, and standard notation on the side bar (Cantus
Ultimus is located at cantus.simssa.ca/). Only a few sets of images have been added, but this
project is still growing.
The Cantus Ultimus is part of SIMSSA primarily located at McGill University. This tool
builds upon the existing Cantus Database with more digitized scores and Optical Music
Recognition (OMR) technology. Researchers and plainchant enthusiasts can search through the
database by text, genre, office, and by reference to the associated liturgical feast. Text queries
include the lyrics of the chant and the metadata for each. Users can also make musical searches using
“Volpiano searches,” which are searches using notes specifically. A Volpiano search can either be a normal
search, where A-B-C would return results for A-B-C, D-E-F, and any other series with the same
intervals, or a literal search, where only A-B-C sequences would be shown (cantus.simssa.ca/).
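The difference between the two search modes can be sketched as follows (my own illustration, not the Cantus Ultimus code; the chant fragment is invented). The normal search compares letter-step patterns, a simplification of interval equivalence, so any transposition matches; the literal search compares the note letters themselves.

```python
def to_steps(notes):
    """Reduce a note-letter sequence to scale-step distances (mod 7)."""
    letters = "ABCDEFG"
    idx = [letters.index(n) for n in notes]
    return [(b - a) % 7 for a, b in zip(idx, idx[1:])]

def literal_search(query, melody):
    """True if the exact note sequence occurs in the melody."""
    n = len(query)
    return any(melody[i:i + n] == query for i in range(len(melody) - n + 1))

def normal_search(query, melody):
    """True if any transposition of the query (same step pattern) occurs."""
    q, m, n = to_steps(query), to_steps(melody), len(query) - 1
    return any(m[i:i + n] == q for i in range(len(m) - n + 1))

chant = list("DEFGA")
print(literal_search(list("ABC"), chant))  # False: A-B-C is not present
print(normal_search(list("ABC"), chant))   # True: D-E-F has the same steps
```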
Each query can yield multiple results, so, in effect, it is a locating system. The system
locates based on notes and lyrics, but, more importantly, it is an image-searching database. The
ability to search through images was made possible through OMR and OCR, with all of the
algorithms used in the “Search the Liber Usualis.”
2.2.2.3 Electronic Locator of Vertical Interval Succession (ELVIS)
The Electronic Locator of Vertical Interval Succession (ELVIS) was created to give
counterpoint the attention it deserves. In fact, a presentation on ELVIS by Christopher Antila
won first prize at the 2014 Montreal Digital Humanities Showcase, and the project is funded by a Digging
into Data Challenge award (located at https://elvisproject.ca/). The goal of ELVIS is to look at
musical style in terms of changes in counterpoint (Antila and Cumming 2014). ELVIS is a set of
downloadable scores in a database, a web-based application, and a downloadable tool. Creating these
three components has taken many people. Most of them, such as Ichiro Fujinaga
and Peter Schubert, are from McGill University in Montreal; those working on the harmonic side
of counterpoint are headed by Ian Quinn from Yale University; and the University of Aberdeen
has also been involved with the project. The software for the downloadable tool, music21,
however, was created by Myke Cuthbert at the Massachusetts Institute of Technology (Music21).
Music21 is a Python-based “toolkit for computer-aided musicology” (music21) that
allows the user to search through any imported scores using basic programming language. What
this means is that, by using commands such as ‘if x then y,’ a desired output can be found. This
works especially well for big-data queries in MIR (Antila and Cumming 2014). Using the
ELVIS database, the scores can be imported and searched using music21. The scores in the
database can be searched through the ELVIS website and, using the web app, patterns are
located. The downloadable software is a VIS—Vertical Interval Succession, meaning a set of
harmonic intervals in a particular order—framework used on music21 (ELVIS project). The
framework uses n-grams, where n refers to the number of successive vertical intervals. This
analysis uses intervals without quality instead of note names to compare many works regardless
of key (Antila and Cumming 2014). The software runs on Python, a standard programming
language, so those with a knowledge of programming commands can get the most out of it. For
those who do not have programming knowledge there is a Counterpoint web app
(counterpoint.elvis.ca).
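The n-gram idea can be sketched in a few lines of plain Python (an illustration of the concept rather than the VIS framework’s actual code; the interval list is invented): every run of n successive vertical intervals becomes one pattern, and the patterns are counted.

```python
from collections import Counter

def interval_ngrams(vertical_intervals, n):
    """Count every n-gram of successive vertical intervals."""
    grams = (tuple(vertical_intervals[i:i + n])
             for i in range(len(vertical_intervals) - n + 1))
    return Counter(grams)

# Vertical intervals (without quality) between two voices of a passage.
passage = [3, 5, 3, 5, 6]

# The 3-5 succession occurs twice, so it surfaces as the commonest 2-gram.
print(interval_ngrams(passage, 2).most_common(1))  # [((3, 5), 2)]
```

Counting such patterns across many works is what lets contrapuntal style be compared regardless of key.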
The application for ELVIS is called the Counterpoint Web App on their website (ELVIS
project) and is specifically for pattern recognition. This web app continues to use a VIS
framework, but it is more limited in query possibilities than the downloadable extension for
music21. Getting to the application through the website is problematic because of a broken link;
perhaps the web application is not yet finished. As previously mentioned, SIMSSA is building
tools and many of them are still in progress.
Music sonification is used in the ELVIS project to turn music notation data into
sound that can be manipulated by the researcher. Accessibility, in this case, was the main concern,
because not all researchers will have in-depth knowledge of recording or sound-mixing software.
To solve this problem, the ELVIS team has created a graphical user interface: a graphic
representation of the music together with the audio tools most useful for interval analysis. The concentration
on interval analysis is because ELVIS is for contrapuntal analysis and pattern recognition
(ELVIS project). ELVIS is both a locating and an analysis tool. The locating part comes from the web
app, because it only locates patterns. The analytical axis, however, is much more in depth and is
available for a wide variety of early music using the VIS framework and the programming
language. Though the intention of the project was for counterpoint alone, the VIS framework,
music21, and the use of pandas libraries—where the scores themselves are kept—make
the possibilities endless (ELVIS project).
2.2.3 Donnelly and Sheppard Bayesian Network Algorithm
Donnelly and Sheppard—researchers from University of Notre Dame and Montana State
University respectively—found that timbre has not been fully explored in MIR, so they have
modified an existing algorithm derived from Bayesian probability Networks. This new system of
steps identifies different timbres in music. This can be used to establish another way of
organizing and searching through music in a large corpus. In Donnelly and Sheppard’s article,
“Classification of Musical Timbre Using Bayesian Networks,” nearest neighbour and vector
machine as timbral identification models are compared to this new model. Upon comparison to
the other models, the Bayesian algorithm better differentiates strings, but still has drawbacks.
The other models better differentiate between aerophones, like woodwinds and brass, but,
together, it appears the models can differentiate all instruments together. This seems to still be
useful as a method for categorizing string instruments and, in conjunction with the other tools,
can categorize all instruments.
The target audience for this method is researchers and others who want to organize a
database using the instruments within a musical track. This can grow the locating section for audio
as an alternative to metadata, but it would be for smaller tasks examining instruments. It is
included here as a smaller technology that has capabilities for MIR and to show the possibilities for
connection between MIR and Optimization, the subject of the following chapter.
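Donnelly and Sheppard’s networks model dependencies among spectral features; as a much simpler stand-in that shows the probabilistic flavour of such classification, the following sketch fits one Gaussian per class and feature and picks the most likely class. The ‘spectral centroid’ and ‘attack time’ values and the instrument labels are all invented.

```python
import math
from collections import defaultdict

def train(samples):
    """Fit a Gaussian per (class, feature) from (label, features) pairs."""
    by_class = defaultdict(list)
    for label, feats in samples:
        by_class[label].append(feats)
    model = {}
    for label, rows in by_class.items():
        params = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            var = max(1e-6, sum((x - mean) ** 2 for x in col) / len(col))
            params.append((mean, var))
        model[label] = params
    return model

def log_gauss(x, mean, var):
    """Log-density of a Gaussian, used to score one feature."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(model, feats):
    """Pick the class with the highest (equal-prior) log-likelihood."""
    return max(model, key=lambda label: sum(
        log_gauss(x, m, v) for x, (m, v) in zip(feats, model[label])))

# Invented two-feature timbre data: (spectral centroid, attack time).
data = [("violin", (2.8, 0.09)), ("violin", (3.0, 0.11)),
        ("flute", (1.1, 0.30)), ("flute", (1.3, 0.34))]

model = train(data)
print(classify(model, (2.9, 0.10)))  # violin
```

A real timbre classifier would use many more features and, as in the article, model how those features depend on one another rather than treating them as independent.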
2.3 Critical Analysis
This chapter thus far has explained what each of the tools do. This section examines each
tool critically. I discuss the assumptions made, and further extensions of the tool that were not
examined in the articles themselves. I go through the tools in the order that they were
previously presented: first I examine VocalSearch, then SIMSSA—Search the Liber Usualis, Cantus
Ultimus, and ELVIS—and finally the Bayesian networks presented by Donnelly and Sheppard.
2.3.1 VocalSearch
VocalSearch takes audio input, which is difficult because the input must be taken apart
to match a specific line in a song. However, the article does not mention whether a melody sung in a different
key from the original will still match the song. Though melodies are often remembered in
their original key, the user may not have the range to sing in it. The article also does not mention the
matching of a song from the database to a slightly inaccurate input, so the tool likely would not work in
such a case.
VocalSearch achieves its goal of being able to reach a large audience using the internet
and having multiple ways of searching queries. Setting up such a database takes a large body of
songs, but to keep a database like this current, new songs must be added regularly. To do this, the
makers of VocalSearch have included a function that allows users to add content to the database.
There are a few issues with users adding content. As previously stated in the introductory
chapter, the errors made by a computer program are due to human error. This human error can be
in the programming itself, but more often it is in the input to the program. As mentioned,
VocalSearch has multiple methods of searching, so the person who inputs a song must
enter all of the information correctly. If incorrect information is added, then the tool will not work
correctly, rendering it useless.
2.3.2 SIMSSA
SIMSSA has multiple projects, so I will critically analyze each in turn. Overall,
SIMSSA uses score images and creates databases using OMR, OTR, and
other technologies.
2.3.2.1 Search the Liber Usualis
Using optical text recognition (OTR) and optical music recognition (OMR), the Liber
Usualis is searchable: by typing in a search bar, matching text or music is
highlighted, and, using the colour coding available in the web-based tool, multiple searches
can be highlighted at once. This is useful for researchers who need specific information from this
lengthy text. More information on the tool can be found in section 2.2.2.1.
OMR and OTR are used when the files have come from images and are, therefore,
not searchable. These technologies make the document searchable by translating the image data
into a format recognized by the computer. OTR translates the image of a letter into the
letter itself, while OMR must attach both the letter name and the function of the note; this increases the
margin of error. An issue I have found when using the tool is that the coloured highlighting box
around the searched content is not completely accurate. With some searched content, the box
surrounds a set of words that does not contain the searched item. Also, an assumption is made that the
user wants the entire sentence highlighted when searching for a specific word or group of words.
This calls into question how the OTR works, because if it renders a text searchable, then it should only
highlight what is searched.
2.3.2.2 Cantus Ultimus
Cantus Ultimus uses digitized scores and OMR to create an interactive and searchable
score database. This not only gives a researcher access to the database, but also lets them
search the scores in multiple ways. Furthermore, the database gives the researcher access to the
manuscript image online with the typeface version in the right-hand menu. For example, if there
are neumes in very small writing on the score image, then the right-hand menu will give the
modern notation of the score.
Currently, there are only a few scores or manuscripts, so the obvious improvement is to
add more. The process of adding a score, however, is very long even using OTR and OMR,
because all scores should be checked. Because the manuscripts have aged, can be faded, or are
otherwise difficult for a computer to read, checking is imperative for a proper database entry. What
could help are the Machine Learning and Optimization models discussed in later chapters.
2.3.2.3 Electronic Locator of Vertical Interval Successions (ELVIS)
ELVIS gives counterpoint priority in research by combining a database with a web app
and music21. The database gives the user access to a set of scores, while the web app and
music21 let the researcher search through the scores. The web app is designed for a non-programmer
to find recurring patterns, while music21 has more features and lets the entire score be
searched using a programming language.
This tool attempts to cater to both the programmer and the non-programmer through
music21, which is based on Python—a common programming language—and the web app.
However, the web app only allows the user to find recurring features, so a non-programmer has
limited usage of this tool. It is assumed that a non-programmer will only want to use this tool
to find recurring features, while they could also be looking for specific vertical interval
successions or a specific set of notes.
2.3.3 Bayesian Networks
As previously stated, this model gives timbre attention because timbre can be used as an added
criterion in a search. The model is, however, limited in its ability to distinguish between aerophones,
though it better differentiates between strings. To approach this problem, the tool must be combined with
others to achieve greater accuracy.
The goal of this tool is to differentiate between instruments and, eventually, search
through a database and render it searchable by instrument. Another way to approach this is to
look at the metadata, which often contains instrument data. Using an OTR-like algorithm, the
metadata can be searched for contributing artists and musicians. This would render a set of works
searchable by its contributors; contributor listings often contain the name of the instrument each
contributor plays and, therefore, the set of works would be searchable by instrument.
This approach is useful specifically for works where the contributors’ instrument is
unknown, and the unknown instruments are stringed. Upon combining this method with other
similar methods, the usefulness will increase because all instruments can be identified.
Chapter 3: Optimization
I use the term “optimization” to refer to achieving greater output with less time and energy in
music analysis—the optimization of effort so as to achieve a result. More specifically, this
section will look at Preference Rule Systems (PRSs) and Probabilistic and Statistical Models.
The goal in optimization is to understand and reproduce a human perception of an input. My
goal is to show that, by integrating more mathematics and computer tools, analysis can be
optimized. The term was inspired by its customary use in the areas of calculus and business,
where the optimization of space and resources is described in terms of optimization problems. In
music, the term pertains to David Temperley’s progression in analytical approaches.
Temperley’s The Cognition of Basic Musical Structures (2001) took a preference-rule
approach to musical elements. For each element, a set of preference rules was outlined for a
computer tool to use in analyzing a piece of music. Following this, Temperley took a few
of the elements examined in the 2001 book and applied a probabilistic approach to them using
Bayesian Probability—a term referring to extensions of the acceptance of Bayes’ Rule3—to
match the approach of similar perceptual fields. The 2007 book, Music and Probability, aims to
build upon the previous set of preference rules and move the research further. This is the
method of Optimization to be addressed here.
This section of the thesis will explain a previous way of approaching a problem and
explain how a new method has helped to optimize the older one. Like both of Temperley’s
approaches, there will be a section on organization by Preference Rules and a section examining
Probability and Statistical models. In the Preference Rule section, Temperley’s approach will be
3 Bayes’ rule is expressed as follows: P(A|B) = P(B|A)P(A)/P(B), where P denotes probability and items A and B are
distinct and different. Upon acceptance of this theorem, a branch of probability is built called Bayesian Probability.
discussed first. Following this, other preference rule methods and computer tools will be
presented as they relate to Temperley’s Cognition of Basic Musical Structures (2001). The
second section will show various approaches to music analysis that involve different aspects of
Probability and Statistics. Some of these approaches, like Temperley, use Bayesian Probability,
and others concentrate on statistical analysis. Though the two sections are split in this thesis,
they are related, since the hierarchy built in a PRS carries through into probability; I have separated
them to better explain how the newer models have built upon Temperley’s work. This is
represented graphically in figure 3, where the dashed line represents the
implicit link between the two main sections, even though they are distinct in their principal focus
(i.e. a PRS or an application of probability). The items under each of the main headings are the
topics covered in this chapter. Bayesian Probability can encompass all of the
subheadings under Preference Rules, but Harmonic Vectors and the application of
Bioinformatics later in the chapter relate more specifically to other subheadings. This is also
represented through dashed lines.
Figure 3 This is a graphic representation of the aspects of the field I concentrate on. It shows
that Preference Rules and Probability and Statistics are not completely separate from each
other. (The diagram divides Optimization into two branches: Preference Rules, comprising
Metrical Structure, Contrapuntal Structure, Harmonic Structure, Melodic Phrase Structure, and
Parallelism; and Probability and Statistics, comprising Bayesian Probability, Harmonic Vectors,
and Bioinformatics.)
3.1 Preference Rules
This section on Preference Rules will start by outlining David Temperley’s Preference
Rule Systems (PRSs) from The Cognition of Basic Musical Structures (2001). I concentrate on
the first section of the book. Temperley uses a piano roll input for the computer and, based on the
subsection in question, specific tests are performed to examine the usefulness of the approach.
The subsections of this book—Metrical Structure, Melodic Phrase Structure, Contrapuntal
Structure, Tonal-Pitch-Class Representation, Harmonic Structure, and Key Structure—will serve
as subsections of this chapter. Parallelism is the final subsection; it was added because of a
2002 Temperley and Bartlette article, “Parallelism as a Factor in Metrical Analysis,” that further
explains the importance of parallelism. (This article also gives a broader definition of parallelism,
which is important to further research.) For each subsection,
Temperley’s findings from 2001 will be presented followed by the research that has built upon
the findings.
In this part of the thesis, I take Temperley’s model and examine how the next 16 years of
research has built upon it. I will present a set of the comparable models and give a brief
explanation of the element of Temperley the model builds upon. Following this section, I will
critically examine the newer models and tools through comparison. I begin, however, with
Temperley’s 2001 book The Cognition of Basic Musical Structures.
3.1.1 Metrical Structure
As David Temperley explains in The Cognition of Basic Musical Structures (2001), the
computer must concentrate on beat induction when examining metrical structure. Beat
induction is when the computer must understand, or ‘tap,’ the beat. In some senses, the term refers
to a ‘foot-tapping’-like induction, but for Temperley’s PRS it is for inferring meter. The meter
is shown in a Lerdahl and Jackendoff graphic model with different hierarchies of beats, as shown in
figure 4. This is a metrical grid for 2/4 time, where the lowest set of dots indicates the eighth-note
level (the division of the beat level), the middle set of dots is the main beat (1, 2, etc.), and the
highest level is the strong beat (the downbeat).
Figure 4 This is a beat hierarchy as described by Lerdahl and Jackendoff
For finding metrical structure, Temperley outlines the rules as follows:
1. Event rule: prefer strong beats on event onsets
2. Length rule: prefer long events on strong beats
3. Regularity rule: prefer evenly spaced beats at each level
4. Grouping rule: “Prefer a strong beat at beginning of groups” (Temperley 2001,
38)
5. Duple bias rule: prefer duple over triple levels (for example, 3/4 instead of 6/8)
6. Harmony rule: prefer strong beats aligned with changes of harmony
7. Stress rule: prefer strong beats on loud events
8. Linguistic stress rule: prefer stressed syllables on strong beats
9. Parallelism rule: prefer the same metrical structure for the same segments
What these rules consolidate to is a set of preferences for a computer system to work through to
find the “best fit” for metrical structure. The computer will attempt to fit different meters onto a
piece of music and choose the version in which the most parameters are preferred. Because these are
preference rules—in other words, the computer does not need all of them to be true when
choosing a meter—the “best fit” refers to the meter satisfying the most preferences.
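A deliberately reduced sketch of this best-fit search follows. It scores only the event and length rules over a handful of candidate beat spacings—nothing like Temperley’s full system—and the piano-roll data is invented.

```python
def score_meter(onsets, durations, spacing):
    """Score one candidate beat grid against the event and length rules."""
    score = 0
    for onset, dur in zip(onsets, durations):
        if onset % spacing == 0:      # event rule: onsets on strong beats
            score += 1
            score += dur              # length rule: long events on strong beats
    return score

def best_meter(onsets, durations, candidates):
    """Choose the beat spacing that satisfies the most preferences."""
    return max(candidates, key=lambda s: score_meter(onsets, durations, s))

# Invented piano-roll data in sixteenth-note ticks: a quarter-note pulse.
onsets = [0, 4, 8, 12, 16, 20]
durations = [4, 4, 4, 4, 4, 4]

print(best_meter(onsets, durations, candidates=[3, 4, 6]))  # 4
```

Every candidate grid is tried against the evidence, and the grid that the preferences favour most is chosen; the full PRS simply scores many more rules over many more candidate structures.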
Tempo is a bottom-up and a top-down process depending on how long someone listens to
a piece of music in the same tempo. It is a bottom-up process because we need a few notes to
perceive a tempo; following these few notes, it becomes a top-down process because we apply the
tempo we have perceived to the music—as is evident through foot tapping, head bobbing, etc.
However, if the tempo were to change suddenly for expressivity, a person could catch on quickly.
According to Desain and Honing, “beat induction is a fast process [since] only after a few notes a
strong sense of beat can be induced” (Desain and Honing 1999, 29), and, therefore, tempo
induction is a large undertaking for a computer.
In his writing, Temperley mentions that the “most important work in the area of quantization”
(Temperley 2001, 27) is a 1992 Desain and Honing study entitled “Time Functions Function
Best as Functions of Multiple Times.” I mention this article because of its comparative approach
and its use of much the same rule-based models as Temperley. The 1992 article takes a connectionist
approach that uses stationary units and interactive units that change based on the surrounding
material. The approach does not keep the length of notes the same but keeps the onsets the same,
which is problematic for Temperley: even though this model offers multiple beats per time
interval, it cannot handle expressive timing (Temperley 2001).
The 1999 Desain and Honing study, “Computational Models of Beat Induction: The Rule-Based
Approach,” used a rule-based model for beat induction from musical input and aimed to
explore the perception of tempo in people and in computers. The goal of the article is to look at
rule-based models and provide an understanding of how these models create an initial beat
structure. Desain and Honing examined the contribution and robustness of rules in different
rule-based models. The important point taken from this article is that models, regardless of the year
they were created, can work more optimally with rules taken from other models. This points towards
the mixing of rules and ideas, which is in fact what Temperley has done to create his PRS.
Smith and Honing (2008) explain how the problem of expressive timing could be
overcome. Their study used rhythmically isolated segments—meaning that there was only rhythm
as input—to incorporate expressive timing. This accounts for the fact that a person can easily
change their original beat structure to incorporate expression. A technique based on Morlet
wavelets was used to do so because of its similarity to human hearing and prediction4. This
remains consistent with the overall goal of Optimization, which is to explain with greater and
greater efficiency perception and human signal processing. These wavelets, however, are best
used for short bursts of input, similar to the expressive timing at the ends of phrases.
The article first looks at the analytical techniques and the application of Morlet wavelets
to create a continuous wavelet (one that uses expressive timing). A wavelet here is a representation of
the repetitive rhythmic structure, such as a repeated rhythm or time signature. The article then puts the
rhythmic findings into a hierarchy. Following this, it finds the “foot tapping rate” (Smith
and Honing 2008, 83), which is the basic tempo, and, finally, the model is completed by showing
4 Definition taken from an online dictionary on time frequency: https://cnx.org/contents/SkfT37_l@2/Time-Frequency-Dictionaries
the incorporation of expressive timing (Step 1 with Step 3). Overall, this model will provide an
accurate analysis of foot-tapping. It will be further discussed in Section 3.2.
Hardesty (2016) goes in a different direction, building upon Temperley as well as Huron
and Lerdahl and Jackendoff’s A Generative Theory of Tonal Music (1983). His approach aims to
identify rhythmic features and examine music prediction from the point of view of rhythm and
parallelism. This will be further discussed in the parallelism section.
3.1.2 Contrapuntal Structure
As mentioned in Chapter 2 with the ELVIS project, counterpoint often does not get the
attention it deserves. Temperley examines counterpoint with the goal of understanding the
perception behind it. It is worth mentioning that the PRS for contrapuntal structure is geared
towards a piano-roll representation of a piece. Temperley uses the concept of “streams,” which
are groups of ideas in the same voice with minimal white squares, the white squares referring to
moments of silence. Temperley’s preference rules are as follows:
1. Pitch Proximity Rule: prefer to avoid large leaps within a stream
2. New Stream Rule: prefer the least number of streams
3. White Square Rule: prefer the least number of white squares in a stream
4. Collision Rule: prefer cases where a square belongs to only one stream
5. Top Voice Rule: prefer a single voice as the top voice, so there is minimal voice
exchange
I would like to clarify that a stream does not refer to a phrase, because, in contrapuntal structures,
a stream can have multiple phrases. For example, one voice in a four-part fugue would start with the
melody, which can span multiple phrases, and then the same voice will play contrapuntal variations
with multiple phrases; this voice acts as one stream.
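To make these rules concrete, a miniature scorer over candidate stream assignments might look like the following sketch (my own illustration, not Temperley’s program). It penalizes extra streams, white squares, and large leaps, using invented MIDI pitches.

```python
def stream_penalty(streams):
    """Lower is better: penalize streams, white squares, and leaps.

    Each stream is a list of MIDI pitches, with None standing for a
    white square (a moment of silence inside the stream).
    """
    penalty = 2 * len(streams)                            # new stream rule
    for stream in streams:
        penalty += sum(1 for p in stream if p is None)    # white square rule
        notes = [p for p in stream if p is not None]
        penalty += sum(max(0, abs(b - a) - 7)             # pitch proximity rule:
                       for a, b in zip(notes, notes[1:])) # penalize large leaps
    return penalty

# Two candidate analyses of the same four notes (MIDI pitches).
one_stream = [[60, 72, 60, 72]]      # one voice with repeated octave leaps
two_streams = [[60, 60], [72, 72]]   # two level voices

print(stream_penalty(one_stream), stream_penalty(two_streams))  # 17 4
```

The two-stream reading wins despite the new-stream penalty, which mirrors how listeners hear such a passage as a compound of two voices rather than one leaping line.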
A 2015 Komosinski article examined the analysis of counterpoint for compositional research
using a method called “dominance relation.” This method uses multiple criteria to perform
analysis, like a PRS. It specifically looks at first-species counterpoint and can produce a
composition as output. Because this is a composition tool, I will concentrate on the evaluative module
of the method. The model first generates first-species counterpoints, and each one is evaluated
by the following criteria:
1. Direct motion
2. A repeated note
3. A vertical imperfect consonance
4. A skip
5. A vertical perfect consonance reached by direct motion
6. A skip by a tritone or by an interval larger than a P5 (except a m6)
These criteria are examined throughout the generated piece, and occurrences of each are counted.
The output produced by a dominance relation will be either “dominated” or “non-dominated.”
Using rules based upon the counterpoint method of Fux (Fux 1965), a dominated
counterpoint is one for which another counterpoint is ‘better,’ and this evaluation repeats until a
final, non-dominated counterpoint is found. This article builds upon Temperley’s rules, but only
in a general sense: Temperley’s rules are used to narrow down choices and find the best fit, while
this method tests all rules on each counterpoint and eventually finds the counterpoint that best
exemplifies the rules.
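The dominance relation itself is a standard multi-criteria (Pareto) comparison, which can be sketched as follows; the violation counts are invented, and lower counts are better on every criterion.

```python
def dominates(a, b):
    """a dominates b if a is no worse on every criterion and better on one.

    Each argument is a tuple of violation counts (lower is better).
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(counterpoints):
    """Keep only counterpoints that no other candidate dominates."""
    return [c for c in counterpoints
            if not any(dominates(other, c)
                       for other in counterpoints if other != c)]

# Invented (direct motion, repeated notes, skips) violation counts.
candidates = [(2, 1, 3), (1, 1, 2), (1, 2, 1)]

print(non_dominated(candidates))  # [(1, 1, 2), (1, 2, 1)]
```

Note that more than one counterpoint can survive: the relation only discards candidates that are worse on every count, which is what distinguishes it from Temperley’s single best-fit score.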
Giraud et al (2015) build upon research on fugues. The input has the voices of the fugue
already separated—much the same as Temperley’s streams—and the method uses “generic MIR
techniques” (Giraud et al 2015, 79). I have decided to put this into the Optimization section for
two reasons. First, it is an example of work lying between Optimization and MIR. Second, it
acts more as an Optimization tool than an MIR tool because of its small scale: the goal is not to
create a database but to serve as an evaluative model for fugues.
This tool needs input that is already separated for computer use, so it uses files from the
Humdrum toolkit because they have been previously separated into voices. This method
concentrates on using tools to examine pattern repetitions and gives a complete analysis. It does
so by identifying the subject, and countersubject(s), the key for individual occurrences, harmonic
sequence, cadence, pedals, and overall structure. Giraud et al tested this method on 36 Bach and
Shostakovich fugues. They found that, for some pieces, the analysis was complete and correct,
but the method still gets false positives. Other results were completely unusable, but these were
mostly double and triple fugues. More specifically, if the subject was correctly identified the
overall analysis was more correct. Like any computer method, this one can be improved, and
Giraud et al suggest how: the current method could be combined with probabilistic models,
which will be discussed in the following section.
3.1.3 Tonal-Pitch Class Representation and Harmonic Structure
Tonal-Pitch Class Representation is important to the PRS of Harmonic Structure. The
term Tonal-Pitch Class is taken from Temperley and I have understood it to mean the set of pitch
classes creating a tonal structure (i.e. key area). Tonal-Pitch Class representation is the sorting of
the pitches in a piece into a specific key. The Preference rules outlined by Temperley are as
follows:
1. Pitch Variance Rule: prefer to label pitches such that nearby events are within the same key
2. Voice-leading Rule: prefer events a half step apart to have different letter names
3. Harmonic Feedback Rule: prefer a Tonal-Pitch Class where the harmonic structure is
good (meaning that there is a logical progression)
These rules help to decide a specific key and minimize notes outside of a chosen key. All keys
would be tested for a specific idea and the best-fit would be chosen. The PRS for Harmonic
Structure builds upon this assignment by adding roots and chords to the piece. These rules create
a hierarchy of possibilities for the individual chords and, because the last rule for Tonal-Pitch
representation considers harmonic progression, the progression is relatively accurate. This does
not eliminate the analyst, however, because this is not 100% accurate. The PRS for Harmonic
Structure are as follows:
1. Compatibility Rule: prefer roots in the following order: 1, 5, 3, flat 3, flat 7, flat 5, flat 9,
ornamental (all others)
2. Strong Beat Rule: prefer chords on strong beats
3. Harmonic Variance Rule: prefer the next root to be close to the previous root on the
circle of fifths
4. Ornamental Dissonance Rule: (an ornamental dissonance is a note that does not have a
chord-tone relationship to the chosen root) prefer ornamental dissonances where the next
or prior note is a tone or semitone away and/or on a weak beat
The PRS for Harmonic Structure still considers chords that are not part of the original key, and
thus modal mixture and other temporary key changes are possible. This method also considers
proximity, so modulation can be addressed.
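A PRS of this kind can be read as a scoring function over candidate analyses: each rule contributes points and the highest-scoring candidate wins. The sketch below implements only the Compatibility and Strong Beat rules, with weights that are my own illustrative choices, not Temperley's:

```python
# Chord-tone preference from the Compatibility Rule: earlier = more preferred.
ROOT_PREFERENCE = ["1", "5", "3", "b3", "b7", "b5", "b9"]

def compatibility_score(interval_above_root):
    """Higher score for more-preferred chord-tone relationships;
    anything not in the list counts as ornamental (all others)."""
    if interval_above_root in ROOT_PREFERENCE:
        return len(ROOT_PREFERENCE) - ROOT_PREFERENCE.index(interval_above_root)
    return 0

def score_candidate(intervals, on_strong_beat, strong_beat_bonus=2):
    """Sum rule scores for one candidate root interpretation of a segment."""
    score = sum(compatibility_score(iv) for iv in intervals)
    if on_strong_beat:  # Strong Beat Rule
        score += strong_beat_bonus
    return score

# Two hypothetical root readings of the same segment of notes.
c_major = score_candidate(["1", "3", "5"], on_strong_beat=True)   # 20
a_minor = score_candidate(["b3", "5", "b7"], on_strong_beat=True)  # 15
```

Under these toy weights the C major reading wins; a full implementation would score every rule and every candidate root in the same way.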
To add to this, De Haas et al in 2013 created HarmTrace which stands for Harmonic
Analysis and Retrieval of Music with Type-level Representation of Abstract Chord Entities. This
tool is useful for tonal works to separate data using harmonic similarity estimation, chord
recognition, and automatic harmonization. To explain further, this tool can recognize chords and
show that different aspects of a piece are similar because of the harmonic structure or
progression. This tool can do so by taking all the chord possibilities into consideration for the
specific beat and extracting the most correct one. (The tool can also harmonize a progression
which is useful for the performer, but not within the scope of this paper.) This article was
included because it furthers Temperley’s PRS: it can provide the automatic harmonization and
similarity estimation. It does not need the previous Tonal-Pitch class representation PRS to
figure out the specific chords. Instead it puts the possibilities into a hierarchical structure. The
authors claim that this model can be used for MIR because it moves beyond theoretical uses and
is practical as an internet-based method (De Haas et al 2013).
3.1.4 Melodic Phrase Structure
Melodic Phrase Structure is involved in multiple levels of a piece because melody itself often
adheres to specific rules and works with other musical structures such as meter and harmony
(Temperley 2001). Thus, Temperley’s PRS must take all of these into account to be accurate.
The rules are as follows:
1. Gap Rule: prefer boundaries at large gaps between note onsets or at a rest before an
onset
2. Phrase Length Rule: prefer phrases roughly eight notes long
3. Metrical Parallelism Rule: prefer phrases that start at the same point in the metrical
structure
The first rule refers to the time that can pass between phrases or within a phrase: the Gap Rule
places phrase boundaries at a rest or after a longer note value, because both are possible phrase
endings. An extension of this model will be discussed in 3.1.5, Parallelism.
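The Gap Rule can be sketched as a scan over a melody's note onsets and durations, preferring a boundary where a rest or a long inter-onset interval occurs. The threshold and example values below are my own illustrative choices:

```python
def gap_boundaries(onsets, durations, threshold=2.0):
    """Return indices i such that a phrase boundary after note i is
    preferred: either a rest precedes note i+1, or the inter-onset
    interval (a long note) is at least `threshold` beats."""
    bounds = []
    for i in range(len(onsets) - 1):
        rest = onsets[i + 1] - (onsets[i] + durations[i])
        ioi = onsets[i + 1] - onsets[i]
        if rest > 0 or ioi >= threshold:
            bounds.append(i)
    return bounds

# Eight quarter notes (times in beats) with a one-beat rest after the fourth.
onsets = [0, 1, 2, 3, 5, 6, 7, 8]
durations = [1, 1, 1, 1, 1, 1, 1, 2]
bounds = gap_boundaries(onsets, durations)  # boundary after the fourth note
```

A full system would weigh this against the Phrase Length and Metrical Parallelism rules instead of treating the gap as decisive.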
3.1.5 Parallelism
In The Cognition of Basic Musical Structures (2001), parallelism was mentioned and
treated, and it was revisited in Temperley and Bartlette’s 2002 article. Parallelism was redefined
as follows:
a) Parallelism: repetition of either an exact sequence or a contour
b) Parallelism rule: “prefer beat intervals of a certain distance to the extent that repetition
occurs at that distance in the vicinity” (Temperley and Bartlette 2002, 134)
This twofold definition kept the existing definition but, in essence, added contour and sequence.
Emilios Cambouropoulos, from Aristotle University of Thessaloniki, in 2006 explored
parallelism and melodic segmentation using a computer. Cambouropoulos wanted to incorporate
parallelism into this method because it is often forgotten by analysts and has an impact on
parsing data. Cambouropoulos used the pattern boundary strength profile (PAT) and the Local
Boundary Detection Model (LBDM) to find phrase boundaries that take parallelism into account.
PAT was first only able to extract patterns that are exactly the same, but Cambouropoulos
modified it to extract patterns that are similar. The goal of this modification is to provide a more
general application of parallelism which is exactly what Temperley wanted to do with the
modification of his prior definition. Cambouropoulos was able to create a basic method for
melodic segmentation that incorporates parallelism, but it is not perfect as it does not provide the
final segmentation of the piece.
As previously mentioned, Hardesty in 2016 published an article on music prediction and
generation for rhythm. This method was based on finding parallelism, on Lerdahl and
Jackendoff’s A Generative Theory of Tonal Music (1983), and on the psychological
understanding of music. The psychological aspect of rhythm is based on “rhythmic anticipation
and parallelism” (Hardesty 2016, 39). The method was only conducted on binary rhythm, where
strong and weak beats alternate, so the assumption is that an attack on a weak beat is followed by
an attack on the strong beat. The method derives a rhythm to find the underlying operations that
generate rhythms. The goal is to “[define] a collection of rhythmic building blocks” (Hardesty
2016, abstract) while taking the psychological aspects of rhythm and meter, and parallelism,
into account. The result is a hierarchy of rhythms based on duration. An interesting point is that
the final outcome can be the same even when the inputs differ, so long as they are derived from
the same rhythm.
3.2 Probabilistic and Statistical models
Though this is a separate section from Preference Rules, probability and statistics
involve the same hierarchical structure as a Preference Rule System. Often in Computer
Music Analysis, different methods are layered to create an optimal outcome. The incorporation
of probability and statistics stems from Temperley’s move away from PRSs toward a model
more similar to those of other fields studying perception.
3.2.1 Introduction
In 2010, the Journal of Mathematics and Music published a special edition examining the
first movement of Brahms’ String Quartet in C Minor, op. 51 no. 1, to show different perspectives
on Computer Music Analysis (referred to in the article as “computer-aided analysis”). The
edition brought to light three major developments I explore further: Music Information Retrieval,
Optimization, and Machine Learning. This section, however, will concentrate on Optimization in
terms of probability and statistics. This will touch on work by David Temperley, Philippe Cathé,
and Darrell Conklin. I will also introduce a method of using probability to assist in MIR,
introduced in the previous chapter.
Temperley sought to improve on Preference Rules with Bayesian probability because it
can do the same job as preference rules. Preference rules are not used in other perception-related
fields, such as linguistics, so Temperley took those fields’ methods and adapted them for music.
He changed from a preference rule approach to a more generative approach using Bayesian
probability, which stems from accepting Bayes’ Rule as correct: the probability of one event
changes based on the occurrence of a previous event. Combining his previous work with that of
Music and Probability (2007), Temperley created Melisma Version 2.0, available online for
analysis.
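The Bayesian move can be illustrated with a small sketch: a hidden structure (a key, a meter) is scored by multiplying its prior probability by the probability of the observed surface given that structure, which is proportional to the Bayesian posterior. The probabilities below are invented for illustration:

```python
def posterior(prior, likelihoods):
    """Unnormalized P(structure | surface) = P(structure) * P(surface | structure),
    taking the surface likelihood as a product of per-note terms."""
    p = prior
    for term in likelihoods:
        p *= term
    return p

# Two candidate structures for the same observed notes (numbers invented).
p_structure_a = posterior(0.6, [0.5, 0.4, 0.5])
p_structure_b = posterior(0.4, [0.2, 0.3, 0.2])
preferred = "A" if p_structure_a > p_structure_b else "B"
```

Whichever structure scores higher is chosen as the analysis; the later meter and key models in this chapter are elaborations of exactly this comparison.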
Philippe Cathé, at L’Université Paris-Sorbonne, looks primarily at Harmonic
Vectors and uses a computer to perform the statistical analysis. The computer does not
perform the musical analysis itself but instead treats each piece as a data file. Cathé attempts to
keep the music in mind by explaining, after the statistical analysis, the interaction between the
music and the vectors. With Harmonic Vectors, the changes can be heard in recordings, making
the statistical analysis seem more factual.
Darrell Conklin also employed probability, as well as bioinformatics for efficient pattern
recognition. Finding patterns is an integral part of analysis but becomes subjective when
choosing patterns for study. The goal of Conklin’s work is to create an algorithm to find the
distinctive patterns, which are patterns frequent within the piece, the corpus, and infrequent in a
selected set of pieces, the anticorpus. This gives the analyst a set of patterns that may be
important.
3.2.2 David Temperley’s use of Bayesian Probability
In The Cognition of Basic Musical Structures (2001), David Temperley created a set of
Preference Rules inspired by Lerdahl and Jackendoff’s A Generative Theory of Tonal Music
(1983). Similarly, Music and Probability (2007) takes a generative approach and combines it
with Bayesian probability. The reason for using probability was to use tools similar to those of
language and vision research, because preference rules were not being used in those related
domains. Bayesian probability is a subset of probabilistic reasoning in which the probability of
an occurrence is affected by the occurrence of a previous event. This subsection concentrates on
select chapters from Music and Probability (2007).
The approach to analysis here is to first do a probabilistic analysis of the Essen Folksong
Collection to find the probability of various musical building blocks, such as meter, keys—both
in monophonic and polyphonic music—, and melodic ideas. This analysis sets the parameters for
the computer program, so that the rest of the pieces analyzed will have a higher accuracy. Using
the Essen Folksong Collection5, the parameters are set, and the analysis is completed through a
5 The Essen Folksong Collection is a collection of 10,000 folksongs collected by Helmut Schaffrath. These are
located at http://essen.themefinder.org/
generative process. A generative process works by generating multiple candidate surfaces from
possible underlying structures (keys, meters, etc.); the program then decides which candidate has
the highest probability given the underlying structure. This simplified method will now be
explained for meter, key (both monophonic and polyphonic), and melodic ideas.
Meter has been well studied prior to Temperley’s work in Music and Probability (2007),
so this model aims to build upon previous models with a generative approach. A ‘metrical grid’
is generated from the piece based on the parameters set from the remainder of the Essen
Folksong Collection, but there are many different possibilities of metrical grids for any given
piece. As noted above metrical grid refers to the graphic representation of beats, strong beats,
and main beat divisions in three levels as shown in figure 3 (Section 3.1.1)
The following steps are used in creating the optimal grid:
1. Decide time signature: choices between duple and triple meter and the individual time
signatures within each category
2. Generate the tactus: this is the middle or second level of beats and is based on the notes
that are present (simultaneously with 3)
3. Addition of upper level beats: indicates the actual beat division and is the highest level in
the metrical grid (simultaneously with 2)
4. Addition of lower level beats: indication of the subdivision required for the excerpt. This
is the lowest set of points on the metrical grid
5. Generate note onset: solid vertical lines that indicate where the actual notes line up on
the metrical grid. (not in figure)
After generating many metrical grids, the tool tests the probability of the onsets, with the
assumption that each grid is correct. It then multiplies that value by the probability of the grid
itself. This yields a probability value less than one, and the highest-scoring grid is selected.
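The selection step just described can be sketched as follows; `p_grid` and `p_onsets_given_grid` are placeholders for the model's real distributions, and all numbers are invented:

```python
def best_grid(candidates):
    """candidates: list of (name, p_grid, p_onsets_given_grid) triples.
    Returns the (name, score) pair with the highest joint probability,
    which is proportional to the posterior P(grid | onsets)."""
    scored = [(name, p_grid * p_onsets) for name, p_grid, p_onsets in candidates]
    return max(scored, key=lambda item: item[1])

# Three hypothetical candidate grids for the same passage.
grids = [
    ("4/4, tactus = quarter", 0.5, 0.020),
    ("3/4, tactus = quarter", 0.3, 0.010),
    ("6/8, tactus = dotted quarter", 0.2, 0.015),
]
winner = best_grid(grids)
```

Because every factor is below one, the joint scores are all less than one, as the text notes; only their ranking matters.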
Upon testing this model on multiple pieces, Temperley compared it to the software that
previously used preference rules to find the best fit. The tests showed that the PRS was more
accurate than the Bayesian model. Temperley hypothesized that this higher accuracy comes from
the fact that the perception of rhythm is also based on harmony, note lengths, and parallelism.
Longer note values most often occur on strong beats, such as the beginning of the measure, and
the Bayesian model at the time could not take that into consideration.
In creating a computer model that perceives key, the musical facets the mind isolates
must be taken into consideration. A key, at least in monophonic pieces, is composed of both
pitch proximity and range, and Temperley poses the question “What kind of pitch sequence
makes a likely melody?” (Temperley 2007). This, once again, is a generative process in which
all keys are tested, but there is no obvious starting point when examining key, so Temperley
relies upon previous research on key-profiles. The key-profiles are heavily based on the Krumhansl and
Kessler 1982 experiment. The experiment asked participants to rate the degree to which audible
pitches belonged to an established key and, from this, a correlation was created. This experiment
was successful in major keys, but minor keys were problematic because there are multiple
versions of a minor key. Temperley made the needed changes to the established key profiles to
incorporate minor keys and began constructing a model using Bayesian Probability.
To construct a generative process for key finding, Temperley used the key profiles as a
starting point. He did an overall analysis of the Essen Folksong Collection to find a normal
distribution, or bell-curve, of the pitches. Following this, a pitch is chosen at random from the
peak area of the bell-curve to construct a range profile around it, and then, it is combined with a
proximity profile. All keys are tested in this way and the key with the highest probability will be
chosen as the key for the melody. This same method for key-finding is problematic for
polyphony. This approach takes the structure from the surface material, but the surface of a
polyphonic piece is dense and contains notes acting as passing or neighbouring tones. When
examining a piece, many notes are not the tonic of a scale, so this would skew most computer
programs. Temperley aimed to overcome this obstacle by segmenting the piece on the
assumption that pieces stay in the same key for a little while. This assumption is based on the
perception-based concept of ‘inertia’ where there is a lack of movement in an item (Larson
2004). In this case, it means that the key will stay the same for the amount of time affected by
inertia. This also helps with the second problem of modulation.
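The key-testing procedure can be sketched as scoring every candidate key against a key-profile and keeping the best. The profile values below are invented for illustration and are not the Krumhansl and Kessler or Temperley values; only major keys are tested in this toy version:

```python
# Toy major-key profile indexed by semitones above the tonic
# (illustrative values only, not the published profiles).
MAJOR_PROFILE = [5.0, 0.5, 3.5, 0.5, 4.5, 4.0, 0.5, 4.5, 0.5, 3.5, 0.5, 3.0]

def key_score(pitch_classes, tonic):
    """Sum the profile values of the pitch classes, transposed to `tonic`."""
    return sum(MAJOR_PROFILE[(pc - tonic) % 12] for pc in pitch_classes)

def best_major_key(pitch_classes):
    """Test all 12 major keys and return the tonic with the highest score."""
    return max(range(12), key=lambda tonic: key_score(pitch_classes, tonic))

# A C-E-G-C melody (pitch classes 0, 4, 7, 0) should favour C major.
tonic = best_major_key([0, 4, 7, 0])
```

Segmenting a polyphonic piece, as described above, would mean running this comparison per segment rather than once over the whole surface, which is how modulation becomes visible as a change in the winning key.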
Modulation occurs when a new key is introduced for an indeterminate amount of time.
This is difficult for a computer because two, or more, notes act as the tonic at different times in
the piece. In the case of polyphony, this is overcome by segmenting the piece into smaller sections,
as is already needed to look at polyphonic works. The smaller segments will show a higher
probability to one key and a section that modulates will show a higher probability of another key.
The segmentation, in turn, will assist in both, identifying modulation and key-finding in
polyphonic works.
Melodic ideas in this case often involve expectation or error detection where the model
attempts to answer this question: ‘does this pitch work in this sequence?’ Pitch expectation is
tested in two ways: the first is whether the participant expects a pitch, and the second is whether
a participant can produce a pitch.6 Temperley concentrates on the first type of test and uses the
Cuddy and Lunney (1995) experiment where participants rated the ‘fit’ of the next note in a
corpus (not the Essen corpus) from one to seven. Temperley converted these ratings into values
usable by the probability model. The values were used to test the strength of the fit of each note,
to explore the capabilities of the computer tool, and to examine pitch sequence. Here, Temperley
found that the parameters work best if they are created from other pieces in the same corpus. The
strength of fit is then much higher (a correlation coefficient of 0.870 versus 0.729), which shows
that the computer tool does not work equally well for all music but can still give some insight.
3.2.3 Statistics and Harmonic Vectors
Harmonic Vectors is a newer harmony theory influenced by Riemann that aims to take a
generative and systematic look at tonality, one that can be used for statistical analysis (Meeùs
2003). Nicolas Meeùs has used the term since 1988 and has written extensively on it into the
twenty-first century.
My primary source for background information on Harmonic Vectors is a 2003 Meeùs article
entitled “Vecteurs harmoniques.” This takes the motion of scale degrees and systematically sorts
them into either Dominant (V) or Subdominant (SD) Vectors. The two types of vector are based
on classification of progressions from Schoenberg and Sadaï, who wrote an extension of
Schoenberg’s work. The reason for this analysis is the assumption that a chord alone has no
meaning but creates its function within a succession of chords; therefore, the meaning is
6 Temperley refers to this as either the perception paradigm or the production paradigm
generative. These vectors can be graphically represented and can be used for statistical analysis
but may not be representative if done on few works (Meeùs 2003).
Philippe Cathé took Harmonic Vectors and combined it with Computer Music Analysis
to dig deeper into a set of works. There are three levels of research with Harmonic Vectors:
finding regularities, finding pendulums, and finding correlations between the other two levels
and the music (Cathé 2010a). Cathé expands on vectorial pairs (Meeùs 2003), an analysis of
side-by-side pairs of vectors, and on mono-vectorial sequences, meaning the same vector
repeated, as methods for finding regularities. Pendulums help to further differentiate
composers based on their vector use. A pendulum is a series of three vectors where the first and
third vector are the same and the second vector is different. The final level of research brings
back the music and aims to find correlations between the music and vectors found. The goal is to
understand why a vector is used (Cathé 2010a). These three stages help to further explore a set of
works.
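Given a piece reduced to a sequence of vectors (here encoded simply as "D" for dominant and "S" for subdominant), the regularity and pendulum searches reduce to short sequence scans. A minimal sketch; the encoding and function names are mine, not Cathé's:

```python
def vector_pairs(seq):
    """All side-by-side pairs of vectors (vectorial pairs)."""
    return [(seq[i], seq[i + 1]) for i in range(len(seq) - 1)]

def mono_vectorial_runs(seq):
    """Indices where the same vector immediately repeats."""
    return [i for i in range(len(seq) - 1) if seq[i] == seq[i + 1]]

def pendulums(seq):
    """Indices starting a pendulum: three vectors where the first and
    third are the same and the second differs."""
    return [i for i in range(len(seq) - 2)
            if seq[i] == seq[i + 2] and seq[i] != seq[i + 1]]

# A short hypothetical vector sequence: D S D is a pendulum at index 0,
# and D D at indices 2-3 is a mono-vectorial repetition.
seq = ["D", "S", "D", "D", "S"]
```

A program like 'Charles' would tabulate the proportions of these counts across pieces, composers, and eras rather than just listing positions.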
The application of harmonic vectors for statistical analysis was mentioned and used by
both Meeùs and Cathé. Both expressed the analysis in a table of percentages, organized by
movement of scale degrees, the types of vector, and level, or with graphic representation, as line
diagrams or graphs. The diagrams express the amount of each vector (Meeùs 2003), vector pair,
mono-vectorial sequences, or pendulums (Cathé 2010a), most often in percentage, and break this
down by era and composer. The computer has assisted Cathé in the three-level analysis by
cutting down on the time and making the output as unbiased as possible. To perform
comparisons, Cathé uses ‘Charles,’ a computer program based on Microsoft Excel that gives the
proportions of vectors (pairs, pendulums, etc.) for a certain piece or set of pieces, treated as data
files. The output is most often expressed in charts or line graphs. This gives the analyst
another method to represent the data and makes comparison easier between eras, composers, and
compositions.
The idea that works of music taken from different eras sound different is not new.
Harmonic Vectors aims to show this through the change in proportions between eras. Each era
has a different average of each vector, vector pairs, pendulums etc. that can be identified through
larger scale comparative analysis (Cathé 2010b) and represented in the form of statistics. In
addition to eras, a comparative statistical analysis of harmonic vectors can also be applied to
composers and compositions. All composers and compositions are slightly different, so Philippe
Cathé took ten versions of Vater unser im Himmelreich and compared the usage of Harmonic
Vectors (Cathé 2010b). A composer uses different amounts of each vector (pairs, pendulums,
etc.) from piece to piece, but the percentages remain very close (Cathé 2010a). This can also be
used to show the degree of difference between two composers, since each composer’s use of
vectors is internally consistent.
3.2.4 Distinctive Patterns using Bioinformatics and Probability
Looking for patterns is needed in all analyses and finding patterns that are distinctive is
paramount. According to Darrell Conklin, a distinctive pattern is one which is frequent within
the corpus when compared to the frequency within the anticorpus. The algorithm that was
created aims to find the distinctive pattern within the corpus to narrow down the possibilities for
the analyst (Conklin 2008). The corpus is a specific piece or set of pieces being examined, so
the distinctive patterns found are over-represented in the corpus. The anticorpus, on the other
hand, is a piece or a set of pieces, often by the same composer, where the distinctive pattern is
under-represented. The frequency needed for distinctiveness, the corpus, and the anticorpus are
all determined by the analyst. I will now explain a few applications of distinctiveness.
In this section, I will look at two different applications of this method done by Darrell
Conklin. The first is on the Essen Folksong Collection and the second is on Johannes Brahms’
String Quartet opus 51 no.1. The reason for choosing Conklin’s application is to look at the
approach of a researcher who commonly examines music and Machine Learning (from the
University of the Basque Country website, http://www.ehu.eus/cs-ikerbasque/conklin/) and to
further explain distinctiveness with an analysis. Both analyses use the following formula:
ΔP ≝ p(P | ⊕) / p(P | ⊖) = c⊕(P) / (p(P | ⊖) × n⊕)
The middle expression (between the two equal signs) is the probability of a pattern in the
corpus (⊕) divided by its probability in the anticorpus (⊖). The last expression is used to
compute the value of ΔP, also known as the likelihood of P, I(P): its numerator is the total count
of the pattern in the corpus, and its denominator is the probability of the pattern in the anticorpus
multiplied by the total number of events in the corpus.
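Following the formula above, ΔP can be computed directly from pattern counts. A minimal sketch with hypothetical counts; the variable names mirror the formula (⊕ = corpus, ⊖ = anticorpus):

```python
def delta_p(count_corpus, n_corpus, count_anti, n_anti):
    """Likelihood ratio ΔP = p(P|corpus) / p(P|anticorpus)
    = c_corpus(P) / (p(P|anticorpus) * n_corpus)."""
    p_anti = count_anti / n_anti
    return count_corpus / (p_anti * n_corpus)

# Hypothetical counts: the pattern occurs 30 times among 100 corpus
# events but only 5 times among 200 anticorpus events.
ratio = delta_p(count_corpus=30, n_corpus=100, count_anti=5, n_anti=200)
distinctive = ratio >= 3  # the threshold used in the analyses below
```

A real implementation must also smooth the case where the anticorpus count is zero, since the ratio is then undefined; the sketch ignores that.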
The first analysis was conducted on the Essen Folksong Collection, the same collection
used by Temperley in his Music and Probability (2007), and, more specifically, the Shanxi,
Austrian, and Swiss folksongs. Conklin was searching for the “maximally general distinctive
patterns,” (Conklin 2008,1) which are patterns that can be used for classification but are not so
general that they occur in almost all pieces. For a pattern to be considered interesting, or
frequent, it must occur in a minimum of 20% of the corpus, and the likelihood I(P), also known
as ΔP in later works, must be greater than or equal to 3. This study showed that, for each region, there is a
maximally general distinctive pattern that can be used for classification purposes (Conklin 2008).
The second analysis was on the first movement of the Brahms String Quartet, opus 51 no
1, and the anticorpus used was the string quartets no 2 and no 3. For the best comparison,
Conklin only uses the first movements of no. 2 and no. 3. The goal was to show that the motives
Forte found in his article “Motivic design and structural levels in the first movement of
Brahms’s string quartet in C minor” (1983) are found to be distinctive under this analysis,
excluding two motives that cannot be maximally general. This contrasts with David Huron’s
revisiting of the same analysis in 2001, in which Huron found that only the alpha motive was
distinctive (Conklin 2010).
I will now outline what was determined by the analysis. In this study, the minimum
frequency for a pattern is 10, and the likelihood of a pattern, renamed ΔP, must be at least 3 for
the pattern to be considered distinctive. The Humdrum kern format was used because it is easily
available and computer-compatible. When the analysis was completed, all of Forte’s motives, not
including the mentioned exception, were labeled as distinctive (Conklin 2010). This shows that
the tool can be used to identify likely distinctive motives, but the analyst will still need to analyse
the data for a complete picture.
3.3 Critical Analysis: Optimization
The chapter thus far shows the progression of research in general, and specifically the
progression David Temperley made from The Cognition of Basic Musical Structures (2001) to
Music and Probability (2007), by exploring the previous research, the reasons for turning to
probability, and the use of Bayesian networks. In essence, the recent research in Optimization builds upon
what Temperley provides or upon developments mirrored by Temperley. (Temperley has more
recent publications, but these will be discussed in the conclusion of the thesis.)
3.3.1 Preference rules: Metrical Structure
The Smith and Honing use of Morlet wavelets was discussed in 3.1.1 as a method to
incorporate expressive timing into beat induction. This method has its limitations. First, the
method does not work by simply exposing the tool to the music: the input must be in an isolated
rhythm form, which means the tool cannot perform beat induction on an unseparated piece.
Another issue is that tempo selection is not as sensitive as needed. This method has made leaps
and bounds in testing and creation but cannot currently work as a stand-alone program. Because
of these limitations, it also cannot be a simple online application at this point, so it is only useful
to a small number of people.
The first improvement would be to make it either a stand-alone program or an addition to
a larger tool. As its own stand-alone program, it would be of much use to a researcher and could
also serve as a teaching aid for a student learning expressive timing or beat induction. A more
widespread use of this tool would be in playback software for scores, to determine the efficacy
of a playback: if the tool could not find the tempo of a piece as played back, that would show
that the playback is not similar to human playing. This tool does, however, help to further the
goal of Optimization by getting closer to human beat induction. In time, if work on beat
induction continues, researchers may come to understand how people find the beat and adapt to
it quickly.
3.3.2 Preference Rules: Counterpoint
The extensions of counterpoint from 2000 to 2016 have concentrated on the evaluative or
compositional side, but they are still useful to analysts. The Komosinski article concentrates
heavily on composition, but it gives an evaluative approach for the generated composition. On a
smaller scale, this tool is useful for evaluating a first species counterpoint, which takes the
opposite direction from Temperley. It has been included to show a different use for Temperley’s
Preference Rules. It is useful to an analyst because it gives a general outline of the evaluative
criteria a computer needs. On its own, it must stay with a generative model because of its
dominated vs. non-dominated output, but it is a good model for future evaluations of generative
models.
The tool proposed by Giraud et al gives the analyst a strong head start on fugue analysis
if the subject is properly identified. This tool is best used on a larger corpus of similar fugues
(i.e. by the same composer in the same era) if it were to be combined with probabilistic models.
The best probabilities for subject length, key notes used, etc., are found when each corpus is
evaluated independently. This was a trend in probability work, because the probability of certain
gestures changes with the composer. This tool would indeed be best used in conjunction with a
probabilistic model, but extra work is needed to separate a set of fugues into streams or voices.
To separate the voices, Temperley’s preference rules for examining streams can be used, if
streams and voices are indeed one and the same. However, neither of these tools examines
fugues with multiple instruments. This is left for further work.
3.3.3 Preference Rules: Tonal-Pitch Class Representation and Harmony
HarmTrace can estimate the harmonic similarity, recognize chords, and automatically
harmonize an input. This tool does not need a set of Tonal-Pitch Class rules or key profiles.
Instead it uses a hierarchical structure to narrow down its choices. The authors of the article
further say that this model can be used for MIR because it is practical as an internet-based
method. An issue that is not addressed is what kind of input can be used with HarmTrace; this is
one of my five Critical Issues. If the input needs to be separated in some way, then existing
Humdrum files could be used, but if an image of the score can serve as input, then any clear scan
would work. Another common input is a music notation software file (such as a Finale file), but
these formats are specific to the notation software being used. Furthermore, an audio file input
would be optimal because audio is widely available, but this is not practical because no
recording is perfect.
3.3.4 Melodic Phrase Structure and Parallelism
The PAT—pattern boundary strength profile—and LBDM—Local Boundary Detection
Model—have improved with Cambouropoulos’s modifications in 2006, but since then
parallelism has not been at the forefront of research. This more generalized treatment of
parallelism is imperative for pieces in which a repetition is ornamented or slightly changed, but
it is often not considered by analysis tools because they tend to examine recurring features or
one specific task.
Boundary detection is generally used for parsing data and by incorporating parallelism
the boundaries are more accurate. By putting HarmTrace and PAT/LBDM together, the output
could have a higher accuracy and can provide a precise parsing of data as needed for analysis.
The final segmentation could be obtained for the PAT/LBDM outputs by using the HarmTrace
harmonic infrastructure. This would be a way to leverage the strengths of both models to provide
the user with a more complete outcome.
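As a simplified illustration of how boundary-detection models of this kind work, the sketch below computes a boundary-strength profile from melodic intervals in the spirit of the LBDM. It is a sketch under stated assumptions: Cambouropoulos's full model also weighs inter-onset intervals and rests, whereas this version uses a single pitch-interval parameter.

```python
def degree_of_change(x1, x2):
    """Relative change between two non-negative interval values."""
    return abs(x1 - x2) / (x1 + x2) if (x1 + x2) != 0 else 0.0

def lbdm_profile(intervals):
    """Boundary strength for each interval in a melodic sequence.

    Each interval's strength is its size scaled by the degree of
    change to its neighbours, so boundaries emerge where the melody
    changes abruptly (e.g., a leap after stepwise motion).
    """
    strengths = []
    for i, x in enumerate(intervals):
        left = degree_of_change(intervals[i - 1], x) if i > 0 else 0.0
        right = degree_of_change(x, intervals[i + 1]) if i < len(intervals) - 1 else 0.0
        strengths.append(x * (left + right))
    return strengths

# Absolute melodic intervals in semitones: a leap (7) after steps
# produces a peak in boundary strength at that position.
profile = lbdm_profile([2, 2, 1, 7, 2, 2])
boundary = profile.index(max(profile))  # index of the likely boundary
```

A parallelism-aware model such as PAT would then adjust these local strengths wherever a repeated pattern straddles a candidate boundary, which is the combination the paragraph above advocates.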
The Hardesty 2016 tool for examining rhythm has a strong basis in rhythm and music
generation. It has further uses in optimization because it incorporates psychological elements,
however, the goal is not completely realized. The tool can only process and generate binary rhythms, but, with further research, it could come close to human prediction of music. Thus, it furthers Optimization's goal of understanding how humans perceive and predict rhythm.
3.3.5 Probability and Statistics
The tools presented in the section on Probabilistic and Statistical models take three
different approaches to using probability and statistics in Computer Music Analysis. Temperley
looked at Bayesian probability, the set of probabilistic principles following the acceptance of Bayes' Rule, to incorporate his previous research in PRSs with cognition in fields similar to
music. Cathé’s approach aims to always keep the music in mind, so the computer looks at every
data file, music in this case, and the analyst makes the final comparisons and assumptions
looking at both the music and the harmonic vectors. Darrell Conklin combines bioinformatics and probability to find distinctive patterns; the method parses the music and gives the analyst the patterns that may be important.
Temperley's Bayesian model is intended for use in his online database. Overall, the
generated coefficients can be used in other probabilistic models and in other corpus studies. As
was stated by Temperley, the coefficients are more accurate when generated for a specific
corpus, so for maximal accuracy this should be done. Furthermore, these coefficients can be used
in any generative theory if they are based on the same corpus. This is also its limitation since re-
analysing a set of works when investigating a different corpus is time consuming. This can
sometimes defeat the purpose of a computer model as it does not save time and energy.
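To make the Bayesian machinery concrete, the following sketch applies Bayes' Rule to a toy key-finding task in the spirit of Temperley's probabilistic models. The pitch-class profile values and the uniform prior over keys are invented for illustration; they are not Temperley's published coefficients.

```python
import math

# Toy pitch-class profile: probability that each position relative to
# the tonic (0-11 semitones) occurs in a major key. Illustrative only.
MAJOR_PROFILE = [0.18, 0.01, 0.12, 0.01, 0.14, 0.10,
                 0.01, 0.16, 0.01, 0.12, 0.01, 0.13]

def key_posterior(notes, prior=1 / 12):
    """Log-posterior of each of the 12 major keys given pitch classes.

    Bayes' Rule: P(key | notes) is proportional to
    P(notes | key) * P(key), assuming the notes are conditionally
    independent given the key.
    """
    scores = {}
    for tonic in range(12):
        loglik = sum(math.log(MAJOR_PROFILE[(pc - tonic) % 12]) for pc in notes)
        scores[tonic] = loglik + math.log(prior)
    return scores

# A C-major scale fragment as pitch classes: C D E F G A B
scores = key_posterior([0, 2, 4, 5, 7, 9, 11])
best_key = max(scores, key=scores.get)  # tonic pitch class of the best key
```

This also shows why the coefficients are corpus-dependent: re-estimating the profile values for a different repertoire changes every likelihood, which is exactly the re-analysis cost noted above.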
Overall, these models take a set of data and provide an output of specific generalizations. For example, Cathé has generalized the percentage of use of each harmonic vector by composer, meaning that each composer has a distinct percentage profile. This can be further combined with a 1963 study on authorship (Mosteller and Wallace 1963). That study, on literary works, measured the rates of simple words such as "upon" and "such"; the frequency with which certain words were used proved distinctive to each author. The Poisson process, a specific concept in probability, was adapted to complete this method. It could potentially be adapted to music where, instead of words, harmonic vectors are used. This application is further discussed in the concluding section of the thesis.
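A minimal sketch of this style of reasoning follows, with marker-word counts (or, by the extension proposed here, harmonic-vector counts) modelled as Poisson. The rates are invented for illustration and do not come from Mosteller and Wallace's data.

```python
import math

def poisson_loglik(count, rate):
    """Log-likelihood of observing `count` occurrences under a
    Poisson model with expected `rate` occurrences."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)

def more_likely_author(count, rate_a, rate_b):
    """Attribute a passage by comparing the Poisson likelihoods of a
    marker-word count under two candidate authors' usage rates."""
    return "A" if poisson_loglik(count, rate_a) > poisson_loglik(count, rate_b) else "B"

# Toy rates: author A uses "upon" about 3 times per 1,000 words,
# author B only about 0.5 times. A passage with 4 occurrences:
attribution = more_likely_author(4, rate_a=3.0, rate_b=0.5)
```

Replacing word counts with counts of a given harmonic vector per movement would give the musical analogue suggested above, with each composer's vector rates playing the role of the authors' word rates.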
Chapter 4- Machine Learning
4.1 Introduction to Machine Learning
Machine Learning can be defined as the process of teaching a computer (the machine) to carry out new tasks, and, in the case of music, to perform these new tasks on musical works. This
has applications for many aspects of Computer Music Analysis, but the focal point of Machine
Learning is the tool itself. The tool or method must provide a relatively accurate output on a first
stage analysis so that, in turn, the tool can reliably produce correct output for other pieces. This
differs from MIR and Optimization, because for MIR the goal is a database, and for
Optimization, as I have described it, the goal is to understand and reproduce a human perception
of an input.
Music poses many challenges for any computer-based analytical tool, and, as such, the
analysis of full works of music using complex ideas is not common in Machine Learning.
Machine Learning is used in multiple disciplines. When used for music, the input is often over
simplified (Widmer 2006). The field of Machine Learning as applied to music is still in its
infancy. Thus, I can only give a cursory overview of some of its developments. (Recently, a special issue of the Journal of Mathematics and Music concentrated on music generation in Machine Learning, but this is an exceptional development.)
In this section, I show several emerging possibilities for Machine Learning as well as
precedents. I do so in an introductory manner because the actual processes of Machine Learning
and their application are too complex to be treated exhaustively in a thesis of this scope. (I will
discuss the literature of Machine Learning primarily from the angle of a music theorist although
it holds considerable possibilities for other domains such as composition.) Unlike previous
chapters, the critical analysis for this chapter is in the conclusion of the thesis, since Machine
Learning has importance to Computer Music Analysis as a whole.
4.2 Outline of Selected Tools
In this section, I aim to expose different tools in Machine Learning. First, I start with a
tool that assists guitarists with ornamentation. The next two sections build upon one another as
they are both created by Darrell Conklin and the second builds upon the first in terms of
segmentation. It is also an application of the multiple-viewpoint system discussed in the
Literature Review. The final tool is an analysis of analysis using Machine Learning. I
concentrate on Kirlin and Yust’s smaller details because it is one of the few Machine Learning
models that directly adds to music analysis.
4.2.1 Ornamentation in Jazz Guitar
I begin with a recent development in the application of Machine Learning to music. For
jazz guitar works, ornamentation is important because it is how expression is conveyed, but it is
not written in the score. The performer must come up with the ornamentation themselves or go
through countless recordings. Giraldo and Ramírez have attempted to address this problem with
Machine Learning. This tool aims to take an “un-expressive score” (Giraldo and Ramírez 2016,
107) and add expressive ornaments to it. This machine learning tool uses 27 sets of audio input
from a professional guitarist as a test set. Using a group of ornamentation vectors, the audio input
was aligned with the score to create an expressive score of the recording. In effect, a non-
expressive score was put together with a set of vectors derived from expressive scores. While the
primary goal of the study was to create a Machine Learning tool, a secondary goal of this tool
was to give new guitarists an expressive score to read to help them learn the ornamentation
practices.
Following the use of the test set, the tool was further tested on un-expressive input to get
an expressive output. The output of the tool was a generated MIDI or other audio format
recording that combined the un-expressive score with the Machine Learning ornamentation. The
researchers determined the tool's overall stylistic and grammatical correctness to be 78%. This tool does need further work, especially in refining itself as a Machine
Learning tool. In terms of its secondary goal however, it does fill a void in jazz guitar
performance.
4.2.2 Melodic Analysis with segment classes
Darrell Conklin’s name appears frequently in machine learning as applied to music. His
research centres around the problem of music as a multi-faceted entity. The article, entitled
“Melodic Analysis with Segment classes” (Conklin 2006), is a stepping stone towards his later
research that I will discuss in 4.2.3 (the basis for this article includes the Conklin and Witten 1995 article discussed in the Literature Review). Conklin's 2006 article depends upon a concept
called “viewpoints.” The idea behind viewpoints is to take a cross section of musical structures
and estimate the accuracy of the output. The aim of this study is to “demonstrate how the
viewpoints representation for segment classes can be applied to interesting music data mining
tasks” (Conklin 2006, 350).
Conklin’s method is based on a study of natural language and its segmentation. For data
mining, music must first be in a format understood by the computer, and it must be hierarchical. Accordingly, Conklin uses specific hierarchical and searchable terminology. A musical object is a note, a segment is a set of musical objects, and a sequence is a series of many segments in a specific order. A melody is a type of sequence: a series of segments of notes presented in a specified order. Segmentation is a fundamental aspect of Conklin's analysis.
There were two methods of segmentation tested. The first was phrase boundaries and the second
was meter. Each test involved segmentation created using a viewpoint based on a set of pitches.
The particular expression determined by Conklin is as follows: set(mod12(intref(pitch,key))).
The method succeeded most with phrase and metric segmentation undertaken by beat (98%),
note (92%), and bar (91%). (There was also a successful, unsegmented interval level (94%).) As the percentages show, segmentation by beat was the most successful.
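One plausible reading of Conklin's expression set(mod12(intref(pitch,key))) can be sketched as follows. The encoding of pitches as MIDI numbers and of the key as its tonic pitch class is my assumption for illustration, not Conklin's implementation.

```python
def intref(pitch, key):
    """Interval of a MIDI pitch relative to the key's tonic pitch class."""
    return pitch - key

def segment_class(pitches, key):
    """One reading of set(mod12(intref(pitch, key))): the set of
    pitch classes, relative to the tonic, occurring in a segment."""
    return frozenset((intref(p, key) % 12) for p in pitches)

# Two segments of a C-major melody (tonic pitch class 0): the same
# segment class results even though octave and note order differ.
a = segment_class([60, 64, 67], key=0)   # C4 E4 G4
b = segment_class([79, 72, 76], key=0)   # G5 C5 E5
same = (a == b)                          # both map to {0, 4, 7}
```

Because the set abstracts away octave, order, and repetition, segments that differ on the surface can fall into the same class, which is what makes the representation useful for the data-mining tasks Conklin describes.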
While the immediate task in Machine Learning is to create a tool, Conklin’s secondary
task was to discriminate style. The segmental viewpoint by beats can be used in future models
for the secondary task. Conklin discusses the further work that needs to be done in this regard.
Firstly, the length of segments must be examined for a corpus, meaning a collection of a style of
music. Secondly, the problems of the automated segmentation, meaning the segmentation done
by the computer, should be compared to human segmentation.
4.2.3 Chord sequence generation with semiotic patterns
Conklin’s 2016 article, “Chord Sequence Generation with Semiotic Pattern,” addresses
the semiotic value in trance music—a type of fast electronic music, like techno, centred
predominantly in Europe—when the latter is generated by a Machine Learning model. Aspects
of the chords in trance music have intrinsic meaning and, therefore, the meaning must be kept to
have an accurate stylistic representation of the music. Conklin’s model aspires to generate a
chord sequence for trance music that keeps the qualities of trance music intact.
The semiotic patterns of trance music are defined as a sequence of “paradigmatic units”
(Conklin 2016, 94). According to Conklin, the paradigmatic unit is when an idea is given a
variable (a letter name) so that a pattern of these variables can be discussed. The viewpoints method, a statistical model discussed above, is used to map, or create, an output according to a plan.
Conklin’s viewpoints are based on the following criteria: chord, root, chordal quality, inter-
onset-interval (meaning the start and stop points of a particular sound), duration, chord diatonic
root movement, chord quality movement, a combination of root and quality. Conklin describes
the combination of root and quality as “crm. (cross product) cqm.” The cross product is a
common vector operation. This combination was chosen to generate the chord, taking into
account the intrinsic meaning for trance music. I should note that Conklin only used a sampling
of trance songs, so the results need to be further examined in terms of a larger trance corpus.
The goal in Machine Learning is the tool itself. Conklin states that the best algorithms,
like the ones presented and other viewpoints, can be determined for a corpus. To further explain
this, important aspects of a corpus can be identified, and the best algorithms can be defined and
used like the “crm (cross product) cqm” used for generation in this article. Conklin also mentions
that this method can be used for analysis.
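A minimal sketch of a linked viewpoint in the spirit of Conklin's "crm (cross product) cqm" follows: each chord transition is described by the pair (root movement, quality movement). The chord encoding as a (root pitch class, quality string) tuple is an assumption for illustration, not Conklin's representation.

```python
def crm(prev_chord, chord):
    """Chordal root movement between two chords, in semitones mod 12."""
    return (chord[0] - prev_chord[0]) % 12

def cqm(prev_chord, chord):
    """Chord quality movement, e.g. ('min', 'maj')."""
    return (prev_chord[1], chord[1])

def linked_viewpoint(sequence):
    """Map a chord sequence to its derived crm-and-cqm sequence,
    one element per transition between adjacent chords."""
    return [(crm(a, b), cqm(a, b)) for a, b in zip(sequence, sequence[1:])]

# A short trance-like progression: A minor, F major, C major, G major
progression = [(9, "min"), (5, "maj"), (0, "maj"), (7, "maj")]
derived = linked_viewpoint(progression)
```

Generating from statistics over such pairs, rather than over chords alone, is what lets a model preserve the characteristic root-and-quality motions of the style.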
4.2.4 Analysis of analysis
Kirlin and Yust’s 2016 article “Analysis of Analysis: Using Machine Learning to
Evaluate the Importance of Music Parameters for Schenkerian Analysis" aims to have a machine advance the music-theory branch of Machine Learning. The goal of the article was to create an
analysis of a score using a model resembling Schenkerian Analysis. While this goal was not
realized, the article is still noteworthy because of what the researchers explored and the Machine
Learning tool they created. Schenkerian Analysis involves reducing the work in question by
finding patterns of ornamentation and elaboration. This task is difficult to teach a computer
without stipulating the exact features to find. Kirlin and Yust defined eighteen features and then
sorted them into categories. These became stepping stones towards creating a Machine Learning
tool.
First, a hierarchy of notes was created using a tool called a "maximal outerplanar graph."
Then the eighteen features were defined as they relate to the left note (L), middle note (M), and right note (R).7 The middle note has the following six features:
• SD-M The scale degree of the note (represented as an integer from 1 through 7, qualified
as raised or lowered for altered scale degrees).
• RN-M The harmony present in the music at the time of onset of the center note
(represented as a Roman numeral from I through VII or “cadential six-four”). For applied
chords (tonicizations), labels correspond to the key of the tonicization.
• HC-M The category of harmony present in the music at the time of the center note
represented as a selection from the set tonic (any I chord), dominant (any V or VII
chord), predominant (II, II6, or IV), applied dominant, or VI chord. (The dataset did not
have any III chords.)
• CT-M Whether the note is a chord tone in the harmony present at the time (represented as
a selection from the set “basic chord member” (root, third, or fifth), “seventh of the
chord,” or “not in the chord”).
7 These lists from pages 135-136 are shortened versions of the lists presented in Kirlin and Yust 2016.
• Met-LMR The metrical strength of the middle note’s position as compared to the metrical
strength of note L, and to the metrical strength of note R (represented as a selection from
the set “weaker,” “same,” or “stronger”).
• Int-LMR The melodic intervals from L to M and from M to R, generic (scale-step values)
and octave generalized (ranging from a unison to a seventh).
(Kirlin and Yust 2016, 135)
The left and right notes together have the following twelve:
• SD-LR: scale degree (1–7) of the notes L and R.
• Int-LR: melodic interval from L to R, with octaves removed.
• IntI-LR: melodic interval from L to R, with octaves removed and intervals larger than a
fourth inverted.
• IntD-LR: direction of the melodic interval from L to R
• RN-LR: harmony present, as a roman numeral, in the music at the time of L or R
• HC-LR: category of harmony present in the music at the time of L or R, represented as a
selection from the set tonic, dominant, predominant, applied dominant, or VI chord.
• CT-LR Status of L or R as a chord tone in the harmony present at the time
• MetN-LR A number indicating the beat strength of the metrical position of L or R. The
downbeat of a measure is 0. For duple or quadruple meters, the halfway point of the
measure is 1; for triple meters, beats two and three are 1. This pattern continues with
strength levels of 2, 3, and so on.
• MetO-LR A number indicating the beat strength of the metrical position of L or R as an ordinal variable, treated differently in the algorithm
• Lev1-LR Whether L, M, and R are consecutive notes in the music
• Lev2-LR Whether L and R are in the same measure in the music
• Lev3-LR Whether L and R are in consecutive measures in the music
(Kirlin and Yust 2016, 135-6)
These features are sorted into melodic, harmonic, metrical, and temporal categories as follows:
• Melodic: SD-M, SD-LR, Int-LMR, Int-LR, IntI-LR, IntD-LR
• Harmonic: RN-M, RN-LR, HC-M, HC-LR, CT-M, CT-LR
• Metrical: Met-LMR, MetN-LR, MetO-LR
• Temporal: Lev1-LR, Lev2-LR, Lev3-LR
(Kirlin and Yust 2016, 136)
Then these categories are narrowed down and ranked by importance. This yields a hierarchy with harmony at the top, followed by melody, then meter, and finally temporality.
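To show how such features and their ranked categories might be encoded for a learning algorithm, here is a hypothetical sketch; the feature values and the ordering function are illustrative and are not Kirlin and Yust's implementation.

```python
# Hypothetical encoding of one (L, M, R) note triple using a few of
# Kirlin and Yust's feature labels; the values are invented.
triple = {
    "SD-M": 2,                       # scale degree of the middle note
    "RN-M": "V",                     # harmony at the middle note
    "CT-M": "basic chord member",    # chord-tone status of the middle note
    "Met-LMR": "weaker",             # M metrically weaker than L and R
    "Lev1-LR": True,                 # L, M, R are consecutive notes
}

# The reported ranking of feature categories, most to least important:
CATEGORY_RANK = ["harmonic", "melodic", "metrical", "temporal"]

CATEGORIES = {
    "SD-M": "melodic", "RN-M": "harmonic", "CT-M": "harmonic",
    "Met-LMR": "metrical", "Lev1-LR": "temporal",
}

def order_features(feature_names):
    """Sort features so the most informative categories come first,
    giving the computer the specific order described above."""
    return sorted(feature_names, key=lambda f: CATEGORY_RANK.index(CATEGORIES[f]))

ordered = order_features(triple.keys())  # harmonic features lead
```

Fixing this order is the point of the hierarchy: the machine consults harmonic features first and falls back to melodic, metrical, and temporal ones only when earlier categories are uninformative.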
The results showed that harmony is the most important marker for the reductions, in terms of harmonic context and identification of non-chord tones. This is obvious for an analyst, but it is important to have the computer achieve the same outcome. Melody is the next most important
marker, when harmonic context and non-chord tones do not give enough information about
scale-degree progression and interval patterns. Following this, meter is applied to anything that is
undetermined. Though this procedure seems obvious to the analyst, the hierarchy of steps is the
most important part for the computer because it gives the computer a specific order to follow. To reiterate, this has not been fully tested, but it is useful for understanding the creation of a Machine Learning model.
4.3 Summary
In this chapter, I have shown a few of the recent developments in Machine Learning
applied to music. I have traced the work of Darrell Conklin in particular, since he is a pioneer in
the field and continues to contribute to research. As noted above, I have not included a critical
analysis section, because the comments I would have made there are more appropriate to the
concluding chapter of the thesis, since they address the current state of the field.
Chapter 5- Conclusion
As noted earlier in this thesis, there are different streams in Computer Music Analysis
and I have concentrated on Music Information Retrieval (MIR), Optimization, and Machine
Learning. These streams often run in parallel because of their different goals. In my concluding
chapter I consider some of the most recent developments in Temperley’s work, offer methods to
bridge the parallels, and present solutions, both general and specific, for the five critical issues
mentioned in Chapter 1.
5.1 Further Temperley Research and Probability
Following the Cognition of Basic Musical Structures (2001) and Music and Probability
(2007) Temperley continued his work on borrowing music-like concepts from other disciplines.
Two articles, “Information Flow and Repetition in Music” (2014) and “Information Density and
Syntactic Repetition” (2015) adapt concepts from other disciplines to further Optimization.
The first article adapts uniform information density as a methodology, which is
probability based, borrowed from psycholinguistics and used to further explain parallelism—
when parts of a musical work are repeated in an exact or similar fashion and, thus, can be
considered as “parallel.” Temperley renamed the concept “information flow for repetition in
music” and tested it on the Barlow and Morgenstern corpus of musical themes8. Temperley
found that in parallel sections of a piece the repetition is often more chromatic, but where this is
the case the overall piece has a higher probability of smaller diatonic intervals. Thus, the
8 A set of 10,000 themes available in print under co-authors Barlow and Morgenstern
juxtaposition of chromatic and diatonic intervals makes the parallelism stand out. Temperley also
notes that harmony impacts the repetitions.
The second article looks even closer at parallelism and information flow. It states that
“less probable events convey more information” (Temperley and Gildea 2015, 1802).
Temperley's conclusion is consistent with the analysis of prose in "Inference in an Authorship Problem" (Mosteller and Wallace 1963). This article explains that
specific words indicate more than others about an author. I notice that by potentially using
Poisson Process and negative binomials—two standard concepts in Probability and Statistics—
the specific author of a passage in a multi-author work can be found. This links to Temperley
because they follow the acceptance of Bayes’ Rule and are, therefore, part of Bayesian
Probability. Temperley's most recent contribution to the field of Computer Music Analysis is this multi-disciplinary borrowing of research tools. It is the interdisciplinary approach, more than any other development, that holds the greatest potential for the field.
5.2 Machine Learning as a means to an end
Machine Learning concentrates on the tool itself. Since this is the most recent
development in computer research, and touches on Artificial Analysis, I have left the critique
until the conclusion. Because Machine Learning focuses on the tool, it does not have a larger
goal other than creating a better tool. This method is best used, in the grand scheme of Computer
Music Analysis, as a way to improve and bring other aspects of Computer Music Analysis closer
to its goals.
5.3 CompMusic as an example of Intersection
Some methods of using different streams of Computer Music Analysis have been
suggested by the authors cited throughout the thesis, but I would like to add my own suggestion:
researchers need to coordinate more closely in developing their work. I believe this will further
the goals of MIR, Optimization, and Machine Learning. I will focus on “CompMusic,” since it
brings together several previously unconnected avenues of research. In this regard, it can serve as
an example for the rest of the Computer Music Analysis community to emulate.
CompMusic, also known as Computational Models for the Discovery of the World's Music, aims to investigate non-Western music. More specifically, "its goal is to advance music
analysis and description research through focusing on the music of specific non-Western musical
cultures” (CompMusic Project and Workshops 2012, 8). The research project is supported by the
European Research Council, and the coordinator, Xavier Serra, is based in Spain (CompMusic
Website http://compmusic.upf.edu/ ). CompMusic has used multiple streams to finish their
database within a few years—2011 to 2017. It seeks “to challenge the current Western centered
information paradigms” (CompMusic). It concentrates on five traditions of world music:
Hindustani, Carnatic, Turkish-makam, Arab-Andalusian, and Beijing Opera (CompMusic).
Music research has traditionally focused on Western Music, so the researchers for CompMusic
had to start from very little. Because of their short time frame, probability, statistical models, and
machine learning were used.
Within CompMusic, Machine Learning is used to solve specific problems that hinder the
progress of the database, such as in the structure analysis of Beijing Opera (Yang 2016). Initially,
resources such as probabilistic and statistical models were used to find novel ways to solve
specific problems. For example, with Maghreb, a Moroccan type of music (which is a subset of
Arab-Andalusian music), annotation was difficult, so a tool was created to fix these issues
(Sordo et al. 2014). These methods were then adapted to be used in the database.
Since combining different approaches in Computer Music Analysis worked well for
CompMusic, I can foresee that the same could work for an MIR project like SIMSSA. To me, it
appears that researchers are not sharing their tools and procedures to an optimal degree. This is
partially due to a geographic issue, since researchers in MIR, Optimization, and Machine
Learning seem to be in different parts of the world. If David Temperley, Darrell Conklin, and
members of the SIMSSA project, such as Ichiro Fujinaga and Andrew Hankinson, were to share
their tools and approaches more closely, I believe that there could be many new creative
problem-solving methods. One example is the previously mentioned solution to the authorship
problem (mentioned in Further Temperley research).
5.4 Five general areas for improvement in the field
In writing this thesis, I have observed five general areas where improvement can be
made. What is needed is the following: first, an institutional critical analysis of the field;
secondly, a closer coordination between Optimization and Machine Learning; thirdly, research
into authorship; fourthly, exploration into new areas in Machine Learning; and lastly, closer
integration of various MIR resources in developing Optimization and Machine Learning.
1. Critical analysis in Computer Music Analysis as a distinct enterprise has not been performed
up to this point except for MIREX, Music Information Retrieval Evaluation eXchange.
MIREX, in brief, is a “framework for the formal evaluation of Music Information
Retrieval (MIR) systems and algorithms” (Downie 2008, 247). The goal of MIREX is to
investigate the specific tools and algorithms that are the building blocks of larger databases. This
method isolates approaches that are nearing the end of their life cycle and compares the
performance of systems with similar goals. This provides data about the accuracy and projected
utility of algorithms to researchers who want to work within MIR. MIREX, however, only looks
at MIR tools and concentrates heavily on methods examining audio data. It does not seem to
consider specific issues and how they can be solved using other streams in Computer Music
Analysis. Presumably this limitation will be overcome in the future.
2. Closer coordination between Optimization and Machine Learning.
Optimization and Machine Learning have different goals. Optimization, as I have defined
it, aims to use computers to mimic a human perception in music to understand the brain.
Machine Learning wants to create the specific tool to complete a specific task. However, the end
products created by both streams can be used to solve specific problems and tasks in MIR as
shown by CompMusic.
3. Research into authorship.
In terms of specific items for research, the areas of authorship and what makes a piece a
composer’s own work, has room for growth. This is important for proper identification of a
work’s author when it is unknown. This is a common problem with ancient music. Fresh
research could involve combining the methods put forth by Mosteller and Wallace in 1963 with recent
Cathé research on harmonic vectors and their uniqueness to the composer (Cathé 2010a), and
Temperley’s research on information flow (Temperley 2014), Bayesian Probability (Temperley
2007), and Syntactic Repetition (Temperley and Gildea 2015).
4. New areas in Machine Learning.
Machine Learning has concentrated on music generation and, by using probabilistic and
statistical analysis, the music generation can improve by keeping high probability events.
Machine Learning can also branch out into more analytical pursuits by means of analytical
algorithms used in Optimization to ‘teach’ a computer to do analysis. This could improve the
current analysis available in Optimization and help to further mimic human perception in the
machine.
5. Closer integration of various MIR resources in developing Optimization and Machine
Learning.
I have offered specific examples of Optimization and Machine Learning aiding in the
creation of an MIR database. However, the opposite development could occur, where MIR
databases could be used to develop new research tools. In particular, Humdrum, an analytical
MIR tool, has a reserve of files that can be used for both Machine Learning and Optimization.
Similarly, various corpora of music assembled in MIR databases could be used as test sets for the
same purpose.
5.5 Persaud’s Five Critical Issues with Solutions
This thesis has begun the task of a critical analysis by showing different tools in
Computer Music Analysis as a whole. The tools selected are of different ages, sizes, and have
different researchers associated with them, but all aim to use the computer as their means to an
end. I shall conclude the thesis by returning to a set of five particularly acute problems in the
field, which I mentioned in my introduction.
1. Human error: The problem of human error can be resolved by the creation of more accurate algorithms, either by using harmonic vectors or one of the many Temperley models.
2. Specifying input: Improvements in specifying input are imperative to the growth of the field. A researcher reading articles or using a pre-existing model needs to know what input should be used. This can be fixed by specifying the input in greater detail in articles and by creating genre-specific standards.
3. Consistent evaluation principles: It is necessary to extend principles used for MIREX to
other branches of Computer Music Analysis. Overall, more critical work needs to be done in
Computer Music Analysis. Having principles or guidelines will assist in this venture.
4. The interdisciplinary problem of a Lingua franca: To solve this problem Computer Music
Analysis should create universal or at least common standards and modes of discourse for
describing computer research in music. There are standards for MIR in terms of research
tools, algorithms, and systems but those researchers not working in the area are not aware of
them. And because many of the tools and procedures are borrowed from other areas of
computer research, they are applied in different ways in specialized music research.
5. “What’s the Point?”-Undefined goals. The broader audience needs to understand why
Computer Music Analysis is important. This can be overcome by looking at the broader
scope of each branch.
Figure 5. Graphic of the five critical issues with their solutions: 1. Human Error (more accurate algorithms); 2. Input Specification (greater specification in all writings); 3. Consistent Evaluative Principles (more critical work in Computer Music Analysis); 4. The Interdisciplinary Problem (common standards and practices); 5. "What's the point?" (larger scope).
In the end, there are multiple avenues to take when it comes to solving the Critical Issues in Computer Music Analysis. Here, I have briefly given my own solutions to these issues and other aspects and directions for further research, but I have not yet explained the importance of Computer Music Analysis.
Computer Music Analysis is vital to analysis as a whole because it often adds a
quantitative aspect and takes advantage of technology. By incorporating probability and
statistics and computational algorithms, the output of the analysis can rely on a mathematical
explanation for a qualitative phenomenon. Technology is a fast-growing field, and its use in music analysis is inevitable: new software and hardware move from day-to-day use into research and improve the field. Like all changes, however, it brings its own limitations and critical issues. These limits and problems are what fascinated me in writing this thesis. My overall conclusion is that researchers need to take a critical stance on the discipline for it to grow quickly and efficiently; such critique is a necessity for further improving music analysis.
Bibliography
Alphonce, Bo H. 1980. “Music Analysis by Computer: A Field for Theory Formation,”
Computer Music Journal 4, no. 2: 26-35.
Antila, Christopher, Julie Cumming, et al. 2014. "Electronic Locator of Vertical Interval Successions." Montreal Digital Humanities Showcase, UQAM. (Available as slides, scripts, and poster via the ELVIS website.)
Appleton, Jon. 1986. Review of Composers and the Computer by Curtis Roads. Musical
Quarterly 72: 124.
Birmingham, William, Roger Dannenberg, and Bryan Pardo. 2006. “Query by Humming with
the Vocal Search System.” Communications of the ACM 49, no.8: 49-52.
Bozkurt, Barış and Bilge Karaçalı. 2015. "A Computational Analysis of Turkish Makam Music Based on a Probabilistic Characterization of Segmental Phrases," Journal of Mathematics and Music 9, no. 1: 1-22.
Burgoyne, John Ashley, Ichiro Fujinaga and J. Stephen Downie. 2016. “Music Information
Retrieval." In A New Companion to Digital Humanities, edited by Susan Schreibman, Ray
Siemens and John Unsworth, 213-228. Wiley.
Cambouropoulos, Emilios. 2006. “Musical Parallelism and Melodic Segmentation: A
Computational Approach,” Music Perception: An Interdisciplinary Journal 23, no 3:
249-268.
Cantus Ultimus. SIMSSA.
Cathé, Philippe. 2010a. “Harmonic Vectors and Stylistic Analysis: A Computer-Aided Analysis
of the First Movement of Brahms’ String Quartet op. 51-1.” Journal of Mathematics and
Music 4, no. 2: 107-119.
-----. 2010b. “Nouveaux Concepts et Nouveaux Outils pour les Vecteurs Harmoniques.”
Musurgia 17, no. 4: 57-79.
“CompMusic Project and Workshops.” 2012. Computer Music Journal 36, no. 4: 8.
CompMusic. Music Technology Group, n.d. Web. Accessed 04 Mar. 2017.
Computer Music Journal. MIT Press Journals.
Conklin, Darrell. 2006. “Melodic Analysis with Segment Classes.” Machine Learning 65: 349-360.
-----. 2008. “Discovery of Distinctive Patterns in Music.” International Workshop on Machine
Learning and Music.
-----. 2010. “Distinctive Patterns in the First Movement of Brahms’ String Quartet in
C Minor.” Journal of Mathematics and Music 4, no. 2: 85-92.
-----. 2016. “Chord Sequence Generation with Semiotic Patterns.” Journal of Mathematics and
Music 10, no. 2: 92-106.
Conklin, Darrell, and Ian H. Witten. 1995. “Multiple Viewpoint Systems for Music
Prediction.” Journal of New Music Research 24, no. 1: 51-73.
Cuthbert, Michael Scott. Music21: A Toolkit for Computer-Aided Musicology. N.p., n.d. Web.
Accessed 07 Mar. 2017.
Dannenberg, Roger B. 2007. “A Comparative Evaluation of Search Techniques for Query-by-
Humming Using the Musart Testbed.” Journal of the American Society for Information
Science and Technology 58, no. 5: 687-701.
De Haas, W. Bas, et al. 2013. “Automatic Functional Harmonic Analysis.” Computer Music
Journal 37, no. 4: 37-53.
Desain, Peter, and Henkjan Honing. 1992. “Time Functions Function Best as Functions of
Multiple Times.” Computer Music Journal 16, no. 2: 17-34.
-----. 1999. “Computational Models of Beat Induction: The Rule-Based Approach.” Journal of
New Music Research 28, no. 1: 29-42.
Donnelly, Patrick J., and John W. Sheppard. 2013. “Classification of Musical Timbre Using
Bayesian Networks.” Computer Music Journal 37, no. 4: 70-86.
Downie, J. Stephen. 2003. “Music Information Retrieval.” Annual Review of Information Science
and Technology 37: 295-340.
-----. 2008. “The Music Information Retrieval Evaluation Exchange (2005-2007): A
Window into Music Information Retrieval Research.” Acoustical Science & Technology 29,
no. 4: 247-255.
El-Shimy, Dalia, and Jeremy R. Cooperstock. 2016. “User-Driven Techniques for the Design and
Evaluation of New Musical Interfaces.” Computer Music Journal 40, no. 2: 35-46.
ELVIS Project: Music Research with Computers. < https://elvisproject.ca/>
Fujinaga, Ichiro, and Susan Forscher Weiss. 2004. “Music.” In A Companion to Digital
Humanities, edited by Susan Schreibman, Ray Siemens, and John Unsworth. Oxford:
Blackwell.
Giraldo, Sergio, and Rafael Ramírez. 2016. “A Machine Learning Approach to Ornamentation
Modelling and Synthesis in Jazz Guitar.” Journal of Mathematics and Music 10, no.
2: 107-126.
Giraud, Mathieu, et al. 2015. “Computational Fugue Analysis.” Computer Music Journal 39,
no. 2: 77-96.
Gulati, Sankalp, et al. 2016. “Time-Delayed Melody Surfaces for Raga Recognition.”
Proceedings of the 17th International Society for Music Information Retrieval Conference
(ISMIR 2016), New York City, USA.
Hankinson, Andrew, Evan Magoni, and Ichiro Fujinaga. 2015. “Decentralized Music Document
Image Searching with Optical Music Recognition and the International Image
Interoperability Framework.” In Proceedings of the Digital Library Federation Forum,
Vancouver, BC. https://simssa.ca/assets/files/hankinson-decentralized-dlf2015.pdf
Hardesty, Jay. 2016. “A Self-Similar Map of Rhythmic Components.” Journal of Mathematics
and Music 10, no. 1: 36-58.
Helsen, Kate, et al. 2014. “Optical Music Recognition and Manuscript Chant Sources.” Early
Music 42: 555-558.
Huron, David. 1988. “Error Categories, Detection and Reduction in a Musical Database.”
Computers and the Humanities 22: 253-264.
------. 2001. “Tone and Voice: A Derivation of the Rules of Voice-Leading from Perceptual
Principles.” Music Perception: An Interdisciplinary Journal 19, no. 1: 1-64.
The Humdrum Toolkit: Software for Music Research. 2001.
http://www.musiccog.ohio-state.edu/Humdrum/FAQ.html
Iñesta, José M., Darrell Conklin, and Rafael Ramírez. 2016. “Machine Learning and Music
Generation.” Journal of Mathematics and Music 10, no. 2: 87-91.
Kacprzyk, Janusz, and Zbigniew W. Ras. 2010. Advances in Music Information Retrieval.
Berlin: Springer.
Karaosmanoğlu, M. Kemal. 2012. “A Turkish Makam Music Symbolic Database for Music
Information Retrieval: SymbTr.” Proceedings of ISMIR.
Keller, Robert, et al. 2013. “Automating the Explanation of Jazz Chord Progressions Using
Idiomatic Analysis.” Computer Music Journal 37, no. 4: 54-69.
Kirlin, Phillip B., and Jason Yust. 2016. “Analysis of Analysis: Using Machine Learning to
Evaluate the Importance of Music Parameters for Schenkerian Analysis.” Journal of
Mathematics and Music 10, no. 2: 127-148.
Larson, Steve. 2004. “Musical Forces and Melodic Expectation: Comparing Computer Models
and Experimental Results.” Music Perception: An Interdisciplinary Journal 21, no. 4:
457-498.
Louridas, Panos, and Christof Ebert. 2016. “Machine Learning.” IEEE Software: 110-115.
Manning, Peter, et al. 2001. “Computers and Music.” Grove Music Online. Oxford Music
Online. Oxford University Press. Accessed April 11, 2017.
Meeus, Nicolas. 2003. “Vecteurs harmoniques.” Musurgia 10, no. 3: 7-34.
Meredith, David, ed. 2016. Computational Music Analysis. Cham: Springer International
Publishing.
Mosteller, F., and David L. Wallace. 1963. “Inference in an Authorship Problem.” Journal of the
American Statistical Association 58, no. 302: 275-309.
Music21: A Toolkit for Computer-Aided Musicology. < http://web.mit.edu/music21/>
Orio, Nicola. 2008. “Music Indexing and Retrieval for Multimedia Digital Libraries.” In
Information Access through Search Engines and Digital Libraries, edited by M. Agosti.
The Information Retrieval Series, vol. 22. Berlin: Springer.
Pardo, Bryan. 2006. “Music Information Retrieval.” Communications of the ACM 49, no. 8:
29.
Pardo, Bryan, et al. 2008. “The VocalSearch Music Search Engine.” Proceedings of the Joint
Conference on Digital Libraries (JCDL).
Patrick, Howard P. 1974. “A Computer Study of a Suspension-Formation in the Masses of
Josquin Desprez.” Computers and the Humanities 8: 321-331.
Piantadosi, Steven T., et al. 2011. “Word Lengths Are Optimized for Efficient Communication.”
Proceedings of the National Academy of Sciences of the United States of America
108, no. 9: 3526-3529.
Ponce de León, Pedro J., et al. 2016. “Data-Based Melody Generation through Multi-Objective
Evolutionary Computation.” Journal of Mathematics and Music 10, no. 2: 173-192.
Roads, Curtis, et al. 1986. “Symposium on Computer Music Composition.” Computer Music
Journal 10, no. 1: 40-63.
Search the Liber Usualis. SIMSSA.
SIMSSA: Single Interface for Music Score Searching and Analysis.
Smith, Leigh M., and Henkjan Honing. 2008. “Time-Frequency Representation of Musical
Rhythm by Continuous Wavelets.” Journal of Mathematics and Music 2, no. 2: 81-97.
Sordo, Mohamed, et al. 2014. “Creating Corpora for Computational Research in Arab-Andalusian
Music.” Proceedings of the 1st International Digital Libraries for Musicology Workshop,
London, UK. http://mtg.upf.edu/node/3028
Temperley, David. 2001. The Cognition of Basic Musical Structures. Cambridge, Massachusetts
and London, England: MIT Press.
------. 2007. Music and Probability. Cambridge: MIT Press.
------. 2010. “Modelling Common Practice Rhythm.” Music Perception: An Interdisciplinary
Journal 27, no. 5: 355-376.
------. 2014. “Information Flow and Repetition in Music.” Journal of Music Theory 58, no. 2:
155-178.
Temperley, David, and Christopher Bartlette. 2002. “Parallelism as a Factor in Metrical
Analysis.” Music Perception: An Interdisciplinary Journal 20, no. 2: 117-149.
Temperley, David, and Daniel Gildea. 2015. “Information Density and Syntactic Repetition.”
Cognitive Science 39: 1802-1823.
Tenkanen, Atte. 2010. “Tonal Trends and α-Motif in the First Movement of Brahms’ String
Quartet op. 51-1.” Journal of Mathematics and Music 4, no. 2: 93-106.
Vigliensoni, Gabriel, et al. 2011. “Automatic Pitch Recognition in Printed Square-Note
Notation.” Proceedings of the 12th International Society for Music Information Retrieval
Conference, Miami, Florida: 423-428.
Wang, Ge, Perry R. Cook, and Spencer Salazar. 2015. “ChucK: A Strongly Timed Computer
Music Language.” Computer Music Journal 39, no. 4: 10-29.
Yang, Yile. 2016. “Structure Analysis of Beijing Opera Arias.” Master’s thesis, Universitat
Pompeu Fabra, Barcelona, Spain.