In Search of Computer Music Analysis: Music Information Retrieval, Optimization, and Machine Learning from 2000-2016

Felicia Nafeeza Persaud

Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfillment of the requirements for the MA degree in Music

Department of Music
Faculty of Arts
University of Ottawa

© Felicia Nafeeza Persaud, Ottawa, Canada, 2018

Table of Contents
Abstract
Acknowledgements
Glossary
Chapter 1- Introduction and Literature Review
1.1.1 General Mission Statement
1.1.2 A Critical Overview of Computer Music Analysis: Music Information Retrieval, Optimization, and Machine Learning
1.1.3 Persaud’s Five Critical Issues
1.2 A Sketch of the Relationship between Computers and Music
1.2.1 Composition and Performance
1.2.2 Applications in Music Theory and Analysis
1.2.2.1 Recurrent features: Databases
1.2.2.2 Structural Models: Analysis and Counterpoint
1.2.3 Music Information Retrieval Versus Optimization
1.3 Literature Review
1.3.1 David Temperley The Cognition of Basic Musical Structures (2001)
1.3.2 David Temperley and Christopher Bartlette “Parallelism as a Factor in Metrical Analysis” (2002)
1.3.3 David Temperley Music and Probability (2007)
1.3.4 David Huron “Tone and Voice: A Derivation of the Rules of Voice-Leading from Perceptual Principles” (2001)
1.3.5 Darrell Conklin and Ian H. Witten “Multiple Viewpoint Systems for Music Prediction” (1995)
1.4 Conclusion
Chapter 2- Music Information Retrieval
2.1 Introduction
2.1.1 MIR Overview and Applications
2.2 The MIR Tools
2.2.1 VocalSearch
2.2.2 SIMSSA
2.2.3 Donnelly and Sheppard Bayesian Network Algorithm
2.3 Critical Analysis
2.3.1 VocalSearch
2.3.2 SIMSSA
2.3.3 Bayesian Networks
Chapter 3- Optimization
3.1 Preference Rules
3.1.1 Metrical Structure
3.1.2 Contrapuntal Structure
3.1.3 Tonal-Pitch Class Representation and Harmonic Structure
3.1.4 Melodic Phrase Structure
3.1.5 Parallelism
3.2 Probabilistic and Statistical Models
3.2.1 Introduction
3.2.2 David Temperley’s use of Bayesian Probability
3.2.3 Statistics and Harmonic Vectors
3.2.4 Distinctive Patterns using Bioinformatics and Probability
3.3 Critical Analysis: Optimization
3.3.1 Preference Rules: Metrical Structure
3.3.2 Preference Rules: Counterpoint
3.3.3 Preference Rules: Tonal-Class Representation and Harmony
3.3.4 Melodic Phrase Structure and Parallelism
3.3.5 Probability and Statistics
Chapter 4- Machine Learning
4.1 Introduction to Machine Learning
4.2 Outline of Selected Tools
4.2.1 Ornamentation in Jazz Guitar
4.2.2 Melodic Analysis with segment classes
4.2.3 Chord sequence generation with semiotic patterns
4.2.4 Analysis of analysis
4.3 Summary
Chapter 5- Conclusion
5.1 Further Temperley Research and Probability
5.2 Machine Learning as a means to an end
5.3 CompMusic as an example of Intersection
5.4 Five general areas for improvement in the field
5.5 Persaud’s Five Critical Issues with Solutions
Bibliography

Table of Figures
Figure 1 Graphic representation of the five critical issues
Figure 2 Graphic representation of MIR
Figure 3 Graphic Representation of Optimization
Figure 4 Beat Hierarchy
Figure 5 Graphic of five critical issues with solutions

Abstract
My thesis aims to critically examine three methods in the current state of Computer Music Analysis. I will concentrate on Music Information Retrieval, Optimization, and Machine Learning. My goal is to describe and critically analyze each method, then examine the intersection of all three. I will start by looking at David Temperley’s The Cognition of Basic Musical Structures (2001), which outlines major accomplishments before the turn of the 21st century. This outline will provide the organizing framework for a large portion of the thesis. I will conclude by explaining the most recent developments in each of the three methods. By following trends in these developments, I can hypothesize about the future direction of the field.

Acknowledgements
I have appreciated all the help I have received in this thesis writing process. From professors to friends to family, everyone deserves thanks. Firstly, I must thank my thesis supervisor, Dr. P. Murray Dineen, who has guided me throughout this process. His feedback and support have helped me immensely to improve as a writer. I am grateful that Dr. Dineen has helped me gain invaluable skills over the last two years of my Master of Arts. I would also like to thank my committee members, Dr. Roxanne Prevost and Dr. Jada Watson, who provided excellent feedback and discussion. They helped me greatly in creating the final thesis. I would also like to thank Dr. Julie Pedneault-Deslauriers for serving as a member of the committee for the thesis proposal. I am grateful to the rest of my professors and colleagues at the University of Ottawa for everything I have learned there.
What I have learned has helped guide me in creating this thesis and has helped me improve myself. My friends and family also deserve thanks for reading sections and drafts throughout this process. A special thank you goes to my dad, sister, and fiancé, who read my first draft. It has come a long way since then.

Glossary
Algorithm: a set of steps followed in calculations or problem-solving operations to achieve some end result.
Computer Music Analysis: analysis of music using computer software or algorithms. This is a ‘catch-all’ term for all of the smaller areas that use computers for music analysis, including Music Information Retrieval (MIR), Optimization, and Machine Learning. David Meredith’s 2016 book Computational Music Analysis gives a general definition: “using mathematics and computing to advance our understanding of music […] and how music is understood.” (Meredith 2016)
Machine Learning: teaching a computer to analyse music and find features, so that it gains knowledge of musical conventions. Machine Learning is a route parallel to MIR, Preference Rule Systems (PRSs), and Probabilistic models. As with human learning, “a computer learns to perform a task by studying a training set of examples.” (Louridas and Ebert 2016) The computer is then given a different example, and its effectiveness is measured in one of several ways depending on the task.
Music Information Retrieval (MIR): research concerned with making all aspects of a music file (melody, instrumentation, form, etc.) searchable. MIR will eventually lead to a search engine for music.
Optimization: a term used in calculus and business that refers to maximizing the use of space or resources. Resources are still important in the musical sense, but there they refer to time and energy. Optimization is achieved through greater accessibility and more efficient computer tools and algorithms. The examples given below show that it is possible to optimize analysis by integrating more mathematics and computer tools.
Piano-roll input: a graphic representation of a score with pitch on the vertical axis and time, in milliseconds, on the horizontal axis.
Preference rule system (PRS): a hierarchically ordered set of instructions for a computer. Such a system can contain multiple sets of rules, each with its own internal hierarchy of “criteria for evaluating possible analysis of a piece.” (Preface Temperley and Bartlette 2002) Manning et al 2001 call this a rule-based grammar.
Parallelism rule (as a type of preference rule): the principle that similarly constructed, that is, repeated, musical elements should be regarded as important in a PRS. “Prefer beat intervals of a certain distance to the extent that repetition occurs at that distance in the vicinity.” (Temperley and Bartlette 2002, 134)
Probabilistic Methods: methods of analysis based in probability. “Probabilistic” describes an idea that is based on, or adapted to, a theory of probability; the term encompasses even distant uses of probability in computer models. Temperley uses it to refer to computational methods that rely on probability.

Chapter 1- Introduction and Literature Review
1.1 Overview
My interest in Computer Music Analysis stems from my fascination with interdisciplinarity in music analysis. Computer Music Analysis intersects with mathematics, computer science, psychology, and, of course, music. My thesis takes a small sampling of interdisciplinary tools in Computer Music Analysis from Music Information Retrieval (MIR), Optimization, and Machine Learning. MIR aims to make music searchable, primarily through online databases. Optimization encompasses many different tools, with the eventual goal of understanding human perception of music. Machine Learning, on the other hand, teaches a machine, often a computer, to perform a task, making the tool itself the end goal.
For this thesis, I preface my work with Peter Manning’s entry “Computers and Music” in the Grove Dictionary of Music and Musicians, as a way to understand the conventions and uses of computers in music prior to the year 2000. Manning does not offer a specific definition, but instead discusses the common uses and devices of the computer as they relate to music. He states, “Computers have been used for all manner of applications, from the synthesis of new sounds and the analysis of music in notated form to desktop music publishing and studies in music psychology; from analysing the ways in which we respond to musical stimuli to the processes of music performance itself” (Manning et al 2001). This quote exemplifies how interdisciplinary Computer Music Analysis is. Manning’s entry touches on composition, performance, and analysis, and it raises a key critical issue: human error. Whatever the application, a computer is only as useful as its human programmer makes it. Each new application of the computer, that is, each new tool, brings new issues and limitations. For example, a tool that identifies duple metrical structures cannot identify compound meter, and it still has a margin of error. The human creation of computer models, and the limitations that follow from it, is the focus of my thesis and is explored in three branches of Computer Music Analysis: Music Information Retrieval (MIR), Optimization, and Machine Learning. Manning’s entry, coupled with the Literature Review, provides the foundation on which I build this thesis.

1.1.1 General Mission Statement
This thesis aims to critically examine specific tools in Music Information Retrieval (MIR), Optimization (a term referring to improvements in Preference Rule Systems and Probabilistic Models), and Machine Learning individually. The exploration of MIR, Optimization, and Machine Learning will do two things: act as a survey of the literature and show trends within these subfields.
In the conclusion, I show how the three areas can interact. Most branches of Computer Music Analysis run in parallel (Meredith 2016), and few researchers take inspiration from the parallel branches. My intent is not to show that there is no interaction, but to point out opportunities for more interaction. To survey the literature, I first look at developments prior to the turn of the 21st century, the period when the field of Computer Music Analysis was born. This background comes primarily from David Temperley’s book The Cognition of Basic Musical Structures (2001), as well as from the works covered in the Literature Review and in the Sketch of the Relationship between Computers and Music. To explore current trends, I restrict myself primarily to the literature from 2000 to 2016. These texts build on the work done by the turn of the century and show how researchers use new technology to push the field further. This material constitutes the body of the thesis and shows where the field has gone and where it is going. Additionally, through a critical examination of the literature, I explore recent trends in Computer Music Analysis and offer points of entry for new research. In doing so, I use models drawn from World Music research. I concentrate on the three areas of the field mentioned above. This research can be applied to other similar areas, such as Mathematical Music Theory, which represents basic musical structures in mathematical form, or Computational Musicology, which investigates the simulation of computer models in music.

1.1.2 A Critical Overview of Computer Music Analysis: Music Information Retrieval, Optimization, and Machine Learning
The current state of Computer Music Analysis shows a shift in the relative positions of three areas: Music Information Retrieval (MIR), Machine Learning, and Optimization.
Music Information Retrieval is the most rapidly evolving of the three, due in large part to the development and spread of computers and the Internet, and specifically to increases in computing capacity. The second is Machine Learning. Its growth is similarly due to computing capacity and the Internet, but also to its widespread use in other disciplines, from which music researchers are drawing to an ever greater extent. The third is Optimization, which appears to have stagnated. However, Optimization borrows from other disciplines and contributes to the advances made by MIR and Machine Learning. Optimization is therefore still evolving, even if the other two fields are moving at a much greater pace.

To sketch this in greater detail, there are crucial differences and overlapping areas between the three fields that explain their current situations. Machine Learning is a precise endeavour that aims to create specific tools to meet well-defined goals or serve finite tasks. MIR, on the other hand, works with large bodies of data and serves goals that are often ill-defined, if not undefined. Optimization, by contrast, is presently coalescing in fields other than music and therefore appears not to be advancing as quickly. In fact, however, Optimization in its current state is laying a framework for major developments. Though there is overlap between MIR, Optimization, and Machine Learning, it is limited to a few researchers and projects. Examples include Darrell Conklin, who uses probability and bioinformatics in conjunction with Machine Learning; Giraud et al, who are creating a tool for both MIR and Optimization; and, most notably, CompMusic (a database for six subsets of World Music), which uses both Optimization and Machine Learning to create an MIR database. These will be discussed in later parts of the thesis.
1.1.3 Persaud’s Five Critical Issues
From the critical perspective adopted in this thesis, several issues arise, some of which have been addressed in the literature surveyed. Unfortunately, they have not been brought together in a way that yields an overall critical perspective on the current field. To this end, I have isolated five central critical issues, which I address here. Throughout the remainder of the thesis, I refer to them by means of the numbered list set out below and in Fig. 1. I call these Persaud’s Critical Issues since, to my knowledge, they have not been catalogued in this fashion before.

Persaud’s Critical Issue 1. Human Error
Firstly, data entry is still largely human-dependent, and with large amounts of data, as in an MIR database, a person will often make mistakes. Both Peter Manning, in his definition, and David Huron, in discussing the Humdrum Toolkit, address this issue. As Huron and Manning explain, the machine is limited by its programmers. Outside of research, artificial intelligence (AI) is being used to complete simple tasks and can learn various other tasks by itself. Similarly, quantum computers, which do not rely on simple binary code, are under active development. AI is already making its way into day-to-day life, and both technologies will likely end up in multidisciplinary research. In terms of what is used currently, data entry could be improved by the application of Machine Learning: certain parameters could be handled by machine input rather than human input. Such advances are being made elsewhere but have not been seen in music research, except in the creation of world music databases [see conclusion of the thesis]. We need to see more inroads made by Machine Learning in the analysis of Western music and Ancient music. Human limitations are evident not only in data entry but also in setting parameters, in annotations, and in the creation of algorithms in general (Huron 1988).
Setting parameters is a vital aspect of Optimization: well-chosen parameters enable the most accurate analysis of the data provided and, therefore, generate more accurate outcomes. Because the parameters are calibrated by humans, there is an implicit limitation. This is similar to the annotation of pieces in MIR databases and to the creation and application of algorithms in Machine Learning.

Persaud’s Critical Issue 2. Input Specification
Researchers do not define input modes clearly enough for them to be easily understood. To a certain extent, this is a problem of writing and communication, one that arises from research silos. It could be resolved by creating common standards and modes of discourse for describing computer research in music, and specifically the modes of input involved. In addition to this communication problem, input modes change from genre to genre. For example, popular music is not often scored, while ancient music is not performed in its original form. As such, the input for popular music would most likely be an audio file, while for ancient music an image of a score is more likely. Furthermore, the input could range from a full form, such as all the tracks of a song, to a simpler form, such as the main melody only, which further complicates the situation. Beyond genre, input modes also depend on translation into computer-compatible formats. Though the MP3 audio format is widely available, it is not easily readable for analytical use. As a workaround, researchers either use a MIDI format or break the input down further into separate tracks. In the study of ancient music, image data cannot be read directly by a computer and must undergo multiple passes of computer-based processing, and this method still yields errors.

Persaud’s Critical Issue 3. No Consistent Mode of Evaluation for Non-MIR Tools
The Music Information Retrieval Evaluation eXchange (MIREX) is a formal means of evaluating MIR systems and algorithms.
No equivalent exists for other branches of Computer Music Analysis, such as Optimization and Machine Learning. Without known standards, algorithms and tools may produce an end-product with no further use beyond its creation. Furthermore, without widespread knowledge of these tools and algorithms, they cannot be used for MIR or other branches of Computer Music Analysis, simply because other researchers do not know they exist.

Persaud’s Critical Issue 4. The Interdisciplinary Problem (Downie 2003)
The Interdisciplinary Problem is examined by J. Stephen Downie in his article “Music Information Retrieval.” Though Downie raises it as an issue in MIR specifically, it extends to other branches of Computer Music Analysis such as Optimization and Machine Learning. It refers to the lack of coordination between researchers and research fields when creating a tool, and to the different uses of the same terminology across fields. Some tools and systems are overly difficult to use for someone without programming knowledge, even though the outcomes of the tool would be useful to that person.

Persaud’s Critical Issue 5. “What’s the point?” Lack of Defined Goals and Frameworks
Research in Computer Music Analysis often yields small creations and discoveries rather than a large finished tool. Because Computer Music Analysis often concentrates on the method leading to an output, these smaller steps cannot be used by another researcher until the whole tool is completed. Furthermore, the specific use of an individual step is unknown or has very few applications, if any, so the “What’s the point?” argument returns. This argument, however, does not take into account the full potential of each field, and it arises from a lack of understanding of the goals of each branch of Computer Music Analysis.

Figure 1 Graphic representation of the five critical issues: 1. Human Error (data entry; human limitations); 2. Input Specification (undefined; generic change; computer compatible); 3. Consistent Evaluative Principles (other than for MIR); 4. The Interdisciplinary Problem (lack of coordination; terms used differently); 5. “What’s the point?” (undefined goals and framework).

1.2 A Sketch of the Relationship between Computers and Music
1.2.1 Composition and Performance
Music and computers have a lengthy history that touches on three fields: composition, performance, and music research. To understand the current state of Computer Music Analysis, this history needs to be discussed, since these disciplines helped shape Computer Music Analysis in fundamental ways. In terms of composition, computer music was one of the principal areas of early research. One main source for understanding this research is the Computer Music Journal, founded in 1977. This journal examines crossroads between computers and music, such as composition with computers, MIDI, synthesizer theory, and analytical models using the computer (Computer Music Journal). Though the material is broad, specific issues address analytical models included in this thesis. The journal includes articles about CompMusic, an organization committed to database creation for World Music, to which I return in my conclusion. It also includes Donnelly and Sheppard’s “Classification of Timbre Using Bayesian Networks,” one of the few instances of cross-branch research.

While the original inroads made into computer music composition were slow and burdened by clumsy and awkward hardware, this situation soon changed. Curtis Roads is a composer of electronic music and an author. His 1985 book, Composers and the Computer, is based on interviews that present composers’ own perspectives. According to Appleton’s review, Roads’s main point is that the arts and sciences are drawing closer together to create new music (Appleton 1986).
Furthermore, Appleton explains that understanding the means of music creation, including the way the computer is used, is vital for listening to computer music compositions: “If […] the principles of serial technique are necessary to an intelligent hearing of the works of Webern, Carter, Babbitt, or Boulez, then surely an appreciation of the principles of algorithmic compositional techniques and the possibilities of digital sound synthesis are required for the thorough audition of works by Xenakis, Chowning, Risset, and Dodge” (Appleton 1986, 124). This quote situates the importance of method in music and shows how new computer capabilities enhance the composition process. In 1986, a symposium on computer music composition was held, and a review of it was published in the Computer Music Journal. This symposium was a “product of a questionnaire sent in 1982, 1983, and 1984, to over 30 composers experienced in the computer medium” (Roads et al 1986, 40). The review examines, in a manner similar to Roads’s book, what brought composers to the computer and how they choose to use it. The review states that “articles in Computer Music Journal and other publications point to the broad application of computers in musical tasks, especially to sound synthesis, live performance, and algorithmic or procedural composition” (Roads et al 1986, 40).

Music Representation Languages (MRLs) are another important milestone in the history of Computer Music Analysis. An MRL is a format that the computer can understand (Downie 2003), and such languages are vital to composition. An example is the Musical Instrument Digital Interface, commonly known as MIDI. MIDI revolutionized sound processing by enabling the user to store real input, such as playing on a synthesizer, as movable and changeable blocks of note data, rather than recorded sound, that the computer can easily process.
It offers two-way flexibility because the performance on an external synthesizer is separated from its final form: the producer can move and change the blocks after the player has played (Manning et al 2001). This gives all parties more control over the end result, and MIDI is now widely used.

Another significant creation in computer music composition is music notation software. Such software, like Finale, often includes MIDI playback. According to Manning, “it quickly became apparent that major composition and performance possibilities could be opened up by extending MIDI control facilities to personal computers” (Manning et al 2001, 169). MIDI playback in music notation software gave composers the ability to create music digitally with the option to hear what it would sound like. Computer music composition, of course, continues today. Recent developments include ChucK, a programming language designed specifically for music that is prevalent in laptop orchestras (Wang et al 2015), and melodic idea generation and evaluation, that is, the creation of a motive and its assessment (Ponce de Leon et al 2016). Both tools are used to create musical ideas. ChucK, for example, can be used to create a complete piece in real time. Though computer music composition is important to the relationship between computers and music, it will not be discussed further in this thesis, since Computer Music Analysis has by now moved far enough away from composition to be treated as a separate endeavour. It should be noted that composition with computers is only one aspect of computer-assisted musical creation. According to Manning’s “Computers and Music,” the uses of computers in music can be separated into two branches: performance and music theory. For performance, MIDI is highlighted as a major development, but more performer-oriented methods, such as DARMS, have also been developed (Manning et al 2001).
DARMS is a “comprehensive coding system […] which has the capacity to handle almost every conceivable score detail” (Manning et al 2001, 176). In current performance practice, laptop orchestras are becoming more prevalent at universities. Though computer use in performance is important, I will not concentrate on it.

1.2.2 Applications in Music Theory and Analysis
Research uses of computers in music are more complex and have been based around two facets:
1. The first is the identification of recurrent features. Recurrent features are an important aspect of analysis because they can show that a set of items forms a pattern rather than a coincidence. “One of the earliest uses of the computer as a tool for analysis […] involves the identification of recurrent features that can usually be subjected to statistical analysis” (Manning et al 2001, 174). Statistical analysis further strengthens the case for a pattern by supplying quantitative measures. Statistical analysis is still present today and will be discussed in Chapter 3.
2. The second concerns the application of two kinds of “rule-based analysis”: analysis used for generative purposes, and analysis used in and of itself as an analytic method. As Manning describes rule-based analysis in general: “rule-based analysis methods presuppose that the processes of composition are bound by underlying structural principles that can be described in algorithmic terms. […] At this level it becomes possible to establish links with computer-based research into musical meaning” (Manning et al 2001, 174).
I now present examples of both facets. They show the two ideas above in a simple fashion and also demonstrate the main sources of error and limitation in Computer Music Analysis.

1.2.2.1 Recurrent features: Databases
A major database software for computer music research is the Humdrum Toolkit, created by David Huron; the revision of its files was completed in 2001.
Huron is based at the Ohio State University School of Music and researches mainly music cognition, computational musicology, and systematic musicology. The Humdrum Toolkit runs using UNIX software tools, but it is compatible with earlier versions of the Windows and Mac platforms. The database gives the public access to information on scores and re-encodes scores in a format usable by the Humdrum Toolkit. It is also possible to import or export files from Finale for scores that are not available in the database. “Humdrum” itself is composed of the Humdrum Syntax and the Humdrum Toolkit. The syntax, like other programming languages, enables the user to search for files and other elements using the Humdrum Toolkit. This language, however, must be learned before the software can be used adequately.

The Humdrum Toolkit is well suited to the study of recurrent features. Its capabilities include searching across sets of pieces for motives, syncopation, harmonic progressions, dynamics, pitch, and meter. These elements of music can be searched by genre, by composer, or by any other grouping for an overarching, statistical analysis; this use of computers in music therefore aligns with Manning’s definition in Grove. However, some of these elements are more easily found with the Humdrum Toolkit than others. First, this is due to “the interdisciplinary problem,” since some queries require a complex search written in the programming language, and programming knowledge is not consistent among database users. Second, human error is always a possibility with a completely man-made database. Like all tools, this one is imperfect. Huron identified three sources of error when using computers with Humdrum (Huron 1988):

1. Errors in the actual score
2. Errors in the transcription of the score
3. Errors by the program

According to Huron, these errors are human.¹
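The Humdrum Toolkit itself consists of UNIX-style commands operating on text files in the Humdrum syntax. As a rough illustration of the kind of recurrent-feature search described above, the Python sketch below reads a single-spine melody in **kern, the most widely used Humdrum representation of notes, and reports where a given interval pattern begins. It is not part of the Humdrum Toolkit: the function names are hypothetical, and it handles only a simplified subset of **kern (one note per line, with chords, ties, and ornaments ignored).

```python
import re

# Simplified subset of **kern: pitch letters (repeated for octave) plus accidentals.
KERN_PITCH = re.compile(r"([a-gA-G]+)([#\-n]*)")
STEPS = {"c": 0, "d": 2, "e": 4, "f": 5, "g": 7, "a": 9, "b": 11}


def kern_to_midi(token):
    """Convert one **kern note token (e.g. '4cc#') to a MIDI number; None for rests."""
    match = KERN_PITCH.search(token)
    if match is None:
        return None  # rests such as '4r' contain no pitch letter
    letters, accidentals = match.groups()
    if letters.islower():
        octave = 3 + len(letters)  # c = middle C (C4), cc = C5, ...
    else:
        octave = 4 - len(letters)  # C = C3, CC = C2, ...
    midi = 12 * (octave + 1) + STEPS[letters[0].lower()]
    return midi + accidentals.count("#") - accidentals.count("-")


def kern_pitches(kern_text):
    """Return the MIDI pitches of a single-spine **kern melody."""
    pitches = []
    for line in kern_text.splitlines():
        token = line.strip()
        # Skip blank lines, interpretations (*), comments (!), barlines (=), null tokens (.)
        if not token or token[0] in "*!=.":
            continue
        midi = kern_to_midi(token)
        if midi is not None:
            pitches.append(midi)
    return pitches


def find_motive(pitches, motive_intervals):
    """Indices of the notes at which a transposition-invariant interval pattern begins."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    n = len(motive_intervals)
    return [i for i in range(len(intervals) - n + 1)
            if intervals[i:i + n] == motive_intervals]


FRERE_JACQUES = """**kern
*M4/4
4c
4d
4e
4c
=1
4c
4d
4e
4c
=2
4e
4f
2g
*-"""

print(find_motive(kern_pitches(FRERE_JACQUES), [2, 2]))  # [0, 4]
```

Because the search compares intervals rather than absolute pitches, the motive (two rising whole steps, C-D-E in the example) is found at any transposition. A corpus-wide study would run the same search over many files and tabulate the results; the Humdrum Toolkit provides its own commands for this kind of interval and pattern work.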
¹ These sources of error are paraphrased from Huron 1988, 254.

1.2.2.2 Structural Models: Analysis and Counterpoint

In 1974, P. Howard Patrick used computers to analyse suspensions in the Masses of Josquin Des Prez. Patrick made an important distinction between music theory for the composition student and music theory for a computer’s rule-based structural model: music theory is often a description, but a computer needs a set of steps to follow. To get the computer to parse and identify the data properly, Patrick examined the errors and changed the criteria as needed (Alphonce 1988). Patrick’s study was inspired by a seminar in which Arthur Mendel looked for the criteria of structure in Josquin’s work. Patrick outlined the goal of the project as getting computer programs to print a reduction of a score by first going through a succession of tests and then finding the “most consonant pitch” (Patrick 1974, 325). Patrick tested three randomly selected texts to outline the problems he described as “Non-Suspensions” (Patrick 1974, 326) and “Problem Suspensions” (Patrick 1974, 328). These errors were due to the computer’s now ‘preconceived’ notion of what a suspension is, but the largest source of error, as Patrick explains, lies in the questions that people ask the computer. One criticism of this type of analysis is that it only yields results that a person could find by researching by hand, and it is thus susceptible to the same kinds of errors humans make. As Patrick states, “The limitations of the computer are overshadowed by the inherent limitations of the user” (Patrick 1974, 321). In other words, the computer can find any solution, but only one the user can fathom. Some larger-scale problems, however, are too difficult to solve without help from another source, such as a computer. In this sense, Patrick considered computer-aided analysis the most useful route.
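Patrick’s point that a computer needs explicit steps rather than a description can be made concrete with a deliberately simplified rule set. The Python sketch below is not Patrick’s program; it assumes two voices already aligned on a common grid of time slots (as MIDI note numbers), with a flag marking strong beats, and it flags a suspension in the upper voice when a consonant note is held into a dissonance on a strong beat and then resolves down by step to a consonance. The function names and the consonance set are illustrative assumptions.

```python
# Intervals (in semitones, mod 12) above the lower voice treated as consonant:
# unison/octave, thirds, perfect fifth, sixths. The fourth counts as dissonant,
# as is usual in two-voice counterpoint.
CONSONANT = {0, 3, 4, 7, 8, 9}


def is_consonant(upper, lower):
    return (upper - lower) % 12 in CONSONANT


def find_suspensions(upper, lower, strong):
    """Return slot indices where the upper voice has a prepared, resolved suspension.

    upper, lower: MIDI numbers for each time slot; strong: True on strong beats.
    """
    hits = []
    for i in range(1, len(upper) - 1):
        # Preparation: the same pitch sounds in the previous slot as a consonance.
        prepared = upper[i - 1] == upper[i] and is_consonant(upper[i - 1], lower[i - 1])
        # Suspension proper: that held pitch is dissonant on a strong beat.
        dissonant = strong[i] and not is_consonant(upper[i], lower[i])
        # Resolution: down by a half or whole step, onto a consonance.
        resolves = upper[i] - upper[i + 1] in (1, 2) and is_consonant(upper[i + 1], lower[i + 1])
        if prepared and dissonant and resolves:
            hits.append(i)
    return hits


# A 7-6 suspension: C5 over E4 (sixth), held over D4 (seventh), resolving to B4 (sixth).
print(find_suspensions([72, 72, 71], [64, 62, 62], [False, True, False]))  # [1]
```

Each condition is a point where such a program can misfire on real music (an ornamented resolution, for instance, fails the step-down test), which suggests why the criteria had to be revised after examining the errors.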
Patrick’s work set the groundwork for developments in Computer Music Analysis that do not merely mimic “research by hand.”

1.2.3 Music Information Retrieval Versus Optimization

Music Information Retrieval (MIR) is interdisciplinary because it deals with computer-based information, and it originated from the same starting point as Optimization. The two fields, however, have different goals. By Music Information Retrieval, I mean the sector of Computer Music Analysis that aims to create a database, either analytical or non-analytical, drawn from characteristics of a musical document such as a score, so as to further research. MIR aims to look into musical documents to find features or commonalities between different works of music. MIR approaches recurrent features by creating a database with annotations, or some other searchable method, so that a user can search for a specific feature. Optimization, which concerns itself with preference rules, probability, and statistical models, does not detach itself from human experience. The following quotation shows how Optimization differs from MIR: “Computational research in music cognition tends to focus on models of the human mind, whereas MIR prefers the best‐performing models regardless of their cognitive plausibility” (Burgoyne et al 2016, 214). In summary, Optimization is tied to music cognition (Burgoyne et al 2016) while MIR is not.

MIR has become an ever-growing and prevalent field due to the internet (Fujinaga and Weiss) and is present in commonly used services like Google Books (Helsen et al 2014), but it originally came from a comparatively small field of research. According to Burgoyne et al, in 1907 C.S. Myers studied Western folksong using an MIR approach, which required tabulating by hand the intervals present in folksongs. A year earlier, in 1906, ethnomusicologists had used a similar method to find features that differentiate non-Western music from Western music (Burgoyne et al 2016).
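Myers’s hand tabulation of intervals is the kind of task that later became routine for a computer. A minimal Python sketch, assuming each melody is already encoded as a list of MIDI note numbers (the early studies, of course, worked by hand from notation), counts the signed melodic intervals across a small corpus and converts the counts into relative frequencies. The two melodies are hypothetical fragments, not Myers’s data.

```python
from collections import Counter


def interval_table(melodies):
    """Count signed melodic intervals (in semitones) across a corpus."""
    counts = Counter()
    for pitches in melodies:
        counts.update(b - a for a, b in zip(pitches, pitches[1:]))
    return counts


def relative_frequencies(counts):
    """Express each interval count as a proportion of all intervals tabulated."""
    total = sum(counts.values())
    return {interval: n / total for interval, n in counts.items()}


# Two hypothetical melodic fragments, as MIDI note numbers.
corpus = [
    [60, 62, 64, 62, 60],  # C D E D C
    [67, 65, 64, 62, 60],  # G F E D C
]
table = interval_table(corpus)
print(table.most_common(1))  # [(-2, 5)]
```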
The practice of “Finding Features” has become a standard use of Computer Music Analysis. These are the earliest examples of Music Information Retrieval, even though the term itself was not used until the 1960s. From 1907 to the 1960s Music Information Retrieval was largely ignored, but then “interest grew in computerized analysis of music” (Burgoyne et al 2016, 215) because of the prevalence and accessibility of computers. Early MIR concentrated on methods to input music into the computer (Burgoyne et al 2016), such as notation software or standardized file formats like MP3 (for audio) and MIDI (for symbolic data) (Fujinaga and Weiss). This made it possible for the computer to ‘understand’ the musical items. These methods grew into more complex software applications like Humdrum, which was discussed in section 1.2.2.1.

This history of MIR is brief, but it outlines the developments that are important to this thesis. Since the field re-emerged because of the internet and the increasing availability of computers, the tabulations could be done with software instead of by hand. Once music existed in a form a computer could understand, databases like Humdrum were more easily produced. Creating a database of music recognizable by a computer is, according to Andrew Hankinson (a Digital Humanities and Medieval Music researcher), the first step in a large retrieval system (Helsen et al 2014). Large databases of different varieties will be discussed further in Chapter 2.

1.3 Literature Review

In this literature review I explore the major works used in this thesis. The order mirrors that of the thesis: first Optimization, then Machine Learning. MIR has a more complicated literature base, so I discuss it in Chapter 2. I begin with David Temperley’s works in chronological order because I incorporate their organizational tools and major ideas into Chapter 3.
Parallelism is highlighted because it grows from a single-line preference rule into a multi-level set of ideas. Since perception is key to Optimization, I include David Huron for the link from computers to perception. Huron’s paper examines voice-leading rules, which are common knowledge and vital to music theorists, and thus act as a stable starting point. The final work is Darrell Conklin and Ian Witten’s paper investigating the multiple-viewpoint system. This article is one of the first to examine Machine Learning in music and should, therefore, be included.

1.3.1 David Temperley The Cognition of Basic Musical Structures (2001)

David Temperley is based at the Eastman School of Music and writes extensively on music theory and music cognition. I will concentrate on the sections of his book The Cognition of Basic Musical Structures (2001) that explain Preference Rule Systems, or computational models. In the first half of the book Temperley outlines six Preference Rule Systems: Metrical Structure, Melodic Phrase Structure, Contrapuntal Structure, Tonal-Pitch-Class Representation, Harmonic Structure, and Key Structure. The second half explores the expectation of the listener, rock music, African music, composition, and recomposition. This review concentrates on the first half. Temperley states that the goal of the book is to explore the “’infrastructural’ levels of music,” meaning the basic building blocks of music perception, because there is very little research on the subject. Before presenting each Preference Rule System (PRS), Temperley outlines previous research on the corresponding musical structure. For example, in the chapter on Metrical Structure he describes at length the Desain and Honing model for beat induction. The specifics of each section are discussed in Chapter 3 of this thesis. He notes that each PRS is based on a piano-roll input for the computer.
The PRS itself is a group of rules the computer follows to narrow a set of possible choices. Each rule expresses a preference, hence the name preference rule. The final choice is the analysis that is most preferred overall when the rules are weighed according to a specific hierarchy. After presenting the Preference Rule Systems, Temperley describes the tests he uses to ensure that the systems function well.

Meter, unlike most of the other structures, has received considerable research concerning theoretical and computational models. Temperley builds upon Lerdahl and Jackendoff’s Generative Theory of Tonal Music (1983), adapting it for a preference rule approach. The meter section takes its definition of well-formedness from Lerdahl and Jackendoff, for whom grouping and hierarchy are most important, and Temperley explains it as “every event onset must be marked by a beat [and] that a beat at one level must be at all lower levels” (Temperley 2001, 30). This definition is used in all subsequent PRSs. Similarly, for Key Structure there is sufficient research from music cognition and computational methods to improve upon: Temperley uses the Krumhansl-Schmuckler Key-Finding Algorithm and discusses its problems and solutions. The other four PRSs each take a list of rules, and within each rule a list of preferences in a specific order, so the computer knows which item is the most important or most common. For example, the Phrase Structure Preference Rules (Temperley 2001, Melodic Phrase Structure chapter, 68-70) comprise three rules:

1. Gap Rule: Prefer to locate phrase boundaries at (a) large inter-onset intervals and (b) large offset-to-onset intervals.
2. Phrase Length Rule: Prefer phrases to have roughly 8 notes.
3. Metrical Parallelism Rule: Prefer to begin successive groups at parallel points in the metrical structure.

These rules apply only to monophonic melodies that are well-formed in the sense defined above. To implement each rule, a formula, score, or other quantification is applied.
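To make the idea of scoring concrete, the sketch below combines simplified versions of the Gap Rule and the Phrase Length Rule and uses dynamic programming to choose the highest-scoring set of phrase boundaries. It is not Temperley’s implementation: the gap measure (inter-onset interval plus offset-to-onset interval), its comparison with the mean gap, the logarithmic length penalty, and the weight LENGTH_WEIGHT are all assumptions chosen for illustration, and the Metrical Parallelism Rule is omitted. Notes are (onset, offset) pairs in beats.

```python
import math

PREFERRED_LENGTH = 8  # Phrase Length Rule: "roughly 8 notes"
LENGTH_WEIGHT = 1.0   # hypothetical weight balancing the two rules


def gap_scores(notes):
    """Gap Rule input for a boundary before note i (i >= 1):
    inter-onset interval + offset-to-onset interval (the rest, if any)."""
    return [
        (notes[i][0] - notes[i - 1][0]) + max(0.0, notes[i][0] - notes[i - 1][1])
        for i in range(1, len(notes))
    ]


def best_phrasing(notes):
    """Return the indices of the notes that begin phrases (note 0 is implied)."""
    n = len(notes)
    if n < 2:
        return []
    gaps = gap_scores(notes)
    mean_gap = sum(gaps) / len(gaps)
    # best[j] = (score, boundaries) of the best analysis of notes[0:j]
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for j in range(1, n + 1):
        for i in range(j):  # candidate final phrase: notes[i:j]
            prev_score, prev_bounds = best[i]
            # Gap Rule: reward boundaries at above-average gaps
            reward = gaps[i - 1] - mean_gap if i > 0 else 0.0
            # Phrase Length Rule: penalize deviation from ~8 notes (log scale)
            penalty = LENGTH_WEIGHT * abs(math.log2((j - i) / PREFERRED_LENGTH))
            score = prev_score + reward - penalty
            if score > best[j][0]:
                best[j] = (score, prev_bounds + ([i] if i > 0 else []))
    return best[n][1]


# Two legato eight-note phrases separated by a two-beat rest (onset, offset in beats).
demo = [(t, t + 1) for t in range(8)] + [(t, t + 1) for t in range(10, 18)]
print(best_phrasing(demo))  # [8]
```

In the example, the rest between the eighth and ninth notes produces a large gap score, and splitting there also yields two eight-note phrases, so both rules agree. Raising LENGTH_WEIGHT makes the analysis cling more strongly to eight-note phrases, which shows how the balance between rules decides the outcome.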
The analysis with the best overall “score” is taken as the best analysis of the melody. Temperley’s Preference Rule Systems give me multiple examples of how a computer can evaluate different problems, which I can then relate to other models for evaluation. In this regard, Temperley’s 2001 book acts as a springboard for my thesis. It gives important background information on Computer Music Analysis and shows how Temperley’s subsequent work has built upon it. The book will be discussed further in Chapter 3: Optimization.

1.3.2 David Temperley and Christopher Bartlette “Parallelism as a Factor in Metrical Analysis” (2002)

This text builds upon Temperley’s book by adding further information to the “Metrical Parallelism Rule” (Temperley 2001, 70). The “well-formedness rule,” as defined in Temperley 2001, still applies in this article, as does the requirement of monophony. The goal of this article is to build upon the book for clarity, accuracy, and precision when dealing with Parallelism. Temperley and Bartlette examine the effect of Parallelism and conclude that its definition must be modified. Parallelism is defined as a repetition of either the exact sequence or the contour. The Parallelism Rule is redefined as “prefer beat intervals of a certain distance to the extent that repetition occurs at that distance in the vicinity” (Temperley and Bartlette 2002, 134). This is useful to the thesis because it gives a more inclusive definition of Parallelism as both a term and a rule, and because of its influence on the later treatment of parallelism.

1.3.3 David Temperley Music and Probability (2007)

Though Temperley was satisfied with the 2001 book, it seemed that more should be added to the approach because preference rule models could not be applied to “linguistics or vision” (Temperley 2007, ix). The goal of the 2007 book is to use specific Bayesian probability tools as a link between perception and style.
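One way to see why preference rule models translate readily into Bayesian models is to write Bayes’ rule for a musical structure S (for example, a metrical or key analysis) given a musical surface M, such as the notes of a piece. The following is the general identity, offered as an illustration rather than a formula quoted from Temperley: because P(M) does not depend on S, the best analysis maximizes the product of the two remaining terms, and taking logarithms turns that product into a sum of scores, much as a preference rule system adds up the scores of its individual rules.

$$
P(S \mid M) = \frac{P(M \mid S)\,P(S)}{P(M)} \propto P(M \mid S)\,P(S)
$$

$$
\hat{S} = \arg\max_{S} P(S \mid M) = \arg\max_{S} \bigl[\log P(M \mid S) + \log P(S)\bigr]
$$

On this reading, log P(S) plays roughly the part of rules about the structure itself (such as a preferred phrase length), and log P(M | S) the part of rules about how well the structure fits the notes (such as the Gap Rule).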
In research on the perception of language and vision, Bayesian probability techniques, such as the probability of one event following another, are more common in computational tools. To quote Temperley, “I realized that Bayesian models provided the answer to my problems with preference rule models. In fact, preference rule models were very similar to Bayesian models” (Temperley 2007, x), meaning that the existing PRSs can easily be turned into Bayesian models. The book shows a new trend in computer music research: probability. It uses the Essen Corpus, also known as the Essen Folksong Collection,² to test for the central distribution of aspects of music (and relies on a method of representation created by Lerdahl and Jackendoff in 1983, which by this point was familiar to music theorists). In its main chapters the book covers rhythm, pitch, key, style, composition, and, like the first computer music analytic tools, error detection.

² The Essen Folksong Collection is a set of folksongs from Germany, China, France, Russia, and elsewhere, collected by Helmut Schaffrath. http://essen.themefinder.org/

1.3.4 David Huron “Tone and Voice: A Derivation of the Rules of Voice-Leading from Perceptual Principles” (2001)

I have included this work in the literature review because all computer models must, in some way, tie back to perception to be correct. Huron’s text is also referenced in Temperley’s work, because the psychological principles behind musical aspects make computational modelling difficult. Huron’s 2001 work shows the relationship between voice-leading and auditory perception. The article presents a set of voice-leading rules, derives them from perceptual principles, and finally relates them to genre. Each voice-leading rule is scrutinized with three questions:

1. What goal is served by the following rule?
2. Is the goal worthwhile?
3. Is the rule an effective way of achieving the purported goal? (Huron 2001, 1)

Huron raises the important question of culture: it remains unknown whether these principles of auditory perception are inherent in all people or are created by cultures. However, Huron notes that “perceptual principles can be used to account for a number of aspects of musical organization, at least with respect to Western music” (Huron 2001, 1) and concludes that six perceptual principles account for most voice-leading rules in Western music. Huron also raises the question of compositional goals, since the composer plays with the perception of the listener. For example, Huron mentions that “Bach gradually changes his compositional strategy. For works employing just two parts, Bach endeavors to keep the parts active (few rests of short duration) and to boost the textural density through pseudo-polyphonic writing. For works having four or more nominal voices, Bach reverses this strategy” (Huron 2001, 47). This deceives the listener, because a four-voice work may sound sparser while a two-voice work sounds denser, which makes these voice-leading rules more like compositional options.

1.3.5 Darrell Conklin and Ian H. Witten “Multiple Viewpoint Systems for Music Prediction” (1995)

Darrell Conklin concentrates on research in Machine Learning and music at the University of the Basque Country in Spain. This article has been cited in Temperley’s works, such as The Cognition of Basic Musical Structures (2001). The paper takes an “empirical induction approach to generative theory” (Conklin and Witten 1995, 52) by exploring previous compositions for style and patterns. More specifically, the article uses Bach chorales as a starting point for choral music. Conklin and Witten describe Machine Learning, applied to music research, as follows: “Machine learning is concerned with improving performance as a specific task.
Here the task is music prediction” (Conklin and Witten 1995, 55). Much of Machine Learning uses context models, which require exact matches. Because music does not always use exact matches (similarity is enough for auditory perception), Conklin and Witten adopt a multiple-viewpoint system, in which each viewpoint represents one aspect of music, to derive musical ideas that take style into account. Conklin and Witten describe the next steps in this field as:

1. Research on the prediction and entropy of music
2. The creation of “a general-purpose machine learning tool for music” (Conklin and Witten 1995, 71) for all musical genres

Their work contributes to the thesis by marking the beginning of Machine Learning in music. From it, the rest of the accomplishments in Machine Learning and music can be put into perspective.

1.4 Conclusion

In the introductory chapter of this thesis, I have described my goal: to critically examine aspects of Music Information Retrieval (MIR), Optimization, and Machine Learning. MIR and Optimization share a common starting point but differ in goal. MIR aims to create one or more databases for further analysis, while Optimization uses a computer model to understand the human perception of a musical structure. Machine Learning differs from the other two in that it concentrates on the creation of a tool and not necessarily its uses. I have surveyed specific literature in the field of computer music analysis as background and as an inroad to the research from 2000 to 2016. For historical context, I have brought in Manning’s multi-faceted explanation of the relationship between computers and music. It covers composition, performance, and analysis and displays the many important developments prior to the turn of the century. These developments include Music Representation Languages (MRLs), such as MIDI, and notation software, both of which came into widespread use.
This literature touches on MIR, Optimization, and Machine Learning, and also exposes some critical issues in Computer Music Analysis. I have set out a list of five critical issues, which I use to gain critical perspective on the field. The first issue is Human Error, which refers to human limitations and the capacity to make mistakes; it was raised by both Peter Manning and David Huron. Second is Input Specification, a recurring issue because articles often do not specify what input is used for a tool. The input is largely genre-based, due to availability. Third, Consistent Evaluative Principles are needed for all branches of Computer Music Analysis, so that there is a reliable set of algorithms and methods to draw upon. Fourth, the Interdisciplinary Problem concerns term usage and differences of level in tool creation, and it is evident in all of the authors in the literature review, because each author uses their own set of terms based on their usual field of research. Fifth, “What’s the Point?” refers to the lack of a stated purpose for a specific tool: in a branch like Optimization, the tools work towards understanding human perception, so a specific tool may not have a specific use at its inception. Using this chapter as a basis, I begin my analysis of specific tools in each of the three subfields, starting with Music Information Retrieval.

Chapter 2- Music Information Retrieval

2.1 Introduction

Music Information Retrieval (MIR) is a subsection of Computer Music Analysis that is growing rapidly because of current technology. MIR is concerned with examining music, either by locating or by analysing, and often aims to make music searchable. The locating branch is often aimed at examining the metadata of a large set of works.
The analytic/production branch concerns itself with a smaller number of pieces but goes into much greater detail; as Downie states, “Analytic/Production systems usually contain the most complete representation of music information” (Downie 2003, 308). Databases created for MIR are often accessible through the internet, so any researcher with the necessary background knowledge can use them.

The goal of this chapter is to begin a critical comparison of tools and problem-solving methods in MIR. This will be accomplished by discussing three projects: a large completed tool, a large tool in progress, and a small tool. These tools are just the “tip of the iceberg” when it comes to MIR, but they have been chosen to show different stages in the evolution of a tool. The large completed tool is VocalSearch, with which song lyrics can be searched to identify the song in which they appear. The in-progress tool is a research project called the Single Interface for Music Score Searching and Analysis (SIMSSA). The small milestone studied here is Patrick Donnelly and John Sheppard’s approach to timbre identification using probability. Donnelly and Sheppard’s project provides a solution to a specific problem, which in turn can help a larger database. This final milestone shows how smaller projects in Computer Music Analysis can help solve larger problems and thus move the field forward.

2.1.1 MIR Overview and Applications

The purpose of this section is to describe the major terms in Music Information Retrieval (MIR) and to show the different systems at work in MIR. I will not discuss every system in depth, but I would like to show the complexity of MIR. I will first explain the two main types of MIR systems: locating and analytic/production. Then I will outline the different types of data, and finally I will explain how the different types of musical information fit into each of the data categories and systems.
MIR examines multiple facets of music information in many different forms. According to J. Stephen Downie, the creator of MIREX, who specializes in information science at the University of Illinois, there are two different types of MIR systems, locating and analytic/production (Downie 2003), as mentioned in the introduction. Locating systems are used by people searching for music, either as consumers on a website or as researchers in a recordings database. A locating system looks at many works but does not go into depth, and it often locates information on the title, composer, performer, etc. This type of information is called metadata. An analytic/production system generally looks at a small number of works, but in much greater detail. These systems can, for example, look at audio recordings, pictures of scores, and/or symbolic forms of scores. (I will not go into detail about specific systems at this point, since they are discussed later in the chapter.)

The possible types of data in music are metadata, audio, symbolic, and image. Metadata is simply data about data, so, in music, it is information about the performers or the pieces performed, such as title, composer, etc. Audio data is a recording. Most commonly, MP3 files are used for audio data because they are easily read by computers, and this is often the data used for popular music. In certain regards, images and symbolic forms are similar: image data refers to images of scores, while symbolic data is a format that a computer can understand, such as a score notated in Finale or some other notation file. These different types of data have specific limits and uses. For example, metadata, explained above, is used in all search engines that look through bibliographic data. Audio, on the other hand, is not as easy to search, but it is very easy to obtain in standard MP3 format.
According to Burgoyne et al in Chapter 15 of A New Companion to Digital Humanities (2016), audio data is difficult to use for feature extraction, in which a user aims to identify a particular query, because it comes in the form of large files. Historically, “query-by-humming” (Burgoyne et al 2016) has been a popular MIR application for feature extraction, provided the recordings have been properly annotated. In query-by-humming, a user hums a tune into a microphone and the tune is matched with a piece. This, however, is by no means a complete picture of what audio can be used for. If an audio recording could be converted to symbolic data, it would be more useful to MIR (Burgoyne et al 2016).

Symbolic data, often in the form of MIDI or a readable score format, is easily recognized by a computer and is used for information retrieval, classification, music performance, and music analysis. A symbolic form can retrieve sets of pitches (which together make themes), rhythms, harmonic progressions, and more. Classification using symbolic formats identifies stylistic “emblems,” such as a specific harmonic progression or the use of specific intervals; an emblem is a defining characteristic. In terms of music performance, symbolic data is also used for expressive timing studies. Finally, for music analysis, the symbolic format is used for automated analysis (which also overlaps with Optimization) and for pitch spelling when MIDI is used (Burgoyne et al 2016).

Image data, like audio data, is difficult for a computer to recognize, and at present there is no consistently recognized format for sharing it. A score can be transcribed or turned into MIDI format, but that is time-consuming. Optical Music Recognition (OMR) was created to solve this issue. OMR is a tool that can identify musical characters, much as Optical Character Recognition can identify letters in typed images.
This renders score images readable by computers (this is discussed further in the section on SIMSSA, the Single Interface for Music Score Searching and Analysis). MIR is a multifaceted, multicultural, multidisciplinary tool. There are also seven facets of music information (Downie 2003):

1. Pitch
2. Temporal
3. Harmonic
4. Timbral
5. Editorial
6. Textual
7. Bibliographic

In the following graphic, I have given a representation of the overall shape of MIR as it currently stands. The reader will note the breakdown into two large parts, locating and analytic/production, as discussed above. Within these, the reader will find the various types of data and facets described above.

[Figure 2: Music Information Retrieval, divided into Locating (Metadata: Bibliographic) and Analytical/Production (Audio: Pitch, Temporal, Harmonic, Timbral; Image: Editorial, Textual, Pitch, Temporal, Harmonic; Symbolic: Editorial, Textual, Pitch, Temporal, Harmonic).]

Figure 2. The two categories of MIR system (second row) and the facets (last row) follow J. Stephen Downie’s 2003 article. The four types of data are from Chapter 15 of Burgoyne et al 2016.

Though the graphic looks as if it represents a fixed situation, these lines are blurring because of changes since the turn of the century. These changes are being examined by ISMIR, the International Society for Music Information Retrieval, and MIREX, the Music Information Retrieval Evaluation eXchange (Burgoyne et al 2016), but, as their names state, they only look at MIR tools (this is one of my five critical issues). This graphic representation has been included as a comparison point for the rest of Chapter Two, so I will refer to these types of data (Metadata, Audio, Image, Symbolic), facets (Pitch, Temporal, Harmonic, Timbral, Editorial, Textual, Bibliographic), and systems (Locating, Analytical/Production).

2.2 The MIR Tools

In this part of the thesis I shall look at several tools in MIR.
Some are intended for MIR researchers and others for laypeople. I start with VocalSearch, which is now unavailable online but gives valuable information to the thesis. Next, I discuss three Single Interface for Music Score Searching and Analysis (SIMSSA) tools: Search the Liber Usualis, Cantus Ultimus, and Electronic Locator of Vertical Interval Successions (ELVIS). Finally, I examine a smaller tool, Donnelly and Sheppard’s Bayesian Network Algorithm, which investigates timbre identification.

2.2.1 Vocalsearch

Vocalsearch was a web-based tool, available to everyone, used to identify unknown songs without metadata (Pardo et al 2008). Metadata is the information about a song, such as title, artist, album, etc. (Burgoyne et al 2016), and without it a song is difficult to identify (Orio 2008). Vocalsearch was created by teams from the University of Michigan and Carnegie Mellon University (Birmingham, Dannenberg, and Pardo 2006). I have chosen to include it as a tool that is ‘complete’: as research grows this project may change, but it is a complete database when compared to the tools that follow in my discussion. This tool lets the user search by humming a segment, by providing music notation, or by providing lyrics, using Melodic Music Indexing and Query-by-Humming technology. Melodic Music Indexing is a way for the computer to understand the melodic content of a song. A song is annotated with its melodic content, often through MIDI sequencing. MIDI is easily understood by a computer because it gives both pitch and duration. When a query is hummed, the computer matches it to the corresponding song. Song matching is problematic. Often, when a query-by-humming platform does not work, it is because the user did not hum the melody clearly or chose a different layer of the song, perhaps another instrument or vocal line (Dannenberg et al 2007).
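Vocalsearch itself matched queries with a probability algorithm (Birmingham, Dannenberg, and Pardo 2006). A simpler technique that addresses two of the problems just described, humming in a different key and humming a few wrong notes, is to compare interval sequences with an edit distance. The Python sketch below uses that swapped-in technique, not Vocalsearch’s algorithm; the theme index is hypothetical, and the hummed query is assumed to have already been converted to MIDI note numbers.

```python
def intervals(pitches):
    """Transposition-invariant representation: differences between successive pitches."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a, b):
    """Levenshtein distance between two interval sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # skip an interval of the query
                            curr[j - 1] + 1,          # skip an interval of the theme
                            prev[j - 1] + (x != y)))  # match (0) or wrong interval (1)
        prev = curr
    return prev[-1]


def rank_themes(query, themes):
    """Theme names ordered from best to worst match for a query (MIDI numbers)."""
    q = intervals(query)
    return sorted(themes, key=lambda name: edit_distance(q, intervals(themes[name])))


# Hypothetical theme index (opening phrases as MIDI numbers).
THEMES = {
    "ode_to_joy": [64, 64, 65, 67, 67, 65, 64, 62],  # E E F G G F E D
    "twinkle": [60, 60, 67, 67, 69, 69, 67],          # C C G G A A G
}
# "Ode to Joy" hummed three semitones higher, with one wrong note (69 instead of 68).
HUMMED = [67, 67, 68, 70, 70, 69, 67, 65]
print(rank_themes(HUMMED, THEMES))  # ['ode_to_joy', 'twinkle']
```

Comparing intervals makes the match independent of the key in which the user hums, and the edit distance tolerates a few wrong, missing, or extra notes. It does not, however, address the other failure mentioned above: humming a different layer of the song.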
The tool must also normalize and interpret the query; for Vocalsearch, this is done using a probability algorithm (Birmingham, Dannenberg, and Pardo 2006). The approach measures the similarity between the sung query and the MIDI in the large database. Within MIR, Vocalsearch builds upon existing audio data recognition and locating systems. It allows a specific song, or a number of songs, to be located using various queries, recognized both through a typed search and a hummed audio search. Vocalsearch uses conventional metadata searches if needed but seems more useful for unusual queries, like humming or notational search. The database itself is for music with lyrical content, hence the name, but the site is now unavailable, so information from a user’s perspective is limited. A common issue with a database is that music is constantly being created, but this database was designed to keep growing because users could add songs (Pardo et al 2008).

2.2.2 SIMSSA

As I mentioned above, the in-progress tool is a research project called the Single Interface for Music Score Searching and Analysis (SIMSSA). In this section of the thesis, I describe three SIMSSA projects: “Search the Liber Usualis,” “Cantus Ultimus,” and “ELVIS.” These all have different goals and technologies, so including all three gives a well-rounded view of what goes into a tool.

2.2.2.1 Search the Liber Usualis

The Liber Usualis contains valuable information for those working on early music. The text is over 2000 pages, so it is difficult to locate the needed information. To solve this problem, SIMSSA decided to render its contents searchable and make it all available online. This tool lets researchers search the text for pitch sequences (either transposed or exact), neumes, contour, intervals, and, of course, text (the Search the Liber Usualis website is located at liber.simssa.ca).
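The difference between searching for exact pitches, for a transposed pitch sequence, and for a contour can be shown with a small sketch. It assumes chant melodies that OMR has already converted into pitch names with octave numbers, and it measures transposition in diatonic steps, an assumption suited to chant. It is not the search code behind liber.simssa.ca; the function names, the chant fragment, and the query are hypothetical.

```python
LETTERS = "cdefgab"


def diatonic_index(pitch):
    """Map a pitch name such as 'a3' to a diatonic step number (7 steps per octave)."""
    return 7 * int(pitch[1:]) + LETTERS.index(pitch[0])


def signature(pitches, mode):
    """Representation used for matching in each search mode."""
    if mode == "exact":
        return list(pitches)
    steps = [diatonic_index(p) for p in pitches]
    diffs = [b - a for a, b in zip(steps, steps[1:])]
    if mode == "transposed":  # same intervals, any starting pitch
        return diffs
    if mode == "contour":     # only the direction of each interval
        return ["u" if d > 0 else "d" if d < 0 else "s" for d in diffs]
    raise ValueError(f"unknown mode: {mode}")


def search(melody, query, mode="exact"):
    """Indices of the notes in `melody` at which `query` begins."""
    m, q = signature(melody, mode), signature(query, mode)
    return [i for i in range(len(m) - len(q) + 1) if m[i:i + len(q)] == q]


# Hypothetical chant fragment after OMR, as pitch names with octave numbers.
CHANT = ["d3", "f3", "g3", "a3", "g3", "a3", "c4", "d4", "c4"]
for mode in ("exact", "transposed", "contour"):
    print(mode, search(CHANT, ["d3", "f3", "g3"], mode))
# exact [0]
# transposed [0, 5]
# contour [0, 1, 4, 5]
```

Each mode loosens the previous one: a transposed search also finds a3-c4-d4, and a contour search finds every pair of rising steps, which is why contour queries return the most results.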
To make the book searchable, SIMSSA used Optical Text Recognition (OTR), sometimes referred to as Optical Character Recognition (OCR), and Optical Music Recognition (OMR). OMR, as previously mentioned, is a computer method involved in “turning musical notation represented in a digital image in a computer-manipulable symbolic notation format” (Vigliensoni et al 2011, 423). Using OMR with neumes, or square-note notation, is difficult because this notation is a precursor to standard musical notation. As a result, no standard notation software handles it, so the tool must translate the square-note notation into the standardized one, and OMR must be configured to perform this translation. The translation requires the computer to understand eleven neumes. SIMSSA decided to use the ‘Music Staves Gamera Toolkit’ as a bank of algorithms to analyze 40 test pages of the Liber Usualis. The test pages were manually classified and annotated so that the output of the algorithms could be checked. The algorithms performed the following tasks: they found the staff lines, removed the staff, added ledger lines, and classified the types of neumes. Because the neume classification was not 100% accurate, the final version was examined by a human to ensure accuracy. These algorithms, however, do not tackle clef recognition and note identification. Note identification was made possible using the horizontal projection of neumes, but this only worked for a subset of the eleven neumes. In conjunction with the algorithms previously used to determine the types of neumes and their placement relative to the staff, the starting pitch of each neume was identified using the average size of the neume and its “center of mass” (Vigliensoni et al 2011, 426). The clef was then identified, and each neume was given a pitch relative to the clef. This was possible because the clef is always the first neume-like image in the line.
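The last two steps, finding a neume’s vertical “center of mass” and naming its pitch relative to the clef, can be sketched in a few lines. This is a hypothetical simplification, not the Gamera toolkit’s code: it assumes the glyph is a binary pixel grid, that the y-coordinates of the four staff lines are already known from staff finding, that the lines are evenly spaced, and that the clef is a C or F clef on a known line counted from the bottom.

```python
LETTERS = "CDEFGAB"


def vertical_center(pixels, top):
    """Row-weighted center of mass of a binary glyph whose first row is at y=top."""
    counts = [sum(row) for row in pixels]
    total = sum(counts)
    if total == 0:
        raise ValueError("empty glyph")
    return top + sum(r * c for r, c in enumerate(counts)) / total


def staff_position(y, staff_lines):
    """Position in half-spaces above the bottom line (0 = bottom line, 1 = lowest space).

    staff_lines: y-coordinates of the lines from top to bottom (image y grows downward).
    """
    bottom = staff_lines[-1]
    half_space = (staff_lines[-1] - staff_lines[0]) / (2 * (len(staff_lines) - 1))
    return round((bottom - y) / half_space)


def pitch_name(position, clef_letter, clef_line):
    """Letter name of a staff position relative to a C or F clef on clef_line (1 = bottom)."""
    clef_position = 2 * (clef_line - 1)
    index = LETTERS.index(clef_letter) + position - clef_position
    return LETTERS[index % 7]


staff = [10, 20, 30, 40]                 # four-line chant staff, top to bottom
glyph = [[1, 1]] * 5                     # a small blob whose top row is at y = 33
y = vertical_center(glyph, top=33)       # 35.0
pos = staff_position(y, staff)           # 1: the lowest space
print(pitch_name(pos, "C", clef_line=4))  # E (C clef on the top line)
print(pitch_name(pos, "F", clef_line=3))  # C (F clef on the third line)
```

The same glyph receives a different letter name under a different clef, which is why the clef has to be identified before pitches can be assigned.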
The remaining neumes, which often have multiple pitches, were treated as exceptions to this clef-relative pitch assignment: they were first split so that the output would correctly identify each of their pitches. In short, a different algorithm from the Music Staves Gamera Toolkit was used for each procedure, but together the algorithms rendered the scores of the entire book searchable. Once the scores were searchable, the text was made searchable through OTR technology, a simpler process than the one used for the scores. The “Search the Liber Usualis” project fits into the MIR chart above both as an analytical tool and as a tool for locating scores and text. It is analytical because it takes an image of a text and examines contour and interval, which are elements of analysis, and it is a locating tool because it finds specific ideas based on the search criteria. This is possible because of the computer’s ability to ‘read the music’ once the algorithms translate it.

2.2.2.2 Cantus Ultimus

“Search the Liber Usualis” can be seen as an initial test that laid the groundwork for Cantus Ultimus. Their goals, however, are different. For the Liber, the goal was to make the book searchable and easy for researchers to use. With Cantus Ultimus, the aim is to preserve ancient manuscripts digitally before they deteriorate further. The database shows images of the searched score, with typed lyrics and standard notation in the side bar (Cantus Ultimus is located at cantus.simssa.ca/). Only a few sets of images have been added, but the project is still growing. Cantus Ultimus is part of SIMSSA, which is based primarily at McGill University. The tool builds upon the existing Cantus Database by adding more digitized scores and Optical Music Recognition (OMR) technology. Researchers and plainchant enthusiasts can search the database by text, genre, office, and associated liturgical feast.
Text queries cover the lyrics of each chant and its metadata. Users can also perform musical searches using “Volpiano searches,” which search by notes specifically. A search can be either a normal search, in which A-B-C returns A-B-C, D-E-F, and any other series with the same intervals counted by letter name (here, two ascending steps), or a literal search, in which only A-B-C sequences are returned (cantus.simssa.ca/). Each query can yield multiple results, so, in effect, the tool is a locating system. The system locates chants based on notes and lyrics but, more importantly, it is an image-searching database. The ability to search through images was made possible by OMR and OCR, using all of the algorithms from “Search the Liber Usualis.”

2.2.2.3 Electronic Locator of Vertical Interval Successions (ELVIS)

The Electronic Locator of Vertical Interval Successions (ELVIS) was created to give counterpoint the attention it deserves. A presentation on ELVIS by Christopher Antila won first prize at the 2014 Montreal Digital Humanities Showcase, and the project is funded by a Digging into Data Challenge award (located at https://elvisproject.ca/). The goal of ELVIS is to study musical style in terms of changes in counterpoint (Antila and Cumming 2014). ELVIS consists of a database of downloadable scores, a web-based application, and a downloadable tool, and many people have contributed to these three components. Most of them, such as Ichiro Fujinaga and Peter Schubert, are from McGill University in Montreal; the work on the harmonic side of counterpoint is headed by Ian Quinn of Yale University; and the University of Aberdeen has also been involved in the project. The software underlying the downloadable tool, music21, was created by Myke Cuthbert at the Massachusetts Institute of Technology (music21). Music21 is a Python-based “toolkit for computer-aided musicology” (music21) that allows the user to search through any imported scores using basic programming commands.
In other words, by writing conditional commands such as “if x, then y,” the user can find a desired output. This works especially well for big-data queries in MIR (Antila and Cumming 2014). Scores from the ELVIS database can be imported and searched using music21; they can also be searched through the ELVIS website, where the web app locates patterns. The downloadable software is a VIS framework (VIS standing for Vertical Interval Successions, meaning a set of harmonic intervals in a particular order) that runs on music21 (ELVIS project). The framework uses n-grams, where n is the number of vertical intervals in a succession. This analysis uses intervals without quality, instead of note names, so that many works can be compared regardless of key (Antila and Cumming 2014). Because the software runs in Python, a standard programming language, those with a knowledge of programming get the most out of it. For those without programming knowledge there is the Counterpoint Web App (counterpoint.elvis.ca), which is specifically designed for pattern recognition (ELVIS project). This web app also uses the VIS framework, but it offers fewer query possibilities than the downloadable extension for music21. Reaching the application through the website is problematic, either because of a broken link or, perhaps, because the web application is not finished; as previously mentioned, SIMSSA is building tools, and many of them are still in progress. Music sonification is used in the ELVIS project to turn music notation data into sound that the researcher can manipulate. Accessibility was the main concern here, because not all researchers have in-depth knowledge of recording or sound-mixing software. To solve this problem, the ELVIS team created a graphical user interface: a graphic representation of the music together with the audio tools most useful for interval analysis.
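The idea of n-grams of vertical intervals without quality can be illustrated with a short, self-contained sketch. It is not the VIS framework’s API, which works on music21 scores and pandas data structures; it is a hypothetical simplification for two voices that assumes both voices move in the same rhythm and that notes are written as letter names with a single-digit octave number (for example, “C4”).

```python
from collections import Counter

LETTERS = "CDEFGAB"


def diatonic_step(note):
    """Map a name such as 'C4' or 'F#3' to a diatonic step number.

    Accidentals are ignored because they do not change a generic interval.
    Assumes a single-digit octave number at the end of the name.
    """
    letter, octave = note[0].upper(), int(note[-1])
    return octave * 7 + LETTERS.index(letter)


def generic_interval(upper, lower, simple=True):
    """Interval by letter names only (3 = any third, major or minor).

    With simple=True, compound intervals are reduced: a tenth becomes a third,
    while an octave (or double octave) is reported as 8.
    """
    distance = abs(diatonic_step(upper) - diatonic_step(lower))
    if simple and distance > 7:
        distance = (distance - 1) % 7 + 1
    return distance + 1


def vertical_intervals(upper_voice, lower_voice):
    """Generic interval at each simultaneity of two rhythmically aligned voices."""
    return [generic_interval(u, l) for u, l in zip(upper_voice, lower_voice)]


def ngrams(sequence, n):
    """Every run of n consecutive items, as tuples."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


upper = ["C5", "B4", "C5", "D5"]
lower = ["C4", "G3", "A3", "B3"]
verticals = vertical_intervals(upper, lower)
print(verticals)                                     # [8, 3, 3, 3]
print(ngrams(verticals, 2))                          # [(8, 3), (3, 3), (3, 3)]
print(Counter(ngrams(verticals, 2)).most_common(1))  # [((3, 3), 2)]
```

Counting n-grams across many pieces in this way is what makes it possible to compare contrapuntal habits regardless of key.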
ELVIS concentrates on interval analysis because it is designed for contrapuntal analysis and pattern recognition (ELVIS project). ELVIS is both a locating and an analytical tool. The locating part comes from the web app, because the app only locates patterns. The analytical side, however, is much more in-depth and is available for a wide variety of early music through the VIS framework and the programming language. Though the project was intended for counterpoint alone, the combination of the VIS framework, music21, and the pandas library (where the score data are kept) makes the possibilities nearly endless (ELVIS project).

2.2.3 Donnelly and Sheppard Bayesian Network Algorithm

Donnelly and Sheppard, researchers from the University of Notre Dame and Montana State University respectively, found that timbre had not been fully explored in MIR, so they modified an existing algorithm derived from Bayesian probability networks. The new system identifies different timbres in music, which can provide another way of organizing and searching through music in a large corpus. In their article, “Classification of Musical Timbre Using Bayesian Networks,” Donnelly and Sheppard compare this new model with nearest-neighbour and support vector machine models of timbral identification. In this comparison, the Bayesian algorithm better differentiates strings but still has drawbacks: the other models better differentiate between aerophones, like woodwinds and brass. Used together, however, the models appear able to differentiate all of the instruments. The Bayesian model therefore remains useful as a method for categorizing string instruments and, in conjunction with the other tools, for categorizing all instruments. The target audience for this method is researchers and others who want to organize a database by the instruments within a musical track. This could extend the locating side of audio MIR as an alternative to metadata, though it would be suited to smaller tasks that examine instruments.
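A Bayesian classifier of this kind chooses the instrument that best explains the observed features, using Bayes’ rule. The sketch below uses Gaussian naive Bayes, the simplest Bayesian network, in which every feature depends only on the instrument class. Donnelly and Sheppard’s networks also model dependencies between features, so this is a deliberately simplified stand-in; the two features and all training values are invented for illustration.

```python
import math
from collections import defaultdict


def fit(samples):
    """samples: list of (feature_tuple, label). Returns per-class log prior, means, variances."""
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append(features)
    model = {}
    for label, rows in by_label.items():
        dims = range(len(rows[0]))
        means = [sum(r[i] for r in rows) / len(rows) for i in dims]
        # A small constant keeps a variance from being zero.
        variances = [sum((r[i] - means[i]) ** 2 for r in rows) / len(rows) + 1e-6
                     for i in dims]
        model[label] = (math.log(len(rows) / len(samples)), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, features):
    """Label with the highest log prior plus summed per-feature log likelihoods."""
    def log_posterior(label):  # up to a constant shared by all labels
        log_prior, means, variances = model[label]
        return log_prior + sum(log_gaussian(x, m, v)
                               for x, m, v in zip(features, means, variances))
    return max(model, key=log_posterior)


# Invented training data: (spectral centroid in kHz, attack time in seconds).
training = [
    ((1.0, 0.10), "violin"), ((1.2, 0.12), "violin"), ((1.1, 0.08), "violin"),
    ((2.0, 0.03), "trumpet"), ((2.2, 0.04), "trumpet"), ((2.1, 0.05), "trumpet"),
]
model = fit(training)
print(predict(model, (1.05, 0.09)))  # violin
print(predict(model, (2.15, 0.04)))  # trumpet
```

Adding arcs between features, as a full Bayesian network does, lets the model capture correlations that this naive version ignores.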
I include Donnelly and Sheppard’s algorithm as a smaller technology that has capabilities for MIR and that shows possible connections between MIR and Optimization, the subject of the following chapter.

2.3 Critical Analysis

This chapter has so far explained what each of the tools does. This section examines each tool critically: I discuss the assumptions made and further extensions of each tool that were not examined in the articles themselves. I go through the tools in the order in which they were presented: first VocalSearch, then the SIMSSA projects (Search the Liber Usualis, Cantus Ultimus, and ELVIS), and finally Donnelly and Sheppard’s Bayesian Networks.

2.3.1 VocalSearch

VocalSearch takes audio input, which is difficult to handle because the audio must be taken apart to match a specific line in a song. However, the articles do not mention whether a melody sung in a different key from the original will match the corresponding song. Though melodies are often remembered in their original key, the user may not have the vocal range to sing them there. The article also does not mention matching a song in the database to a slightly inaccurate input, so the tool likely would not work in such a case. VocalSearch achieves its goal of reaching a large audience through the internet and offering multiple ways of searching. Setting up such a database requires a large body of songs, and keeping it current requires adding new songs regularly. To do this, the makers of VocalSearch included a function that allows users to add content to the database. User-added content raises a few issues. As previously stated in the introductory chapter, the errors made by a computer program are due to human error. This human error can be in the programming itself, but more often it is in the input to the program. Because VocalSearch offers multiple search methods, the person who inputs a song must enter all of the information correctly.
If incorrect information is added, the tool will not work correctly, which renders it useless.

2.3.2 SIMSSA

SIMSSA has multiple projects, so I critically analyze each of them in turn. Overall, SIMSSA uses score images and creates databases from them using OMR, OTR, and other technologies.

2.3.2.1 Search the Liber Usualis

Optical text recognition (OTR) and optical music recognition (OMR) make the Liber Usualis searchable. This means that, when a user types in the search bar, matching text or music is highlighted, and, using the colour coding available in the web-based tool, multiple searches can be highlighted at once. This is useful for researchers who need specific information from a text of over 2000 pages. More information on the tool can be found in section 2.2.2.1. OMR and OTR are used when files come from images and are therefore not searchable. These technologies make a document searchable by translating the image data into a format recognized by the computer. OTR translates the image of a letter into the letter itself, while OMR must attach both the pitch name and the function of the note, which increases the margin of error. One issue I found when using the tool is that the coloured highlighting box around the searched content is not completely accurate: for some searches, the box surrounds a set of words that do not contain the searched item. Also, the tool assumes that the user wants the entire sentence highlighted when searching for a specific word or group of words. This calls into question how OTR works, because if it makes a text searchable, it should highlight only what is searched.

2.3.2.2 Cantus Ultimus

Cantus Ultimus uses digitized scores and OMR to create an interactive and searchable score database. This not only gives researchers access to the database but also lets them search the scores in multiple ways.
Furthermore, the database gives the researcher access to the manuscript image online, with the typeset version in the right-hand menu. For example, if the neumes on the score image are written very small, the right-hand menu gives the modern notation of the score. Currently there are only a few scores or manuscripts, so the obvious improvement is to add more. The process of adding a score, however, is very long even with OTR and OMR, because every score should be checked. Because the manuscripts have aged, may be faded, or are otherwise difficult for a computer to read, checking is imperative for a proper database entry. Machine Learning and Optimization models, discussed in later chapters, could help here.

2.3.2.3 Electronic Locator of Vertical Interval Successions (ELVIS)

ELVIS gives counterpoint priority in research by combining a database with a web app and music21. The database gives the user access to a set of scores, while the web app and music21 let the researcher search through them. The web app is designed for non-programmers to find recurring patterns, while music21 has more features and lets the entire score be searched using a programming language. The tool attempts to cater to both programmers and non-programmers by offering music21, which is based on Python (a common programming language), alongside the web app. However, the web app only allows the user to find recurring features, so a non-programmer has limited use of this tool. It is assumed that a non-programmer will only want to find recurring features, whereas they could also be looking for specific vertical interval successions or a specific set of notes.

2.3.3 Bayesian Networks

As previously stated, this model gives timbre attention by making it usable as a search criterion. The model is, however, limited in its ability to distinguish between aerophones, although it better differentiates between strings.
To address this problem, the tool must be combined with others to achieve greater accuracy. The goal of the tool is to differentiate between instruments and, eventually, to render a database searchable by instrument. Another approach is to look at the metadata, which often contains instrument data. Using an OTR-like algorithm, the metadata can be searched for contributing artists and musicians. This would render a set of works searchable by contributor, and because contributor credits often name the instrument each contributor plays, the set of works would also become searchable by instrument. The timbre-based approach remains useful specifically for works where the contributors’ instruments are unknown and those unknown instruments are stringed. Combining this method with other similar methods will increase its usefulness, because all instruments can then be identified.

Chapter 3: Optimization

I use the term “optimization” to refer to increasing the output of music analysis for less time and energy: the optimization of effort so as to achieve a result. More specifically, this chapter looks at Preference Rule Systems (PRSs) and at Probabilistic and Statistical Models. The goal in optimization is to understand and reproduce human perception of an input. My aim is to show that, by integrating more mathematics and computer tools, analysis can be optimized. The term was inspired by its customary use in Calculus and Business, where the optimization of space and resources is described in terms of optimization problems. In music, the term pertains to David Temperley’s progression in analytical approaches. Temperley’s The Cognition of Basic Musical Structures (2001) took a preference rule approach to musical elements: for each element, he outlined a set of preference rules that a computer tool could use to analyze a piece of music.
Following this, Temperley took a few of the elements examined in the 2001 book and applied a probabilistic approach to them using Bayesian Probability, a term referring to the extensions that follow from accepting Bayes’ Rule,³ to match the approach of similar perceptual fields. The 2007 book, Music and Probability, aims to build upon the earlier set of preference rules and move the research further. This is the method of Optimization addressed here: this part of the thesis explains an earlier way of approaching a problem and how a new method has helped to optimize the older one. Following both of Temperley’s approaches, there is a section on organization by Preference Rules and a section examining Probability and Statistical models. In the Preference Rule section, Temperley’s approach is discussed first, followed by other preference rule methods and computer tools as they relate to Temperley’s Cognition of Basic Musical Structures (2001). The second section shows various approaches to music analysis that involve different aspects of Probability and Statistics. Some of these approaches, like Temperley’s, use Bayesian Probability; others concentrate on statistical analysis. Though the two sections are separate in this thesis, they are related, since the hierarchy built in a PRS carries through into Probability. I have separated them to better explain how newer models have built upon Temperley’s work. This is represented graphically in Figure 3, where the dashed line represents the implicit link between the two main sections, even though they are distinct in their principal focus (i.e., a PRS or an application of probability).

³ Bayes’ rule is expressed as follows:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
where P denotes probability and A and B are events, with P(B) > 0. Upon acceptance of this theorem, a branch of probability called Bayesian Probability is built.
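Temperley’s probabilistic models apply Bayes’ rule to musical structures, choosing the structure (for example, a key) that is most probable given the notes heard. The toy sketch below shows the arithmetic for two candidate keys. The pitch-class probabilities are invented for illustration and are not Temperley’s key profiles, and treating the notes as independent given the key is a simplifying assumption.

```python
# Hypothetical pitch-class probabilities for two keys (each profile sums to 1).
PROFILES = {
    "C major": {"C": 0.20, "D": 0.10, "E": 0.15, "F": 0.10,
                "G": 0.20, "A": 0.10, "B": 0.10, "F#": 0.05},
    "G major": {"G": 0.20, "A": 0.10, "B": 0.15, "C": 0.10,
                "D": 0.20, "E": 0.10, "F#": 0.10, "F": 0.05},
}


def key_posteriors(notes, profiles, priors):
    """P(key | notes) via Bayes' rule, treating notes as independent given the key."""
    joint = {}
    for key, profile in profiles.items():
        likelihood = 1.0
        for note in notes:
            likelihood *= profile.get(note, 0.0)  # builds P(notes | key)
        joint[key] = likelihood * priors[key]      # P(notes | key) * P(key)
    evidence = sum(joint.values())                 # P(notes)
    return {key: value / evidence for key, value in joint.items()}


posteriors = key_posteriors(["G", "B", "D", "F#"], PROFILES,
                            {"C major": 0.5, "G major": 0.5})
print(posteriors)  # G major receives 6/7 (about 0.857), C major 1/7
```

With equal priors, B, D, and especially F# are more probable under the G major profile, so the evidence shifts the posterior toward G major.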
In Figure 3, the items under each of the main headings are the topics covered in this chapter. Bayesian Probability can encompass all of the subheadings under Preference Rules, but Harmonic Vectors and the application of Bioinformatics later in the chapter relate more specifically to other subheadings; this is also represented through dashed lines.

Figure 3: A graphic representation of the aspects of the field I concentrate on, showing that Preference Rules and Probability and Statistics are not completely separate from each other. Under Optimization, the diagram branches into Preference Rules (Metrical Structure, Contrapuntal Structure, Harmonic Structure, Melodic Phrase Structure, Parallelism) and Probability and Statistics (Bayesian Probability, Harmonic Vectors, Bioinformatics).

3.1 Preference Rules

This section on Preference Rules starts by outlining David Temperley’s Preference Rule Systems (PRSs) from The Cognition of Basic Musical Structures (2001), concentrating on the first part of the book. Temperley uses a piano-roll input for the computer and, depending on the subsection in question, performs specific tests to examine the usefulness of the approach. The subsections of this part of the book (Metrical Structure, Melodic Phrase Structure, Contrapuntal Structure, Tonal-Pitch-Class Representation, Harmonic Structure, and Key Structure) serve as the subsections of this section. Parallelism is the final subsection; it was added because of Temperley and Bartlette’s 2002 article, “Parallelism as a Factor in Metrical Analysis,” which further explains the importance of parallelism and gives it a broader definition that is important for further research. For each subsection, Temperley’s findings from 2001 are presented first, followed by the research that has built upon them. In this part of the thesis, I take Temperley’s model and examine how the following 16 years of research have built upon it.
I present a set of comparable models and briefly explain which element of Temperley’s work each model builds upon. Following this, I critically examine the newer models and tools through comparison. I begin, however, with Temperley’s 2001 book The Cognition of Basic Musical Structures.

3.1.1 Metrical Structure

As David Temperley explains in The Cognition of Basic Musical Structures (2001), the computer must concentrate on beat induction when examining metrical structure. Beat induction is the process by which the computer infers, or ‘taps’, the beat. In some senses, the term refers to ‘foot tapping’, but in Temperley’s PRS it means inferring the meter. The meter is shown in a Lerdahl and Jackendoff metrical grid, with different hierarchical levels of beats, as shown in Figure 4. This is a metrical grid for 2/4 time in which the lowest row of dots indicates the eighth-note level (the division of the beat), the middle row indicates the main beats (1, 2, etc.), and the highest row indicates the strong beat (the downbeat).

Figure 4: A beat hierarchy as described by Lerdahl and Jackendoff

To find metrical structure, Temperley outlines the following rules:
1. Event rule: prefer strong beats that coincide with event onsets
2. Length rule: prefer long events on strong beats
3. Regularity rule: prefer evenly spaced beats at each level
4. Grouping rule: “Prefer a strong beat at beginning of groups” (Temperley 2001, 38)
5. Duple bias rule: prefer duple over triple relationships between levels (for example, 3/4 instead of 6/8)
6. Harmony rule: prefer strong beats that align with changes of harmony
7. Stress rule: prefer strong beats on loud events
8. Linguistic stress rule: prefer stressed syllables on strong beats
9. Parallelism rule: prefer the same metrical structure for parallel segments

These rules amount to a set of preferences that a computer system works through to find the “best fit” for metrical structure.
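The “best fit” idea can be sketched as a search over candidate beat levels, each scored by how well it satisfies some of the rules. The sketch below is a hypothetical simplification, not Temperley’s own program: it models only the Event and Length rules, uses invented weights, places the rhythm on an eighth-note grid, and compares candidates by their average score per beat so that denser beat levels are not automatically favoured. Candidates are evenly spaced by construction, which loosely stands in for the Regularity rule.

```python
def beat_score(onsets, period, phase, span, event_weight=1.0, length_weight=0.5):
    """Average rule score per beat for beats at phase, phase + period, ... below span.

    onsets: dict mapping grid position -> duration (in grid units) of the note starting there.
    Event rule: reward a beat that has a note onset.
    Length rule: reward it more when that note is long.
    """
    beats = range(phase, span, period)
    total = 0.0
    for beat in beats:
        if beat in onsets:
            total += event_weight + length_weight * onsets[beat]
    return total / len(beats)


def best_fit(onsets, span, periods=(2, 3)):
    """Return the (period, phase) candidate with the highest average score."""
    candidates = [(p, ph) for p in periods for ph in range(p)]
    return max(candidates, key=lambda c: beat_score(onsets, c[0], c[1], span))


# Quarter, eighth, eighth, quarter, eighth, eighth on an eighth-note grid (two 2/4 bars).
rhythm = {0: 2, 2: 1, 3: 1, 4: 2, 6: 1, 7: 1}
print(best_fit(rhythm, span=8))  # (2, 0): a beat every quarter note, from the first note
```

Because every candidate is scored and the highest wins, no single rule has to be satisfied everywhere, which is the sense in which the rules are preferences rather than constraints.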
In practice, a PRS-based program attempts to fit different meters onto a piece of music and chooses the version in which the most preferences are satisfied. Because these are preference rules, not all of them have to hold when the computer chooses a meter, so the “best fit” is the meter that satisfies the most preferences. Tempo perception is both a bottom-up and a top-down process, depending on how long someone has been listening to a piece of music in the same tempo. It is bottom-up because we need a few notes to perceive a tempo, but after these few notes it becomes top-down because we apply the perceived tempo to the music, as evident in foot-tapping, head-bobbing, and so on. However, if the tempo changes suddenly for expressive purposes, a listener can catch the change quickly. According to Desain and Honing, “beat induction is a fast process [since] only after a few notes a strong sense of beat can be induced” (Desain and Honing 1999, 29); reproducing this quick human process in a computer is, therefore, a large undertaking. Temperley writes that the “most important work in the area of quantization” (Temperley 2001, 27) is a 1992 Desain and Honing study entitled “Time Functions Function Best as Functions of Multiple Times.” I mention this article because of its comparative approach and its use of much the same rule-based models as Temperley. The 1992 article takes a connectionist approach that uses stationary units and interactive units that change based on the surrounding material. The approach does not keep the lengths of notes the same but keeps the onsets the same, which is problematic for Temperley. Even though this model offers multiple beats per time interval, it cannot handle expressive timing (Temperley 2001). The 1999 Desain and Honing study, “Computational Models of Beat Induction: The Rule Based Approach,” used a rule-based model for beat induction of musical input and aimed to explore the perception of tempo in people and in computers.
The goal of this article is to look at rule-based models and to provide an understanding of how these models create an initial beat structure. Desain and Honing examined the contribution and robustness of rules in different rule-based models. The important point taken from this article is that models, regardless of when they were created, can work more optimally with rules taken from other models. This points towards the mixing of rules and ideas, which is in fact what Temperley did to create his PRS. Smith and Honing (2008) explain how the problem of expressive timing could be overcome. Their study used rhythmically isolated segments (meaning that rhythm was the only input) to incorporate expressive timing. This accounts for the fact that a person can easily change their original beat structure to accommodate expression. A technique based on Morlet wavelets was used because of its similarity to human hearing and prediction.⁴ This remains consistent with the overall goal of Optimization, which is to explain perception and human signal processing with ever greater efficiency. These wavelets, however, are best used for short bursts of input, similar to expressive timing at the ends of phrases. The article first looks at the analytical techniques and at the application of Morlet wavelets to create a continuous wavelet representation, one that can accommodate expressive timing. In this context, a wavelet represents repetitive rhythmic structure, such as a repeated rhythm or time signature. The model then puts the rhythmic findings into a hierarchy. Following this, it finds the “foot tapping rate” (Smith and Honing 2008, 83), which is the basic tempo, and, finally, the model is completed by showing the incorporation of expressive timing (Step 1 with Step 3). Overall, this model provides an accurate analysis of foot-tapping; it is discussed further in Section 3.2. Hardesty (2016) takes a different direction, building upon Temperley as well as Huron and Lerdahl and Jackendoff’s A Generative Theory of Tonal Music (1983). His approach aims to identify rhythmic features and to examine music prediction from the point of view of rhythm and parallelism; it is discussed further in the parallelism section.

⁴ Definition taken from an online dictionary on time-frequency analysis: https://cnx.org/contents/SkfT37_l@2/Time-Frequency-Dictionaries

3.1.2 Contrapuntal Structure

As mentioned in Chapter 2 with the ELVIS project, counterpoint often does not get the attention it deserves. Temperley examines counterpoint with the goal of understanding the perception behind it. It is worth mentioning that the PRS for contrapuntal structure is geared towards a piano-roll representation of a piece. Temperley uses the concept of “streams,” which are groups of notes in the same voice with minimal white squares; white squares refer to moments of silence. Temperley’s preference rules are as follows:
1. Pitch Proximity Rule: prefer to avoid large leaps within a stream
2. New Stream Rule: prefer the smallest number of streams
3. White Square Rule: prefer the smallest number of white squares in a stream
4. Collision Rule: prefer cases where a square is in only one stream
5. Top Voice Rule: prefer a single voice as the top voice, so that there is minimal voice exchange

I would like to clarify that a stream does not refer to a phrase because, in contrapuntal structures, a stream can have multiple phrases.
For example, one voice in a four-part fugue begins with the subject, which can span multiple phrases, and then plays contrapuntal variations, again with multiple phrases; throughout, this voice acts as one stream.

A 2015 article by Komosinski examined the analysis of counterpoint for compositional research using a method called “dominance relation.” Like a PRS, this method uses multiple criteria to perform analysis. It specifically looks at first-species counterpoint and can produce a composition as output. Because this is a composition tool, I concentrate on the evaluative module of the method. The model always generates first-species counterpoint, and each generated counterpoint is evaluated by the following criteria:
1. Direct motion
2. A repeated note
3. A vertical imperfect consonance
4. A skip
5. A vertical perfect consonance reached by direct motion
6. Skips of a tritone or larger than a perfect fifth, except a minor sixth

Each criterion is counted over the generated piece. Using rules based on the counterpoint method of Fux (Fux 1965), the dominance relation labels each counterpoint either “dominated” or “non-dominated.” A counterpoint is dominated when another counterpoint is at least as good on every criterion and strictly better on at least one; the evaluation repeats until a non-dominated counterpoint is found. This article builds upon Temperley’s rules only in a general sense. Temperley’s rules are used to narrow down choices and find the best fit, while this method tests all rules on each counterpoint and eventually finds the counterpoint that best exemplifies the rules. Giraud et al (2015) build upon research on fugues. Their input has the voices of the fugue already separated, much like Temperley’s streams, and the method uses “generic MIR techniques” (Giraud et al 2015, 79). I have placed this work in the Optimization section for two reasons. First, it is an example of work lying between optimization and MIR.
Second, it acts more as an Optimization tool than an MIR tool because of its small scale: the goal is not to create a database but to provide an evaluative model for fugues. The tool needs input that is already separated for computer use, so it uses files from the Humdrum toolkit, which have already been separated into voices. The method concentrates on using tools to examine pattern repetitions and gives a complete analysis by identifying the subject, the countersubject(s), the key of individual occurrences, harmonic sequences, cadences, pedals, and the overall structure. Giraud et al tested the method on 36 Bach and Shostakovich fugues. They found that, for some pieces, the analysis was complete and correct, although the method still produces false positives. Other results were completely unusable, but these were mostly double and triple fugues. More specifically, when the subject was correctly identified, the overall analysis was more accurate. Like any computer method, this one can be improved, and Giraud et al make suggestions on how: to make it optimal, they suggest combining the current method with probabilistic models, which are discussed in the following section.

3.1.3 Tonal-Pitch Class Representation and Harmonic Structure

Tonal-Pitch Class Representation is important to the PRS of Harmonic Structure. The term Tonal-Pitch Class is taken from Temperley, and I understand it to mean the set of pitch classes creating a tonal structure (i.e., a key area). Tonal-Pitch Class representation is the sorting of the pitches in a piece into a specific key. The preference rules outlined by Temperley are as follows:
1. Pitch Variance Rule: prefer to label nearby events so that they fall within the same key
2. Voice-leading Rule: prefer to spell events a half step apart with different letter names
3. Harmonic Feedback Rule: prefer a Tonal Pitch Class labelling for which the harmonic structure is good (meaning that there is a logical progression)

These rules help to decide on a specific key and to minimize notes outside the chosen key: all keys are tested for a given passage and the best fit is chosen. The PRS for Harmonic Structure builds upon this labelling by adding roots and chords to the piece. Its rules create a hierarchy of possibilities for the individual chords, and because the last rule for Tonal-Pitch representation considers harmonic progression, the resulting progression is relatively accurate. This does not eliminate the analyst, however, because the output is not 100% accurate. The PRS for Harmonic Structure is as follows:
1. Compatibility Rule: prefer roots for which the notes relate to the root in the following order of preference: 1, 5, 3, flat 3, flat 7, flat 5, flat 9, ornamental (all others)
2. Strong Beat Rule: prefer chords that begin on strong beats
3. Harmonic Variance Rule: prefer the next root to be close on the circle of fifths
4. Ornamental Dissonance Rule: prefer ornamental dissonances (notes that do not have a chord-tone relationship to the chosen root) where the next or previous note is a tone or semitone away and/or that fall on weak beats

The PRS for Harmonic Structure still considers chords that are not part of the original key, so modal mixture and other temporary key changes are possible. The method also considers proximity, so modulation can be addressed.

To add to this, De Haas et al (2013) created HarmTrace, which stands for Harmonic Analysis and Retrieval of Music with Type-level Representation of Abstract Chord Entities. This tool is useful for separating tonal works using harmonic similarity estimation, chord recognition, and automatic harmonization. In other words, it can recognize chords and show that different parts of a piece are similar because of their harmonic structure or progression.
The tool does so by taking all the chord possibilities for a specific beat into consideration and extracting the most likely one. (The tool can also harmonize a progression, which is useful for the performer but not within the scope of this thesis.) This article was included because it furthers Temperley’s PRS: it can provide automatic harmonization and similarity estimation. It does not need the previous Tonal-Pitch Class representation PRS to determine the specific chords; instead, it places the possibilities into a hierarchical structure. The authors claim that this model can be used for MIR because it moves beyond theoretical uses and is practical as an internet-based method (De Haas et al 2013).
3.1.4 Melodic Phrase Structure
Melodic Phrase Structure is involved in multiple levels of a piece because melody itself often adheres to specific rules and works with other musical structures such as meter and harmony (Temperley 2001). Thus, Temperley’s PRS must take all of these into account to be accurate. The rules are as follows:
1. Gap Rule: prefer boundaries at large gaps in time, either between note onsets or at a rest before the next note
2. Phrase Length Rule: prefer phrases of roughly 8 notes
3. Metrical Parallelism Rule: prefer phrases that start at the same point in the metrical structure
The first rule refers to the time that can fall between phrases or within a phrase. The Gap Rule places phrase boundaries at a rest or after a longer note value, because both are possible boundary points. An extension of this model will be discussed in 3.1.5 Parallelism.
3.1.5 Parallelism
In The Cognition of Basic Musical Structures (2001), parallelism was mentioned and treated, and it was revisited in Temperley and Bartlette’s 2002 article.
Parallelism was redefined as follows:
a) Parallelism: repetition, whether exact, sequential, or of contour
b) Parallelism rule: “prefer beat intervals of a certain distance to the extent that repetition occurs at that distance in the vicinity” (Temperley and Bartlette 2002, 134)
This twofold definition kept the existing definition but, in essence, added contour and sequence.
Emilios Cambouropoulos, of Aristotle University of Thessaloniki, explored parallelism and melodic segmentation using a computer in 2006. Cambouropoulos wanted to incorporate parallelism into his method because it is often overlooked by analysts and has an impact on parsing data. Cambouropoulos used the pattern boundary strength profile (PAT) and the Local Boundary Detection Model (LBDM) to find phrase boundaries that take parallelism into account. PAT was originally able to extract only patterns that are exactly the same, but Cambouropoulos modified it to extract patterns that are similar. The goal of this modification is to provide a more general application of parallelism, which is exactly what Temperley sought with the modification of his prior definition. Cambouropoulos was able to create a basic method for melodic segmentation that incorporates parallelism, but it is not perfect, as it does not provide the final segmentation of the piece.
As previously mentioned, Hardesty published an article on rhythmic prediction and generation in 2016. This method was based on finding parallelism, on Lerdahl and Jackendoff’s A Generative Theory of Tonal Music (1983), and on the psychological understanding of music. The psychological aspect of rhythm is based on “rhythmic anticipation and parallelism” (Hardesty 2016, 39). This method was applied only to binary rhythms, in which strong and weak beats alternate, so the assumption is that an attack on a weak beat is followed by an attack on the strong beat. The method works by deriving a given rhythm to find the underlying operations that generate it.
The goal is to “[define] a collection of rhythmic building blocks” (Hardesty 2016, abstract) while taking the psychological aspects of rhythm, meter, and parallelism into account. The result is a hierarchy of rhythms based on duration. Interestingly, the final outcome can be the same for different inputs, as long as they are derived from the same rhythm.
3.2 Probabilistic and Statistical models
Though this section is separate from Preference Rules, Probability and Statistics encompass the same hierarchical structure as a Preference Rule System. In Computer Music Analysis, different methods are often layered to create an optimal outcome. The incorporation of Probability and Statistics stems from Temperley’s move away from PRSs toward a model more similar to those of other fields studying perception.
3.2.1 Introduction
In 2010, the Journal of Mathematics and Music published a special issue examining the first movement of Brahms’ String Quartet in C Minor, Op. 51, no. 1, to show different perspectives on Computer Music Analysis (referred to in the issue as “computer-aided analysis”). The issue brought to light three major developments that I explore further: Music Information Retrieval, Optimization, and Machine Learning. This section, however, will concentrate on Optimization in terms of probability and statistics, touching on work by David Temperley, Philippe Cathé, and Darrell Conklin. I will also introduce a method of using probability to assist in MIR, the subject of the previous chapter. Temperley sought to improve upon Preference Rules with Bayesian Probability because it can do the job of preference rules. Preference rules are not used in other perception-related fields, such as linguistics, so Temperley took the methods of those fields and adapted them for music. Temperley moved from a preference rule approach to a more generative approach using Bayesian Probability, which rests on accepting Bayes’ Rule as correct.
Under Bayes’ Rule, the probability of one event is revised in light of the occurrence of another event. Combining his previous work with that of Music and Probability (2007), Temperley created Melisma Version 2.0, available online for analysis.
Philippe Cathé, of L’Université Paris-Sorbonne, looks primarily at Harmonic Vectors and uses a computer to perform the statistical analysis. The computer, however, does not perform the analysis itself; instead, it treats each piece as a data file. Cathé attempts to keep the music in mind by explaining, after the statistical analysis, the interaction between the music and the vectors. With Harmonic Vectors, the changes can be heard in recordings, which makes the statistical analysis seem more factual.
Darrell Conklin also employed probability, as well as bioinformatics, for efficient pattern recognition. Finding patterns is an integral part of analysis but becomes subjective when choosing patterns for study. The goal of Conklin’s work is to create an algorithm that finds distinctive patterns: patterns that are frequent within the piece or corpus under study and infrequent in a selected set of pieces, the anticorpus. This gives the analyst a set of patterns that may be important.
3.2.2 David Temperley’s use of Bayesian Probability
In The Cognition of Basic Musical Structures (2001), David Temperley created a set of Preference Rules inspired by A Generative Theory of Tonal Music (1983) by Lerdahl and Jackendoff. Similarly, Music and Probability (2007) takes a generative approach and combines it with Bayesian Probability. The reason for using probability was to use tools similar to those used in the study of language and vision, because preference rules were not being used in these related domains. Bayesian probability is a branch of probability in which the probability of an occurrence is updated according to the occurrence of another event. This subsection will concentrate on select chapters from Music and Probability (2007).
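To make Bayes’ Rule concrete in this setting, the following toy Python sketch picks the most probable underlying structure for an observed surface. The candidate structures (“duple” and “triple” meter) and all the probabilities are hypothetical values invented for illustration; they are not taken from Temperley (2007), and the function names are likewise illustrative.

```python
def posterior(priors, likelihoods):
    """Return P(structure | surface) for every candidate structure.

    priors:      dict mapping structure -> P(structure)
    likelihoods: dict mapping structure -> P(surface | structure)
    """
    # Numerator of Bayes' Rule for each candidate: P(surface | s) * P(s)
    joint = {s: likelihoods[s] * priors[s] for s in priors}
    # P(surface) is the sum of the joint probabilities over all candidates
    evidence = sum(joint.values())
    return {s: p / evidence for s, p in joint.items()}


def most_probable(priors, likelihoods):
    """Return the candidate structure with the highest posterior probability."""
    post = posterior(priors, likelihoods)
    return max(post, key=post.get)


# Hypothetical numbers: duple meter is more common a priori,
# but the observed onsets fit triple meter much better.
priors = {"duple": 0.7, "triple": 0.3}
likelihoods = {"duple": 0.02, "triple": 0.08}
print(posterior(priors, likelihoods))
print(most_probable(priors, likelihoods))  # prints: triple
```

Because P(surface) is the same for every candidate, ranking candidates by P(surface | structure) × P(structure) alone selects the same winner. This product is the one used in the metrical-grid procedure described below.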
The approach to analysis here is first to perform a probabilistic analysis of the Essen Folksong Collection⁵ to find the probabilities of various musical building blocks, such as meter, key (in both monophonic and polyphonic music), and melodic ideas. This analysis sets the parameters for the computer program, so that the remaining pieces analysed will have a higher accuracy. Once the parameters are set using the Essen Folksong Collection, the analysis is completed through a generative process. A generative process works by finding a structure based on the surface content of a work and then generating a surface under multiple choices (keys, meters, etc.). After generating a surface, the program decides which choice has the highest probability based on the underlying structure. This simplified method will now be explained for meter, key (both monophonic and polyphonic), and melodic ideas.
5 The Essen Folksong Collection is a collection of 10,000 folksongs collected by Helmut Schaffrath. These are located at http://essen.themefinder.org/
Meter had been well studied before Temperley’s work in Music and Probability (2007), so this model aims to build upon previous models with a generative approach. A ‘metrical grid’ is generated from the piece based on the parameters set from the remainder of the Essen Folksong Collection, but there are many possible metrical grids for any given piece. As noted above, metrical grid refers to the graphic representation of beats, strong beats, and main beat divisions on three levels, as shown in figure 3 (Section 3.1.1). The following steps are used in creating the optimal grid:
1. Decide the time signature: choose between duple and triple meter, and among the individual time signatures within each category
2. Generate the tactus: this is the middle, or second, level of beats and is based on the notes that are present (simultaneously with 3)
3. Add upper-level beats: these indicate the actual beat division and form the highest level of the metrical grid (simultaneously with 2)
4. Add lower-level beats: these indicate the subdivision required for the excerpt and form the lowest set of points on the metrical grid
5. Generate note onsets: solid vertical lines that indicate where the actual notes line up on the metrical grid (not in figure)
After generating many metrical grids, the tool computes the probability of the note onsets under the assumption that a given grid is correct. It then multiplies this value by the probability of the grid itself. The result is a probability value less than one, and the highest-scoring grid is selected. Upon testing this model on multiple pieces, Temperley compared it to his earlier software, which used preference rules to find the best fit. The tests showed that the PRS was more accurate than the Bayesian model. Temperley hypothesized that the first model was more accurate because the perception of rhythm is also based on harmony, note lengths, and parallelism. Longer notes most often occur on strong beats, such as the beginning of the measure, and the Bayesian model at the time could not take that into consideration.
In creating a computer model that perceives key, the musical facets that the mind isolates must be taken into consideration. A key, at least in monophonic pieces, is composed of both pitch proximity and range, and Temperley poses the question “What kind of pitch sequence makes a likely melody?” (Temperley 2007). This, once again, is a generative process in which all keys are tested, but there is no obvious starting point when examining key, so Temperley relies upon previous research on key profiles. The key profiles are heavily based on Krumhansl and Kessler’s 1982 experiment.
The experiment asked participants to rate the degree to which audible pitches belonged to an established key, and key profiles were derived from these ratings. This experiment was successful for major keys, but minor keys were problematic because there are multiple forms of the minor scale. Temperley made the needed changes to the established key profiles to incorporate minor keys and began constructing a model using Bayesian Probability.
To construct a generative process for key finding, Temperley used the key profiles as a starting point. He performed an overall analysis of the Essen Folksong Collection to find a normal distribution, or bell curve, of the pitches. Following this, a pitch is chosen at random from the peak area of the bell curve, a range profile is constructed around it, and this is combined with a proximity profile. All keys are tested in this way, and the key with the highest probability is chosen as the key of the melody.
This same method of key finding is problematic for polyphony. The approach takes the structure from the surface material, but the surface of a polyphonic piece is dense and contains notes acting as passing or neighbouring tones. When examining a piece, many notes are not the tonic of the scale, which would skew most computer programs. Temperley aimed to overcome this obstacle by segmenting the piece, on the assumption that pieces stay in the same key for some time. This assumption is based on the perception-based concept of ‘inertia,’ the tendency of an item to continue without change (Larson 2004). In this case, it means that the key stays the same for as long as inertia holds. This also helps with the second problem, modulation. Modulation occurs when a new key is introduced for an indeterminate amount of time. This is difficult for a computer because two or more notes act as the tonic at different times in the piece.
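Temperley’s idea of finding a key for each segment, so that a modulation surfaces as a change of key, can be illustrated with a minimal Python sketch. It is not Temperley’s model and rests on several labelled assumptions. It uses the published Krumhansl and Kessler ratings, normalized into probabilities, as a stand-in for Temperley’s own profiles. It treats each note as an independent draw from the key’s profile and assumes all 24 keys are equally likely a priori. It ignores the range and proximity profiles. Finally, it cuts the melody into fixed-length segments (the segment length is a hypothetical parameter) rather than modelling key changes probabilistically.

```python
import math

# Krumhansl-Kessler (1982) probe-tone ratings, indexed by semitones above the
# tonic. Normalizing them into probabilities is an assumption of this sketch.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]


def normalize(profile):
    total = sum(profile)
    return [value / total for value in profile]


PROFILES = {"major": normalize(MAJOR), "minor": normalize(MINOR)}


def log_probability(pitch_classes, tonic, mode):
    """log P(pitch classes | key), treating notes as independent draws."""
    profile = PROFILES[mode]
    return sum(math.log(profile[(pc - tonic) % 12]) for pc in pitch_classes)


def best_key(pitch_classes):
    """Most probable key for one segment, with a uniform prior over 24 keys."""
    candidates = [(tonic, mode) for mode in PROFILES for tonic in range(12)]
    tonic, mode = max(candidates, key=lambda k: log_probability(pitch_classes, *k))
    return f"{NAMES[tonic]} {mode}"


def keys_by_segment(pitch_classes, segment_length):
    """Label each fixed-length segment with its own most probable key."""
    return [
        best_key(pitch_classes[i:i + segment_length])
        for i in range(0, len(pitch_classes), segment_length)
    ]


# Hypothetical melody: a C major arpeggio followed by a G major arpeggio
print(keys_by_segment([0, 4, 7, 0, 4, 7, 0, 7, 11, 2, 7, 11, 2, 7], 7))
# prints: ['C major', 'G major']
```

Because each segment is scored on its own, the move from C major to G major appears as a change of label between segments. A single key chosen for the whole melody would hide it.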
In the case of polyphony, the problem of modulation is overcome by segmenting the piece into smaller sections, which is already necessary for examining polyphonic works. Each segment shows a higher probability for one key, and a section that modulates shows a higher probability for another key. Segmentation thus assists both in identifying modulation and in finding keys in polyphonic works.
Melodic ideas in this case often involve expectation or error detection, where the model attempts to answer the question: ‘does this pitch work in this sequence?’ Pitch expectation is tested in two ways: the first is whether the participant expects a pitch, and the second is whether a participant can add a pitch.⁶ Temperley concentrates on the first type of test and uses the Cuddy and Lunney (1995) experiment, in which participants rated the ‘fit’ of the next note from one to seven on a corpus other than the Essen collection. Temperley converted these ratings into values usable by the probability model. The values were used to test the strength of fit of each note, in order to explore the capabilities of the computer tool and to examine pitch sequences. Here, Temperley realized that the parameters work best when they are derived from other pieces in the same corpus: the strength of fit is much higher (rising from 0.729 to 0.870 in terms of correlation coefficient). This shows that the computer tool does not work equally well for all music, but it can give some insight.
3.2.3 Statistics and Harmonic Vectors
Harmonic Vectors is a newer theory of harmony, influenced by Riemann, that aims to take a generative and systematic look at tonality that can be used for statistical analysis (Meeùs 2003). Nicolas Meeùs has used this term since 1988 and wrote extensively on it into the twenty-first century.
My primary source for background information on Harmonic Vectors is Meeùs’s 2003 article “Vecteurs harmoniques.” The theory takes the motion of scale degrees and systematically sorts it into either Dominant (V) or Subdominant (SD) Vectors. The two types of vector are based on the classification of progressions by Schoenberg and by Sadaï, who extended Schoenberg’s work. The reason for this analysis is the assumption that a chord alone has no meaning but acquires its function within a succession of chords; therefore, the meaning is generative. These vectors can be graphically represented and used for statistical analysis, but the results may not be representative if the analysis covers only a few works (Meeùs 2003).
6 Temperley refers to this as either the perception paradigm or the production paradigm.
Philippe Cathé took Harmonic Vectors and combined them with Computer Music Analysis to dig deeper into a set of works. There are three levels of research with Harmonic Vectors: finding regularities, finding pendulums, and finding correlations between the other two levels and the music (Cathé 2010a). Cathé expands on vectorial pairs (Meeùs 2003), an analysis of adjacent pairs of vectors, and on mono-vectorial sequences, meaning the same vector repeated, as methods for finding regularities. Pendulums help to further differentiate composers based on their vector use. A pendulum is a series of three vectors in which the first and third vectors are the same and the second is different. The final level of research brings back the music and aims to find correlations between the music and the vectors found. The goal is to understand why a vector is used (Cathé 2010a). These three stages help to further explore a set of works. The application of harmonic vectors to statistical analysis was mentioned and used by both Meeùs and Cathé.
Both expressed the analysis in a table of percentages, organized by movement of scale degrees, type of vector, and level, or with graphic representations such as line diagrams or graphs. The diagrams express the amount of each vector (Meeùs 2003), vector pair, mono-vectorial sequence, or pendulum (Cathé 2010a), most often as a percentage, and break this down by era and composer. The computer has assisted Cathé in the three-level analysis by cutting down on time and making the output as unbiased as possible. To perform comparisons, Cathé uses ‘Charles,’ a computer program based on Microsoft Excel that gives the proportions of vectors (pairs, pendulums, etc.) for a certain piece, a set of pieces, or data files. The output is most often expressed in charts or line graphs. This gives the analyst another method of representing the data and makes comparison between eras, composers, and compositions easier.
The idea that works of music from different eras sound different is not new. Harmonic Vectors aims to show this through the change in proportions between eras. Each era has a different average for each vector, vector pair, pendulum, etc., which can be identified through larger-scale comparative analysis (Cathé 2010b) and represented in the form of statistics. In addition to eras, a comparative statistical analysis of harmonic vectors can also be applied to composers and compositions. All composers and compositions are slightly different, so Philippe Cathé took ten versions of Vater unser im Himmelreich and compared their usage of Harmonic Vectors (Cathé 2010b). A composer uses a different amount of each vector (pairs, pendulums, etc.) in each piece, but the percentages remain very close (Cathé 2010a). This can also be used to show the degree of difference between two composers, since each composer’s use of vectors is consistent across his or her works.
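The three kinds of regularity described above (vector proportions, vector pairs, and mono-vectorial sequences), together with pendulums, can be counted mechanically once the vectors of a piece are known. The following Python sketch assumes that step has already been done. It does not classify root progressions into vectors, and the labels “D” and “SD” are hypothetical placeholders. Pendulums are counted over every overlapping window of three vectors, and mono-vectorial sequences are taken to be maximal runs of two or more identical vectors. Both are readings of the definitions above, not Cathé’s ‘Charles’ program.

```python
from collections import Counter


def proportions(counter):
    """Convert raw counts into percentages of the total."""
    total = sum(counter.values())
    return {key: 100 * count / total for key, count in counter.items()} if total else {}


def vector_statistics(vectors):
    """Summarize a sequence of already-identified harmonic-vector labels."""
    singles = Counter(vectors)
    # Vector pairs: every pair of adjacent vectors
    pairs = Counter(zip(vectors, vectors[1:]))
    # Pendulums: three consecutive vectors where first == third != second
    pendulums = Counter(
        (a, b, c)
        for a, b, c in zip(vectors, vectors[1:], vectors[2:])
        if a == c and a != b
    )
    # Mono-vectorial sequences: maximal runs of the same vector (length >= 2)
    runs = []
    run_length = 1
    for previous, current in zip(vectors, vectors[1:]):
        if current == previous:
            run_length += 1
        else:
            if run_length >= 2:
                runs.append((previous, run_length))
            run_length = 1
    if vectors and run_length >= 2:
        runs.append((vectors[-1], run_length))
    return {
        "vectors": proportions(singles),   # percentages
        "pairs": proportions(pairs),       # percentages
        "pendulums": dict(pendulums),      # raw counts
        "mono_sequences": runs,            # (vector, run length)
    }


stats = vector_statistics(["D", "D", "D", "SD", "D", "SD", "SD"])
print(stats["pendulums"])       # {('D', 'SD', 'D'): 1, ('SD', 'D', 'SD'): 1}
print(stats["mono_sequences"])  # [('D', 3), ('SD', 2)]
```

Comparing the resulting percentages across pieces, composers, or eras would correspond to the comparative stage described above. That comparison is left to the analyst here.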
3.2.4 Distinctive Patterns using Bioinformatics and Probability
Looking for patterns is needed in all analyses, and finding patterns that are distinctive is paramount. According to Darrell Conklin, a distinctive pattern is one that is frequent within the corpus when compared to its frequency within the anticorpus. The algorithm he created aims to find the distinctive patterns within the corpus in order to narrow down the possibilities for the analyst (Conklin 2008). The corpus is the specific piece or set of pieces being examined, so the distinctive patterns found are over-represented in the corpus. The anticorpus, on the other hand, is a piece or set of pieces, often by the same composer, in which the distinctive pattern is under-represented. The frequency needed for distinctiveness, the corpus, and the anticorpus are all determined by the analyst. I will now explain a few applications of distinctiveness.
In this section, I will look at two different applications of this method by Darrell Conklin. The first is on the Essen Folksong Collection and the second is on Johannes Brahms’ String Quartet opus 51 no. 1. The reason for choosing Conklin’s application is to look at the approach of a researcher who commonly examines Music and Machine Learning (from the Basque University website http://www.ehu.eus/cs-ikerbasque/conklin/) and to further explain distinctiveness with an analysis. Both analyses use a similar formula:
$$\Delta P \stackrel{\mathrm{def}}{=} \frac{p(P \mid \oplus)}{p(P \mid \ominus)} = \frac{c_{\oplus}(P)}{p(P \mid \ominus) \times n_{\oplus}}$$
The middle expression (between the two equal signs) is the probability of a pattern in the corpus (⊕) relative to its probability in the anticorpus (⊖). The last expression is used to find the value of ΔP, also known as the likelihood of P (I(P)). Its numerator is the total number of occurrences of the pattern in the corpus, and its denominator is the probability of the pattern in the anticorpus multiplied by the total number of events in the corpus.
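The ΔP formula and the two thresholds used in Conklin’s analyses (a minimum count in the corpus and a minimum ΔP, which is 3 in the analyses described below) can be sketched in a few lines of Python. The sketch makes several assumptions. Patterns are taken to be already extracted and counted, so pattern discovery itself, including the search for maximally general patterns, is not implemented. The pattern labels and counts are hypothetical. The quantities c⊕(P) and n⊕ are read as in the description above. A pattern absent from the anticorpus receives an infinite ΔP, which is a choice made only for this sketch.

```python
import math


def delta_p(count_corpus, n_corpus, p_anticorpus):
    """ΔP = c(P) / (p(P | anticorpus) * n_corpus) = p(P|corpus) / p(P|anticorpus)."""
    if p_anticorpus == 0:
        # Pattern never occurs in the anticorpus: treated here as maximally distinctive
        return math.inf
    return count_corpus / (p_anticorpus * n_corpus)


def distinctive_patterns(corpus_counts, n_corpus, anticorpus_counts, n_anticorpus,
                         min_count, min_delta=3.0):
    """Return {pattern: ΔP} for patterns that pass both thresholds.

    corpus_counts / anticorpus_counts: dict mapping pattern -> count
    n_corpus / n_anticorpus: total number of events in each collection
    """
    result = {}
    for pattern, count in corpus_counts.items():
        if count < min_count:
            continue  # not frequent enough in the corpus
        p_anti = anticorpus_counts.get(pattern, 0) / n_anticorpus
        score = delta_p(count, n_corpus, p_anti)
        if score >= min_delta:
            result[pattern] = score
    return result


# Hypothetical counts for four patterns (labels are placeholders)
corpus = {"A": 20, "B": 15, "C": 12, "D": 5}
anticorpus = {"A": 10, "B": 60}
print(distinctive_patterns(corpus, 50, anticorpus, 100, min_count=10))
# prints: {'A': 4.0, 'C': inf}
```

Pattern “B” is frequent in the corpus but even more frequent in the anticorpus, so it is rejected. This illustrates why frequency alone does not make a pattern distinctive.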
The first analysis was conducted on the Essen Folksong Collection, the same collection used by Temperley in Music and Probability (2007), and, more specifically, on the Shanxi, Austrian, and Swiss folksongs. Conklin was searching for “maximally general distinctive patterns” (Conklin 2008, 1), which are patterns that can be used for classification but are not so general that they occur in almost all pieces. For a pattern to be considered interesting, or frequent, it must occur in at least 20% of the corpus. The likelihood (I(P)), also known as ΔP in later works, must be greater than or equal to 3. This study showed that, for each region, there is a maximally general distinctive pattern that can be used for classification purposes (Conklin 2008).
The second analysis was on the first movement of Brahms’ String Quartet opus 51 no. 1, and the anticorpus was the string quartets no. 2 and no. 3. For the best comparison, Conklin used only the first movements of no. 2 and no. 3. The goal was to show that the motives Forte found in his article “Motivic design and structural levels in the first movement of Brahms’s string quartet in C minor” (1983) are found to be distinctive by this analysis, excluding two motives that cannot be maximally general. This contrasts with David Huron’s revisiting of the same analysis in 2001, in which Huron found that only the alpha motive was distinctive (Conklin 2010). I will now outline what the analysis determined. In this study, the minimum frequency for a pattern is 10, and the likelihood of a pattern, renamed ΔP, must be at least 3 for the pattern to be considered distinctive. Files in the Humdrum kern format were used because it is an easily available, computer-compatible format. When the analysis was completed, all of Forte’s motives, apart from the exceptions mentioned, were labeled as distinctive (Conklin 2010).
This shows that the tool can be used to identify likely distinctive motives, but the analyst will still need to analyse the data for a complete picture.
3.3 Critical Analysis: Optimization
The chapter thus far has shown the progress made in research in general, and specifically the progress David Temperley made from The Cognition of Basic Musical Structures (2001) to Music and Probability (2007), by exploring the previous research, the reasons for looking at probability, and the use of Bayesian Networks. In essence, recent research in Optimization builds upon what Temperley provides or upon developments mirrored by Temperley. (Temperley has more recent publications, but these will be discussed in the conclusion of the thesis.)
3.3.1 Preference rules: Metrical Structure
Smith and Honing’s use of Morlet wavelets was discussed in 3.1.1 as a method to incorporate expressive timing into beat induction. This method has its limitations. Firstly, the method does not work by exposing the tool to the music, because the input must be in an isolated rhythmic form. This means the tool cannot perform beat induction on a piece whose rhythm has not been separated out. Another issue is that the tempo selection is not as sensitive as needed. This method has made great strides in testing and creation but cannot currently work as a stand-alone program. And, because of its current limitations, the method cannot yet be a simple online application, so it is only useful to a small number of people. The first improvement would be to make it either a stand-alone program or an addition to another, larger tool. As a stand-alone program, it would be of much use to a researcher and could serve as a teaching aid for a student learning expressive timing or beat induction. A more widespread use of this tool would be in score playback software, to determine the efficacy of a playback.
If the tool could not find the tempo of a piece as played back, this would show that the playback is not similar enough to human playing. However, this tool does help to further the goal of Optimization by getting closer to human beat induction. In time, if work on beat induction continues, researchers may understand how people find the beat and adapt to it quickly.
3.3.2 Preference Rules: Counterpoint
The extensions of counterpoint research from 2000 to 2016 have concentrated on the evaluative or compositional side, but they are still useful to analysts. The Komosinski article concentrates heavily on composition, but it gives an evaluative approach for the generated composition. On a smaller scale, this tool is useful for evaluating first species counterpoint, which takes the opposite direction from Temperley. It has been included to show a different use for Temperley’s Preference Rules. It is useful to an analyst because it gives a general outline of the evaluative criteria needed by a computer. On its own, it needs to stay with a generative model because of its dominated vs. non-dominated output, but it is a good model for future evaluations of generative models.
The tool proposed by Giraud et al. gives the analyst a strong head start on fugue analysis if the subject is properly identified. If it were combined with probabilistic models, this tool would be best used on a larger corpus of similar fugues (i.e. by the same composer in the same era). The best probability estimates for subject length, key, notes used, etc., are found when each corpus is evaluated independently. This was a trend in probability research, because the probability of certain gestures changes depending on the composer. This tool would indeed be best used in conjunction with a probabilistic model, but extra work is needed to separate a set of fugues into streams or voices. To separate the voices, Temperley’s preference rules for streams can be used, if streams and voices are indeed one and the same.
However, neither of these tools examines fugues for multiple instruments. This is left for further work.
3.3.3 Preference Rules: Tonal-Pitch Class Representation and Harmony
HarmTrace can estimate harmonic similarity, recognize chords, and automatically harmonize an input. This tool does not need a set of Tonal-Pitch Class rules or key profiles; instead, it uses a hierarchical structure to narrow down its choices. The authors of the article further say that this model can be used for MIR because it is practical as an internet-based method. An issue that is not addressed is what kind of input can be used with HarmTrace; this is one of my five Critical Issues. If the input needs to be separated in some way, then old Humdrum files could be used, but if the input is an image of a score, then any clear scan of a score could be used. Another common input is a file from music notation software (such as a Finale file), but these formats are specific to the notation software being used. Furthermore, an audio file input would be optimal because audio files are widely available, but this is not practical because no recording is perfect.
3.3.4 Melodic Phrase Structure and Parallelism
The PAT (pattern boundary strength profile) and the LBDM (Local Boundary Detection Model) improved with Cambouropoulos’s modifications in 2006, but since then parallelism has not been at the forefront of research. This more generalized application of parallelism is imperative for pieces in which a repetition is ornamented or changed slightly, but it is often not considered by analysis tools, because they tend to examine recurring features or one specific task. Boundary detection is generally used for parsing data, and incorporating parallelism makes the boundaries more accurate. By putting HarmTrace and PAT/LBDM together, the output could be more accurate and could provide the precise parsing of data needed for analysis.
The final segmentation could be obtained from the PAT/LBDM outputs by using the HarmTrace harmonic infrastructure. This would be a way to leverage the strengths of both models to provide the user with a more complete outcome.
Hardesty’s 2016 tool for examining rhythm has a strong basis in rhythm and music generation. It has further uses in Optimization because it incorporates psychological elements; however, its goal is not completely realized. The tool can only process and generate binary rhythms, but, with further research, it could come close to human music prediction. Thus, it furthers Optimization’s goal of understanding how humans perceive and predict rhythm.
3.3.5 Probability and Statistics
The tools presented in the section on Probabilistic and Statistical models take three different approaches to using probability and statistics in Computer Music Analysis. Temperley looked at Bayesian probability, the set of probabilistic principles that follow from accepting Bayes’ Rule, to connect his previous research on PRSs with research on cognition in fields similar to music. Cathé’s approach aims always to keep the music in mind, so the computer processes every data file (music, in this case), and the analyst makes the final comparisons and assumptions by looking at both the music and the harmonic vectors. Darrell Conklin combines bioinformatics and probability to find distinctive patterns, and his method parses music to give the analyst the patterns that may be important.
Temperley’s Bayesian probability model is used in his online database. Overall, the generated coefficients can be used in other probabilistic models and in other corpus studies. As Temperley stated, the coefficients are more accurate when generated for a specific corpus, so this should be done for maximal accuracy. Furthermore, these coefficients can be used in any generative theory if they are based on the same corpus.
This corpus-specificity is also the approach’s limitation, since re-analysing a set of works for each new corpus is time-consuming. This can defeat the purpose of a computer model, which is meant to save time and energy.

Overall, these models take a set of data and output specific generalizations. For example, Cathé has calculated the percentage of use of each harmonic vector by composer, and each composer shows a distinctive set of percentages. This could be combined with the method of a 1963 study of authorship (Mosteller and Wallace 1963). That study examined literary works and measured the rates of simple words such as “upon” and “such”; how often certain words were used proved distinctive to each author. The Poisson process, a standard probabilistic model of counts, was adapted to carry out the method. The method could potentially be adapted to music by using harmonic vectors instead of words. This application is discussed further in the concluding chapter of the thesis.

Chapter 4- Machine Learning

4.1 Introduction to Machine Learning

Machine Learning can be defined as the process of teaching a computer (the machine) to carry out new tasks and, in the case of music, to perform these tasks on musical works. It has applications in many areas of Computer Music Analysis, but the focal point of Machine Learning is the tool itself. The tool or method must produce reasonably accurate output in a first-stage analysis so that it can, in turn, reliably produce correct output for other pieces.

This differs from MIR and Optimization. For MIR the goal is a database, and for Optimization, as I have described it, the goal is to understand and reproduce human perception of an input. Music poses many challenges for any computer-based analytical tool, so the analysis of complete works using complex ideas is uncommon in Machine Learning. Machine Learning is used in many disciplines, and when it is applied to music the input is often oversimplified (Widmer 2006).
The field of Machine Learning as applied to music is still in its infancy, so I can give only a cursory overview of its developments. (Recently, a special issue of the Journal of Mathematics and Music concentrated on music generation in Machine Learning (Iñesta, Conklin, and Ramírez 2016), but this is an exceptional development.) In this section, I show several emerging possibilities for Machine Learning as well as precedents. I do so in an introductory manner because the actual processes of Machine Learning and their application are too complex to be treated exhaustively in a thesis of this scope. (I discuss the literature of Machine Learning primarily from the angle of a music theorist, although it holds considerable possibilities for other domains such as composition.) Unlike in previous chapters, the critical analysis for this chapter appears in the conclusion of the thesis, since Machine Learning matters to Computer Music Analysis as a whole.

4.2 Outline of Selected Tools

In this section, I present several tools in Machine Learning.

- I start with a tool that assists guitarists with ornamentation.
- The next two tools were both created by Darrell Conklin, and the second builds on the first in terms of segmentation. It is also an application of the multiple-viewpoint system discussed in the Literature Review.
- The final tool is an analysis of analysis using Machine Learning. I treat Kirlin and Yust’s study in greater detail because it is one of the few Machine Learning models that contributes directly to music analysis.

4.2.1 Ornamentation in Jazz Guitar

I begin with a recent development in the application of Machine Learning to music. In jazz guitar, ornamentation is important because it is how expression is conveyed, but it is not written in the score. Performers must devise the ornamentation themselves or study countless recordings. Giraldo and Ramírez have attempted to address this problem with Machine Learning.
Their tool aims to take an “un-expressive score” (Giraldo and Ramírez 2016, 107) and add expressive ornaments to it. It uses 27 audio recordings by a professional guitarist as its training set. Using a group of ornamentation vectors, the audio was aligned with the score to create an expressive score of each recording. In effect, a non-expressive score was combined with a set of vectors derived from expressive performances.

The primary goal of the study was to create a Machine Learning tool. A secondary goal was to give new guitarists an expressive score to read, to help them learn ornamentation practices. After training, the tool was tested on un-expressive input to produce expressive output: a generated MIDI or other audio rendering that combined the un-expressive score with the learned ornamentation. The researchers report an overall stylistic and grammatical correctness of 78%. The tool needs further work, especially in refining it as a Machine Learning tool, but in terms of its secondary goal it fills a void in jazz guitar performance.

4.2.2 Melodic Analysis with Segment Classes

Darrell Conklin’s name appears frequently in machine learning as applied to music. His research centres on the problem of music as a multi-faceted entity. His article “Melodic Analysis with Segment Classes” (Conklin 2006) is a stepping stone towards the later research I discuss in 4.2.3. (The basis for this article includes the Conklin and Witten 1995 article discussed in the Literature Review.) Conklin’s 2006 article depends on a concept called “viewpoints.” A viewpoint takes a cross section of the musical structure by deriving one kind of feature (such as pitch class or interval) from each event. The accuracy of the output obtained with different viewpoints can then be estimated.
The aim of the study is to “demonstrate how the viewpoints representation for segment classes can be applied to interesting music data mining tasks” (Conklin 2006, 350). Conklin’s method is based on a study of natural language and its segmentation. For data mining, music must first be in a format the computer understands, and that format must be hierarchical. Accordingly, Conklin uses specific hierarchical and searchable terminology:

- a musical object is a note;
- a segment is a set of musical objects;
- a sequence is a series of segments in a specific order.

A melody is a type of sequence: a series of notes in a specified order.

Segmentation is a fundamental aspect of Conklin’s analysis. Two methods of segmentation were tested: the first by phrase boundaries and the second by meter. Each test described the segments with a viewpoint based on a set of pitches. The particular expression Conklin uses is set(mod12(intref(pitch,key))). The interval of each pitch from the key is reduced modulo 12, and the resulting pitch classes are collected into a set for each segment. The method succeeded most with segmentation by beat (98%), note (92%), and bar (91%). An unsegmented interval-level viewpoint also performed well (94%). As the percentages show, segmentation by beat was the most successful.

While the immediate task in Machine Learning is to create a tool, Conklin’s secondary task was to discriminate style, and the segmental viewpoint by beats can be used in future models for this task. Conklin discusses the further work needed in this regard. First, the length of segments must be examined for a corpus, meaning a collection of music in one style. Second, the automated segmentation (the segmentation done by the computer) should be compared with human segmentation.
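Conklin’s expression composes three viewpoint functions and applies the result to each segment. The Python sketch below shows one way to read set(mod12(intref(pitch,key))) on segments cut by a fixed beat or bar grid.

The function names mirror Conklin’s notation but are hypothetical Python stand-ins. The sketch assumes that:

- notes are (onset in beats, MIDI pitch) pairs;
- the key is given by its tonic pitch class.

Phrase-boundary segmentation (which needs annotated boundaries) and the style-classification step are not shown.

```python
from collections import defaultdict

# Hypothetical Python stand-ins for the viewpoint functions in Conklin's expression.

def intref(pitch, key_tonic):
    """Interval in semitones from the key's tonic pitch class (0-11) to a MIDI pitch."""
    return pitch - key_tonic


def mod12(interval):
    """Reduce an interval to a pitch class relative to the tonic (0-11)."""
    return interval % 12


def segment_by(notes, unit):
    """Group (onset_in_beats, midi_pitch) notes into consecutive spans of `unit` beats.

    Spans that contain no note onsets are omitted.
    """
    segments = defaultdict(list)
    for onset, pitch in notes:
        segments[int(onset // unit)].append(pitch)
    return [segments[k] for k in sorted(segments)]


def set_viewpoint(notes, key_tonic, unit):
    """set(mod12(intref(pitch, key))) evaluated on each segment."""
    return [frozenset(mod12(intref(p, key_tonic)) for p in seg)
            for seg in segment_by(notes, unit)]


# A hypothetical tune in G major (tonic pitch class 7), in 4/4.
tune = [(0, 67), (0.5, 71), (1, 74), (2, 72), (3, 71), (4, 69), (5, 67)]
by_bar = set_viewpoint(tune, key_tonic=7, unit=4)
by_beat = set_viewpoint(tune, key_tonic=7, unit=1)
print([sorted(s) for s in by_bar])   # [[0, 4, 5, 7], [0, 2]]
print([sorted(s) for s in by_beat])  # [[0, 4], [7], [5], [4], [2], [0]]
```

Each segment is thus reduced to the set of scale-degree pitch classes it contains. This is the kind of description a classifier could compare across pieces and styles.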
4.2.3 Chord sequence generation with semiotic patterns

Conklin’s 2016 article, “Chord Sequence Generation with Semiotic Patterns,” addresses the semiotic value in trance music when that music is generated by a Machine Learning model. (Trance is a type of fast electronic music, like techno, centred predominantly in Europe.) Aspects of the chords in trance music have intrinsic meaning, and this meaning must be preserved for the style to be represented accurately. Conklin’s model aims to generate chord sequences that keep the qualities of trance music intact.

The semiotic patterns of trance music are defined as sequences of “paradigmatic units” (Conklin 2016, 94). According to Conklin, each idea is given a variable (a letter name) so that patterns of these variables can be discussed. Viewpoints, discussed above, are used to map, or create, an output according to such a plan. Conklin’s viewpoints are based on the following attributes:

- chord;
- root;
- chordal quality;
- inter-onset interval (the time between the onsets of successive events);
- duration;
- chord diatonic root movement;
- chord quality movement;
- a combination of root movement and quality movement.

Conklin describes this combination as the “cross product” of crm (chord root movement) and cqm (chord quality movement). Here “cross product” means a linked viewpoint, whose value at each chord is the pair of its root-movement and quality-movement values. It is not the vector cross product of linear algebra. This combination was chosen to generate the chords while taking into account their intrinsic meaning in trance music. I should note that Conklin used only a sample of trance songs, so the results need to be examined further against a larger trance corpus.

The goal in Machine Learning is the tool itself. Conklin states that the best algorithms, such as the viewpoints presented here, can be determined for a given corpus. In other words, important aspects of a corpus can be identified, and the best algorithms can then be defined and used, as the combination of crm and cqm was used for generation in this article.
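The viewpoints named above, and the semiotic patterns they are meant to respect, can be made concrete with a small Python sketch. The sketch makes these assumptions:

- chords are (root pitch class, quality) pairs;
- root movement is measured chromatically in semitones, whereas Conklin’s viewpoint uses diatonic root movement;
- quality movement is represented as a pair of qualities.

The linked viewpoint pairs the two values at each chord. The pattern check shows what it means for a sequence of paradigmatic units to follow a letter pattern such as ABAC. All names, chords, and patterns are hypothetical, and the generation step itself is not shown.

```python
# Hypothetical sketch: chords are (root_pitch_class, quality) pairs.

def crm(chords):
    """Chord root movement in semitones mod 12; undefined (None) for the first chord."""
    return [None] + [(b[0] - a[0]) % 12 for a, b in zip(chords, chords[1:])]


def cqm(chords):
    """Chord quality movement as a (previous quality, current quality) pair."""
    return [None] + [(a[1], b[1]) for a, b in zip(chords, chords[1:])]


def linked(*viewpoints):
    """Linked viewpoint: the tuple of component values at each event (a Cartesian
    product), undefined wherever any component is undefined."""
    return [None if any(v is None for v in values) else values
            for values in zip(*viewpoints)]


def matches_pattern(blocks, pattern):
    """True if equal letters label identical blocks and distinct letters distinct blocks."""
    if len(blocks) != len(pattern):
        return False
    assignment = {}
    for letter, block in zip(pattern, blocks):
        block = tuple(block)
        if assignment.setdefault(letter, block) != block:
            return False
    return len(set(assignment.values())) == len(assignment)


Am, F, C, G, Em = (9, "min"), (5, "maj"), (0, "maj"), (7, "maj"), (4, "min")
progression = [Am, F, C, G]
print(linked(crm(progression), cqm(progression)))
# [None, (8, ('min', 'maj')), (7, ('maj', 'maj')), (7, ('maj', 'maj'))]

blocks = [[Am, F], [C, G], [Am, F], [Em, G]]
print(matches_pattern(blocks, "ABAC"), matches_pattern(blocks, "ABAB"))  # True False
```

A generator built on this representation would choose each next chord so that the linked values stay probable for the corpus while the block structure still satisfies the chosen pattern.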
Conklin also mentions that this method can be used for analysis.

4.2.4 Analysis of analysis

Kirlin and Yust’s 2016 article “Analysis of Analysis: Using Machine Learning to Evaluate the Importance of Music Parameters for Schenkerian Analysis” aims to develop the music-theoretical branch of Machine Learning. The goal of the article was to create an analysis of a score using a model resembling Schenkerian analysis. Although this goal was not realized, the article is still noteworthy for what the researchers explored and for the Machine Learning tool they created.

Schenkerian analysis involves reducing the work in question by finding patterns of ornamentation and elaboration. This task is difficult to teach a computer without stipulating the exact features to look for. Kirlin and Yust defined eighteen features and sorted them into categories, and these became stepping stones towards a Machine Learning tool. First, a hierarchy of notes was created using a structure called a “maximal outerplanar graph.” Then the eighteen features were defined in relation to the left note (L), middle note (M), and right note (R).7 The middle note has the following six features:
• SD-M The scale degree of the note (represented as an integer from 1 through 7, qualified as raised or lowered for altered scale degrees).
• RN-M The harmony present in the music at the time of onset of the center note (represented as a Roman numeral from I through VII or “cadential six-four”). For applied chords (tonicizations), labels correspond to the key of the tonicization.
• HC-M The category of harmony present in the music at the time of the center note represented as a selection from the set tonic (any I chord), dominant (any V or VII chord), predominant (II, II6, or IV), applied dominant, or VI chord. (The dataset did not have any III chords.)
• CT-M Whether the note is a chord tone in the harmony present at the time (represented as a selection from the set “basic chord member” (root, third, or fifth), “seventh of the chord,” or “not in the chord”).
• Met-LMR The metrical strength of the middle note’s position as compared to the metrical strength of note L, and to the metrical strength of note R (represented as a selection from the set “weaker,” “same,” or “stronger”).
• Int-LMR The melodic intervals from L to M and from M to R, generic (scale-step values) and octave generalized (ranging from a unison to a seventh). (Kirlin and Yust 2016, 135)
7 These lists, from pages 135-136, are shortened versions of the lists presented in Kirlin and Yust 2016.
The left and right notes together have the following twelve features:
• SD-LR: scale degree (1–7) of the notes L and R.
• Int-LR: melodic interval from L to R, with octaves removed.
• IntI-LR: melodic interval from L to R, with octaves removed and intervals larger than a fourth inverted.
• IntD-LR: direction of the melodic interval from L to R
• RN-LR: harmony present, as a roman numeral, in the music at the time of L or R
• HC-LR: category of harmony present in the music at the time of L or R, represented as a selection from the set tonic, dominant, predominant, applied dominant, or VI chord.
• CT-LR Status of L or R as a chord tone in the harmony present at the time
• MetN-LR A number indicating the beat strength of the metrical position of L or R. The downbeat of a measure is 0. For duple or quadruple meters, the halfway point of the measure is 1; for triple meters, beats two and three are 1. This pattern continues with strength levels of 2, 3, and so on.
• MetO-LR A number indicating the beat strength of the metrical position of L or R as an ordinal variable and treated differently in the algorithm
• Lev1-LR Whether L, M, and R are consecutive notes in the music
• Lev2-LR Whether L and R are in the same measure in the music
• Lev3-LR Whether L and R are in consecutive measures in the music
(Kirlin and Yust 2016, 135-6)
These features are sorted into melodic, harmonic, metrical, and temporal categories as follows:
• Melodic: SD-M, SD-LR, Int-LMR, Int-LR, IntI-LR, IntD-LR
• Harmonic: RN-M, RN-LR, HC-M, HC-LR, CT-M, CT-LR
• Metrical: Met-LMR, MetN-LR, MetO-LR
• Temporal: Lev1-LR, Lev2-LR, Lev3-LR
(Kirlin and Yust 2016, 136)

The categories were then narrowed down and ranked by importance. This yields a hierarchy with harmony at the top, followed by melody, then meter, and finally temporality.

- Harmony: the results showed that harmony, in the form of harmonic context and the identification of non-chord tones, is the most important marker for the reductions. This is obvious to an analyst, but it is important that the computer reaches the same outcome.
- Melody: melody is the next most important marker. Its scale-degree progressions and interval patterns are used when harmonic context and non-chord tones do not give enough information.
- Meter: meter is then applied to anything still undetermined.

Although this procedure seems obvious to the analyst, the hierarchy of steps is the most important part for the computer, because it gives the computer a specific order to follow. To reiterate, this has not been fully tested, but it is useful for understanding the creation of a Machine Learning model.

4.3 Summary

In this chapter, I have shown a few recent developments in Machine Learning applied to music. I have traced the work of Darrell Conklin in particular, since he is a pioneer in the field and continues to contribute to research.
As noted above, I have not included a critical analysis section. The comments I would have made there belong in the concluding chapter, since they address the current state of the field.

Chapter 5- Conclusion

As noted earlier in this thesis, there are different streams in Computer Music Analysis, and I have concentrated on Music Information Retrieval (MIR), Optimization, and Machine Learning. These streams often run in parallel because of their different goals. In this concluding chapter I consider some of the most recent developments in Temperley’s work, offer methods to bridge these parallel streams, and present solutions, both general and specific, to the five critical issues mentioned in Chapter 1.

5.1 Further Temperley Research and Probability

After The Cognition of Basic Musical Structures (2001) and Music and Probability (2007), Temperley continued to borrow music-like concepts from other disciplines. Two articles, “Information Flow and Repetition in Music” (2014) and “Information Density and Syntactic Repetition” (2015), adapt such concepts to further Optimization.

The first article borrows uniform information density, a probability-based methodology, from psycholinguistics and uses it to explain parallelism further. Parallelism occurs when parts of a musical work are repeated exactly or in similar fashion and can thus be considered “parallel.” Temperley renamed the concept “information flow for repetition in music” and tested it on the Barlow and Morgenstern corpus of musical themes.8 He found that in parallel sections of a piece the repetition is often more chromatic. Where this is the case, the piece as a whole has a higher probability of small diatonic intervals. Thus, the juxtaposition of chromatic and diatonic intervals makes the parallelism stand out.
8 A set of 10,000 themes available in print under co-authors Barlow and Morgenstern.
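The probabilistic idea beneath information density is information content. An event with probability P carries -log2 P bits, so less probable events carry more information. The Python sketch below estimates interval probabilities from a tiny hypothetical corpus and reports the information content of each interval in a new melody. The simple interval-frequency model and the floor value for unseen intervals are my assumptions; Temperley’s models are considerably richer.

```python
import math
from collections import Counter


def interval_model(corpus):
    """Relative frequency of each melodic interval (in semitones) in a corpus of melodies."""
    counts = Counter(b - a for melody in corpus for a, b in zip(melody, melody[1:]))
    total = sum(counts.values())
    return {interval: n / total for interval, n in counts.items()}


def information_profile(melody, model, floor=1e-3):
    """Information content, -log2 P(interval), of each interval in a melody.

    Intervals never seen in the corpus get the crude probability `floor`.
    """
    return [-math.log2(model.get(b - a, floor)) for a, b in zip(melody, melody[1:])]


# A tiny hypothetical training corpus of MIDI-pitch melodies.
corpus = [[60, 62, 64, 65, 67], [67, 65, 64, 62, 60], [60, 62, 60, 62, 64]]
model = interval_model(corpus)
bits = information_profile([60, 62, 64, 70], model)
print([round(b, 2) for b in bits])  # [1.0, 1.0, 9.97]
```

Here the unexpected tritone leap carries far more information than the stepwise motion. Measures of information flow track this kind of contrast along a piece.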
Temperley also notes that harmony affects the repetitions. The second article looks even more closely at parallelism and information flow, stating that “less probable events convey more information” (Temperley and Gildea 2015, 1802). This conclusion is consistent with the analysis of prose in “Inference in an Authorship Problem” (Mosteller and Wallace 1963), which shows that some words indicate more about an author than others do. Using the Poisson process and the negative binomial distribution (two standard tools in probability and statistics), Mosteller and Wallace could identify the specific author of a passage in a multi-author work. This links to Temperley because both approaches follow the acceptance of Bayes’ Rule and are, therefore, part of Bayesian probability. Temperley’s most recent contribution to the field of Computer Music Analysis is this multidisciplinary borrowing of research tools. It is this interdisciplinary approach, more than any other development, that holds the greatest potential for the field.

5.2 Machine Learning as a means to an end

Machine Learning concentrates on the tool itself. Since it is the most recent development in computer research, and touches on Artificial Analysis, I have left the critique until the conclusion. Because Machine Learning focuses on the tool, it has no larger goal beyond creating a better tool. In the grand scheme of Computer Music Analysis, it is therefore best used to improve other aspects of the field and bring them closer to their goals.

5.3 CompMusic as an example of Intersection

The authors cited throughout this thesis have suggested some ways of combining different streams of Computer Music Analysis, but I would like to add my own suggestion: researchers need to coordinate more closely in developing their work. I believe this will further the goals of MIR, Optimization, and Machine Learning.
I will focus on “CompMusic,” since it brings together several previously unconnected avenues of research. In this regard, it can serve as an example for the rest of the Computer Music Analysis community to emulate. CompMusic, also known as Computational Models for the Discovery of the World’s Music, investigates non-Western music. More specifically, “its goal is to advance music analysis and description research through focusing on the music of specific non-Western musical cultures” (CompMusic Project and Workshops 2012, 8). The research project is supported by the European Research Council, and its coordinator, Xavier Serra, is based in Spain (CompMusic website, http://compmusic.upf.edu/). CompMusic used multiple streams to complete its database within a few years (2011 to 2017). It seeks “to challenge the current Western centered information paradigms” (CompMusic). It concentrates on five traditions of world music: Hindustani, Carnatic, Turkish-makam, Arab-Andalusian, and Beijing Opera (CompMusic).

Music research has traditionally focused on Western music, so the CompMusic researchers had to start from very little. Because of their short time frame, they used probability, statistical models, and machine learning. Within CompMusic, Machine Learning is used to solve specific problems that hinder the progress of the database, such as the structure analysis of Beijing Opera (Yang 2016). Initially, resources such as probabilistic and statistical models were used to find novel ways to solve specific problems. For example, annotation was difficult for Maghreb, a Moroccan type of music (a subset of Arab-Andalusian music), so a tool was created to fix these issues (Sordo et al. 2014). These methods were then adapted for use in the database. Since combining different approaches in Computer Music Analysis worked well for CompMusic, I can foresee the same working for an MIR project like SIMSSA.
http://compmusic.upf.edu/node/2
To me, it appears that researchers are not sharing their tools and procedures to an optimal degree. This is partly a geographic issue, since researchers in MIR, Optimization, and Machine Learning seem to work in different parts of the world. David Temperley, Darrell Conklin, and members of the SIMSSA project, such as Ichiro Fujinaga and Andrew Hankinson, could share their tools and approaches more closely. If they did, I believe many new creative problem-solving methods could emerge. One example is the solution to the authorship problem mentioned in 5.1.

5.4 Five general areas for improvement in the field

In writing this thesis, I have observed five general areas where improvement can be made. What is needed is, first, an institutional critical analysis of the field; second, closer coordination between Optimization and Machine Learning; third, research into authorship; fourth, exploration of new areas in Machine Learning; and lastly, closer integration of various MIR resources in developing Optimization and Machine Learning.
1. Critical analysis in Computer Music Analysis as a distinct enterprise has not been performed up to this point, except for MIREX, the Music Information Retrieval Evaluation eXchange. MIREX, in brief, is a “framework for the formal evaluation of Music Information Retrieval (MIR) systems and algorithms” (Downie 2008, 247). Its goal is to investigate the specific tools and algorithms that are the building blocks of larger databases. This method isolates approaches that are nearing the end of their life cycle and compares the performance of systems with similar goals. It provides data about the accuracy and projected utility of algorithms to researchers who want to work within MIR. MIREX, however, only looks at MIR tools and concentrates heavily on methods examining audio data.
It does not seem to consider specific issues and how they can be solved using other streams in Computer Music Analysis. Presumably this limitation will be overcome in the future.
2. Closer coordination between Optimization and Machine Learning. Optimization and Machine Learning have different goals. Optimization, as I have defined it, uses computers to mimic human musical perception in order to understand the brain. Machine Learning aims to create a specific tool to complete a specific task. However, the end products of both streams can be used to solve specific problems and tasks in MIR, as CompMusic shows.
3. Research into authorship. Among specific research topics, authorship (what makes a piece a composer’s own work) has room for growth. Authorship research matters for identifying the author of a work when the author is unknown, a common problem with ancient music. Fresh research could combine the methods put forth by Mosteller and Wallace in 1963 with Cathé’s recent research on harmonic vectors and their uniqueness to each composer (Cathé 2010a). It could also draw on Temperley’s research on information flow (Temperley 2014), Bayesian probability (Temperley 2007), and syntactic repetition (Temperley and Gildea 2015).
4. New areas in Machine Learning. Machine Learning has concentrated on music generation, and probabilistic and statistical analysis can improve music generation by keeping high-probability events. Machine Learning could also branch out into more analytical pursuits by means of the analytical algorithms used in Optimization, ‘teaching’ a computer to do analysis. This could improve the analysis currently available in Optimization and help the machine further mimic human perception.
5. Closer integration of various MIR resources in developing Optimization and Machine Learning. I have offered specific examples of Optimization and Machine Learning aiding in the creation of an MIR database.
However, the opposite could also happen: MIR databases could be used to develop new research tools. In particular, Humdrum, an analytical MIR tool, has a reserve of files that can be used for both Machine Learning and Optimization. Similarly, the corpora of music assembled in MIR databases could serve as test sets for the same purpose.

5.5 Persaud’s Five Critical Issues with Solutions

This thesis has begun the task of critical analysis by showing different tools in Computer Music Analysis as a whole. The selected tools differ in age and size and have different researchers associated with them, but all use the computer as their means to an end. I conclude the thesis by returning to the set of five particularly acute problems in the field that I mentioned in my introduction.
1. Human error: The problem of human error can be resolved by creating more accurate algorithms, either by using harmonic vectors or one of the many Temperley models.
2. Specifying input: Improvements in specifying input are imperative to the growth of the field. A researcher reading articles or using a pre-existing model needs to know what input should be used. This can be fixed by specifying the input in greater detail in articles and by creating genre-specific standards.
3. Consistent evaluation principles: The principles used for MIREX need to be extended to other branches of Computer Music Analysis. Overall, more critical work needs to be done in Computer Music Analysis, and principles or guidelines will assist in this venture.
4. The interdisciplinary problem of a lingua franca: To solve this problem, Computer Music Analysis should create universal, or at least common, standards and modes of discourse for describing computer research in music. There are standards for MIR in terms of research tools, algorithms, and systems, but researchers not working in that area are not aware of them.
Moreover, because many of the tools and procedures are borrowed from other areas of computer research, they are applied in different ways in specialized music research.
5. “What’s the Point?”: Undefined goals. The broader audience needs to understand why Computer Music Analysis is important. This can be overcome by looking at the broader scope of each branch.

Figure 5: Graphic of 5 critical issues with solutions.
1. Human Error: more accurate algorithms.
2. Input Specification: greater specification in all writings.
3. Consistent Evaluative Principles: more critical work in Computer Music Analysis.
4. The Interdisciplinary Problem: common standards and practices.
5. “What’s the point?”: larger scope.

In the end, there are multiple avenues to take when it comes to solving the critical issues in Computer Music Analysis. Here, I have briefly given my own solutions to these issues, along with other critical aspects and directions for further research, but I have not yet explained the importance of Computer Music Analysis.

Computer Music Analysis is vital to analysis as a whole because it often adds a quantitative dimension and takes advantage of technology. By incorporating probability, statistics, and computational algorithms, an analysis can rely on a mathematical explanation for a qualitative phenomenon. Technology is a fast-growing field, and its use in music analysis is inevitable: new software and hardware move from everyday use into research and improve the field. However, like all changes, it has its own limitations and critical issues, and these limits and problems are what fascinated me in writing this thesis. My overall conclusion is that researchers need to take a critical stance toward the discipline so that it can grow quickly and efficiently. Such a stance is necessary to further improve music analysis.

Bibliography

Alphonce, Bo H. 1980. “Music Analysis by Computer: A Field for Theory Formation.” Computer Music Journal 4, no. 2: 26-35.
Antila, Christopher, Julie Cumming, et al. 2014. “Electronic Locator of Vertical Interval Successions.” Montreal Digital Humanities Showcase, UQAM. (Available as slides, scripts, and poster via the ELVIS website.)
Appleton, Jon. 1986. Review of Composers and the Computer, by Curtis Roads. Musical Quarterly 72: 124.
Birmingham, William, Roger Dannenberg, and Bryan Pardo. 2006. “Query by Humming with the Vocal Search System.” Communications of the ACM 49, no. 8: 49-52.
Bozkurt, Barış, and Bilge Karaçalı. 2015. “A Computational Analysis of Turkish Makam Music Based on a Probabilistic Characterization of Segmental Phrase.” Journal of Mathematics and Music 9, no. 1: 1-22.
Burgoyne, John Ashley, Ichiro Fujinaga, and J. Stephen Downie. 2016. “Music Information Retrieval.” In A New Companion to Digital Humanities, edited by Susan Schreibman, Ray Siemens, and John Unsworth, 213-228. Wiley.
Cambouropoulos, Emilios. 2006. “Musical Parallelism and Melodic Segmentation: A Computational Approach.” Music Perception: An Interdisciplinary Journal 23, no. 3: 249-268.
Cantus Ultimus. SIMSSA.
Cathé, Philippe. 2010a. “Harmonic Vectors and Stylistic Analysis: A Computer-Aided Analysis of the First Movement of Brahms’ String Quartet Op. 51-1.” Journal of Mathematics and Music 4, no. 2: 107-119.
-----. 2010b. “Nouveaux concepts et nouveaux outils pour les vecteurs harmoniques.” Musurgia 17, no. 4: 57-79.
“CompMusic Project and Workshops.” 2012. Computer Music Journal 36, no. 4: 8.
CompMusic. n.d. Music Technology Group. Web. Accessed 04 Mar. 2017.
Computer Music Journal. MIT Press Journals.
Conklin, Darrell. 2006. “Melodic Analysis with Segment Classes.” Machine Learning 65: 349-360.
-----. 2008. “Discovery of Distinctive Patterns in Music.” International Workshop on Machine Learning and Music.
-----. 2010. “Distinctive Patterns in the First Movement of Brahms’ String Quartet in C Minor.” Journal of Mathematics and Music 4, no. 2: 85-92.
-----. 2016. “Chord Sequence Generation with Semiotic Patterns.” Journal of Mathematics and Music 10, no. 2: 92-106.
Conklin, Darrell, and Ian H. Witten. 1995. “Multiple Viewpoint Systems for Music Prediction.” Journal of New Music Research 24, no. 1: 51-73.
Cuthbert, Michael Scott. n.d. “Music21: A Toolkit for Computer-Aided Musicology.” Web. Accessed 07 Mar. 2017.
Dannenberg, Roger B. 2007. “A Comparative Evaluation of Search Techniques for Query-by-Humming Using the Musart Testbed.” Journal of the American Society for Information Science and Technology 58, no. 5: 687-701.
De Haas, W. Bas, et al. 2013. “Automatic Functional Harmonic Analysis.” Computer Music Journal 37, no. 4: 37-53.
Desain, Peter, and Henkjan Honing. 1992. “Time Functions Function Best as Functions of Multiple Times.” Computer Music Journal 16, no. 2: 17-34.
-----. 1999. “Computational Models of Beat Induction: The Rule-Based Approach.” Journal of New Music Research 28, no. 1: 29-42.
Donnelly, Patrick J., and John W. Sheppard. 2013. “Classification of Musical Timbre Using Bayesian Networks.” Computer Music Journal 37, no. 4: 70-86.
Downie, J. Stephen. 2003. “Music Information Retrieval.” Annual Review of Information Science and Technology 37: 295-340.
-----. 2008. “The Music Information Retrieval Evaluation Exchange (2005-2007): A Window into Music Information Retrieval Research.” Acoustical Science & Technology 29, no. 4: 247-255.
El-Shimy, Dalia, and Jeremy R. Cooperstock. 2016. “User-Driven Techniques for the Design and Evaluation of New Musical Interfaces.” Computer Music Journal 40, no. 2: 35-46.
ELVIS Project: Music Research with Computers. https://elvisproject.ca/
Fujinaga, Ichiro, and Susan Forscher Weiss. 2004. “Music.” In A Companion to Digital Humanities, edited by Susan Schreibman, Ray Siemens, and John Unsworth. Oxford: Blackwell.
Giraldo, Sergio, and Rafael Ramírez. 2016. “A Machine Learning Approach to Ornamentation Modelling and Synthesis in Jazz Guitar.” Journal of Mathematics and Music 10, no. 2: 107-126.
Giraud, Mathieu, et al. 2015. “Computational Fugue Analysis.” Computer Music Journal 39, no. 2: 77-96.
Gulati, Sankalp, et al. 2016. “Time-Delayed Melody Surfaces for Raga Recognition.” Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR ’16), New York City, USA.
Hankinson, Andrew, Evan Magoni, and Ichiro Fujinaga. 2015. “Decentralized Music Document Image Searching with Optical Music Recognition and the International Image Interoperability Framework.” In Proceedings of the Digital Library Federation Forum. Vancouver, BC. https://simssa.ca/assets/files/hankinson-decentralized-dlf2015.pdf
Hardesty, Jay. 2016. “A Self-Similar Map of Rhythmic Components.” Journal of Mathematics and Music 10, no. 1: 36-58.
Helsen, Kate, et al. 2014. “Optical Music Recognition and Manuscript Chant Sources.” Early Music 42: 555–58.
Huron, David. 1988. “Error Categories, Detection, and Reduction in a Musical Database.” Computers and the Humanities 22: 253-264.
-----. 2001. “Tone and Voice: A Derivation of the Rules of Voice-Leading from Perceptual Principles.” Music Perception 19, no. 1: 1-64.
The Humdrum Toolkit: Software for Music Research. 2001. http://www.musiccog.ohio-state.edu/Humdrum/FAQ.html
Iñesta, José M., Darrell Conklin, and Rafael Ramírez. 2016. “Machine Learning and Music Generation.” Journal of Mathematics and Music 10, no. 2: 87-91.
Kacprzyk, Janusz, and Zbigniew W. Ras. 2010. Advances in Music Information Retrieval. Berlin: Springer International Publishing.
Karaosmanoǧlu, M. Kemal. 2012. “A Turkish Makam Music Symbolic Database for Music Information Retrieval: SymbTr.” Proceedings of ISMIR.
Keller, Robert, et al. 2013. “Automating the Explanation of Jazz Chord Progressions Using Idiomatic Analysis.” Computer Music Journal 37, no. 4: 54-69.
Kirlin, Phillip B., and Jason Yust. 2016. “Analysis of Analysis: Using Machine Learning to Evaluate the Importance of Music Parameters for Schenkerian Analysis.” Journal of Mathematics and Music 10, no. 2: 127-148.
Larson, Steve. 2004. “Musical Forces and Melodic Expectation: Comparing Computer Models and Experimental Results.” Music Perception: An Interdisciplinary Journal 21, no. 4: 457-498.
Louridas, Panos, and Christof Ebert. 2016. “Machine Learning.” IEEE Software: 110-115.
Manning, Peter, et al. 2001. “Computers and Music.” Grove Music Online. Oxford Music Online. Oxford University Press. Accessed April 11, 2017.
Meeus, Nicolas. 2003. “Vecteurs harmoniques.” Musurgia 10, no. 3: 7-34.
Meredith, David. 2016. Computational Music Analysis. Cham: Springer International Publishing Switzerland.
Mosteller, Frederick, and David L. Wallace. 1963. “Inference in an Authorship Problem.” Journal of the American Statistical Association 58, no. 302: 275-309.
Music21: A Toolkit for Computer-Aided Musicology. http://web.mit.edu/music21/
Orio, Nicola. 2008. “Music Indexing and Retrieval for Multimedia Digital Libraries.” In Information Access through Search Engines and Digital Libraries, edited by M. Agosti. The Information Retrieval Series, vol. 22. Berlin, Heidelberg: Springer.
Pardo, Bryan. 2008. “Music Information Retrieval.” Communications of the ACM 49, no. 8: 29.
Pardo, Bryan, et al. 2008. “The VocalSearch Music Search Engine.” JCDL.
Patrick, Howard P. 1974. “A Computer Study of a Suspension-Formation in the Masses of Josquin Desprez.” Computers and the Humanities 8: 321-331.
Piantadosi, Steven T. et al. 2011.
“Word Lengths Are Optimized for Efficient Communication,” Proceedings of the National Academy of Science of the United States of America 108, no. 9: 3526-3529. Ponce de Léon, Pedro J. et al. 2016. “Data-Based Melody Generation through Multi-Objective Evolutionary Computation,” Journal of Mathematics and Music 10, no. 2: 173-192 Roads, Curtis et al. 1986. “Symposium on Computer Music Composition,” Computer Music Journal 10, no 1: 40-63. Search the Liber Usualis. SIMSSA. SIMSSA-Single Interface for Music Score Searching and Analysis. Smith, Leigh M. and Henkjan Honing. 2008. “Time- Frequency Representation of Musical Rhythm by Continuous Wavelets,” Journal of Mathematics and Music 1, no. 2: 81-97 Sordo, Mohamed et al. 2014. “Creating Corpora for Computational Research in Arab-Andalusian Music” Proceeding of the 1st International Digital Libraries for Musicology workshop London (UK). < http://mtg.upf.edu/node/3028> Temperley, David. 2001. The Cognition of Basic Musical Structures. Cambridge, Massachusetts and London, England: MIT Press. ------. 2007. Music and Probability. Cambridge: MIT Press. ------. 2010. “Modelling Common Practice Rhythm,” Music Perception: An Interdisciplinary Journal 27, no. 5: 355-376. 93 ------. 2014. “Information Flow and Repetition in Music,” Journal of Music Theory 58, no. 2: 155-178. Temperley, David and Christopher Bartlette. 2002. “Parallelism as a Factor in Metrical Analysis,” Music Perception: An Interdisciplinary Journal 20, no. 2: 117-149. Temperley, David and Danial Gildea. 2015. “Information density and Syntactic Repetition,” Cognitive Science, no. 139: 1802-1823. Tenkanen, Atte. 2010. “Tonal Trends and α-Motif in the First Movement of Brahms’ String Quartet op. 50 mvt. 1,” Journal of Mathematics and Music 4, no. 2: 93-106. Viglilensoni, Gabriel et al. 2011. “Automatic Pitch Recognition in Printed Square-Note Notation.” Proceedings of 12th International Society for Music Information Retrieval Conference Miami, Florida: 423-428. 
Wang, Ge, Perry R. Cook, and Spencer Salazar. 2015. “ChucK: A Strongly Timed Computer Music Language.” Computer Music Journal 39, no. 4: 10-29.
Yang, Yile. 2016. “Structure Analysis of Beijing Opera Arias.” Master’s thesis, Universitat Pompeu Fabra, Barcelona, Spain.