Registration (9am – 10am / 1pm – 2pm)
Morning Tutorials (10am – 1pm)
T1: Bayes and Markov Listen to Music – George Tzanetakis, University of Victoria, Canada
● Room: 201
T2: Leveraging MIDI files for Music Information Retrieval – Colin Raffel, Google Brain, USA
● Room: 206
Afternoon Tutorials (2pm – 5pm)
T3: A basic introduction to audio-related music information retrieval – Meinard Müller, Christof Weiss, International Audio Laboratories Erlangen, Germany
● Room: 208
T4: So you want to conduct a user study in MIR? – Andrew Demetriou, Delft University of Technology, Netherlands; Audrey Laplante, Université de Montréal, Canada; Sally Jo Cunningham, University of Waikato, New Zealand; Cynthia Liem, Delft University of Technology, Netherlands
● Room: 203
T5: Machine-Learning for Symbolic Music Generation – Pierre Roy, Spotify, France; Jean-Pierre Briot, Paris VI – SONY CSL, France
● Room: 207
Tutorial Abstracts
T1: Bayes and Markov Listen to Music
Music is a very complex signal, with information spread across different hierarchical levels and temporal scales. Over the last 15 years, the fields of Music Information Retrieval (MIR) and music signal processing have made solid progress in developing algorithms for understanding music signals, with applications such as music recommendation, classification, transcription, and visualization. Probabilities and probabilistic modeling play an important role in many of these algorithms. The goal of this tutorial is to explore how probabilistic reasoning is used in the analysis of music signals.
The target audience is researchers and students interested in MIR, but the tutorial will also be of interest to participants from other areas of signal processing, as the techniques described have a wide variety of applications. More specifically, the tutorial will cover how basic discrete probabilities can be used for symbolic music generation and analysis, followed by how classification can be cast as a probability density function estimation problem through Bayes' theorem. Automatic chord detection and structure segmentation will be used as motivating problems for probabilistic reasoning over time, and for Hidden Markov Models more specifically. Kalman and particle filtering will be described through real-time beat tracking and score following. More complex models such as Bayesian Networks and Conditional Random Fields, and how they can be applied to music analysis, will also be presented. Finally, the tutorial will end with Markov Logic Networks, a formalism that subsumes all the previous models. Throughout the tutorial, the central concepts of Bayes' theorem, Markov assumptions, maximum likelihood estimation, and expectation maximization will be described.
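For readers new to this material, the classification setting mentioned above rests on Bayes' theorem: the posterior probability of a class (say, a chord label) given an observed feature vector is obtained from a class-conditional density and a prior. The notation below is generic and is not taken from the tutorial material.

```latex
% Posterior probability of class \omega_k (e.g., a chord label)
% given an observed feature vector x:
P(\omega_k \mid x) = \frac{p(x \mid \omega_k)\, P(\omega_k)}
                          {\sum_{j} p(x \mid \omega_j)\, P(\omega_j)}
```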
More material available here.
George Tzanetakis is a Professor in the Department of Computer Science, with cross-listed appointments in ECE and Music, at the University of Victoria, Canada. He is Canada Research Chair (Tier II) in the Computer Analysis of Audio and Music and received the Craigdarroch research award in artistic expression at the University of Victoria in 2012. In 2011 he was Visiting Faculty at Google Research. He received his PhD in Computer Science at Princeton University in 2002 and was a Post-Doctoral Fellow at Carnegie Mellon University in 2002-2003. His research spans all stages of audio content analysis, such as feature extraction, segmentation, and classification, with specific emphasis on music information retrieval. He is also the primary designer and developer of Marsyas, an open-source framework for audio processing with specific emphasis on music information retrieval applications. His pioneering work on musical genre classification received an IEEE Signal Processing Society young author award and is frequently cited. He has given several tutorials at well-known international conferences such as ICASSP, ACM Multimedia, and ISMIR. More recently he has been exploring new interfaces for musical expression, music robotics, computational ethnomusicology, and computer-assisted music instrument tutoring. These interdisciplinary activities combine ideas from signal processing, perception, machine learning, sensors, actuators, and human-computer interaction, with the connecting theme of making computers better understand music to create more effective interactions with musicians and listeners. More details can be found at http://www.cs.uvic.ca/gtzan.
T2: Leveraging MIDI files for Music Information Retrieval
MIDI files are a widely available digital score format containing a bounty of valuable information about a given piece of music. A MIDI file that has been matched to a corresponding audio recording can provide a transcription, key and meter annotations, and occasionally lyrics for the recording. MIDI files are also useful in very large-scale, metadata-agnostic corpus studies of popular music. Despite their potential utility, they remain underused in the music information retrieval community. The purpose of this tutorial is to expose attendees to the promise of leveraging MIDI files in MIR tasks.
The motivation for having this tutorial now is the release of the Lakh MIDI Dataset (LMD), a collection of 178,561 MIDI files, many of which have been matched and aligned to corresponding entries in the Million Song Dataset. The tutorial will therefore include a mix of explanatory sessions on research involving MIDI files and hands-on demonstrations of utilizing the LMD. Attendees will leave the tutorial with a strong awareness of what MIDI files are, what sort of information can be extracted from them, what steps are necessary for leveraging this information, practical knowledge of how to utilize MIDI files, and an idea of tantalizing prospects for future research.
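To give a flavour of the hands-on sessions, the short sketch below reads a MIDI file with the open-source pretty_midi library and extracts the kinds of information mentioned above (tempo, key and time signatures, lyrics, and a note-level transcription). The file name is a placeholder, and this is an illustrative example rather than the tutorial's own material.

```python
import pretty_midi

# Load a MIDI file (placeholder path; any file from the Lakh MIDI Dataset will do).
pm = pretty_midi.PrettyMIDI('example.mid')

# Global annotations carried by the file, when present.
print('Estimated tempo:', pm.estimate_tempo())        # BPM estimate from note timings
print('Key signatures:', pm.key_signature_changes)    # list of KeySignature events
print('Time signatures:', pm.time_signature_changes)  # list of TimeSignature events
print('Lyrics events:', len(pm.lyrics))                # lyrics, if the file carries any

# Note-level "transcription": every instrument track with its notes.
for inst in pm.instruments:
    name = 'drums' if inst.is_drum else pretty_midi.program_to_instrument_name(inst.program)
    print(name, len(inst.notes), 'notes')
    for note in inst.notes[:3]:                        # first few notes as a sample
        print('  pitch %d, %.2f-%.2f s' % (note.pitch, note.start, note.end))
```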
Colin Raffel is a researcher focused on machine learning methods for sequences, with a particular interest in music data. He is currently a Research Scientist at Google Brain. In 2016, he completed a PhD in Electrical Engineering at Columbia University in LabROSA, supervised by Dan Ellis. His thesis focused on learning-based methods for comparing sequences, with the particular application of matching MIDI files to corresponding audio recordings. Prior to his PhD, he completed a Master's at the Center for Computer Research in Music and Acoustics and a Bachelor's at Oberlin College.
T3: A basic introduction to audio-related music information retrieval
The main goal of this tutorial is to give an introduction to Music Information Retrieval with a particular focus on audio-related analysis and retrieval tasks. Well-established topics in MIR are selected to serve as motivating application scenarios. Within these scenarios, fundamental techniques and algorithms that are applicable to a wide range of analysis and retrieval problems are presented in depth. Featuring numerous figures and sound examples, this tutorial is intended to suit a wide and interdisciplinary audience with no particular background in MIR or audio processing. The tutorial consists of eight parts, each lasting between 20 and 25 minutes. The first two parts cover fundamental material on music representations and the Fourier transform, concepts that are required throughout the tutorial. In the subsequent parts, concrete MIR tasks serve as starting points for our investigations. Each part starts with a general description of the MIR scenario at hand and places the topic in a wider context. Motivated by a concrete scenario, each part then discusses important techniques and algorithms that are generally applicable to a wide range of analysis, classification, and retrieval problems.
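As a minimal illustration of the Fourier-transform fundamentals covered in the first two parts, the sketch below computes a short-time Fourier transform of an audio file using the open-source librosa library. The file name is a placeholder; this is not code from the tutorial itself.

```python
import numpy as np
import librosa

# Load an audio recording (placeholder path) at a fixed sampling rate.
y, sr = librosa.load('example.wav', sr=22050)

# Short-time Fourier transform: the time-frequency representation that
# underlies many of the analysis and retrieval tasks discussed in the tutorial.
stft = librosa.stft(y, n_fft=2048, hop_length=512)

# Magnitude spectrogram in decibels: rows are frequency bins, columns are frames.
spectrogram_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)
print(spectrogram_db.shape)
```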
More material available here.
Meinard Müller studied mathematics (Diploma) and computer science (Ph.D.) at the University of Bonn, Germany. In 2002/2003, he conducted postdoctoral research in combinatorics at the Mathematical Department of Keio University, Japan. In 2007, he finished his Habilitation at Bonn University in the field of multimedia retrieval. From 2007 to 2012, he was a member of Saarland University and the Max-Planck-Institut für Informatik. Since September 2012, Meinard Müller has held a professorship for Semantic Audio Processing at the International Audio Laboratories Erlangen, a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the Fraunhofer-Institut für Integrierte Schaltungen IIS. His recent research interests include music processing, music information retrieval, audio signal processing, and motion processing. Meinard Müller was a member of the IEEE Audio and Acoustic Signal Processing Technical Committee from 2010 to 2015 and has been a member of the Board of Directors of the International Society for Music Information Retrieval (ISMIR) since 2009. He has co-authored more than 100 peer-reviewed scientific papers and wrote a monograph titled Information Retrieval for Music and Motion (Springer, 2007) as well as a textbook titled Fundamentals of Music Processing (Springer, 2015, www.music-processing.de).
Christof Weiß studied physics (Diplom) at the Julius-Maximilians-Universität Würzburg as well as composition (Diplom-Musik) at the Hochschule für Musik Würzburg, Germany. From 2012 to 2015, he worked as a Ph.D. student in the Semantic Music Technologies Group at the Fraunhofer Institute for Digital Media Technology (IDMT) in Ilmenau, Germany. His Ph.D. thesis deals with computational methods for tonality and style analysis in music recordings and was supervised by Prof. Karlheinz Brandenburg. In 2014, he visited the Centre for Digital Music at Queen Mary University of London for two extended research stays. Since 2015, Christof Weiß has been a member of the Semantic Audio Processing Group headed by Prof. Meinard Müller at the International Audio Laboratories Erlangen. He conducts research in a project on Wagner’s “Ring” cycle, a collaboration with the musicology department of the Universität des Saarlandes, Saarbrücken. His work as a composer encompasses pieces for orchestra, ensemble, and choir, as well as chamber music. In 2013, he was awarded a second prize in the “Pablo Casals” competition in Prades, France. He has received commissions from the Mozartfest Würzburg and the festival “Young Euro Classics” in Berlin. In 2012, he received the Youth Cultural Advancement Award of the city of Amberg, Germany. From 2007 to 2015, Christof Weiß was a fellow of the Foundation of German Business in its study and Ph.D. scholarship programs.
T4: So you want to conduct a user study in MIR?
This tutorial will consist of three main parts. In the first part, we will provide an overview of user studies in the ISMIR community as well as in other domains such as psychology, music sociology, musicology, library and information science, and HCI, highlighting important scholars and summarizing the major themes. In the second part, we will present an overview of different user research methods. We will cover methods commonly used in social science research (e.g., interviews, surveys, focus groups, written/audio journals, task-based experiments, and tracking of biological data), discuss the suitability of each method for different research projects, and examine the strengths and weaknesses of each method.
The third part will consist of an interactive session during which participants will be invited to brainstorm research questions that are relevant to their own MIR research projects and could be answered by conducting an interdisciplinary user study. Individual participants or groups of participants will present their ideas and receive feedback from the presenters as well as from other participants.
Andrew Demetriou is a research assistant in the Multimedia Computing research group at TU Delft. He completed a research master’s in social psychology at VU Amsterdam (2015), with a focus on biological data collection methods and mate choice/romantic attraction. His research has been published in Letters on Evolutionary Behavioral Science, the Journal of Crime and Delinquency, the Proceedings of ISMIR 2016, and the Proceedings of the 10th ACM Conference on Recommender Systems. His research interests include social and romantic bonding, optimal mental/physiological states (e.g. “flow”, mindfulness), and how music, along with biological and sensor data, can be used to study these phenomena.
Audrey Laplante is an associate professor at the Université de Montréal’s Library and Information Science School. She is a member of the Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT). She received a PhD (2008) and a Master’s (2001) in information science from McGill University and Université de Montréal, respectively. Her research interests focus primarily on the information-seeking behaviour of music researchers and amateurs, and on systems for music information retrieval and discovery. Her research has been published in a variety of outlets, including Library & Information Science Research, the Proceedings of ISMIR, the Journal of Documentation, and the collective book New Directions in Children’s and Adolescents’ Information Behavior Research (Emerald Group Publishing, 2014).
Sally Jo Cunningham is an associate professor of Computer Science at the University of Waikato (Te Whare Wānanga o Waikato), in Hamilton, New Zealand. Her research focuses on everyday, authentic information behavior over a range of media (text, music, images, and video). Sally Jo was advised by her flute instructor to choose a major other than music as an undergraduate; now she enjoys experiencing music through the experiences of other people with her MIR research. She is active in the digital libraries and human-computer interaction research communities—a member of the steering committee for JCDL; program co-chair for ICADL 2008, JCDL 2014, DL 2015, ISMIR 2017; general chair for ICADL 2017; chair of the IEEE/CS TCDL (2016-2017)—and has over 120 refereed research publications in digital libraries, music information retrieval, human-computer interaction, and machine learning.
Cynthia Liem is an Assistant Professor in Computer Science in the Multimedia Computing Group of Delft University of Technology, and pianist of the Magma Duo. She initiated and co-coordinated the European research project PHENICX (2013-2016), focusing on technological enrichment of symphonic concert recordings with partners such as the Royal Concertgebouw Orchestra. Her research interests include music and multimedia search and recommendation, and are increasingly shifting towards helping people discover new interests and content that would not trivially be retrieved. Beyond her academic activities, Cynthia gained industrial experience at Bell Labs Netherlands, Philips Research, and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships and the Google European Doctoral Fellowship 2010 in Multimedia, and was a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach. She will serve as general co-chair of ISMIR 2018, to be hosted in Delft, The Netherlands.
T5: Machine-Learning for Symbolic Music Generation
The goal of this tutorial is to present in a comprehensive way the challenges and techniques of using computers to generate musical content. Various kinds of techniques will be considered, from Markov models to deep learning models, with the goal of presenting both the state of the art and the current limitations and open problems. The tutorial will essentially cover symbolic music generation, with an emphasis on leadsheets, seen as the primary form of mainstream music, as well as polyvocal music.
We will cover two classes of models: first, Markov models, and in particular Markov Constraints models, which have been particularly successful at modeling monophonic material as well as leadsheets. We will describe the underlying models that can be learned efficiently and will illustrate them with many examples of generated music in various styles.
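To make the first class of models concrete, here is a minimal first-order Markov chain over MIDI pitches, estimated from a toy melody and then sampled to generate a new sequence. This is only an illustrative sketch with placeholder data; it is not the Markov Constraints framework presented in the tutorial, which additionally enforces global constraints on the generated sequence.

```python
import random
from collections import defaultdict

# Toy training melody as MIDI pitch numbers (placeholder data).
melody = [60, 62, 64, 62, 60, 64, 65, 67, 65, 64, 62, 60]

# Estimate first-order transitions: observed continuations of each pitch.
transitions = defaultdict(list)
for current, nxt in zip(melody, melody[1:]):
    transitions[current].append(nxt)

# Generate a new sequence by sampling from the learned transitions.
random.seed(0)
pitch = melody[0]
generated = [pitch]
for _ in range(15):
    pitch = random.choice(transitions[pitch])  # uniform over observed continuations
    generated.append(pitch)

print(generated)
```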
We will also cover deep learning models and their application to polyvocal music. After reviewing the basic components of deep architectures (neural layers, autoencoders, recurrent networks, etc.), we will describe how they can be used in a direct way, e.g., to produce musical accompaniment, or in a more indirect way (by controlling sampling, or by unit selection and aggregation, etc.) for finer control of generation. Various examples of architectures, experiments, and approaches will be analyzed and compared.
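As an equally small-scale illustration of the second class of models, the sketch below defines a recurrent network (an LSTM, written in PyTorch) that predicts a distribution over the next pitch given a pitch sequence. The dimensions and data are placeholders, and this is not one of the architectures analyzed in the tutorial.

```python
import torch
import torch.nn as nn

class NextNoteLSTM(nn.Module):
    """Predict a distribution over the next MIDI pitch given a pitch sequence."""
    def __init__(self, n_pitches=128, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(n_pitches, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_pitches)

    def forward(self, pitches):            # pitches: (batch, time) integer tensor
        x = self.embed(pitches)            # (batch, time, embed_dim)
        h, _ = self.lstm(x)                # (batch, time, hidden_dim)
        return self.out(h[:, -1, :])       # logits for the next pitch

model = NextNoteLSTM()
dummy_batch = torch.randint(0, 128, (4, 16))   # 4 toy sequences of 16 pitches
logits = model(dummy_batch)
print(logits.shape)                            # torch.Size([4, 128])
```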
More material available here.
Dr. Pierre Roy is a researcher in the Tech Creation Lab at Spotify in Paris, France. Pierre Roy has a background in pure mathematics and in Artificial Intelligence, Operations Research, and Machine Learning. He received his Ph.D. from the University Paris 6 – Pierre et Marie Curie. He designed and implemented EDS, a genetic-algorithm-based machine learning system for the automatic creation of acoustic features for the classification of sounds and music; the system is now embedded in many consumer electronic devices commercialized by Sony Corp. He is interested in applying problem-solving techniques to content generation, with a focus on the computer-assisted creation of texts and music. He has made many contributions to the domain of Constraint Satisfaction Problems (CSP), especially in relation to music generation. He implemented the BackTalk and BackJava constraint solvers, used in many text and music generation applications. In the late 1990s, together with François Pachet, he pioneered the automatic generation of playlists with global constraints, and they introduced Markov constraints, a global-constraint framework bridging the gap between statistical Markov models and complete problem-solving techniques. Markov constraints are a technique of choice for generating text or musical sequences with control over both stylistic consistency and structural properties.
Jean-Pierre Briot is a computer scientist trained in mathematics and computer science at Université Pierre et Marie Curie (Paris 6). He has had a long-standing interest in music and computer science since his PhD, conducted at IRCAM and Paris 6 in 1984. He is a CNRS Research Director at the Laboratoire d’Informatique de Paris 6 (LIP6), a research consultant for the Flow Machines project at Sony CSL, and a permanent visiting researcher at PUC-Rio University in Brazil. He is also a regular musician (jazz and Brazilian music).