Jose Luis Crespo is one of the most prominent sound engineers in Spain, specializing in music recording and mixing, including film soundtracks in Dolby Atmos. He is highly sought after by leading composers and film directors in his country.
Antonio Pedrero is a multidisciplinary researcher with a background in engineering and musicology. He specializes in environmental acoustics, acoustic metrology, building acoustics, and virtual acoustic reality technologies. Since 1997, he has served as the technical director of the Acoustic and Vibration Laboratory at Universidad Politécnica de Madrid. He also works as an auditor of acoustic testing laboratories for the National Accreditation Entity. Dr. Pedrero is actively involved in the Spanish Society of Acoustics, where he has been chair since 2020, and in the Spanish Society of Musicology. He is a member of the board of the International Commission for Acoustics.
Xavier Serra is a Professor at the Universitat Pompeu Fabra in Barcelona, where he leads the Music Technology Group within the Department of Information and Communication Technologies. He earned his PhD in Computer Music from Stanford University in 1989, focusing on the spectral processing of musical sounds, a foundational work in the field. His research spans the computational analysis, description, and synthesis of sound and music signals, blending scientific and artistic disciplines. Dr. Serra is very active in the fields of Audio Signal Processing, Sound and Music Computing, Music Information Retrieval, and Computational Musicology at the local and international levels, serving on the editorial boards of several journals and conferences and lecturing on the current and future challenges of these fields. He received an Advanced Grant from the European Research Council for the CompMusic project, promoting multicultural approaches in music information research. Currently, he directs the UPF-BMAT Chair on AI and Music, dedicated to fostering Ethical AI initiatives that can empower the music sector.
From Audio Processing to Music Understanding – a Research Journey
Dr. Serra’s PhD research, carried out in the 1980s, focused on modeling complex sounds. Using spectral analysis and synthesis techniques, he developed a deterministic plus stochastic model that yields sonically and musically meaningful audio parameterizations. That research found practical applications in synthesizing and transforming a wide variety of sounds, including the human singing voice.
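As an illustration of this kind of decomposition (the notation below is generic, not taken from the talk), the signal is modeled as a sum of time-varying sinusoids, the deterministic part, plus a residual, the stochastic part:

```latex
% Generic deterministic plus stochastic signal model:
% R time-varying sinusoidal partials plus a stochastic residual e(t).
s(t) = \sum_{r=1}^{R} A_r(t)\,\cos\big(\theta_r(t)\big) + e(t)
```

Here A_r(t) and theta_r(t) are the instantaneous amplitude and phase of the r-th partial, estimated through short-time spectral analysis, and e(t) is typically modeled as filtered noise.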
As a natural progression of that research, in the 1990s it became interesting and relevant to analyze collections of sounds, aiming to describe and model the relationships between sound entities. To accomplish this, we incorporated machine learning methodologies to complement the signal processing approaches used until then. This research marked the beginning of the Music Information Retrieval (MIR) field, whose aim is to analyze and describe music collections.
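A minimal sketch of this idea, assuming a local folder of WAV files and using librosa and scikit-learn (illustrative choices, not tools named in the abstract): signal-processing features summarize each sound, and a simple machine-learning method groups similar ones.

```python
# Minimal sketch: describe a collection of sounds with spectral features
# and group similar ones with a simple machine-learning method.
# Assumes a local directory of WAV files (hypothetical); librosa and
# scikit-learn are illustrative choices, not tools named in the abstract.
import glob
import numpy as np
import librosa
from sklearn.cluster import KMeans

features = []
paths = sorted(glob.glob("sounds/*.wav"))  # hypothetical collection
for path in paths:
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    # Summarize each sound by the mean and std of its MFCC trajectories.
    features.append(np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)]))

X = np.vstack(features)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
for path, label in zip(paths, labels):
    print(label, path)
```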
In the 2000s, with the growth of the Web, scaling these analysis technologies gained importance. Our research group embarked on curating and leveraging large audio collections with which to conduct research in this direction and on developing efficient software tools supporting music search, retrieval, and recommendation systems, many of which gained relevance for the music industry.
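A minimal sketch of content-based retrieval over such a collection, with placeholder descriptors standing in for real ones (the library choice, catalogue, and names are assumptions, not part of the talk):

```python
# Minimal sketch of content-based music retrieval: given feature vectors
# for a catalogue (e.g., summaries like those in the previous sketch),
# return the items most similar to a query track. The catalogue here is
# random placeholder data.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
catalogue = rng.normal(size=(10000, 26))   # stand-in for real descriptors
track_ids = [f"track_{i}" for i in range(len(catalogue))]

index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(catalogue)
query = catalogue[42:43]                   # "more like this" query
distances, neighbours = index.kneighbors(query)
for d, j in zip(distances[0], neighbours[0]):
    print(f"{track_ids[j]}  distance={d:.3f}")
```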
As web-based music applications became globalized, it became clear that the existing research approaches and systems carried important cultural biases. Thus, in the 2010s, we started to refine music description methodologies by integrating domain knowledge from diverse music traditions. This research led to the development of culture-specific audio signal processing and machine learning approaches for analyzing music signals. These methodologies are of major relevance in the field of Computational Musicology, placing the emphasis on music understanding.
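One example of a culture-aware descriptor consistent with this line of work is a fine-resolution pitch histogram that does not assume 12-tone equal temperament; the sketch below is purely illustrative, with an assumed input file and pitch tracker.

```python
# Minimal sketch of a culture-aware descriptor: a fine-resolution pitch
# histogram (in cents) that does not presuppose 12-tone equal temperament,
# useful for traditions with microtonal intervals. The file name and the
# choice of pyin as pitch tracker are assumptions.
import numpy as np
import librosa

y, sr = librosa.load("recording.wav", sr=22050, mono=True)
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                             fmax=librosa.note_to_hz("C6"), sr=sr)
f0 = f0[voiced & ~np.isnan(f0)]

# Fold pitch into one octave and histogram it with 10-cent resolution.
cents = 1200 * np.log2(f0 / librosa.note_to_hz("C1"))
cents_folded = np.mod(cents, 1200)
hist, edges = np.histogram(cents_folded, bins=120, range=(0, 1200))
peak_bins = np.argsort(hist)[-5:]
print("most prominent pitch classes (cents):", sorted(edges[peak_bins]))
```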
In recent years, the emergence of deep learning techniques and large AI models based on self-supervised approaches has reshaped the research landscape. Presently, we are working on the development of large AI models trained on huge amounts of diverse multimodal music data that can capture the complex relationships that make up music. From those models, we can then develop smaller task-specific models to support applications related to the creation, production, distribution, access, analysis, or enjoyment of music. The challenge here is how to drive our research from an ethical perspective, putting the musician at the center while supporting all the stakeholders of the music sector.
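A common pattern matching this description, sketched here under assumed sizes and placeholder data, is to keep a large pretrained model frozen and train a small task-specific head on its embeddings:

```python
# Minimal sketch of deriving a small task-specific model from a large
# pretrained one: embeddings from a (frozen, hypothetical) foundation
# model feed a lightweight classification head. The embedding size,
# number of classes, and data are placeholders.
import torch
import torch.nn as nn

EMB_DIM, N_CLASSES = 768, 10           # hypothetical sizes

head = nn.Sequential(                   # the small task-specific model
    nn.Linear(EMB_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, N_CLASSES),
)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for embeddings precomputed by the frozen foundation model.
embeddings = torch.randn(64, EMB_DIM)
labels = torch.randint(0, N_CLASSES, (64,))

for _ in range(5):                      # a few illustrative training steps
    optimizer.zero_grad()
    loss = loss_fn(head(embeddings), labels)
    loss.backward()
    optimizer.step()
    print(f"loss: {loss.item():.3f}")
```

In practice the embeddings would come from a real pretrained audio model rather than random tensors, and the head would be trained on labeled data for the chosen downstream task.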
In this talk we will go through this long research journey, highlighting some of the most relevant developments and giving our view on past and current trends in this area of research.