The folded space of machine listening
Keywords: Machine listening, listening space, media archeology, acoustic fingerprint, audio watermarking, copyright detection algorithms.
The paper investigates new machine listening technologies through a comparison of phenomenological and empirical/media-archeological approaches. While phenomenology associates listening with subjectivity, empiricism takes into account the technical operations involved in listening processes in both human and non-human apparatuses. Based on this theoretical framework, the paper undertakes a media-archeological investigation of two algorithms employed in copyright detection: “acoustic fingerprinting” and “audio watermarking”. In the technical operations of sound recognition algorithms, empirical analysis suggests the coexistence of a multiplicity of spatialities: from the “sound event”, which occurs in three-dimensional physical space, to its mathematical representation in vector space, and to the one-dimensional informational space of data processing and machine-to-machine communication. Recalling Deleuze’s definition of “the fold”, we define these coexistent spatial dimensions in techno-culturally mediated sound as “the folded space” of machine listening. We go on to argue that the issue of space in machine listening lies in the virtually infinite variability of the sound event subjected to automatic recognition. The difficulty lies in reconciling the theoretically enduring information transmitted by sound with the contingent manifestation of sound affected by space. To make machines able to deal with the site-specificity of sound, recognition algorithms need to reconstruct the three-dimensional space on a signal processing level, in a sort of reverse-engineering of the sound phenomenon that recalls the concept of “implicit sonicity” defined by Wolfgang Ernst.
While the metaphors and social representations adopted to describe machine listening are often anthropomorphic (and the very term “listening”, when referring to numerical operations, can be seen as a metaphor in itself), we argue that both human listening and machine listening are co-defined in a socio-technical network, in which the listening space no longer coincides with the position of the listening subject, but is negotiated between human and nonhuman agencies.
Bengert, J., & Upward, A. (2003). Elec 499A. Perceptual Audio Project. http://ece.uvic.ca/~elec499/2003a/group09/index.htm.
Biancorosso, G. (2016). Situated Listening: The Sound of Absorption in Classical Cinema. Oxford: Oxford Scholarship Online.
Bishop, C.M. (2006). Pattern Recognition and Machine Learning. New York: Springer.
Cavanaugh, W.J., & Wilkes, J.A. (1999). Architectural Acoustics: Principles and Practice. Hoboken: John Wiley & Sons.
Davarynejad, M., Ahn, C.W., Vrancken, J., van den Berg, J., & Coello Coello, C.A. (2010). Evolutionary hidden information detection by granulation-based fitness approximation. Applied Soft Computing, Vol. 10, Issue 3, pp. 719-729. https://doi.org/10.1016/j.asoc.2009.09.001.
DeLanda, M. (2006). A new philosophy of society: assemblage theory and social complexity. London: Continuum.
Deleuze, G. (1993). The Fold: Leibniz and the Baroque. Trans. by T. Conley. London: The Athlone Press.
Desai, N., & Tahilramani, N. (2016). Digital Speech Watermarking for Authenticity of Speaker in Speaker Recognition System. 2016 International Conference on Micro-Electronics and Telecommunication Engineering (ICMETE). https://doi.org/10.1109/ICMETE.2016.13.
Di Scipio, A. (2013). Sound object? Sound event! Ideologies of sound and the biopolitics of music. Soundscape. Journal of Acoustic Ecology, 13: 10-14.
Ernst, W. (2016). Sonic Time Machines: Explicit Sound, Sirenic Voices and Implicit Sonicity. Amsterdam University Press.
Ernst, W. (2017). The Delayed Present: Media-Induced Tempor(e)alities & Techno-Traumatic Irritations of “the Contemporary”. Aarhus: Sternberg Press.
Ernst, W. (2018). Radical Media Archaeology: Its Epistemology, Aesthetics and Case Studies. Artnodes, 21, pp. 35-43. https://doi.org/10.7238/a.v0i21.3205.
Feaster, P. (2011). A compass of extraordinary range: the forgotten origins of phonomanipulation. ARSC Journal, XLII/ii.
LaBelle, B. (2008). Background Noise: Perspectives on Sound Art. New York: Continuum.
Latour, B. (2005). Reassembling the social. An introduction to actor-network theory. London: Oxford University Press.
Li, J., Deng, L., Gong, Y., & Haeb-Umbach, R. (2014). An overview of noise-robust automatic speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 22, No. 4, pp. 745-777. https://doi.org/10.1109/TASLP.2014.2304637.
Mackenzie, A. (2007). Protocols and the irreducible traces of embodiment: The Viterbi algorithm and the mosaic of machine time. In: R. Hassan & R.E. Purser (Eds.), 24/7: Time and temporality in the network society (pp. 89–106). Stanford, CA: Stanford University Press.
Meillassoux, Q. (2006). Après la finitude. Essai sur la nécessité de la contingence. Paris: Éditions du Seuil.
Morton, T. (2013). Hyperobjects. Minneapolis: University of Minnesota Press.
Ong, W.J. (1982). Orality and literacy. The technologizing of the word. London: Methuen & Co.
Peters, J.D. (2004). Helmholtz, Edison and Sound History. In: L. Rabinovitz, & A. Geil (Eds.), Memory Bytes. History, Technology and Digital Culture. Durham: Duke University Press. https://doi.org/10.1215/9780822385691-008.
Pieraccini, R. (2012). The voice in the machine. Building computers that understand speech. Cambridge: MIT Press. https://doi.org/10.7551/mitpress/9072.001.0001.
Schaeffer, P. (2017). Treatise on Musical Objects. Oakland: University of California Press.
Schalkwijk, J. (2018a). A fingerprint for audio. https://medium.com/intrasonics/a-fingerprint-foraudio-3b337551a671.
Schalkwijk, J. (2018b). Hiding data in sound. https://medium.com/intrasonics/hiding-data-insound-c8db3de5d6e0.
Schroeder, M. (1965). New method of measuring reverberation time. J. Acoust. Soc. Am., 37, pp. 409-412. https://doi.org/10.1121/1.1939454.
Sterne, J. (2003). The Audible Past. Cultural Origins of Sound Reproduction. Durham: Duke University Press.
Sterne, J. (2015). Space within Space: Artificial Reverb and the Detachable Echo. Grey Room 60: 110-131. https://doi.org/10.1162/grey_a_00177.
van der Maaten, L., Postma, E., & van den Herik, J. (2009). Dimensionality Reduction: A Comparative Review. TiCC, Tilburg University, The Netherlands.
van Tilborg, H.C.A., & Jajodia, S. (Eds.). (2011). Encyclopedia of Cryptography and Security. New York: Springer US. https://doi.org/10.1007/978-1-4419-5906-5.
Voegelin, S. (2010). Listening to Noise and Silence. Towards a Philosophy of Sound Art. New York: Continuum.
Walczyński, M., & Ryba, D. (2019). Effectiveness of the acoustic fingerprint in various acoustical environments. IEEE Signal Processing 2019: Algorithms, Architectures, Arrangements, and Applications (SPA). https://doi.org/10.23919/spa.2019.8936781.
The journal allows the author(s) to hold the copyright and to retain publishing rights without restrictions.