Comparing Field-Recordings: Listening in and listening to Hong Kong


Strolling through street markets in Mong Kok, bypassing the cacophony of EDM from Lan Kwai Fong’s nightclubs, slurping soup noodles in a local restaurant in Sai Ying Pun or taking a breather in Sun Yat Sen Memorial Park: while recording Hong Kong city sounds with Soundman OKM II binaural microphones, I asked myself many times what I was actually doing. In two fieldwork sessions, one in February 2017 and one in March 2018, I collected sound recordings from various places in Hong Kong, with a focus on the quarters of Sai Ying Pun on Hong Kong Island and Mong Kok on the Kowloon peninsula. While most parts of Hong Kong are characterized by a high density of inhabitants, Mong Kok stands out with up to 130,000 inhabitants per square kilometer. On top of that, it is perennially frequented by both tourists and locals for shopping, food and entertainment. Sai Ying Pun is a gentrified residential neighborhood that has become increasingly popular with expats due to the newly built MTR connection to the city center. It is also densely populated (57,000 inhabitants per square kilometer) and characterized by pencil towers (cf. Christ and Gantenbein) that are located along the steep slopes of Victoria Peak and on reclaimed land along Victoria Harbour. Yet its streets are less crowded than those in Mong Kok, and especially up the hills it can be relatively quiet. Analyzing the sounds of these neighborhoods provides a means of comparing intensities of activity and usage of public space while taking time into account. The cases of Mong Kok and Sai Ying Pun are especially interesting because, while both are shaped by their vertical expansion, Mong Kok has a much higher number of visitors moving in and out horizontally. Sai Ying Pun instead feels more like a community, where people walk from flat to shop and back to flat, but due to the high number of flats it still has a density of shops high enough to be characterized as downtown.
Roughly, Sai Ying Pun could be categorized as a residential area with a lot of shops, Mong Kok as a shopping area with a lot of residents.

The use of in-ear microphones makes it possible to reproduce the heard sounds three-dimensionally when played back over headphones. This facilitates understanding the actual experience of listening in places, even though it cannot substitute for it. Listening with or without the microphones are two different things, and as I tried to listen to the city as casually as possible, the microphones felt like something that would interfere with a day-to-day Hong Kong listening experience. It seems that sound recordings are useful for typical soundscape studies, in which different city sounds are located, contextualized and distinguished into keynote sounds, signals, and soundmarks (cf. Schafer). The actual experience of listening in a city, by contrast, seems unreproducible.

In this paper, I want to address different aspects of comparison that are crucial for an anthropology of sound. I want to show that listening always involves a form of comparison. I argue that the comparability of sound recordings was crucial for the development of what is now called ethnomusicology. I want to stress the difference between listening to sound recordings and listening in situ. I will try to illustrate my point with examples from my research concerning the soundscape of density in Mong Kok and Sai Ying Pun.

Uetz, A. (2019). Comparing Field-Recordings: Listening in and listening to Hong Kong. Contour Journal, (4).
1. Listening as comparing

In order to elaborate how listening involves forms of comparison, it is important to highlight the difference in definitions of listening and hearing first.

Hearing can be defined as an ability that is based on having eardrums and a brain that can transform sound waves into neural activity. Listening, by contrast, requires the skills of hearing and of focussing on heard sounds. It is the attempt to make sense of the transmitted sound waves: to compare them with memorised sound figures, to contextualise or filter them, to localise sound sources. Listening involves a sense of time, aesthetics and taste. Comparison is always part of a listening practice: I walk through the bar area of Lan Kwai Fong on Sunday morning and I find it quieter than it was on Saturday night. While hearing is immediate, listening is mediated. Following Emily Thompson [3] and Alain Corbin [4], there is a difference between what people hear (the sound of their era and environments) and what people listen to (the meaning of these sounds for their daily lives). Building their research on historical sources, both gather information on what could have been heard by referring to artefacts, inventory lists, maps, building plans, and descriptions of movement, city life and politics. (Thompson focuses on architecture and technology, Corbin on church bells and village structures.) What is heard relates to factual things: to architecture, materials, places. To find out how people were listening, both turn to reports that refer to emotions toward sounds (Corbin mostly to court transcripts, Thompson to newspaper reviews). There are many sub-definitions or techniques of listening, e.g. scientific modes of listening such as monitory, diagnostic, exploratory, and synthetic listening [5] (p. 14), acousmatic listening [6] as listening without seeing the sound sources, or reduced listening [7] as listening without identifying the sound sources. Listening is evidently quite a sophisticated skill and, considering musical pleasures or delectable storytelling, also requires some sort of education and practice.
Such a definition of listening is, however, challenged by the fact that most of our listening is itself ubiquitous [8], which means that even when we seemingly just hear sounds we also listen to them, and vice versa. The main point for now is to stress the involvement of comparison in all of the described forms of listening, whether it is comparing the sounds with an ideal or stereotype or comparing them with undefined sounds, as when listening to the development of sounds in experimental music.

This differentiation between listening and hearing is crucial for my research in Hong Kong, because I want to know what the city’s sounds signify to people and thus what sounds they are listening to. Sound recordings tell us what people hear; to find out what they are listening to, I need to go along [9] (p. 463) with them, ask questions, let them listen to my recordings and comment. The hope is that by following people in their daily routines, by walking through a neighborhood with inhabitants, their listening focus will appear. I then also compare what locals listen to with my own listening experiences.

Lucius Burckhardt’s Promenadology explores how landscape is aesthetically and cognitively apprehended while taking a stroll. Its main point is that landscape is something created in the mind by comparing sites with a culturally shaped expectation. Landscape would therefore be a subjective aesthetic experience rather than a factual site [10] (p. 307). The aesthetic experience of a city could similarly be distinguished from its factual buildings. In parallel, the soundscape seems to be constructed by our perception while listening in situ: it is the comparing of the heard sounds with an expectation of what a place should sound like. This is also why, if we cut sounds out of context with audio editing, we might not be able to figure out their source. This counters Ingold’s [11] critique of the term soundscape as a paradoxical separation of sound and place, which binds the fluctuating medium of sound to the static idea of the landscape but also reduces the manifold senses involved in the experience of landscape to hearing.

Listening in an urban environment is a multi-sensorial activity. To illustrate this, I refer to a sound walk with a local musician on Queen’s Road West in Sai Ying Pun. When we passed a group of people who were discussing loudly while unloading boxes from a truck, she commented: “Dodgy.” When I now listen acousmatically to the sounds of the recording, that is, without being able to see the sound sources [6], I have no way of knowing whether she reacted to the sound of their loud voices, or whether she semantically listened to what they were saying in Cantonese, or whether she reacted to what she saw rather than what she heard. When I let her listen to the recording later, she commented that it was probably more the visual impression that led her to this comment, but that the sounds alone were quite upsetting as well. To understand the nuances of listening it is helpful to combine different listening strategies, e.g. reduced listening [7] (listening without identifying the sources of sounds), semantic listening (ibid.) or listening in place.

2. Comparing sound recordings

The invention of the phonograph by Thomas Edison in the late 19th century gave rise to the development of the academic discipline of comparative musicology (Vergleichende Musikwissenschaft), which later became known as ethnomusicology. The availability of sound recordings from cultures all over the globe raised the hope that, through comparison of scales, rhythm and other musical parameters, a global history of music could be found [12] (p. 225). The phonograph could only reproduce a small range of mid frequencies and was only suitable for louder instruments or loud singing. Moreover, the recording time was limited to about two minutes by the length of the wax cylinders, and recordings were unstable in tempo and pitch due to the mechanical spring drive. Field recordings from the early twentieth century had to be staged and were far from the unobtrusive ideal of field recording made possible by tape and later digital technology. Yet the new technology was praised as progress towards scientific objectivity [13] (p. 81).

A similar macro, etic or almost quantitative approach to collecting and comparing music was promoted in the 1960s by Alan Lomax, who used a grid to compare music from different cultures:

This grid was not designed to replicate the music, already accurately recorded on tape, but to rate it on a series of rating scales (loud to soft, tense to lax, etc.) taxonomically applicable to song performance in all cultures. Thus song can be compared to song, song to speech, and hopefully to other aspects of behavior [14] (p.3).

Lomax believed that the style of music reveals truths about the culture behind it. In style, he finds knowledge that goes beyond language; music becomes the key to understanding cultural practice [14] (p. 12).

Opposition to the etic, quantitative and structuralist-influenced comparison of sound recordings from different places and cultures comes from ethnographers and anthropologists who practice emic and qualitative case studies. As a form of “doing anthropology in sound”, Steven Feld in the 1990s [15] not only spent years with the Kaluli, an indigenous tribe in the Bosavi region of Papua New Guinea, but also recorded hours of sounds, from songs to working sounds, birds, rain and other sounds of the rainforest. In this context, he was able to show how the musical tradition of the Kaluli and the soundscape of the Bosavi region correlate. Interestingly, Feld is not opposed to a global comparison of sounds, but stresses the necessity of first understanding musical practice within its local circumstances:

I think we need to pioneer a qualitative and intensive comparative sociomusicology, without reified and objectified musical and social structural trait lists, without unsituated laminations of variously collected and historically ungrounded materials. Comparative sociomusicology should take the tough question and sort them out with the best materials available for detailed comparison: the thorough, long-term, historically and ethnographically situated case study. The meaningful comparisons are going to be the ones between the most radically contextualized case examples, and not between decontextualized trait lists [16] (p. 180).

With the “Voices of the Rainforest” CD, Feld [17] tried to compile his experience on an album with eleven tracks that represent a day in the Bosavi rainforest. In my opinion, the project does show the correlation between the soundscape and the musical practice and can thus be seen as a form of doing anthropology in sound. What it can convey only to a limited degree is the experience of hearing while being in the rainforest. Would the subtropical heat influence the ears? Could one enjoy the sounds in a similar way when busy swatting mosquitos?

While musical practice is contextualised in the local soundscape, the process of recording and editing detaches sounds from their origin. This de-contextualisation doesn’t have to be negative, as it allows better analysis and comparison of sounds that otherwise couldn’t be studied as conveniently. This applies not only to field recordings but to recorded music in general. Eisenberg, for example, emphasises that phonography and the radio helped to convince people that jazz was not just entertainment but music; while it was initially only heard through a filter of liquor and dope in nightclubs, it could then be listened to analytically [18] (p. 60).

Still, what should be avoided is the trap of treating sound recordings as authentic representations of the hearing experience. While a typical studio production does not even try to reproduce a live concert but creates something of its own, a live recording represents the concert but cannot reproduce the hearing experience in the concert hall. In the same way, a soundscape recording can only represent the soundscape; it cannot reproduce the experience of hearing in place.

3. Listening in place

Listening in place refers to the locality in which one is listening, a geographically situated place as distinct from the notion of space as an abstract dimension, such as cyberspace [19] (p. 3). Even though one cannot escape being in a place, one can avoid listening to this place by using headphones, which allow one to listen to other sound spaces, to be immersed in one’s own audiotopia [20] (p. 529). Such an audiotopia is in fact the negation of the soundscape of place. When in the field, the musicologist is listening to the soundscape. What I was doing in Hong Kong was listening in Hong Kong as a place. When I now listen to the recordings, by contrast, I am listening to Hong Kong as a space, to a memory captured in sound.

Being immersed [11] (p. 10) in the sound of Hong Kong as a place is fundamentally different from listening to its recordings. Being in a place requires actual physical presence. I emphasize my presence in a place through my attention, by listening to the sounds I am immersed in.

When comparing field recordings, I compare Hong Kong as a space, I compare the sound of different spaces or moments in Hong Kong. While doing so, I ignore the soundscape of the place that I am currently in. Being immersed in the soundscape is listening in a place, which is different from listening to the sound-space of a recording.

The ontological difference between these two modes of listening correlates well with Ingold’s distinction between anthropology and ethnography:

Anthropology is studying with and learning from; it is carried forward in a process of life, and effects transformations within that process. Ethnography is a study of and learning about, its enduring products are recollective accounts which serve a documentary purpose [21] (p. 3).

For Ingold, anthropology is being in place; it is doing rather than studying, learning by doing, making things, a performance rather than observation; it consists of emic case studies. Ethnography, by contrast, is collecting and comparing data; it is observing from the outside. It can be either qualitative or quantitative fieldwork, but it is characterised by a certain distance from the object of study.

Listening in place seems similar to this form of anthropology. Listening in Hong Kong was like a performance, an engaging and alienating experience, at most a participant observation but predominantly a delirious search for orientation in a maze of dense and intertwining sounds. Listening was selecting and filtering sounds, it was walking with open ears while the body could influence the heard soundscape by moving, resting, accelerating, escaping or getting closer. Wearing binaural in-ear microphones, I was aware that my movements were shaping what can be heard on the recording. Thus, walking developed into a means of composition, at least a process of selection. It couldn’t be much further away from the objectivity the pioneers of comparative musicology were talking about.

Yet the recordings I made, as arbitrary and subjective as they might be, can now be compared and analysed; they become ethnographic data. I can now listen to myself listening in Hong Kong. But with that, the immediacy of listening in Hong Kong is lost. How could I close the gap between these two forms of listening? How can we combine emic anthropological research with a comparative ethnographical approach?

4. Comparing Mong Kok and Sai Ying Pun

In order to gain knowledge about the impact of vertical densification on the soundscape, I consider both recordings and impressions from walks in Mong Kok and Sai Ying Pun. I compare recordings and field notes from different times at the same place: for example, a recording from Sai Yeung Choi Street South in Mong Kok at the weekend, when it is turned into a pedestrian area crowded with buskers competing for shoppers’ attention, with a recording from the same spot on a Monday morning. Or I compare the sounds and impressions from a rooftop in Sai Ying Pun to those of a rooftop in Mong Kok under similar conditions, e.g. on a weekday afternoon. I compare sounds from different heights within buildings to understand differences in residents’ noise exposure, or I record in parks on various occasions to check for options of tranquillity within walking distance. On an abstract level, that means I explore fluctuations of sound in one location as well as differences between sounds in various locations under similar conditions (ideally this would be at the same time, but I cannot be in multiple places at one moment).

One of the most interesting findings was that more similarities can be found in sound recordings than when listening in the field. I am guessing this has to do with how different sensory impressions influence each other, which makes the experience in either Mong Kok or Sai Ying Pun unique. Having lived in both neighborhoods and spent more or less the same amount of time in them, I am familiar with and attached to both, and I find things that I like in each. But the intensity of the overwhelming sensory stimulation in Mong Kok, due to its vibrancy and crowdedness, makes Sai Ying Pun seem like a quiet place in comparison.

When listening to sound recordings from these neighborhoods in juxtaposition, the picture becomes less clear. Both areas have parts with high noise levels, and both have parts that are more vibrant, where people are chatting and laughing at night, for example. Of course, Sai Ying Pun doesn’t have a Sai Yeung Choi Street South with buskers, but it has bars and restaurants that play loud music now and then, and it has High Street with drinking expats on a Saturday night. Putting spectrograms of recordings from those areas next to each other, we don’t see that many differences.
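
The side-by-side reading of spectrograms described here can also be approximated numerically. The following is only a minimal sketch and not the procedure used in this study: it assumes that each recording is available as a plain list of audio samples, and it uses short synthetic tone mixtures as stand-ins for real field recordings. All function names (frame_spectrum, spectrogram, mean_spectrum, cosine_similarity, spectral_similarity, tone_mix) are hypothetical. A spectrogram is averaged over time into a long-term spectrum, and two such spectra are compared with a cosine similarity between 0 (no shared frequency content) and 1 (identical spectral shape).

```python
import cmath
import math


def frame_spectrum(frame):
    """Magnitude spectrum of one Hann-windowed frame (naive DFT)."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(samples, frame_size=64, hop=32):
    """One magnitude spectrum per overlapping frame of the signal."""
    return [frame_spectrum(samples[i:i + frame_size])
            for i in range(0, len(samples) - frame_size + 1, hop)]


def mean_spectrum(spec):
    """Average the spectrogram over time into a long-term spectrum."""
    return [sum(column) / len(spec) for column in zip(*spec)]


def cosine_similarity(a, b):
    """Cosine of the angle between two spectra (0.0 if either is silent)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def spectral_similarity(clip_a, clip_b):
    """Compare two clips by the shape of their long-term spectra."""
    return cosine_similarity(mean_spectrum(spectrogram(clip_a)),
                             mean_spectrum(spectrogram(clip_b)))


def tone_mix(components, sample_rate=8000, seconds=0.1):
    """Synthetic stand-in for a recording: a sum of (amplitude, Hz) sines."""
    n = int(sample_rate * seconds)
    return [sum(a * math.sin(2 * math.pi * f * t / sample_rate)
                for a, f in components) for t in range(n)]


# Assumption: two clips sharing a low hum and a mid-range tone, one unrelated clip.
clip_a = tone_mix([(1.0, 250), (0.5, 1000)])
clip_b = tone_mix([(0.8, 250), (0.6, 1000)])
clip_c = tone_mix([(1.0, 3000)])

print("a vs b:", spectral_similarity(clip_a, clip_b))
print("a vs c:", spectral_similarity(clip_a, clip_c))
```

A high score for two excerpts only says that their frequency content is distributed similarly; like the spectrograms themselves, it says nothing about how the places were experienced while listening in situ.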

Consider, for example, two short sound excerpts that were both recorded on Saturday afternoons, when the number of people is at its peak. One was recorded in Mong Kok, walking north on the right side of Fa Yuen Street on 10 March 2018 at 3:52 p.m. while crossing Shan Tung Street. The other was recorded in Sai Ying Pun, walking south (up the hill) on the left side of Centre Street on 17 March 2018 at 12:09 p.m. while crossing Queen’s Road West.

Comparing just these two snippets of audio, both neighborhoods sound very similar. It would be very difficult to tell which is which without knowing. In fact, I find similar passages when listening to hours of recordings from different parts of the city. This mix of traffic sounds, the clicking of the pedestrian-crossing signals, boxes being unloaded from truck to trolley to shop, people shouting and talking in Cantonese, and the low-pitched buzz of air conditioners forms something like a Hong Kong keynote sound [2].

The local differences appear in the bigger picture through contextualization. When listening in place, this context is given through the combination of visual, tactile, olfactory and gustatory impressions. We filter and select what we pay attention to with all our senses. The senses influence each other; they can amplify or mute, and draw attention to certain phenomena while letting other things slip [22]. This is one reason why walking in Mong Kok feels more overwhelming, even if the sound level and density of sounds are similar to those in Sai Ying Pun.

When listening to field recordings, contextualization can be provided by memory, photographs, fieldnotes, maps, etc. In my case it is also provided by the raw, long field recording itself. Here promenadology comes into play. Because I was always recording while walking through neighborhoods, I now have sonic blueprints of their diversity. Even though these two excerpts sound very similar, the parts that were cut tell a different story. Whilst most streets in Mong Kok are busy 24/7, some parts of Sai Ying Pun are often quiet and some parts are only loud during the opening hours of their shops, bars and restaurants. In my subjective experience as a part-time resident of both quarters, I felt that the quality of life in Sai Ying Pun is higher not because of the absence of noise, but thanks to the possibility of finding both vibrancy and tranquillity within minutes of walking. I argue that vertical expansion makes this vibrancy possible by gathering enough potential customers within the residential area (the inhabitants of the skyscrapers) to allow a wide variety of restaurants and shops to be run there. At the same time, traffic stays moderate because people move on foot within the neighborhood. Besides, when parks are surrounded by high-rise buildings, these can also block traffic noise from bigger streets and provide relatively quiet retreats from straining city life.

The overwhelming experience of walking through Fa Yuen Street on a Saturday afternoon comes from the encounter with a vast number of tourists and locals from all over Hong Kong who are shopping for bargains on the latest sneakers (hence the nickname Sneaker Street). It also comes from the hundreds of minibuses and buses that bring people in and out of the neighborhood.

For me, listening to those recordings also helps me to remember what it was like to listen in place and to spot the differences. In fact, highlighting these differences is one of the bridges between listening in place and listening to sound recordings. Listening to the recordings of my first field trip in 2017 shaped the way I was listening during my second field trip in 2018. It’s like a hermeneutic circle based on comparison, going back and forth between recordings and listening in place.

5. Listening beyond comparison

Going back to Steven Feld’s idea of doing anthropology in sound, I want to rethink the heuristic value of the sound collages on “Voices of the Rainforest”. Despite the de-contextualisation of sounds in the recording process, the attempt to recompose the listening experience in the field goes beyond ethnographic comparison. The re-contextualisation of musical practice within the possibilities of studio production is in itself a form of art as soundscape-composition. Similar to poetry that tries to go beyond coherency, beyond making sense, beyond transporting a message [23] (p. 9), the soundscape-composition tries to go beyond comparison, beyond identification and denomination of sounds. It does not reduce songs to scales and tempo analysis; instead it artificially creates or enhances the ritual context with the help of studio technology, e.g. the amplification of river sounds as the claimed inspiration for a song [17].

Considering the act of listening, of walking in the shoes of residents, of strolling around in a city as anthropological fieldwork, the re-composition of that experience, be it with sounds or with words, could be a heuristic alternative. The later comparison of different sound recordings closes the gap between anthropological re-enactment and ethnographical data collection. Writing about the experience of strolling in Hong Kong is a reflection on an activity rather than the documenting and collecting of data.

Strolling around with binaural in-ear microphones and reflecting on it is actually “doing anthropology in sound.” Comparison helps to go beyond the singular experience and helps to generate data of ethnographical quality.

If we go back to Burckhardt’s concept of landscape as a selection of visual perception and adapt it to the soundscape, then the soundscape would be dependent on our selective attention [22] to the sounds that we consider essential for the particular place. For example, if we expect to hear a sound recording from a French village, we might focus on the French tourist talking in Sun Yat Sen Memorial Park and really mistake the place for a French village. On many recordings, there are sounds which are hard to ignore due to their volume or character, but expectation can shape the way we hear.

In order to break or change expectations, it is helpful to change listening habits, e.g. to listen to very short clips of sounds from a longer recording, filter certain sounds out or change the volume of specific frequencies, change the speed of the recording, etc. Listening beyond comparison could also mean embracing schizophonia [24] (p. 7): the deliberate de-contextualisation of sounds from their origin in order to challenge hearing practices. There are endless possibilities for experimentation with sound editing programs: amplification or muting of frequencies, adding reverberation to change the sound-space, using filters or distortion, and many more.
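
Three of the manipulations named above (isolating a very short clip, changing the playback speed, and muting a frequency range) can be sketched in a few lines. This is an illustrative sketch only, assuming a recording that has already been loaded as a list of samples; the function names (short_clip, change_speed, low_pass, rms) are hypothetical, the speed change uses simple linear interpolation, and the filter is a basic one-pole low-pass rather than the more refined filters an audio editor would offer.

```python
import math


def short_clip(samples, sample_rate, start_s, duration_s):
    """Cut a short excerpt out of a longer recording."""
    start = int(start_s * sample_rate)
    return samples[start:start + int(duration_s * sample_rate)]


def change_speed(samples, factor):
    """Resample by linear interpolation; factor 2.0 plays twice as fast
    and, as with tape, correspondingly higher in pitch."""
    last = len(samples) - 1
    out = []
    for i in range(int(len(samples) / factor)):
        pos = i * factor
        j = min(int(pos), last)
        frac = pos - j
        out.append(samples[j] * (1 - frac) + samples[min(j + 1, last)] * frac)
    return out


def low_pass(samples, sample_rate, cutoff_hz):
    """One-pole low-pass filter: attenuates content above cutoff_hz."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Assumption: one second of synthetic sound stands in for a field recording.
RATE = 8000
hum = [math.sin(2 * math.pi * 50 * t / RATE) for t in range(RATE)]
hiss = [math.sin(2 * math.pi * 3000 * t / RATE) for t in range(RATE)]
recording = [a + b for a, b in zip(hum, hiss)]

excerpt = short_clip(recording, RATE, start_s=0.25, duration_s=0.5)
faster = change_speed(excerpt, 2.0)
muffled = low_pass(recording, RATE, cutoff_hz=200)
print(len(excerpt), len(faster), rms(recording), rms(muffled))
```

Each step deliberately removes a sound further from its source, which is the point of the exercise: the edited clip is meant to be heard afresh, not to document the place.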

To recapitulate the steps in this paper: First, comparison was found to be fundamental to the practice of listening. Then the comparison of sound recordings was shown to be an essential method in ethnomusicology. Yet the recording of sound was identified as a separation of sound and place, which could compromise soundscape studies. Finally, it was proposed that an anthropology of or in sound should go beyond the comparison of sound recordings, be it through compositional recontextualisation, a decontextualised focus on particular sounds or in other ways. Last but not least, field recordings can be compared not only with other recordings but also with one’s own memories and notes from the fieldwork. Sound recordings facilitate analysis, but they can’t replace study in the field, in my case listening in the place of Hong Kong.

Declaration of sources of funding and conflicts of interest:

This article is based on research for a PhD-Project (Working Title: Soundscape of Density) at the Institute of Musicology at Berne University. It is part of the research project “Sound, Density, and the Environment” funded by the Swiss National Science Foundation (SNF).


