A Journal of Rhetoric, Writing, and Culture

Layering Additional Tracks: A Review of Steph Ceraso’s Sounding Composition: Multimodal Pedagogies for Embodied Listening

Kati Fargo Ahern, SUNY Cortland

(Published March 9, 2020)

“But what happens when you have a situation that, like the Goethe and Schelling adage that ‘architecture is nothing but frozen music,’ becomes reverse engineered, remixed into a different scenario—and we thaw the process. Music becomes liquid architecture. Sound becomes unbound.”—Miller, 2008, p. 7


In Steph Ceraso’s much-awaited book, Sounding Composition: Multimodal Pedagogies for Embodied Listening, she offers readers an array of listening topics connected to embodiment, sonic design, and acoustic engineering, as well as research spanning composition and rhetoric, sound studies, materiality, multimodality, and multisensory experience. Despite the fluidity in bringing together such diverse scholarship and sonic artifacts, Ceraso’s book is tightly constructed around three chapters: “Sounding Bodies,” “Environments,” and “Sonic Objects.” This highly structured and dynamic approach offers readers an experience akin to what Miller refers to in the epigraph above—sound unbound by disciplinary silos, yet organized, like liquid architecture.

In addition to her three main chapters, Ceraso includes pedagogical interchapters, which she refers to as reverberations, so named for “the persistence and blending together of ideas from each chapter after it has ended” (Ceraso 11). In combination, these elements of Ceraso’s book help to concretize her overarching question: “How can we help students to cultivate relevant listening practices that allow them to capitalize on the affordances of sound in digital contexts while retraining them to become perceptive listeners-composers in any setting?” (5).

At the end of her introduction, Ceraso offers the following call to action:

Ultimately, the ideas I share in this book are intended to create a BOOM that will shake and unsettle disciplinary approaches to sound and listening. My hope is that this felt noise will result in a more imaginative, inclusive, and transformative sonic education. Like sound itself, the pedagogical suggestions and practices I offer are malleable and should be revised and altered to meet the needs of different disciplines, teachers, and students. In short, this book should be played with, not simply read. I want to encourage readers to sample, mangle, hack, remix, and reinvent its contents. (13)

In this book review, I take up Ceraso’s invitation, not through hacking or remixing, but through what I hope is a layering of sounds. Rather than relying on visual-centric notions of knowledge or even the practice of “identifying gaps” in a work, I conceive of the book review as a sonic relationship. Here I am “layering in additional tracks” based on polyphony, simultaneity, and Ceraso’s own sense of the amplitude of a “felt noise.” My “tracks” loosely follow Ceraso’s chapter structure in exploring 1) people, in the form of a “mixtape” or “playlist” of additional scholarly “pieces,” 2) places, through visual cartography of the fields of research presented, and 3) transcripts and captioning assignments as designed objects. Finally, I will conclude with one last “track,” an entirely nonverbal soundscaping of Ceraso’s chapters as an exercise in transcript design.

Airek Beauchamp, of the sound studies blog Sounding Out!, writes: “Ceraso’s ontology re-centers all experience–and thus the rhetoric and praxis of communicating that experience–back into the whole body.” By layering in additional tracks to people, places, and designed sonic objects, I am not so much re-viewing Sounding Composition as I am communicating my multimodal, sensory experience with the generative and capacious nature of Ceraso’s work.

1. Adding Tracks: People

While no single monograph can cover all extant research or complementary research areas, Ceraso’s book manages to move with ease from sound studies to composition and rhetoric to material rhetoric, multimodal composition, musicians, sound designers, and other sound practitioners. There is an enviable range of scholarship afforded presence and space within her work. In the interest of “layering in additional tracks,” however, this section presents some additional voices. While this section may look like an annotated bibliography, I hope it will function more like a mixtape or playlist. I’m borrowing the “mixtape” approach from David Green who, in reviewing Kynard’s book in Composition Studies, writes:

While I have often used “mixtape” as a way of getting students to think differently about the texts they read and write about in class, Kynard presents, by my account, the first intellectual mixtape monograph. In hip-hop music the mixtape has historically served as an unconventional collection of remixed, unreleased, or original music. Unlike commercially released albums, mixtapes usually target audiences invested in experimentation and playful and critical thought. (162)

I offer the following four pieces with that idea of experimentation, play, and perhaps connections across different audiences. In keeping with one of the great achievements of Ceraso’s book, I’ve kept this “mixtape” focused on composition and rhetoric work that fuses theory and pedagogy.


Sheridan, David M., Ridolfo, Jim, & Michel, Anthony J. The Available Means of Persuasion: Mapping a Theory and Pedagogy of Multimodal Public Rhetoric. Anderson, SC: Parlor Press, 2012.

While not exclusively focused on sound, The Available Means of Persuasion offers a framework for considering multimodal pedagogy, asking students to compose public texts that consider kairos, circulation, and the use of different technologies, media, and means of production (Sheridan et al. 116-117). This attitude toward composing as expansive and multisensory aligns with Ceraso’s own argument for Jody Shipka’s concept of “multimodal soundness.” Ceraso notes: “Employing a theory of multimodal soundness involves asking students to experiment with the rhetorical effects of sound in various contexts, attend to how sound is integrated with other modes and materials, and consider how their compositional choices influence the design as a whole” (130). Both texts seek to expand notions of what it means for students to write, compose, or design.


Halbritter, Bump. Mics, Cameras, Symbolic Action: Audio-Visual Rhetoric for Writing Teachers. Parlor Press, 2012.

Bump Halbritter’s book complicates the choices involved in recording audio (and video), such as pickup patterns and the rhetorical reasons for wanting an omnidirectional, cardioid, or hypercardioid pattern. On the surface, this may seem overly technical, but these discussions create a basis for judging principles of “matching listening systems to listening situations.” Ceraso is similarly concerned with “listening situations” and how the acoustics of a built environment must match its embodied practices (Ceraso 77). For instance, she discusses how churches must support a focus on speech, on music, or on a mix of the two, and how each requires a different level of reverberant acoustic design.


Green Jr, David F. It's Deeper Than Rap: A Study of Hip Hop Music and Composition Pedagogy. [Dissertation], 2011.

Although his dissertation outlines numerous connections between hip hop and composition pedagogy, David Green also strongly cautions against appropriating these practices decontextualized from “cultural implications of the construction [of practices like sampling] and its roots in the African American signifyin tradition” (Green 131). Thus, while he discusses crate-digging, cipha, mixtapes, and emceeing within a writing pedagogy, it is also through notions of embodiment and situated context within cultural and community practices. In her reverberation chapter on an assignment called “My Listening Body,” Ceraso notes a tendency for her students to decontextualize listening from embodied identity. She reflects on the need to better emphasize Jennifer Stoever’s concept of the “embodied ear,” and “that physical experience and embodied identity are not separate” (Ceraso 64).


Rodrigue, Tanya K., Artz, Kate, Bennett, Julia, Carver, M.P., Grandmont, Megan, Harris, Dan, Hashem, Danah, Mooney, Anne, Rand, Mike, & Zimmerman, Amy. “Navigating the Soundscape, Composing with Audio.” Kairos: A Journal of Rhetoric, Technology, and Pedagogy, vol. 21, no. 1, 2016. http://kairos.technorhetoric.net/21.1/praxis/rodrigue/index.html

This elaborate webtext not only gives presence to graduate student and future teacher voices, but also includes theoretical sections, sonic strategies for composing, nine audio projects, and reflections. Alongside the rich complexity and practical use of Ceraso’s reverberation sections, Rodrigue et al.’s webtext offers an obvious companion space for thinking about, reading about, and listening to composed audio pieces. While Ceraso’s reverberations feature discussion of her students’ work becoming “listeners-composers,” “Navigating the Soundscape” also gives voice to students as “teacher-composers.”


There are many, many other voices. An additional place to continue such layering work is Jon Stone’s 2011 HASTAC post linking to a shared bibliography of published pieces on sound: https://www.hastac.org/blogs/jwstone/2011/07/09/sound-studies-bibliography-and-wiki

2. Adding Tracks: Mapping Fields and Places

Stepping away from the mixtape metaphor, this “track” considers a visual/cartographic approach to layering in additional “fields and places.” Although Ceraso does not make a visual map when she talks about work that might “bridge” or connect sound studies with rhetoric and composition, it could seem as though the territory of sonic research exists in these places alone, like this:


Figure 1. A Diagram Showing the Need for Intersection between Sound Studies and Composition and Rhetoric with Two Non-Overlapping Ovals.


As readers of enculturation understand, the terrain of sonic research is multifaceted, and Ceraso herself cites from a far more diverse body of research. Therefore, it might be beneficial to map the need for bridges not only between Sound Studies and Rhetoric and Composition, but also Communication, Media Studies, or English. Thus, a mapping of research in sound, writing, and rhetoric might look more like this:

Figure 2. A Diagram Showing a More Complicated Terrain for the Study of Sound in Academic Disciplines. (The Visual Conveys a Communication Oval Overlapping with Rhetoric and Composition and Media Studies, with a Smaller Oval of Sound Studies Positioned in the Middle)


Even this mapping is unsatisfactory in the way it may appear to condense or too neatly locate concepts like materiality. Additionally, where would we locate work such as Theo van Leeuwen’s oft-cited book, Speech, Music, Sound, or multimodal empirical work, such as Sigrid Norris’ Analyzing Multimodal Interaction?

enculturation has recently published innovative work that responds to the need for different methodologies for sonic archives (Jon Stone) and video-sonic methods (Crystal VanKooten). Truly mapping the terrain of sonic research might also mean developing a mapping methodology topographically, in a three-dimensional relationship. In other words, new visual/cartographic methods may be necessary to better map disciplines and fields studying sound. Mapping is not only useful for archiving the development of a field, but is also useful in invention—to make present occluded divots and hidden caves or channel ways to distinct avenues of sonic thought. A mapping methodology for sonic research could help similar research trajectories come into closer conversation across (perceived) disciplinary divisions in the way Ceraso’s work advocates for a bridge between Sound Studies and Composition and Rhetoric.


3. Adding Tracks: Transcript and Caption Assignments as (Designed) Objects

Pieces such as Janine Butler’s “Where Access Meets Multimodality: The Case of ASL Music Videos,” Sean Zdenek’s Reading Sounds, and Melanie Yergeau et al.’s Multimodality in Motion stand out as exemplary scholarship on accessibility. While Ceraso’s third chapter takes up the car as “designed sonic object,” for my own “layering” I consider a different object, one that has become increasingly present in conversations about sonic composition, pedagogy, and accessibility: transcripts or caption assignments.

Ceraso’s concept of “multimodal listening” as an embodied practice distinct from “earing” is one that she first lays out in “(Re)Educating the Senses.” In that article, she draws on interviews with Dame Evelyn Glennie, a famous deaf musician, to further demonstrate multimodal listening practices and the visual/tactile possibilities of sound. Since the publication of that article, Ceraso has been frequently cited in connection with accessibility and the need to expand notions of sound and listening. In fact, Butler cites Ceraso’s article as an example of scholarship that builds on “accessibility, embodiment, and multimodality” in her Composition Forum article on embodied captioning.

Butler argues that while captioning has often been treated as a process that happens after a video has already been created, composers of video should instead move toward design choices that both integrate and embody captioning and subtitles concurrent with all other video design choices. An example she gives for this is recording video with space within a frame to add captions close to the action or facial expressions of the actors.

While Butler’s work has challenged conventions of captions as designed objects, Jen Ware and Ashley Hall have questioned the transcript process for sound files, which often leaves out nonverbal sound. In “The Vibratorium,” Ware and Hall consider how to create rich, accessible transcripts when compositions consist entirely of nonverbal sounds. In a Computers and Writing presentation, Ware suggested transcripts could better align with theories of translation or other genres, such as a modification of the broadcast transcript. At the same 2019 Computers and Writing Conference, several other presenters, including Philip Choong and Leah Heilig, discussed assignments involving transcripts. Choong noted his use of a “kinetic typography” exercise to have students think about nonverbal components in vocal delivery, and Heilig discussed the importance of podcasting transcript assignments.

Most striking are the many different ways that these assignments, exercises, and larger theoretical questions about transcripts and captions challenge relationships to accessible design for sound. Just as Ceraso’s reverberation sections challenge notions of cordoning off pedagogical discussion or moving it to “the end,” transcript and caption assignments challenge the notion of designing for multiple channels of engagement without integration. This scholarship highlights the design of captions and transcripts as the design of multimodal, multisensory objects themselves.


4. An Ending Sound

In his conclusion, Beauchamp’s review of Sounding Composition laments the necessary silence of Ceraso’s text, which so generatively evokes sound: “Multimodal composition is not the rule of the day and though the digital is our current realm, text is still the lingua franca. Though it may seem like it will never arrive, Ceraso is preparing us for the many different attunements the future will require.” Therefore, I offer one last track—made entirely in nonverbal sound.

My purpose in composing this “soundscaping” of Sounding Composition is two-fold. First, I wanted to contribute a layer of sound to a review all about sound. In the file above, I interpreted listening through the nonverbal sound of whispers, as a sound that often elicits closer listening or even an embodied orientation, turning “toward” the sound. Next, I used drum beats, vibrations on strings and metal pipes to evoke the sensation of sounding bodies and embodied listening as a negotiation of felt vibrations. Finally, I moved through “places” and “sonic objects” through the sounds of church bells, coffee shop chatter, a can of soda, and a dog toy.

However, my sound file had a second, more complex purpose: I wanted to play with some of the theories of transcripts as designed objects, discussed in the section above. I have created three different versions of a transcript for my soundscape file. These transcripts may be viewed in their entirety in the link at the end of this review; however, I will also discuss some aspects briefly here. The first is a more traditional attempt to transcribe my sound file, with the inclusion of some limited typographical design and two columns for simultaneity:

Figure 3. Transcript in Two Columns with Some Typography Choices Such as “BOOONg.”


In this first attempt, I visualize temporality and simultaneity, but dimensions such as volume, importance, or meta-commentary within the transcript may be lost. In the second transcript, I was inspired by Ware’s broadcast transcript, modifying it to add columns not for speech/voice and sound, but for levels of volume or attention:

Figure 4. A Riff on a Broadcast Transcript with Columns for Foreground, Mid-ground, and Background Sound.


In this second version, meta-commentary is more prominent, but there were still elements lost to the unfolding and simultaneous nature of sounds. In the third version, I designed a transcript based on Butler’s argument for integral captions:

Figure 5. Audacity (sound editing program) file open with multiple tracks in blue and labels overlaid on the tracks themselves, sometimes with arrows.


When I attempted to compose an “integrated transcript” in Figure 5, my purpose instead was to try to capture aspects of relative volume, temporality, and simultaneity through captioning the sound file itself. What this meant was that I not only had to add labels and arrows when I was “finished,” but I also had to compose my original sound file knowing that I wanted to screenshot and add descriptions. In order to do this, I needed to “squish tracks to size” and leave spaces by using more tracks than I might otherwise need. My purpose in designing these transcripts so differently is not to argue for one or the other, but to note the importance of offering comparative transcripts with different channels and possibilities for engagement.

At the end of Sounding Composition, Ceraso writes: “My hope is that multimodal listening pedagogy will embolden teachers to encourage sonic play, experimentation, and invention in their classrooms . . . and to continue to reimagine how to take a capacious, multisensory approach to multimodal composition and experience” (154). I hope that this review offers an example of that experimentation and multimodality, and that it attests to the reverberant richness of scholarship contained in Sounding Composition.

[link to transcripts page]

Works Cited

Beauchamp, Airek. “SO! Reads: Steph Ceraso’s Sounding Composition: Multimodal Pedagogies for Embodied Listening.” Sounding Out! May 20, 2019. https://soundstudiesblog.com/2019/05/20/so-reads-steph-cerasos-sounding-composition-multimodal-pedagogies-for-embodied-listening/

Butler, Janine. “Where Access Meets Multimodality: The Case of ASL Music Videos.” Kairos: A Journal of Rhetoric, Technology, and Pedagogy, vol. 21, no. 1, 2016.

Butler, Janine. “Embodied Captions in Multimodal Pedagogies.” Composition Forum, vol. 39, Association of Teachers of Advanced Composition, 2018.

Butler, Janine. “Integral Captions and Subtitles: Designing a Space for Embodied Rhetorics and Visual Access.” Rhetoric Review, vol. 37, no. 3, 2018, pp. 286-299.

Ceraso, Steph. Sounding Composition: Multimodal Pedagogies for Embodied Listening. University of Pittsburgh Press, 2018.

Ceraso, Steph. “(Re)Educating the Senses: Multimodal Listening, Bodily Learning, and the Composition of Sonic Experiences.” College English, vol. 77, no. 2, 2014, pp. 102.

Choong, Philip. Listening for the Overtones: Readings, Transcripts, and Interviews in Soundwriting Classrooms (panel). Computers and Writing Conference, 2019.

Green, David F. “Race, Language Policy, and Silence in Composition Studies.” Composition Studies, vol. 44, no. 1, 2016, p. 160.

Heilig, Leah. Transcription as Play: Introducing Accessibility through Podcasting. (panel). Computers and Writing Conference, 2019.

Miller, Paul D. (Ed.). Sound Unbound: Sampling Digital Arts and Culture. Routledge, 2008.

Norris, Sigrid. Analyzing Multimodal Interaction: A Methodological Framework. Routledge, 2004.

Stone, Jonathan W. “Listening to the Sonic Archive: Rhetoric, Representation, and Race in the Lomax Prison Recordings.” enculturation, vol. 19, 2015.

VanKooten, Crystal. “Methodologies and Methods for Research in Digital Rhetoric.” enculturation, vol. 17, 2016.

Van Leeuwen, Theo. Speech, Music, Sound. Macmillan International Higher Education, 1999.

Ware, Jen, and E. Ashley Hall. “The Vibratorium.” In Detweiler, E., Rice, J., and Graham, C. Rhetorics Change/Rhetoric’s Change Proceedings for Rhetoric Society of America 2016, Indiana University, IA. 2018.

Ware, Jen. “Dynamic Equivalence in Soundscapes: Scripts, Transcripts and Somewhere in Between.” Computers and Writing Conference, 2019.

Yergeau, M., Brewer, E., Kerschbaum, S. L., Oswal, S., Price, M., Salvo, M. J., & Howes, F. “Multimodality in Motion: Disability and Kairotic Spaces.” Kairos, 2013.

Zdenek, Sean. Reading Sounds: Closed-Captioned Media and Popular Culture. University of Chicago Press, 2015.