A Journal of Rhetoric, Writing, and Culture

The Music, The Movement, The Mix: Listening for Sonic and Multimodal Invention

Crystal VanKooten, Oakland University

(Published October 27, 2017)

I sat across from former student Ara in the commons a semester after our first-year writing (FYW) course ended, interviewing him about the choices he made when composing the final assignment, a digital video entitled “Never Forget.” In the video, Ara provided facts about the Armenian Genocide that occurred during World War I, and he argued that the persistent denial of the genocide by Turkey must come to an end. At the start of our interview, I asked Ara to recount his composing process for “Never Forget,” and in response, he mentioned that he began to compose with the music already in mind. “When I was doing the video,” he explained, “I felt like the music was the most important part to me.” He then narrated how he first selected songs to represent different aspects of Armenian culture, then crafted images and words around the sounds and lyrics. Ara’s description of his process made me sit up and take notice: music, for him, was not background, nor was it simply a complement to other elements like images. Instead, music was the first and central site of invention for Ara’s ideas, one that led to complex authorial decisions involving multiple communicative elements that included melodies, harmonies, instruments, lyrics, timing, video clips, images, words, animations, and colors. 

Over fifteen years ago, the New London Group called literacy researchers and teachers to expand their definitions of literacy beyond the alphabetic and to pay attention to the relationships between multiple modes of expression like those that Ara used in “Never Forget.”  These modes include the linguistic, visual, audio, gestural, spatial, and multimodal designs involved in today’s global communication practices. The New London Group also pointed out the need for a multimodal metalanguage: “a language for talking about language, images, texts and meaning-making interactions” necessary for understanding how relationships within and across modes work (24). In recent years, many scholars of computers and writing have thus studied multiple modes of expression, theorizing what composing with modes outside of the written word looks and sounds like (Ahern; Alexander and Rhodes; Arroyo; Brooke; Ceraso; Halbritter, Mics) as well as developing multimodal pedagogies for classrooms (Selfe, Multimodal; Shipka; Wysocki et al.). 

To address sound and audio design in particular, some scholars have turned to sound studies, an “emergent scholarly community” with interdisciplinary roots that speaks with and through rhetoric, performance studies, film and media studies, literacy, and digital composition (Stone para. 3). Recent work in sound studies has focused on histories and cultural contexts for sound (Pinch and Bijsterveld; Selfe, “Movement;” Sterne; Stone); theoretical frameworks and terminologies to describe how sound communicates (Cox; McKee; Van Leeuwen); theorizations of voice, digital vocality, and critical listening and speaking (Anderson; Comstock and Hocks; Halbritter and Lindquist); discussions of aural and musical rhetoric (Halbritter, “Aural Tools,” “Musical Rhetoric”; Stedman); sonic performances and interactive artwork (Barness; Kanouse; Miller; Scheidt); and the inclusion of sound within digital writing pedagogy (Ahern; Ceraso and Ahern; Halbritter, Mics; Hocks and Comstock; Rodrigue et al.; VanKooten). 

Heidi McKee and Theo Van Leeuwen each take what Katherine Fargo Ahern would label an “acoustics or musicology approach” to studying sound (Ahern 79).  They present frameworks for describing various components of sound and music that contribute to meaning-making for both sound authors and audiences. McKee offers a four-part framework that explores vocal delivery, music, special effects, and silence within multimodal composition (337), and applies this framework by analyzing webtexts and online poems. In a section on music, McKee draws terminology from composer Aaron Copland, emphasizing music’s “sensuous,” “expressive,” and “sheerly musical” qualities—areas concerned with sound quality, emotions, and structure respectively (344). Van Leeuwen presents an even more detailed vocabulary for talking about the meanings and potentials of integrated sounds (speech, music, and effects together), and his key terms include perspective, time, interaction, melody, voice quality/timbre, and modality (4).

While McKee, Van Leeuwen, and others have made strides in naming and describing how music and sound communicate, there is still much to learn about how sound is experienced, understood, and used by multimodal authors, especially in writing classrooms. Tanya Rodrigue and coauthors address this need, examining how voice, silence, music, and sound effects coalesced and interacted in their own sonic compositions in one graduate classroom. Rodrigue et al. highlight the importance of play, flexibility, and reflection for sonic composers, and their analytical approach foregrounds sonic interaction and multimodality, not just isolated sound. Steph Ceraso advocates for such a focus on multimodality, calling those in English studies to address the “affective, embodied, lived experiences of multimodality in more explicit ways” (104), where sound is one aspect of a holistic pedagogy of full-body listening and experimentation. Likewise, Mary E. Hocks and Michelle Comstock describe an approach to teaching sonic rhetoric that foregrounds resonance, where sound is “vibrational interaction within a complex environmental system” (137). Ahern offers the metaphor of tuning as a possibility for teaching auditory rhetoric that bridges acoustics (description) and phenomenology (embodied, unique experience), noting that students in classrooms may benefit from explicitly discussing listening practices, terminologies, and diverse experiences with sound (82-3). What Ceraso, Hocks and Comstock, and Ahern all point to is a twofold need: for rhetoricians and compositionists to investigate how descriptive language, like that offered by McKee and Van Leeuwen, interacts with the body’s experiences, expressions, and movements as authors invent with sound, and to pay more attention to how language and the body come together to enable and express the complicated, multimodal interrelationships among sound and other modes of expression.

In this article, I address these gaps by examining the multimodal composition and invention experiences of nine college students in various sections of FYW as they authored a digital video project, presenting findings drawn from videotaped interviews. Through their accounts, the participants indicate that they are inventing with sound—particularly with music—in multimodal ways that involve the body, various rhetorical appeals, organizational strategies, and attention to intersections between sound and other media assets. Some of these inventional moves can be described using terminologies from sound scholars, but the data also reveals that we need additional tools and terminologies to help us investigate other inventional strategies. Along with using terms from established theory, I draw from rhetorical terminology and from students’ felt experiences and bodily movements to begin to add to a multimodal metalanguage for sonic invention, one that emerges in the interaction between theory and practice. 

Research Methods

I designed an IRB-approved interview study where I could employ listening as a method of inquiry into the following questions: 

  • How are students composing with sound—and with music in particular? 
  • What rationales for musical and multimodal choices do students have, and how are they articulating these rationales? 
  • What specific language, if any, are students using to describe musical and multimodal choices? 
  • What extralinguistic methods, if any, are students using to invent with sound and music and to describe choices?  

To seek answers, I conducted twenty-one videotaped interviews with nine students from four different FYW classes. 

Throughout the following analysis of the interviews, I listen closely to accounts of how and why students chose to use music and other resources as they composed videos. I use listening here as a methodological frame, situating my study within a feminist research methodology that uses interviews and video technologies to aim for reciprocity, shared experiences, shared knowledge, and a privileging of participant voices (Almjeld and Blair; Selfe and Hawisher). Similar to video interviews conducted by Selfe and Hawisher, the interviews in this study were not used simply to extract information from participants, but instead as a means to share and co-construct knowledge (36). Video recordings allowed me to listen to students’ words as well as to view accompanying body language, gestures, and facial expressions.  As Selfe and Hawisher note, these elements add important “additional semiotic information to alphabetic representations” (44-5) of participant experience.

Below, along with written analysis, I present video data juxtaposed with parts of student-composed video products. Presenting the data and findings in multiple formats aligns with Kristie Fleckenstein, Clay Spinuzzi, Rebecca Rickly, and Carole Papper’s ecological model for writing research, where interdependence, feedback, and diversity are valued (394), and there are “multiple coherent stories” that describe the experiences of research participants (390, 404). Fleckenstein et al.’s notion of interdependence involves “a web of interlocking social, material, and semiotic practices” (394); their emphasis on feedback involves using the flow of information to “weld together” research elements (396); and their focus on diversity seeks out “multiple sites of immersion, multiple perspectives, and multiple methodologies” (401). I utilize such multiple yet fused practices, elements, perspectives, and methodologies by providing visual and aural access to original video recordings and interview questions, juxtapositions of this footage with student compositions, and more traditional written analysis. Through the various formats, you—the readers of this webtext—can listen to (and look at) student-authored video products and portions of interviews. You can also read the alphabetic analysis and make alternate (or congruent) meanings and interpretations as you consider pieces of the research ecology.

The Research Context, Participants, and Analysis

Nine participants—Ara, Gabriella, Kaitlyn, Lauren, Logan, Marlee, Shannon, Travon, and Vivian—were enrolled in four different sections of FYW at a large public university in the Midwest.1 Three participants were enrolled in courses where I was the instructor, and six participants were enrolled in courses taught by colleagues. All courses focused on academic argumentation through written essays and an open-topic digital video assignment where students envisioned their own purposes and audiences. I conducted interviews with students during and after their courses. I asked questions about the students’ writing, the video composition assignment, and how and why students chose to use sound and music in their videos.2 Pedagogies for video composition varied across the courses. Some students, for example, were required to create and remix media assets from any source, while others were required to compose “copyright clean” videos. These assignment conditions affected invention with music, and the student videos presented in this article thus include some work that uses copyrighted media assets and some work that does not.3

After interviewing was complete, I used a grounded theory approach (Corbin and Strauss; Merriam) to apply descriptive codes to the data set, using the words of participants when possible as a form of listening and co-construction of meaning. My second pass through the data was analytical. I grouped the codes into categories, working toward integration, where research threads are pulled together “to construct a plausible explanatory framework” about participant experience (Corbin and Strauss 264). I arrived at six overarching categories that when combined provide an overview of how the nine participants were inventing with music as they composed videos. Below, I explore these categories and consider them in light of McKee’s and Van Leeuwen’s vocabularies for music and sound. 


Table 1 lists the top six categories across all nine interviews, along with the codes I used within each category, the number of occurrences per category, the most frequent codes, and how many students made comments within each category.

Table 1: Top six categories and codes, with frequency information across the interviews.4

The data here indicates that the role of music in multimodal video composition is complex and layered; that is, music does a lot for a rhetor in one short moment and is involved with invention at multiple levels. The layers of sonic invention that became observable involved (1) the physicality of music; (2) various musical-rhetorical appeals to logic, emotion, and character; (3) the organizational potentials of music across modes; and (4) music’s ability to combine with other elements for a new effect within a multimodal sequence. I discuss each of these aspects of invention in the sections that follow, pointing to useful vocabulary from McKee, Van Leeuwen, and others that corresponds to the data and extending their frameworks to highlight how students used rhetoric, gesture, and their own experiences and knowledge to invent with sound. 

The Physicality of Music

Using physical properties of music for effect was the most frequent way participants talked about why they chose the songs and sounds they did. In fact, this was the only category in which all nine students made comments. Participants talked about how fast or slow the music was (the tempo, to use a musical term), how loud or soft it was (the dynamics), the pacing, the beat, and the flow, as well as how they timed these properties with other visuals, sounds, and words. Students most commonly discussed music’s tempo, with eleven comments addressing how fast or slow the music went. McKee might categorize some of these statements under the label movement, which considers rhythm, meter, and tempo, while Van Leeuwen might categorize them under his heading of time. While these linguistic labels are useful for theorists as they explore how music moves bodies and is structured, study participants used their own words and gestures to indicate and describe the movement, timing, and literal sounds of music. Thus, bodily experiences of listening to and composing with music were valuable for multimodal invention, even when these experiences were not fully articulated. 

Travon, for example, composed a video for incoming and current freshmen about the university’s summer bridge program. He described the second song in his video as “more of an upbeat tone for the fall. Not really that upbeat, but it’s more like, ok [bobs head and moves hands around each other in a circular motion as he talks], it’s starting to pick up.” Travon first used the word upbeat to describe his perception of an accelerated tempo in the song and how the speed aligned with a change from summer to fall in his video. Upbeat, however, was imprecise to describe his choice—he bobbed his head in rhythm and moved his hands around each other in a circle to reveal that the song was just a little faster than the first song, for which he used a smaller and slower circular hand gesture (please view the video linked below to see and hear this interview excerpt).

It is notable that Travon needed both words and hand gestures to describe the musical pacing he perceived; words alone were insufficient to describe the inventive materials he used and the choices he made. McKee’s and Van Leeuwen’s terminologies might have provided him with more specific linguistic descriptors, but their overarching labels of movement and time remain broad. How a song moves and measures time requires a much more specialized vocabulary. Van Leeuwen, for example, explores how sonic timing can be measured or unmeasured, fluctuating or continuous, polyrhythmic or monorhythmic, metronomic or nonmetronomic, and regularized or nonregularized (61). Considering that learning such specialized terminologies in a course like FYW would be overwhelming and time-consuming, Travon’s method of using his hands and body to help describe the physicality of sound’s movement worked well for him as he considered and articulated his sonic authorial choices. 

Several other participants used the phrase “how the music sounds” to describe choices about the physical qualities of music. In her video about religious intolerance, Gabriella selected “Jesus Walks” by Kanye West because “the way the song sounds, it was very strong. And it was kind of aggressive, and that’s the approach I wanted to take.” As she stated the word strong, Gabriella also made a hand gesture: with both hands, she brought her fingertips together and shook them twice for emphasis. While her words and gesture combined provide a clear rationale for her song choice—an aggressive approach to her topic—Gabriella’s comments about what particular aspects of the song were strong weren’t very specific, and I did not prompt her for more detail. She used phrases such as “great lyrics,” “great beats,” and “very powerful” to describe the sounds within the song, but she did not mention elements such as West’s voice, the tempo, the instrumentation, or any electronic accents or sound effects. 

In this case, linguistic labels like McKee’s musical categories of medium (what generates sound), dynamics and intensity (the tone, uniformity, and special effects used), or movement (the rhythm, meter, and tempo) might have been useful to aid Gabriella in making and further describing her choices; alternatively, if prompted to provide more detail, Gabriella may have supplied her own terminologies, additional hand gestures, or body movements. But even without more specific linguistic scaffolds, it is obvious that Gabriella made effective authorial choices about her sound using her own heard and felt experience of the music’s qualities and attributes. 

Gabriella, Travon, and others make clear that the student authors were hearing, experiencing, feeling, and utilizing physical effects of music such as movement, timing, dynamics, and rhythms. They were able to express and articulate these choices, sometimes linguistically and sometimes with gestures and movements. The students were tuning ears and bodies, to use Ahern’s metaphor, as they listened for and employed both language and experience. The data also indicates that a more specific vocabulary relating to movement, time, and sound quality could be useful to open up even more possibilities for sonic invention, but equally important are extralinguistic experiences that are heard and felt.

Video Transcript

Music as Rhetoric: Appealing to Logos, Pathos, and Ethos 

The students talked about music having the rhetorical power to communicate through appeals to logic, emotion, and character, often simultaneously. These sites for sonic invention can be described using the rhetorical terminology of appeals to logos, pathos, and ethos—concepts that were part of the curricula in the participants’ rhetorically based FYW courses. Music provided one opportunity for student authors to apply knowledge of these appeals and to extend and overlap them within a multimodal sequence. The data in this section reveals that considering sonic invention through a seemingly simple rhetorical frame can be useful because rhetoric is, as Bump Halbritter states, multidimensional in nature (Mics xi). Appeals to logos, pathos, and ethos that some might consider basic or introductory can be layered and augmented, especially in a multimodal space like video. 

Most frequent across the interviews were comments about music that focused on variations of an appeal to logos: eight students talked about music functioning as thesis or argument. One way that such a logical appeal was enacted was through aligning song lyrics with a point or concept. Marlee described how she thought about lyrics for her video about the off-campus program Camp Davis, composed for fellow students who might not be familiar with the program: 

I tried to time it so that the words that were being said in the song were appropriate to what was going on in the videos. . . . It was background music at some times, but also I wanted it to be on the forefront when I wasn’t talking over the video. 

Music and voiceover thus took turns in the foreground, and song lyrics “spoke” as part of the argumentative, thematic mechanism of the composition. Van Leeuwen describes similar sonic layering using the terms perspective and interaction, where simultaneous sounds are played at different volumes to create social distance, multiple perspectives, or various kinds of sonic relationships (30, 85). Marlee used three different songs for sonic and cross-modal interaction of the kind Van Leeuwen discusses, fading music in and out to pair lyrics and instrumental sequences with images and to alternate with the voiceover.

Marlee’s description of precisely how lyrics interacted with other elements remained generalized. Lyrical phrases that appear to be paired carefully with particular images can be found in her video, but lyrics do not always reinforce or interact with visuals, and not all of them speak to Marlee’s theme at all times. Even so, Marlee’s comments highlight lyrics as an exciting site for invention through the layering of appeals—one that perhaps could become more exposed and explored using tools like Van Leeuwen’s terms. What if Marlee had been prompted to further consider and describe how lyrics and other modes interacted as she composed? A more specific linguistic scaffold, in this case, might have opened up even more opportunities to experiment with layers of appeals across modes. 

Like logos, pathos was a popular musical-rhetorical appeal: eight students talked about music eliciting an emotional response from an audience or providing “tone” or “feeling”—what McKee calls listening to music on the expressive plane. Travon, for example, stated that in the opening for his video about the university’s summer bridge program, he put music, written text, and an image of a dormitory together “because it appeals to pathos.” He continued, “the music, it just relaxes them, and then it’s just like the big bang, [the image of the dorm]! And they’re like, awwwww! The music helps.” Travon wanted the music to relax a very specific audience (classmates and other students who would recognize the dormitory) and to prepare them to feel a twinge of nostalgia upon seeing the image of the dorm where they lived during the summer term, all while they continued to hear the calming music. His musical/visual appeal to emotion was thus connected both to bodily response (relaxation) and to memory. 

It is notable that Travon used the term pathos several times in his descriptions of this sequence, as did Lauren, his classmate, when describing a particular sequence in her video about keeping the arts in schools. Rhetorical appeals were a big part of the curricula in these students’ first-year course, and the appeals were emphasized during the video project and as students worked on alphabetic compositions. While students in the study who were enrolled in other sections of FYW used related terms such as emotion, tone, and feeling to describe inventional moves, Travon and Lauren each designed particular sequences around the use of a noticeably multimodal pathetic appeal, which they labeled with the term pathos. Being very familiar with a linguistic label for a rhetorical technique may have reminded and encouraged Travon and Lauren to purposefully experiment with the appeal using various modes of expression. 

Finally, four students discussed their chosen music as recognizable or as representing a particular ethnic culture, and I group these examples together under ethos: drawing on the character and recognizability of music. Leveraging music’s ethos is an inventive practice that has no well-matched linguistic descriptor within McKee’s or Van Leeuwen’s frameworks, but Halbritter discusses music’s ethical and associational power, suggesting that ethos is often aurally determined, especially in film (“Aural Tools” 189). Several students, like Shannon, enacted ethos aurally in this way, picking songs for videos because they were familiar for audiences. Others, like Ara, appealed to ethos by using songs to represent a particular ethnic culture. His video about the Armenian genocide, composed specifically for an audience of Armenians, included excerpts from “Deli Aman,” an Armenian folk song played on the duduk (an Armenian instrument similar to a clarinet), as well as excerpts from the Armenian group System of a Down’s popular song “Aerials.” Ara explained that he used an extended version of “Aerials” that includes “Der Voghormya,” a church song played at the Armenian apostolic church. These song choices clearly included culturally significant and recognizable melodies, Armenian instruments, and Armenian artists performing the music, elements that worked together to communicate an Armenian ethos for his video. 

Ara, however, did not use the specific term ethos as he described his musical choices, nor did any other study participants. One conclusion could thus be that students were guided by what Halbritter calls “a ‘felt sense’ about the rhetoric of image or music,” a sense that “may not guide them productively to make … critical determinations” (“Aural Tools” 191). Halbritter suggests a move toward linguistic description as a remedy, where students might “establish a terminology that will enable critical discussions” (191). Asking students to specifically consider ethos in their multimodal compositions might be one step in this direction. Ara’s case reveals, though, that student authors are able to make critical and complex compositional moves even without theoretical terminologies. Using his own resources and vocabulary, Ara listened, remembered, felt, drew on personal knowledge of language and religious practices, and coordinated sound and multimodal resources accordingly.  

Video Transcript

Music as an Organizational Tool for Multimodal Rhetoric

Six students talked about using music as an organizational device within their multimodal work—as introduction, conclusion, transition, and contrasting element. Since McKee’s and Van Leeuwen’s frameworks focus specifically on sound, none of their terminologies take into account the cross-modal nature of the musical-organizational moves that the students in this study made as they coordinated music with images and words. Most common within this category were examples of students using music to create a noticeable contrast between sections or to transition from one part to the next. Lauren, for example, crafted a transition in her video that was marked by the music fading away into silence while interviewees gave testimonials about the importance of arts in schools. She described “the music being on for one part of it and then turning off and having it just have constant interviews for the next part of it, I think that was a stylistic change, so that shows a shift.” The shift she intended, she mentioned repeatedly, was in the type of persuasive appeal, from logos to pathos, and her use of sound, the spoken content of the testimonial clips, and silence reflected this contrast. Interestingly, a Mozart piano sonata was part of the appeal to logos, which faded to silence as part of a “more serious” appeal to pathos. Lauren also illustrated this shift through hand gestures during her interview: she gestured to her left when she talked about the logos section and moved her hands to the right when discussing the pathos section, indicating a clear separation between parts. 

Kaitlyn and Marlee also used song changes to indicate separate ideas across modes. Kaitlyn explained that within her video about gender inequality in sports, “I wanted the music to be transitions, too, so each section of each song is kind of like a paragraph, or an idea, in a traditional piece, or a traditional written piece. So I definitely used that as an organizational tool.” Songs thus grouped images and written words together and signaled the next point in the argument. Similarly, Marlee described selecting three songs to indicate different elements of her argument about Camp Davis. A class discussion about musical rhetoric helped her come up with this option. She recounted, 

I thought, ok, well I have three themes or chunks that I can use different music for . . .  but I was really glad we looked over that [musical rhetoric] in class, because otherwise I think I would have just taken one song and let it go through the whole video. And a lot of time, that can get boring. It doesn’t apply to the whole movie or all of your subideas. 

Both Marlee and Kaitlyn used songs as a prose writer might use paragraphs—to organize and group ideas and subpoints and to help a reader move through the material. For these authors, though, music was a tool for more effective multimodal rhetoric, and they worked across modes to introduce, knit together, or differentiate media assets.

Class discussions relating to logos, pathos, and ethos and “the rhetoric of music” preceded these students’ compositional choices where music became a multimodal organizational tool. While these conversations were helpful for invention, other terms such as cross-modal organization or structure, cross-modal juxtaposition, or cross-modal contrast might have more specifically highlighted the potentials of music to organize across modes. These phrases take labels for compositional concepts that students were already using—e.g., organization and contrast—and emphasize that they might be applied using sounds, visuals, words, gestures, movements, or combinations of these modes. Again, Lauren, Kaitlyn, and Marlee made these kinds of choices without such a specialized vocabulary, but the terms could prove useful for spotlighting multimodal organizational moves more purposefully. 

Video Transcript

Music within a Multimodal Sequence

Perhaps it is clear from my analysis thus far that participants rarely talked about their music in isolation. In fact, many of the examples I have placed in the categories above might fit within multiple categories, including this final one, where authors described selecting music as a key part of a multimodal sequence. Travon pointed out that music changes the way other elements in a video are interpreted. Music, he said, shapes “how you will notice the images and the colors and stuff.” The students’ constant attention to music as part of multimodal composition foregrounds the importance of McKee’s point that it is useful to move beyond an examination of sound in isolation—she calls us “to consider sound (and all the elements of sound) in relation to other modes of representation” (338). Michel Chion labels sound’s interaction with image the “added value” of sound: “value with which a sound enriches a given image so as to create the definite impression . . .  that this information or expression ‘naturally’ comes from what is seen” (5). Thus, the combination of modes of expression, including sounds, is in itself a site of sonic and multimodal invention as new meanings and values are composed. 

Seven students mentioned moments where they used music to combine strategically with other media assets. Ara described one particularly memorable sequence: 

I have a video playing of [Armenians] marching through the deserts, and then the song builds up, and right as it’s about to climax, I cut the music off, and just have static, and saying, “we’ve lived to never forget.” I have the video of a girl, and the music comes back on when she opens her eyes. 

There are many layers involved here: a video of Armenian refugees, written words typed on the screen (with certain words highlighted in red, blue, and yellow), silence and the sound of static, the action of a girl opening her eyes, and the momentum, beat, dynamic change, and lyrics of “Aerials.” In the closing moments of the sequence, the sounds and music alter the way the image of the girl opening her eyes is seen, and the image likewise alters the experience of the music and sounds—Ara composed them carefully to add value to each other. Ara’s facial expression also revealed how much he enjoyed putting this sequence together and how rhetorically effective he thought it was: he smiled widely as he played the sequence for me on his laptop, and he continued to smile as he recounted how and why he combined the music, silence, static, and video clip (please refer to the linked video below to see and hear this interview excerpt).   

Gabriella, too, discussed lining up the timing of images with a particular moment in the song “Jesus Walks” where artist Kanye West raps and alternately gasps for air. She described, “between the breaths of when they’re speaking, I just flashed really quick images. I just feel like the flow of that was really strong. It took a lot of hard work to get the timing down perfectly.” Gabriella shortened how long certain images appeared on screen during this sequence to match West’s fast-paced gasping. Again, Gabriella used the general descriptor strong to describe this sequence, but she also used her hands to illustrate how the sounds and images worked with each other: she clamped her fingertips together and pulled her hand toward her body to represent the intake of West’s breaths, and she performed several quick, opposite motions opening her fingers away from her body to represent the pacing of the images. 

When I asked her why she put this sequence together this way, Gabriella said that she remembered West’s gasping when she chose “Jesus Walks,” stating, “I really wanted to enhance the fact that, the lyric in the song. That he’s like, I think it’s, he’s breathless, because he’s, the world is completely making him tired. And I really wanted to show that.” Gabriella referred to the added value of lyrics, gasping, images, and timing as they came together for emphasis and illustration. Where her linguistic descriptions might be considered generalized, her gestures and overall narrative indicate a carefully designed reciprocal relationship between parts. There are many terminologies, though, that might benefit students like Gabriella and Ara as they compose complex multimodal sequences; some are mentioned or implied here: layering, multimodality, and added value. We might look to rhetoric and composition, social semiotics, literacy, sound studies, film and media studies, and beyond as we try, test, and compile these and other labels to discuss and enact multimodal invention. 

Video Transcript

Music as a Mystery

Even as words, rhetorical vocabulary, and gestures assisted study participants in enacting and describing choices related to sonic invention, all nine students struggled to varying degrees to articulate how and why they made choices with music or what specific qualities of music they were drawing on or might use. Students used vague descriptors that may have had various meanings in relation to their music—words such as tone, feel, fit, or flow. Logan, for example, had trouble picking a song for her video about romantic relationships aimed at college-aged students: “I really didn’t know what to do with the music. Because I would hear other people pick their music, and it seemed like it would flow so perfectly. And nothing seemed to go. And I just happened randomly, I just happened to pick it.” Picking music that “flowed” and “went with” other elements of her video seemed a mysterious process that Logan struggled with, and thus, she relied on luck to find a song. Perhaps with more specific language to describe musical options, Logan might have been able to pinpoint (or initially decide upon) musical aspects that she wanted to use as an author: was she looking for lyrics to align with her argument or a fast song to provide movement within a particular sequence? Without guiding concepts for sonic invention, Logan looked for “flow” and simply felt lucky to find it. 

Vivian wanted to find music that was licensed for reuse but that also fit her satirically themed video about the stereotypes put upon children with no siblings. She stated, “I couldn’t pick the music for a really long time,” and her frustrations were part of the reason she decided to abandon the plan to make the video satirical:

It was kind of hard, the satirical music, to be kind of restricted by those means [i.e., using copyright-free music]. But I also didn’t really know how to search for the music that I wanted because it was just— [pauses] yeah. So I just looked through all the music, a couple pages, and I found some.

Vivian struggled with locating music because she didn’t know how to articulate or search for what she needed, especially within a limited database of music licensed for reuse. That she paused and finished her comment only with “yeah” as she tried to describe why locating music was difficult could indicate that she needed a more specific sonic vocabulary, one that would help her develop search terms to use within music remixing websites. 

When asked to summarize the role of music in her video, Kaitlyn—who throughout her interview described using music as an appeal to logic, emotion, and character, as an organizational device, and as a way to add value to other modes—stated simply that “the music was more for emotional effect, and to just make everything more powerful.” Interpreted within the context of her interview as a whole, Kaitlyn’s comment that music “make[s] everything more powerful” is true, but this short statement seems to oversimplify her own deeply complex use of music. What kind of sonic vocabulary or bodily movements might help students like Logan, Vivian, and Kaitlyn more fully articulate their choices and invent new, innovative authorial moves? And how might such a metalanguage be developed?

A Multimodal Metalanguage Built at the Intersection of Theory and Practice

My analysis illustrates that participants invented with music in complex ways that can be partially described using terminologies presented by McKee and Van Leeuwen; in particular, movement, time, and interaction were important in the student work and narratives. The participants’ sonic and multimodal invention practices, though, moved beyond these sound-focused frameworks by considering the physicality of sound; using combined appeals to logos, pathos, and ethos; tapping into music’s multimodal-organizational potentials; and experimenting with cross-modal layering and interaction. Looking across the interviews reveals that participants used terminologies based in rhetoric, words from their own vocabularies, and bodily experiences and movements to articulate how and why they composed with music. Some of them made effective, critical composing decisions without using any theoretical terminology to describe what they were doing, instead drawing on heard and felt experiences and personal knowledge of sounds, music, and other media assets. Even so, these participants might have benefitted from more specific linguistic descriptors in order to revise or enact additional authorial moves, and some authors were stymied by a lack of useful search terms or specific descriptors. 

Clearly, talking about music is not the same as hearing it—music is felt in the body and in the mind. Often, music’s communicative power is extradiscursive, beyond the power of words. Many of the participants illustrated the extradiscursive nature of music as they waved their hands, grinned, bounced in their seats to a rhythm, mimicked sounds and instruments with voices, hummed, and sang. It is likely, then, that multimodal composers need both heard and felt experiences and specific vocabularies as they learn to wield musical-rhetorical tools and to invent with music and other sounds in writing courses. As Ahern suggests, students might tune their ears, eyes, mouths, and bodies to listen, to feel, and to describe—all useful for multimodal invention. Some sonic resources might be felt and illustrated through movement—hand gestures, tapping, head-bobbing, or dancing, for example—and some resources might be described through movement and words together. Other resources might be described using terminologies from established theory—logos, pathos, ethos, multimodality, or added value—and still others might be described using terms collaboratively negotiated and built from experiences—cross-modal juxtaposition, cross-modal contrast, or layering. Of course, the movements and terms I list here are by no means exhaustive, and fields beyond our own, such as sound design or film, might help us to continue to build and shape multimodal vocabularies. 

Activities like in-class listening labs or sonic reflection journals, where students would be asked to move to and/or specifically describe what they hear in various sonic and multimodal texts, would be helpful for such metalanguage compilation and development. Similar classroom activities are modeled by Hocks and Comstock in their article “Composing for Sound,” where they provide assignment samples and a description of a “reduced listening” technique that helps students focus on sounds as objects (140). As Ahern mentions, though, there are different ways to describe sounds, and she uses thick (additive) and relative (based on comparison) descriptions with her students (83-4). The data from this study suggests that description through movement and description through theory might be useful additions to these options.

More compositional and sonic play—like that recommended by Anderson and Ceraso—would also encourage students to invent with the ear and the body even as they learn to articulate multimodal invention practices through precise language. Ceraso calls such activity “multimodal listening,” where “sonic play and experimentation” are imperative (117). Students could enact multimodal listening by using audio- or video-editing software to informally combine music or other sounds with visual and written elements and then experience and respond to the combinations created. Students might write out descriptions and reflections, as Shipka encourages her students to do through what she calls a “statement of goals and choices” (114), and such written products could be a site to try out and use theoretical terminologies. The data from this study, though, indicates that oral and bodily descriptions of sound and sonic resources were also useful for authors. For many participants, spoken reflections during interviews provided an option to describe with faces and bodies as well as with words, and we might build similar opportunities to hear and to move in response into our classes. 

Ara told me near the beginning of this research project that he felt like the music was the most important part of his video. His experiences, along with the movements and narratives of the other participants, remind me again of the importance of feeling, and of hearing, listening, moving, and speaking as part of sonic and multimodal invention. So often we who study rhetoric and writing look (and listen) first to words, linguistic description of compositional choices, and metalanguage to understand a phenomenon, and these are indeed important. But the participants in this study nudge me to open my ears and my body to really listen—cross-modally, with all my senses tuned. 

Appendix A: Interview Protocols

Interview protocol for three initial student participants

  1. Can you introduce yourself? 
  2. What are you working on / studying currently? 
  3. Why did you choose to write about the topics you chose in [first-year writing]? 
  4. What assignments from [first-year writing] mattered most to you and why? 
  5. Why did you choose to compose a video for the last assignment? 
  6. What is your favorite part of your video?
  7. Talk about the composition process for your video. How did you begin composing? How many drafts did you do? Describe the end result.
  8. Where did the idea for the argument in the video come from?
  9. What were the steps in the composing process?
  10. When / how did you begin to think about the music you wanted to use in the video? What kind of music did you use in the video? Describe. Why did you choose these songs to include?
  11. How do the text, images, sounds, and music work together in your video to make the argument? 
  12. Talk about the publication process for the video.
  13. What kinds of rhetorical work does the music in your video do?
  14. What similarities and differences were there in composing your video vs. composing a more traditional written paper? 

Interview protocols for six additional student participants

Interview Protocol, beginning of course

  1. Can you tell me a little bit about yourself? 
  2. How would you describe yourself as a writer and your writing abilities? 
  3. Can you tell me about a piece of writing that you’ve done in the past that you consider successful? 
  4. Could you describe what kinds of writing or composition you have used before this class in school? 
  5. Could you describe what kinds of writing or composition you have used before this class outside of school, either in the workplace or on your own? 

Interview Protocol, after the video unit

  1. Could you tell me about the video you composed?
  2. Can you tell me about the resources and tools you used to compose the video? 
  3. Could you tell me about the video reflection essay you composed? 
  4. Are there any concepts or terms from the video unit that you think will stick with you over time? 
  5. How would you describe your overall experience with video composition in this assignment? 
  6. Some people would say that new media composition does not belong in an academic writing course like [first-year writing]. What would your response be to them? 

Interview Protocol, after the course

  1. Not counting the video composition, could you tell me about another assignment from [first-year writing] that mattered to you? 
  2. What connections do you see between the major assignments in your [first-year writing] course? 
  3. How would you describe yourself as a writer and your writing abilities now at the end of [first-year writing]? 
  4. Can you describe for me your general approach to writing now? 
  5. What might be a writing challenge that you would encounter this/next semester, and how might you approach it? 
  6. Can you describe what you believe the purpose of a college writing class like [first-year writing] is? In your view, did your class fulfill this purpose? Why or why not?
Notes

1. I use the real names of the student participants when they have given me permission to do so. Logan and Shannon preferred to remain anonymous, and their names are pseudonyms.

2. See Appendix A for interview protocols.

3. Exploring the affordances and limitations of searching for and composing with copyrighted and copyright-free media assets in detail is beyond the scope of this article. The examples below provide brief glimpses into some composing experiences with different kinds of media assets, and these examples indicate that a careful consideration of how assignment requirements enable and constrain multimodal invention should be an important consideration for teachers of digital writing.

4. Concerning “music/silence” in the category of “Music forwarding theme, point, or argument (logos),” silence is an aspect of sonic rhetoric that McKee argues should be more central to our analyses. While I did not code specifically for silence in this study, some student comments indicate that silence was needed as an important sonic resource, one that should receive more emphasis in future studies.
Works Cited

Ahern, Katherine Fargo. “Tuning the Sonic Playing Field: Teaching Ways of Knowing Sound in First Year Writing.” Computers and Composition, vol. 30, no. 2, 2013, pp. 75–86. 

Alexander, Jonathan, and Jacqueline Rhodes. On Multimodality: New Media in Composition Studies. CCCC/NCTE, 2014. 

Almjeld, Jen, and Kristine Blair. “Multimodal Methods for Multimodal Literacies: Establishing a Technofeminist Research Identity.” Composing(media) = Composing(embodiment), edited by Kristin L. Arola and Anne Frances Wysocki, UP of Colorado, 2012, pp. 97-109. 

Anderson, Erin. “Toward a Resonant Material Vocality for Digital Composition.” enculturation: A Journal of Rhetoric, Writing, and Culture, vol. 18, 2014, http://www.enculturation.net/materialvocality.

Arroyo, Sarah J. Participatory Composition: Video Culture, Writing, and Electracy. Southern Illinois UP, 2013. 

Barness, Jessica. “Common Sounds.” Currents in Electronic Literacy, 2011, http://currents.dwrl.utexas.edu/2011/commonsounds.html.

Brooke, Collin Gifford. Lingua Fracta: Toward a Rhetoric of New Media. Hampton P, 2009. 

Ceraso, Steph. “(Re)Educating the Senses: Multimodal Listening, Bodily Learning, and the Composition of Sonic Experiences.” College English, vol. 77, no. 2, 2014, pp. 102–123. 

Ceraso, Steph, and Kati Fargo Ahern. “Composing with Sound.” Composition Studies, vol. 43, no. 2, 2015.

Chion, Michel. Audio-Vision: Sound on Screen. Columbia UP, 1994. 

Comstock, Michelle, and Mary E. Hocks. “Voice in the Cultural Soundscape: Sonic Literacy in Composition Studies.” C&C Online, 2006.

Corbin, Juliet, and Anselm Strauss. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory, 3rd ed., SAGE Publications, 2008. 

Cox, Christoph. “Beyond Representation and Signification: Toward a Sonic Materialism.” Journal of Visual Culture, vol. 10, no. 2, 2011, pp. 145–161.

Fleckenstein, Kristie S., et al. “The Importance of Harmony: An Ecological Metaphor for Writing Research.” College Composition and Communication, vol. 60, no. 2, 2008, pp. 388–419. 

Halbritter, Bump. “Aural Tools: Who’s Listening?” Digital Tools in Composition Studies: Critical Dimensions and Implications, edited by Ollie O. Oviedo, Joyce R. Walker, and Byron Hawk, Hampton P, 2010, pp. 187-220. 

---. Mics, Cameras, Symbolic Action: Audio-Visual Rhetoric for Writing Teachers. Parlor P, 2013. 

---. “Musical Rhetoric in Integrated-Media Composition.” Computers and Composition, vol. 23, no. 3, 2006, pp. 317-34. 

Halbritter, Bump, and Julie Lindquist. “Can you Hear Me Now?: Voice, Voices, and the Teaching of Voicing.” Computers and Writing Conference, St. John Fisher College, Rochester, NY. 21 May 2016. 

Hocks, Mary E., and Michelle Comstock. “Composing for Sound: Sonic Rhetoric as Resonance.” Computers and Composition, vol. 43, 2017, pp. 135-146. 

Kanouse, Sarah. “Don’t Mourn.” Liminalities: A Journal of Performance Studies, vol. 3, no. 3, 2007, http://liminalities.net/3-3/dontmourn.html.

McKee, Heidi. “Sound Matters: Notes toward the Analysis and Design of Sound in Multimodal Webtexts.” Computers and Composition, vol. 23, no. 3, 2006, pp. 335–354. 

Merriam, Sharan B. Qualitative Research: A Guide to Design and Implementation. Jossey-Bass, 2009. 

Miller, Jackson B. “Performing Pennsylvania Hall: Aural Appeals in Angelina Grimke’s Abolitionist Discourse.” Liminalities: A Journal of Performance Studies, vol. 3, no. 3, 2007, http://liminalities.net/3-3/grimke.htm.

The New London Group. “A Pedagogy of Multiliteracies: Designing Social Futures.” Multiliteracies: Literacy Learning and the Design of Social Futures, edited by Bill Cope and Mary Kalantzis, Routledge, 2000, pp. 9-37. 

Pinch, Trevor, and Karin Bijsterveld, editors. The Oxford Handbook of Sound Studies. Oxford UP, 2013. 

Rodrigue, Tanya K., et al. “Navigating the Soundscape, Composing with Audio.” Kairos: A Journal of Rhetoric, Technology, and Pedagogy, vol. 21, no. 1, 2016, http://kairos.technorhetoric.net/21.1/praxis/rodrigue/.

Scheidt, David Darius Jiri Sander. “DJ Parasite: An [au]/[o]-Tophonographic Sound Track.” Liminalities: A Journal of Performance Studies, vol. 3, no. 3, 2007, http://liminalities.net/3-3/DJParasite.htm.

Selfe, Cynthia L., editor. Multimodal Composition: Resources for Teachers. Hampton P, 2007. 

---. “The Movement of Air, the Breath of Meaning: Aurality and Multimodal Composing.” College Composition and Communication, vol. 60, no. 4, 2009, pp. 616–663.

Selfe, Cynthia L., and Gail E. Hawisher. “Exceeding the Bounds of the Interview: Feminism, Mediation, Narrative, and Conversations about Digital Literacy.” Writing Studies Research in Practice: Methods and Methodologies, edited by Lee Nickoson and Mary P. Sheridan, Southern Illinois UP, 2012, pp. 36-50. 

Shipka, Jody. Toward a Composition Made Whole. U of Pittsburgh P, 2011. 

Stedman, Kyle D. “How Music Speaks: In the Background, In the Remix, In the City.” Currents in Electronic Literacy, 2011, http://currents.dwrl.utexas.edu/2011/howmusicspeaks.html.

Sterne, Jonathan, editor. The Sound Studies Reader. Routledge, 2012. 

Stone, Jonathan W. “Listening to the Sonic Archive: Rhetoric, Representation, and Race in the Lomax Prison Recordings.” enculturation: A Journal of Rhetoric, Writing, and Culture, no. 19, 2015, http://enculturation.net/listening-to-the-sonic-archive.

VanKooten, Crystal. “A New Composition, A 21st Century Pedagogy, and the Rhetoric of Music.” Currents in Electronic Literacy, 2011, http://currents.dwrl.utexas.edu/2011/anewcomposition.html.

Van Leeuwen, Theo. Speech, Music, Sound. Macmillan, 1999. 

Wysocki, Anne Frances, et al. Writing New Media: Theory and Applications for Expanding the Teaching of Composition. Utah State UP, 2004.