"Look Ma, No Hands!": Voice-Recognition Software, Writing, and Ancient Rhetoric

Kim Hensley Owens, University of Rhode Island

Enculturation 7 (2010): http://enculturation.net/look-ma-no-hands

"Our writing tools are also working on our thoughts."
-Friedrich Nietzsche

Materiality

Lee Honeycutt, writing in Computers and Composition in 2003, argues that scholars in composition must pay attention to emerging voice-recognition technologies, both for ourselves and for our students. This essay begins to answer this call by critically examining my own composing process with voice-recognition software, contextualizing that experience—a physical and intellectual challenge—within and against ancient and contemporary rhetoric and composition theories.

Material conditions of writing have always delimited both the process and product of writing, so "writing" has held drastically different meanings over time, from paintings on cave walls to inscriptions on waxen tablets to intangible letters and symbols on computer screens. Each incarnation of writing possesses at least two common elements: the desire to communicate and the use of human hands. This paper, however, has been completed, by medical necessity, almost entirely without my hands. These words appear onscreen, and subsequently on paper, not as keystrokes entered by my fingers, but as words heard and then imitated by a voice-recognition software program. In some ways, this method of writing is akin to a whole language approach, in which phrases and words supersede letters or syllables; at the same time, this method of writing is all about phonics—what appears onscreen is what the computer "hears." In essence, the software imitates what it imagines I would write: what appears as my "writing," an imitation of my speech, is never more than a simulacrum. One could argue, however, that all writing is the equivalent of simulacra: thoughts and feelings never translate exactly into writing; speeches, once written, become somehow different than their oral form. Classical rhetors debated the uses of writing, sometimes characterizing writing as a practice in service of good oration, or vice versa, and other times treating both speaking and writing as legitimate pursuits.1 Here I examine how the circumstances that create the need and desire for voice-recognition software allow advice offered by classical rhetors to resurface and connect with contemporary rhetoric in surprising and productive ways.

Over the last two years, as acute tendonitis in my wrists and hands has worsened, reaching the point where typing is virtually impossible, numerous people, including scholars in our field, have suggested I "try voice-recognition software."2 These suggestions have ranged from quite serious to quite skeptical, from those who truly believed in the possibility of my using such software to those who viewed voice-recognition software as a tool about as accessible as a shuttle to the moon. Whatever their conception of voice-recognition software, however, all seemed sure that if only I had it, my writing problems would be solved.

My experiences using three different voice recognition software programs have led me to see that moving from manual writing/typing to "oral writing" is not unproblematic, perhaps especially for those who, like me, have learned to "write" using keyboards. The injury that has led to my being unable to write/type was brought about by pedagogies, technological capabilities, and production expectations unique to my time.3 The software that enables this paper to be both spoken and written simultaneously is also unique to my time. I have learned to compose through writing—as a physical process including such steps as freewriting, drafting, cutting and pasting, revising, editing—and not through speaking. Now that my composing process is further mediated by voice and by voice recognition software, I am more aware of the ways in which technologies not only assist in writing, but shape writing. What I can compose is limited—and perhaps extended—by the manner in which I compose it, and my oral delivery is now of the utmost importance. In this regard, my experience recalls Demosthenes, for whom delivery was supreme.

In "On the Trierarchic Crown," Demosthenes emphasizes the importance of making an argument in part through display: "but I want to show this, that these men alone have no right to speak about the crown. What evidence will best make this clear?" (41). To illustrate the rhetorical issues raised in composing orally with voice-recognition software, I need not (Automaker Hal) only tell what it's like to speak an article, I also need to show it.4 Although I intend to correct my errors (and indeed am) and use mostly standard grammar, in order to demonstrate both the difficulty and the immense humor involved in using speech recognition software, I will at times use as evidence the mistakes I/my dragon (Dragon NaturallySpeaking Software) make. As with the Automaker Hal example above, I mark these "mistakes" in parentheses in boldfaced italics: this both demonstrates the error and draws attention to the ways in which my speaking is writing (one cannot speak in boldface or italics) and my speaking is not writing (I would never type "Automaker Hal" when I meant "only tell").

Perhaps we in composition think that delivery is no longer an essential element of rhetoric, that we need not think about it because we simply type and write and read. My experience losing the ability to type has upended that viewpoint—for me, writing is now speaking and delivery really is all. Like Demosthenes, who trained himself to enunciate by speaking with pebbles in his mouth (or so the story goes), I am learning to (Nancy A.; in Nancy eighths, and Nancy eighths) enunciate more clearly than I have ever had to before. This temporary disability not only teaches me that delivery is important today for those who cannot type; it also reinforces the importance of delivery in writing and in speaking, which may be an overlooked element of what scholars do as we present conference papers, give workshops, interview, and perform myriad other activities that require our voices over our hands. The concept of delivery, however, in contemporary times encompasses not only speaking but the writing product as well. Writing must be delivered nearly perfectly—perfect spelling, perfect grammar, perfect formatting—to bolster a writer's ethos. If a written product does not look a certain way, if it is not delivered in a particular way, it will not be received as "professional" or "acceptable." As a writer learning a new writing process that profoundly affects the product, I must take care with my delivery in both senses. Not only must I enunciate and carefully arrange my phrases as I speak, I also must take special care to correct (what appears to be) my writing, lest I lose the confidence of my readers with what appear to be sloppy errors. While I could once type perfect copy, I cannot speak perfect copy, at least not yet, and although small mistakes in speech are often overlooked, because the product of oral writing looks identical to that of typing, different standards do not (usually) exist for its delivery.

My challenges using voice-recognition software are in many ways nothing more than the material difficulties any writer faces. Quintilian noted the limitations faced by a writer using pen and parchment (the advanced writing technology of his time, one that assisted those with a visual impairment in much the same way that voice-recognition software assists those with manual impairment today) instead of a waxen tablet. He pointed out that the moving of the hand back and forth from parchment to ink pot "causes delay, and interrupts the current of thoughts" (Quintilian 145). These difficulties correspond with those I encounter as I compose with my voice: delays result from (and hands) homonyms, from pronunciation/imitation difficulties, from learning new commands, and from having to find and fix unexpected and often odd errors. Clearly each writing technology comes with its own set of (in writing a Socratic) idiosyncratic problems.

Imitation as teaching/learning model

Voice-recognition software, unlike other writing technologies such as the typewriter, has some agency in that it "learns." It becomes better able to recognize my voice and my terms with practice; I, too, with practice become better able to tailor my speech to its needs. Thus, the software and I are both positioned in dual roles as teachers and learners. These roles are fluid, because while I "train" the software, the software also "trains" me—we both limit and expand what the other is capable of. This is perhaps the ideal teacher-learner relationship, but one rarely effected, and certainly not explicitly anticipated by ancient rhetors who tended to see themselves as authorities responsible for teaching, but not necessarily learning from, their students. Quintilian's second category of teaching methods, "imitation," demonstrates one way in which students were trained, with no mention of an instructor being changed by the process. James Murphy's introduction to On the Teaching of Speaking and Writing outlines the specific exercises Quintilian identifies:

a. Reading aloud (lectio)
b. Master's detailed analysis of a text (praelectio)
c. Memorization of models
d. Paraphrase of models
e. Transliteration (prose/verse and/or Latin/Greek)
f. Recitation of paraphrase or transliteration
g. Correction of paraphrase or transliteration (xxx-xxxi)

Analyzing my experience using voice recognition software may highlight the fluidity of roles between software and user as teacher and learner and illustrate the ways composition and new technologies collide. As students of ancient Roman rhetoric began learning imitation by reading aloud, I began to learn voice-recognition software by reading aloud set passages. This process enabled the writing program to become accustomed to my voice and pronunciation while enabling me to become accustomed to speaking at a certain pace and in specific ways: we are each positioned simultaneously as instructor and student and continue to be as I learn more about the program and it learns more about me.

The second step in training voice-recognition software is for the program to analyze all the documents on my computer. In this manner, the software becomes familiar with particular vocabularies and what the instructions described as my "writing style." In this step, the software is positioned as the instructor, akin to Quintilian's "Master's detailed analysis of a text," although perhaps this connection is somewhat tenuous because this analysis is invisible to the learner. This analysis does not serve as a sample that the learner may later imitate; instead, the software's analysis of the learner's text enables it to better imitate or approximate what it hears the learner say. This step too, then, illustrates a shifting of roles.

As I became fully invested in using voice recognition software to its highest potential, I discovered that rote memorization was required of me. Just as the software learned my vocabularies, I also had to learn the terms it prefers. Because the computer is better able to distinguish longer sounds than shorter sounds, it prefers phrases rather than word-by-word composition—difficult for anyone carefully weighing her word choices. For the same reason, I needed to memorize the military’s phonetic (out for that) alphabet to spell any word the computer didn't recognize. For example, to spell "recognize" I would need to say, "Spell that: [pause] Romeo Echo Charlie Oscar Golf November India Zulu Echo." Also, because the commands are quite specific, I had to unlearn the many tricks and commands I was comfortable with as a keyboarding writer. For example, where years of word processing experience would lead me to press "backspace" or highlight text and press control x in order to delete something, I had to learn to say "scratch that" or "delete previous two words"; my keyboarding tricks did not translate simply to the new program.

The paraphrase, transliteration, and recitation steps of Quintilian's imitation process share fewer connections with my learning voice recognition software, but correction is closely related. As I corrected the software's interpretation of my words, I also self-corrected, noticing where my pronunciation could have been more clear and which phrases may have been better understood if spoken of a piece rather than word by word (as I tend to think). Here again, I was positioned as both instructor and instructed in this phase of imitation.

While these may seem to be simple growing pains associated with any new writing technology, their interference with my writing process remains significant. Writing with voice-recognition software has impeded my usual process to such an extent that I now question whether I should continue encouraging my students to adopt a similar process. It is imperative that we do not, with our tendency to see new technologies as relatively simple solutions to physical problems, suggest that voice recognition software can unproblematically replace word processing, either for ourselves or our students.

Ancient advice for contemporary writers: Training oneself to write "right"

"It is not enough to have a supply of things to say, it is also necessary to say it in the right way"
-Aristotle, On Rhetoric

While not every ancient rhetorician/rhetor offers the same advice about speaking or writing, each does offer advice or a model to follow. Some focus more on process and practice, others on product. Three works whose advice resonates for users of voice recognition software, albeit in somewhat conflicting ways, are Plato's Phaedrus, Quintilian's On the Teaching of Speaking and Writing, and Cicero's On Oratory and Orators.

Plato's Phaedrus offers the following exchange about writing:

Socrates: Then the conclusion is obvious, that there's nothing shameful in the mere writing of speeches.
Phaedrus: Of course.
Socrates: But in speaking and writing shamefully and badly, instead of as one should, that is where the shame comes in, I take it.
Phaedrus: Clearly.
Socrates: Then what is the nature of good writing and bad? (Plato and Hackforth 115)

Here Plato conflates written and spoken discourse to a certain extent; he allows Socrates to determine not that writing is itself bad or shameful, but that writing, like speaking, can be done badly. He speaks here not of the quality of delivery or enunciation, but the quality of the content. He toys with the possibility of explicating how writing itself can be somehow "wrong," that there is a "should" to writing. Plato emphasizes the need for the writer to be knowledgeable, to know the truth. For Plato, "right" writing has to do not with how one composes or how one delivers that composition, but with what that composition contains. To Plato, it would seem that the method of composition would be rather inconsequential when compared to the character and knowledge of the speaker/writer. This advice would apply, then, to any method of composition, whether composing a speech orally, in writing, or using dictation or voice recognition software. Quintilian, like Plato, also addresses the question of writing well. Unlike Plato, however, he has distinct ideas about how writing should proceed. He offers advice that may be counter-intuitive to contemporary writers and writing teachers because he does not see value in drafting. However, should our writing processes and pedagogies need to change to protect our own and our students' bodies, Quintilian's advice may seem prescient. He prescribes particular methods which will result in good writing:

not practice only will assist (and in practice there is doubtless great effect) but also method, if we do not, lolling at our ease, looking at the ceiling, and trying to kindle our invention by muttering to ourselves, wait for what may present itself, but... set ourselves to write like reasonable beings—for nature herself will supply us not only with a commencement but with what ought to follow. (Quintilian and Murphy 142)

Quintilian's advice seems to speak as directly to writers of our era as to his. Any observer of a writer in the early stages of a writing task is likely to observe that writer pausing, looking around, trying to get comfortable, and using whatever form of brainstorming is available, whether that be talking to oneself, clustering, or writing lists. Clearly Quintilian would not approve of this behavior, because for him, this is not how a "reasonable being" should write. (He would not be a fan of Peter Elbow's theories of writing.) Although Quintilian does not explicate precisely how one should write, his statement that "nature herself" will supply both inspiration and what follows suggests that for him, composition is not difficult work but rather a matter of tapping into something that already exists. Although he does discuss invention earlier in his text, his advice here seems to leave little room for the creative components of this first rhetorical canon.

The five canons of rhetoric are usually presented in this order: invention, arrangement, style, memory, and delivery. The canons remain relevant and essential for composers using voice recognition software; however, the explicit order of these canons becomes destabilized. Delivery, while always the final step for the ancients, becomes a crucial aspect of invention. For a writer composing with voice-recognition software, the computer serves as both tool and audience for initial ideas. Delivery, in the ancient sense of oral precision, is as important throughout all stages of writing with voice recognition as the delivery of a final speech was for the ancients. Contemporary notions of delivery in writing, like ancient notions of oral delivery, apply to final products; we now focus on precision in presentation, design, and grammar. As I have practiced using voice-recognition software, however, I have found that I must follow the advice of the ancients regarding oral delivery even as I draft; I cannot reserve this for a final polishing step. If I don't speak clearly, forcefully, and with precision, I lose the thread of what I'm saying/writing, spending more of my time correcting errors than composing. I attempt to write in the manner to which I have become accustomed, beginning with a focused freewrite, which Elbow describes as "writing where you pour words down on paper quickly without planning or worrying about quality, but you stay on one subject...focused freewriting is especially useful for the hardest thing about writing: getting started" (5-6). I continue to expect that this oral freewriting, or "freespeaking," will provide all the benefits of freewriting; however, I have discovered that when I "freespeak," the errors tend to be so numerous that even I, who "wrote" those words, cannot decipher them.

Although I'm accustomed to using writing as an anchor for my memory, getting rough ideas out on screen or on paper before I shape them into recognizable prose, with voice-recognition software I find that I cannot rely on the screen to save my thoughts. Quintilian would, no doubt, approve of the way I am now forced to write. He offers the following caution against excessive speed in composing, explaining that some write

what they call a rough copy, which they then go over again, and arrange what they hastily poured forth; but though the words and rhythm of the sentences are mended, there still remains the same want of solid connection that there was originally in the parts hurriedly thrown together. It will be better, therefore, to use care at first, and so to form our work from the beginning that we may have merely to polish it and not to mold it anew. (142)

If only Quintilian had trained me to write "right," I might not be in the pickle I am in now. As it stands, however, I have learned to generate ideas by tossing them onto paper with little regard as to their order or their eventual use in a final product. It seems that having been well-schooled in process pedagogy places me at a disadvantage as an oral composer. This new understanding may also have implications for me as a writing teacher.

Collisions and elisions between technology and orality

"Technology is entrenched in our history."
-Heidegger, qtd. in Kittler

Voice-recognition software is only the latest in a series of oral mediations of writing. Each of these mediations has included some form of technology; what is unique about the emergence of voice-recognition software is that this instance of oral mediation, unlike traditional dictation, does not require a human intermediary. Dictation itself has a long history: for centuries, writers have used dictation to accomplish tasks their bodies are not suited for; on the surface, voice-recognition software offers but one more option for dictation. Dictating to a person and dictating to a software program both involve speaking, imitation, and translation; both require learning new skills and adapting writing practices to fit the situation; both offer writers a written voice they otherwise may not have; both inevitably result in different sorts of errors than physical writing. Both, then, provide opportunities even as they also structure and limit possibilities. Dictating to voice recognition software represents a move away from human collaboration; further, it represents a highly concentrated reliance on technologies less within a writer's direct control than earlier technologies such as a pen and paper. It could be argued, however, that the dictationist operates more as an extension of a machine or a writing technology than as a human collaborator; under this view, dictation to a person and dictation to a software program would be nearly indistinguishable. I argue, however, that the differences are significant, at least as significant as the move from handwriting to typing.

Friedrich Kittler explains that for Nietzsche, typing, unlike handwriting, "is no longer a natural extension of humans who bring forth their voice, soul, individuality through their handwriting. On the contrary, [...] they turn from the agency of writing to become an inscription surface" (210). According to this view, the writer is not using a machine, but is rather being used by the machine. While voice-recognition software is technologically further removed from handwriting than typing or word-processing are, it does require one's voice—not the "voice" defined as the writer's persona that comes through the writing, but the actual spoken voice of the "writer." Voice takes on its literal, bodily meaning in addition to the metaphorical meaning many writing teachers have become accustomed to.

Any type of writing, regardless of the technology through which it is mediated, requires the body: we tend to think of writing as requiring the hand or hands, but dictation, especially with voice-recognition software, offers the possibility/requirement of writing through the voice. Friedrich Kittler contends that "all media conceptions at the turn of the century" manifested "a crucial link between physiology and technology" (Kittler 73). Although Kittler's example, a corpse repurposed as part of a gramophone, is not relevant here because voice recognition does not attempt to use the body as a new technology, his linking of physiology and technology remains crucial: the human body enables technology to work. This revelation, or reminder, supports (dairy dyes) Derrida's assertion that writing provides an instance of an author's "distant presence," toppling the traditional understanding that the author is absent from writing (but not from speaking). If voice is as important to a writer's presence as we might imagine, the description of "distant presence" becomes, in a sense, even more accurate in dictation than when the author herself writes (or types) because, in dictating, the author's presence is made known solely through the voice.

Dictation requires not only the author's voice, however, but the dictationist's hands and voice, or in the case of voice recognition, the software and hardware of a computer. These physical requirements and layers of distance affect the writing process. Whether one's words are mediated through a person or a machine, any oral mediation slows the writing process and hinders the writer's ability to effectively and efficiently communicate her ideas. Quintilian notes problems with dictation as a composing process:

It happens [when dictating] that not only inelegant and casual expressions, but sometimes unsuitable ones, escape us [...] expressions which partake neither of the accuracy of the writer nor of the animation of the speaker; while, if the person who takes down what is dictated should prove, from slowness in writing, or from inaccuracy in reading, a hindrance, as it were, to us, the course of our thought is obstructed, and all the fire that had been conceived in our mind is dispelled by delay, or sometimes, by anger at the offender. (Quintilian 143)

Quintilian's reasons for disapproving of dictation mirror my own difficulties with voice-recognition software. He sees the results of dictation as far too easily prone to sloppiness; like contemporary audiences, he would see errors as "inelegant," "casual," and even "unsuitable." Unlike contemporary audiences, however, Quintilian saw no value in revision—he believed a text, once produced, should be a final product. Clearly none of us teaching composition believes this today—we take both students' and our own writing through multiple drafts. Although I teach this way and write this way, I cannot help but feel a pedagogical nostalgia for Quintilian's theories, wishing I could teach myself or my students to write what some call the "one draft wonder." The technology that allows us to quickly and easily revise seems also to force us to do so. The affordances and constraints of writing technologies, then, are not always equally balanced—the slightest shift in physical capability heavily weights the constraints side of the equation.

Quintilian suggests that a writer dictating may become angry at the dictation-taker for hindering the writing by being slow or inaccurate. This anger, directed at a human being, may prove somehow productive; the dictationist may improve out of a desire to please the dictator (ha) or simply by learning the writer's style. Unlike a dictationist, however, voice-recognition software does not desire to please me, does not respond to anger, and although it does learn, does not learn the lessons I would most like it to.

Yet another challenge with voice-recognition software is finding appropriate space to use it. Where traditional writing and typing can be and often are both silent and private, oral writing is necessarily loud and public. Virginia Woolf's (office-repeated) oft-repeated call for a room of one's own is even more crucial for one using dictation or voice recognition software than for one writing or typing. Unlike most writers today, who can write in coffee shops, shared offices, or anywhere they can take a laptop, writers using voice recognition software do not have these options. Most writers, and especially students, are unlikely to be able to find spaces where they can compose orally without disturbing others. Even those writing in their own homes must have a room of their own in order to be able to compose without over-sharing or imposing.

Well-meaning friends and colleagues who suggest that I "try voice-recognition software" seem to expect this technology to be a solution. As this paper demonstrates, the technology solves one problem, but creates a host of others. As we in composition studies know, at least on a theoretical level, new technologies will always provide both opportunities and limitations. They will also encounter resistance and perhaps foster inequities. However advanced voice recognition software seems, the principles of its use are rooted in ancient principles of rhetoric and teaching: the five canons, whatever their order, remain essential; the interaction of body and technology, however problematic, persists; oratory, whether as product or element of process, never ceases to be a significant, even essential, aspect of communication.

While this paper may in some ways be about all the new problems that voice recognition software creates for a writer, at the same time it is about the ways in which all of these problems are brought about by particular understandings of the writing process. As writing instructors, we have a duty to our students to help them learn to write well; often we encourage more and more writing under the theory that more writing begets better writing and better thinking. While I have subscribed to this theory for all of my professional life and much of my life as a student, I find myself now in a position to question it. That writing, which I have always found to be liberatory and essential to my self-definition, could cause me to lose the ability not only to write, but to perform the most basic tasks of self-care, is profoundly disturbing.

Using voice-recognition software has led me to see potential pitfalls in my own writing process as well as in my writing pedagogy. I question our emphasis on more, more, more writing. I also, however, see new possibilities for teaching: perhaps students who have physical disabilities may find freedom in voice recognition software; perhaps we can train students to draft and/or revise orally or in their heads, minimizing reliance on keyboarding technologies; perhaps if we expose students to a variety of composing processes, reducing reliance on revision, their processes may not become as entrenched and limiting as my own, allowing them more flexibility both as authors and users of technology.

Notes

1. Cicero (in J.S. Watson's translation of On Oratory and Orators) opines that "[w]riting is said to be the best and most excellent modeler and teacher of oratory" (42); Quintilian writes (in James Murphy's translation of On the Teaching of Speaking and Writing): "by writing we speak with greater accuracy, and by speaking we write with greater ease" (157); Plato suggests (in R. Hackforth's Plato's Phaedrus) that writing and speaking both have merit (115).
2. This article was submitted and accepted for publication in 2005; readers should be aware, then, that the timeline in the narrative is somewhat outdated.
3. Specifically, extensive drafting and revision of my own work and long, narrative responses to student writing for a summer tutoring job conducted over the Internet, combined with a second job designing a website, and topped off with apparently excessive emailing and instant messaging, left me unable to hold a cup, much less type.
4. As I explain in more detail below, I leave this mistake ("Automaker Hal" for "only tell") to give readers a glimpse into the challenges of composing with voice recognition software. I deliberately place these errors before the correction to allow readers to feel some of the frustration and confusion that can occur, although for readers' sake I limit the number of these retained errors.
