

Rhythm plays a crucial role in creating cohesion and structure in multimodal texts and communicative events that unfold over time. It has its basis in the rhythms of our body – the heartbeat and the breathing cycle. Understanding body rhythm and the physicalities of motor behavior is therefore fundamental to understanding semiotic structuring and multimodal communication. In semiosis, as Kristeva (1980: 183) has put it, “instinctual rhythm becomes logical rhythm”.

The essence of rhythm is a regular alternation between ‘accented’ and ‘unaccented’ moments – between an up and a down, a tense and a lax, a long and a short, a loud and a soft, and so on. The ‘up’ moments, or accents, demarcate ‘rhythmic feet’, equally timed units consisting of an accented syllable with or without one or more unaccented syllables, with the accented moment higher in pitch, louder, or longer, or some combination of these; the duration of the equally timed feet constitutes tempo. In the case of body movement, the accented gestures or other movements will have greater force or extent, greater ‘vividness’ (cf. Martinec 2000: 291). But accents, whether in speech, music or gesture, also play a key role in articulating meaning because they foreground the sounds or movements that carry the most important information of each rhythmic foot. The meaning of the following line of dialogue might still come across if we heard only the accented syllables (Why think work this?) but not if we heard only do you I am king like (rhythmic feet are separated by oblique strokes and accents are italicized):

/Why do you/think I am/working like/this/

Rhythm also organizes rhythmic feet (up to seven) into rhythmic phrases, also referred to as ‘breath groups’ because their duration is similar to that of the breathing cycle. Between the phrases there is some kind of break – a short pause, a drawing out of the final sound or movement, or a change of tempo or other momentary discontinuity. In music, where the beat usually continues uninterrupted, one or two silent beats may substitute for the pause at the end of a phrase. Rhythmic phrases also play a key role in the meaningful structuring of time-based texts. They ‘frame’ the moves in the ongoing communicative act, as demonstrated in this example, an extract from a research interview with a radio announcer (the phrases end with a double oblique stroke and are enclosed in square brackets):

[I’ve read/news at all/sorts of/places and the the//] [blokes that/write the news/ here//] [em//] [certainly//] [far easier to/ read//] [and that’s/ not just because I’m/working here/now//][than/any news I’ve/read anywhere/else//]

In this extract each rhythmic phrase propels the speaker’s train of thought a step further (or buys him time to do so, in the case of em). The whole of the extract then forms a larger scale rhythmical unit, the rhythmic ‘move’, followed by a longer pause and functioning as a stage in the generic structure of the interaction, in this case a ‘response’ turn in an interview.
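The transcription conventions used above (square brackets around each phrase, a double oblique stroke at its end, single oblique strokes between feet) are regular enough to be read mechanically. The following is only a minimal sketch in Python, not part of the original analysis: the function name `parse_phrases` is hypothetical, and the sketch assumes that every stretch of text between oblique strokes counts as one rhythmic foot, which ignores any unaccented lead-in before the first accent of a phrase.

```python
import re

# The interview extract, in the notation described in the text:
# [...] encloses a rhythmic phrase, "//" closes it, "/" separates feet.
EXTRACT = (
    "[I’ve read/news at all/sorts of/places and the the//] "
    "[blokes that/write the news/ here//] [em//] [certainly//] "
    "[far easier to/ read//] "
    "[and that’s/ not just because I’m/working here/now//]"
    "[than/any news I’ve/read anywhere/else//]"
)


def parse_phrases(transcript):
    """Split a transcript into phrases, each a list of rhythmic feet.

    Assumption (not from the source): each slash-separated segment
    is treated as exactly one foot.
    """
    phrases = []
    for body in re.findall(r"\[(.*?)\]", transcript):
        body = body.strip()
        if body.endswith("//"):
            body = body[:-2]  # drop the phrase-final double stroke
        feet = [foot.strip() for foot in body.split("/") if foot.strip()]
        phrases.append(feet)
    return phrases


phrases = parse_phrases(EXTRACT)
for feet in phrases:
    print(len(feet), feet)

# Phrases are said to contain up to seven feet; check that here.
print(all(len(feet) <= 7 for feet in phrases))
```

Under these assumptions the extract divides into seven phrases, none longer than four feet, with em and certainly each forming a one-foot phrase of their own.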

Films and television programs integrate different rhythms: the rhythms reproduced on film or video (those of the actors’ movements, those of the dialogue or narration, and those of the music and other sounds) and the rhythms stemming from the process of filming itself (the rhythms of camera movements and edits). In a given scene, one of these, usually the action, the speech or the music, provides the basic ‘beat’ of the sequence. The others are then synchronized to this rhythm during editing and the laying of the soundtrack. In a dialogue scene, for instance, there is synchrony not just between the actors’ movements and the dialogue but also between these and the edits. Both Van Leeuwen (e.g. 1985, 1999, 2005) and Martinec (2000) have analyzed sequences from feature films on this basis. The theory of rhythm is also taken up in Baldry and Thibault’s (2006) compendium of multimodal transcription and analysis.

Rhythm has also played a role in the analysis of everyday interaction. In the 1960s and 1970s a number of American researchers began to analyze the rhythms of everyday interaction in great detail. The medium of 16 mm film, recorded at high speed (94 frames per second), made it easy to slow down, speed up and stop-frame body movement and was therefore ideal for bringing out the subtleties of interactive rhythm and the rhythmic integration of behavior in interaction. Films were made of people engaged in conversation, e.g. by Erickson, who used musical notation with a stave for each of the participants in a dinner conversation, and concluded that “rhythm seems to be the fundamental glue by which cohesive discourse is maintained in conversation” (Erickson, 1982: 65). Hall (1983) described the interaction of children in a playground, where the rhythms of children playing in different parts of the playground turned out to be in sync not only with each other, but also with those of a girl whose “skipping and dancing and twirling” from one group to another appeared to “orchestrate the movements of the entire playground” (Hall, 1983: 169). His conclusion was that “individuals are dominated in their behavior by complex hierarchies of interlocking rhythm, comparable to fundamental themes in a symphonic score” (Hall, 1983: 153), and that rhythm provides the basic building blocks of behavior.

It is regrettable that rhythm has not yet achieved a more central place in multimodal text analysis and multimodal conversation analysis, because it is the one thing all time-based semiotic modes have in common, which makes rhythmic structure a very important factor in integrating different time-based modes into a multimodal whole. Rhythm is fundamentally multimodal, and almost inevitably invokes comparisons between dance, music, and speech. As Couper-Kühlen (1993: 112) has said, in a linguistic account of rhythm: “The principle of organization of (speech, verse and music) are surprisingly similar and allow for the same play off between abstract construct and underlying structure and actual realization.”

Citing this entry

van Leeuwen, Theo. 2015. “Rhythm.” In Key Terms in Multimodality: Definitions, Issues, Discussions, edited by Nina Nørgaard. Retrieved


References

Baldry, A. and Thibault, P.J. (2006) Multimodal transcription and text analysis. London, Equinox.

Condon, W.S. (1978) ‘An analysis of behavioural organization’, Sign Language Studies 13: 285-318.

Couper-Kühlen, E. (1993) English speech rhythm: Form and function in everyday verbal interaction. Amsterdam, Benjamins.

Erickson, F. (1982) ‘Moneytree, lasagna bush, salt and pepper: Social construction of topical cohesion among Italian Americans’. In D. Tannen, ed. Analyzing discourse: Text and Talk, Washington DC, Georgetown University Press.

Hall, E.T. (1983) The dance of life: The other dimension of time. New York, Anchor Books.

Kristeva, J. (1980) Desire in language: A semiotic approach to literature and art. Oxford, Oxford University Press.

Martinec, R. (2000) ‘Rhythm in multimodal texts’, Leonardo 33(4): 289-97.

Martinec, R. (2002) ‘Rhythmic hierarchy in monologue and dialogue’, Functions of Language 9(1): 39-59.

Van Leeuwen, T. (1985) ‘Rhythmic structure of the film text’. In T.A. van Dijk, ed. Discourse and Communication: New approaches to the analysis of mass media discourse and communication. Berlin, De Gruyter.

Van Leeuwen, T. (1999) Speech, Music, Sound. London, Palgrave Macmillan.

Van Leeuwen, T. (2005) Introducing Social Semiotics. London, Routledge.