Multimodal text

We can define text, in the simplest way perhaps, by saying that it is language that is functional. By functional, we simply mean language that is doing some job in some context […] So any instance of living language that is playing some part in a context of situation, we shall call a text. It may be either spoken or written, or indeed in any other medium of expression that we like to think of. (Halliday 1989: 10)

Traditionally, the term “text” has been used to refer “mono-modally” to verbal text. However, according to Kress and van Leeuwen (2001), all texts are multimodal, meaning that all texts always and without exception involve the interaction and integration of several semiotic modes. This is obviously the case with texts such as websites, which draw on – and integrate – a variety of semiotic modes such as wording, (still and moving) images, typography, layout, colour, etc. for their meaning-making. Nevertheless, texts which are traditionally seen as mono-modal, for instance a business letter, a face-to-face conversation or a conventional novel, are also multimodal. The business letter and the conventional novel, for example, both depend on and cannot be realised without choices made from the modes of wording, layout, colour and typography. A face-to-face conversation, on the other hand, involves the modes of wording, voice, gesture, facial expression and maybe even touch. The advantage of approaching text from a multimodal perspective is the consequent acknowledgement of all the modes which are involved in its semiosis.

The meaning-making realised by multimodal texts does not simply consist of an adding together of a number of different modes. Instead, multimodal semiosis springs from a far more complex interplay of the modes involved:

Multimodal texts integrate selections from different semiotic resources to their principles of organisation. (…) These resources are not simply juxtaposed as separate modes of meaning making but are combined and integrated to form a complex whole which cannot be reduced to, or explained in terms of the mere sum of its separate parts. (Baldry and Thibault 2006: 3)

Multimodal texts can be very small and simple, but they can also be very large and complex. A simple logo may thus function as a text, as may an entire city. Imagine somebody teaching in a classroom, projecting an image on a screen, writing text about it on the blackboard, and talking about it while gesticulating. In this example, the image may be seen as a (multimodal) text in itself, as may the wording on the blackboard and what the teacher says. At the same time, all the above can be seen as one text; a lesson on art, for example. The lesson, of course, takes place in a particular room. In principle, this room with its interior design, the students, the teacher, etc. may also be seen as a text. This text, in turn, may be seen as part of the larger text of the university building in which it is placed. Furthermore, the university building may be a part of the larger text of the city where it is situated. If, in our analysis, we look at the art lesson as the text, the rest must be considered context. If, on the other hand, we look at the city as a text, the university, the classroom, etc. must be seen as parts of that text. None of these texts is a priori better or more important than the others. What is important is the perspective – or the “text zoom” (cf. Boeriis & Holsanova 2012) – we choose for our analysis.

Texts involve many different interacting systems of different kind on different levels of textual organisation. (…) Texts are embedded in, and help to constitute, the contexts in which they function. Texts are thus inseparable parts of the meaning-making activities in which they take part. A functional and semiotic definition of text seeks to understand the ways in which the intrinsic properties of texts and their organisation enable them to be coupled to their contexts. (Baldry & Thibault 2006: 3)

In the Odense School, we currently have vigorous discussions about the very nature of “text”. A recurrent example is the throwing of a stone and the extent to which this action may be considered a text, and, if so, then when? And to whom? Is the throwing of a stone a text in itself, or does it only become a text when the action is recorded or otherwise represented? Finally, the demarcation lines between “text” and “communication” are not clear and should be investigated further.

Citing this entry

Boeriis, Morten and Nørgaard, Nina. 2015. “Multimodal text”. In Key Terms in Multimodality: Definitions, Issues, Discussions, edited by Nina Nørgaard. Retrieved

References


Baldry, A. & P.J. Thibault (2006). Multimodal Transcription and Text Analysis. London: Equinox.

Boeriis, Morten & Jana Holsanova (2012). “Tracking Visual Segmentation: Connecting Semiotic and Recipient Perspectives”. In: J. Holsanova (Ed.), Multimodal Methodologies. Special issue of Journal of Visual Communication, August 2012, 11:3. London: Sage.

Halliday, M.A.K. (1989). “Part A”. In: M.A.K. Halliday & Ruqaiya Hasan (Eds.), Language, Context and Text: Aspects of language in a social-semiotic perspective: 1-49. Oxford: Oxford University Press.

Kress, G. & T. van Leeuwen (2001). Multimodal Discourse. The Modes and Media of Contemporary Communication. London: Arnold.