
Intonation Swap Study: Using Robots in Prosody Research

Two Keepon Robots

 Hi Selina, can you give a quick introduction to yourself?

My name is Selina Sara Eisenberger, I’m from Austria and I am enrolled in a bachelor’s programme in International Business Communication. I’m working as a research assistant in a project on Improving Second Language Pedagogy at the Human-Robot Interaction lab in Sønderborg. In this role, I work on the Intonation Swap Study.

What is the Intonation Swap Study about?

We analyse the effects of speech melody in different languages by swapping them. The languages we investigate are English, German and Danish. 

How does intonation swap work?

The final intonation contours in English, German and Danish differ considerably. In our experiments, we analyse the speech melodies of questions in each language, ‘swap’ the detected intonation, and then use a questionnaire to investigate how participants perceive the manipulated sentence. This lets us measure the effect of intonation on how the respective speaker is perceived.

What is the set-up of the study?

Our experiment preparation consists of at least three major parts: recording of the original stimuli and manipulating the audio files to match the intonation contour of the other language; recording a robot video; and finally merging the audio files with the robot video. 

How is the recording procedure for audio?

We need to analyse questions in different languages. For this reason, we record audio files with native speakers of English, Danish and German. The native speakers read the questions of a demographic questionnaire to an interviewee. We then cut the whole session into separate audio files, each containing a single question. Then we transfer the intonation of the questions in one language onto the questions in another language by manipulating the audio files with Praat.

Praat Screenshot: Original intonation of German language

The pitch contour of the original German question "Wie alt sind Sie?" shows a rising intonation towards the end of the sentence.

Praat Screenshot: German question with manipulated Intonation

The pitch contour of the manipulated German question "Wie alt sind Sie?" shows a falling intonation towards the end of the sentence. This intonation is characteristic of Danish.

How is the recording procedure for video?

We videotape two moving Keepon robots and delete all audio signals from the source video. The result is a set of robot videos without an audio track. Finally, we combine the video files of the robots with the audio files of the native speakers and with the manipulated files. As a result, we have one robot speaking with the original voice of a native speaker and another robot talking with the manipulated intonation.
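The interview does not say which tool was used to merge audio and video. As a minimal sketch, assuming the ffmpeg command-line tool is used for this step, the following Python function builds a command that keeps only the picture from the robot video and takes the sound from a separate recording. The file names are hypothetical.

```python
import shlex


def build_mux_command(video_path, audio_path, output_path):
    """Build an ffmpeg argument list that pairs a video's picture with a separate audio file.

    Assumption: ffmpeg is the merging tool; the interview does not name one.
    """
    return [
        "ffmpeg",
        "-i", video_path,   # input 0: robot video (its own sound is ignored)
        "-i", audio_path,   # input 1: original or manipulated speaker recording
        "-map", "0:v:0",    # use only the first video stream of input 0
        "-map", "1:a:0",    # use the first audio stream of input 1
        "-c:v", "copy",     # copy the video stream without re-encoding
        "-shortest",        # stop when the shorter of the two streams ends
        output_path,
    ]


# Hypothetical file names, for illustration only
cmd = build_mux_command("keepon_left.mp4", "wie_alt_original.wav", "left_original.mp4")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed. Because only the video stream of the first input is mapped, any sound in the source video is dropped, which matches the step of deleting the audio track described above.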

So all in all a rather elaborate setup. How do you conduct the experiment?

The participants of the experiment are asked to fill out a questionnaire. We conduct an online survey with LimeSurvey, in which participants watch videos where one Keepon asks demographic questions using the original speech melody and the other Keepon asks them using the manipulated speech melody. At the end of the questionnaire, the participants are asked to rate the robots.

In which categories are the robots rated by the participants?

The participants have to judge, for example, which robot sounds more natural or friendly. To do so, they assess each robot in several categories, including dominance, sociability, formality and insecurity. The rating reflects the participant’s impression of the robot.

What was the advantage of using robot speakers instead of humans?

Robots can be controlled better than people, and it is also more natural to rate a robot than a person. The two robots look exactly the same; the only difference between them is their intonation contours when asking questions.

Can you give an example of an intonation swap?

Questions are perfect for testing because their intonation can differ a lot between languages. A good example is the German question “Wie alt sind Sie?”, which means “How old are you?” in English. In German, a question prototypically ends with a rising intonation contour. The Danish translation of the question, “Hvor gammel er du?”, however, has a falling intonation contour at the end. After analysing the intonation patterns of the different languages, we can swap their intonation with Praat. This software is well suited for the manipulation of sound files. A manipulated sound file of the German question “Wie alt sind Sie?” would then end with a falling intonation, as a Danish question normally does.
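The swap idea can be illustrated outside Praat with a small Python sketch. It is not the lab's actual Praat workflow: it assumes a pitch contour is given as a list of (time in seconds, pitch in Hz) points, stretches the donor contour to the target's duration, and scales it to the target speaker's mean pitch. The function names and the contour values are hypothetical and hand-made, not measured data.

```python
from bisect import bisect_right


def interpolate(contour, t):
    """Linearly interpolate a contour [(time_s, f0_hz), ...] at time t."""
    times = [point[0] for point in contour]
    if t <= times[0]:
        return contour[0][1]
    if t >= times[-1]:
        return contour[-1][1]
    i = bisect_right(times, t)
    (t0, f0), (t1, f1) = contour[i - 1], contour[i]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)


def swap_contour(target, donor):
    """Put the donor's melody shape onto the target's time points.

    The donor is time-normalised to the target's duration and scaled by the
    ratio of the two mean pitches, so the result stays in the target
    speaker's register. This is a simplification chosen for illustration.
    """
    t_start, t_end = target[0][0], target[-1][0]
    d_start, d_end = donor[0][0], donor[-1][0]
    target_mean = sum(f for _, f in target) / len(target)
    donor_mean = sum(f for _, f in donor) / len(donor)
    swapped = []
    for t, _ in target:
        rel = (t - t_start) / (t_end - t_start)       # position 0..1 in the target
        donor_t = d_start + rel * (d_end - d_start)   # same position in the donor
        f = interpolate(donor, donor_t) * target_mean / donor_mean
        swapped.append((t, f))
    return swapped


def final_movement(contour):
    """Label the end of a contour by comparing its last two points."""
    return "rising" if contour[-1][1] > contour[-2][1] else "falling"


# Hypothetical contours: a German-style final rise and a Danish-style final fall
german = [(0.0, 200.0), (0.4, 190.0), (0.8, 185.0), (1.0, 240.0)]
danish = [(0.0, 210.0), (0.5, 200.0), (1.0, 180.0), (1.2, 160.0)]

swapped = swap_contour(german, danish)
print(final_movement(german), "->", final_movement(swapped))
```

In a real workflow the resulting contour would still have to be applied to the recording by resynthesis, which is the part Praat handles.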

The left Keepon robot asks the original German question "Wie alt sind Sie?" with rising intonation towards the end of the sentence.

The right Keepon robot asks the manipulated German question with falling intonation towards the end of the sentence. This intonation is characteristic of Danish.

How important is intonation in general?

Understanding the intonation of the target language is very important when learning a second language. In my experience, most language classes today unfortunately don’t pay enough attention to intonation. Traditionally, teachers focus more on grammar and vocabulary. However, emphasis, rhythm and intonation, everything that is called prosody, are equally important for the intelligibility of spoken language and for the impression the speaker makes.

Please tell me something about the tool you use for the intonation swap. What kind of software is Praat?

Praat is a free desktop application that runs on Mac, Windows and Linux operating systems. It’s open source software, and the developers provide updates regularly. As far as I know, it’s kind of standard software for phoneticians. Praat allows far-reaching manipulations of sound files, such as the intonation swap in our studies.

I've just visited the Praat website. It does look a bit outdated, right?

The Praat website may look a little bit old-fashioned visually, but it provides all the information you need. The software itself is good and well documented, with beginners’ manuals in different languages. We use Praat not only for intonation swapping, but also for audio cutting, format conversion and volume adjustments.

Screenshot Praat Website

Form strictly follows function: The layout of the Praat website is frugal, but the free software is a very powerful tool for phonetic research.

What was your practical experience when using Praat for the intonation swap?

In general we try to change the intonation while trying to preserve the natural sound of the spoken languages as well as possible. It turned out that it was very difficult with female voices. When we tried to manipulate a woman's voice, the outcome often sounded robotic and unnatural.

Were the male voices easier to handle?

The intonation swap was definitely easier with male voices! Another thing: As we record our test files in an office environment at the university, there is always some unwanted background noise, for example a door slamming. We remove this noise by cutting out the affected parts with Praat.

As you mentioned earlier, you embedded these audio files into an online survey, using LimeSurvey. Why did you decide on LimeSurvey?

LimeSurvey is an established open source application for conducting online surveys. It allows the inclusion of video files and it is free of charge, but it is a bit tricky to use.

LimeSurvey Logo

LimeSurvey is free Open Source Software for online surveys.

But in the meantime SDU set up a policy to use SurveyXact exclusively, right?

That is correct. I don’t know the reason, but it is likely connected with recent GDPR requirements at SDU. It’s much easier to fulfil these if you use the same standard software throughout the organization. So we are not allowed to use LimeSurvey any more. However, the functionality of LimeSurvey was and still is very good. From a purely technical point of view, LimeSurvey is still a highly recommendable application for doing surveys.

How was your experience when using LimeSurvey as an admin?

Well, it’s a richly featured tool. As an admin, you will need some time to get used to the interface. The design of the program's interface is good, but it takes some experience to find your way through all the menus and pages. I would recommend using LimeSurvey on a regular basis. Just knowing where every feature is located will speed up your workflow in LimeSurvey a lot because you get lost less often!

Was the technical setup difficult?

The technical setup of the questionnaires and even embedding the videos (we use YouTube videos hosted on a personal account) was no big deal. We also implemented some randomisation, and that works well, too. LimeSurvey is quite a mature tool with many features, and in my experience it offers features even for more exotic use cases. The hard part, as I said before, can be finding the right buttons in the interface.
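The interview does not specify what exactly was randomised or how LimeSurvey was configured for it. As a rough sketch of one plausible scheme, the Python function below shuffles the question order and decides at random which of the two robots speaks the manipulated version. This is an assumption for illustration, not the study's documented design, and only the first question text appears in the interview.

```python
import random


def make_presentation_plan(questions, seed):
    """Return a randomised plan: question order plus version per robot.

    A seeded generator makes the plan reproducible for a given participant.
    """
    rng = random.Random(seed)
    order = list(questions)
    rng.shuffle(order)                        # randomise question order
    plan = []
    for question in order:
        versions = ["original", "manipulated"]
        rng.shuffle(versions)                 # randomise which robot gets which version
        plan.append({
            "question": question,
            "left_robot": versions[0],
            "right_robot": versions[1],
        })
    return plan


# "Wie alt sind Sie?" is from the interview; the other two are hypothetical
questions = ["Wie alt sind Sie?", "Wo wohnen Sie?", "Haben Sie Kinder?"]

for item in make_presentation_plan(questions, seed=7):
    print(item)
```

Randomising the side of the manipulated voice keeps participants from associating one robot position with one intonation pattern.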

How intuitive are questionnaires in LimeSurvey from the participants’ point of view?

My personal impression, and what I heard from participants, is that the questionnaires are intuitive and easy to manage.

Do you miss any features in LimeSurvey?

I am not fully satisfied with the default design of the survey. But you can adjust the design in LimeSurvey by using the software's built-in template system.


Intonation Swap LimeSurvey Setup

LimeSurvey has a very clean interface. It is easy to set up and to edit questions in the application. 

LimeSurvey Frontend

The default design of the frontend with the user questions is simple and functional. It is possible to adjust the design to your own needs.

For the intonation swap studies, YouTube videos were embedded into the questions.


How is the setup of the questionnaires?

For the intonation swap study, we wrote a short introduction, followed by 10 questions. We embedded short videos in 8 questions. Most questions were short and simple on purpose so that the sentence structure was similar across the three languages. The participants have to answer mostly with either Yes or No. Only the last question has to be answered by using a Likert scale.

Which question types do you use in LimeSurvey?

The question types we use are text display for video, long free text and an array for the Likert scale. All in all, the participants should not need more than 7 minutes for all the questions. You can participate using a desktop computer or a mobile device; the survey is fully responsive and adjusts to different screen sizes.

What are the obstacles while setting up the questionnaire?

On a technical level, it is finding the correct question types in LimeSurvey. We had to dig through the interface a little bit, as it’s not overly intuitive. On a content level, it is the wording of the questions. Writing simple, short questions that still fulfil the intended purpose really takes some time and effort.

How do you represent the multi-language approach of your project in LimeSurvey?

It is possible to create one combined multi-language questionnaire in LimeSurvey, but it turned out to be easier to set up a separate questionnaire for each of the three languages investigated: German, Danish and English. This allows native speakers to judge the sound files in their native language.

Wrapping it up: What is, so far, the main conclusion of the Intonation Swap Study?

Intonation is crucial for the use of a second language, and language classes in school should reflect that. Unfortunately, at least at present, they mostly do not. Even small mistakes in intonation negatively affect the listener’s reaction to the speaker. If you want to master a language, you have to take care of details like intonation. Investing some time in proper intonation does pay off.

Selina, thanks for the interview. Mange tak!

Selv tak.


This interview was conducted by Sascha Steinhoff in Sønderborg on 16 August 2018.