Conversation response in two or more bubbles



Hello, community! I’m trying to split text into several bubbles (speech messages) that follow each other consecutively, but I always get a single bubble. I’ve tried SSML, speechBuilder and the other variants described in the docs. Has anybody solved this issue?


Good point.
I’m waiting for a Pull Request and will publish the feature tomorrow.

It will look like this

    displayText: 'Hello',
    textToSpeech: 'Test'


@AlexSwe it’s quite similar to , but does it resolve the issue with multiple bubbles? I’ve found here that someone tried to investigate this case earlier, but it seems they had no luck!


Published the feature a couple of minutes ago.

Yes, this will add a second bubble. (it’s not possible to add more)

Full sample:

    displayText: 'Hello',
    textToSpeech: 'Test'


I have tested it and must say it works fine! By the way, could we send something else, e.g. a Basic Card, in place of the second speech bubble?


Try this:

const basicCard = new BasicCard()
    .setImage({
        url: '',
        accessibilityText: 'accessibilityText'
    })
    .setFormattedText('Formatted Text');


Thank you! I got it. When I use SSML markup, my agent pronounces all the tags. Can I avoid that?


Can you share some code? Usually SSML tags aren’t pronounced.


I use SpeechBuilder

function get_help_info(e_g_jovo_obj) {

    return e_g_jovo_obj.speechBuilder()
        .addSentence('You are playing XXX quiz.')
        .addSentence('Do you need to hear the question once more?');
}

So the output contains the tags <speak> and <s>.

let info = get_help_info();
                displayText: ...,
                textToSpeech: ...

Something like that needs some processing of the info object, because appendSimpleResponse won’t work with it directly, and when I use the info.speech property, the SSML tags are shown in the output and the assistant pronounces the tags as well.


So, you want to remove the SSML tags from the string?
You could use a SpeechBuilder helper method.

const { App, SpeechBuilder } = require('jovo-framework');

In your case:



Not exactly. Yes, I can strip the tags, but that will affect both the display text and the text to speech. What I really want is to use appendSimpleResponse for the second bubble, show the tag-stripped text on the device screen, and still hear the voice with delays (breaks) and other SSML features.


I may be misunderstanding something, but shouldn’t the sample below solve the problem?! :smiley:

let textWithoutSSML = SpeechBuilder.removeSSML(info.speech);

                displayText: textWithoutSSML ,
                textToSpeech: info.speech
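
The idea in the sample above (one string for the screen, another for the voice) can be tried outside the framework. This is only a sketch: `stripSsml` is a hypothetical stand-in written for illustration, not the actual implementation of `SpeechBuilder.removeSSML`, and the `speech` string is made up.

```javascript
// Hypothetical helper (not part of jovo-framework): replaces every XML-style
// tag with a space, then collapses the leftover whitespace.
function stripSsml(ssml) {
    return ssml
        .replace(/<[^>]+>/g, ' ')
        .replace(/\s+/g, ' ')
        .trim();
}

// Made-up SSML string in the shape SpeechBuilder produces in this thread.
const speech = '<speak><s>Hello, how are you?</s> <break time="300ms"/> <s>Hope you are doing well.</s></speak>';

// The screen gets the stripped text; the voice keeps the original markup.
const displayText = stripSsml(speech);

console.log(displayText);
console.log(speech);
```

The original `speech` string is left untouched, so breaks and other SSML features are still available for the spoken part.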


I think I’ve been testing this one :slight_smile: I’ll check one more time. Thank you!


const { SpeechBuilder } = require('jovo-framework');
let sp = this.speechBuilder()
        .addSentence('Hello, how are you?')
        .addSentence('Hope you are doing well.');

let textWithoutSSML = SpeechBuilder.removeSSML(sp.speech);
        displayText: textWithoutSSML,
        textToSpeech: sp.speech

Unfortunately, my rich response output is:

"richResponse": {
    "items": [
        {
            "simpleResponse": {
                "ssml": "<speak><s>Hello and welcome back, my friend!</s> <break time=\"300ms\"/> <s>You are on </s> <say-as interpret-as=\"ordinal\">2</say-as> level. Get ready for your question. <break time=\"100ms\"/> <s>The right answer contains </s> <say-as interpret-as=\"cardinal\">4</say-as>  letters. <break time=\"300ms\"/></speak>"
            }
        },
        {
            "simpleResponse": {
                "displayText": "Hello, how are you?  Hope you are doing well.",
                "textToSpeech": "<s>Hello, how are you?</s> <break time=\"300ms\"/> <s>Hope you are doing well.</s>"
            }
        }
    ]
}

and all the markup tags from appendSimpleResponse() were spoken by the assistant.
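
One thing worth checking in that output: the first item carries its markup in an `ssml` field wrapped in `<speak>`, while the second item puts markup into `textToSpeech` with no `<speak>` root. As far as I know, the Actions on Google simple response treats `textToSpeech` as plain text and expects markup in `ssml`, which might explain why the tags are read aloud. Below is a minimal sketch of building the second item by hand under that assumption. `buildSimpleResponse` is a hypothetical helper, not a Jovo API, and whether `appendSimpleResponse` passes an `ssml` field through is something you would have to verify.

```javascript
// Hypothetical helper: builds one rich response item.
// Plain text goes into textToSpeech; anything containing tags goes into
// ssml and is wrapped in <speak> if the root element is missing.
function buildSimpleResponse(displayText, speech) {
    const hasMarkup = /<[^>]+>/.test(speech);
    if (!hasMarkup) {
        return { simpleResponse: { displayText, textToSpeech: speech } };
    }
    const hasRoot = /^\s*<speak[\s>]/.test(speech);
    const ssml = hasRoot ? speech : `<speak>${speech}</speak>`;
    return { simpleResponse: { displayText, ssml } };
}

// The second item from the output above, rebuilt with the helper.
const item = buildSimpleResponse(
    'Hello, how are you?  Hope you are doing well.',
    '<s>Hello, how are you?</s> <break time="300ms"/> <s>Hope you are doing well.</s>'
);

console.log(JSON.stringify(item, null, 4));
```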