How do you Structure your Voice App Content?


#1

Hey all,

I wanted to bring up a topic that we’ve been discussing for a while: How to structure your i18n content in different keys.

We usually use nested objects like in this template:

"welcome": {
      "speech": "Hello World! What's your name?",
      "reprompt": "Please tell me your name."
    },

And then access it in the code like this:

HelloWorldIntent() {
    this.ask(this.t('welcome.speech'), this.t('welcome.reprompt'));
},

The more complex our content structures became, the more we thought about standardizing them. Having everything in a single speech element felt weird, e.g. in cases where you want to repeat things.

The question is: How can we split parts of the speech up into structured keys?

@marktucker shares interesting Alexa Skill development tips in a GitHub repository and mentions several content types for the speech output, including message, hint, and prompt, which he uses to define the different parts of the output in conversation mode.

This made me wonder if we could use something like this as a best practice:

"welcome": {
      "message": "Hello World!",
      "prompt": "What's your name?",
      "reprompt": "Please tell me your name."
    },

With the SpeechBuilder, we could then build it like this:

HelloWorldIntent() {
    this.$speech.addT('welcome.message')
        .addT('welcome.prompt');
    this.$reprompt.addT('welcome.reprompt');

    this.ask(this.$speech, this.$reprompt);
},

However, this feels a little redundant. So for people who want to use that structure, we might be able to do this (and grab the values in the background after validating that the key returns an object):

HelloWorldIntent() {
    this.ask(this.t('welcome'));
},

This would get rid of some redundant code and allow content creators to add hints (and even rules for hints?) at the CMS level, so changes like this wouldn’t require any additional work in the code.
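
For illustration, the background resolution could look roughly like this (just a sketch: the object check and the idea that this.t('welcome') returns the whole object, e.g. via i18next's returnObjects option, are assumptions, not existing framework code):

HelloWorldIntent() {
    // Hypothetical: this.t('welcome') returns the whole nested object.
    const content = this.t('welcome');

    if (typeof content === 'object') {
        // Convention: message + prompt make up the speech,
        // reprompt is used as-is.
        this.$speech.addText(content.message).addText(content.prompt);
        this.$reprompt.addText(content.reprompt);
        return this.ask(this.$speech, this.$reprompt);
    }

    // Fall back to the current plain-string behavior.
    return this.ask(content);
},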

What do you think?


#2

This is a situation where a convention can help. Check whether the key follows one of these structure conventions:

This:

"welcome": {
    "message": "Hello World!",
    "prompt": "What's your name?",
    "reprompt": "Please tell me your name."
},

Or This:

"welcome": {
    "message": "Hello World!",
    "hint": "Some hint that can go away after the user has experience with the skill",
    "prompt": "What's your name?",
    "reprompt": "Please tell me your name."
},

Or This (array indicates pick one at random):

"welcome": {
    "message": ["Hello World!", "Hi there"],
    "hint": ["hint 1", "hint 2"],
    "prompt": ["What's your name?", "What is your full name?"],
    "reprompt": ["Please tell me your name.", "Tell me your name"]
},

And if it is not followed, then allow for any structure.
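
As a rough sketch, a resolver for the pick-one-at-random convention could look like this (hypothetical helper, not part of any framework):

// Strings pass through unchanged, arrays are sampled at random.
function resolveVariant(value) {
    if (Array.isArray(value)) {
        return value[Math.floor(Math.random() * value.length)];
    }
    return value;
}

// resolveVariant(t('welcome.message')) -> "Hello World!" or "Hi there"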


#3

Agree! Thanks for sharing this @marktucker.

There are a few things that I’m currently thinking about:

  • How could multiple reprompts (for Google Assistant) be added? Maybe in an additional object called reprompts?
  • Many people like to “flow through” the app logic and use the Jovo SpeechBuilder to create responses step by step with addText, depending on certain parts of the app. The question is whether a structure like this could still work for them. Maybe by using i18next Nesting to reference other keys (see the sketch after this list)?
  • How could visual output be added?
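
On the second point, i18next’s nesting feature lets one key reference others with $t(), so SpeechBuilder users could keep a single composed key while the parts stay structured. A sketch (the speech key here is made up for illustration):

"welcome": {
    "message": "Hello World!",
    "prompt": "What's your name?",
    "speech": "$t(welcome.message) $t(welcome.prompt)"
},

Handlers could then call addT('welcome.speech') as before, or access the individual parts.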

#4

Great question, @jan, and thanks to both you and @marktucker for your insights! :+1:

Let me share how I typically structure texts in the voice apps I build.
First of all, I admit that I like to work with states. I’m aware that thinking in states has been discouraged pretty much since ASK SDK v2 was introduced by the Alexa team, but I think it still has its merit, as long as it doesn’t limit the conversation too much.

The way I think about a state is that it’s about retrieving a piece of information that’s required to move the conversation forward. It’s currently lunchtime as I write this, so let’s imagine a voice app that lets you configure your burger (for ordering). One of the pieces of information it needs is which type of bread you want, so there’s a bread state.
So let’s look at the anatomy of how I would structure the texts for the bread state.


The response keys would be the following:

  • bread-intro: We offer a range of breads for your burger bun.
  • bread-prompt: Which type of bread would you like?
  • bread-help: We offer three types of bread: Oregano, whole grain, and Italian, which is our customers’ favorite.
  • bread-unhandled: Sorry, this* is not a type of bread we know. We have oregano, whole grain, and Italian.
  • bread-confirm: This* sounds good!
  • bread-reject: Sorry, this* is not available right now!

Based on these elements, you can build the happy path (bread-intro, bread-prompt, valid user input, bread-confirm) through the state, and from there move on to the next state (maybe beginning with topping-intro), as well as ways to handle help, reprompt, repeat and invalid input cases.
The elements marked with an asterisk* could even be improved by stating what the voice app understood instead of using a generic description.


#5

So instead of grouping by response (welcome), you group by state (bread). This is interesting. In a nested object, this would look like this:

"bread": {
    "intro": "We offer a range of breads for your burger bun.",
    "prompt": "Which type of bread would you like?",
    "help": "We offer three types of bread: Oregano, whole grain and italian, which is our customers’ favorite.",
    "unhandled": "Sorry, this* is not a type of bread we know. We have oregano, whole grain and italian.",
    "confirm": "This* sounds good!",
    "reject": "Sorry, this* is not available right now!"
},

So an interaction could look like this:

  • App: previous-state-confirm + bread-intro + bread-prompt
  • User: Oregano!
  • App: bread-confirm + topping-intro + topping-prompt

This is an interesting concept and is closer to the “flowing through the app logic” idea I mentioned above.
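
In handler code, that flow could be assembled roughly like this (a sketch: the intent name is made up, the topping keys are assumed to exist analogously to bread, and the prompt doubles as the reprompt here):

ChooseBreadIntent() {
    this.$speech
        .addT('bread.confirm')
        .addT('topping.intro')
        .addT('topping.prompt');
    this.$reprompt.addT('topping.prompt');

    this.ask(this.$speech, this.$reprompt);
},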

When we’re thinking about terms, the ones you use are still close to the ones mentioned above, I think: confirm and intro could be part of the message, and help could be understood as hint.


#6

Expanding this idea past content, this is a common pattern. How about providing a class where this flow is black-boxed into the state that you define?


#7

Sounds interesting! Do you have an idea of what this could look like?

(I’m also adding this to the “Feature Request” category now)


#8

Amazon provides something that I call “conversation flows”: packaged, multi-turn conversations with accompanying language model, logic, and responses. For example, for In-Skill Purchasing there is a point where the skill developer hands over the conversation to Amazon so that the user confirms a purchase and their account is charged outside the skill code.

I imagine something similar in Jovo. Think of an extended Plugin. It would contain a build-time piece that could add intents and slots to the language model. It would likely contain one or more states to handle the different conversation paths. For responses, it would use content keys and come with a default implementation for i18n. There would be a way to use other CMS options instead. The states, intents, and slots would be namespaced to avoid collisions. Would have to figure out how to override intents in case there was a collision with existing intent utterances.

All this would be packaged in such a way as to allow the developer to npm install the conversation flow, set some configuration, and then call it from an intent handler. Depending on the outcome of the flow, some pre-determined handlers would be called (e.g. Yes, No, Error).

Would also need to take into consideration config settings and session data.

Want to send a text message via Twilio? Install and call the TwilioSendSmsFlow. Want to get the user’s mobile phone number, request the permission if it isn’t enabled, validate that the number has 10 digits, and ask for the area code if only 7 were given? Install and call the GetMobilePhoneFlow.
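
Purely as an illustration, calling the GetMobilePhoneFlow could look something like this (every package, method, and property name below is made up, just mirroring the description above):

// Hypothetical package and API.
const { GetMobilePhoneFlow } = require('get-mobile-phone-flow');

app.useComponent(new GetMobilePhoneFlow({ digits: 10 }));

app.setHandler({
    SendTextIntent() {
        // Hand the conversation over to the packaged flow.
        this.delegate('GetMobilePhoneFlow', {
            onCompletedIntent: 'PhoneNumberDoneIntent',
        });
    },

    PhoneNumberDoneIntent() {
        // Resume with the flow's outcome (e.g. SUCCESSFUL, REJECTED, ERROR).
        const { status, phoneNumber } = this.$components.GetMobilePhoneFlow.$response;
        if (status === 'SUCCESSFUL') {
            this.tell('Great, we will text you at ' + phoneNumber + '.');
        }
    },
});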


#9

Yes! Great suggestions! We’re working on something like this internally with the code name :sparkles: Conversational Components :sparkles:. They would contain a language model, logic, and i18n. We’re working on a feature proposal (cc @Kaan_Kilic and @AlexSwe) that we’re hoping to post in this forum soon to be able to discuss it with the community.

Yes. I like how Twilio Autopilot does it (cc @steve). After the user successfully walks through an action, you can add a redirect in an on_complete element (example from their docs):

{
    "actions": [
        {
            "collect": {
                "name": "make_reservation",
                "questions": [
                    {
                        "question": {
                            "say": "Great, I can help you with that. What's your first name?"
                        },
                        "name": "first_name",
                        "type": "Twilio.FIRST_NAME"
                    },
                    {
                        "question": {
                            "say": "What day would you like your reservation for?"
                        },
                        "name": "date",
                        "type": "Twilio.DATE"
                    },

                    // More questions...
                ],
                "on_complete": {
                    "redirect": {
                        "uri": "REPLACE THIS!!!",
                        "method": "POST"
                    }
                }
            }
        }
    ]
}

#10

Let me know when I can be a beta tester.

Although not specifically called out in my previous post, Conversational Components would include validation.

Which Conversational Components will you release first?

  • Zip/Postal code
  • Phone number
  • Send text
  • Connect call between user’s mobile phone and customer support
  • Survey

#11

Agree. We’re working on this in a separate feature proposal with @rubenaeg :+1:

Those all make a lot of sense. We already have the “phone number” one built out, so it will likely be the first one we use for testing and demonstrating what Conversational Components could look like.


#12

Hi Mark, here is the feature proposal about Input Validation. Would love to get your thoughts on this. Thank you!