Voice apps for education



It’s still early days in voice (or “the Wild West”, as Jon Myers phrased it) and a lot of use cases are just being explored and developed. One such use case, or rather field, is education, or ‘the sleeping giant’ as I heard it being called at Amazon.

So, I hope I can trigger a bit of discussion here around how voice technology can be used to educate, and neighboring topics such as:

  • What business models are possible around education and voice?
  • Which technological capabilities are still missing?
  • Which fields of education can benefit in particular from voice technology?
  • Which existing use cases of conversational interfaces can inspire voice apps?

Three things that come to my mind are

  1. Voice interfaces for knowledge bases
    For example, there’s a ‘Simpleclub’ Skill in the German Alexa Skill store that lets users access a rich collection of educational videos on a wealth of topics especially in STEM courses.
  2. Educational forums
    Such a Skill would allow users to ask questions, provide answers and probably connect to other learners and/or subject matter experts. I’m not aware of a successful example of such a Skill, but my friend Dominik Bleilevens has developed a prototype of such a Skill, named ‘Sir Albert’.
  3. Conversational language learning
    This idea is very intuitive to me: Use a voice assistant to train oral communication, both in expression and comprehension. I’ve heard about this with conversational chatbots (or ‘socialbots’ as they are sometimes called) like Mitsuku, but obviously this is more about written communication. For oral comprehension you could imagine hearing a text in a voice app and then answering questions about it (‘Tricky Genie’ by Amy Stapleton), but it’s difficult to imagine how it could work for expression, especially on an intent-based language model like the ones used by Dialogflow and Alexa.

So… Looking forward to hearing your thoughts on this!


I like all of those areas. I always thought that something like an AnkiApp for voice (where everyone can create and share flashcards) could be extremely valuable. We worked on something like this internally but the results weren’t great because it was too reliant on free-form input.

However, it’s interesting that Amazon is going in that direction as well, with free Flashcard Blueprints that anyone can create for themselves. A big use case is Alexa in the car, in my opinion.


Thanks @Florian for your mention!

Quick remark about Sir Albert:
Sir Albert was a prototype, and it probably won’t work anymore, as we used some Watson interpretation and databases we haven’t looked after for a while. The challenge with Sir Albert was the same as with any community in which you need two sides: the person helping and the person who needs help. As it was just a side project, we decided to stop working on it – for the moment.
Maybe I will pick it up again, I don’t know yet. If anyone is interested, send me a message!

Regarding your other ideas:

  1. I think knowledge bases for Alexa (if developed by 3rd party developers) have to be really specific, as this is a field in which Amazon and especially Google are proficient and which they are keen to improve in general. So if you don’t have a super specific topic, I think the voice assistants will have an answer for it anyway.
  2. Language learning is also a super interesting topic. A skill for oral comprehension is rather easy, I think, but when it comes to expression I think it’s really tough. It’s hard to create “wrong” intents so you can recognize if a person is saying/pronouncing something wrong. Furthermore, having a human listen to the input and decide whether the pronunciation is correct is not possible either, as you cannot retrieve the raw audio. I think the only solution here is to move to an app (maybe in combination with a comprehension Alexa skill), but a voice skill only for correct expression is really hard in my opinion.

@jan I like the flashcard example! I didn’t know they were offering something like that in their blueprints!


I agree. Just had a conversation about this with someone yesterday. The more structured information is already available, the more difficult it gets for third-party apps to compete with native features. Maybe it’s interesting to think about adding a “voice layer” to information that is freely available but not structured enough for native features to parse?


Yes, I think it has to be something special so it actually delivers more value than information “just” read out by the assistant. Actually this was also one of the USPs of Sir Albert – having actual audio files spoken by experts in a specific field. It’s information/media which is currently not available in any way.
Do you have more ideas/thoughts about “not structured enough (…) to parse”?


Might not be a 100% fit with the topic of education, but I just found this: HelixAI Turns Alexa Into a Science Lab Assistant

So instead of asking Gary to fetch that dusty book on the shelf or read an online search aloud, scientists can simply say, “Alexa, ask Helix to help me with the recipe for 2M hydrochloric acid solution.”

I think the challenge here is not only to structure the data in a great way, but also to train the language model so that Alexa and Google Assistant understand difficult, domain-specific terms.


Thanks @Florian for mentioning Tricky Genie. When it comes to education, I really like the idea of creating voice experiences that involve an element of critical thinking (I’ve heard educators sometimes refer to this as “higher level thinking”). I don’t know what all the educational “standards” are, but apparently there are a lot of them and training the higher level thinking skills is one of the important standards. The Tricky Genie game is a fairly simple attempt at engaging critical thinking, as it requires the player to figure out which solution is the best one for a given situation – but the decision has to be made before all the choices can be explored. From a technology standpoint, there’s nothing standing in the way of creating more of these critical thinking types of skills. The issue is just that it takes a lot of effort to create all the content. In a “game” such as Tricky Genie, there have to be lots and lots of problem situations to solve, otherwise the game is boring. (Very little, other than the formula, can be re-used from one session to the next.) I’ve had numerous ideas for similar “problem solving” games, but the issue is the content creation hurdle.
To add on to the topic of aiding with pronunciation and/or speech pathology training, I’ve actually had serious inquiries about how to make this work. I know the ASR is not perfect for detecting flaws with pronunciation, but I think there are workarounds that still make pronunciation training skills possible.


Yes, I think if you’re not developing a domain-specific skill, your skill will soon be useless. Compare e.g. a 3rd party Wikipedia skill with Helix. Reading out information from Wikipedia is rather easy, whereas the recipe for 2M hydrochloric acid solution is really specific.

Agreed. The content creation hurdle is, I think, especially challenging for people who do not have a lot of experience with creating this kind of content. Of course I can do the voice concept, but having all the needed content for an awesome experience is a completely different thing.


Thanks a lot for joining the discussion, @Talks2Bots, @Dominik and @jan! :star_struck:

These are some excellent points you’re raising about how different domains of education each pose their own steep challenges for both the content and the interface.

One concept that I love, and that I see as a potential solution for some of the issues discussed here, is user-generated content (I think @Dominik had something like this in mind for Sir Albert, and I might have gotten some of this idea from his episode of the gamification podcast :de:). Imagine a platform or even marketplace where learners can ask questions via voice assistant, like on StackExchange, but entirely in spoken language. Similar requests would be grouped / de-duped and prioritized based on how often variants of them are asked. Users could answer these questions, also via voice assistant. Both questions and answers could be rated, and so you’d arrive at the StackExchange of voice. Ideally, people would not only provide short linear answers, but also conversational content (like tests) that requires users to answer with yes or no.
Of course such a community-based platform would require a critical mass of users and quite some quality assurance, as well as an easy way to record content. This last aspect could probably be solved using a mobile app similar to Castlingo.
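
To make the "grouped / de-duped and prioritized" part a bit more concrete, here is a rough Python sketch of how it could work on transcribed questions. Everything in it is an assumption for illustration only: the stopword list, the `group_questions` helper, the word-overlap (Jaccard) similarity and the 0.5 threshold are made up, and a real platform would likely need something more robust than word overlap.

```python
import re

# Hypothetical stopword list; a real system would use a proper one per language.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "how", "do", "does", "of", "to", "in", "i"}


def tokens(question):
    """Lowercase a transcribed question and keep only its content words."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return frozenset(w for w in words if w not in STOPWORDS)


def jaccard(a, b):
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_questions(questions, threshold=0.5):
    """Group similar questions and rank the groups by how often they are asked.

    Each question joins the first existing group whose representative is
    similar enough; otherwise it starts a new group.
    """
    groups = []
    for q in questions:
        t = tokens(q)
        for g in groups:
            if jaccard(t, g["tokens"]) >= threshold:
                g["count"] += 1
                break
        else:
            groups.append({"representative": q, "tokens": t, "count": 1})
    # Most frequently asked questions first.
    return sorted(groups, key=lambda g: g["count"], reverse=True)


asked = [
    "What is the derivative of sine?",
    "How do I conjugate the verb sein?",
    "The derivative of sine is what?",
    "Derivative of sine please",
]
ranked = group_questions(asked)
for g in ranked:
    print(g["count"], g["representative"])
```

The ranked list could then decide which open questions get read out to potential answerers first. Ratings for questions and answers would be a separate signal on top of this.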


I also wanted to share a YouTube video of an exciting beta project that Adva Levin of PretzelLabs conducted together with Israel’s Edtech innovation center MindCET.

In summary, it uses Alexa’s general capabilities instead of a dedicated Skill for English language learning: school kids are exposed to a smart speaker and are then told to perform some tasks and actions with it. Quite a neat approach, I think!

The downside is that it requires at least partially trained staff to conduct this kind of project.