I Love My Watson Chatbot - How Do I Make Changes To It?

This has also been published at Medium.com.

Note: This article was updated on April 29, to add links for a Leigh Williamson blog post that I found informative.

In an earlier blog post, IBM Watson Fueled Chatbots — For Our Health, I talk about the costs and benefits of a chatbot, and show you how a chatbot is within the reach of most organizations. I then branch out and discuss how to make updates to your chatbot in my post I Love My Watson Chatbot — How Do I Update It? Now we look at doing some continuous testing, along with some basic version control, and deploying new chatbot updates.

How Are We Doing?

In order to see how well your chatbot is performing, you need to put some automated testing of your chatbot in place. This will give you some objective measures of chatbot accuracy and performance, and will also give you some insight into areas where your chatbot could stand to improve. Keep in mind that testing cognitive applications is fundamentally different from testing more traditional applications. Traditional applications rely on path testing and code coverage; due to the non-deterministic nature of cognitive applications, we end up having to use statistical models for our testing.

I STRONGLY suggest using the Python notebooks listed below. These can be run on your own machine, or from within Watson Studio. They are a great starting point for your automated analysis of your chatbot. Over time you can improve these notebooks and expand on some of the testing techniques in them.

  • Dialog Skill Analysis Notebook — a notebook that will do an analysis of your Dialog skill.
  • Log Analysis Workbooks — these two notebooks focus on slightly different areas, and do an analysis of your chatbot based on your logs.
  • My K-Fold Notebook — not as full-featured as the two notebooks above, but I like mine because it’s simple. For a new Python user, it also has a straightforward approach to k-fold testing that is easier to understand.

Getting the feeling that you need to know a little bit of Python? Me too. You don't have to be a Python expert to do any of this, but you should at least understand the basics of Python and how to use and manipulate Python notebooks. Don't get intimidated — you don't need to know EVERYTHING before you start — just jump in and learn things as you go.
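If you are curious what these notebooks are doing under the hood, here is a minimal k-fold sketch using the ibm-watson Python SDK and scikit-learn. To be clear, this is NOT any of the notebooks above, just a stripped-down illustration: the API key, service URL, and workspace ID are placeholders, the loop naively retrains a temporary workspace for each fold (so it is slow, and it assumes your plan allows extra workspaces), and the real notebooks add reporting, confusion matrices, and error handling on top of this.

import time
from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from sklearn.model_selection import KFold

# Placeholders - substitute your own credentials and workspace ID.
authenticator = IAMAuthenticator("YOUR_API_KEY")
assistant = AssistantV1(version="2020-04-01", authenticator=authenticator)
assistant.set_service_url("https://api.us-south.assistant.watson.cloud.ibm.com")
WORKSPACE_ID = "YOUR_WORKSPACE_ID"

# Pull (utterance, intent) pairs out of the skill's training data.
ws = assistant.get_workspace(workspace_id=WORKSPACE_ID, export=True).get_result()
samples = [(ex["text"], intent["intent"])
           for intent in ws["intents"] for ex in intent["examples"]]

correct = total = 0
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=42).split(samples):
    # Train a temporary workspace on everything EXCEPT the held-out fold.
    fold = {}
    for i in train_idx:
        text, intent = samples[i]
        fold.setdefault(intent, []).append({"text": text})
    temp = assistant.create_workspace(
        name="kfold-temp",
        intents=[{"intent": name, "examples": exs} for name, exs in fold.items()],
    ).get_result()
    # Simplistic polling while the classifier trains.
    while assistant.get_workspace(workspace_id=temp["workspace_id"]).get_result()["status"] != "Available":
        time.sleep(10)
    # Score the held-out fold: does the top intent match the expected one?
    for i in test_idx:
        text, expected = samples[i]
        resp = assistant.message(workspace_id=temp["workspace_id"],
                                 input={"text": text}).get_result()
        if resp["intents"] and resp["intents"][0]["intent"] == expected:
            correct += 1
        total += 1
    assistant.delete_workspace(workspace_id=temp["workspace_id"])

print(f"Overall intent accuracy across folds: {correct / total:.2%}")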

Making Changes Without Impacting Production

Some change management scenarios are enabled by Watson Assistant's new versioning capability. This versioning is a bit different from what you might be used to — it's not the typical software development check-out/check-in paradigm. Instead, it is more like a baseline establishment paradigm, where a version (or baseline) is created as a snapshot of the model at some point in time. The current state of the model (which you see in the UI) is always the tip, or development, version.

So if you open your Watson Assistant Plus instance and go into your skill, you can just click on “Versions” in the left-hand nav bar, and you will see the versions screen.

Versions Screen in Watson Assistant

To create a new version (or baseline), just click on the “Save a new version” link in the upper right corner. You should then fill in something useful (think about naming conventions!) for the description before pressing “Save”. We will talk more about naming conventions below, when we get into some typical change management workflows.

Typical Change Management Workflows

A typical change management workflow will allow you to easily scope, contain, and then deploy changes to your chatbot. What I am proposing here is something that might not satisfy a “DevOps Purist”, since we have production and dev/test resources residing in the same service. 

Keep in mind that this approach CAN easily be modified so that instead of moving the versions around within a Watson Assistant instance, we could export the model from one Watson Assistant environment (say the test environment), and then import it into another environment (say the production environment). These environments could be located in different resource groups, with individualized access and permissions tailored for each environment/resource group.

So without further ado, here is a high-level view of a typical lightweight change management process for your Watson Covid-19 chatbot:

  • Get suggested changes for the next cycle. Save these in some agreed-upon format/tool with the changes for your Covid-19 bot.
  • Agree on the changes and the scope of changes for the next cycle. For those of you who are familiar with Agile development principles, think of this as sprint planning.
  • Apply these changes in the tip (development) version of your Covid-19 bot.
  • Test the development version of the Covid-19 bot, validate your changes, and make final revisions.
  • Run automated quality tests against the development bot version (yes, I wasn't kidding about automated tests earlier — do this now, and you will thank me later).
  • Create a new version in the Covid-19 bot — call it “<YYYY-MM-DD> Production”.
  • Get formal sign-off for promotion of the new version to production.
  • Move the Assistant pointer for the production Covid-19 bot to the new version (“<YYYY-MM-DD> Production”).
  • Back up the Covid-19 bot with an export of the dialog skill to JSON, and store the JSON file with the name “<YYYY-MM-DD>_Production.json” (see the sketch after this list).
  • Move the Assistant pointer for the development Covid-19 bot to the tip version.
  • Go back to step 1 and do it again.
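The JSON backup in the workflow above (and the export/import between environments mentioned in the previous section) is easy to script. Here is a minimal sketch using the ibm-watson Python SDK; the API key, service URL, and workspace ID are placeholders:

import json
from datetime import date
from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholders - substitute your own credentials and workspace ID.
authenticator = IAMAuthenticator("YOUR_API_KEY")
assistant = AssistantV1(version="2020-04-01", authenticator=authenticator)
assistant.set_service_url("https://api.us-south.assistant.watson.cloud.ibm.com")

# Export the full dialog skill (intents, entities, dialog nodes) as JSON.
skill = assistant.get_workspace(workspace_id="YOUR_WORKSPACE_ID",
                                export=True).get_result()

# Store it using the naming convention from the workflow above.
backup_name = f"{date.today().isoformat()}_Production.json"
with open(backup_name, "w") as f:
    json.dump(skill, f, indent=2)

To promote a skill into a separate production instance instead, the same exported JSON can be pushed with create_workspace (or update_workspace) against that instance's credentials.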

Note that I have been non-specific about the tickets/issues/tools to use. I just trust that you are using some sort of change control — no matter how simple. You want to get your stakeholders in the habit of making change requests and then APPROVING those requests — just so we avoid any “Why did you do that?” situations. You can implement this with the free version of GitHub hosted on the IBM Cloud, where you can do some real lightweight change management, or you can use your existing change management tooling and infrastructure. It's completely up to you.

Change Management for the Advanced User

The simple change management process outlined above will work for simple deployments, and is probably robust enough for a chatbot that is not expected to be deployed on a long-term basis. What about change management for more established chatbots? Or for chatbots that are part of a larger system that is already under a more formal type of change management? What can we do in those cases?

In those cases, and for users wishing for a more robust and formal change management or DevOps infrastructure, you will want to read A comparison of DevOps choices for IBM Watson Assistant by Leigh Williamson. Leigh is a peer of mine who has spent some time thinking about this, and more importantly, he has spent some time actually implementing these types of change management systems.

What’s Next?

So if you have been following this series of blog posts, you have deployed a long-tail chatbot that answers Covid-19 questions. You have learned how to update that chatbot, and how to test and version control your changes. What's left to do? At this point you know enough to be dangerous — it might be time to fill in some of those learning gaps. My blog posts tell you just what you need to know, but there are many other Watson techniques and capabilities that you can use to make your end-user experience even better. Maybe you should check out some of the Watson Covid-19 specific learning resources that are out there.

Conversational Assistants and Quality with Watson Assistant – Revisited

By Daniel Toczala

Originally posted on Medium on February 11, 2020 at https://medium.com/@dtoczala/conversational-assistants-and-quality-with-watson-assistant-revisited-123fb3bb9f1f.

Note: I updated the original Conversational Assistants and Quality blog post in February 2020 to add a link to a much better testing notebook that I discovered, and to do a slight rewrite of that section. This blog post is a complete update to that original post – and it restates a lot of what I highlighted in the original post. The BIG difference is the new Python testing notebook – which is located out on GitHub, as CSM-Bot-Kfold-Test.

In early February of 2020 I was informed of this great blog post and Python notebook, on How to Design the Training Data for an AI Assistant. I REALLY liked this Python notebook MUCH better than my original k-fold notebook (from August of 2019). The other nice thing is that you can discover this Python notebook in the catalog in Watson Studio, and just apply it and have it added to your Watson Studio project. The one big difference with this notebook is that you need to have your testing data in a separate CSV file – it doesn't break up “folds” based on your training data. It doesn't do folds at all – just straight training and testing data.
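For contrast, a blind test against a held-out CSV like that notebook expects is only a few lines. A hypothetical sketch, assuming a header-less file with utterance,expected_intent columns and the usual placeholder credentials:

import csv
from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholders - substitute your own credentials and workspace ID.
authenticator = IAMAuthenticator("YOUR_API_KEY")
assistant = AssistantV1(version="2020-04-01", authenticator=authenticator)
assistant.set_service_url("https://api.us-south.assistant.watson.cloud.ibm.com")

correct = total = 0
with open("test_data.csv") as f:  # rows of: utterance,expected_intent
    for utterance, expected in csv.reader(f):
        resp = assistant.message(workspace_id="YOUR_WORKSPACE_ID",
                                 input={"text": utterance}).get_result()
        top = resp["intents"][0]["intent"] if resp["intents"] else None
        correct += (top == expected)
        total += 1

print(f"Blind test accuracy: {correct / total:.2%}")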

I wasn't a big fan of that approach; I liked my basic approach of pointing at only a Watson Assistant instance, and using all of the training data in a series of k-fold tests. Nobody wants to manage this data, that data, this file, that file… it's an opportunity to screw things up. Most of my customers are NOT AI experts; they just want a suite of tools that they can point at their chatbot engine that will allow them to do some automated testing of their chatbot. I have also noticed that many will use ALL of their training data, and not hold back some as test data. Doing k-fold testing using all of the training data in an existing Watson Assistant instance addresses this.

However, I really liked some of the analysis that they had done of the training data, and some of the other insights that they provided. So I decided to dive in and spend a little time merging the best of both of these approaches. First, let's start with some basic “rules” that you should be following if you are developing a chatbot.

Getting Started with Your Conversational Assistant

Back in July of 2019, I was working with a group of like-minded people inside of IBM, and we decided to create an IBM internal chatbot that would capture a lot of the “institutional knowledge” that some of our more experienced members knew, but that didn’t seem to be captured anywhere. We wanted our newer team members to be as effective as our more seasoned members. 

We spent a week or two coming to a common vision for our chatbot.  We also mapped out a “growth path” for our chatbot, and we agreed on our roles.  I cannot begin to stress how important this is – Best Practice #1 – Know the scope and growth path for your chatbot.  We had a good roadmap for the growth of our chatbot.  We mapped out the scope for a pilot, where we wanted to be to release it to our end users, and a couple of additional capabilities that we wanted to add on once we got it deployed.

My boss graciously agreed to be our business sponsor – his role is to constantly question our work and our approach.  “Is this the most cost-effective way to do this?”, and, “Does that add any value to your chatbot?”, are a couple of the questions he constantly challenges us with.  As a technical guy, it’s important to have someone dragging us back to reality – it’s easy to get focused on the technology and lose sight of the end goal.

Our team of “developers” also got a feel for the roles we would play.  I focused on the overall view and dove deeper on technical issues; some of my co-workers served primarily as testers, some as knowledge experts (SME's), and others served as UI specialists, focusing on the flow of conversation.  This helped us coordinate our work, and it turned out to be quite important – Best Practice #2 – Know your roles – have technical people, developers, SME's, architects, and end users represented.  If you don't have people in these roles, get them.

Starting Out – Building A Work Pipeline

As we started, we came together and worked in a spreadsheet (!?!), gathering the basic questions that we anticipated our chatbot being able to answer.  We cast a pretty wide net looking for “sample” questions to get us kickstarted.  If you are doing something “new”, you’ll have to come up with these utterances yourself.  If you’re covering something that already exists, there should be logs of end user questions that you can use to jumpstart this phase of your project.

Next, we wanted to make sure that we had an orderly development environment.  Since our chatbot was strictly for internal deployment, we didn’t have to worry too much about the separation of environments, so we could use the versioning capabilities of Watson Assistant.  Since our chatbot was going to be deployed on Slack, we were able to deploy our “development” version on Slack, and also deploy our “test” and “production” versions on Slack as well.  These are all tracked on the Versions tab of the Watson Assistant Skill UI.  This gives us the ability to “promote” tested versions of our skill to different environments.  All of this allowed us to have a stable environment that we could work and test in – which leads us to Best Practice #3 – Have a solid dev/test/prod environment set up for your Conversational assistant or chatbot.

How Are We Doing? – K-Fold Testing

As we started out, we began by pulling things together and seeing how our conversational assistant was doing in real-time, using the “Try It” button in the upper right-hand corner of the Watson Assistant skills screen.  Our results were hit and miss at first, so we knew that we needed a good way to test out our assistant. 

We started out with some code from a Joe Kozhaya blog post on Training and Evaluating Machine Learning Models.  I ended up modifying it a little bit, and posting it on my Watson Landing Page GitHub repo.  We also read some good stuff from Andrew Freed (Testing Strategies for Chatbots) and from Anna Chaney (Data DevOps Rules of Engagement),  and used some of those ideas as well.

In February of 2020 I was informed of this great blog post and Python notebook, on How to Design the Training Data for an AI Assistant. I liked the analysis in this Python notebook MUCH better than my old k-fold notebook, but I still preferred my own approach of testing against the assistant directly. So I went to work combining the best of both worlds into a new Python notebook. My new Python notebook does just that – and provides some great insight into your chatbot. Go and find it on GitHub, where it is stored as CSM-Bot-Kfold-Test.

This highlights our next best practice – Best Practice #4 – Automate Your AI Testing Strategy.

Using Feedback

As we let our automated training process take hold, we noted that our results were not what we had hoped, and that updating things was difficult.  We also learned that taking time each week to review our Watson Assistant logs was time well spent. 

It was quite difficult to add new scope to our conversation agent, so we looked at our intents and entities again.  After some in-depth discussions, we decided to try a slightly different focus on what we considered intents.  It allowed us to make better use of the entities that we detected, and it gave us the ability to construct a more easily maintained dialog tree.  We needed to change the way that we were thinking about intents and entities.

All of this brings us to our next piece of wisdom – Best Practice #5 – Be Open-Minded About Your Intents and Entities.  All too often I see teams fall into one of two traps.

  • Trap 1 – they try to tailor their intents to the answers that they want to give.  If you find yourself with intents like, “how_to_change_password” and “how_to_change_username”, then you might be describing answers, and not necessarily describing intents. 
  • Trap 2 – teams try to have very focused intents.  This leads to an explosion of intents, and a subsequent explosion of dialog nodes.  If you find yourself with intents like, “change_password_mobile”, “change_password_web”, and “change_password_voice”, then you have probably fallen into this trap.

We found that by having more general intents, and then using context variables and entities to specify things in more detail, we have been able to keep our intents relatively well managed, our dialog trees smaller and better organized, and our entire project much easier to maintain.  So, if our intent is “find_person”, we will use context variables and entities to determine what products and roles the person should have.  Someone asking, “How do I find the program manager for Watson Assistant?”, would return an intent of “find_person”, with entities detected for “program manager” and “Watson Assistant”.  In this way, we can add additional scope without adding intents – just by adding some entities and one dialog node.
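To make that concrete, here is a hedged sketch of what the message round trip looks like with the V1 Python SDK. The credentials and workspace ID are placeholders, and the entity names (role, product) are hypothetical – yours will be whatever you defined in your skill:

from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholders - substitute your own credentials and workspace ID.
authenticator = IAMAuthenticator("YOUR_API_KEY")
assistant = AssistantV1(version="2020-04-01", authenticator=authenticator)
assistant.set_service_url("https://api.us-south.assistant.watson.cloud.ibm.com")

response = assistant.message(
    workspace_id="YOUR_WORKSPACE_ID",
    input={"text": "How do I find the program manager for Watson Assistant?"},
).get_result()

# One general intent, with the specifics carried by the entities.
print(response["intents"][0]["intent"])   # e.g. "find_person"
print([(e["entity"], e["value"]) for e in response["entities"]])
# e.g. [("role", "program manager"), ("product", "Watson Assistant")]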

Why K-Fold Isn’t Enough

One thing that we realized early on was that our k-fold results were just one aspect of the “quality” of our conversational assistant.  They helped quantify how well we were able to identify user intents, but they didn’t do a lot for our detection of entities or the overall quality of our assistant.  We found that our k-fold testing told us when we needed to provide additional training examples for our classifier, and this feedback worked well.

We also found that the “quality” of our assistant improved when we gave it some personality.  We provided some random humorous responses to intents around the origin of the assistant, or more general questions like, “How are you doing today?”.  The more personality we injected into our assistant, the more authentic and “smooth” our interactions with it began to feel.  This leads us to Best Practice #6 – Inject Some Personality Into Your Assistant.

Some materials from IBM will break this down into greater detail, insisting that you pay attention to tone, personality, chit-chat, and proactivity.  I like to keep it simple – it's all part of the personality that your solution has.  I usually think of my solution as a “person” – say, a 32-year-old male from Detroit named Bob, who went to college at Michigan and loves sports and muscle cars.  Or maybe a 24-year-old recent college graduate named Cindy, who grew up in a small town in Ohio and dreams of becoming an entrepreneur in the health care industry someday.  This helps me be consistent with the personality of my solution.

We also noticed that we often needed to rework our Dialog tree and the responses that we were specifying.  We used the Analytics tab in the skill we were developing.  On that Analytics tab, we would often review individual user conversations and see how our skill was handling user interactions.  This led us to make changes to the wording that we used, as well as to the things we were looking for (in terms of entities) and what we were storing (in terms of conversation context).  Very small changes can result in a big change in the end-user perception.  Something as simple as using contractions (like “it’s” instead of “it is”), will result in a more informal conversation style.

The Analytics tab in Watson Assistant is interesting.  It provides a wealth of information that you can download and analyze.  Our effort was small, so we didn’t automate this analysis, but many teams DO automate the collection and analysis of Watson Assistant logs.  In our case, we just spent some time each week reviewing the logs and looking for “holes” in our assistant (questions and topics that our users needed answers for that we did not address), and trends in our data.  It has helped guide our evolution of this solution.
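If you do decide to automate that review, the same log data behind the Analytics tab is available through the API. A minimal sketch, with placeholder credentials, that pulls recent log entries and flags low-confidence user inputs – a decent starting proxy for the “holes” mentioned above (the 0.5 threshold is an arbitrary choice):

from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholders - substitute your own credentials and workspace ID.
authenticator = IAMAuthenticator("YOUR_API_KEY")
assistant = AssistantV1(version="2020-04-01", authenticator=authenticator)
assistant.set_service_url("https://api.us-south.assistant.watson.cloud.ibm.com")

logs = assistant.list_logs(workspace_id="YOUR_WORKSPACE_ID",
                           page_limit=200).get_result()

# Low-confidence (or missing) top intents are good candidates for new
# intents or additional training examples.
for entry in logs["logs"]:
    intents = entry["response"]["intents"]
    if not intents or intents[0]["confidence"] < 0.5:
        print(entry["request"]["input"].get("text", ""))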

Summary

This blog post identifies some best practices for developing a chatbot with IBM Watson Assistant – but these apply to ANY chatbot development, regardless of technology.

  • Best Practice #1 – Know the scope and growth path for your chatbot
  • Best Practice #2 – Know your roles – have technical people, developers, SME’s, architects, and end users represented
  • Best Practice #3 – Have a solid dev/test/prod environment set up for your Conversational assistant or chatbot
  • Best Practice #4 – Automate Your AI Testing Strategy
  • Best Practice #5 – Be Open-Minded About Your Intents and Entities
  • Best Practice #6 – Inject Some Personality Into Your Assistant

Now that you have the benefit of some experience in the development of a conversational assistant, take some time to dig in and begin building a solution that will make your life easier and more productive.

Conversational Assistants and Quality with Watson Assistant

By Daniel Toczala

Note: I updated this blog post in February 2020 to add a link to a much better testing notebook that I discovered, and to do a slight rewrite of that section.

Recently my team inside of IBM decided that we needed to capture some of the “institutional knowledge” that some of our more experienced members knew, but that didn't seem to be captured anywhere, so our newer team members could be as effective as our more seasoned members.  We also wanted an excuse to get our hands dirty with our own technology; some of us had been focused on some of the other Watson technologies and needed to get reintroduced to Watson Assistant since the migration to the “skills based” UI.

I went looking for something good about developing a chatbot (or conversational assistant), with Watson Assistant, and some good lessons learned.  I found some good information, but not one spot with the kind of experience and tips that I was looking for.  So I thought it might be good to capture it in my own blog post.

Getting Started with Your Conversational Assistant

We spent a week or two coming to a common vision for our chatbot.  We also mapped out a “growth path” for our chatbot, and we agreed on our roles.  I cannot begin to stress how important this is – Best Practice #1 – Know the scope and growth path for your chatbot.  We had a good roadmap for the growth of our chatbot.  We mapped out the scope for a pilot, where we wanted to be to release it to our end users, and a couple of additional capabilities that we wanted to add on once we got it deployed.

My boss graciously agreed to be our business sponsor – his role is to constantly question our work and our approach.  “Is this the most cost-effective way to do this?”, and, “Does that add any value to your chatbot?”, are a couple of the questions he constantly challenges us with.  As a technical guy, it’s important to have someone dragging us back to reality – it’s easy to get focused on the technology and lose sight of the end goal.

Our team of “developers” also got a feel for the roles we would play.  I focused on the overall view and dove deeper on technical issues; some of my co-workers served primarily as testers, some as knowledge experts (SME's), and others served as UI specialists, focusing on the flow of conversation.  This helped us coordinate our work, and it turned out to be quite important – Best Practice #2 – Know your roles – have technical people, developers, SME's, architects, and end users represented.  If you don't have people in these roles, get them.

Starting Out – Building A Work Pipeline

As we started, we came together and worked in a spreadsheet (!?!), gathering the basic questions that we anticipated our chatbot being able to answer.  We cast a pretty wide net looking for “sample” questions to get us kickstarted.  If you are doing something “new”, you’ll have to come up with these utterances yourself.  If you’re covering something that already exists, there should be logs of end user questions that you can use to jumpstart this phase of your project.

Next, we wanted to make sure that we had an orderly development environment.  Since our chatbot was strictly for internal deployment, we didn’t have to worry too much about the separation of environments, so we could use the versioning capabilities of Watson Assistant.  Since our chatbot was going to be deployed on Slack, we were able to deploy our “development” version on Slack, and also deploy our “test” and “production” versions on Slack as well.  These are all tracked on the Versions tab of the Watson Assistant Skill UI.  This gives us the ability to “promote” tested versions of our skill to different environments.  All of this allowed us to have a stable environment that we could work and test in – which leads us to Best Practice #3 – Have a solid dev/test/prod environment set up for your Conversational assistant or chatbot.

How Are We Doing? – K-Fold Testing

As we started out, we began by pulling things together and seeing how our conversational assistant was doing in real-time, using the “Try It” button in the upper right-hand corner of the Watson Assistant skills screen.  Our results were hit and miss at first, so we knew that we needed a good way to test out our assistant. 

We started out with some code from a Joe Kozhaya blog post on Training and Evaluating Machine Learning Models.  I ended up modifying it a little bit, and you can find the modified notebook in my Watson Landing Page GitHub repo, under notebooks, in a Python notebook stored as ANYBOT_Test-and-Deploy.ipynb.  We also read some good stuff from Andrew Freed (Testing Strategies for Chatbots) and from Anna Chaney (Data DevOps Rules of Engagement),  and used some of those ideas as well.  This led me to create that modified Python notebook, which I used to provide automated k-fold testing of our assistant implementation. 

In February of 2020 I was informed of this great blog post and Python notebook, on How to Design the Training Data for an AI Assistant. I really like this Python notebook MUCH better than my own K-fold notebook mentioned above. The other nice thing is that you can discover this Python notebook in the catalog in Watson Studio, and just apply it and have it added to your Watson Studio project. The only big difference with this notebook is that you need to have your testing data in a separate CSV file – it doesn’t break up “folds” based on your training data. This highlights Best Practice #4 – Automate Your AI Testing Strategy.

After all of this was in place, our team fell into a predictable rhythm of work and review of our work.  Since this was a side project for all of us, some of us contributed some weeks, and didn’t contribute on other weeks.  We were a small team of people (less than 10), so it was easy to have our team manage itself.

Using Feedback

As we let our automated training process take hold, we noted that our results were not what we had hoped, and that updating things was difficult.  We also learned that taking time each week to review our Watson Assistant logs was time well spent. 

It was quite difficult to add new scope to our conversation agent, so we looked at our intents and entities again.  After some in-depth discussions, we decided to try a slightly different focus on what we considered intents.  It allowed us to make better use of the entities that we detected, and it gave us the ability to construct a more easily maintained dialog tree.  We needed to change the way that we were thinking about intents and entities.

All of this brings us to our next piece of wisdom – Best Practice #5 – Be Open-Minded About Your Intents and Entities.  All too often I see teams fall into one of two traps.  Trap 1 – they try to tailor their intents to the answers that they want to give.  If you find yourself with intents like, “how_to_change_password” and “how_to_change_username”, then you might be describing answers, and not necessarily describing intents.  Trap 2 – teams try to have very focused intents.  This leads to an explosion of intents, and a subsequent explosion of dialog nodes.  If you find yourself with intents like, “change_password_mobile”, “change_password_web”, and “change_password_voice”, then you have probably fallen into this trap.

We found that by having more general intents, and then using context variables and entities to specify things in more detail, we have been able to keep our intents relatively well managed, our dialog trees smaller and better organized, and our entire project much easier to maintain.  So, if our intent is “find_person”, we will use context variables and entities to determine what products and roles the person should have.  Someone asking, “How do I find the program manager for Watson Assistant?”, would return an intent of “find_person”, with entities detected for “program manager” and “Watson Assistant”.  In this way, we can add additional scope without adding intents – just by adding some entities and one dialog node.

Why K-Fold Isn’t Enough

One thing that we realized early on was that our k-fold results were just one aspect of the “quality” of our conversational assistant.  They helped quantify how well we were able to identify user intents, but they didn’t do a lot for our detection of entities or the overall quality of our assistant.  We found that our k-fold testing told us when we needed to provide additional training examples for our classifier, and this feedback worked well.

We also found that the “quality” of our assistant improved when we gave it some personality.  We provided some random humorous responses to intents around the origin of the assistant, or more general questions like, “How are you doing today?”.  The more personality we injected into our assistant, the more authentic and “smooth” our interactions with it began to feel.  This leads us to Best Practice #6 – Inject Some Personality Into Your Assistant.

Some materials from IBM will break this down into greater detail, insisting that you pay attention to tone, personality, chit-chat, and proactivity.  I like to keep it simple – it's all part of the personality that your solution has.  I usually think of my solution as a “person” – say, a 32-year-old male from Detroit named Bob, who went to college at Michigan and loves sports and muscle cars.  Or maybe a 24-year-old recent college graduate named Cindy, who grew up in a small town in Ohio and dreams of becoming an entrepreneur in the health care industry someday.  This helps me be consistent with the personality of my solution.

We also noticed that we often needed to rework our Dialog tree and the responses that we were specifying.  We used the Analytics tab in the skill we were developing.  On that Analytics tab, we would often review individual user conversations and see how our skill was handling user interactions.  This led us to make changes to the wording that we used, as well as to the things we were looking for (in terms of entities) and what we were storing (in terms of conversation context).  Very small changes can result in a big change in the end-user perception.  Something as simple as using contractions (like “it’s” instead of “it is”), will result in a more informal conversation style.

The Analytics tab in Watson Assistant is interesting.  It provides a wealth of information that you can download and analyze.  Our effort was small, so we didn’t automate this analysis, but many teams DO automate the collection and analysis of Watson Assistant logs.  In our case, we just spent some time each week reviewing the logs and looking for “holes” in our assistant (questions and topics that our users needed answers for that we did not address), and trends in our data.  It has helped guide our evolution of this solution.

Summary

This blog post identifies some best practices for developing a chatbot with IBM Watson Assistant – but these apply to ANY chatbot development, regardless of technology.

  • Best Practice #1 – Know the scope and growth path for your chatbot
  • Best Practice #2 – Know your roles – have technical people, developers, SME’s, architects, and end users represented
  • Best Practice #3 – Have a solid dev/test/prod environment set up for your Conversational assistant or chatbot
  • Best Practice #4 – Automate Your AI Testing Strategy
  • Best Practice #5 – Be Open-Minded About Your Intents and Entities
  • Best Practice #6 – Inject Some Personality Into Your Assistant

Now that you have the benefit of some experience in the development of a conversational assistant, take some time to dig in and begin building a solution that will make your life easier and more productive.

AI Changes Everything (but nothing really changes…)

I usually don't write big long opinion rants; I usually stick to more technically focused content. However, recent events are beginning to make my head explode. There is so much misunderstanding about data, privacy, and artificial intelligence. So I decided to write something to help one of the younger members of my family make sense of everything. So please read this – and pass it along to your friends who are not “tech people” if you think it will help them understand and make sense of recent events.

Note: The attempt to make this like the Ryan Tomayko REST article is deliberate. I LOVED that article – it clarified the concept of REST for me and countless others a long time ago. I hope this article is half as effective as that one was. Ryan has since taken the original version of his article down because it was deemed sexist…

The Conversation

Young Person: What is with all of the recent media panic about FaceApp and what happens to your data? Is this just like the Facebook attention focused on user privacy?

Tox : Are you sure that you want to get into this? It gets kind of weird.

Young Person: Sure. How bad could it be? You work with Artificial Intelligence and all that AI stuff, don't you? Can't you just have Watson explain it to me?

Tox : Artificial Intelligence, or AI, isn’t really like that. I can’t just ask Watson to explain it to you. AI isn’t magic, it’s just different from how we traditionally programmed computers and created applications.

Young Person: What is an application?

Tox : An app – those things on your phone. Your phone hasn’t always been smart – phones were pretty dumb for the longest time.

Young Person: I know that, I think I saw a picture of one once. But why is AI different?

Tox : Previously, computer software and applications (or apps) operated as a set of rules. Picture them as big diagrams where you check some data (like age, name, location, etc.), and then do something based on the value of that data. So an auto loan calculator app would take in all of the various things needed to calculate your loan payment (length of the loan, downpayment, interest rate, etc.), and would “do the math” based on your individual situation, and return the answer to you. Most apps are bigger and more complex than my example, but they all work the same way every time. If I enter the same data each time, I can predict what the answers coming back will be. This is because traditional computer apps are essentially “rules engines” – they follow the same rules, every time, no matter what.
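(A quick aside for the technical readers: that loan calculator really is just fixed arithmetic. A hypothetical sketch using the standard amortization formula – identical inputs produce the identical answer, every single run:)

# A rules-based "app" in miniature: the standard loan amortization formula.
def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
    r = annual_rate / 12.0            # monthly interest rate
    if r == 0:
        return principal / months     # zero-interest edge case
    return principal * r / (1 - (1 + r) ** -months)

# $25,000 at 6% for 60 months -> 483.32, on every run, on every machine.
print(round(monthly_payment(25_000, 0.06, 60), 2))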

Young Person: So software developers are just writing “rules” all day long?

Tox : Yes – at some level, that is what a programming language does. It allows you to write rules about how to calculate things, and how to show things to the human users.

Young Person: So how is AI different?

Tox : Artificial Intelligence works differently. It is primarily based on statistical models. It looks at things and creates statistical models so it can try to “predict”, with some level of confidence, what the next steps should be. So in the case of facial recognition, it takes the pictures that you give to it, and it is able to “look” at the picture and say, “I am 90% confident this is an eye, I am 88% confident that this is hair, I am 93% confident that this is a mouth”. Doing this repeatedly allows it to go through a progression of steps, first it will isolate a person from a picture, then it will isolate the face, then it will isolate individual features in the face, and then it will compare those to specific individuals and may be able to “identify” an individual person.

Young Person: So that’s what those “auto tags” in Facebook do? They just apply statistics to my pictures? That’s AI?

Tox : It’s one aspect of AI – visual recognition and facial recognition.

Young Person: So why do people get so worked up about it?

Tox : Because I can use it to keep tabs on people – to make them do what I want them to. To take away their freedom. Have you heard about the Chinese facial recognition used to help enforce their laws?

Young Person: We talked about it in school, but that kind of stuff could never happen here in Texas. Right?

Tox : We could get into a whole political discussion about that – let’s just stick to the stuff I know. The key thing here is that Artificial Intelligence is not “magic” – it isn’t a small computer brain that we have in a lab somewhere bathed in a solution with electrodes attached. It’s just looking at things from a statistical perspective, instead of the more traditional rules-based perspective.

Young Person: OK – so now I know that AI is different. But why does this stuff matter – statistics vs. rules? It might be important to tech types like you and Uncle Marc, but why does it matter to me?

Tox : Since AI is statistically based, it needs to have a lot of data. Statistics has some interesting properties; it's not as bad as it can seem in school. Did you know that the more samples of something you have, the more accurate your predictions about future behavior can be? This is called the Law of Large Numbers, and it is a CORE driving concept for AI. Having a large amount of data is essential for this to work properly, so AI solutions require a lot of data – a lot of samples – to help them “learn” about something.
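(Another aside for the technically inclined: the Law of Large Numbers is easy to see for yourself. A hypothetical coin-flip simulation – the more flips, the closer the estimate hugs the true value of 0.5:)

import random

random.seed(42)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>7} flips: estimated P(heads) = {heads / n:.3f}")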

Young Person: OK – so how does that work?

Tox : The best illustration of this might be from the world of sports, using baseball batting averages or shooting percentages from other sports.  As players perform, a history of performance is built up.  Early in the year, a player may do very well and have a high average (or score of success).  Later in the year, they may not do as well, and their average (or score of success) will drop.  (For those of you who want to learn more, this is called regression to the mean – another interesting statistical concept.)

Looking at these scores later in the year will give a coach (or a casual sports fan) a way to compare the probability of success for two different players. Now here is the key point – it will not predict WHO will be successful, but it will tell you who has a higher probability of success. This is what AI does – it uses statistical models to predict future behavior or make sense of current data.

Young Person: So I could create an AI thing that would predict which stocks will do great in the next year – and then make millions. Right?

Tox : Not really. Remember, AI only gives you the probability of success – it doesn’t predict the future. It tells you what the chances of something happening are. If something happens one time every hundred times, I can tell you that it probably won’t happen – and I would be right 99% of the time.

Young Person: But a lot of AI doesn’t predict anything. It changes things. Like some of the deepfakes that I saw. Some of those were really funny….

Tox : These videos are modified using those statistical models. They get enough examples of one person's face, and they “predict” what it would look like on a different person. What gets done is different, and more complex, but the concept is the same.

At this point we took a break to grab some food off of the grill, play with my dogs, and relax a little bit. Then we picked our conversation back up….

Young Person: OK, so now I understand why AI is different. It’s all statistics, and it’s not magic. Why is this becoming such a big deal now?

Tox : As time and technology progress, the computing power and availability of computing resources allow us to tackle larger and larger problems. Large amounts of data can now be easily collected, stored and processed. You can do this on a cloud platform (like the IBM Cloud) for relatively little money. An individual who has the correct background and knowledge can now build and launch an AI application serving hundreds or even thousands of users for around $100 a month.

Young Person: So you mean that I could be doing deepfake videos on the cloud? Or other stuff?

Tox : Sure. You and anyone else you can think of. The issue is now “What is useful for people?” – and how can I provide some capability that people are willing to pay me for? Right now, most of the business models in this space tend to follow the same path that broadcasting took in the last century. Broadcasters provided service for free, but charged advertisers for the ability to expose a large audience to a particular marketing or advertising message. people couldn’t DVR stuff and blast past the commercials. Today, internet companies use your personal information to allow advertisers to “target” particular populations of people – based on income, interest, location and a variety of other things.

Young Person: So I could be like one of those people that are a billionaire before they are 30?

Tox : Why not? But you need to figure out HOW. Entrepreneurs look at where they can make money – and make an impact on the world. Online companies are attractive because they do not require large amounts of capital (I don't need a factory and thousands of workers, just a team of 15 software developers and some laptop computers). They can try to build user groups who will pay for a service with some sort of monthly “subscription” (like Netflix). This model means influencing a LOT of people, and getting a small amount of money from each. The other approach is to focus your attention – get a few BIG users who will spend a lot of money. This leads to the advertising business models where large corporations are the ultimate target customer (like Facebook) – and the consumer is given something for “free”.

Young Person: Facebook is mostly for older people like you. I spend most of my time in Snapchat or Instagram.

Tox : Those guys use the same type of business model – they provide an audience and information about that audience to advertisers, and the consumer gets something for “free”.

Young Person: What's wrong with getting something for free? Why should I pay for these things if I don't have to?

Tox : When I was younger, there was a saying that my Dad had, and it still holds true today. There’s no such thing as a free lunch. You may not be paying for these services and apps with money, but you ARE paying for them.

Young Person: What are you talking about? I'm not paying anyone any money. If they give me ads, I don't have to buy any of the stuff being advertised. So how am I “paying for it”?

Tox : Like I said – you may not be paying for these services and apps with money, but you ARE paying for them. You are paying for them with a loss of privacy. People in older generations, and people from areas of the world with repressive governments, know what living in a “police state” or “surveillance state” means for people. That brings us back to politics again – and we said that we weren't going to get into that.

Young Person: So I guess we pay for everything, just sometimes we pay with things other than money.

Tox : Now you're getting it. So all of this new technology – AI, miniaturization, 5G, and whatever else you can think of – has changed the perception of things in the world. It feels like we are able to do things for “free” – but the reality is that we are paying with other currency. Sometimes it is access to our private data. Sometimes it is access to our movements. Sometimes it is limits on what we can do.

Watson Text to Speech – The Costs of Personalization

I tend to write these blog posts to share interesting things that I have learned when working with our customers.  Just this past week I had 2 or 3 blog-worthy events happen, so I hope to be publishing these posts at a brisker pace in the coming months.

This week I had a customer that is using the Watson Text to Speech service.  They are using it for relatively short utterances – things like street names, addresses, and city names.  They told me that they had no idea how they were being charged for the service.

This particular customer has a focus on producing a positive customer experience.  No tinny, mechanical voice for this customer!!  They are tweaking and customizing the speaking voice, using SSML (Speech Synthesis Markup Language) to modify and “humanize” the synthesized speech from the Watson Text-to-Speech (TTS) service.  You have the ability to modify everything from the emotion used in the generated speech (called expressive SSML) down to more basic things like the pitch and glottal tension (and yes, I had to look up the definition of glottal tension).  The typical curl call that they use looks similar to this:

curl -X POST -u apikey:***************************** --header "Content-Type: application/json" --header "Accept: audio/wav" --data "{\"text\":\"<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\" >$text</voice-transformation></speak>\"}" --output $finalFile "https://stream.watsonplatform.net/text-to-speech/api/v1/synthesize?voice=en-US_MichaelVoice"

So this curl command asks for some text (referenced by the $text parameter) to be synthesized with breathiness set to 35%, pitch at -80%, the pitch range set to 60%, and glottal tension at -40%.  I'm sure that someone played with these values before settling on this combination.  It's a great way to customize the sound and the tone of your automated spoken responses.
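If you would rather make this call from Python than curl, here is a minimal sketch with the ibm-watson SDK. The API key and output file name are placeholders, and the SSML payload mirrors the curl example above:

from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholders - substitute your own credentials.
authenticator = IAMAuthenticator("YOUR_API_KEY")
tts = TextToSpeechV1(authenticator=authenticator)
tts.set_service_url("https://stream.watsonplatform.net/text-to-speech/api")

text = "9 Marine Drive, Round Rock, Texas 78681"
ssml = ('<speak><voice-transformation type="Custom" breathiness="35%" '
        'pitch="-80%" pitch_range="60%" glottal_tension="-40%">'
        f'{text}</voice-transformation></speak>')

# Synthesize the SSML and write the resulting WAV file to disk.
result = tts.synthesize(ssml, voice="en-US_MichaelVoice",
                        accept="audio/wav").get_result()
with open("utterance.wav", "wb") as f:
    f.write(result.content)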

How Does This Impact Cost?

The cost of doing something like this will vary, and this is where I learned how some small changes can have a HUGE impact on the costs associated with your Watson solution.  The basic price for using the Watson TTS service is $0.02 per thousand characters.  There are some interesting things to keep in mind here.  Whitespace is NOT counted, so only count the non-whitespace characters.  Also, remember that the voice customizations and everything else between “<speak>” and “</speak>” IS included in this count.

Now let’s assume that the text being converted was a home address, something like, “9 Marine Drive, Round Rock, Texas 78681”.  Let’s also assume that the user is being referred to by name.  There will also be some other text (a meter reading, a service interruption, etc.) as well, informing the end user about something about to happen near their home address.  We want to figure out the monthly costs for something like this if we estimate that we’ll build and issue 100,000 of these notices in a month.  A sample utterance might sound/look like this:

“This message is for Dan Toczala.  We are informing you of a service interruption tomorrow morning at 9 Marine Drive, Round Rock, Texas 78681.  Please call us at 1-800-123-4567 if you have questions.”

Breaking It Down

Your application can look up the customer name and address, and build this entire text string for each individual event, and then submit each one to the Watson Text To Speech service.  Your typical call would look like this:

curl -X POST -u apikey:***************************** --header "Content-Type: application/json" --header "Accept: audio/wav" --data "{\"text\":\"<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\" >This message is for Dan Toczala.  We are informing you of a service interruption tomorrow morning at 9 Marine Drive, Round Rock, Texas 78681.  Please call us at 1-800-123-4567 if you have questions.</voice-transformation></speak>\"}" --output $finalFile "https://stream.watsonplatform.net/text-to-speech/api/v1/synthesize?voice=en-US_MichaelVoice"

For the purposes of this discussion, we're going to focus on just the “payload” – the part in the data section of the curl command, since that is the part that drives your costs.  So this chunk:

<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\" >This message is for Dan Toczala.  We are informing you of a service interruption tomorrow morning at 9 Marine Drive, Round Rock, Texas 78681.  Please call us at 1-800-123-4567 if you have questions.</voice-transformation></speak>

Now in this example, we count ALL non-whitespace characters inside of the quotes.  We have 336 non-whitespace characters.  Multiply that by 100,000 notices in a month, and I get a rate of 33,600,000 characters a month.  Apply the TTS cost of $0.02 per thousand characters, and you get a final monthly cost of $672.

Now let’s see what happens if we change the way that we think about this.  What if we quit customizing so much of the voice?  Then we would end up with something looking like this:

<speak><voice-transformation>This message is for Dan Toczala.  We are informing you of a service interruption tomorrow morning at 9 Marine Drive, Round Rock, Texas 78681.  Please call us at 1-800-123-4567 if you have questions.</voice-transformation></speak>

So for the non-customized example, we have 225 non-whitespace characters.  Multiply that by 100,000 notices in a month, and I get a rate of 22,500,000 characters a month.  Apply the TTS cost of $0.02 per thousand characters, and you get a final monthly cost of $450.  Customizing the voice can be looked at as a cheap way to improve customer satisfaction (it's only $222 a month more), or as a really expensive option (it's about 49% more than the basic synthesis).  Remember, it all depends on how you want to look at things.  I suggest focusing on your problem and the overall costs of your solution.

Now let's look at a final example.  In this example, we'll keep our customized voice, but we'll try to stop converting the same text over and over again.  What if our message was built in a way that minimized what needed to be converted each time?  What if we converted the “standard” part of the message once, and only the customized part for each customer?  So we could do this for each customer:

<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\" >This message is for Dan Toczala, who resides at 9 Marine Drive, Round Rock, Texas 78681</voice-transformation></speak>

And then follow that with this “standard” section, which we would only need to convert once (for a one-time cost of fractions of a cent):

<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\" >We are informing you of a service interruption tomorrow morning. Please call us at 1-800-123-4567 if you have any questions.</voice-transformation></speak>

So for the modified script example, we have 244 non-whitespace characters.  Multiply that by 100,000 notices in a month, and I get a rate of 24,400,000 characters a month.  Apply the TTS cost and you get a final monthly cost of $488.
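Since the arithmetic is identical for every approach (non-whitespace characters × monthly volume × $0.02 per thousand), it is worth wrapping in a tiny helper. This sketch reproduces the three monthly figures above:

PRICE_PER_1000_CHARS = 0.02  # Watson TTS price used throughout this post

def monthly_cost(billable_chars: int, notices: int = 100_000) -> float:
    # Cost of synthesizing `notices` messages of `billable_chars` each.
    return billable_chars * notices / 1000 * PRICE_PER_1000_CHARS

for label, chars in [("Basic", 225),
                     ("Full customization", 336),
                     ("Modified customization", 244)]:
    print(f"{label:<24} ${monthly_cost(chars):,.2f} per month")
# Basic                    $450.00 per month
# Full customization       $672.00 per month
# Modified customization   $488.00 per month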

Final Conclusions

So let’s look at all of these options together:

Approach                  Characters/Msg.   Characters/Month   Monthly Cost   % Change
Basic                     225               22,500,000         $450           0%
Full Customization        336               33,600,000         $672           49.3%
Modified Customization    244               24,400,000         $488           8.4%

Looking at things in this way helped us make a rational decision on what things really cost, and helped us look at ways we could maximize our impact and minimize our costs.

P.S.  For those of you who were patient enough to read through this entire article, you can save yourself even more by removing the <speak> and </speak> tags.  These are assumed by the Watson Text To Speech service, so you can omit using them and save yourself 15 characters per message.  For the purposes of this example, that would reduce the monthly cost of each of the above approaches by $30 a month.