Conversational Assistants and Quality with Watson Assistant

By Daniel Toczala

Recently my team inside of IBM decided that we needed to capture some of the “institutional knowledge” that our more experienced members carried around in their heads, but that wasn’t captured anywhere our newer team members could find it. We also wanted an excuse to get our hands dirty with our own technology – some of us had been focused on other Watson technologies and needed to get reintroduced to Watson Assistant since the migration to the “skills based” UI.

I went looking for good material on developing a chatbot (or conversational assistant) with Watson Assistant, along with some good lessons learned. I found some useful information, but no single spot with the kind of experience and tips that I was looking for. So I thought it might be good to capture it in my own blog post.

Getting Started with Your Conversational Assistant

We spent a week or two coming to a common vision for our chatbot.  We also mapped out a “growth path” for our chatbot, and we agreed on our roles.  I cannot begin to stress how important this is – Best Practice #1 – Know the scope and growth path for your chatbot.  We had a good roadmap for the growth of our chatbot.  We mapped out the scope for a pilot, where we wanted to be to release it to our end users, and a couple of additional capabilities that we wanted to add on once we got it deployed.

My boss graciously agreed to be our business sponsor – his role is to constantly question our work and our approach.  “Is this the most cost-effective way to do this?”, and, “Does that add any value to your chatbot?”, are a couple of the questions he constantly challenges us with.  As a technical guy, it’s important to have someone dragging us back to reality – it’s easy to get focused on the technology and lose sight of the end goal.

Our team of “developers” also got a feel for the roles we would play. I focused on the overall view and dove deeper on technical issues; some of my co-workers served primarily as testers, some as knowledge experts (SMEs), and others as UI specialists, focusing on the flow of conversation. This helped us coordinate our work, and it turned out to be quite important – Best Practice #2 – Know your roles – have technical people, developers, SMEs, architects, and end users represented. If you don’t have people in these roles, get them.

Starting Out – Building A Work Pipeline

As we started, we came together and worked in a spreadsheet (!?!), gathering the basic questions that we anticipated our chatbot being able to answer.  We cast a pretty wide net looking for “sample” questions to get us kickstarted.  If you are doing something “new”, you’ll have to come up with these utterances yourself.  If you’re covering something that already exists, there should be logs of end user questions that you can use to jumpstart this phase of your project.

Next, we wanted to make sure that we had an orderly development environment. Since our chatbot was strictly for internal deployment, we didn’t have to worry too much about the separation of environments, so we could use the versioning capabilities of Watson Assistant. Since our chatbot was going to be deployed on Slack, we deployed our “development”, “test”, and “production” versions there, all tracked on the Versions tab of the Watson Assistant Skill UI. This gives us the ability to “promote” tested versions of our skill to different environments. All of this allowed us to have a stable environment that we could work and test in – which leads us to Best Practice #3 – Have a solid dev/test/prod environment set up for your conversational assistant or chatbot.

How Are We Doing? – K-Fold Testing

As we started out, we began by pulling things together and seeing how our conversational assistant was doing in real-time, using the “Try It” button in the upper right-hand corner of the Watson Assistant skills screen.  Our results were hit and miss at first, so we knew that we needed a good way to test out our assistant. 

We started out with some code from a Joe Kozhaya blog post on Training and Evaluating Machine Learning Models. I ended up modifying it a little bit, and you can find the modified notebook in my Watson Landing Page GitHub repo, under notebooks, in a Python notebook stored as ANYBOT_Test-and-Deploy.ipynb. We also read some good stuff from Andrew Freed (Testing Strategies for Chatbots) and from Anna Chaney (Data DevOps Rules of Engagement), and used some of those ideas as well. The resulting notebook provides automated k-fold testing of our assistant implementation. This highlights Best Practice #4 – Automate Your AI Testing Strategy.
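If you want to experiment with the k-fold idea outside of a notebook, here is a minimal sketch in Python. The TF-IDF classifier is just a stand-in for your real assistant (swap in calls to your own workspace), and the CSV of utterance/intent pairs is a hypothetical export of your ground truth:

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline

# Ground truth: a CSV with "utterance" and "intent" columns (hypothetical file).
data = pd.read_csv("training_data.csv")
X, y = data["utterance"], data["intent"]

# A 5-fold stratified split keeps the intent distribution similar in every fold.
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kfold.split(X, y), start=1):
    # Stand-in classifier -- replace with calls to your real assistant.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    predicted = model.predict(X.iloc[test_idx])
    print(f"--- Fold {fold} ---")
    print(classification_report(y.iloc[test_idx], predicted, zero_division=0))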

After all of this was in place, our team fell into a predictable rhythm of work and review of our work.  Since this was a side project for all of us, some of us contributed some weeks, and didn’t contribute on other weeks.  We were a small team of people (less than 10), so it was easy to have our team manage itself.

Using Feedback

As we let our automated training process take hold, we noted that our results were not what we had hoped, and that updating things was difficult.  We also learned that taking time each week to review our Watson Assistant logs was time well spent. 

It was quite difficult to add new scope to our conversation agent, so we looked at our intents and entities again.  After some in-depth discussions, we decided to try a slightly different focus on what we considered intents.  It allowed us to make better use of the entities that we detected, and it gave us the ability to construct a more easily maintained dialog tree.  We needed to change the way that we were thinking about intents and entities.

All of this brings us to our next piece of wisdom – Best Practice #5 – Be Open-Minded About Your Intents and Entities. All too often I see teams fall into one of two traps. Trap 1 – they try to tailor their intents to the answers that they want to give. If you find yourself with intents like “how_to_change_password” and “how_to_change_username”, then you might be describing answers, and not necessarily describing intents. Trap 2 – they try to have very focused intents. This leads to an explosion of intents, and a subsequent explosion of dialog nodes. If you find yourself with intents like “change_password_mobile”, “change_password_web”, and “change_password_voice”, then you have probably fallen into this trap.

We found that by having more general intents, and then using context variables and entities to add detail, we have been able to keep our intents relatively well managed, our dialog trees smaller and better organized, and our entire project much easier to maintain. So if our intent is “find_person”, we use context variables and entities to determine what products and roles the person should have. Someone asking, “How do I find the program manager for Watson Assistant?”, would return an intent of “find_person”, with entities detected for “program manager” and “Watson Assistant”. In this way, we can add additional scope without adding intents – just by adding some entities and one dialog node.
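To make that pattern concrete, here is a rough sketch of routing on one general intent plus entities. The response shape loosely follows a Watson Assistant v1 message result, but treat the entity names and the lookup helper as hypothetical:

# A rough sketch of routing on one general intent plus entities. The
# response shape loosely follows a Watson Assistant v1 message result;
# the entity names and the lookup helper are hypothetical.
response = {
    "intents": [{"intent": "find_person", "confidence": 0.94}],
    "entities": [
        {"entity": "role", "value": "program manager"},
        {"entity": "product", "value": "Watson Assistant"},
    ],
}

def find_person(entities):
    # Hypothetical directory lookup keyed off the detected entities.
    role = next((e["value"] for e in entities if e["entity"] == "role"), None)
    product = next((e["value"] for e in entities if e["entity"] == "product"), None)
    return f"Looking up the {role} for {product}..."

top_intent = response["intents"][0]["intent"]
if top_intent == "find_person":
    # One dialog node (or handler) covers every role/product combination.
    print(find_person(response["entities"]))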

Why K-Fold Isn’t Enough

One thing that we realized early on was that our k-fold results were just one aspect of the “quality” of our conversational assistant.  They helped quantify how well we were able to identify user intents, but they didn’t do a lot for our detection of entities or the overall quality of our assistant.  We found that our k-fold testing told us when we needed to provide additional training examples for our classifier, and this feedback worked well.

We also found that the “quality” of our assistant improved when we gave it some personality. We provided some random humorous responses to intents around the origin of the assistant, or more general questions like, “How are you doing today?”. The more personality we injected into our assistant, the more authentic and “smooth” our interactions with it began to feel. This leads us to Best Practice #6 – Inject Some Personality Into Your Assistant.

Some materials from IBM will break this down into greater detail, insisting that you pay attention to tone, personality, chit-chat and proactivity. I like to keep it simple – it’s all part of the personality that your solution has. I usually think of a “person” that my solution is – say a 32-year-old male from Detroit, who went to college at Michigan, who loves sports and muscle cars, named Bob. Or maybe a 24-year-old recent college graduate named Cindy who grew up in a small town in Ohio, and dreams of becoming an entrepreneur in the health care industry someday. This helps me be consistent with the personality of my solution.

We also noticed that we often needed to rework our Dialog tree and the responses that we were specifying. We used the Analytics tab of the skill we were developing, where we would often review individual user conversations and see how our skill was handling user interactions. This led us to make changes to the wording that we used, as well as to the things we were looking for (in terms of entities) and what we were storing (in terms of conversation context). Very small changes can result in a big change in end-user perception. Something as simple as using contractions (like “it’s” instead of “it is”) will result in a more informal conversation style.

The Analytics tab in Watson Assistant is interesting.  It provides a wealth of information that you can download and analyze.  Our effort was small, so we didn’t automate this analysis, but many teams DO automate the collection and analysis of Watson Assistant logs.  In our case, we just spent some time each week reviewing the logs and looking for “holes” in our assistant (questions and topics that our users needed answers for that we did not address), and trends in our data.  It has helped guide our evolution of this solution.
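For teams that do want to automate the collection side, a sketch like the one below is a reasonable starting point: it pulls recent log entries and tallies the top intents, flagging low-confidence hits as potential holes. The endpoint and field names follow the Assistant v1 logs API as I remember it, and the service URL, workspace ID, and API key are placeholders – check the current API documentation before relying on this:

import requests
from collections import Counter

# All three of these values are placeholders -- use your own service
# URL, workspace ID, API key, and a current API version date.
SERVICE_URL = "https://gateway.watsonplatform.net/assistant/api"
WORKSPACE_ID = "your-workspace-id"
API_KEY = "your-api-key"

logs_url = f"{SERVICE_URL}/v1/workspaces/{WORKSPACE_ID}/logs"
resp = requests.get(logs_url,
                    params={"version": "2019-02-28", "page_limit": 100},
                    auth=("apikey", API_KEY))
resp.raise_for_status()

# Tally the top intent of each logged turn, and flag low-confidence
# hits as potential "holes" in the assistant's coverage.
intent_counts = Counter()
needs_review = []
for entry in resp.json().get("logs", []):
    intents = entry.get("response", {}).get("intents", [])
    if intents:
        intent_counts[intents[0]["intent"]] += 1
        if intents[0]["confidence"] < 0.5:
            needs_review.append(entry["request"]["input"]["text"])

print(intent_counts.most_common(10))
print("Low-confidence utterances:", needs_review[:10])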

Summary

This blog post identifies some best practices for developing a chatbot with IBM Watson Assistant – but these apply to ANY chatbot development, regardless of technology.

  • Best Practice #1 – Know the scope and growth path for your chatbot
  • Best Practice #2 – Know your roles – have technical people, developers, SMEs, architects, and end users represented
  • Best Practice #3 – Have a solid dev/test/prod environment set up for your Conversational assistant or chatbot
  • Best Practice #4 – Automate Your AI Testing Strategy
  • Best Practice #5 – Be Open-Minded About Your Intents and Entities
  • Best Practice #6 – Inject Some Personality Into Your Assistant

Now that you have the benefit of some experience in the development of a conversational assistant, take some time to dig in and begin building a solution that will make your life easier and more productive.


I’m Having an Issue on IBM Cloud Part 1 – Why Can’t I Create Anything?

By: Daniel Toczala

Note: This is the first in a series of blog posts that I am co-authoring with Paula Williams as part of an “I have an Issue…” series on the IBM Cloud. These blog posts will cover how to deal with common issues and roadblocks for users of the IBM Cloud. Check out her post on “What the Heck is a CSM?”

I like helping my IBM Cloud customers, and I like dealing with the technology. Every new technology (and even established ones) has a learning curve. My goal in writing this series of articles is to help you more quickly conquer that learning curve with respect to the IBM Cloud.

Typical Issues That People See

Typical issues that users experience when working with the IBM Cloud often involve their services. These could be one of the Watson services (like Watson Conversation or Watson Discovery), Cloud Object Storage, or a DB2 or Mongo database. The issues that people typically experience fall into one of two categories:

I’ve Created Something That I Cannot See

These situations have you going out and creating an instance of a service, but now you just can’t seem to figure out how to find it or how to get to it. It’s almost as if we have decided to hide it from you.

In these situations, it is best to first figure out WHERE you are on the IBM Cloud.  Look in the upper right corner of your browser window.  As soon as you log into the IBM Cloud, you’ll see something like this:

[Image: Finding the area you are working in on the IBM Cloud]

That text in that small black block tells you WHERE you are.  It tells you the account that you are operating in.  Clicking on this box you can CHANGE where you are looking.  Confused yet?  Maybe we should step back a bit….

You have an ACCOUNT and an IDENTITY on the IBM Cloud. Your IDENTITY from an IBM perspective is typically an email address, which is also your IBM ID (for our example, let’s assume that mine is tox@acme.com). Your IBM Cloud login uses the same username/password as your IBM ID (which you might use for other things on ibm.com). You might even have a Federated ID, where we use your company email address/identity and your company authentication mechanisms. Some things to remember when looking at your IBM ID:

  • Just because you have an IBM ID doesn’t mean that you have an IBM Cloud account.  You will need to register for an IBM Cloud account. While the IBM Cloud and ibm.com both use your IBM ID, they are DIFFERENT domains.
  • When you sign up for your IBM ID, use a valid email address as your user ID.  You will need to VALIDATE your account by responding to an email that is sent to (you guessed it!) your user ID.  So in my case I cannot sign up for an IBM ID at tox_is_awesome@acme.com (unless I have convinced my corporate IT folks to give me that email address).

Now let’s get back to finding those cloud resources that we created.  Once you see what context you are in, you will have a better idea of what you can EXPECT to see.  Since I am an Acme Co. employee, I have been added as an approved user of the Acme corporate account.  What does this mean?  Well the Acme corporation created a different account, a corporate account, associated with a service account called IBM_Cloud_Admin@acme.com.  This account has a subscription which provides it with a set amount of “credits” for IBM Cloud services, which it burns down over time.  Since I am a member of this account, I can create IBM Cloud services in this corporate account, and their costs get assigned to the corporate account.  IBM Cloud services are billed based on where they are LOCATED (logically), and not based on who created them.

Now that you hopefully have a better feel for how your account fits into the grand scheme of things, you can find out WHERE that Watson service you created is located by looking at the various contexts that you operate in.

Advanced Developer Note: Your IBM ID is based on an email ID.  So I have an IBM ID for tox@acme.com, but I also have one associated with my personal email address (tox_is_awesome@freebiemailcorp.com). I use my acme.com account for doing my regular work, and I use my personal email based ID to do open source work.  That account is a trial account (or maybe I even attach my personal credit card to it), and I am careful not to rack up big charges on the account.  I use it for doing simple little things in the cloud environment.

I Cannot Work With Something I Created

These situations are a little different.  You are able to create some service, but you are then unable to access it.  Either the service is broken, or down, or just not responding to your repeated attempts to use it.  Or maybe you can see a service but you just cannot create it.

So let’s go back to that tox@acme.com account. First of all, you need to make sure that the account in which you are attempting to create a service is able to pay for that service. If the account has a credit card associated with it, which guarantees payment for cloud services used, then the account is referred to as a “PAYGO” (short for pay-as-you-go) account. People who use things like GitHub and other SaaS-based services should be familiar with this model. If the account has prepaid for services, via an IBM Cloud subscription, then it is referred to as a “SUBSCRIPTION” account. Either “PAYGO” or “SUBSCRIPTION” accounts can create any type of service. Your personal account might not have a guaranteed payment method, like my tox_is_awesome@freebiemailcorp.com account. In that case you have a “TRIAL” account. “TRIAL” accounts can create lite (or no charge) instances of most services on the IBM Cloud. TRIAL accounts will not be able to create more robust versions of those services until they either become “PAYGO” or “SUBSCRIPTION” accounts.

So let’s get back to our example.  My tox_is_awesome@freebiemailcorp.com account is a “TRIAL” account (I’m not paying for anything!), but since I am an approved user of the IBM_Cloud_Admin@acme.com account (which is a “SUBSCRIPTION” account), I can create non-lite service instances within THAT account.  There is one hitch…. I have to have PERMISSIONS granted to me to be able to see particular logical areas of the Acme IBM_Cloud_Admin account.

What Are These “Logical” Areas?

There are two different types of logical areas where IBM Cloud resources can be created.  Each is based on a different security model. 

The Cloud Foundry security model uses the concept of Organizations (called “Orgs”) and Spaces. These Orgs and Spaces live in a hierarchical model, with a single Org hosting one or more Spaces. The administrator/owner of an IBM Cloud account will create these Orgs and Spaces, and will assign people various roles in each org/space. These roles determine what a user can do within that particular logical environment. You need to make sure that your account has access to the orgs and spaces that you need to work in.

The IBM Cloud Identity and Access Management (IAM) security model is based on Resource Groups. Each resource group may have a series of Access Groups associated with it, and these Access Groups can be used to provide fine-grained access controls and role management. You need to make sure that your account is enabled to do what is needed for the resource groups that you need to work in.

You can learn all about Orgs, Spaces, Resource Groups, Access Groups and best practices for organizing your IBM Cloud account by reading my blog post entitled, Getting Started Right on the IBM Cloud.

Getting Swagger API Pages for Watson APIs

In the past couple of weeks, I have seen a few comments from my customers complaining about the lack of “sufficient” API documentation for the various Watson APIs. I used to like to point my customers to the Swagger API documentation, but I can’t seem to find it anymore. So I asked some of my fellow IBM folks if they knew where these pages were. They didn’t know; they just had some vague notion that the pages were no longer supported.

I miss those Swagger API pages – so I found out how to get them. The IBM development teams no longer host these pages, but you can generate them for yourself, whenever you want, by just following this short little guide.

Go Ahead – Get That Swagger

Step 1 – Figure out which API you want to generate a Swagger page for. Go to the IBM Cloud catalog, and select the service that you want to see. For the purposes of this example, I’ll go and look at the Watson Assistant service.

Step 2 – Get to the API Documentation page by clicking on the link titled View API Docs – as shown below.

Step 2a – You can skip all of this hassle by just going to the IBM Cloud API Docs page, and then selecting the specific API documentation page that you are looking for (which in our case is Watson Assistant v1). This is much quicker – and easier to bookmark and remember.

Step 3 – You are now on the Watson Assistant API (V1) page. Look for the ellipsis in the white text portion of the UI, as shown below, and click on it. Save a version of the API by selecting Download OpenAPI Definition. This will download a JSON file to your local machine.

Step 4 – Open a new browser window, and go to the web-based Swagger editor.

Step 5 – In the Swagger editor window, select File -> Import File. Then select your recently downloaded Watson API JSON file (from Step 3).

You can now look at the Swagger API version of the Watson API documentation. This allows you to see all of the API calls for the service, along with the various parameters and the responses. It also allows you to try out the interface in an interactive manner. Pretty nice!!
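As a bonus, once you have the OpenAPI definition on disk, you can also poke at it programmatically. Here is a small sketch that lists every operation in the file you downloaded in Step 3 (the filename is a placeholder):

import json

# The filename is a placeholder for whatever you saved in Step 3.
with open("watson-assistant-v1.json") as spec_file:
    spec = json.load(spec_file)

info = spec.get("info", {})
print(info.get("title"), info.get("version"))

# Walk the paths section and print one line per operation.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}
for path, operations in sorted(spec.get("paths", {}).items()):
    for method, details in operations.items():
        if method in HTTP_METHODS:
            print(f"{method.upper():8s}{path}  -  {details.get('summary', '')}")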

Watson Text to Speech – The Costs of Personalization

I tend to write these blog posts to share interesting things that I have learned when working with our customers.  Just this past week I have had 2 or 3 blog worthy events happen, so I hope to be publishing these posts at a brisker pace in the coming months.

This week I had a customer that is using the Watson Text to Speech service. They are using it for relatively short utterances – things like street names, addresses, and city names. They told me that they had no idea how they were being charged for the service.

This particular customer has a focus on producing a positive customer experience. No tinny, mechanical voice for this customer!! They are tweaking and customizing the speaking voice, using SSML (Speech Synthesis Markup Language) to modify and “humanize” the synthesized speech from the Watson Text-to-Speech (TTS) service. You can modify everything from the emotion used in the generated speech (called expressive SSML) down to more basic things like the pitch and glottal tension (and yes, I had to look up the definition of glottal tension). The typical curl call that they use looks similar to this:

curl -X POST -u apikey:***************************** --header "Content-Type: application/json" --header "Accept: audio/wav" --data "{\"text\":\"<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\">$text</voice-transformation></speak>\"}" --output $finalFile "https://stream.watsonplatform.net/text-to-speech/api/v1/synthesize?voice=en-US_MichaelVoice"

So this curl command will ask for some text (referenced by the $text parameter) that will have breathiness set to 35%, pitch at -80%, the pitch range set to 60%, with a glottal tension of -40%.  I’m sure that someone played with these values, before settling on this combination.  It’s a great way to customize the sound and the tone of your automated speaking responses. 
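If all of that curl escaping makes your eyes water, the same request reads a lot more clearly in Python. This is a rough sketch of the call above – the API key and output filename are placeholders:

import requests

# Placeholder credentials -- substitute your own API key.
API_KEY = "your-api-key"
URL = ("https://stream.watsonplatform.net/text-to-speech/api/v1/synthesize"
       "?voice=en-US_MichaelVoice")

text = "9 Marine Drive, Round Rock, Texas 78681"
ssml = ('<speak><voice-transformation type="Custom" breathiness="35%" '
        'pitch="-80%" pitch_range="60%" glottal_tension="-40%">'
        f"{text}</voice-transformation></speak>")

# Ask for WAV audio back and write it to a local file.
resp = requests.post(URL,
                     json={"text": ssml},
                     headers={"Accept": "audio/wav"},
                     auth=("apikey", API_KEY))
resp.raise_for_status()
with open("output.wav", "wb") as audio_file:
    audio_file.write(resp.content)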

How Does This Impact Cost?

The cost of doing something like this will vary, and this is where I learned how some small changes can have a HUGE impact on the costs associated with your Watson solution. The basic price for using the Watson TTS service is $0.02 per thousand characters. There are some interesting things to keep in mind here. Whitespace is NOT counted, so only count the non-whitespace characters. Also, remember that the voice customizations and everything between “<speak>” and “</speak>” are included in this count.

Now let’s assume that the text being converted was a home address, something like, “9 Marine Drive, Round Rock, Texas 78681”.  Let’s also assume that the user is being referred to by name.  There will also be some other text (a meter reading, a service interruption, etc.) as well, informing the end user about something about to happen near their home address.  We want to figure out the monthly costs for something like this if we estimate that we’ll build and issue 100,000 of these notices in a month.  A sample utterance might sound/look like this:

“This message is for Dan Toczala.  We are informing you of a service interruption tomorrow morning at 9 Marine Drive, Round Rock, Texas 78681.  Please call us at 1-800-123-4567 if you have questions.”

Breaking It Down

Your application can look up the customer name and address, and build this entire text string for each individual event, and then submit each one to the Watson Text To Speech service.  Your typical call would look like this:

curl -X POST -u apikey:***************************** --header "Content-Type: application/json" --header "Accept: audio/wav" --data "{\"text\":\"<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\">This message is for Dan Toczala.  We are informing you of a service interruption tomorrow morning at 9 Marine Drive, Round Rock, Texas 78681.  Please call us at 1-800-123-4567 if you have questions.</voice-transformation></speak>\"}" --output $finalFile "https://stream.watsonplatform.net/text-to-speech/api/v1/synthesize?voice=en-US_MichaelVoice"

For the purposes of this discussion, we’re going to focus on the “payload” – the part in the data section of the curl command, since that is the part that drives your costs. So this chunk:

<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\" >This message is for Dan Toczala.  We are informing you of a service interruption tomorrow morning at 9 Marine Drive, Round Rock, Texas 78681.  Please call us at 1-800-123-4567 if you have questions.</voice-transformation></speak>\

Now in this example, we count ALL non-whitespace characters inside of the quotes.  We have 336 non-whitespace characters.  Multiply that by 100,000 notices in a month, and I get a rate of 33,600,000 characters a month.  Apply the TTS cost of $0.02 per thousand characters, and you get a final monthly cost of $672.

Now let’s see what happens if we change the way that we think about this.  What if we quit customizing so much of the voice?  Then we would end up with something looking like this:

<speak><voice-transformation>This message is for Dan Toczala.  We are informing you of a service interruption tomorrow morning at 9 Marine Drive, Round Rock, Texas 78681.  Please call us at 1-800-123-4567 if you have questions.</voice-transformation></speak>\

So for the non-customized example, we have 225 non-whitespace characters. Multiply that by 100,000 notices in a month, and I get a rate of 22,500,000 characters a month. Apply the TTS cost of $0.02 per thousand characters, and you get a final monthly cost of $450. Customizing my voice could be looked at as a cheap way to have an impact on customer satisfaction (it’s only $222 more a month), or as a really expensive way to do this (it’s about 49% more expensive than the basic conversion). Remember, it all depends on how you want to look at things. I suggest focusing on your problem and the overall costs of your solution.

Now let’s look at a final example.  In this example, we’ll keep our customized voice, but we’ll try to stop converting the same text over and over again.  What if our message was built in a way that minimized what needed to be converted each time?  What if we converted a basic message once, and the rest of the customized part for each customer?  So we could do this for each customer:

<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\" >This message is for Dan Toczala, who resides at 9 Marine Drive, Round Rock, Texas 78681</voice-transformation></speak>\

And then follow that with this “standard” section, which we would only need to convert once (for a one-time cost of fractions of a cent):

<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\" >We are informing you of a service interruption tomorrow morning. Please call us at 1-800-123-4567 if you have any questions.</voice-transformation></speak>\

So for the modified script example, we have 244 non-whitespace characters.  Multiply that by 100,000 notices in a month, and I get a rate of 24,400,000 characters a month.  Apply the TTS cost and you get a final monthly cost of $488.
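If you want to sanity-check this arithmetic yourself, a few lines of Python will do it. The character counts are the ones quoted above:

# Character counts per message are the ones quoted above; the rate and
# volume come straight from this example.
RATE_PER_1000_CHARS = 0.02        # dollars per 1,000 non-whitespace characters
NOTICES_PER_MONTH = 100_000

approaches = {
    "Basic": 225,
    "Full Customization": 336,
    "Modified Customization": 244,
}

def monthly_cost(chars_per_message):
    total_chars = chars_per_message * NOTICES_PER_MONTH
    return total_chars / 1000 * RATE_PER_1000_CHARS

base = monthly_cost(approaches["Basic"])
for name, chars in approaches.items():
    cost = monthly_cost(chars)
    print(f"{name:25s}{chars} chars/msg  ${cost:,.2f}/month  "
          f"({(cost - base) / base:+.1%} vs. Basic)")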

Final Conclusions

So let’s look at all of these options together:

Approach                  Characters/Msg.   Characters/Month   Monthly Cost   % Change
Basic                     225               22,500,000         $450           0%
Full Customization        336               33,600,000         $672           49.3%
Modified Customization    244               24,400,000         $488           8.4%

Looking at things in this way helped us make a rational decision on what things really cost, and helped us look at ways we could maximize our impact and minimize our costs.

P.S.  For those of you who were patient enough to read through this entire article, you can save yourself even more by removing the <speak> and </speak> tags.  These are assumed by the Watson Text To Speech service, so you can omit using them and save yourself 15 characters per message.  For the purposes of this example, that would reduce the monthly cost of each of the above approaches by $30 a month.