Getting Swagger API Pages for Watson APIs

In the past couple of weeks, I have seen a few comments from my customers complaining about the lack of “sufficient” API documentation for the various Watson APIs. I used to like to point my customers to the Swagger API documentation, but I can’t seem to find it anymore. So I asked some of my fellow IBM folks if they knew where these pages were. They didn’t know; they just had some vague notion that the pages were no longer supported.

I miss those Swagger API pages – so I found out how to get them. The IBM development teams no longer host these pages, but you can generate them yourself, whenever you want, by following this short little guide.

Go Ahead – Get That Swagger

Step 1 – Figure out which API you want to generate a Swagger page for. Go to the IBM Cloud catalog, and select the service that you want to see. For the purposes of this example, I’ll go and look at the Watson Assistant service.

Step 2 – Get to the API Documentation page by clicking on the link titled View API Docs – as shown below.

Step 2a – You can skip all of this hassle by just going to the IBM Cloud API Docs page, and then selecting the specific API documentation page that you are looking for (which in our case is Watson Assistant v1). This is much quicker – and easier to bookmark and remember.

Step 3 – You are now on the Watson Assistant API (V1) page. Look for the ellipsis in the white text portion of the UI, as shown below, and click on it. Save a version of the API by selecting Download OpenAPI Definition. This will download a JSON file to your local machine.

Step 4 – Open a new browser window, and go to the web-based Swagger editor.

Step 5 – In the Swagger editor window, select File -> Import File. Then select your recently downloaded Watson API JSON file (from Step 3).

You can now look at the Swagger API version of the Watson API documentation. This allows you to see all of the API calls for the service, along with the various parameters, and the responses. It also allows you to try to use the interface in an interactive manner. Pretty nice!!
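If you would rather get a quick text listing of the API calls without opening the editor, the downloaded file can also be read directly. Here is a minimal Python sketch, assuming the download is a standard OpenAPI/Swagger JSON document with a top-level "paths" object. The file name assistant-v1.json is just a placeholder for whatever your browser saved in Step 3.

```python
import json

# HTTP verbs that can appear as keys under a path in an OpenAPI document.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def load_spec(filename):
    """Read the downloaded OpenAPI definition (JSON) into a dict."""
    with open(filename, encoding="utf-8") as handle:
        return json.load(handle)


def list_operations(spec):
    """Return (METHOD, path, summary) tuples, sorted by path and then method."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            # Skip non-verb keys such as a shared "parameters" list.
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path, operation.get("summary", "")))
    return sorted(operations, key=lambda op: (op[1], op[0]))


# Usage (the file name is hypothetical):
#   for method, path, summary in list_operations(load_spec("assistant-v1.json")):
#       print(method, path, summary)
```

This only lists what is in the file; the Swagger editor is still the nicer way to browse parameters and responses.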


Watson Text to Speech – The Costs of Personalization

I tend to write these blog posts to share interesting things that I have learned when working with our customers.  Just this past week I have had two or three blog-worthy events happen, so I hope to be publishing these posts at a brisker pace in the coming months.

This week I had a customer that is using the Watson Text to Speech service.  They are using it to do short utterances, things like street names, addresses, and city names.  The utterances are relatively short.  They told me that they had no idea how they were being charged for the service.

This particular customer has a focus on producing a positive customer experience.  No tinny, mechanical voice for this customer!!  They are tweaking the speaking voice and customizing it, using SSML (Speech Synthesis Markup Language) to modify and “humanize” the synthesized speech from the Watson Text-to-Speech (TTS) service.  You can modify everything from the emotion used in the generated speech (called expressive SSML) to more basic things like the pitch and glottal tension (and yes, I had to look up the definition of glottal tension).  The typical curl call that they use looks similar to this:

curl -X POST -u apikey:***************************** --header "Content-Type: application/json" --header "Accept: audio/wav" --data "{\"text\":\"<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\" >$text</voice-transformation></speak>\"}" --output $finalFile "https://stream.watsonplatform.net/text-to-speech/api/v1/synthesize?voice=en-US_MichaelVoice"

So this curl command will ask for some text (referenced by the $text parameter) that will have breathiness set to 35%, pitch at -80%, the pitch range set to 60%, with a glottal tension of -40%.  I’m sure that someone played with these values, before settling on this combination.  It’s a great way to customize the sound and the tone of your automated speaking responses. 
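All of those backslashes are easy to get wrong by hand.  One way around that is to let a small script build the JSON body for you.  Here is a minimal Python sketch, using only the standard library; the function name build_tts_payload is my own invention, and the attribute names are simply the ones from the curl call above.  It makes no calls to the service.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def build_tts_payload(text, **transform):
    """Wrap text in <speak><voice-transformation ...> and return a JSON body.

    The keyword arguments become attributes on <voice-transformation>;
    nothing here checks that the service accepts them.
    """
    attrs = "".join(
        f" {name}={quoteattr(str(value))}" for name, value in transform.items()
    )
    # escape() protects characters like & and < inside the spoken text.
    ssml = (
        f"<speak><voice-transformation{attrs}>"
        f"{escape(text)}"
        "</voice-transformation></speak>"
    )
    # json.dumps handles the quote escaping that the curl call does by hand.
    return json.dumps({"text": ssml})


payload = build_tts_payload(
    "9 Marine Drive, Round Rock, Texas 78681",
    type="Custom",
    breathiness="35%",
    pitch="-80%",
    pitch_range="60%",
    glottal_tension="-40%",
)
```

You could then write payload to a file and hand it to curl with --data @yourfile.json instead of the inline string.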

How Does This Impact Cost?

The cost of doing something like this will vary, and this is where I learned how some small changes can have a HUGE impact on the costs associated with your Watson solution.  The basic price for using the Watson TTS service is $0.02 per thousand characters.  There are some interesting things to keep in mind here.  Whitespace is NOT counted, so only count the non-whitespace characters.  Also, remember that the voice customizations, and everything else between the “<speak>” and the “</speak>” tags, are included in this count.
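To make that counting rule concrete, here is a minimal Python sketch of it as I have described it: count every non-whitespace character, markup included, and charge $0.02 per thousand.  The function names are mine, and the rate is just the one quoted above, so check current pricing before relying on it.

```python
from decimal import Decimal

# USD per thousand characters, the rate quoted in this post (an assumption
# for anything beyond this example).
PRICE_PER_THOUSAND = Decimal("0.02")


def billable_characters(payload):
    """Count non-whitespace characters in the string, markup included."""
    return sum(1 for ch in payload if not ch.isspace())


def monthly_cost(chars_per_message, messages_per_month):
    """Cost in USD for a month of messages of the given size."""
    return PRICE_PER_THOUSAND * chars_per_message * messages_per_month / 1000


# The tags count too: 7 + 2 + 5 + 8 characters, with the space ignored.
print(billable_characters("<speak>Hi there</speak>"))
```

Decimal is used instead of float so that the dollar figures come out exact.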

Now let’s assume that the text being converted was a home address, something like, “9 Marine Drive, Round Rock, Texas 78681”.  Let’s also assume that the user is being referred to by name.  There will also be some other text (a meter reading, a service interruption, etc.) as well, informing the end user about something about to happen near their home address.  We want to figure out the monthly costs for something like this if we estimate that we’ll build and issue 100,000 of these notices in a month.  A sample utterance might sound/look like this:

“This message is for Dan Toczala.  We are informing you of a service interruption tomorrow morning at 9 Marine Drive, Round Rock, Texas 78681.  Please call us at 1-800-123-4567 if you have questions.”

Breaking It Down

Your application can look up the customer name and address, and build this entire text string for each individual event, and then submit each one to the Watson Text To Speech service.  Your typical call would look like this:

curl -X POST -u apikey:***************************** --header "Content-Type: application/json" --header "Accept: audio/wav" --data "{\"text\":\"<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\" >This message is for Dan Toczala.  We are informing you of a service interruption tomorrow morning at 9 Marine Drive, Round Rock, Texas 78681.  Please call us at 1-800-123-4567 if you have questions.</voice-transformation></speak>\"}" --output $finalFile "https://stream.watsonplatform.net/text-to-speech/api/v1/synthesize?voice=en-US_MichaelVoice"

For the purposes of this discussion, we’re going to just focus on the “payload”, or the part in the data section of the curl command, because that is the part that drives your costs.  So this chunk:

<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\" >This message is for Dan Toczala.  We are informing you of a service interruption tomorrow morning at 9 Marine Drive, Round Rock, Texas 78681.  Please call us at 1-800-123-4567 if you have questions.</voice-transformation></speak>\

Now in this example, we count ALL non-whitespace characters inside of the quotes.  We have 336 non-whitespace characters.  Multiply that by 100,000 notices in a month, and I get a rate of 33,600,000 characters a month.  Apply the TTS cost of $0.02 per thousand characters, and you get a final monthly cost of $672.

Now let’s see what happens if we change the way that we think about this.  What if we quit customizing so much of the voice?  Then we would end up with something looking like this:

<speak><voice-transformation>This message is for Dan Toczala.  We are informing you of a service interruption tomorrow morning at 9 Marine Drive, Round Rock, Texas 78681.  Please call us at 1-800-123-4567 if you have questions.</voice-transformation></speak>\

So for the non-customized example, we have 225 non-whitespace characters.  Multiply that by 100,000 notices in a month, and I get a rate of 22,500,000 characters a month.  Apply the TTS cost of $0.02 per thousand characters, and you get a final monthly cost of $450.  Customizing my voice could be looked at as a cheap way to have an impact on customer satisfaction (it’s only $222 a month), or a really expensive way to do this (it’s about 49% more expensive than the base conversion).  Remember, it all depends on how you want to look at things.  I suggest focusing on your problem and the overall costs of your solution.

Now let’s look at a final example.  In this example, we’ll keep our customized voice, but we’ll try to stop converting the same text over and over again.  What if our message was built in a way that minimized what needed to be converted each time?  What if we converted the standard part of the message once, and then converted only the personalized part for each customer?  So we could do this for each customer:

<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\" >This message is for Dan Toczala, who resides at 9 Marine Drive, Round Rock, Texas 78681</voice-transformation></speak>\

And then follow that with this “standard” section which we would only need to convert once (for a one-time cost of fractions of a cent):

<speak><voice-transformation type=\\\"Custom\\\" breathiness=\\\"35%\\\" pitch=\\\"-80%\\\" pitch_range=\\\"60%\\\" glottal_tension=\\\"-40%\\\" >We are informing you of a service interruption tomorrow morning. Please call us at 1-800-123-4567 if you have any questions.</voice-transformation></speak>\

So for the modified script example, we have 244 non-whitespace characters.  Multiply that by 100,000 notices in a month, and I get a rate of 24,400,000 characters a month.  Apply the TTS cost and you get a final monthly cost of $488.

Final Conclusions

So let’s look at all of these options together:

Approach                 Characters / Msg.   Characters / Month   Monthly Cost   % Change
Basic                    225                 22,500,000           $450           0%
Full Customization       336                 33,600,000           $672           49.3%
Modified Customization   244                 24,400,000           $488           8.4%
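If you want to rerun this comparison with your own numbers, the arithmetic behind it is simple.  Here is a minimal Python sketch that takes the per-message character counts from this post as given, along with the $0.02 per thousand rate and 100,000 notices a month, and works out the monthly cost and the percentage change against the basic approach.  The function and variable names are my own.

```python
from decimal import Decimal

PRICE_PER_THOUSAND = Decimal("0.02")  # USD, the rate quoted in this post
NOTICES_PER_MONTH = 100_000           # the volume assumed in this post

# Non-whitespace characters per message, taken from the examples above.
approaches = {
    "Basic": 225,
    "Full customization": 336,
    "Modified customization": 244,
}


def compare(approaches, baseline="Basic"):
    """Return (name, chars per month, monthly cost, % change vs. baseline)."""
    base_cost = approaches[baseline] * NOTICES_PER_MONTH * PRICE_PER_THOUSAND / 1000
    rows = []
    for name, chars in approaches.items():
        cost = chars * NOTICES_PER_MONTH * PRICE_PER_THOUSAND / 1000
        change = (cost - base_cost) / base_cost * 100
        rows.append((name, chars * NOTICES_PER_MONTH, cost, round(change, 1)))
    return rows


for name, total_chars, cost, change in compare(approaches):
    print(f"{name:24}{total_chars:>12,}  ${cost:>6,.0f}  {change:>5}%")
```

Swap in your own character counts and monthly volume to see where your solution lands.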

Looking at things in this way helped us make a rational decision on what things really cost, and helped us look at ways we could maximize our impact and minimize our costs.

P.S.  For those of you who were patient enough to read through this entire article, you can save yourself even more by removing the <speak> and </speak> tags.  These are assumed by the Watson Text To Speech service, so you can omit them and save yourself 15 characters per message.  For the purposes of this example, that would reduce the cost of each of the above approaches by $30 a month.