In part 1 of this series, Full Cycle Cognitive Development – Part 1 – Business Concepts, I talked about some of the simple organizational issues and concepts that are important to know when doing cognitive development. I am an ex-developer, so this second part is a bit closer to my heart. In this blog we’ll look at some of the basic “blocking and tackling” that developers need to do as part of a successful cognitive development effort.
Use Agile development methods
Most software development teams use some sort of Agile methodology today. It’s not perfect, and there are “shades” of Agile. Some teams are able to fully embrace Agile development methods, but others cannot due to business or regulatory constraints, or safety concerns. There are thousands of articles, papers and blog posts about Agile development – and everyone has slightly different approaches and opinions, so I won’t go into great depth here.
What I will say is that the development of cognitive solutions is often a very fluid process. IBM (as well as other vendors) often comes out with new cognitive capabilities which you will need to assess and evaluate. Don’t try to lay out a full-blown project plan without acknowledging that the changes in this area are going to be constant. New capabilities may emerge that can change the way your solution is constructed, or may even change the original scope and goal of your project.
Because of this constant evolution and change in the capabilities of cognitive services, it is essential that you embrace the idea of sprint planning, and the basic philosophy of Agile development. If you find yourself beginning to produce Gantt charts, detailed requirements documents, and calling out delivery dates 9 months in the future, you’re probably moving in the wrong direction. Focus on user stories, sprints, and the delivery of incremental value. Keep in mind that your business goals probably won’t change too much, but the technological path that you take to achieve these goals will definitely change over time.
It’s All About the Data (Science)
Another reason Agile is attractive for cognitive development efforts is that cognitive capabilities sometimes require training. Some capabilities come pre-trained, where the vendor has already spent the time and effort to train and model the cognitive service. Others will require you to train your service, teaching it how to react and respond.
These trained services will require data in order to be trained, and the accuracy of the trained service will need to be assessed over time. This training is a bit non-deterministic, so it could take 2 sprints, or it could take 10 sprints, to get your service adequately trained. James Ravenscroft has some excellent blog posts about training your cognitive service, and then testing it out, which I suggest you read. If you have the time, read some of the blog posts on Cognitive System Testing by Andrew Freed – he’s really good and he knows the subject area quite well. He has a great post on Reaching Peak Cognitive Performance that I consider a “must read”. Another good source on this is Marc Nehme, who has some good guidance on training the IBM Watson services. (Author’s note: I work for IBM on Watson, so most of my links/references will focus on the Watson technology. The basic concepts here should hold for most AI vendor technologies. If you know of other good sources, please reference them in the comments to this blog post.)
One common area of confusion is around the testing of cognitive services and trained systems. This is NOT traditional application testing. Traditional applications are deterministic: if you do “A”, they will respond with “B”. Cognitive systems (and AI systems in general) work with probabilities – and they are imperfect just like you and me. So testing cognitive systems requires a bit of a different mindset, because 100% accuracy isn’t realistic. If all of this sounds “just plain wrong” to you, then I strongly encourage you to read the testing links in the previous paragraph. Those folks explain it much better than I could.
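To make that mindset a little more concrete, here is a minimal sketch of what a "cognitive" test might look like. Instead of asserting one exact answer, it scores a classifier against a labelled test set and checks the result against an agreed accuracy threshold. Everything in it is an assumption for illustration: `toy_classifier` is a hypothetical stand-in for a call to your trained service, and the test set and the 0.75 threshold are made up.

```python
from typing import Callable, Iterable, Tuple


def accuracy(classify: Callable[[str], str],
             labelled: Iterable[Tuple[str, str]]) -> float:
    """Fraction of examples where the predicted label matches the expected one."""
    examples = list(labelled)
    if not examples:
        raise ValueError("need at least one labelled example")
    correct = sum(1 for text, expected in examples if classify(text) == expected)
    return correct / len(examples)


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in for a call to a trained cognitive service."""
    lowered = text.lower()
    return "billing" if "invoice" in lowered or "charge" in lowered else "support"


# Made-up labelled examples: (utterance, expected intent).
TEST_SET = [
    ("Where is my invoice?", "billing"),
    ("I was charged twice", "billing"),
    ("The app keeps crashing", "support"),
    ("How do I reset my password?", "support"),
    ("Why did my bill go up?", "billing"),  # the toy classifier misses this one
]

MIN_ACCURACY = 0.75  # assumed target; agree on the real number with your stakeholders

score = accuracy(toy_classifier, TEST_SET)
print(f"accuracy = {score:.2f}")
assert score >= MIN_ACCURACY, "accuracy fell below the agreed threshold"
```

The point is that one wrong answer does not fail the build. What you track from sprint to sprint is whether the score is holding steady or improving.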
All of this training and testing relies on one thing – data. It sounds simple, but there is a science to it – which is why we have data scientists and the discipline of data science. A cognitive system is only as good as its training, and its training is only as good as the data being used to train the system. My team has worked with hundreds of organizations looking to build cognitive applications, and my architects always seem to come back to a single point when assessing the prospects for some new effort – “How good is their data?”. Data is critical. Data is king.
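A data scientist will go much deeper than this, but a rough first pass at "How good is their data?" can be as simple as counting things. The sketch below assumes your training data is a list of (text, label) pairs, and it reports how many examples each label has, which texts appear more than once, and how many rows are missing a text or a label. The function name and the sample rows are hypothetical.

```python
from collections import Counter


def summarize_training_data(rows):
    """Return label counts, repeated texts, and the number of incomplete rows."""
    label_counts = Counter()
    seen = set()
    duplicates = []
    incomplete = 0
    for text, label in rows:
        # A row with no text or no label cannot be used for training.
        if not text or not text.strip() or not label:
            incomplete += 1
            continue
        key = text.strip().lower()  # treat case and padding differences as the same text
        if key in seen:
            duplicates.append(text)
        seen.add(key)
        label_counts[label] += 1
    return {
        "label_counts": dict(label_counts),
        "duplicates": duplicates,
        "incomplete": incomplete,
    }


# Made-up sample rows for illustration.
SAMPLE_ROWS = [
    ("Where is my invoice?", "billing"),
    ("where is my invoice?", "billing"),
    ("The app keeps crashing", "support"),
    ("", "support"),
    ("Cancel my account", None),
]

print(summarize_training_data(SAMPLE_ROWS))
```

Badly unbalanced labels, lots of repeats, or many incomplete rows are all early hints that you will need more (or better) data before training goes well.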
So how do you get your hands on the data, and once you do, what should you do with it? My advice is to get a good data scientist (or multiple data scientists) to help you. They know how to get information and intelligence from both open source data sets, as well as your own private data sets. You can also check out the IBM Data Science Experience, which has all kinds of resources that you can use. The site has articles, some sample data sets and notebooks, tutorials and more. It can be a great place for you to begin to understand how you can think about your data, and figure out how to use this knowledge to help drive a cognitive application.
The key to keep in mind here is that as you learn about the cognitive capabilities that you have at your disposal, and as you train these services, the feedback that you get can cause you to change your focus and schedule. Data sources may change, training may take longer than expected, additional data may have to be obtained, and existing data may need to be annotated. Stay disciplined in doing sprint demos and backlog grooming. These activities will allow you to show positive progress and keep you aligned with your stakeholders.
This is where a lot of Agile people get confused. Agile teams don’t HATE tools, they just don’t use tools unless doing so adds value. The Agile Manifesto states that, “We value individuals and interactions over processes and tools”. It doesn’t say, “We hate all tools”. When doing cognitive development, you will need to rely on tools to make your job easier (valuing the time and energy of the individual), and to show your stakeholders what you are doing (helping keep interactions based on reality).
So what kinds of tools are useful here? Let’s cover a few of the different areas.
There are hundreds of different Agile tools that you can use to help keep your cognitive development organized and focused. You can use things like the IBM Jazz products, GitHub/ZenHub, VersionOne, Atlassian and others. Some are open source, some are vendor provided. Developer communities can get into long arguments about which tools are better. Don’t waste a lot of time and effort doing tool evaluations, just choose the tools that your team will work the best with. The tools that you select should provide a clear mechanism for your development team to communicate status, ask for help, indicate issues and problems, and allow you the ability to quickly find and focus on problems. They should allow your team to do this without significant additional effort, and they should be something that is as unobtrusive as possible. In other words, they should value the individuals, and not the process.
Whatever tools you choose should have some transparency and the ability to show team progress, goals, and challenges on a DASHBOARD. If you’re a long-time reader of my blog, you know that I find dashboards to be helpful not only for communicating status and issues, but also for helping teams become transparent, which lets you spend less time arguing over whose version of reality is correct. The best Agile development teams that I have seen have been transparent, almost painfully so.
Cognitive and Data Tooling
Many of the different vendors of cognitive capabilities also provide some kind of tooling or scripts that help to make their services easier to use. These range from simple tools that can be used to break up training and test data, to some more sophisticated data annotation tools (like Watson Knowledge Studio) and analytics capabilities. Just like the Agile tools, some of these tools are open source and you can just grab the code from GitHub repositories, other things are vendor provided, and some cost money.
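As an example of the simple end of that range, here is a minimal sketch of a script that breaks a set of labelled examples into training and test sets. It is not any vendor's tool; the function name, the 80/20 split, and the fixed seed are my own assumptions. The fixed seed is there so that the split is repeatable from sprint to sprint.

```python
import random


def split_train_test(examples, test_fraction=0.2, seed=42):
    """Shuffle the examples reproducibly and return (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)  # copy, so the caller's data is left untouched
    if len(shuffled) < 2:
        raise ValueError("need at least two examples to split")
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one example, but never the whole set.
    test_size = min(len(shuffled) - 1, max(1, round(len(shuffled) * test_fraction)))
    return shuffled[test_size:], shuffled[:test_size]


examples = [f"utterance {i}" for i in range(10)]  # placeholder data
train, test = split_train_test(examples)
print(len(train), "training examples,", len(test), "test examples")
```

Keeping the test set separate from the training set matters, because a service that is only ever tested on the data it was trained with will look better than it really is.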
Often cognitive services will need data in a particular format, such as XML or JSON, before they can use the data. Save yourself some time and trouble and just do a simple Google search for simple data translation tools. More complex tools may require licensing or some cost. You should also spend some time with your data scientist (you did get one, right?).
Make sure that your data scientists have the tools that they need to be effective. It’s also a good idea to have your data scientists using the same tools – that way they can begin to reuse certain methods and techniques, and begin to build up a solid data science discipline in your organization. You wouldn’t want your developers all working with different IDEs, compilers, and languages on an ad-hoc basis, would you? It would be total chaos. The same thing holds true for your data scientists – be nice to them and they will reward you with insights and business intelligence that will amaze you.
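To give a feel for how small these translation tools can be, here is a sketch that turns CSV into JSON using only the Python standard library. The `text` and `label` column names and the output shape are assumptions for illustration; the real format depends on the service you are feeding, so check its documentation.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV with 'text' and 'label' columns into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = [
        {"text": row["text"].strip(), "label": row["label"].strip()}
        for row in reader
    ]
    return json.dumps(records, indent=2)


# Made-up sample input; in practice you would read this from a file.
SAMPLE_CSV = (
    "text,label\n"
    "Where is my invoice?,billing\n"
    "The app keeps crashing,support\n"
)

print(csv_to_json(SAMPLE_CSV))
```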
Some of the more common tools that data scientists use include things like Python (along with the Pandas, SciPy and NumPy modules), Jupyter notebooks, R, D3 (D3.js), Spark, and TensorFlow. Some of these are programming languages well suited to data manipulation and analysis, some are data visualization frameworks, and some are just data driven technologies.
For those of you who know me, this is an area near and dear to my heart. We’ve already discussed Agile tools, but what about other software tools? What else do you need to be aware of?
You need development tools, but before you go and get some, keep in mind that most cognitive technologies depend on (or are deployed on) cloud infrastructure. Keep in mind that the tools that you choose will need to work in a cloud environment. If you’re working with Watson, then you will need to provision and access the Watson services on the IBM Bluemix cloud. Do you want your application to live in this environment as well? Or are you going to host your application on your own infrastructure, and make calls out to the Watson services? Where is your data going to live? Do you have private data being used by the cognitive application? Are you using two types of data, with some data used for conversational and user context, and another set used for training your cognitive service?
With cloud environments, this can all get overwhelming pretty quickly. So the first thing I would suggest is a good drawing tool. Pick what you want, Visio, Mural, PowerPoint (don’t shoot me – some of you actually use it for this kind of thing), some architecture tool like Archi, or something else. You just need to have the ability to easily express in pictures what your cognitive application is going to look like from a technology perspective. A lot of people will want to know about specific pieces of your application (security, deployment, etc.), and you will want to communicate clearly with them.
Next you need to be able to actually write code and that means you’ll need some sort of IDE for your developers. There is a lot out there to choose from, and you probably already have something that you are using. If it’s not broke – then don’t fix it, and just let your developers use what they are comfortable with. Some of the more popular IDEs work for more than just one language, and that is probably a direction I would go in. You can use something like Eclipse, IntelliJ, Xcode, Atom, Komodo, or even the Orion IDE that is part of Jazz Hub (more on that in a little bit). Some of these are open source, some are vendor tools that you’ll need to pay for. Find what gives your developers the most value, and go with it. Just make sure that whatever you choose has some way to integrate information back to your dashboards (remember those from the section on Agile tools?).
You also need to consider Software Configuration Management (SCM) tools, so you can develop code without having developers stepping all over each other. There are a LOT of different SCM tools in the market, and if your company is normal you probably already use multiple different SCM tools. Developers can get “excited” about their favorite SCM tools – they love some and hate others. You can select from things like the Jazz tools, Git or GitHub, Subversion, Razor, CVS and others. Like all of the other tools I’ve mentioned, some of these are open source, some are vendor tools that you’ll need to pay for. Recently I have seen a lot of work being done by my customers on either GitHub or GitHub Enterprise (think of it as a “private” GitHub). If you’re using GitHub, then you might want to look at ZenHub for Agile tooling capabilities. My own team has been using GitHub Enterprise with ZenHub Enterprise, which gives us SCM and Agile tool capabilities (but no real dashboarding). I like it because it’s all handled out in the “cloud” somewhere, and it integrates with Jazz Hub for a cloud based, seamless, deployment to Bluemix. We still need to use another technology for our dashboards, but it allows us to be flexible, responsive, and get up and going on any project pretty quickly.
I did mention Jazz Hub in my list of potential IDEs. That is a qualified recommendation, and it’s a bit different. As far as an IDE is concerned, it’s a bit limited in its capabilities. It does have some really cool features though, which is why I mention it. It has some Agile tooling and dashboarding capabilities built into it, which you can access by hitting the “Track & Plan” button in your project. You also have an integrated GitHub repository for your SCM needs. Finally, it also has some built in DevOps tooling (more on that in my next post). In my work with it I have found it sufficient for small projects, and it really simplifies things having everything Cloud based (IDE, SCM, Agile tools, deployment to Bluemix, etc.). Give it a look (it’s free), and see if it works for some of your projects.
I can hear people already clamoring for sustainable software development, with DevOps tools and delivery pipelines. Don’t worry, I haven’t forgotten about them – the topic is too big to approach here. That is the subject of Part 3 of this series, Full Cycle Cognitive Development – Part 3 – Do It Again.
Now you should have most, if not all, of your supporting infrastructure in place. Now you need to get to work actually creating something useful. Less talk, more action!
Start by getting your epics and user stories collected. They won’t be perfect, and they will change over time, but you need to get started. Assemble your team, get them familiar with their working environment, and begin to do some sprint planning and prioritization of user stories. Let your teams get started and begin to find a working rhythm, and have the whole team monitor the progress of their efforts on your dashboards.
You are going to make mistakes – be honest about it and accept that fact. Your processes and tools will change as your team begins to add things to the process to ensure quality, and as they remove things from the process that add little or no value. Your cognitive team is not only doing software development, they are doing cognitive system training, and those two things are quite different. You need to be flexible and understand that what works for other organizations might work for your organization, or it might not. Don’t believe everything that you read (even this blog!), believe what you SEE and what you MEASURE in your own teams.
Cognitive development is new for most organizations. It is NOT your typical software development effort, although it shares many of the same concepts and metrics. Keep in mind that cognitive systems are trained, not programmed. If you’re doing cognitive development, people are going to be watching you – it’s just the way that things are. So spare yourself and your team the agony of endless status meetings by being transparent, and show your progress and struggles on a dashboard. Embrace Agile development concepts, and don’t be afraid to replay, reprioritize, and adjust as time (and sprints) moves on. Being transparent will give your management team some comfort with all of the change that cognitive development will bring, and will allow you to quickly introduce new methodology, metrics and concepts.
In part 3 of this series I will explore how you make cognitive development repeatable and part of your development culture, and the important role of DevOps concepts in making this a reality.