Jazz Performance – A Guide to Better Performance

Note: I have had a tremendous response to this article, and the attached PDF.  I also received some great feedback which I am incorporating into the document.

  • 2/13/2013 – updated the document based on comments and feedback.
  • 2/14/2013 – updated the PDF document based on additional feedback.  Added content on reporting and weekly administrative tasks.

I have spent a lot of my time over the past year working on Jazz performance.  As I worked with different customers, I saw a lot of different approaches and a lot of different performance issues.  As the year rolled on, I tried to capture the things that I learned, and share them in my blog series on Jazz performance (see Jazz Performance Part 1 – Is My Network Causing Me Pain?).  I have also been capturing best practices and explanations about Jazz performance.  I have compiled all of this information, checked it with the experts, and put it all into one place.

Last year, some of my IBM friends also mentioned the concept of a performance guide to our customers.  They called this guide a “Purple Book”.  What you see attached here is an early version of that “Purple Book”.  Why am I releasing this right now, on my blog, instead of on the official IBM site or up on Jazz.net?  Because there is enough good information in here that it may be worthwhile for a lot of our customers, even though it still needs more information, more refinement and more review before it can be released on Jazz.net.  This information will become the foundation of a Deployment wiki site which should go live later this year, and I expect the information to be updated and expanded upon in that deployment wiki.  This document is a “point in time” view of Jazz performance.  I know that it will improve and change over time, as new tools and techniques for monitoring and improving Jazz performance are uncovered.  Please feel free to provide comments (and even content!) for this guide.

So what is this guide about?  It has sections on what impacts the performance of a Jazz solution, with some basic explanation of why performance is impacted.  It also has a guide on monitoring your Jazz implementation, and it ends with a section of troubleshooting advice and tips.  Here is a link to the Jazz Performance Guide; please give it a read and let me know what you think.

Jazz Performance Part 5 – Keep an http watch on your Jazz Server

I have finally reached the last chapter of my blog postings on Jazz Performance.  I ended up doing parts 7, 6 and 5 in reverse order, because those seemed to be the most relevant at the time.  So I guess that makes this post the least relevant of all!  It isn’t, but the area that this post covers tends to be a very focused one, and one that may not apply to many of our customers.  This post focuses on the performance of the web client in your Jazz environment, and how you can tell where time is being spent in the responses to user requests through the web browser clients.

One of the key points to understand here is that there are three distinct phases in any web client transaction with the Jazz application server.  First there is a request for data from the web browser to the Jazz application server.  This is followed by a response, or responses, from the Jazz application server, returning the data requested.  The final phase is the assimilation of the returned data by the browser, and the subsequent JavaScript processing to transform that data into meaningful information which can be displayed to the end user.
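The three phases above can be sketched in code.  This is a minimal illustration, not Jazz code: `fake_fetch` and `fake_parse` are invented stand-ins for the real network request and the browser’s JavaScript processing, so that the timing pattern can be seen without a live server.

```python
import json
import time

def profile_page_load(fetch, parse):
    """Time the phases of a web client transaction: the request/response
    round trip, then the client-side processing of the returned data."""
    t0 = time.perf_counter()
    raw = fetch()           # phases 1 and 2: send request, receive response data
    t1 = time.perf_counter()
    result = parse(raw)     # phase 3: turn raw data into something displayable
    t2 = time.perf_counter()
    return {"transfer_s": t1 - t0, "processing_s": t2 - t1, "result": result}

# Stand-ins for a real plan request (no network needed for this sketch):
def fake_fetch():
    time.sleep(0.05)        # pretend the server plus network took ~50 ms
    return json.dumps({"workItems": [{"id": i} for i in range(500)]})

def fake_parse(raw):
    return len(json.loads(raw)["workItems"])

timings = profile_page_load(fake_fetch, fake_parse)
print(f"transfer:   {timings['transfer_s'] * 1000:.1f} ms")
print(f"processing: {timings['processing_s'] * 1000:.1f} ms")
```

The same split (server/network time versus client processing time) is exactly what the browser tools discussed below make visible for real page loads.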

When we discuss web-based performance issues, the typical complaint is slow rendering of plans in the web client.  This is because when returning data for plans (as well as some other types of user requests), a large amount of data needs to be returned from the Jazz application server: data on each work item in the plan needs to be retrieved.  Once the data has been returned, the JavaScript engine needs to parse it, and finally the information is presented to the end user.  In this blog article, we will look at how you can determine how much time is spent in each phase, and what you can do to improve that performance.

Using Web Browser Monitors

The best way to determine how time is being spent is to use one of the commonly available web browser debugging and monitoring tools.  I will start with a tool called Http Watch, because it runs on Windows as an add-on/plugin for both Internet Explorer and Firefox.  Once you have it installed, go to the Tools menu of either browser to activate it.

[Screenshot: launching Http Watch in Firefox]
[Screenshot: launching Http Watch in Internet Explorer]

Once you launch Http Watch, you will see a window at the bottom of your browser.  This is where you will be able to see exactly what is going on in the browser.  Now go out to Jazz.net and log in with your user ID and password (because I know that you already have one).  Before you navigate to the plan, make sure to hit the Record button in the Http Watch window, which tells Http Watch to begin recording and timing your browser interactions.  Then navigate to the RTC Product Backlog Plan.  It might be easier to navigate to the plan first, start recording with Http Watch, and then do a page refresh.  Be careful though, because the browser cache may influence the actual load time in this case.  Either way, you should see a graph begin populating in the lower window, indicating what has happened during the display of this plan.  Wait until the plan is fully displayed and rendered, then press the Stop button in the Http Watch window.  Now you can go and look at the data.

Checking out The Data

[Screenshot: sample results from an Http Watch session]

As you can see in the diagram above, the results are shown in a graph, with the length of each line showing the relative length of time needed for each operation.  You can also see the time stamp of when the operation occurred, how long it took, the type of operation (GET, POST, etc.), the type of resource (CSS, JSON, JavaScript, images, etc.), and the address that the request went to or came from.  The initial series of GET operations pulls the typical header information and CSS for rendering the page, along with the initial JavaScript needed for the page.  The first POST command goes to “https://jazz.net/jazz/service/com.ibm.team.apt.internal.service.rest.IPlanRestService/getItems2”, which is the beginning of the retrieval of the information for the plan being displayed.  As you scroll down you can see the series of GETs and POSTs being made, and the amount of data being transferred with each of these.  Further down the trace you will see a large POST to “https://jazz.net/jazz/service/com.ibm.team.workitem.common.internal.rest.IWorkItemRestService/getWorkItems”, which has a large amount of data returned.  This is the retrieval of the contents of the work items in the plan.  After this point there will be some traffic, but most of the remaining time is spent processing and rendering the data coming back from the Jazz application server.
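If you would rather analyze a capture offline, traces like this can typically be exported in the standard HAR (HTTP Archive) format, which is just JSON.  Here is a small, hedged sketch of sifting a HAR export for the heavyweight transfers; the tiny hand-made `har` fragment below is an invented stand-in for a real export, though the `getWorkItems` URL is the real one seen in the trace above.

```python
import json

def summarize_har(har_text, top=3):
    """Summarize a HAR export: method, wall time, and response size per
    entry, sorted so the largest responses (usually the big work item
    retrieval POSTs) come first."""
    entries = json.loads(har_text)["log"]["entries"]
    rows = [
        (e["request"]["method"],
         e["time"],                                   # total ms for this request
         e["response"]["content"].get("size", 0),     # response body bytes
         e["request"]["url"])
        for e in entries
    ]
    return sorted(rows, key=lambda r: r[2], reverse=True)[:top]

# Hand-made fragment standing in for a real Http Watch export:
har = json.dumps({"log": {"entries": [
    {"request": {"method": "GET",
                 "url": "https://jazz.net/jazz/web/_style.css"},
     "time": 40, "response": {"content": {"size": 12000}}},
    {"request": {"method": "POST",
                 "url": "https://jazz.net/jazz/service/com.ibm.team.workitem.common.internal.rest.IWorkItemRestService/getWorkItems"},
     "time": 2200, "response": {"content": {"size": 850000}}},
]}})

for method, ms, size, url in summarize_har(har):
    print(f"{method:4} {ms:6.0f} ms {size:8} bytes  {url}")
```

Sorting by response size quickly confirms whether a slow plan load is dominated by one large data transfer or by many small requests.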

What Does It Mean?

Keep an eye on how long it takes for the correct data to be identified on the Jazz application server, how long it takes to return that data, and then how long it takes to render the data.  This will give you an idea of where the time in processing typical user requests is spent.  Is it waiting for data to come over the network?  Is it waiting for Jazz application server responses?  Is it in the JavaScript processing?  This information will help you determine where you should focus your efforts when you are debugging performance issues.

One of the things that I have used this for is to highlight the differences between the various browsers and browser versions that my customers have available to them.  In many instances, customers are limited in what they can install on their machines, and they may be on earlier versions of some browsers.  This allows them to compare browser performance in their environment.  Not only are you able to compare the overall time needed to display page contents, you can also see how much time is being spent processing and transferring the information, and contrast the differences between your browsers.  It is something that I strongly suggest you do; let your users know what you have seen.  End users want to know the differences in browser performance, and they also like to see benchmarks of what is considered “normal” performance in your environment.  If they see that a particular plan takes 10 seconds to render for you, then they know to expect similar performance when they use the tools.

Setting expectations and getting performance baselines is important.  Recording them somewhere visible to your users is even more important.  If you begin getting complaints about slower performance three months from today, you will have a set of tools and measurements so you can objectively determine if things have deteriorated over time, or if the end users just have rising expectations.  This objective data is critical when trying to address performance issues in your environment.  It allows you to weed out the users complaining about normal performance, and to quickly zero in on real problems.  It will then allow you to measure the impact of any changes that you make to your configuration.

Other Things

I have only discussed Http Watch in this blog.  Keep in mind that there are a lot of different tools out there that will do similar types of profiling.  I know that a lot of our developers like to use Firebug with Firefox.  Firebug is nice because it has debugging capabilities as well, which can sometimes come in handy.  In addition, there is a built-in capability in Firefox for doing a lot of this.  Go to Tools -> Web Developer -> Developer Toolbar in your Firefox browser.  This will launch some developer tools, like a debugger, that you can use to look into the JavaScript (if you care to).  In Internet Explorer, you can press the F12 key to bring up similar developer and debugging tools.

Other Articles in the Series

This series of articles is completed.  Here is a list of the topics covered:

Jazz Performance Part 6 – Obeying the Laws of Physics

There are a number of things that can impact the performance of your Jazz servers.  Some of these things you can address with better architecture.  Some can be addressed with better tuning of the web servers, JVMs, databases and Jazz applications.  Some things cannot be changed; they are simply bound by the laws of physics.  Electrons can only move so fast (“186,000 miles per second – Not just a good idea, it’s the law”), and there are distinct physical limitations that we need to be aware of in our environments.  What are some of these limitations that you need to be aware of?

Latency

The biggest physical limitation to the performance of any computing system is latency.  The amount of time it takes for your data to make the physical journey from your Jazz server to the end user machine is just one place where latency comes into play.  There is also the amount of time data needs to travel between the various servers that support your Jazz applications.  What are some of the more common causes of latency?

Since the Jazz applications often depend on LDAP for user authentication, the latency between the Jazz JTS application and the LDAP server can have an impact on end user performance.  That is why we always recommend deploying Jazz in a data center, since most corporate data centers have a local LDAP repository, thus minimizing the amount of latency between the JTS and any associated LDAP servers.

Likewise, the Jazz application servers, web servers, and backend database servers should be co-located if at all possible.  Some Jazz customers do have them in different locations, but I always insist that they be in the same physical location.  A 100ms ping time may not seem like much, but when a query needs to return data from hundreds of work items, and the round trip between the Jazz server and the database server ends up being 200ms for every operation, the time adds up.
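A quick back-of-the-envelope calculation shows how fast those round trips add up.  The numbers here are illustrative assumptions, not measurements from any particular deployment:

```python
# Cost of latency between the Jazz server and its database, per query.
round_trip_ms = 200       # assumed round trip per database operation
operations = 300          # e.g. a query touching hundreds of work items

latency_cost_s = round_trip_ms * operations / 1000.0
print(f"remote database: latency alone adds {latency_cost_s:.0f} seconds")

# Co-located servers on the same subnet might see ~1 ms round trips:
local_round_trip_ms = 1
local_cost_s = local_round_trip_ms * operations / 1000.0
print(f"co-located database: {local_cost_s:.1f} seconds")
```

With these assumed numbers, the same query pays a full minute of pure latency when the database is remote, versus a fraction of a second when it is co-located — before any actual query or transfer time is counted.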

You cannot control where end users will access your Jazz solution from, but you can control expectations.  Users that are overseas need to understand that their performance will not be as good as the performance for the people who are in a more local region.  There are some ways that you can address this, using SCM caching proxies and distributed Jazz SCM capabilities, but you always have to be aware of the issue.  Make sure that your end users have realistic expectations.

One of the lesser-known causes of latency, and one that most people ignore, is the presence of switches and routers in a network.  A physical database server may be only 30 feet away from the Jazz application servers, but if they are on different subnets, and require routing between multiple switches to communicate, then the net effect is the same as having the machines thousands of miles apart.  Because of this it is always important to have a feel for the typical round trip time for your data.  You can use network tools to determine this, or you can use the Jazz Performance Health Check widget.  Do NOT use ping as a measure of latency.  Ping sends small packets, and networks will handle ping packets differently from Jazz application packets.  Trust the results that you see in the performance health check widget, since it does not use ping, but sends actual packets of data between the web client and the Jazz application servers.
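The same idea — measure with realistically sized payloads rather than tiny ping packets — can be sketched with a plain TCP echo test.  This is a hedged illustration, not the health check widget’s actual mechanism: the demo runs against a loopback echo server so it is self-contained, but against a real deployment you would point it at a service next to your Jazz application server.

```python
import socket
import threading
import time

def _echo_server(listener):
    """Accept one connection and echo everything back to the sender."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            conn.sendall(data)

def timed_round_trip(host, port, payload_size):
    """Time one round trip of a realistically sized payload, rather than
    the small packets ping sends (which networks may treat differently)."""
    payload = b"x" * payload_size
    with socket.create_connection((host, port)) as s:
        t0 = time.perf_counter()
        s.sendall(payload)
        received = 0
        while received < payload_size:
            received += len(s.recv(65536))
        return time.perf_counter() - t0

# Demo: loopback echo server on an ephemeral port.
listener = socket.create_server(("127.0.0.1", 0))
host, port = listener.getsockname()
threading.Thread(target=_echo_server, args=(listener,), daemon=True).start()

rtt = timed_round_trip(host, port, 64 * 1024)   # 64 KB, closer to real traffic
print(f"round trip for 64 KB: {rtt * 1000:.2f} ms")
```

Comparing this number between a co-located machine and one several router hops away makes the switching and routing overhead visible in a way ping never will.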

Jazz applications are intended to perform in wide area networks, but anything that you can do to further minimize latency will result in improved end user performance.

Throughput

The second big physical factor that will limit your Jazz performance is throughput.  This is the amount of data that can be transferred between the components of your solution architecture in a given period of time.  The most common place where people observe throughput as a performance bottleneck is the web server layer.  A web server can only service a certain number of requests in a given period of time.  This is a limitation of the Jazz solution that is imposed by the web server technology that you choose to deploy with (either Tomcat or WebSphere).

A more common limitation is the network.  Your network can only supply a certain volume of data in a given time period, regardless of the number of requests waiting to go onto the network.  Network bandwidth is not typically an issue, but it can become one in a couple of different situations.  In the case of large builds, teams will often first create a workspace in which to execute the build.  In cases where the code base is large (like a full Android build), all of the code needs to be transferred from the repository to the local workspace.  Users need to understand that this is the equivalent of downloading the entire code base over the network.  For large code bases, this transfer of data would take tens of minutes (if not hours) even if done via FTP; it will not go any quicker with Jazz, since the files still need to be physically copied from the repository to the workspace area.  Coordinating large builds, and doing incremental builds, can help spread out and reduce some of the pain.  Having a build farm that is close to the Jazz application servers is another way to reduce the impact of builds on your network.
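A rough transfer-time estimate makes the cost of loading a build workspace concrete.  The code-base size, link speed, and efficiency factor below are assumptions for illustration; substitute your own numbers:

```python
def transfer_time_minutes(code_base_gb, link_mbps, efficiency=0.7):
    """Estimate minutes to copy a code base over a network link, assuming
    the link only achieves a fraction of its nominal rate (protocol
    overhead, sharing with other traffic)."""
    bits = code_base_gb * 8 * 1000**3              # decimal GB -> bits
    effective_bps = link_mbps * 1000**2 * efficiency
    return bits / effective_bps / 60

# A hypothetical 20 GB code base (a large build) over two link speeds:
print(f"100 Mbit/s link: {transfer_time_minutes(20, 100):.0f} min")
print(f"gigabit link:    {transfer_time_minutes(20, 1000):.0f} min")
```

Even on the faster link, a full workspace load is minutes of saturated network time, which is why incremental builds and co-located build farms pay off.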

The second area where I have seen network bandwidth come into play is in virtual environments.  In some virtual environments, the virtual machine appears to have a full 100BaseT or gigabit Ethernet connection to the network.  However, the virtual machine is actually sharing this connectivity with other virtual machines on the same physical hardware.  So if you have 4 Jazz application servers running in their own virtual machines on the same physical hardware, then the maximum throughput for ALL of those machines will be limited by the shared network bandwidth.  In this situation, a large build being done out of the repository of one of your Jazz SCM instances could potentially impact the performance of ALL of your Jazz applications, since the network bandwidth is being saturated by the files being transferred in support of the build.

Another area where throughput and bandwidth come into play is the disk I/O used with a Jazz solution.  The Jazz application servers are not heavy users of the disk storage on their systems, since data is stored in the repositories, which are located on the database servers.  Because Jazz deployments store a large number of artifacts, some of them quite large, the disk I/O on the database servers is critical.  The disk I/O controllers will limit how fast data can be pulled from the database, so it can be returned to the Jazz application server, and ultimately to the user.  Make sure that you are monitoring the speed of, and load on, the disk I/O controllers, so you can see when disk I/O throughput issues are limiting your performance.

Summary

Jazz performance has been a hot topic with many customers recently.  Everyone wants to know how to optimize their Jazz deployments.  What most people REALLY want is to balance the costs of the hardware needed to support a Jazz deployment against user expectations of acceptable performance.  Jazz administrators can tune and adjust their deployments to make them more efficient, but they need to realize that the basic architecture and the physical limitations of the hardware environment will also have an impact on overall performance.  Being aware of how these physical limitations affect your deployment will allow you to make better architectural choices for your specific environment.
