Jazz Performance Part 4 – Why WAIT, see what your Jazz server is doing

In the first two editions of this series, I looked at how you could do some simple network analysis and some simple monitoring of your Jazz servers.  In the third edition I looked at nmon, which shows you what the server is doing from a system perspective.  In this blog post, we get even deeper into seeing what is happening on your Jazz server.  One of the nice little tools that I have encountered in my work with Jazz deployments is called WAIT.  This tool works for ALL environments.

What is WAIT?

WAIT (Whole system Analysis of Idle Time) is a great little tool that analyzes javacores to tell you what is going on in your system.  It takes a set of javacore files as input and analyzes the information within them.  It is very handy to use, and it can be quite informative.

Setting up WAIT

First you need to find WAIT, so go to the WAIT homepage.  Click the New User button on that page and register with the WAIT tool (it will be worth it).

WAIT New User Dialog

Now that you are registered, you’re ready to move on.  You will see an odd screen with text informing you that your registration was a success.  Just hit the green Use Wait button, and then acknowledge the license agreement.

Get Ready to Use WAIT

Now we get to the real user interface for the WAIT tool.  You will see a series of tabs across the top of the screen, for Submit Data, Report Gallery, Example Input, and Data Collection Scripts.  You might want to start out by checking out the demo, so you can see what WAIT will do for you.  Go ahead, it will give you some more background on WAIT.

The first thing we should do now is to check out the Data Collection Scripts page.  Click on the menu entry for Data Collection Scripts, and then take a look at the page.  Download the appropriate data collection utility for the platform that you will be working on.  These data collection scripts are really quite simple.  They just force the generation of a series of javacores, and then package these up for the WAIT tool to process.

ONE WORD OF CAUTION: The generation of javacores will stop the JVM briefly while the javacore is generated.  This can make your Jazz application “pause”, so be prepared.  In addition, these javacores can take up disk space.  I have had customers who have started generating javacores, and have forgotten to STOP generating them.  The system eventually crashes when the space on the disk gets completely filled with javacores.
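One way to avoid that runaway-javacore scenario is to check the javacore directory after each collection and prune old files.  The sketch below is a hypothetical helper, not part of the WAIT tools; it assumes GNU find and df, and IBM-style javacore names of the form javacore.<date>.<time>.<pid>.<seq>.txt.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of WAIT): keep an eye on javacore disk usage.
# Assumes GNU find/df and IBM-style names like javacore.<date>.<time>.<pid>.<seq>.txt

# Print how many javacores are in a directory and how full its filesystem is.
javacore_report() {
  local dir="$1" count used
  count=$(find "$dir" -maxdepth 1 -type f -name 'javacore*.txt' | wc -l)
  used=$(df -P "$dir" | awk 'NR == 2 { print $5 }')   # capacity column, e.g. "42%"
  echo "$count javacores in $dir; filesystem $used full"
}

# Delete javacores modified more than N days ago (find -mtime +N semantics,
# default 7) and print the path of each file removed.
prune_javacores() {
  local dir="$1" days="${2:-7}"
  find "$dir" -maxdepth 1 -type f -name 'javacore*.txt' -mtime +"$days" -print -delete
}
```

For example, javacore_report /home/dtoczala/utils/WAIT_tools/cores after a run, and prune_javacores with the same directory once the data has been submitted.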

Now let's try launching that data collection script.  I run on Ubuntu Linux (I know, it's not officially supported for Jazz, but it works).  On my machine, I just put the script in a utils directory under my home directory.  Then I brought up the system monitor, found the java process running my RTC server, and took note of its process ID (PID).

The process ID (PID) of my Jazz service
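If you would rather not hunt through the system monitor, the PID can also be pulled from ps.  This is a sketch under assumptions: the helper name is hypothetical, and the marker string (JazzTeamServer below) stands in for whatever text is unique to your own Jazz server's command line, such as its install path.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print PIDs of java processes whose command line contains
# a marker string. Reads "PID ARGS..." lines on stdin, as printed by:
#   ps -eo pid=,args=
find_java_pids() {
  local marker="$1"
  # Field 2 is the executable; keep rows where it is "java" or ends in "/java".
  awk -v m="$marker" '
    ($2 == "java" || $2 ~ /\/java$/) && index($0, m) > 0 { print $1 }'
}

# Usage (the marker is an assumption; use something unique to your install):
#   ps -eo pid=,args= | find_java_pids JazzTeamServer
```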

So now that I know my Jazz PID, I can kick off a collection of data.  I used this command line to get what I wanted:

./waitDataCollector.sh --iters 10 --javacoreDir /home/dtoczala/utils/WAIT_tools/cores --outputDir /home/dtoczala/utils/WAIT_tools/logs 19748

So let’s look at this command:

  • ./waitDataCollector.sh invokes my script
  • The --iters 10 option tells the tool to collect 10 javacores (at the 30 second default interval)
  • The --javacoreDir option gives the path to the javacore files
  • The --outputDir option gives the path where the resulting zip file of collected data is written
  • 19748 is the PID of the java process running my Jazz server

Just execute the script without arguments to see a list of the arguments available to you.  After about 5 minutes (10 javacores at 30 second intervals), the resulting zip file shows up in the logs directory.
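Two small hypothetical helpers for this step: one gives a rough estimate of how long a collection will take (iterations times interval, using the 30 second default mentioned above), and one finds the newest zip file in the output directory so you know which file to upload.  Both are sketches and assume GNU coreutils.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a WAIT data collection run.

# Rough wall-clock estimate in seconds: iterations x interval (default 30s).
collection_seconds() {
  local iters="$1" interval="${2:-30}"
  echo $(( iters * interval ))
}

# Print the most recently modified zip file in a directory (empty if none).
latest_wait_zip() {
  ls -1t "$1"/*.zip 2>/dev/null | head -n 1
}
```

For the run above, collection_seconds 10 gives 300 (5 minutes), and latest_wait_zip /home/dtoczala/utils/WAIT_tools/logs names the file to submit.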

Now that I have something, what do I do with it?

So now that I have the zip file with my data, I can go back to the WAIT website.  Click on the Submit Data tab to bring up that page, select the file to be analyzed, enter a description, and then click Submit for Analysis.

Submit screen for WAIT tool

So now the WAIT tool will upload my zip file, and then process my javacores.  This can take a little while, depending on the speed of your network, so now is a good time to go and grab a beverage.

Checking the Results

Once the tool is done you will see some colorful output, like the boring sample that I am showing below.

Boring example of WAIT output

The top section has time running from left to right, and CPU utilization broken down over that time period.  Using this graph you can easily determine if other processes on your Jazz server are running out of control, and just how hard your Jazz instance(s) are straining your CPU capacity.

The next section shows the runnable threads.  This section, and the sections that follow, allow you to see what the threads of your Jazz server are doing.  Click on one of these colored sections, and look at the Stack Viewer section in the lower right.  You will see the call stacks associated with that particular activity.  The rest of the report works the same way, letting you drill down to get more information on particular threads and processes.

Using this and the Pop Out button in the Stack Viewer section, you can see exactly what is happening on your Jazz server at various points in time.  You will want to look for threads and services that are taking up the most resources, and determine which operations are causing those to run.  This will help you determine what is impacting the performance of your Jazz server.  This is just another way to see what is going on "under the hood" of your Jazz application server.  One more note: don't neglect the memory usage section.  Make sure to take a look at that too.
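For a quick sanity check on the raw data, you can also tally thread states across the javacores yourself.  This sketch assumes IBM J9-style javacores, where each Java thread has a 3XMTHREADINFO line containing a field such as state:R (runnable), state:CW (condition wait), state:P (parked) or state:B (blocked).  The helper name is hypothetical, and it is a rough cross-check rather than a replacement for WAIT's analysis.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count thread states across all javacore files given
# (or stdin). Assumes J9-style lines such as:
#   3XMTHREADINFO      "main" J9VMThread:0x..., ..., state:R, prio=5
# The trailing space in the pattern skips 3XMTHREADINFO1/2/3 detail lines.
thread_states() {
  awk '/^3XMTHREADINFO / && match($0, /state:[A-Z]+/) {
         count[substr($0, RSTART + 6, RLENGTH - 6)]++   # text after "state:"
       }
       END { for (s in count) print s, count[s] }' "$@" | sort
}
```

Running thread_states on the javacores in the cores directory prints one "STATE COUNT" line per state; a large share of B (blocked) threads is worth a closer look in the WAIT report.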

What Next?

This series of articles is continued.  Here is a list of the topics covered:

Jazz Performance Part 3 – What does nmon have to do with my Jazz Server?

In the first two editions of this series, I looked at how you could do some simple network analysis and some simple monitoring of your Jazz servers.  In this blog post, we get a little bit deeper into seeing what is happening on your Jazz server.  One of the nice little tools that I have encountered in my work with Jazz deployments is called nmon.  This tool works on AIX and Linux environments.

Author’s Note: I run Jazz in an unsupported Ubuntu Linux environment, and nmon works fine on my system as well.  All of the screenshots in this article are from my own Ubuntu system.

What does nmon do and how do I get it?

nmon is a great tool that allows you to see how your system is performing.  On Windows you have Performance Monitor (perfmon), and on Solaris you have perfmeter, which do essentially the same thing.  A lot of my customers tend to run their Jazz servers on Linux or AIX, so nmon becomes my friend in these instances.

You can learn about nmon from a variety of locations, and even check out the nmon Introductory Workshop.  There are a variety of download areas for this as well, but I trust the download area on SourceForge.  It is now open source, so you can even dig in and help improve the tool if you want.

You will also want to download the nmonanalyser spreadsheet.  I know, it’s in Windows format, but I am not too proud to use Windows apps where it makes sense.  I am sure that you can make it work for your particular situation.  We’ll go over using this later in this article.

I’ve got it! Now what do I do?

Once you have nmon installed, it is usually helpful to run through some of the interactive screens.  Start up nmon (just type nmon), and you will see a menu screen.

nmon Interactive Menu

Take some time and check out the displays and data for some of the options that you see here.  The things that we will be primarily interested in when looking at our Jazz server will be CPU, memory, disks, network, and top-processes.  Bring up each of those displays and get familiar with the data that they are showing you.  Now that you have this all under control, you are probably wondering exactly how you can use this to tell you ANYTHING.  It's showing you things in real time, but in order to debug performance issues you need data over a period of time.  You want to see how the system resources perform before, during, and after any performance issues that you might be having.

Using nmon from the command line

The nice thing about nmon is that you can run it from the command line, have it run for a period of time and collect statistics about your system, and then have all of that data available for you to analyze.  Just type "nmon -h" to see the help screen for the nmon utility, and you will see all of the different switches that you can use.

I like using the following command to launch nmon:

nmon -f -s 300 -c 288 -t

This will launch nmon to write its output to a file in spreadsheet format (-f), taking a snapshot every 5 minutes (-s 300) for a day (-c 288), while collecting information on the top processes (-t).  You would need to relaunch this every day.  This is good if you are trying to track down a current issue.

If you are just monitoring things over the long term, then I would look at using the following command, which takes a snapshot every 30 minutes (-s 1800) for one week (-c 336):

nmon -f -s 1800 -c 336 -t
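The -c value is simply the run length divided by the snapshot interval: 288 snapshots of 300 seconds is 24 hours, and 336 snapshots of 1800 seconds is 168 hours, or one week.  A small hypothetical helper makes it easy to work out the count for other schedules:

```bash
#!/usr/bin/env bash
# Hypothetical helper: compute the nmon -c (snapshot count) for a run length.
# count = (run length in seconds) / (snapshot interval passed to -s)
nmon_count() {
  local hours="$1" interval="$2"
  local total=$(( hours * 3600 ))
  if (( total % interval != 0 )); then
    echo "warning: ${hours}h is not a whole number of ${interval}s snapshots" >&2
  fi
  echo $(( total / interval ))
}

nmon_count 24 300     # one day at 5-minute snapshots   -> 288
nmon_count 168 1800   # one week at 30-minute snapshots -> 336
```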

Now keep in mind that the directory that you launch nmon from is where the results files will be written.  Make sure that you launch nmon in a spot where you will be able to easily organize the results (which come out named “<hostname>_yymmdd_hhmm.nmon”).
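As a sketch of one way to keep those results organized, this hypothetical helper files each result into a <dest>/<hostname>/<yymm>/ directory by parsing the <hostname>_yymmdd_hhmm.nmon name.  It assumes that naming pattern and should only be run on files from completed nmon runs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move nmon results into <dest>/<hostname>/<yymm>/,
# parsed from the "<hostname>_yymmdd_hhmm.nmon" file name.
# Only run it on files from nmon runs that have finished.
organize_nmon() {
  local dest="$1"; shift
  local f base host rest yymmdd
  for f in "$@"; do
    base="${f##*/}"             # strip any leading directory
    base="${base%.nmon}"        # e.g. jazzsrv_130415_0900
    host="${base%_*_*}"         # drop the trailing _yymmdd_hhmm
    rest="${base#"${host}"_}"   # yymmdd_hhmm
    yymmdd="${rest%%_*}"
    mkdir -p "$dest/$host/${yymmdd:0:4}"
    mv -- "$f" "$dest/$host/${yymmdd:0:4}/"
  done
}

# Usage, from the directory nmon writes into:
#   organize_nmon "$HOME/nmon-archive" ./*.nmon
```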

Launch this daily/weekly with a cron job (if on Linux), or a scheduled batch file (on Windows).  Use daily dumps for looking at current issues, and use weekly (or even monthly) collections for the ongoing monitoring of Jazz server performance.
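On Linux, the cron entries might look like the lines this hypothetical helper prints.  The results directories and the nmon path (/usr/bin/nmon) are assumptions: cron runs with a minimal PATH, so use the full path that command -v nmon reports on your system, and make sure the results directories exist first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a crontab line that starts nmon in a results
# directory. Assumes paths without spaces; pass nmon's full path for cron.
nmon_cron_line() {
  local schedule="$1" dir="$2" nmon_bin="$3" interval="$4" count="$5"
  printf '%s cd %s && %s -f -s %s -c %s -t\n' \
    "$schedule" "$dir" "$nmon_bin" "$interval" "$count"
}

# Daily capture for a current issue: midnight, 5-minute snapshots for 24 hours.
nmon_cron_line '0 0 * * *' /var/nmon/daily /usr/bin/nmon 300 288
# Weekly capture for long-term monitoring: Sunday midnight, 30-minute snapshots for 7 days.
nmon_cron_line '0 0 * * 0' /var/nmon/weekly /usr/bin/nmon 1800 336

# To append an entry to your crontab:
#   (crontab -l 2>/dev/null; nmon_cron_line '0 0 * * *' /var/nmon/daily /usr/bin/nmon 300 288) | crontab -
```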

What Do I Look For?

For a Jazz based server, I like to monitor the following areas, for the following reasons:

  • CPU – I like to know what the general load on the CPU is, but I am also interested in spikes at different times.  Seeing consistent spikes at the same time every day can indicate that something unique is occurring each day at that time.  For example, it could be the data warehouse ETL jobs, it could be some automated reports, or it could be the beginning of a workday for a particular site.
  • Memory – I like to make sure that we’re not running out of memory.
  • Disk I/O – I like to see how much we are reading from and writing to the local disk.  This is less important if you do not have a local Lucene index being maintained.
  • Network I/O – I like to monitor this to see if we are reaching the limits of what our network pipe can handle.  It also allows me to identify periods of high traffic.

One key thing to keep in mind – this information is not going to tell you where your problems are all by itself.  You need to correlate this data with times where you see performance degradation, and with expected user activities during those times.  Typical areas that I like to look into include:

  • Spikes in network I/O – often indicates a large amount of data which can impact performance, and which will often point us towards poor/slow network connections.
  • Spikes in disk I/O – may indicate issues with local storage.
  • Spikes in memory consumption – this will help you determine if you need to look more closely at JVM heap sizes and other memory parameters.
  • Saturation of CPU – if you see that the CPU is consistently running above 50%, then it might be time to think about setting up another Jazz server instance.
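To pick out the periods when the CPU is running hot, you can also scan the raw nmon file directly.  This sketch assumes the usual comma-separated layout, where each snapshot starts with a ZZZZ,Tnnnn,hh:mm:ss,dd-MON-yyyy line and CPU totals appear on CPU_ALL,Tnnnn,User%,Sys%,... lines; column sets vary between nmon versions, so check the header lines in your own file.  It treats User% + Sys% as busy CPU, and the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list snapshots where User% + Sys% exceeds a threshold.
# Usage: cpu_over THRESHOLD [file.nmon ...]   (reads stdin if no file given)
# Assumes ZZZZ,Tnnnn,hh:mm:ss,date lines and CPU_ALL,Tnnnn,User%,Sys%,... lines.
cpu_over() {
  local threshold="$1"; shift
  awk -F, -v t="$threshold" '
    $1 == "ZZZZ" { when[$2] = $4 " " $3; next }   # snapshot id -> date and time
    $1 == "CPU_ALL" && $2 ~ /^T[0-9]+$/ {
      busy = $3 + $4                               # User% + Sys%
      if (busy > t) printf "%s %s %.1f\n", $2, when[$2], busy
    }' "$@"
}
```

Running cpu_over 50 on a week of data and sorting the output by time of day is a quick way to spot the same-time-every-day spikes described above.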

Using nmon with the nmonanalyser

So now that you have these nmon data files being generated, how do you take a good look at them?  This is where you will want to check out the nmonanalyser spreadsheet.  The spreadsheet uses macros (so you need to enable macros) to read in those nmon data files, and will dump the data into a large number of spreadsheets, each on a separate tab.  Each spreadsheet covers a different aspect of system performance, and most of them have a built-in graph which shows you what your system is doing.

Sample of CPU graph from nmon

As you can see in the example above, the graphs provide an easy way to visualize the various facets of system performance over a period of time.  To use the nmonanalyser spreadsheet, I grabbed my download and made a copy of it.  I then renamed the copy to something meaningful, so I always have a backup copy.  When you open the spreadsheet, you will see the initial startup screen.

nmonanalyser startup screen

First you need to make sure that you ENABLE macros for this spreadsheet.  Then you will want to press the button to analyze your nmon data (which you collected earlier with that nmon -f -s 1800 -c 336 -t command).  When you hit the button, a file selection dialog will pop up, and you will select your nmon output file.  Once selected, you will see the macro begin to process your input file.  Once it has completed, it will prompt you for an output file.  Just select a good name and location for the resulting spreadsheet.  Exit the analyzer, then sit back and enjoy your results.

Flip through the tabs on the bottom of the spreadsheet, and you will see graphs for all sorts of different system parameters.  There are 22 different tabs on the spreadsheet, most of which have graphs to provide a nice visual representation of the data that was collected.
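If Excel is not an option, or you want one aspect of the data in another tool, individual sections of the nmon file can be pulled out as plain CSV.  This hypothetical helper assumes the usual nmon file layout: ZZZZ,Tnnnn,hh:mm:ss,dd-MON-yyyy lines mark each snapshot, and a section such as CPU_ALL, MEM, NET or DISKBUSY has one header line followed by data lines keyed by Tnnnn in the second field.  Sections laid out differently (TOP, for example) are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract one nmon section as CSV with a timestamp column.
# Usage: nmon_section_csv TAG [file.nmon ...] > out.csv   (e.g. TAG=MEM)
# Assumes ZZZZ,Tnnnn,time,date snapshot lines and TAG,Tnnnn,... data lines.
nmon_section_csv() {
  local tag="$1"; shift
  awk -F, -v OFS=, -v tag="$tag" '
    # Print a first column followed by fields 3..NF of the current line.
    function emit(first,    i, line) {
      line = first
      for (i = 3; i <= NF; i++) line = line OFS $i
      print line
    }
    $1 == "ZZZZ" { when[$2] = $4 " " $3; next }   # snapshot id -> timestamp
    $1 != tag { next }                            # skip other sections
    $2 ~ /^T[0-9]+$/ { emit(when[$2]); next }     # data row
    { emit("time") }                              # header row
  ' "$@"
}
```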

When looking at Jazz servers with nmon, it is important to keep in mind that application server performance is not the only thing that will impact the performance of your Jazz solution.  Jazz uses multiple applications (most notably the JTS, CCM, QM, and RM) all working in concert to provide a software development experience with deep integration capabilities.  It also relies on a database server for storing the repository contents, and the network is a factor in performance as well.  What nmon gives you is a way to either rule out the application server as the source of your performance problems, or dig into it more deeply, in the search for better Jazz performance.


OSLC Community Survey

OSLC (Open Services for Lifecycle Collaboration) is an open integration framework, allowing software development tools from a variety of vendors to seamlessly operate and integrate, sharing data resources, and providing information to each other.  I follow the OSLC community because it is key to the most robust integrations that involve the Jazz based tools.

I just saw an announcement for the OSLC Community Survey, and thought that I should pass it along.  Submit your feedback and help the OSLC Community grow and strengthen.