Introduction
For a while now I have been actively maintaining the sosreport Debian package. I am also helping to make it available on Ubuntu.
I have also had multiple requests to make sosreport easier to use in a Juju environment. I have finally been able to write a charm for sosreport, which makes it simpler to use with Juju.
Theory of operation
As you may already know, sosreport is a tool that collects information about your running environment. In the context of a Juju deployment, what we are after is the running environment of the units providing the services. For the sosreport charm to be useful, it therefore needs to be deployed to the machine that hosts an existing unit.
The charm has two actions :
- collect : Generate the sosreport tarball
- cleanup : Clean up existing tarballs
You would use the collect action to create the sosreport tarball of the unit where it is being run and cleanup to remove those tarballs once you are done.
Each action has optional parameters attached to it :
homedir | Home directory where sosreport files will be copied to (common to both collect & cleanup actions) |
options | Command line options to be passed to sosreport (collect only) |
minfree | Minimum of free diskspace to run sosreport expressed in percent, Megabytes or Gigabytes. Valid suffixes are % M or G (collect only) |
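To make the accepted minfree formats concrete, here is a minimal shell sketch of how such a value could be checked before it is passed to the collect action. The parse_minfree helper is hypothetical and is not part of the charm; it assumes whole numbers only, followed by one of the three suffixes listed above.

```shell
# Hypothetical helper (not part of the charm): classify a minfree value.
# Prints "<kind> <number>" and returns 0 when the value is a whole number
# followed by %, M or G; returns 1 otherwise.
parse_minfree() {
    value=$1
    number=${value%?}            # everything except the last character
    suffix=${value#"$number"}    # the last character

    # Reject an empty or non-numeric number part (assumption: integers only).
    case $number in
        ''|*[!0-9]*) return 1 ;;
    esac

    case $suffix in
        %) echo "percent $number" ;;
        M) echo "megabytes $number" ;;
        G) echo "gigabytes $number" ;;
        *) return 1 ;;
    esac
}
```

A value that passes this check can then be given to the action as a key=value argument, for example minfree=5G.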
Practical example using Juju 2.0
Suppose that you are encountering problems with the mysql service being used by your MediaWiki service (yes, I know, yet one more MediaWiki example). You would have an environment similar to the following (Juju 2.0) :
$ juju status
Model    Controller         Cloud/Region         Version
default  MyLocalController  localhost/localhost  2.0.0

App        Version  Status   Scale  Charm      Store       Rev  OS      Notes
mediawiki           unknown      1  mediawiki  jujucharms    5  ubuntu
mysql               error        1  mysql      jujucharms   55  ubuntu

Unit          Workload  Agent  Machine  Public address  Ports  Message
mediawiki/0*  unknown   idle   1        10.0.4.48
mysql/0*      error     idle   2        10.0.4.140             hook failed: "start"

Machine  State    DNS         Inst id        Series  AZ
1        started  10.0.4.48   juju-53ced1-1  trusty
2        started  10.0.4.140  juju-53ced1-2  trusty

Relation  Provides  Consumes  Type
cluster   mysql     mysql     peer
Here the mysql start hook failed for some reason that we want to investigate. One solution is to ssh to the unit and try to find out. You may also be asked by a support representative to provide the data for remote analysis. This is where sosreport becomes useful.
Deploy the sosreport charm
The sosreport charm will collect the information from the machine where the mysql service runs. In our example, the mysql/0 unit runs on machine 2, so this is where the sosreport charm needs to be deployed. We would do :
$ juju deploy cs:~sosreport-charmers/sosreport --to=2
Once the charm is done deploying, you will have the following juju status :
$ juju status
Model    Controller         Cloud/Region         Version
default  MyLocalController  localhost/localhost  2.0.0

App        Version  Status   Scale  Charm      Store       Rev  OS      Notes
mediawiki           unknown      1  mediawiki  jujucharms    5  ubuntu
mysql               error        1  mysql      jujucharms   55  ubuntu
sosreport           active       1  sosreport  jujucharms    1  ubuntu

Unit          Workload  Agent  Machine  Public address  Ports  Message
mediawiki/0*  unknown   idle   1        10.0.4.48
mysql/0*      error     idle   2        10.0.4.140             hook failed: "start"
sosreport/1*  active    idle   2        10.0.4.140             sosreport is installed

Machine  State    DNS         Inst id        Series  AZ
1        started  10.0.4.48   juju-53ced1-1  trusty
2        started  10.0.4.140  juju-53ced1-2  trusty

Relation  Provides  Consumes  Type
cluster   mysql     mysql     peer
Collect the sosreport information
In order to collect the sosreport tarball, you will issue an action to the sosreport service, telling it to collect the data :
$ juju run-action sosreport/1 collect
Action queued with id: 95d405b3-9b78-468b-840f-d24df5751351
To check the progress of the action you can use the show-action-status command :
$ juju show-action-status 95d405b3-9b78-468b-840f-d24df5751351
actions:
- id: 95d405b3-9b78-468b-840f-d24df5751351
  status: running
  unit: sosreport/1
After completion, the action will show as completed :
$ juju show-action-status 95d405b3-9b78-468b-840f-d24df5751351
actions:
- id: 95d405b3-9b78-468b-840f-d24df5751351
  status: completed
  unit: sosreport/1
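Rather than re-running the status command by hand, the check can be wrapped in a small polling loop. The sketch below is only an illustration: the action_status and wait_for_action helpers are hypothetical names, the 5-second interval is arbitrary, and the parsing assumes the YAML layout shown in the output above.

```shell
# Extract the value of the first "status:" key from show-action-status
# output read on stdin.
action_status() {
    awk '$1 == "status:" { print $2; exit }'
}

# Poll an action until it is no longer pending or running, then print
# its final status. Assumes the Juju 2.0 client is available on PATH.
wait_for_action() {
    id=$1
    while :; do
        status=$(juju show-action-status "$id" | action_status)
        case $status in
            pending|running) sleep 5 ;;
            *) echo "$status"; return ;;
        esac
    done
}

# Usage (with the id returned by run-action):
#   wait_for_action 95d405b3-9b78-468b-840f-d24df5751351
```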
Using the show-action-output command, you can see the result of the collect action :
$ juju show-action-output 95d405b3-9b78-468b-840f-d24df5751351
results:
  outcome: success
  result-map:
    message: sosreport-juju-53ced1-2-20161221163645.tar.xz and sosreport-juju-53ced1-2-20161221163645.tar.xz.md5 available in /home/ubuntu
status: completed
timing:
  completed: 2016-12-21 16:37:06 +0000 UTC
  enqueued: 2016-12-21 16:36:40 +0000 UTC
  started: 2016-12-21 16:36:45 +0000 UTC
If we look at the mysql/0 unit $HOME directory, we will see that the tarball is indeed present :
$ juju ssh mysql/0 "ls -l"
total 26149
-rw------- 1 root root 26687372 Dec 21 16:36 sosreport-juju-53ced1-2-20161221163645.tar.xz
-rw-r--r-- 1 root root       33 Dec 21 16:37 sosreport-juju-53ced1-2-20161221163645.tar.xz.md5
Connection to 10.0.4.140 closed.
One thing to be aware of is that, as with any environment using sosreport, the owner of the tarball and md5 file is root. This is to protect access to the unit’s configuration data contained in the tarball. In order to copy the files from the mysql/0 unit, you would first need to change their ownership :
$ juju ssh mysql/0 "sudo chown ubuntu:ubuntu sos*"
Connection to 10.0.4.140 closed.
$ juju ssh mysql/0 "ls -l"
total 26149
-rw------- 1 ubuntu ubuntu 26687372 Dec 21 16:36 sosreport-juju-53ced1-2-20161221163645.tar.xz
-rw-r--r-- 1 ubuntu ubuntu       33 Dec 21 16:37 sosreport-juju-53ced1-2-20161221163645.tar.xz.md5
Connection to 10.0.4.140 closed.
The files can be copied off the unit by using juju scp.
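Once both files are on your workstation, it is worth checking the tarball against its .md5 companion before sending it anywhere. The verify_sos_md5 helper below is a hypothetical sketch, not something shipped with the charm. It assumes the .md5 file starts with the hex digest (its 33-byte size in the listing above suggests a bare digest plus a newline) and that md5sum is installed locally.

```shell
# Hypothetical helper: compare a tarball's MD5 digest with the one stored
# in "<tarball>.md5". Returns 0 when they match, non-zero otherwise.
verify_sos_md5() {
    tarball=$1
    # First field of the first line: works for a bare digest and for the
    # "digest  filename" layout written by md5sum.
    expected=$(awk '{ print $1; exit }' "$tarball.md5")
    actual=$(md5sum "$tarball" | awk '{ print $1 }')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage, after copying both files with juju scp:
#   juju scp mysql/0:sosreport-juju-53ced1-2-20161221163645.tar.xz .
#   juju scp mysql/0:sosreport-juju-53ced1-2-20161221163645.tar.xz.md5 .
#   verify_sos_md5 sosreport-juju-53ced1-2-20161221163645.tar.xz && echo "checksum OK"
```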
Cleanup obsolete sosreport information
To clean up the tarballs that have been previously created, use the cleanup action of the charm as outlined here :
$ juju run-action sosreport/1 cleanup
Action queued with id: 3df3dcb8-0850-414e-87d5-746a52ef9b53
$ juju show-action-status 3df3dcb8-0850-414e-87d5-746a52ef9b53
actions:
- id: 3df3dcb8-0850-414e-87d5-746a52ef9b53
  status: completed
  unit: sosreport/1
$ juju show-action-output 3df3dcb8-0850-414e-87d5-746a52ef9b53
results:
  outcome: success
  result-map:
    message: Directory /home/ubuntu cleaned up
status: completed
timing:
  completed: 2016-12-21 16:49:35 +0000 UTC
  enqueued: 2016-12-21 16:49:30 +0000 UTC
  started: 2016-12-21 16:49:35 +0000 UTC
Practical example using Juju 1.25
Deploy the sosreport charm
Given the same environment with mysql & MediaWiki service deployed, we need to deploy the sosreport charm to the unit where the mysql service is deployed :
$ juju deploy cs:~sosreport-charmers/sosreport --to=2
Once deployed, we have an environment that looks like this :
$ juju status --format=tabular
[Environment]
UPGRADE-AVAILABLE
1.25.9

[Services]
NAME       STATUS  EXPOSED CHARM
mediawiki  unknown false   cs:trusty/mediawiki-5
mysql      unknown false   cs:trusty/mysql-55
sosreport  active  false   cs:~sosreport-charmers/trusty/sosreport-2

[Units]
ID          WORKLOAD-STATE AGENT-STATE VERSION  MACHINE PORTS    PUBLIC-ADDRESS  MESSAGE
mediawiki/0 unknown        idle        1.25.6.1 1                192.168.122.246
mysql/0     unknown        idle        1.25.6.1 2       3306/tcp 192.168.122.6
sosreport/0 active         idle        1.25.6.1 2                192.168.122.6   sosreport is installed

[Machines]
ID STATE   VERSION  DNS             INS-ID                  SERIES HARDWARE
0  started 1.25.6.1 localhost       localhost               zesty
1  started 1.25.6.1 192.168.122.246 caribou-local-machine-1 trusty arch=amd64
2  started 1.25.6.1 192.168.122.6   caribou-local-machine-2 trusty arch=amd64
Collect the sosreport information
With the previous version of Juju, the syntax for actions is slightly different. To run the collect action we need to issue :
$ juju action do sosreport/0 collect
We then get the status of our action :
$ juju action status 2176fad0-9b9f-4006-88cb-4adbf6ad3da1
actions:
- id: 2176fad0-9b9f-4006-88cb-4adbf6ad3da1
  status: failed
  unit: sosreport/0
And to our surprise, the action has failed ! To try to identify why it has failed, we can fetch the result of our action :
$ juju action fetch 2176fad0-9b9f-4006-88cb-4adbf6ad3da1
message: 'Not enough space in /home/ubuntu (minfree: 5% )'
results:
  outcome: failure
status: failed
timing:
  completed: 2016-12-22 10:32:15 +0100 CET
  enqueued: 2016-12-22 10:32:09 +0100 CET
  started: 2016-12-22 10:32:14 +0100 CET
So there is not enough space in our unit to safely run sosreport. This gives me the opportunity to talk about one of the parameters of the collect action : minfree. But first, we need to look at how much disk space is available.
$ juju ssh sosreport/0 "df -h"
Warning: Permanently added '192.168.122.6' (ECDSA) to the list of known hosts.
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-root  222G  200G   11G  95% /
We see that there is about 11 GB available. Although that is just below the default 5% threshold, we can lower the threshold by using the minfree parameter. Here is its description :
* minfree : Minimum of free diskspace to run sosreport expressed in percent, Megabytes or Gigabytes. Valid suffixes are % M or G (default 5%)
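The arithmetic behind the failed check can be reproduced from the df figures. This is only a sketch of the comparison, not the charm's actual code: the free_percent and meets_minfree_percent helpers are hypothetical, and the numbers are the rounded values printed by df -h, so the result is approximate.

```shell
# Share of the filesystem that is still available, as a percentage with
# two decimals. Arguments: total size and available space, same unit.
free_percent() {
    awk -v s="$1" -v a="$2" 'BEGIN { printf "%.2f\n", a / s * 100 }'
}

# Succeeds (exit 0) when the available share reaches the required
# percentage. Arguments: total size, available space, minimum percent.
meets_minfree_percent() {
    awk -v s="$1" -v a="$2" -v m="$3" 'BEGIN { exit !(a / s * 100 >= m) }'
}

# With the figures above (222G total, 11G available):
#   free_percent 222 11             prints 4.95
#   meets_minfree_percent 222 11 5  fails, since 4.95 is below 5
```

An absolute value such as 5G avoids this percentage comparison altogether, which is why it suits a large filesystem that is nearly full.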
Since we have 11 GB available, let us set minfree to 5G :
$ juju action do sosreport/0 collect minfree=5G
Action queued with id: b741aa7c-537d-4175-8af9-548b1e0e6f7b
We can now fetch the result of our command, waiting for at most 600 seconds for the result :
$ juju action fetch b741aa7c-537d-4175-8af9-548b1e0e6f7b --wait=600
results:
  outcome: success
  result-map:
    message: sosreport-caribou-local-machine-1-20161222153903.tar.xz and sosreport-caribou-local-machine-1-20161222153903.tar.xz.md5 available in /home/ubuntu
status: completed
timing:
  completed: 2016-12-22 15:40:01 +0100 CET
  enqueued: 2016-12-22 15:38:58 +0100 CET
  started: 2016-12-22 15:39:03 +0100 CET
Cleanup obsolete sosreport information
As with the previous example, the cleanup of old tarballs is rather simple :
$ juju action do sosreport/0 cleanup
Action queued with id: edf199cd-2a79-4605-8f00-40ec37aa25a9
$ juju action fetch edf199cd-2a79-4605-8f00-40ec37aa25a9 --wait=600
results:
  outcome: success
  result-map:
    message: Directory /home/ubuntu cleaned up
status: completed
timing:
  completed: 2016-12-22 15:47:14 +0100 CET
  enqueued: 2016-12-22 15:47:12 +0100 CET
  started: 2016-12-22 15:47:13 +0100 CET
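The two walkthroughs differ only in the names of the action subcommands. If you script against both Juju versions, a small lookup keeps that difference in one place. The juju_action_subcommand helper below is hypothetical and covers only the 1.x and 2.x syntaxes used in this article.

```shell
# Hypothetical helper: print the Juju subcommand for an action verb
# (run, status or output) given a version string such as 1.25.9 or 2.0.0.
juju_action_subcommand() {
    version=$1
    verb=$2
    # Keep only the major version: "2.0.0-xenial-amd64" becomes "2".
    case "${version%%.*}:$verb" in
        1:run)    echo "action do" ;;
        1:status) echo "action status" ;;
        1:output) echo "action fetch" ;;
        2:run)    echo "run-action" ;;
        2:status) echo "show-action-status" ;;
        2:output) echo "show-action-output" ;;
        *) return 1 ;;
    esac
}

# Usage (word splitting of the result is intentional):
#   juju $(juju_action_subcommand "$(juju version)" run) sosreport/0 collect
```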
Conclusion
This charm makes collecting information in a Juju environment much simpler. Don’t hesitate to test it, and please report any bugs you may encounter.