Announcing the sosreport charm

Introduction

For a while now, I have been actively maintaining the sosreport Debian package. I am also helping to make it available on Ubuntu.

I have also had multiple requests to make sosreport easier to use in a Juju environment. I have finally been able to write a sosreport charm, which makes it simpler to use with Juju.

Theory of operation

As you may already know, sosreport is a tool that collects information about your running environment. In the context of a Juju deployment, what we are after is the running environment of the units providing the services. So for the sosreport charm to be useful, it needs to be deployed to the machine that hosts an existing unit.

The charm has two actions :

  • collect    : Generate the sosreport tarball
  • cleanup  : Cleanup existing tarballs

You would use the collect action to create the sosreport tarball of the unit where it is being run and cleanup to remove those tarballs once you are done.

Each action has optional parameters attached to it :

  • homedir  : Home directory where sosreport files will be copied to (common to both collect & cleanup actions)
  • options  : Command line options to be passed to sosreport (collect only)
  • minfree  : Minimum free disk space required to run sosreport, expressed in percent, Megabytes or Gigabytes. Valid suffixes are %, M or G (collect only)
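
To illustrate the format that minfree accepts, here is a minimal shell sketch of a validity check. The function name is hypothetical and this is not the charm's own code. It only assumes, based on the description above, that a value is a whole number followed by %, M or G :

```shell
#!/bin/sh
# Hypothetical helper, not part of the charm: succeeds when the value looks
# like a minfree setting, i.e. digits followed by one of %, M or G.
# Assumption: only whole numbers are accepted.
is_valid_minfree() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+[%MG]$'
}

# Report which of a few sample values pass the check.
for value in 5% 500M 5G 5 5K; do
    if is_valid_minfree "$value"; then
        echo "$value : looks valid"
    else
        echo "$value : rejected"
    fi
done
```

A check like this can be run locally before queuing the collect action, to avoid waiting for an action that fails on a malformed value.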

Practical example using Juju 2.0

Suppose that you are encountering problems with the mysql service being used by your MediaWiki service (yes, I know, yet one more MediaWiki example). You would have an environment similar to the following (Juju 2.0) :

$ juju status
Model    Controller         Cloud/Region         Version
default  MyLocalController  localhost/localhost  2.0.0

App        Version  Status   Scale  Charm      Store       Rev  OS      Notes
mediawiki           unknown      1  mediawiki  jujucharms    5  ubuntu
mysql               error        1  mysql      jujucharms   55  ubuntu

Unit          Workload  Agent  Machine  Public address  Ports  Message
mediawiki/0*  unknown   idle   1        10.0.4.48
mysql/0*      error     idle   2        10.0.4.140             hook failed: "start"

Machine  State    DNS         Inst id        Series  AZ
1        started  10.0.4.48   juju-53ced1-1  trusty
2        started  10.0.4.140  juju-53ced1-2  trusty

Relation  Provides  Consumes  Type
cluster   mysql     mysql     peer

Here the mysql start hook failed for some reason that we want to investigate. One solution is to ssh to the unit and try to find out. However, you may be asked by a support representative to provide the data for remote analysis. This is where sosreport becomes useful.

Deploy the sosreport charm

The sosreport charm will collect the information from the machine where the mysql service runs. In our example, the mysql/0 unit runs on machine #2, so this is where the sosreport charm needs to be deployed. In our example we would do :

$ juju deploy cs:~sosreport-charmers/sosreport --to=2

Once the charm is done deploying, you will have the following juju status :

$ juju status
Model    Controller         Cloud/Region         Version
default  MyLocalController  localhost/localhost  2.0.0

App        Version  Status   Scale  Charm      Store       Rev  OS      Notes
mediawiki           unknown      1  mediawiki  jujucharms    5  ubuntu
mysql               error        1  mysql      jujucharms   55  ubuntu
sosreport           active       1  sosreport  jujucharms    1  ubuntu

Unit          Workload  Agent  Machine  Public address  Ports  Message
mediawiki/0*  unknown   idle   1        10.0.4.48
mysql/0*      error     idle   2        10.0.4.140             hook failed: "start"
sosreport/1*  active    idle   2        10.0.4.140             sosreport is installed

Machine  State    DNS         Inst id        Series  AZ
1        started  10.0.4.48   juju-53ced1-1  trusty
2        started  10.0.4.140  juju-53ced1-2  trusty

Relation  Provides  Consumes  Type
cluster   mysql     mysql     peer

Collect the sosreport information

In order to collect the sosreport tarball, you will issue an action to the sosreport service, telling it to collect the data :

$ juju run-action sosreport/1 collect
Action queued with id: 95d405b3-9b78-468b-840f-d24df5751351

To check the progress of the action, you can use the show-action-status command :

$ juju show-action-status 95d405b3-9b78-468b-840f-d24df5751351
actions:
- id: 95d405b3-9b78-468b-840f-d24df5751351
 status: running
 unit: sosreport/1

After completion, the action will show as completed :

$ juju show-action-status 95d405b3-9b78-468b-840f-d24df5751351
actions:
- id: 95d405b3-9b78-468b-840f-d24df5751351
 status: completed
 unit: sosreport/1
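
Rather than re-running the command by hand, the status check can be scripted. The following is only a sketch : both function names are hypothetical, the 5 second delay is arbitrary, and it assumes that an unfinished action reports either pending or running, and that the output keeps the layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads show-action-status output on stdin and prints
# the value of the first "status:" field.
action_status() {
    awk '$1 == "status:" { print $2; exit }'
}

# Hypothetical polling loop: checks the action every 5 seconds until it is
# no longer pending or running, then prints the final status.
# Needs a working juju client; it is only defined here, not called.
wait_for_action() {
    while :; do
        state=$(juju show-action-status "$1" | action_status)
        case "$state" in
            pending|running) sleep 5 ;;
            *) break ;;
        esac
    done
    echo "$state"
}
```

Calling wait_for_action with the action id returned by run-action would then block until the action has finished.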

Using the show-action-output command, you can see the result of the collect action :

$ juju show-action-output 95d405b3-9b78-468b-840f-d24df5751351
results:
 outcome: success
 result-map:
 message: sosreport-juju-53ced1-2-20161221163645.tar.xz and sosreport-juju-53ced1-2-20161221163645.tar.xz.md5
 available in /home/ubuntu
status: completed
timing:
 completed: 2016-12-21 16:37:06 +0000 UTC
 enqueued: 2016-12-21 16:36:40 +0000 UTC
 started: 2016-12-21 16:36:45 +0000 UTC

If we look at the mysql/0 unit $HOME directory, we will see that the tarball is indeed present :

$ juju ssh mysql/0 "ls -l"
total 26149
-rw------- 1 root root 26687372 Dec 21 16:36 sosreport-juju-53ced1-2-20161221163645.tar.xz
-rw-r--r-- 1 root root 33 Dec 21 16:37 sosreport-juju-53ced1-2-20161221163645.tar.xz.md5
Connection to 10.0.4.140 closed.

One thing to be aware of is that, as with any environment using sosreport, the owner of the tarball and md5 file is root. This is to protect access to the unit’s configuration data contained in the tarball. In order to copy the files from the mysql/0 unit, you would first need to change their ownership :

$ juju ssh mysql/0 "sudo chown ubuntu:ubuntu sos*"
Connection to 10.0.4.140 closed.

$ juju ssh mysql/0 "ls -l"
total 26149
-rw------- 1 ubuntu ubuntu 26687372 Dec 21 16:36 sosreport-juju-53ced1-2-20161221163645.tar.xz
-rw-r--r-- 1 ubuntu ubuntu 33 Dec 21 16:37 sosreport-juju-53ced1-2-20161221163645.tar.xz.md5
Connection to 10.0.4.140 closed.

The files can be copied off the unit by using juju scp.
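
As a sketch of that last step, the two hypothetical helpers below copy a file from the mysql/0 unit and then compare the tarball with its md5 file. This assumes that the .md5 file starts with the hex digest, which its 33 byte size suggests but which I have not verified here; the function names are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: copies one file from the home directory of the
# mysql/0 unit to the current directory. Needs a working juju client.
fetch_from_unit() {
    juju scp "mysql/0:$1" .
}

# Hypothetical helper: succeeds when the md5 digest of the tarball matches
# the first field of its .md5 file.
# Assumption: the .md5 file starts with the hex digest.
verify_md5() {
    expected=$(awk '{ print $1; exit }' "$1.md5")
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    [ "$expected" = "$actual" ]
}
```

You would call fetch_from_unit once for the tarball and once for the .md5 file, then run verify_md5 on the tarball name before sending it for analysis.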

Cleanup obsolete sosreport information

To clean up the tarballs that have been previously created, use the cleanup action of the charm as outlined here :

$ juju run-action sosreport/1 cleanup
Action queued with id: 3df3dcb8-0850-414e-87d5-746a52ef9b53

$ juju show-action-status 3df3dcb8-0850-414e-87d5-746a52ef9b53
actions:
- id: 3df3dcb8-0850-414e-87d5-746a52ef9b53
 status: completed
 unit: sosreport/1

$ juju show-action-output 3df3dcb8-0850-414e-87d5-746a52ef9b53
results:
 outcome: success
 result-map:
 message: Directory /home/ubuntu cleaned up
status: completed
timing:
 completed: 2016-12-21 16:49:35 +0000 UTC
 enqueued: 2016-12-21 16:49:30 +0000 UTC
 started: 2016-12-21 16:49:35 +0000 UTC

Practical example using Juju 1.25

Deploy the sosreport charm

Given the same environment with the mysql and MediaWiki services deployed, we need to deploy the sosreport charm to the machine where the mysql service runs :

$ juju deploy cs:~sosreport-charmers/sosreport --to=2

Once deployed, we have an environment that looks like this :

$ juju status --format=tabular
[Environment]
UPGRADE-AVAILABLE
1.25.9

[Services]
NAME       STATUS   EXPOSED  CHARM
mediawiki  unknown  false    cs:trusty/mediawiki-5
mysql      unknown  false    cs:trusty/mysql-55
sosreport  active   false    cs:~sosreport-charmers/trusty/sosreport-2

[Units]
ID           WORKLOAD-STATE  AGENT-STATE  VERSION   MACHINE  PORTS     PUBLIC-ADDRESS   MESSAGE
mediawiki/0  unknown         idle         1.25.6.1  1                  192.168.122.246
mysql/0      unknown         idle         1.25.6.1  2        3306/tcp  192.168.122.6
sosreport/0  active          idle         1.25.6.1  2                  192.168.122.6    sosreport is installed

[Machines]
ID  STATE    VERSION   DNS              INS-ID                   SERIES  HARDWARE
0   started  1.25.6.1  localhost        localhost                zesty
1   started  1.25.6.1  192.168.122.246  caribou-local-machine-1  trusty  arch=amd64
2   started  1.25.6.1  192.168.122.6    caribou-local-machine-2  trusty  arch=amd64

Collect the sosreport information

With the previous version of Juju, the syntax for actions is slightly different. To run the collect action we need to issue :

$ juju action do sosreport/0 collect

We then get the status of our action :

$ juju action status 2176fad0-9b9f-4006-88cb-4adbf6ad3da1
actions:
- id: 2176fad0-9b9f-4006-88cb-4adbf6ad3da1
 status: failed
 unit: sosreport/0

And to our surprise, the action has failed ! To try to identify why it has failed, we can fetch the result of our action :

$ juju action fetch 2176fad0-9b9f-4006-88cb-4adbf6ad3da1
message: 'Not enough space in /home/ubuntu (minfree: 5% )'
results:
 outcome: failure
status: failed
timing:
 completed: 2016-12-22 10:32:15 +0100 CET
 enqueued: 2016-12-22 10:32:09 +0100 CET
 started: 2016-12-22 10:32:14 +0100 CET

So there is not enough space on our unit to safely run sosreport. This gives me the opportunity to talk about one of the parameters of the collect action : minfree. But first, we need to look at how much disk space is available.

$ juju ssh sosreport/0 "df -h"
Warning: Permanently added '192.168.122.6' (ECDSA) to the list of known hosts.
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-root 222G 200G 11G 95% /

We see that 11 GB is available. Although this is below the 5% mark, we can lower the threshold by using the minfree parameter. Here is its description :

 * minfree : Minimum of free diskspace to run sosreport expressed in percent, 
             Megabytes or Gigabytes. Valid suffixes are % M or G 
            (default 5%)
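
To make the arithmetic behind the failure explicit, here is a small sketch using the figures from the df output above (11G available out of 222G). The helper names are hypothetical, and the way the charm actually computes free space is an assumption on my part.

```shell
#!/bin/sh
# Hypothetical helper, not the charm's code: prints the percentage of free
# space, given the available and total sizes expressed in the same unit.
free_percent() {
    awk -v avail="$1" -v total="$2" 'BEGIN { printf "%.2f\n", avail * 100 / total }'
}

# Hypothetical helper: succeeds when the free percentage is at least the
# given threshold (a plain number, without the % sign).
has_min_percent() {
    awk -v avail="$1" -v total="$2" -v min="$3" \
        'BEGIN { exit !(avail * 100 / total >= min) }'
}

free_percent 11 222    # prints 4.95

if has_min_percent 11 222 5; then
    echo "enough free space for minfree=5%"
else
    echo "below the 5% threshold"
fi
```

At roughly 4.95% free, the unit sits just under the default threshold, whereas an absolute threshold such as 5G is comfortably met by the 11G that is available.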

Since we have 11 GB available, let us set minfree to 5G :

$ juju action do sosreport/0 collect minfree=5G
Action queued with id: b741aa7c-537d-4175-8af9-548b1e0e6f7b

We can now fetch the result of our command, waiting for at most 100 seconds for the result :

$ juju action fetch b741aa7c-537d-4175-8af9-548b1e0e6f7b --wait=100
results:
 outcome: success
 result-map:
 message: sosreport-caribou-local-machine-1-20161222153903.tar.xz and sosreport-caribou-local-machine-1-20161222153903.tar.xz.md5
 available in /home/ubuntu
status: completed
timing:
 completed: 2016-12-22 15:40:01 +0100 CET
 enqueued: 2016-12-22 15:38:58 +0100 CET
 started: 2016-12-22 15:39:03 +0100 CET

Cleanup obsolete sosreport information

As with the previous example, the cleanup of old tarballs is rather simple :

$ juju action do sosreport/0 cleanup
Action queued with id: edf199cd-2a79-4605-8f00-40ec37aa25a9
$ juju action fetch edf199cd-2a79-4605-8f00-40ec37aa25a9 --wait=600
results:
 outcome: success
 result-map:
 message: Directory /home/ubuntu cleaned up
status: completed
timing:
 completed: 2016-12-22 15:47:14 +0100 CET
 enqueued: 2016-12-22 15:47:12 +0100 CET
 started: 2016-12-22 15:47:13 +0100 CET
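
The only real difference between the two walkthroughs is the command used to queue an action. As a recap, this sketch prints the right command for a given Juju version, using only the syntax shown in this article. The function name is hypothetical and nothing here calls juju itself.

```shell
#!/bin/sh
# Hypothetical helper: prints the juju command line that queues an action,
# using the syntax shown in this article for Juju 1.x and 2.x.
queue_action_cmd() {
    version=$1
    unit=$2
    action=$3
    case "$version" in
        1.*) echo "juju action do $unit $action" ;;
        2.*) echo "juju run-action $unit $action" ;;
        *)   echo "unsupported Juju version: $version" >&2; return 1 ;;
    esac
}

queue_action_cmd 1.25.6 sosreport/0 collect
queue_action_cmd 2.0.0 sosreport/1 collect
```

In the same way, action status and action fetch on Juju 1.25 correspond to show-action-status and show-action-output on Juju 2.0.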

Conclusion

This charm makes collecting information in a Juju environment much simpler. Don’t hesitate to test it, and please report any bugs you may encounter.
