Laravel Problems: How to tune/troubleshoot/optimize docker deployment of jenkins testing Laravel app with behat controlling selenium running Chromedriver on AWS

Friday, April 14, 2017

How to tune/troubleshoot/optimize docker deployment of jenkins testing Laravel app with behat controlling selenium running Chromedriver on AWS

I have the following docker containers that I have set up to test my web application:

Jenkins
Apache 1 (serving a laravel app)
Apache 2 (serving a legacy codeigniter app)
MySQL (accessed by both Apache 1 and Apache 2)
Selenium HUB
Selenium Node — ChromeDriver

The jenkins job runs a behat command on Apache 1 which in turn connects to Selenium Hub, which has a ChromeDriver node to actually hit the two apps: Apache 1 and Apache 2.

The whole system is running on an EC2 m2.small instance (1 core, 2GB RAM) with AWS linux.

The problem

The issue I am having is that if I run the pipeline multiple times, the first few times it runs just fine (the behat stage takes about 20s), but on the third and consecutive runs, the behat stage starts slowing down (taking 1m30s) and then failing after 3m or 10m or whenever I lose patience.

If I restart the docker containers, it works again, but only for another 2-4 runs.

Clues

Monitoring docker stats each time I run the jenkins pipeline, I noticed that the Block I/O, and specifically the 'I' was growing exponentially after the first few runs.

For example, after run 1

After run 2

After run 3

After run 4

The Block I/O for the chromedriver container is 21GB and the driver hangs. While I might expect the Block I/O to grow, I wouldn't expect it to grow exponentially as it seems to be doing. It's like something is... exploding.

The same docker configuration (using docker-compose) runs flawlessly every time on my personal MacBook Pro. Block I/O does not 'explode'. I constrain Docker to only use 1 core and 2GB of RAM.

What I've tried

This situation has sent me down the path of learning a lot more about docker, filesystems and memory management, but I'm still not resolving the issue. Some of the things I have tried:

Memory

I set mem_limit options on all containers and tuned them so that during any given run, the memory would not reach 100%. Memory usage now seems fairly stable, and never 'blows up'.

Storage Driver

The default for AWS Linux Docker is devicemapper in loop-lvm mode. After reading this doc

https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/#configure-docker-with-devicemapper

I switched to the suggested direct-lvm mode.

`docker-compose restart`

This does indeed 'reset' the issue, allowing me to get a few more runs in, but it doesn't last. After 2-4 runs, things seize up and the tests start failing.

My Question...

What is happening that causes the block i/o to grow exponentially? I'm not clear if it's docker, jenkins, selenium or chromedriver that are causing the problem. My first guess is chromedriver.
What is a good approach to tuning a system like this with multiple moving parts?

Additonal Info

My chromedriver container has the following environment set in docker-compose:

- SE_OPTS=-maxSession 6 -browser browserName=chrome,maxInstances=3

docker info:

$ docker info
Containers: 6
 Running: 6
 Paused: 0
 Stopped: 0
Images: 5
Server Version: 1.12.6
Storage Driver: devicemapper
 Pool Name: docker-thinpool
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: 
 Metadata file: 
 Data Space Used: 4.862 GB
 Data Space Total: 20.4 GB
 Data Space Available: 15.53 GB
 Metadata Space Used: 2.54 MB
 Metadata Space Total: 213.9 MB
 Metadata Space Available: 211.4 MB
 Thin Pool Minimum Free Space: 2.039 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay null host bridge
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.51-40.60.amzn1.x86_64
Operating System: Amazon Linux AMI 2017.03
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.956 GiB
ID: 7XEV:LEON:H5R6:NDTT:C6PQ:I6UV:4ETW:WB2V:CBDD:DBRL:ONZU:SKTG
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/

via meriial