Spring Data aggregation operations

MongoDB has a powerful system for aggregating data, called the aggregation pipeline. One particularly useful feature is called bucket operations.

A bucket operation takes a collection of documents and, based on a property of those documents and a list of boundaries, groups the documents into buckets. Each bucket that receives at least one document is then transformed into a single output document.

Buckets are great for time series analysis. Create buckets that break up the time series by day, by hour, or by any other unit of time, and then perform aggregation operations on the documents within each bucket. For example, find the average temperature by day, the total number of messages sent by hour, or run similar time series queries.

We are using MongoDB to store images and associated data in a collection called imageCapture. Documents in this collection have many fields, but in this post we’ll focus only on aggregation and consider only two fields on the imageCapture documents:

db.imageCapture.insertMany([
    {
        date: ISODate("2019-11-25T01:13:00.000Z"),
        lbls: ["one", "two", "three"]
    },
    {
        date: ISODate("2019-11-25T04:23:00.000Z"),
        lbls: ["four"]
    },
    {
        date: ISODate("2019-11-26T03:21:00.000Z"),
        lbls: ["five", "six"]
    },
]);

To understand the performance of our system, we want to know how many labels are being detected every day. That is, for every day, we want a sum of the size of the lbls array. This is perfect for MongoDB’s aggregation and bucket capabilities.

Bucket operations in Mongo

To use a bucket operation, we specify the following:

  • groupBy: the field that the boundaries will apply to. Its values must be comparable with the boundaries, so it is typically a numeric or date field.
  • boundaries: an array of boundary points, in ascending order and all of the same type. Documents which have a groupBy field falling between two adjacent elements in the array go into that bucket. The between test here is half-open: the first point is inclusive and the second point is exclusive, which is the behavior developers would expect.
  • default: the name of an extra bucket that collects any documents in the pipeline which don’t go into one of the other buckets. It is optional, but without it the operation fails as soon as a document falls outside the boundaries, so set it unless an earlier stage guarantees that every document fits. Using a match operation in the pipeline before the bucket operation will remove documents which shouldn’t be processed.
  • output: an aggregation expression to generate the output document for each bucket.
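
To make the half-open test and the default bucket concrete, here is a minimal in-memory sketch in plain Java. It does not talk to MongoDB; it only mimics the grouping rules described above. The class and method names (BucketSketch, Capture, bucketKey, labelCounts) are hypothetical and exist only for this illustration. The dates and labels are the sample values used in this post.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BucketSketch {

    /** Hypothetical stand-in for an imageCapture document. */
    public static final class Capture {
        final Instant date;
        final List<String> lbls;

        public Capture(Instant date, List<String> lbls) {
            this.date = date;
            this.lbls = lbls;
        }
    }

    /**
     * Returns the lower boundary of the bucket containing value, or
     * defaultKey when value is outside every bucket.
     * Boundaries are assumed to be sorted in ascending order.
     */
    public static Object bucketKey(Instant value, List<Instant> boundaries, Object defaultKey) {
        for (int i = 0; i + 1 < boundaries.size(); i++) {
            Instant lower = boundaries.get(i);
            Instant upper = boundaries.get(i + 1);
            // Half-open test: lower is inclusive, upper is exclusive.
            if (!value.isBefore(lower) && value.isBefore(upper)) {
                return lower;
            }
        }
        return defaultKey;
    }

    /** Sums the size of the lbls list per bucket. */
    public static Map<Object, Integer> labelCounts(
            List<Capture> docs, List<Instant> boundaries, Object defaultKey) {
        Map<Object, Integer> counts = new LinkedHashMap<>();
        for (Capture doc : docs) {
            Object key = bucketKey(doc.date, boundaries, defaultKey);
            counts.merge(key, doc.lbls.size(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Instant> boundaries = List.of(
                Instant.parse("2019-11-25T00:00:00Z"),
                Instant.parse("2019-11-26T00:00:00Z"),
                Instant.parse("2019-11-27T00:00:00Z"),
                Instant.parse("2019-11-28T00:00:00Z"));
        List<Capture> docs = List.of(
                new Capture(Instant.parse("2019-11-25T01:13:00Z"), List.of("one", "two", "three")),
                new Capture(Instant.parse("2019-11-25T04:23:00Z"), List.of("four")),
                new Capture(Instant.parse("2019-11-26T03:21:00Z"), List.of("five", "six")));
        System.out.println(labelCounts(docs, boundaries, "defaultBucket"));
    }
}
```

With the three sample documents, the 25 November bucket sums to 4 labels and the 26 November bucket to 2. The 27 November bucket receives no documents, so it has no entry at all, which matches how a bucket operation omits empty buckets from its output.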

Putting it all together, we create a query as follows:

db.imageCapture.aggregate([
    {
        $bucket: {
            groupBy: "$date",
            boundaries: [
                ISODate("2019-11-25T00:00:00.000Z"),
                ISODate("2019-11-26T00:00:00.000Z"),
                ISODate("2019-11-27T00:00:00.000Z"),
                ISODate("2019-11-28T00:00:00.000Z")
            ],
            default: 'defaultBucket',
            output: {
                "count": {"$sum": {"$size": "$lbls"}}
            }
        }
    }
]);

We have arbitrarily selected some bucket boundaries. The match operation is skipped, but it would be recommended to put a match operation before the bucket in the pipeline, to reduce the number of documents going into the default bucket.

The output could have more fields, too. We might want another field for the most frequently used label in a bucket, or a list of all the labels used in a bucket, for example.

This query works as expected and gives us the total number of labels in each bucket. The boundaries here happen to fall one day apart, but they do not have to be evenly spaced, which makes buckets quite flexible. Buckets are also often used for numeric data. A query might show average income among people in different ranges of credit ratings, for example.

Buckets in Spring Data

Spring Data’s reference guide doesn’t give detailed examples of using buckets. Buckets are such an important feature of MongoDB that it’s worth presenting how to use them in Java. Of course, we could translate the query into a BasicDBObject, but that’s equivalent to writing SQL query strings and passing them directly to a driver.

Every field in the output is created from an AggregationExpression. In this case, we want an AggregationExpression which is a sum of the size of all the arrays:

final AggregationExpression countingExpression = AccumulatorOperators.Sum.sumOf(ArrayOperators.Size.lengthOfArray("lbls"));

Make a starting point:

        // Ask for today's date in UTC as well, so the date and the start-of-day zone agree.
        final Instant now = LocalDate.now(ZoneOffset.UTC).atStartOfDay(ZoneOffset.UTC).toInstant();

This sets the starting point to the start of the current day in the UTC time zone. Next, create the BucketOperation on the desired field, with the necessary boundaries and the AggregationExpression:

// Boundaries are listed oldest first; the last boundary is the start of today.
final BucketOperation bucketOperation = Aggregation.bucket("date").
                withBoundaries(now.minus(10, ChronoUnit.DAYS), now.minus(9, ChronoUnit.DAYS),
                        now.minus(8, ChronoUnit.DAYS), now.minus(7, ChronoUnit.DAYS),
                        now.minus(6, ChronoUnit.DAYS), now.minus(5, ChronoUnit.DAYS),
                        now.minus(4, ChronoUnit.DAYS), now.minus(3, ChronoUnit.DAYS),
                        now.minus(2, ChronoUnit.DAYS), now.minus(1, ChronoUnit.DAYS), now).
                withDefaultBucket("defaultBucket").
                andOutput(countingExpression).as("count");
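
Writing out every boundary by hand gets tedious when the number of days changes. A minimal sketch of generating the same daily boundaries in a loop follows, using only java.time. The class and method names (DailyBoundaries, dailyBoundaries) are hypothetical helpers, not part of Spring Data.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

public class DailyBoundaries {

    /**
     * Returns days + 1 boundary points, one day apart, ending at end.
     * The list is in ascending order, oldest boundary first.
     */
    public static List<Instant> dailyBoundaries(Instant end, int days) {
        List<Instant> boundaries = new ArrayList<>();
        for (int i = days; i >= 0; i--) {
            boundaries.add(end.minus(i, ChronoUnit.DAYS));
        }
        return boundaries;
    }

    public static void main(String[] args) {
        Instant startOfToday = LocalDate.now(ZoneOffset.UTC)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant();
        // Eleven boundary points define ten one-day buckets.
        List<Instant> boundaries = dailyBoundaries(startOfToday, 10);
        System.out.println(boundaries.size() + " boundaries, from "
                + boundaries.get(0) + " to " + boundaries.get(boundaries.size() - 1));
    }
}
```

Assuming withBoundaries accepts a variable number of arguments, as the listing above suggests, the generated list could then be passed as withBoundaries(boundaries.toArray()).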

Now this AggregationOperation can be used in an aggregation pipeline:

final Aggregation aggregation = Aggregation.newAggregation(bucketOperation);

In real use, we would normally want a match operation before the bucket operation in the pipeline. It would be part of the newAggregation call.

Running the aggregation and getting results is straightforward. If we had a class that matched the result format, we could map the results to it. Otherwise, a Map implementation does the job:

final AggregationResults<HashMap> ar = mongoOperations.aggregate(aggregation, "imageCapture", HashMap.class);

The result gives access to a List of HashMaps, accessible by ar.getMappedResults().
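
Each map in that list represents one bucket. The sketch below shows one way to turn such a map into a readable line, in plain Java with hand-built maps standing in for real results. It assumes that the _id key holds the bucket's lower boundary as a java.util.Date, or the default bucket's name as a String, and that count holds the sum; check the actual value types in your own results. BucketResults and describe are hypothetical names.

```java
import java.time.Instant;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class BucketResults {

    /** Formats one result document; anything without a date _id is treated as the default bucket. */
    public static String describe(Map<?, ?> doc) {
        Object id = doc.get("_id");
        Object count = doc.get("count");
        String label = (id instanceof Date)
                ? ((Date) id).toInstant().toString()
                : "outside boundaries (" + id + ")";
        return label + ": " + count + " labels";
    }

    public static void main(String[] args) {
        // Hand-built stand-ins for the maps returned by ar.getMappedResults().
        Map<String, Object> day = new HashMap<>();
        day.put("_id", Date.from(Instant.parse("2019-11-25T00:00:00Z")));
        day.put("count", 4);

        Map<String, Object> rest = new HashMap<>();
        rest.put("_id", "defaultBucket");
        rest.put("count", 2);

        System.out.println(describe(day));
        System.out.println(describe(rest));
    }
}
```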

Viewing it all together:

        final AggregationExpression countingExpression = AccumulatorOperators.Sum.sumOf(ArrayOperators.Size.lengthOfArray("lbls"));

        final Instant now = LocalDate.now(ZoneOffset.UTC).atStartOfDay(ZoneOffset.UTC).toInstant();

        final BucketOperation bucketOperation = Aggregation.bucket("date").
                withBoundaries(now.minus(10, ChronoUnit.DAYS), now.minus(9, ChronoUnit.DAYS),
                        now.minus(8, ChronoUnit.DAYS), now.minus(7, ChronoUnit.DAYS),
                        now.minus(6, ChronoUnit.DAYS), now.minus(5, ChronoUnit.DAYS),
                        now.minus(4, ChronoUnit.DAYS), now.minus(3, ChronoUnit.DAYS),
                        now.minus(2, ChronoUnit.DAYS), now.minus(1, ChronoUnit.DAYS), now).
                withDefaultBucket("defaultBucket").
                andOutput(countingExpression).as("count");
        final Aggregation aggregation = Aggregation.newAggregation(bucketOperation);
        final AggregationResults<HashMap> ar = mongoOperations.aggregate(aggregation, "imageCapture", HashMap.class);
        LOG.info("And the list: " + ar.getMappedResults());

All these classes use a fluent coding style, so with static imports we can express the same thing in a more naturally readable way:

        final Instant now = now(UTC).atStartOfDay(UTC).toInstant();

        final Aggregation aggregation = newAggregation(bucket("date").
                withBoundaries(now.minus(10, DAYS), now.minus(9, DAYS),
                        now.minus(8, DAYS), now.minus(7, DAYS), now.minus(6, DAYS),
                        now.minus(5, DAYS), now.minus(4, DAYS), now.minus(3, DAYS),
                        now.minus(2, DAYS), now.minus(1, DAYS), now.minus(0, DAYS)).
                withDefaultBucket("defaultBucket").
                andOutput(sumOf(lengthOfArray("lbls"))).as("count"));

        final AggregationResults<HashMap> ar = mongoOperations.aggregate(aggregation, "imageCapture", HashMap.class);

That’s all that’s needed to perform what would be a very complex query in SQL.

Spring Boot CLI

The code for this project is available on GitHub. The project itself runs as a Spring Boot CLI application. This makes it easy to test and experiment with the project. Download from GitHub, enter the project directory, and run:

mvn spring-boot:run

Relaunched site

We have joined the movement from Drupal to WordPress. We created our own custom theme based on Bootstrap 4. WordPress is easier to edit, especially with the new Gutenberg editor system. Using Bootstrap 4 gives us a responsive site. We used just a few plugins, including support for SVG graphics and relative paths. Overall it’s been an easy change and the site is now much easier to edit, loads faster, looks better, and is responsive.

Some of the old content is being moved over. Most is being deleted, as old technical tips are unhelpful.

Our new site doesn’t work at all on Internet Explorer, due to both the TLS settings, and the use of modern CSS.

Nginx, Ubuntu and ZoneMinder

With some effort, ZoneMinder runs well on Nginx as an alternative to Apache Httpd. Nginx is a better server in many ways and so it’s worth the effort. This article covers ZoneMinder 1.31.43 with PHP 7 on Ubuntu 18.04 Bionic Beaver LTS, using all pre-built packages.

This article originally covered Ubuntu 16.04 but has been updated for 18.04 and is no longer accurate for 16.04.

Apache HTTP Server, fare thee well

Long ago, there was no Web. Then came CERN httpd in 1990 which was quickly replaced by NCSA HTTPd in 1993. That was better, but needed to be patched. The result was a patchy server, named the Apache server, first released in 1995.

The past 23 years have not been kind to the Apache server, although it still serves 27% of web traffic, mostly because of momentum. Nginx is steadily growing in market share at the expense of Apache.

Welcome Nginx

Nginx is a completely new code base, with a modern design and modern config files (goodbye, half-XML Apache config files). It has long had solid support for modern technologies such as SNI, HTTP/2, and WebSockets. At Chiral Software, we use all of those technologies. Apache supports them now too, but it has taken longer and the support is not as complete.

In particular, we couldn’t wait for SNI. It lets us use more than one server TLS certificate on a single IP address. And HTTP/2 is great, allowing faster page load times, especially on mobile browsers. Nginx has great support as a front-end proxy, allowing us to use any technology behind it. In fact, we use it as a front-end for servers written in PHP, Perl, and Java web apps, and it supports all of them easily. These diverse servers get the benefits of TLS and HTTP/2 without needing any special configuration.

ZoneMinder

ZoneMinder is a widely used open source security camera monitoring system. It has good points:

  • it works
  • it’s open source

and some bad points:

  • It is written in a bizarre mix of Perl, PHP, and C++. None of these languages is a common choice for modern web development, and using all three of them together is a witch’s brew of bad, obsolete web programming methods
  • It sends video streams as MJPEG, the worst possible format today
  • It is tied to MySQL, in a very specific non-default configuration
  • The installation is tied to Apache HTTP Server

We can’t fix the software problems, but we can install it on something other than Apache.

This guide shows how to install ZoneMinder from the PPA distribution on Ubuntu 18.04, running with PHP7 and Nginx.

ZoneMinder and CGI

ZoneMinder is an old application, based on CGI. Nginx does not have built-in support for CGI; instead, it hands requests to external processes over FastCGI. There are two types of CGI used within ZoneMinder: PHP for the user interface, and a compiled binary called ZMS, which is the ZoneMinder Streaming Server. ZMS is a small program that creates the Motion JPEG (MJPEG) stream, which is then displayed in the browser as a live view. Overall, streaming JPEG using a CGI program is a poor design and no modern software would work this way. However, it is easy to implement and it works.

Installing ZoneMinder on Ubuntu

Ubuntu 18.04 doesn’t ship with a ZoneMinder package, so it is necessary to add a PPA. The PPA has changed from the one that worked in 16.04.

# add-apt-repository ppa:iconnor/zoneminder-master
# apt update
# apt install zoneminder zoneminder-doc

You’ll also need:

# apt install nginx-extras fcgiwrap php7.2-fpm

Now we can configure the server.

Recommended: turn on TLS using certbot

I recommend installing certbot (letsencrypt) so you can secure your server. If you are planning to use TLS (you should), do that first. Certbot with Nginx is so easy, the only reason not to do it is if your server is on a non-routed network.

# apt install certbot
# systemctl stop nginx
# certbot --domains example.com certonly

Restart Nginx and test:

# systemctl start nginx

More detailed guides can be found to advise on TLS configuration, and I recommend doing a server test using Qualys SSL Labs.

Nginx configuration

Make sure php7.2-fpm is running:

# systemctl status php7.2-fpm.service 
● php7.2-fpm.service - The PHP 7.2 FastCGI Process Manager
 Loaded: loaded (/lib/systemd/system/php7.2-fpm.service; enabled; vendor preset: enabled)
 Active: active (running) since Tue 2018-05-08 10:43:27 PDT; 3min 16s ago
 Docs: man:php-fpm7.2(8)
 Main PID: 27010 (php-fpm7.2)
 Status: "Processes active: 0, idle: 2, Requests: 0, slow: 0, Traffic: 0req/sec"
 Tasks: 3 (limit: 4915)
 CGroup: /system.slice/php7.2-fpm.service
         ├─27010 php-fpm: master process (/etc/php/7.2/fpm/php-fpm.conf)
         ├─27011 php-fpm: pool www
         └─27012 php-fpm: pool www

May 08 10:43:26 home systemd[1]: Starting The PHP 7.2 FastCGI Process Manager…
May 08 10:43:27 home systemd[1]: Started The PHP 7.2 FastCGI Process Manager.

Stop ZoneMinder if it is running:

# systemctl stop zoneminder

Create the zm-include file in /etc/nginx:

location /zoneminder/cgi-bin {
   gzip off;
   root /usr/lib;
 
   include fastcgi_params;
   fastcgi_param SCRIPT_FILENAME /usr/lib/zoneminder/cgi-bin/nph-zms;
 
   fastcgi_intercept_errors on;
   fastcgi_pass unix:/var/run/fcgiwrap.socket;
}
 
location /zoneminder/ {
   gzip off;
   alias /usr/share/zoneminder/www/;
   index index.php;
 
   location ~ \.php$ {
      include fastcgi_params;
      # NOTE: You should have "cgi.fix_pathinfo = 0;" in php.ini
      fastcgi_param SCRIPT_FILENAME $request_filename;
      fastcgi_intercept_errors on;
      fastcgi_pass unix:/var/run/php/php7.2-fpm.sock;
   }
 
   location ~* /zoneminder/.*\.(txt|log)$ {
      deny all;
   }
 
   location ~* /zoneminder/.*\.(m4a|mp4|mov)$ {
      mp4;
      mp4_buffer_size 5M;
      mp4_max_buffer_size 10M;
   }
}

Include the above in the appropriate server definition. Restart Nginx, and navigate to https://example.com/zoneminder/

This will probably result in the following error:

Could not open config file.

Fix it:

# chown www-data:www-data /etc/zm/zm.conf

Reload and there’s another error:

ZoneMinder is not installed properly: php's date.timezone is not set to a valid timezone

Fix this by editing /etc/php/7.2/fpm/php.ini and setting the date.timezone field as appropriate. Restart php7.2-fpm so the change takes effect.

Before you do anything else

Set an administrator password and make it mandatory. There are bots probing open servers all the time for well-known services such as phpMyAdmin, ZoneMinder, Drupal, etc.

Go to options / users and click on admin. Set a password. Then go to options / system and check the box for OPT_USE_AUTH. The server should now prompt for a password.

Adding a camera, finding a problem

As soon as we attempt to add a camera, we get an ugly error:

SQL-ERR 'SQLSTATE[HY000]: General error: 1366 Incorrect integer value: '' for column 'ServerId' at row 1',

This error doesn’t happen when you use the automatic installation on Apache Httpd, but we’re not doing that.

As mentioned above, one problem with ZoneMinder is that it is closely connected to MySQL, and not just MySQL, but MySQL in a specific non-default configuration. The mode needs to be changed to NO_ENGINE_SUBSTITUTION.

Edit /etc/mysql/mysql.conf.d/mysqld.cnf. In the [mysqld] section, add sql_mode = NO_ENGINE_SUBSTITUTION and restart mysql:

# systemctl stop mysql
# systemctl start mysql

Viewing the camera

After the camera is added, click on the camera name on the main page. It should show a live camera view, but it does not. The problem is that a path to the ZMS must be set.

Go to options / Paths / PATH_ZMS and change it from /cgi-bin/nph-zms to /zoneminder/cgi-bin/nph-zms. Now live view should work.

ZoneMinder’s components

The ZoneMinder package installs in the following locations:

  • /usr/share/zoneminder/ PHP, Perl and compiled binaries
  • /etc/zm/ Contains the zm.conf configuration file
  • /var/cache/zoneminder/ contains temp files and events (this will grow as ZM stores more events)
  • /var/run/zm/ contains zm.pid
  • /tmp/zm/ contains the Unix socket

We need to add CGI access for ZMS. There are some scripts published on other websites to do this. Fortunately, we don’t need to write a script. Ubuntu’s packages have everything we need, using fastcgi.

In my opinion, Nginx should be the default front-end server for ZoneMinder. Everything works using Ubuntu’s provided packages.

Problems and solutions

Do check your camera specs and make sure that the captured image size is supported by your camera; otherwise you will get a warning about buffer size. It would be nice if the warnings were more informative, like, “the JPEG size did not match the monitor specification.”

On the main screen, change the function from “monitor” to “modetect” to get motion detection and recording.

After you enable modetect, try to trigger an event (walk in front of the camera). If the event is not captured, and the source shows up orange, and the log link shows up red, it means something is wrong. The first thing to look for is an error such as Can't make events/1: Permission denied in the logs. By default, ZoneMinder saves events to /var/cache/zoneminder. Remember, the user that is executing CGIs needs to have write permission in that directory. For Nginx, this means the directory should be owned by www-data:www-data.

# chown -R www-data:www-data /var/cache/zoneminder

I also got “slow down capture, speed up analysis or increase ring buffer size” errors. The Axis cameras can produce a lot of frames. I played around some with max FPS. Of course, it depends on how fast the ZoneMinder server computer is. At 20 FPS my old computer shows a load average of 0.7 (as reported by uptime), which indicates that’s about the limit for FPS. Switching to a more modern computer solved the problem.

Tips: ffmpeg

ZoneMinder can use the ffmpeg utility to create video files. These are more compact and suitable for storage. Install the ffmpeg binary and set the full path to the ffmpeg executable in the ZoneMinder options.

Conclusion

ZoneMinder is one of the few usable open source options right now. Unfortunately, its design is quite out of date and it should be replaced. But it works.