eHarmony Engineering

Maven Dependency Wrangling

Rick Warren

December 2, 2014

Which versions of its nested dependencies does your Apache Maven project actually depend on, and how can you choose those versions wisely? This post describes the methodology that Rick’s team follows. If you use Gradle or some other tool, the principles are the same, though the implementation differs.

If you’re a Java developer, you’ve inevitably encountered a situation in which your project’s dependencies look like this:
[Figure: your project depends on Projects A and B, each of which depends on Project C]

Projects A, B, and C are probably provided by third parties, and frequently A and B will depend on different versions of C. In real-world projects, this situation will occur many times over, and with deeper levels of nesting. So which versions of all of the “Project C”s out there does your project really depend on, and how can you choose those versions wisely?

In the remainder of this post, I’ll describe the methodology that my team and I follow, based on Apache Maven. If you use Gradle or some other tool, the principles that follow will be the same, but the implementation will be different.

tl;dr

If brevity is the soul of wit, you can tune out after reading this short list:

  1. Prefer newer versions to older ones.
  2. Use the Maven Enforcer Plugin to prevent version backsliding.
  3. In case of a version conflict in your POM file, prefer including newer versions over excluding older ones.


Maven Dependency-Management Basics

Many people have used Maven on and off for some time without really understanding how it resolves dependency versions. The strategy is based primarily on the depth of the dependency, and secondarily on the declaration order in the POM file.

  1. If your project depends transitively on two versions of the same artifact, the chosen version will be the one that is depended on by the shortest path.
  2. If there is a tie for shortest path, the winner will be the one reached from the first-level dependency that was declared first in the POM.

So for example, let’s expand the example above:
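[Figure: the same graph extended with Project D, which depends on Project E, which in turn depends on Project C]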

Dependencies on Project C come along three different paths. Suppose they’re associated with the following versions:

  • A –> C 1.1
  • B –> C 1.2
  • D –> E –> C 1.5

Suppose that the declaration order for the top-level dependencies in the POM file is:

  1. D
  2. B
  3. A

The A –> C and B –> C paths are the shortest, so the D –> E –> C path will be ignored. A and B are tied, and B was declared first, so it wins. Your project depends on Project C version 1.2.
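To make the example concrete, here is a sketch of how those three top-level dependencies might be declared. The project.a, project.b, and project.d coordinates and the version properties are placeholders in the same style as the other snippets in this post, not real artifacts. The comments only restate the resolution rules described above.

```xml
<dependencies>
    <!-- Declared first, but its path to C (D -> E -> C 1.5) is one level deeper, so it loses on depth -->
    <dependency>
        <groupId>project.d</groupId>
        <artifactId>project.d</artifactId>
        <version>${project.d.version}</version>
    </dependency>
    <!-- B -> C 1.2: shortest path, and declared before A, so this version wins the tie -->
    <dependency>
        <groupId>project.b</groupId>
        <artifactId>project.b</artifactId>
        <version>${project.b.version}</version>
    </dependency>
    <!-- A -> C 1.1: same depth as B's path, but declared later, so it loses the tie -->
    <dependency>
        <groupId>project.a</groupId>
        <artifactId>project.a</artifactId>
        <version>${project.a.version}</version>
    </dependency>
</dependencies>
```

Under these rules, swapping the A and B declarations would change the resolved version of Project C from 1.2 to 1.1 without any other change to the file. Running mvn dependency:tree is one way to see which version was actually resolved.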

This system sucks, and not just for one reason. Here are a few:

  • It’s horribly complicated and hard to understand. And understanding matters, as illustrated by the following two reasons.
  • It’s completely brittle. Any time you upgrade a version of any dependency, or even reorganize your POM file, any transitive dependency might silently change its version.
  • Your code can break at run time! Dependencies D and E were developed, tested, and released against C 1.5. They may depend on features that didn’t exist in earlier versions. But now they’re going to run against version 1.2. By default, you will get no warning about this at build time.


Bring the Sanity

Do yourself a big favor, and add the following plugin executions to all of your POM files immediately:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-enforcer-plugin</artifactId>
    <executions>
        <execution>
            <id>enforce-ban-version-downgrades</id>
            <goals>
                <goal>enforce</goal>
            </goals>
            <configuration>
                <rules>
                    <requireUpperBoundDeps />
                </rules>
            </configuration>
        </execution>
        <execution>
            <id>enforce-ban-duplicate-classes</id>
            <goals>
                <goal>enforce</goal>
            </goals>
            <configuration>
                <rules>
                    <banDuplicateClasses>
                        <ignoreClasses>
                            <!-- Account for badly-designed libraries here -->
                        </ignoreClasses>
                        <findAllDuplicates>true</findAllDuplicates>
                    </banDuplicateClasses>
                </rules>
                <fail>true</fail>
            </configuration>
        </execution>
    </executions>
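    <dependencies>
        <dependency>
            <!-- Supplies the banDuplicateClasses rule used above -->
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>extra-enforcer-rules</artifactId>
            <!-- A version is required here unless a parent POM's pluginManagement supplies it;
                 extra.enforcer.rules.version is a placeholder property name -->
            <version>${extra.enforcer.rules.version}</version>
        </dependency>
    </dependencies>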
</plugin>


The XML above says two things:

  1. If the aforementioned dependency-resolution rules result in any dependency version rolling backwards (like 1.5 being replaced by 1.2 in our example), fail the build.
  2. After verifying that all of the artifact versions are monotonically increasing, verify that none of those artifacts directly embed the same classes, because those undeclared dependencies could also effectively roll backwards. If you find any, fail the build.

Why do you usually want the newest version of a dependency? Because when the developers of that version released it, they already knew about the versions that came before, and as thoughtful, decent human beings, they gave due consideration to backwards compatibility. (If they didn’t, and you depend on two binary-incompatible versions of the same library, you’re probably screwed. Sorry. If you yourself are a library vendor, do your users a favor and increment a major version number when you break backwards compatibility. I’m looking at you, Patch Versions of Jackson 1.9.x, and at you, Protocol Buffers 2.4 to 2.5.)

When you first add these rules, your build will fail. Fix it. As a friend once told me, pain is just the feeling of weakness leaving your body.

But How Do I Fix My Build?

There are two ways. Take the first as far as it can go before resorting to the second.

    1. Include the version you want.
    2. Exclude the version you don’t want.

Recall from the previous section that the shortest path to a dependency wins. Therefore, if you declare a dependency in your POM file directly, it will take precedence over any transitive dependencies. I like to do it this way, to keep all of my versions easy to see and update:

<properties>
    <project.c.version>1.5</project.c.version>
</properties>
...
<dependencies>
    <dependency>
        <!-- Included to resolve version conflict -->
        <groupId>project.c</groupId>
        <artifactId>project.c</artifactId>
        <version>${project.c.version}</version>
    </dependency>
</dependencies>

This technique will solve most of your problems.

The alternative, dependency exclusion, looks like this:

<dependencies>
    <dependency>
        <groupId>project.a</groupId>
        <artifactId>project.a</artifactId>
        <version>${project.a.version}</version>
        <exclusions>
            <exclusion>
                <groupId>project.c</groupId>
                <artifactId>project.c</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>project.b</groupId>
        <artifactId>project.b</artifactId>
        <version>${project.b.version}</version>
        <exclusions>
            <exclusion>
                <groupId>project.c</groupId>
                <artifactId>project.c</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>

Why should you prefer inclusion over exclusion? Because it fails fast. To understand why this is important, imagine that the next version of Project A (which previously depended on Project C 1.1), now depends on Project C 1.6. This version of Project C is now the one you want! But if you’ve manually excluded the A –> C dependency, you’ve circumvented Maven’s ability to warn you about version rollbacks along that path. Maven will silently continue to choose the D –> E –> C dependency, Project A will run against an older version than the one it was tested against, and you may get run-time failures. In contrast, if you’ve manually included Project C 1.5, the plugin execution you added above will fail your build, and give you a chance to change project.c.version from 1.5 to 1.6.
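In that scenario, the fix is a one-line change to the property from the inclusion snippet earlier in this section, sketched here with the same placeholder names:

```xml
<properties>
    <!-- Raised from 1.5 after the enforcer failed the build: the new Project A requires C 1.6 -->
    <project.c.version>1.6</project.c.version>
</properties>
```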

When You Have to Exclude

Sometimes adding inclusions won’t fix your problem.

1. When version “numbers” contain letters:
Maven understands the word “SNAPSHOT” but is otherwise ignorant of ad hoc version-naming schemes. If an artifact has a version like “1.2.RELEASE” vs. “1.3.BETA”, Maven has no idea which of these versions came first. The Enforcer Plugin will fail the build if it sees versions like this, and you will have to manually exclude the one you don’t want. (Spring is a major offender in this category.)

2. When multiple dependencies contain classes of the same name:
Some library vendors repackage their own dependencies, rather than just declaring them in a POM file (or in documentation). This makes it impossible to properly manage diamond dependency patterns. The Enforcer Plugin’s ban-duplicate-classes rule will fail your build, which is a good thing: it tells you that your dependency vendors are wicked people, working against your sanity. Which version of these duplicate classes will be loaded? Depends on which copy comes first in each ClassLoader’s class path. Test your code with some paranoia. When you’re satisfied that it’s working, enumerate the offending packages in the “ignoreClasses” element of the Enforcer Plugin.
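As a sketch, the ignoreClasses element of the banDuplicateClasses rule takes one ignoreClass entry per class name, and a trailing wildcard covers a whole package. The class and package names below are hypothetical stand-ins for whatever the rule reports in your own build.

```xml
<banDuplicateClasses>
    <ignoreClasses>
        <!-- Hypothetical: a single class that two of your dependencies both bundle -->
        <ignoreClass>com.example.i18n.Messages</ignoreClass>
        <!-- Hypothetical: a whole package that one vendor repackaged into its own jar -->
        <ignoreClass>com.example.shaded.*</ignoreClass>
    </ignoreClasses>
    <findAllDuplicates>true</findAllDuplicates>
</banDuplicateClasses>
```

Keeping the list as narrow as you can means that a new, unexpected duplicate will still fail the build.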

This general case also has a couple of special cases, which are slightly less egregious:

2a. When one vendor re-implements another’s classes:
Some libraries masquerade as other libraries for functional reasons. For example, if you manage your logging through SLF4J, but you depend on libraries that use Apache Commons Logging, you might be interested in the jcl-over-slf4j library, which redirects any messages logged to the Commons Logging API to SLF4J, so that you can configure them uniformly with your own logs. That library works by reimplementing the Commons Logging API. In order for it to work properly, you will need to manually exclude all copies of Commons Logging that you might be pulling in.
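A minimal sketch of that setup follows. The commons-logging and jcl-over-slf4j coordinates are the published ones. project.a stands in for any dependency that pulls in Commons Logging, and slf4j.version is an assumed property that you would define yourself.

```xml
<dependencies>
    <dependency>
        <groupId>project.a</groupId>
        <artifactId>project.a</artifactId>
        <version>${project.a.version}</version>
        <exclusions>
            <!-- Keep the real Commons Logging off the class path -->
            <exclusion>
                <groupId>commons-logging</groupId>
                <artifactId>commons-logging</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <!-- Reimplements the Commons Logging API and forwards its messages to SLF4J -->
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>jcl-over-slf4j</artifactId>
        <version>${slf4j.version}</version>
    </dependency>
</dependencies>
```

The exclusion has to be repeated on every dependency that brings in Commons Logging, not just the first one you notice.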

2b. When artifacts change names:
Sometimes a library vendor decides that their previous artifact organization is not appropriate, and they’ll change. Spring and Hibernate, for example, have both changed their group and artifact IDs at different points in the past. If you find yourself pulling in versions both before and after the split, Maven won’t have any way to know that they’re actually two versions of the “same” thing. Fortunately, as above, the ban-duplicate-classes rule will fail the build, and you will have the opportunity to resolve the discrepancy manually.

Or Just Use OSGi

OSGi folks have been squirming in their seats thus far. “We don’t have that problem!” they cry. Friends, you have been heard.

In Summary

Until Java has an actual functioning package-management facility, and unless you’re using a multi-class-loader container of some kind, you will have only a single version of any given class in use by your program at any given time. And if you’ve got a diamond-shaped dependency tree, you’ve got to give some thought as to which version that is.

I recommend the following guidelines. My team went through some small pain instituting them a while back, because our dependencies were a tangled mess. Since doing so, things have been much more understandable and manageable, and we’ve had fewer problems.

    1. Prefer newer versions to older ones unless you have a specific reason not to. Libraries are more likely to be backwards compatible than forwards compatible.
    2. Use the Maven Enforcer Plugin to ensure you depend on the versions you think you do. (Hint: without this step, you probably aren’t.)
    3. In case of a version conflict, first manually include the most recent version directly in your POM file.
    4. In certain special cases, inclusion won’t work. You will have to manually exclude the older versions.