Monthly Archives: November 2014

How to Create a Maven Site in Multi-Module Builds (and Publish it on GitHub)

I recently published a small project on GitHub, mostly to play around with the features that GitHub offers over plain git. I came across the GitHub Pages feature and wanted to try creating a site from my multi-module Maven project and having it published automatically on GitHub Pages.

It turns out that building a Maven site is a very fragile process, particularly for multi-module builds. There are countless blog posts and Stack Overflow questions that deal with this problem, and I am adding just one more after spending half a day figuring things out. I just could not get the internal project links to work: either the links worked from the site created by the parent POM to the site created by the child POM, or the other way around. At some point, I got the links working in both directions for the site’s side menu, but not for the links in the content section (e.g. the Project Modules overview page).

The key to the problem lies in how Maven creates the relative URLs in the project. Obviously, the project needs a “distribution management” section for the parent module (aggregator project) of the multi-module build. It also needs a “distribution management” section in each child module for the links to work. The URL in these sections has to be different for each project, otherwise the links in the report are broken. The reason is that some of the relative links in the generated site are computed from the URL in the “distribution management” section.

In addition to the URL in the “distribution management” section, each module must also have a top-level URL parameter, because some of the links are computed relative to the URL from this top-level parameter.

To make things extra tricky, within each module the two URLs (the top-level URL and the URL in the “distribution management” section) must be the same, because otherwise the two algorithms that compute the links produce a site where some of the links work, and others do not.

As an example, I incorrectly used one of the URLs as a pointer to the project’s start page, and the other one as the pointer to the project’s Maven page. They shared the same domain, which probably contributed to the mess: the index.html page that the project-report plugin produces (“project information” -> “about” page -> “project modules” overview section) uses the main project URL to derive links, which all ended up with one “../” too many in the relative URL. The Maven site plugin uses the “distribution management” URL in “mvn site:stage”, which resulted in the same links being correct in the “modules” side menu.


It is crucial to set the project URL correctly on each level, and then reference a module’s project URL again in its distribution management section to overwrite anything that may be inherited from a higher level POM.

The relevant section of the parent POM is set up like so:

<project>
  ...
  <url>http://domain.invalid/root/path/to/project/site/${project.version}/</url>
  ...
  <distributionManagement>
    ...
    <site>
      <id>some-id</id>
      <url>${project.url}</url>
    </site>
    ...
  </distributionManagement>
  ...
</project>

The id in the “site” section may have to match an entry in the ~/.m2/settings.xml file, depending on how the site is being distributed (e.g. automated upload to a webserver). For the setup on GitHub that I am using in this project, this is not required (I decided to copy locally, and then commit to the gh-pages branch – much less brittle, and not a lot of extra work).
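If the site is uploaded to a server that requires credentials, the matching entry in ~/.m2/settings.xml might look like the following minimal sketch. The id “some-id” is taken from the POM above; the username and password are placeholders for illustration, not values from this project.

```xml
<!-- ~/.m2/settings.xml: sketch only, the credentials are placeholders -->
<settings>
  <servers>
    <server>
      <!-- must match the <id> in the POM's <distributionManagement>/<site> -->
      <id>some-id</id>
      <username>deploy-user</username>
      <password>changeme</password>
    </server>
  </servers>
</settings>
```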

Each child POM is set up like so:

<project>
  ...
  <url>http://domain.invalid/root/path/to/project/site/${project.version}/${project.artifactId}/</url>
  ...
  <distributionManagement>
    ...
    <site>
      <id>some-id</id>
      <url>${project.url}</url>
    </site>
    ...
  </distributionManagement>
  ...
</project>

Note that the URL must follow the format shown in the example, and the distribution management section must be present.

The distribution management section in the child POM looks exactly like the one in the parent POM. If it is omitted, however, the distribution management section of the parent is used instead. It took me a while to figure this out, but Maven apparently performs variable expansion first and inheritance second. This means that, if the distribution management section were removed from the child, the child POM would inherit the parent’s distribution management section with the URL already expanded: the child would get the value of the parent’s project URL, and not its own URL that contains the artifact ID. In other words, the URL value in the child POM would be ignored.
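To illustrate the effect, here is a sketch of what the child’s effective POM would contain, under the behavior I observed, if its own distribution management section were removed. The version 1.0 and the artifact ID my-module are hypothetical values chosen for illustration, and the actual result may differ between Maven versions. Running “mvn help:effective-pom” in the child module is one way to check what Maven really resolves.

```xml
<!-- Sketch of the child's effective POM
     (hypothetical version 1.0 and artifactId my-module) -->
<project>
  ...
  <!-- the child's own project URL still contains the artifact ID -->
  <url>http://domain.invalid/root/path/to/project/site/1.0/my-module/</url>
  ...
  <distributionManagement>
    <site>
      <id>some-id</id>
      <!-- inherited from the parent with the parent's URL already expanded -->
      <url>http://domain.invalid/root/path/to/project/site/1.0/</url>
    </site>
  </distributionManagement>
  ...
</project>
```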

The interesting thing is that, according to the Maven Site Plugin documentation, the inheritance should work correctly as long as the multi-module build follows the Maven naming conventions:

  • If subprojects inherit the (distribution) site URL from a parent POM, they will automatically append their artifactId to form their effective deployment location. This goes for both the project url and the url defined in the <distributionManagement> element of the pom.
  • If your multi-module tree does not follow the Maven conventions, or if module directories are named differently than module artifacts, you have to specify the url’s for each child project.

As my project follows the standard naming convention, and removing both the <url> and the <distributionManagement> sections from the child POM leads to erratic behavior with broken links, the “automatic appending” is apparently not working correctly.
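For reference, the convention-based setup that the documentation describes corresponds to a child POM roughly like the sketch below, with no <url> and no distribution management section of its own. The group ID, artifact IDs, and version are hypothetical placeholders. This is the variant that produced broken links for me.

```xml
<!-- Child POM relying on inheritance only (hypothetical coordinates) -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.example</groupId>
    <artifactId>parent-project</artifactId>
    <version>1.0</version>
  </parent>
  <!-- the module directory is also named my-module, following the convention -->
  <artifactId>my-module</artifactId>
  <!-- no <url> and no <distributionManagement>: both would be inherited,
       with the artifact ID appended according to the documentation -->
</project>
```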


See https://github.com/mbeiter/util for an example of how to configure Maven to create the site in a multi-module build as described in this post. You may find the parent POM and the child POM of the DB module particularly interesting.

The BUILD.md file contains instructions on how to perform a release build using this setup. In particular, this includes automatically (and consistently) setting the version for all POM files in the multi-module build, building the site, staging it locally, and then publishing it on GitHub Pages.

Note that it is recommended to always build the project before building the site, as the site goal may or may not report missing dependencies if it does not find them. A possible symptom of this problem is that the Javadoc plugin does not find your classes if “mvn install” is not run before “mvn site”.

I do not use a plugin to automatically publish the site on GitHub, but instead copy it manually from the staging folder in the “target” folder to the appropriate folder in the “gh-pages” branch of the same repository. This requires me to check out the project twice on my machine, once on the main branch, and once on the “gh-pages” branch. I still like this approach better than automating it, because building the site for multi-module builds seems to be very brittle in Maven, and this gives me the opportunity to review that everything has been built correctly (and all the links work) before I publish the site with a git commit. Also, while there are a few plugins that automate this process, such as com.github.github:site-maven-plugin and other plugins that implement wagon for GitHub in Maven, none of them seem to be able to properly deal with multi-module sites at this time.

I also use a small project intro page on GitHub Pages with some basic project information and links to relevant documentation (e.g. the Maven site, and the Javadocs). The page I used there is a GitHub standard template. It looks good enough for my needs, and offers convenient links to download the project source code as a zip file or tarball.

NIST Special Publication 800-160

I have recently been asked a lot about the NIST process mentioned in an earlier post. In a nutshell, NIST was working on a paper describing (software) security as a holistic approach, deeply embedding a security mindset into traditional systems engineering patterns with the goal of building secure and resilient systems, instead of bolting security on in a later stage of the game.

NIST has meanwhile published more current draft materials (http://csrc.nist.gov/publications/drafts/800-160/sp800_160_draft.pdf). At this time, the link points to a draft dated May 2014. The draft was released about six months ago, and NIST 800-160 is starting to gain traction in the industry. I have had numerous inquiries from major HP customers asking about HP’s internal security processes in the context of NIST 800-160, and how HP is proactively dealing with security threats not yet known while the product is being built. The language of the requests, with terms such as “resiliency”, “trustworthiness”, and “build security in”, strongly resembles the language NIST and the Department of Homeland Security have chosen in their publications around software that should be able to not only withstand known attacks, but also be ready to resist new threats or possibly even actively defend against them.

John Diamant is an HP Distinguished Technologist and secure product development strategist at HP. He has done a lot of work on how to design security into a product, and with HP’s Comprehensive Applications Threat Analysis (CATA), John has created a great toolkit to automate parts of the security requirements and threat analysis.

John works a lot with the US public sector, and he certainly sees a lot of the feedback HP receives around the NIST and DHS security initiatives. He has some very interesting comments on how to create secure and resilient software, and how a Secure Software Development Lifecycle (SSDLC) program fits into this: http://www.securitysolutionswatch.com/Interviews/in_Boardroom_HP_Diamant2.html