Sometimes you may need to extract content from a Word document. To do that, you need to be aware of its structure. Extremely simplified, a Word document has the following structure:

  1. At the top level is a list of "parts".
  2. One part is the "main document part", m.
  3. The part m contains some w:p elements, represented in Docx4j as org.docx4j.wml.P objects. Semantically this represents a paragraph.
  4. Each paragraph consists of "runs" of text. These are w:r elements. I think their purpose is to allow groups of characters within a paragraph to have individual styling, roughly like span in HTML.
  5. Each run contains w:t elements, or org.docx4j.wml.Text. These contain the actual text.
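To make that structure concrete without docx4j, here's a minimal sketch using only Python's standard library. It assumes the main document part lives at word/document.xml (the usual location; a proper reader would resolve it through the package relationships in _rels/.rels), and it doesn't distinguish body paragraphs from paragraphs inside tables, headers and so on.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML elements w:p, w:r and w:t.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def paragraph_texts(docx_bytes):
    """Return the concatenated text of each w:p in the main document part.

    Assumption: the main document part is stored at word/document.xml,
    rather than being looked up via the package relationships.
    """
    # The package is a ZIP archive; each "part" is a file inside it.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    texts = []
    for p in root.iter(W + "p"):          # paragraphs
        pieces = []
        for r in p.iter(W + "r"):         # runs inside the paragraph
            for t in r.iter(W + "t"):     # text nodes inside the run
                pieces.append(t.text or "")
        texts.append("".join(pieces))
    return texts


# Usage: build a toy package containing only a main document part.
doc_xml = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", doc_xml)
print(paragraph_texts(buf.getvalue()))  # ['Hello, world', 'Second paragraph']
```

Note how the first paragraph's text is split across two runs; you only get "Hello, world" back by joining the runs of a paragraph.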

Here's how you define a traversal against a Docx file:

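import java.util.List;
import org.docx4j.TraversalUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TraversalCallback extends TraversalUtil.CallbackImpl {
    private static final Logger log = LoggerFactory.getLogger(TraversalCallback.class);

    @Override
    public List<Object> apply(Object o) {
        if (o instanceof org.docx4j.wml.Text) {
            org.docx4j.wml.Text textNode = (org.docx4j.wml.Text) o;
            String textContent = textNode.getValue();
            log.debug("Found a string: " + textContent);
        } else if (o instanceof org.docx4j.wml.Drawing) {
            log.warn("FOUND A DRAWING");
        }
        // The return value is ignored by CallbackImpl.walkJAXBElements().
        return null;
    }

    @Override
    public boolean shouldTraverse(Object o) {
        return true;  // descend into every element
    }
}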

Note that we inherit from TraversalUtil.CallbackImpl. This saves us from implementing the walkJAXBElements() method ourselves, although you might still need to if your algorithm can't be expressed within the scope of the apply method. The superclass implementation of walkJAXBElements appears to ignore the return value of apply, so you can simply return null.
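To illustrate that control flow (apply is called on every node, shouldTraverse decides whether to descend into its children, and apply's return value is discarded), here is a rough Python analogue over ElementTree rather than docx4j's JAXB objects. It's my own sketch of the pattern, not docx4j's code, and the rule of skipping tables is just an illustrative choice.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


class Callback:
    """Rough Python analogue of the apply/shouldTraverse pattern (a sketch)."""

    def apply(self, node):
        """Visit one node. The walker below ignores the return value."""
        return None

    def should_traverse(self, node):
        """Return False to skip this node's children."""
        return True

    def walk(self, parent):
        # Depth-first: visit each child, then descend into it if allowed.
        for child in parent:
            self.apply(child)
            if self.should_traverse(child):
                self.walk(child)


class TextCollector(Callback):
    """Collects w:t text but does not descend into tables (w:tbl)."""

    def __init__(self):
        self.found = []

    def apply(self, node):
        if node.tag == W + "t":
            self.found.append(node.text or "")

    def should_traverse(self, node):
        return node.tag != W + "tbl"


doc = ET.fromstring(
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body>'
    "<w:p><w:r><w:t>kept</w:t></w:r></w:p>"
    "<w:tbl><w:tr><w:tc><w:p><w:r><w:t>skipped</w:t></w:r></w:p></w:tc></w:tr></w:tbl>"
    "</w:body></w:document>"
)
collector = TextCollector()
collector.walk(doc)
print(collector.found)  # ['kept']
```

The table element itself is still passed to apply; only its children are pruned, which is why overriding shouldTraverse is the natural place to skip whole subtrees.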

To bootstrap it from a file, you just do the following:

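// Resources is Guava's com.google.common.io.Resources.
URL theURL = Resources.getResource("classified/lusty.docx");

WordprocessingMLPackage opcPackage = WordprocessingMLPackage.load(theURL.openStream());
MainDocumentPart mainDocumentPart = opcPackage.getMainDocumentPart();

TraversalCallback callback = new TraversalCallback();
// Walk the main document part's content, calling apply() on each element.
new TraversalUtil(mainDocumentPart.getContent(), callback);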

By modifying the apply method, you can special-case each type of possible element from Docx4j: paragraphs, rows, etc.

Posted 2018-01-04

Sometimes you may have a reason to deploy certain code. This normally involves something like the following: you copy some files to a certain server somewhere, and perhaps restart a server. This is all well-known territory, but due to the vagaries of SSH, automating it can often be a pain. There are existing tools for this, Fabric and Capistrano, that are fairly well known but -- it seems to me -- underused. Anyway, they're certainly far from standard, and particularly in the case of Fabric (which I like and use on a near-daily basis, I should point out) they can be tricky to install and configure in their own right.

I devised this simple, perhaps even simplistic, plan to handle deployments.

  1. Create a UNIX user that will be used for deployments. For this article we'll refer to this user as dply, although the name is immaterial. This user must exist on the hosts that are the target of the deployments.

  2. Distribute SSH keys to the hosts that need to initiate deployments. This will often be a worker node in a CI system but people may also manually initiate deployments.

  3. Each deployment target receives an appropriate sudoers file that allows the dply user to execute one command (and one only), the deployment processor, with the NOPASSWD specifier.

  4. The deployment user dply can write to a mode 700 directory that is used to receive deployment artifacts. Artifacts are written by a simple scp process to this directory, /home/dply or whatever you like.

  5. The deployment processor script, which is distributed identically to all the nodes and lives in /usr/local/bin, knows about all existing deployments, which are hardcoded with plaintext aliases like main-site, backend, etc, and knows to look for the artifacts in /home/dply or whatever.

  6. Nodes simply scp up the deployment archive, ssh to the relevant server and invoke sudo /usr/local/bin/deployment-processor backend. The processor then looks for the files in a hardcoded location and does whatever's needed to actually deploy them. Concretely in this case every handler is just a function in Perl which can then do many tasks. The key is that it doesn't get any input from the user, thus mitigating some security issues. It's easy to do the various things you may need to do, untar an archive, perhaps chmod some files, restart a service, etc.
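The processor's shape might look something like the following. The post's real processor is written in Perl; this is a Python sketch of the same idea, and every alias, archive name, target path and service name in it is a hypothetical placeholder. What it illustrates is that the caller supplies nothing but an alias, which only selects one of a fixed set of handlers reading from a hardcoded artifact directory.

```python
"""Sketch of a deployment processor (hypothetical names and paths throughout).

Intended invocation: sudo /usr/local/bin/deployment-processor backend
"""
import subprocess
import sys
import tarfile
from pathlib import Path

ARTIFACT_DIR = Path("/home/dply")  # hardcoded; never taken from the caller


def unpack(archive_name, target_dir):
    """Extract a fixed archive from the artifact directory into a fixed place."""
    with tarfile.open(ARTIFACT_DIR / archive_name) as tar:
        tar.extractall(target_dir)


def deploy_main_site():
    unpack("main-site.tar.gz", "/var/www/main-site")


def deploy_backend():
    unpack("backend.tar.gz", "/srv/backend")
    # Hypothetical service name; assumes a systemd host.
    subprocess.run(["systemctl", "restart", "backend"], check=True)


# The alias is the only input accepted from the caller; it merely
# selects one of these hardcoded handlers.
HANDLERS = {
    "main-site": deploy_main_site,
    "backend": deploy_backend,
}


def main(argv):
    """Run the handler named by argv[1]; return a process exit status."""
    if len(argv) != 2 or argv[1] not in HANDLERS:
        aliases = "|".join(sorted(HANDLERS))
        print(f"usage: deployment-processor {{{aliases}}}", file=sys.stderr)
        return 2
    HANDLERS[argv[1]]()
    return 0
```

Installed as /usr/local/bin/deployment-processor, the script's entry point would just call sys.exit(main(sys.argv)). Unknown aliases and extra arguments are rejected before anything touches the filesystem.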

It's secure in some senses, but not in others. There's no access isolation between nodes, so any node can deploy any code. If a CI worker node is compromised, an attacker can indeed wipe out a production site, but they can't otherwise do damage to the servers themselves (for whatever that's worth...).

It should be noted that no consensus exists around solutions in this space. This approach has some virtues over Fabric (and probably Capistrano too): it is markedly less complicated, and it relies only on the presence of ssh and scp on the client boxes, which are near-universal. If you wanted to formalize it, you could develop cross-platform deployment client binaries in Go or something similar, but I haven't found this necessary. Anecdotally, I've had many unpleasant problems with Fabric, although it remains a very useful piece of software.

I don't like to deploy with Git, because I don't see Git as related to deployments: Git is about source code history, which is distinct from what I would consider a "release artifact". FWIW, release artifacts are also built using a separate processing step, which (for me) is often just a "narrowing" of the file tree according to a set of rsync patterns, followed by tarring up this narrowed tree.
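A rough Python sketch of that narrowing step, assuming a hypothetical list of exclude patterns. Python's fnmatch is only an approximation of rsync's filter rules (for one thing, its * also matches /), so treat this as an illustration of the idea rather than a replacement for an rsync-based build.

```python
import fnmatch
import tarfile
from pathlib import Path

# Hypothetical exclude patterns standing in for an rsync filter list.
# Note: fnmatch's "*" also matches "/", unlike a single "*" in rsync.
EXCLUDES = [".git/*", "*.pyc", "tests/*", "node_modules/*"]


def narrowed_files(root):
    """Yield (path, relative name) for every file not matched by EXCLUDES."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        if path.is_file() and not any(fnmatch.fnmatch(rel, pat) for pat in EXCLUDES):
            yield path, rel


def build_artifact(root, archive_path):
    """Tar up the narrowed tree as a gzipped release artifact."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path, rel in narrowed_files(root):
            tar.add(path, arcname=rel)
```

For example, build_artifact("src", "release.tar.gz") packs src/app.py and src/lib/util.py but leaves out compiled files, tests and the .git directory.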

Heroku also have an approach that involves creating "slugs" and "releases" where each release corresponds to a deployment, and "to create a release" is synonymous with "to deploy". This is much more featureful than the above approach but it's over-engineered for this case.

There's also WAR deployment which is interesting but specific to a rather small area of Java development. If you're a Java-only shop, this can probably be nice.

Something that was also on my radar in my department is the Perl-based Rex, which I never got the chance to investigate.

Posted 2017-12-09

I'm doing a project in C++ at present and have mixed feelings about it. One of the worst things about using C++ is the necessity of coming into contact with CMake, which is the bane of my existence. When I used to work on ProjectM, I wrestled with this system and ended up hating it. So now that I'm starting a fresh C++ project, I'm using the less popular (but far more ergonomic) SCons.

Like many C++ projects, GoogleMock has a bias towards CMake. So if you want to use SCons to build instead, here is a tiny SConstruct that you can use.

googletest_framework_root = "/home/amoe/vcs/googletest"

googletest_include_paths = [
    googletest_framework_root + "/googletest",
    googletest_framework_root + "/googletest/include",
    googletest_framework_root + "/googlemock",
    googletest_framework_root + "/googlemock/include"

gtest_all_path = googletest_framework_root + "/googletest/src/"
gmock_all_path = googletest_framework_root + "/googlemock/src/"

env = Environment(CPPPATH=googletest_include_paths)

env.Program(
    # The first entry should be your test driver's source file.
    source=["", gtest_all_path, gmock_all_path],
)

The first entry in the source list is your test driver, which can be a regular one such as:

#include <gmock/gmock.h>

int main(int argc, char **argv) {
    testing::InitGoogleMock(&argc, argv);
    return RUN_ALL_TESTS();
}

You'll need to find some answer to actually getting the source of the test framework into the build filesystem -- Google Test doesn't build as a system library. That could be git submodules, a download script, or just dumping the repository inside yours.

Posted 2017-11-30

This is based on Camilla Panjabi's recipe. The only variations were not using any cloves (which she mentions in the recipe but not in the ingredients list -- a possible erratum?) and using pre-cooked lamb. I got the lamb from the butcher, a large leg joint on the bone. I stewed the entire joint for an hour and a half in a large pot, with some curry powder & balti masala for flavouring. I presume this didn't contribute much to the flavour of the dish itself, but since I plan to reuse the stock, I thought I may as well infuse something into it. The meat slid off the bone rather easily after that, with small pinker patches inside after cutting.

This curry has the singular innovation of creating the cumin-flavoured potatoes first: you sauté a big handful of cumin seeds and shallow-fry whole small potatoes to give them a crispy skin. It looks very attractive when finished. The cumin clings to the outside of the potato. Then later you submerge these in curry liquid and boil them for about 10 minutes. This gives whole potatoes that are still firm to the palate. I also used large chunky sea salt on these potatoes.

The rest of it is rather standard. I didn't use a curry base for this one because I had run out, so the onions are reduced from scratch. The first thing I noticed is how long it takes to get the onions to the correct colour: nearly a whole hour. That's the real benefit of using the base, IMO: the time saved. There's probably not any large flavour benefit from a curry base; perhaps there's even a flavour deficit.

This one strangely has garam masala formed into a paste and added relatively early in cooking, which is somewhat of a departure.

I look forward to using this meat & potatoes pattern in the future; potatoes are great cupboard stock because they're cheap and last for ages. When you can boost the bulk and variation of a curry by this addition, everybody wins.

Posted 2017-11-19

When using CentOS 6 as a container under a Debian Sid host, you may face the following problem.

amoe@inktvis $ sudo lxc-create -n mycontainer -t centos
Host CPE ID from /etc/os-release: 
This is not a CentOS or Redhat host and release is missing, defaulting to 6 use -R|--release to specify release
Checking cache download in /var/cache/lxc/centos/x86_64/6/rootfs ... 
Downloading CentOS minimal ...

You have enabled checking of packages via GPG keys. This is a good thing. 
However, you do not have any GPG public keys installed. You need to download
the keys for packages you wish to install and install them.
You can do that by running the command:
    rpm --import public.gpg.key

Alternatively you can specify the url to the key you would like to use
for a repository in the 'gpgkey' option in a repository section and yum 
will install it for you.

For more information contact your distribution or package provider.

Problem repository: base
/usr/share/lxc/templates/lxc-centos: line 405: 24156 Segmentation fault      (core dumped) chroot $INSTALL_ROOT rpm --quiet -q yum 2> /dev/null
Reinstalling packages ...
mkdir: cannot create directory ‘/var/cache/lxc/centos/x86_64/6/partial/etc/yum.repos.disabled’: File exists
mv: cannot stat '/var/cache/lxc/centos/x86_64/6/partial/etc/yum.repos.d/*.repo': No such file or directory
mknod: /var/cache/lxc/centos/x86_64/6/partial//var/cache/lxc/centos/x86_64/6/partial/dev/null: File exists
mknod: /var/cache/lxc/centos/x86_64/6/partial//var/cache/lxc/centos/x86_64/6/partial/dev/urandom: File exists
/usr/share/lxc/templates/lxc-centos: line 405: 24168 Segmentation fault      (core dumped) chroot $INSTALL_ROOT $YUM0 install $PKG_LIST
Failed to download the rootfs, aborting.
Failed to download 'CentOS base'
failed to install CentOS
lxc-create: lxccontainer.c: create_run_template: 1427 container creation template for mycontainer failed
lxc-create: tools/lxc_create.c: main: 326 Error creating container mycontainer

This is due to vsyscall changes in recent kernels. To get this working, you need to add the vsyscall=emulate parameter to your kernel command line (to be perfectly specific, the command line of the host, because containers share its kernel). To do this, you can add the parameter to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, run update-grub, and reboot.

Posted 2017-11-03

Update 2017-11-13: Now you need to use ca-cert=/etc/ssl/certs/QuoVadis_Root_CA_2_G3.pem. I suppose they changed the certificate.

Here's a NetworkManager connection for Eduroam. This can live under /etc/NetworkManager/system-connections. Fill in your MAC address, your username and password, plus a unique UUID.






Posted 2017-11-02

Gear Description & Review, Nov 2017

I have a lot of gear configured in a very specific setup that has been that way for going on 2 years now. We can call this "configuration 2". Before configuration 2, my setup was a Kaoss Pad KP3+, the MS-20 and the ESX1 sampler. I eventually found this limiting because of the sampler's sequencing capabilities. You can arrange songs and record effect sequences, but triggering chords on the synth is very limited, and getting seamless looping of chords for sampled pads is nigh-on impossible using the ESX1.

The current setup is:

  • Output: Behringer Xenyx ZB319 mixer
  • Path 1: Juno 106 → Eventide Space → mixer ch1
  • Path 2: MS-20 mini → Tonebone Hot British tube distortion → Moog Clusterflux → mixer ch2
  • Path 3: Nord Lead → Mojo Hand Colossus fuzz → Strymon Timeline → mixer ch3
  • Path 4: Korg ESX1 with replacement JJ ECC803 vacuum tubes (IIRC)

All driven by Sequentix Cirklon. Space & Timeline have MIDI inputs which can be driven from the Cirklon as well. The ESX1 has an audio in port which enables its use as an effects unit. It can receive a pre-mixed copy and apply effects using the Audio In part on the step sequencer. (Although this can easily create a feedback loop, leading to some fairly painful squealing.)

All gear is plugged into surge-protected adapters and connected with shielded cables. This is key: there was a lot of noise before I did this. It's worth noting that there's still a lot of noise with this setup.

  1. The MS-20 is noisy: not insanely so, but noticeably.
  2. The Juno chorus is noisy. The rest is fine.
  3. The Tonebone & Colossus are insanely noisy, but you'd expect that given that they are fuzz pedals.
  4. The ESX1 is noisy (although less noisy than before the tube replacement.)

The expensive pedals, Space, Timeline and Clusterflux are very clean-sounding.

Path 1: 106 → Space

The Space is a bit of an enigma. It seems to be capable of a huge variety of sounds, but it's a pain in the arse to use. The BlackHole and Shimmer algorithms sound huge and are perfect for the type of music that I do. ModEchoVerb is also extremely useful. The trouble is that I tend to use the 106 for pad sounds, and in this case the 106's legendary chorus is somewhat disrupted by the Space. It's difficult to get sounds that are "in the middle": either the effects from the Space are barely noticeable (although I don't use headphones), or they completely swamp the character of the input (for instance, when using Shimmer and BlackHole, the 106's setting is nigh-on irrelevant).

I am considering moving the Space to the MS-20.

Path 2: MS-20 → Hot British → Clusterflux

The original reason for this setup is that I know from experience that the MS-20 excels at raw and cutting sounds; it's quite an aggressive-sounding synth. The Hot British combination is definitely great. To be honest, it would probably sound great through nearly any distortion, so I'm not totally sure that this takes advantage of the high-end quality of the HB. And because the MS-20 is a monosynth, I tend to use it less. Given that the HB is basically an amp in a box, as I understand it, it would probably be more suited to a polysynth, where you can imagine dialing in giant stoner chords. Regardless, if you feed an arpeggio into the MS-20 and crank the resonance/cutoff, you're in acid heaven. I might replace it with some type of MIDI-controllable distortion. What I really want for this is a really quiet digital distortion (or a noise gate, I guess).

The Clusterflux -- I just don't get on with it. I'm not sure why, because I love chorus and phasing. It would be very neat for some nice phased acid lead, but I don't find it that useful for what I do. It does give a fatter chorus sound that goes nicely with the MS-20's dual-VCO mode (one of the most notorious "basic settings" I've heard). The phasing sounds really great but I hardly ever use it; it's just too extreme a sound.

Incidentally I've found that the best route with the MS-20 is to take the time initially to find a nice sound and design the rest of the piece around it. It doesn't work very well the other way around. It's very difficult to find a matching sound to an existing composition. You don't even need to use it for melody, what can be nice is to just design little "hits" and articulations on the beat.

Path 3: Nord → Colossus → Timeline

This path works the best. The Nord is designed for leads, hence the name, and is a treble-heavy synth. The Colossus transforms everything into trancy acid, and the Timeline -- well, I barely know how to program the Timeline; I just use presets and they sound gorgeous. Some of the Timeline's presets are a touch subtle: you need to have "space" in the sound to be able to detect them. Just the initial preset, Melt Away, will blow you away when you hear it; it makes everything sound gorgeous. All the other presets sound gorgeous too, and there's a looping feature which I never use, but it's nice to know it's there. Basically everything that comes out of this path sounds good. The Colossus is pretty tweakable as well -- perhaps a more restricted range of tones than the HB, but a much more practical range.

Future directions: I'd like to integrate the KP3 back into the setup. The ESX1 I'd like to replace with another, more basic sampler -- it's got too much functionality for what I use it for now. Search for "super basic sampler" for some suggestions. Reverb and delay on drums can sound nice, but I don't want to use the Eventide for those moments. I think the reverb could sound good on the MS-20. The problem with the MS-20 is that the sound can be a little bit sparse. I also want to replace the MS-20 with a module. But the distorted MS-20 is a key sound so maybe put a nova drive on there? And put the Hot British onto the 106. This can sharpen up the tones of the 106 and stop it sounding so 80s. Plus we can get a low level output signal from the 106 to stop the HB blowing up.

The Clusterflux -- well, the 106 doesn't need it. The MS-20 wouldn't need it after you added the Space. All in all I may just sell it. Possibly if I saved some space I could add another synth and put it through that -- that would be a Waldorf FM synth -- but then we are starting to run out of connections for the Cirklon. I'd say that the Clusterflux is focused on a sort of guitar-specific sound, a kind of funky syncopated sound which we don't really use with the MS-20. It probably sounds better on something that's being played "live", where you're manually varying the note gate times to sync with the FX. I also want to replace the 106 with a 106 module and the MS-20 with an MS-20 module. I rarely play the MS-20 keyboard because the keyboard is too small. Full size keyboards are quite underrated. But it is kind of useful having two keyboards when you are working with someone else. Probably it's worth trying the Clusterflux on both the 106 and the Lead to see what sounds better.

Posted 2017-10-29

Org-Mode with Capture and Xmonad

This is a useful configuration for a distraction-free note-taking system. Often you want to quickly note something but don't want to break your flow too much.
The result will be a single key-combination that will pop a frame, allow you to write in a task, and have it be written to your default org-mode file.

Nearly all of the meat of this configuration comes from Phil Windley's blog post. I assume you already have experience with org-mode.

;; org-capture code for popping a frame, from Phil Windley

;; We configure a very simple default template
(setq org-capture-templates
      '(("t" "Todo" entry (file "")  ; use the default notes files
         "* TODO %? %i %t")))

(defadvice org-capture-finalize 
    (after delete-capture-frame activate)  
  "Advise capture-finalize to close the frame"  
  (if (equal "capture" (frame-parameter nil 'name))
      (delete-frame)))

(defadvice org-capture-destroy 
    (after delete-capture-frame activate)  
  "Advise capture-destroy to close the frame"  
  (if (equal "capture" (frame-parameter nil 'name))
      (delete-frame)))

;; make the frame contain a single window. by default org-capture  
;; splits the window.  
(add-hook 'org-capture-mode-hook
          'delete-other-windows)

(defun make-capture-frame ()  
  "Create a new frame and run org-capture."  
  (make-frame '((name . "capture") 
                (width . 120) 
                (height . 15)))  
  (select-frame-by-name "capture") 
  (setq word-wrap 1)
  (setq truncate-lines nil)
  ;; Using the second argument to org-capture, we bypass interactive selection
  ;; and use the existing template defined above.
  (org-capture nil "t"))

Now we add the binding to your (hopefully existing) call to the xmonad additionalKeysP function, which uses emacs-style notation for the keybindings.

import XMonad
import XMonad.Config.Desktop (desktopConfig)
import XMonad.Util.EZConfig (additionalKeysP)

popCaptureFrameCommand = "emacsclient -n -e '(make-capture-frame)'"

myConfig = desktopConfig
       -- your existing config here
    `additionalKeysP` [ ("M-o c", spawn popCaptureFrameCommand)
                      -- , probably more existing bindings...
                      ]

Now you can type M-o c to be popped into a capture buffer; C-c C-c will then save and file the entry and close the frame. It'll appear as a top-level heading in the file. You can change the template definition if you are more OCD-minded, but I find that this simplistic configuration works and stays out of my way.

Posted 2017-10-29

This is now reaching the height of some ridiculousness, but this was made with a base formed through the following process:

  1. Forming an oil-based marinade with the Sriracha rub mentioned in my earlier post about the kaeng pa.
  2. Taking the remnants of the marinade which weren't absorbed by the tofu.
  3. Making a chicken stock from the carcass of a whole chicken that was left over from making Sri Owen's ayam bakar, about which more later.
  4. Making the ayam bakar also leaves scorched chicken skin, which melts into a mixture of rendered fat and unidentified black pieces.
  5. Combine the stock with the Sriracha marinade and the chicken skin.
  6. Raise to a rolling boil for 10 minutes for safety.
  7. Re-blend the entire mixture until smooth.
  8. Let the mixture cool, the fat will rise to the surface.
  9. Skim off all the fat, you should be able to just use a teaspoon. It will form a kind of foamy mass. You won't get rid of all of it, though.
  10. Pass the mixture through a fat separator. This will get rid of any clumps and hopefully remove remaining fat.

First, cook the veggies by sautéing them for around 5 minutes. It's good to use shallots, though you don't need any spices. Don't overcook the vegetables.

You'll now have a rather concentrated stock with a slightly bitter flavour. To turn it into something suitable for soup, simply dilute it with an equal amount of water (50% stock, 50% water). Then bring it to the boil. Once boiling, add the noodles and cook them until al dente. Don't add salt.

Wait for the stock & noodle mix to cool, now add the veggies.

Post script: It's actually better not to mix the noodles with the soup until you're ready to eat them, because they can absorb too much liquid and end up mushy. You can save a separate container of the stock for boiling the noodles, or you can just do them with water. But keep them separated if you want to freeze the finished soup.

Posted 2017-10-17

In Emerick, Carper & Grand's 2012 O'Reilly book, Clojure Programming, they give an example of using Java interop to create JAX-RS services using a Grizzly server. These examples are now outdated and don't work with recent versions of Jersey.

Here's an updated version that is working correctly, at least for this tiny example.


(ns cemerick-cp.jaxrs-application
  (:gen-class :name cemerick_cp.MyApplication
  (:import [java.util HashSet])
  (:require [cemerick-cp.jaxrs-annotations]))

(defn- -getClasses [this]
  (doto (HashSet.)
    (.add  cemerick_cp.jaxrs_annotations.GreetingResource)))


(ns cemerick-cp.jaxrs-annotations
  (:import [ Path PathParam Produces GET]))

(definterface Greeting
  (greet [^String visitor-name]))

(deftype ^{Path "/greet/{visitorname}"} GreetingResource []
  (^{GET true Produces ["text/plain"]} greet        ; annotated method
   [this ^{PathParam "visitorname"} visitor-name]   ; annotated method argument
   (format "Hello, %s!" visitor-name)))


(ns cemerick-cp.jaxrs-server
  (:import [org.glassfish.jersey.grizzly2.servlet GrizzlyWebContainerFactory]))

(def myserver (atom nil))

(def properties
  {"" "cemerick_cp.MyApplication"})

(defn start-server []
  (reset! myserver (GrizzlyWebContainerFactory/create "http://localhost:8080/" properties)))

This uses the following Leiningen coordinates to run.

[org.glassfish.jersey.containers/jersey-container-grizzly2-servlet "2.26"]
[org.glassfish.jersey.inject/jersey-hk2 "2.26"]]

You probably also need to AOT some of these namespaces, I used :aot :all for this example.

Posted 2017-10-14

This blog is powered by coffee and ikiwiki.