Routing requests through nginx by querystring

I’m writing this up as much for my own future reference as I am to help others, since I found the documentation a bit lacking in this area.

For reasons not worth getting into, I needed to build a proof-of-concept to transparently proxy requests to different servers based on a querystring value, using nginx as a reverse proxy.

The TL;DR solution:

nginx.conf

```nginx
worker_processes 2;

events { worker_connections 1024; }

http {
    upstream server1 {
        server web1:8080;
    }

    upstream server2 {
        server web2:8080;
    }

    map $arg_queryval $node {
        default    server2;
        "abcd1234" server1;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://$node;
        }
    }
}
```

This transparently routes any request hitting the proxy with a querystring argument of queryval=abcd1234 to server 1; all other requests go to server 2. The proxied requests automatically carry along all of the original headers and the full querystring, too.
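The moving part here is nginx's map directive combined with its built-in `$arg_<name>` variables: every querystring parameter is automatically exposed as `$arg_` followed by the parameter name. As a hypothetical variant, routing on a `tenant` parameter instead would look like:

```nginx
# Hypothetical variant: route on ?tenant=... instead of ?queryval=...
# $arg_tenant is nginx's built-in variable for the "tenant" querystring argument.
map $arg_tenant $node {
    default server2;
    "acme"  server1;
}
```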

This was a pain in the butt to figure out, and I ended up having to call on a friend more experienced than I am with nginx for the final critical detail that got this working – basically, each server (or collection of servers, since you could theoretically load balance across them) needs to be in its own upstream block.

Testing this was interesting. The straightforward way to do it was to set up a quick test environment using Docker and Docker Compose, which are tools I’ve been interested in but haven’t really worked with yet. Fortunately, there’s a lot of good information out there on how to set up these sorts of environments. Particularly helpful for me was Anand Mani Sankar’s article covering sample Docker Compose workflows using nginx and node.js, specifically.

So let’s delve into this a little! First we have our Express app. Nothing complicated, I just want something that traps all GET and POST requests and dumps the querystring, body, and headers to the console for inspection. There’s some repetition in this code, but that’s fine for quick-and-dirty.

index.js

```js
const express = require("express");
const bodyParser = require("body-parser");
const os = require("os");

const app = express();
const PORT = 8080;
const HOST = "0.0.0.0";

app.use(bodyParser.json());

app.post("*", (req, res) => {
  const dirOpts = { showHidden: true, depth: null };
  console.dir(req.body, dirOpts);
  console.dir(req.headers, dirOpts);
  console.dir(req.query, dirOpts);
  res.send("thanks");
});

app.get("*", (req, res) => {
  console.log(process.env.worker_name);
  res.send("it worked!");
});

app.listen(PORT, HOST, () => console.log(`app listening on ${HOST}:${PORT}`));
```

Pretty standard. Nothing of note in package.json other than that you want to have one, and the way I set it up, you do want to have a script for starting the app:

package.json

```json
"scripts": {
  "start": "node index.js"
}
```

That’ll do for both of our server nodes. Now we need to Dockerize it, which is pretty straightforward.

Dockerfile (node)

```dockerfile
FROM node:9.2.0
WORKDIR /usr/src/app
COPY package.json .
RUN npm install
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
```

Use the current (at the time of writing) version of node; copy package.json first so the npm install layer can be cached when only app code changes; copy in the app; expose a port; run the app.

Now, nginx. We already have the config file above, so let’s just deal with its Dockerfile.

Dockerfile (nginx)

```dockerfile
FROM nginx:latest
RUN rm /etc/nginx/conf.d/default.conf
COPY nginx.conf /etc/nginx/nginx.conf
WORKDIR /etc/nginx
CMD ["nginx", "-g", "daemon off;"]
EXPOSE 80
```

Use the latest nginx image, remove the default config, copy in our config, run it. Also pretty straightforward; I'm not sure removing the default config is strictly necessary here. One thing that was a pain was nginx 'completing' successfully and terminating when running Docker Compose interactively – by default nginx daemonizes, and the container exits when its foreground process does. Running it with `daemon off;` keeps nginx in the foreground. This was fine since I didn't want to run Docker Compose in detached mode – I wanted to see the logs immediately in the console. Probably not the ideal production configuration, but for testing, just fine.

Finally, we need to put all the pieces together with a Docker Compose file.

docker-compose.yml

```yaml
version: '3'
services:
  load_bal:
    build: ./nginx
    ports:
      - "8080:80"
    links:
      - web1:web1
      - web2:web2
    depends_on:
      - web1
      - web2
  web1:
    build: ./node1
    ports:
      - "8080"
  web2:
    build: ./node1
    ports:
      - "8080"
```

Pretty straightforward. You can see we’re creating two instances of the node app. They’re identical, but one will handle all the requests that have the specified querystring argument, and we’ll be able to see this in the console logs while docker-compose is running. We also create names for the two worker services and link them to the nginx (load_bal) container, which creates internal hostnames for them (you can see nginx.conf is referring to them as well). Finally, we map port 8080 on the host machine to port 80 in the nginx service.

You might also note that web1 and web2 expose port 8080, whereas the Dockerfile for the node app exposes port 3000. That's partly because I changed the app's port after writing the node app and its Dockerfile, but before creating the compose file. It doesn't matter much in practice: the ports entries in the compose file publish the container port directly, and EXPOSE in a Dockerfile is essentially documentation rather than an enforced restriction.

And the results:

```
web2_1     | { foo: 'bar' }
web2_1     | { host: 'server2',
web2_1     |   connection: 'close',
web2_1     |   'content-length': '13',
web2_1     |   'content-type': 'application/json; charset=utf-8',
web2_1     |   'user-agent': 'Paw/3.1.5 (Macintosh; OS X/10.13.1) GCDHTTPRequest' }
web2_1     | { queryval: 'abcd' }
load_bal_1 | 172.18.0.1 - - [16/Nov/2017:21:32:54 +0000] "POST /?queryval=abcd HTTP/1.1" 200 6 "-" "Paw/3.1.5 (Macintosh; OS X/10.13.1) GCDHTTPRequest"
web1_1     | { foo: 'bar' }
web1_1     | { host: 'server1',
web1_1     |   connection: 'close',
web1_1     |   'content-length': '13',
web1_1     |   'content-type': 'application/json; charset=utf-8',
web1_1     |   'user-agent': 'Paw/3.1.5 (Macintosh; OS X/10.13.1) GCDHTTPRequest' }
web1_1     | { queryval: 'abcd1234' }
load_bal_1 | 172.18.0.1 - - [16/Nov/2017:21:33:05 +0000] "POST /?queryval=abcd1234 HTTP/1.1" 200 6 "-" "Paw/3.1.5 (Macintosh; OS X/10.13.1) GCDHTTPRequest"
```

As you can see from the logs, anything with the specified queryval is getting routed to web1; everything else goes to web2.

This really isn’t a production configuration in itself (of nginx or Docker), but for spiking out a quick proof of concept, it was pretty awesome. Setting up a fleet of virtual machines on my laptop would’ve taken way longer.

Installing Mono 5.2 on Ubuntu 17.04

Just a quick post to fill a gap on the Internet that was frustrating me, and which hopefully someone will find helpful as well. I don’t customarily work in Linux these days, but needed to set up a small Ubuntu 17.04 VM for some stuff I’m testing with F#. Since I wanted that fully working, I was trying to install Mono based on the download page instructions and was dismayed to find there were no instructions there for Ubuntu 17.04, and extensive Googling didn’t turn up anything.

After asking briefly in the Mono gitter.im chat - where, I'll add, folks were super helpful with a slightly stupid question - they informed me that everything past 16.04 should, currently, be able to use the 16.04 repositories and install instructions. I tested this out, and they were right.

So, if you’ve been looking all over the place trying to figure that one out… there you go. Note that this may or may not change when Ubuntu 17.10 drops later this month.

Org Mode Rendering Test

I generally enjoy using Emacs, but have been mainly using Visual Studio Code for blogging since it supports syntax highlighting in fenced markdown code blocks without making me jump through annoying hoops. That said, I find certain editor tasks easier when I’m using Emacs, particularly with my evil-mode keybindings and extensions, not to mention being able to use magit for managing the repository. Accordingly, trying to figure out a way to get this working in Emacs has been something I’d been poking at off and on.

There’s mmm-mode but I found it more than a bit flaky and difficult to use.

After doing some digging, I found there’s an org-mode renderer for Hexo, the blog engine I use to generate this site. There’s also a really helpful post on getting everything set up, and that’s what I used for the model. Works great, correctly renders and highlights syntax blocks in org-mode, and lets me use the keybindings and git client I prefer… so I guess I’ll try punching out a few posts and see how I like working with it. The only downside I’ve found is I can’t seem to specify a filename for source code blocks, but it’s a limitation I can live with for now.

Update: I ended up ripping this out, for various reasons. Mostly that it’s easier to get things formatted the way I want in Markdown using Visual Studio Code, and the Markdown fenced code blocks functionality in Hexo allows you to specify filenames for a code block as well. Sometimes it’s the little things. :)

Flattening a List

Getting back to writing articles after spending the better part of a month fighting off a sinus infection and helping my wife get over a nasty cold. Normally I love northeast Ohio, but I’m so over winter right now.

I read a post a month or so ago asking why it's so difficult for programmers to write code to flatten a list… so naturally, this got me thinking about it and I wanted to tackle it. To start with, let's try it in JavaScript.

For this sort of question, I actually think a dynamic language like JavaScript or Ruby is probably the ideal choice, and the reason for that is hinted at in the post when he remarks that:

> Candidates fail to write proper method signatures. They get confused about what type of list they should use. Some start with List of integers List<Integer> ints. They fail to see how they will store a List<Integer> to a List<Integer>.

Since JavaScript lets you stuff pretty much anything into an array—for better or sometimes worse, as the case may be—the problem is much more straightforward to approach there.

He asks candidates to write test cases. Let’s start with a simple stub file and a test—I’m using Mocha here.

flatten.js

```js
function flatten(inputArray) {
  return inputArray;
}

module.exports.flatten = flatten;
```

flatten_test.js

```js
var assert = require('assert');
var f = require('./flatten');

describe("Flatten", () => {
  it("should return a flattened array when passed a nested array", () => {
    var testArray = [1, [2, 3], [4, [5, 6]]];
    var result = f.flatten(testArray);
    assert.equal(result, [1, 2, 3, 4, 5, 6]);
  });
});

// [~/source/js_tests]$ mocha flatten_test.js -R list
//
//   1) Flatten Should return a flattened array when passed a nested array:
//
//      AssertionError: [ 1, [ 2, 3 ], [ 4, [ 5, 6 ] ] ] == [ 1, 2, 3, 4, 5, 6 ]
```

Of course, since we’re just throwing the original array right back out, it immediately bombs. Not so useful! But on the plus side, this gives us a nice test harness for executing the file, so let’s dig into this further. I’m going to it.skip() our first test here to reduce test clutter while we work our way back into this. First, let’s make sure we’re handling a couple of straightforward base cases correctly, such as single-element arrays that don’t contain a sub-array:

flatten.js

```js
function flatten(inputArray) {
  var result = [];
  if (inputArray instanceof Array) {
    if (inputArray.length === 1 && !(inputArray[0] instanceof Array)) {
      // Nothing further to do. Return the original array.
      return inputArray;
    }
  }
  return result;
}
```

flatten_test.js

```js
it("should return the original array if it's a single element with no nested arrays", () => {
  var testArray = [1];
  var result = f.flatten(testArray);
  assert.equal(testArray, result);
});

// [~/source/js_tests]$ mocha flatten_test.js
//
//   Flatten
//     ✓ should return the original array if it's a single element with no nested arrays
//     - should return a flattened array when passed a nested array
//
//   1 passing (8ms)
//   1 pending
```

Okay. Now let’s try a multi-element array with no sub-arrays:

flatten.js

```js
// R is Ramda: var R = require('ramda') at the top of the file.
if (inputArray instanceof Array) {
  if (inputArray.length === 1 && !(inputArray[0] instanceof Array)) {
    // Nothing further to do. Return the original array.
    return inputArray;
  }
  var processArrayItem = (item) => {
    result.push(item);
  };
  R.forEach(processArrayItem, inputArray);
}
```
flatten_test.js

```js
it("should return a two element array with no sub-arrays", () => {
  var testArray = [1, 2];
  var result = f.flatten(testArray);
  assert.equal(testArray, result);
});

// [~/source/js_tests]$ mocha flatten_test.js
//
//   Flatten
//     ✓ should return the original array if it's a single element with no nested arrays
//     1) should return a two element array with no sub-arrays
//     - should return a flattened array when passed a nested array
//
//   1 passing (8ms)
//   1 pending
//   1 failing
//
//   1) Flatten should return a two element array with no sub-arrays:
//
//      AssertionError: [ 1, 2 ] == [ 1, 2 ]
```

Alrighty then… wait, what? Why is it saying that [1, 2] doesn’t equal [1, 2]? Because assert.equal compares with ==, and two distinct array objects are never == even when their contents match – object comparison in JavaScript is by reference. assert.deepEqual compares the elements recursively instead, which is what we actually want here. (The single-element test passed with equal only because our stub returned the original array object, so both sides were literally the same reference.)

Change the assertion and the tests pass:

flatten_test.js

```js
assert.deepEqual(testArray, result);

// [~/source/js_tests]$ mocha flatten_test.js
//
//   Flatten
//     ✓ should return the original array if it's a single element with no nested arrays
//     ✓ should return a two element array with no sub-arrays
//     - should return a flattened array when passed a nested array
//
//   2 passing (10ms)
//   1 pending
```

Moving on, now we should be able to focus all our attention on the processArrayItem function since that’s going to be doing most of the rest of the work. Currently, it naively pushes each array item into the result array, assuming that it’s a single element and not a sub-array. Clearly not what we want here so let’s see what we can do about that. We’ll add a test and then write some code to see if we can get it to go green.

flatten_test.js

```js
it("should flatten simple sub-arrays", () => {
  var testArray = [1, [2, 3]];
  var result = f.flatten(testArray);
  assert.deepEqual(result, [1, 2, 3]);
});
```
flatten.js

```js
var processArrayItem = (item) => {
  if (item instanceof Array) {
    var res = flatten(item);
    console.log('flattened subarray:', res);
    result.push(res);
  } else {
    result.push(item);
  }
};

// [~/source/js_tests]$ mocha flatten_test.js
//
//   Flatten
//     ✓ should return the original array if it's a single element with no nested arrays
//     ✓ should return a two element array with no sub-arrays
// flattened subarray: [ 2, 3 ]
//     1) should flatten simple sub-arrays
//     - should return a flattened array when passed a nested array
//
//   2 passing (10ms)
//   1 pending
//   1 failing
//
//   1) Flatten should flatten simple sub-arrays:
//
//      AssertionError: [ 1, [ 2, 3 ] ] deepEqual [ 1, 2, 3 ]
//      + expected - actual
```

So yeah, that didn’t quite work: the recursive call worked perfectly, but it returned its results as a single array. I anticipated this and added a bit of console.log output; you can see we got back an array that was pushed into the results array as one element, which puts us right back where we started.

So… how do we attack this? Since we’re already using Ramda, we could just use the concat method for this:

flatten.js

```js
var processArrayItem = (item) => {
  if (item instanceof Array) {
    result = R.concat(result, flatten(item));
  } else {
    result.push(item);
  }
};

// [~/source/js_tests]$ mocha flatten_test.js
//
//   Flatten
//     ✓ should return the original array if it's a single element with no nested arrays
//     ✓ should return a two element array with no sub-arrays
//     ✓ should flatten simple sub-arrays
//     - should return a flattened array when passed a nested array
//
//   3 passing (7ms)
//   1 pending
```

That works! Of course, if we’re using Ramda… we could’ve also just called R.flatten on this and then called it a day. :)

I’m also looking at my original flatten function and realizing it’s a bit redundant. We’re flattening an array; of course it’s going to take array objects! So… we don’t actually need to have everything inside a nested “is this an array?” check. Let’s remove that and replace it with a simple guard clause.

flatten.js

```js
var R = require('ramda');

function flatten(inputArray) {
  var result = [];
  if (!inputArray instanceof Array) {
    return result.push(inputArray);
  }
  if (inputArray.length === 1 && !(inputArray[0] instanceof Array)) {
    // Nothing further to do. Return the original array.
    return inputArray;
  }
  var processArrayItem = (item) => {
    if (item instanceof Array) {
      result = R.concat(result, flatten(item));
    } else {
      result.push(item);
    }
  };
  R.forEach(processArrayItem, inputArray);
  return result;
}

module.exports.flatten = flatten;

// [~/source/js_tests]$ mocha flatten_test.js
//
//   Flatten
//     ✓ should return the original array if it's a single element with no nested arrays
//     ✓ should return a two element array with no sub-arrays
//     ✓ should flatten simple sub-arrays
//     - should return a flattened array when passed a nested array
//
//   3 passing (7ms)
//   1 pending
```

Since we have tests, we can see we didn’t break anything there. Let’s try un-skipping that more complicated test, applying that deepEqual fix to it, and seeing if we get the behavior we want.

flatten_test.js
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
it("should return a flattened array when passed a nested array", () => {
var testArray = [1, [2, 3], [4, [5, 6]]];
var result = f.flatten(testArray);
assert.deepEqual(result, [1, 2, 3, 4, 5, 6]);
});
// [~/source/js_tests]$ mocha flatten_test.js
//
// Flatten
// ✓ should return the original array if it's a single element with no nested arrays
// ✓ should return a two element array with no sub-arrays
// ✓ should flatten simple sub-arrays
// ✓ should return a flattened array when passed a nested array
//
// 4 passing (7ms)

Oh—but we don’t have a test case for that guard clause! We know everything works when we pass in an array. Let’s also make sure we get the expected behavior if we just pass in a bare element - it should “flatten” it into a single element array. I’ll note I’ve broken with good practice here; by the strict approach to TDD, I should have written a test for this prior to even adding that guard clause. However, I think it’s okay to be a little lax while you’re spiking concepts (or blogging :)). Even so, let’s clean this up.

flatten_test.js

```js
it("should return an array when passed a bare object", () => {
  var result = f.flatten(5);
  assert.deepEqual(result, [5]);
});

//   1) Flatten should return an array when passed a bare object:
//
//      AssertionError: [] deepEqual [ 5 ]
```

Huh… well, crap. Let’s tweak that guard clause:

flatten.js

```js
if (!(inputArray instanceof Array)) {
  return [inputArray];
}
```

… and that gets it. Note that the negation needs the extra parentheses: ! binds more tightly than instanceof, so !inputArray instanceof Array parses as (!inputArray) instanceof Array, which asks whether a boolean is an Array and is always false. It took me a while to run that one down.

All that aside, we’ve accumulated a bit of clutter here; we don’t actually need the initial check for single-length arrays, since our processArrayItem function will handle that scenario just fine. Let’s go ahead and strip that out and see what happens:

flatten.js

```js
function flatten(inputArray) {
  var result = [];
  if (!(inputArray instanceof Array)) {
    return [inputArray];
  }
  var processArrayItem = (item) => {
    if (item instanceof Array) {
      result = R.concat(result, flatten(item));
    } else {
      result.push(item);
    }
  };
  R.forEach(processArrayItem, inputArray);
  return result;
}

// [~/source/js_tests]$ mocha flatten_test.js
//
//   Flatten
//     1) should return the original array if it's a single element with no nested arrays
//     ✓ should return a two element array with no sub-arrays
//     ✓ should flatten simple sub-arrays
//     ✓ should return a flattened array when passed a nested array
//     ✓ should return an array when passed a bare object
//
//   4 passing (10ms)
//   1 failing
//
//   1) Flatten should return the original array if it's a single element with no nested arrays:
//
//      AssertionError: [ 1 ] == [ 1 ]
```

We’re not returning the same object anymore, so we need to fix our test to use deepEqual there too… and that gets it.

Next up - we don’t actually need Ramda’s forEach method here; JavaScript arrays have forEach built in. They also have a concat method, so we can write this entirely in vanilla JavaScript:

flatten.js

```js
function flatten(inputArray) {
  var result = [];
  if (!(inputArray instanceof Array)) {
    return [inputArray];
  }
  inputArray.forEach((item) => {
    if (item instanceof Array) {
      result = result.concat(flatten(item));
    } else {
      result.push(item);
    }
  });
  return result;
}

module.exports.flatten = flatten;
```

And to recap, here’s what our tests look like right now - I made a minor tweak to keep the result versus test values consistently ordered across tests, since that’s more readable while debugging.

flatten_test.js

```js
var assert = require('assert');
var f = require('./flatten');

describe("Flatten", () => {
  it("should return the original array if it's a single element with no nested arrays", () => {
    var testArray = [1];
    var result = f.flatten(testArray);
    assert.deepEqual(result, testArray);
  });

  it("should return a two element array with no sub-arrays", () => {
    var testArray = [1, 2];
    var result = f.flatten(testArray);
    assert.deepEqual(result, testArray);
  });

  it("should flatten simple sub-arrays", () => {
    var testArray = [1, [2, 3]];
    var result = f.flatten(testArray);
    assert.deepEqual(result, [1, 2, 3]);
  });

  it("should return a flattened array when passed a nested array", () => {
    var testArray = [1, [2, 3], [4, [5, 6]]];
    var result = f.flatten(testArray);
    assert.deepEqual(result, [1, 2, 3, 4, 5, 6]);
  });

  it("should return an array when passed a bare object", () => {
    var result = f.flatten(5);
    assert.deepEqual(result, [5]);
  });
});
```

All of the tests run and pass. It’s a relatively functional approach; our flatten function doesn’t have any external side effects and we’re not using any explicit ‘for’ loops with an index (I generally dislike these). We could replace the if statement in the forEach with a Ramda cond matcher, but honestly - that feels like massive overkill, especially since we don’t actually need Ramda for anything else. So, in this case, I think I’d advocate keeping it simple and not introducing a substantial dependency that I don’t really need.

The only thing I don’t particularly care for is using the concat method; it feels like we’re cheating a bit since we’re getting another array back in some cases and shoving it into our results array. This works, and it solves the problem - but I don’t totally like it.

One approach would be adding an accumulator to our flatten function allowing us to pass in the original result array, if it exists, and simply append more values to that. Let’s try it! We’ll write a quick accumulatingFlatten function and then just change the module.exports to return that for the flatten function instead; that’ll let us see if all our tests still work.

flatten.js

```js
function accumulatingFlatten(inputArray, acc) {
  if (!(inputArray instanceof Array)) { return [inputArray]; }
  acc = acc || [];
  inputArray.forEach((item) => {
    if (item instanceof Array) {
      accumulatingFlatten(item, acc);
    } else {
      acc.push(item);
    }
  });
  return acc;
}

module.exports.flatten = accumulatingFlatten;

// [~/source/js_tests]$ mocha flatten_test.js
//
//   Flatten
//     ✓ should return the original array if it's a single element with no nested arrays
//     ✓ should return a two element array with no sub-arrays
//     ✓ should flatten simple sub-arrays
//     ✓ should return a flattened array when passed a nested array
//     ✓ should return an array when passed a bare object
//
//   5 passing (7ms)
```

Awesome. Still works.

This post got a bit long - I try to post fairly substantial code examples, because I find posts with lots of small snippets hard to follow. Hopefully it’s a useful discussion of how you’d reason your way through a toy problem like this, with some demonstration of testing thrown in for good measure. It bears repeating, though - this is a code golf problem, but please don’t write your own array flattening function for production use. Lodash and Ramda are both well-written, extensively-tested libraries that already have a method for this, and you’re better off just using that.

Bank Transactions with Ramda, part 2

When we left off, we’d gotten our data imported from the CSV, run a map operation on it to add some extra metadata, demonstrated how to use Ramda to filter it, and had a quick demonstration of simple currying. Let’s move on. To recap, this is where we’re starting from:

parse.js

```js
const fs = require('fs');
const csv = require('fast-csv');
const R = require('ramda');

// restaurantPayees and petPayees are arrays of payee name strings,
// defined in part 1 of this series.

function getTransactionsFromFile(fname) {
  return new Promise(async (resolve, reject) => {
    if (!fs.existsSync(fname)) {
      return reject('file does not exist!');
    }
    let transactions = [];
    let stream = fs.createReadStream(fname);
    csv.fromStream(stream, { headers: true })
      .on('data', (data) => { transactions.push(data); })
      .on('error', (err) => { return reject(err); })
      .on('end', () => { return resolve(transactions); });
  });
}

async function run() {
  try {
    let transactionList = await getTransactionsFromFile('./data.csv');
    processTransactions(transactionList);
  }
  catch (ex) {
    console.log('I caught an error:', ex);
    return;
  }
}

const tagTransaction = (transaction) => {
  // Note that this is checking transactions by exact payee name match;
  // "MCDONALDS #372" and "MCDONALDS #778" would not match.
  const checkPayeeMatch = R.contains(transaction['Payee Name']);
  transaction.isRestaurant = checkPayeeMatch(restaurantPayees);
  transaction.isPet = checkPayeeMatch(petPayees);
  transaction.Amount = parseFloat(transaction.Amount);
  return transaction;
};

const processTransactions = (transactionList) => {
  const taggedTransactions = R.map(tagTransaction, transactionList);
  const petTransactions = R.filter(t => t.isPet, taggedTransactions);
  console.log('You had', petTransactions.length, 'purchases for the dog');
};

// > node parse.js
// You had 10 purchases for the dog
```

Summing up

Okay, so let’s dive in. Next up, we’d like to see how much we spent taking care of the dog in the past couple of months - day care, vet bills, purchases at the pet store, stuff like that. This is pretty straightforward in the transaction processing block, fortunately:

```js
const processTransactions = (transactionList) => {
  const taggedTransactions = R.map(tagTransaction, transactionList);
  const petTransactions = R.filter(t => t.isPet, taggedTransactions);
  console.log('You had', petTransactions.length, 'purchases for the dog');
  let petTotal = R.sum(R.map(t => t.Amount, petTransactions));
  console.log("You spent", petTotal.toFixed(2), "on the dog last year.");
};

// > node parse.js
// You had 10 purchases for the dog
// You spent -924.60 on the dog last year.
```

Since we want to total the amounts but what we have is an array of transaction objects, we first map over the collection to extract just the Amount property. After that it’s very straightforward: drop the resulting array straight into R.sum, which takes a single array argument.
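If you'd rather not reach for Ramda at all, the same map-then-sum can be written with plain array methods – here with a few hypothetical transactions standing in for the CSV data:

```js
// Hypothetical transactions shaped like the tagged CSV rows,
// with Amount already parsed to a number.
const petTransactions = [
  { Amount: -12.5 },
  { Amount: -30.0 },
  { Amount: -7.25 },
];

// Equivalent of R.sum(R.map(t => t.Amount, petTransactions)):
const petTotal = petTransactions
  .map((t) => t.Amount)
  .reduce((total, amount) => total + amount, 0);

console.log(petTotal.toFixed(2)); // -49.75
```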

That said, we can apply some additional tools here to create a general summarizer function that will take any array of objects with an Amount property and total them up - so if we had collections of both pet transactions and restaurant transactions, we could create the function once and run either array through it, like so:

```js
const summarizer = R.pipe(R.map(t => t.Amount), R.sum);

let petTotal = summarizer(petTransactions);
let restaurantTotal = summarizer(restaurantTransactions);
```

Simple printer functions

So that’s pretty cool. Next, let’s put together a simple printer function to generate a one-line summary of a transactions block, again using some of the same principles we’ve seen illustrated here so far:

```js
const processTransactions = (transactionList) => {
  const taggedTransactions = R.map(tagTransaction, transactionList);
  const petTransactions = R.filter(t => t.isPet, taggedTransactions);
  const summarizer = R.pipe(R.map(t => t.Amount), R.sum);
  const generateSummary = (transactions) => {
    return `${transactions.length} transactions for a total of \$${Math.abs(summarizer(transactions)).toFixed(2)}`;
  };
  console.log('Pet expenses:', generateSummary(petTransactions));
}

// > node parse.js
// Pet expenses: 10 transactions for a total of $924.60
```

Pattern matching-ish with cond

Finally, let’s take a look at using R.cond to achieve something that behaves kind of like pattern matching in functional languages like Elixir. It’s not a perfect translation of the concept, but the technique does work pretty well. For a contrived example, let’s say we wanted to break out the different sorts of pet-related transactions: food transactions from the pet store, versus “care” things like vet visits, grooming, and day care.

```js
const classifyPetTransactions = (transactionList) => {
  let care = [];
  let food = [];
  const classifyCare = (t) => R.contains(t['Payee Name'], ["CAMP BOW WOW", "VET", "GROOMER"]);
  const classifyFood = (t) => t['Payee Name'] === "PET STORE";
  const classifier = R.cond([
    [classifyFood, (t) => food.push(t)],
    [classifyCare, (t) => care.push(t)]
  ]);
  R.forEach(classifier, transactionList);
  return [care, food];
}

const processTransactions = (transactionList) => {
  const taggedTransactions = R.map(tagTransaction, transactionList);
  const petTransactions = R.filter(t => t.isPet, taggedTransactions);
  const summarizer = R.pipe(R.map(t => t.Amount), R.sum);
  const generateSummary = (transactions) => {
    return `${transactions.length} transactions for a total of \$${Math.abs(summarizer(transactions)).toFixed(2)}`;
  };
  const [petCare, petFood] = classifyPetTransactions(petTransactions);
  console.log('Pet expenses:', generateSummary(petTransactions));
  console.log('Food expenses:', generateSummary(petFood));
  console.log('Care expenses:', generateSummary(petCare));
}

// > node parse.js
// Pet expenses: 10 transactions for a total of $924.60
// Food expenses: 3 transactions for a total of $174.90
// Care expenses: 7 transactions for a total of $749.70
```

In classifyPetTransactions we run each item through the function returned by R.cond. cond takes an array of [predicate, transformer] pairs and returns a new function. When you call that function with a value, it tries each predicate in order, and the first one to return a truthy value has its transformer applied to the value. In our case, the “transformer” doesn’t actually transform anything; it pushes the transaction into a predefined array. We could, of course, have instead set a property on the transaction, or performed some other action:

const classifier = R.cond([
[classifyFood, (t) => t.tag = "food"],
[classifyCare, (t) => t.tag = "care"]
]);

All of this is, of course, a relatively simple and contrived example, but I find simple examples are often the most helpful. I found cond a bit difficult to understand at first, and it was by working through examples like this one that it finally clicked for me.
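If it helps to demystify cond, here's a rough hand-rolled approximation of its behavior in plain JavaScript. This is a sketch of the semantics described above, not Ramda's actual implementation, and the payee values are invented for illustration:

```javascript
// A simplified stand-in for R.cond: takes [predicate, transformer] pairs
// and returns a function that applies the first matching transformer.
const cond = (pairs) => (value) => {
  for (const [predicate, transformer] of pairs) {
    if (predicate(value)) {
      return transformer(value);
    }
  }
  // Like R.cond, return undefined when no predicate matches.
  return undefined;
};

const classify = cond([
  [(t) => t.payee === "PET STORE", () => "food"],
  [(t) => ["VET", "GROOMER"].includes(t.payee), () => "care"]
]);

console.log(classify({ payee: "PET STORE" })); // "food"
console.log(classify({ payee: "VET" }));       // "care"
console.log(classify({ payee: "TARGET" }));    // undefined
```

Note that, just like the real thing, an unmatched value falls through and yields undefined - which is why our classifier simply ignores pet transactions that are neither food nor care.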

In Conclusion

Some questions you may be asking: why is this useful? Why would I bother with this when existing language constructs allow me to accomplish all of this already?

That’s a fair question! For a very simple and contrived example like this, it’s true - you can do all of this easily without any of the functionality offered by Ramda or Lodash. Even with more complicated logic, you can still get by without it. However, I’ve applied these techniques in production applications and found that they result in logic that’s significantly easier to understand and modify.

One great example of that: I worked on a system with a requirement that if the customer selected a package of services, we should show them similar packages that would be upgrades in some way from what they had selected. The initial solution that comes to mind is, of course, to have some kind of table of packages and how they relate to each other. The problem with that approach was that packages were unique to geographic areas and changed frequently - so the preferred solution was to build that information solely from the package web service.

In fact, in the original version of the application, that’s exactly how my coworker implemented it. Unfortunately, when we were rewriting that portion of the application backend, we weren’t able to use the original code as written. Since I thought it was rather confusing as implemented, I took a crack at rewriting it along the principles described in these two posts - though in this case, since we’d already taken a dependency on Lodash, I used that and a heavy dose of the Lodash curry method. What I found was that this broke the code down into a series of straightforward steps that were easy to reason about.

It’s not going to be a hammer for every nail that you run across, but when you have these sorts of problems that you need to break down, it’s a fantastic tool to have in the box. Even if it doesn’t end up being the approach I use, I’ve found it influences the way I think about and break down problems.

It’s also worth noting that this doesn’t have to be exclusive to JavaScript - the approach works just as well in other languages. Ruby offers many of the same methods (map, find, reduce, sum, etc.) via its Enumerable module, plus currying on procs. C# and LINQ also allow this (Select, Where, Sum, Aggregate) - though I haven’t tried currying in C#, and it appears to be a bit more complex there.

To recap, here’s what our final file looks like:

const fs = require('fs');
const csv = require('fast-csv');
const R = require('ramda');
const restaurantPayees = ["MCDONALDS", "CHIPOTLE", "JACKS DELI"];
const petPayees = ["CAMP BOW WOW", "VET", "PET STORE", "GROOMER"];
function getTransactionsFromFile(fname) {
return new Promise(async (resolve, reject) => {
if (!fs.existsSync(fname)) {
return reject('file does not exist!');
}
let transactions = [];
let stream = fs.createReadStream(fname);
csv.fromStream(stream, { headers: true })
.on('data', (data) => { transactions.push(data); })
.on('error', (err) => { return reject(err); })
.on('end', () => { return resolve(transactions); });
});
}
async function run() {
try {
let transactionList = await getTransactionsFromFile('./data.csv');
processTransactions(transactionList);
}
catch (ex) {
console.log('I caught an error:', ex);
return;
}
}
const tagTransaction = (transaction) => {
// Note that this is checking transactions by exact payee name match;
// "MCDONALDS #372" and "MCDONALDS #778" would not match.
const checkPayeeMatch = R.contains(transaction['Payee Name']);
transaction.isRestaurant = checkPayeeMatch(restaurantPayees);
transaction.isPet = checkPayeeMatch(petPayees);
transaction.Amount = parseFloat(transaction.Amount);
return transaction;
};
const classifyPetTransactions = (transactionList) => {
let care = [];
let food = [];
const classifyCare = (t) => R.contains(t['Payee Name'], ["CAMP BOW WOW", "VET", "GROOMER"]);
const classifyFood = (t) => t['Payee Name'] === "PET STORE";
const classifier = R.cond([
[classifyFood, (t) => food.push(t)],
[classifyCare, (t) => care.push(t)]
]);
R.forEach(classifier, transactionList);
return [care, food];
}
const processTransactions = (transactionList) => {
const taggedTransactions = R.map(tagTransaction, transactionList);
const petTransactions = R.filter(t => t.isPet, taggedTransactions);
const summarizer = R.pipe(R.map(t => t.Amount), R.sum);
const generateSummary = (transactions) => {
return `${transactions.length} transactions for a total of \$${Math.abs(summarizer(transactions)).toFixed(2)}`;
};
const [petCare, petFood] = classifyPetTransactions(petTransactions);
console.log('Pet expenses:', generateSummary(petTransactions));
console.log('Food expenses:', generateSummary(petFood));
console.log('Care expenses:', generateSummary(petCare));
}
run();

Thanks for reading, and I hope you found it useful!

Parsing Bank Transactions with ramda.js

Recently, I’d wanted to sort through a bunch of transaction data from my bank to figure out what our spending trends were in a couple of areas. I suppose I could’ve done this quite effectively with Excel or Apple Numbers, but then I said, hey, that’s boring. :) I’ve been doing a lot of documentation and research stuff at work lately and really wanted to get my hands on a little toy project for a change of pace.

That said, I didn’t want to spend too much time getting bogged down in stuff, so I decided to do it with node.js; node 7.6.0 was released recently and has native async/await, so it seemed like a great choice for a couple hours of hacking.

Setting up

First thing to do was to get a bunch of transaction data from my bank’s web site; they make it easy to download in CSV. That returns a file with the following format:

"Date","Reference Number","Payee Name","Memo","Amount","Category Name"

The payee name field seems to have a hard limit of 15 characters for most entries, unless they originated within the bank’s system. Annoying and not totally descriptive, but whatever, we can deal with it.
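For reference, a row in that format might look something like this (values invented for illustration, using payees that show up later in the post):

```
"02/14/2017","0000123","MCDONALDS","POS PURCHASE","-8.45","Restaurants"
```

Note that expenses come through as negative amounts, which is why we take the absolute value when summarizing later.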

First order of business: get the file in. I’m using fast-csv to parse the files, and the actual process is pretty straightforward. Since this is a small file, rather than process the stream events individually, I’d rather just append each row to a collection and return them all at once. If you were doing this on a huge file, that’s probably a bad idea. :) However, for ~500 transactions or so, it’s no big deal. So let’s do that: we’re going to read in the file, append all the transactions into an array, then finally resolve the promise and return the data once we hit the end of the file. Along the way, we’ll also reject the promise if the CSV parsing throws an error.

function getTransactionsFromFile(fname) {
return new Promise(async (resolve, reject) => {
if (!fs.existsSync(fname)) {
return reject('file does not exist!');
}
let transactions = [];
let stream = fs.createReadStream(fname);
csv.fromStream(stream, { headers: true })
.on('data', (data) => { transactions.push(data); })
.on('error', (err) => { return reject(err); })
.on('end', () => { return resolve(transactions); });
});
}
async function run() {
try {
let transactionList = await getTransactionsFromFile('./data.csv');
processTransactions(transactionList);
}
catch (ex) {
console.log('I caught an error:', ex);
return;
}
}
let processTransactions = (transactionList) => {
console.log('got', transactionList.length, 'transactions');
};
run();
// > node parse.js
// got 478 transactions

Operating on data with map

Excellent. Data. Now how to parse it?

Traditionally, this sort of thing would involve writing a bunch of looping code to iterate through the transactions array, examining each one, and possibly appending it to another array if it was a transaction that was of interest to us. Then we’d write more loops to do other operations, like summing up all the transactions of a given type. Pretty straightforward, but honestly, I find this approach tedious.
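To make that concrete, the loop-based version of just the pet-expense portion might look something like this (a sketch, using the same field names and payee list that appear later in the post):

```javascript
// The imperative approach: one loop to collect pet transactions,
// another to total them up.
const petPayees = ["CAMP BOW WOW", "VET", "PET STORE", "GROOMER"];

const sumPetTransactions = (transactionList) => {
  const petTransactions = [];
  for (const transaction of transactionList) {
    if (petPayees.indexOf(transaction['Payee Name']) !== -1) {
      petTransactions.push(transaction);
    }
  }
  let total = 0;
  for (const transaction of petTransactions) {
    total += parseFloat(transaction.Amount);
  }
  return total;
};
```

It works, but every new question about the data means another loop, and the intent of each loop has to be read out of its body.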

This is where Ramda comes in. You can use lodash as well, but I’ve become rather fond of Ramda over the past year or so. Though I wouldn’t go adding it indiscriminately into existing projects that already use lodash, it’s been my library of choice for doing map/filter/reduce/sort type operations in my personal projects. I like that it automatically curries functions when you don’t supply all the arguments, and that it in fact encourages this use by making the collection/array object the last argument supplied to the function.
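To illustrate what data-last currying buys you, here's a hand-rolled sketch of the idea. Ramda does this automatically across its whole API; this toy version only handles the two-argument case:

```javascript
// A minimal curried, data-last map: call it with just the function and
// you get back a reusable transformer waiting for its collection.
const map = (fn, list) => {
  if (list === undefined) {
    return (laterList) => laterList.map(fn);
  }
  return list.map(fn);
};

const double = map((x) => x * 2);          // no collection yet: returns a function
console.log(double([1, 2, 3]));            // [ 2, 4, 6 ]
console.log(map((x) => x * 2, [1, 2, 3])); // or supply everything at once
```

Because the collection comes last, partially-applied functions like double compose naturally into pipelines.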

I particularly like this approach because I think it’s more explicit. If you’re using a loop, anyone else reading the code has to take the time to inspect the loop and understand what the code in it is doing. When I see R.map called, I immediately think “Okay, this is transforming data somehow.” I will acknowledge this may be a little confusing at first to developers who aren’t experienced with the concepts. This is where mentoring comes in, though, and I think it’s much more straightforward once it “clicks” with folks.

So let’s say we want to figure out how much we spent on the dog, and also how much we spent eating out. First, we need to tag our transactions with some additional information:

  • Was it income or an expense?
  • Was it a restaurant?
  • Was it pet-related?

We don’t have to do it this way, but I’m going to run the transactions array through a map operation that transforms them by adding additional data to them. First we need a function that takes a single transaction, inspects the payee and payment amounts, and tags them with some data:

const restaurantPayees = ["MCDONALDS", "CHIPOTLE", "JACKS DELI"];
const petPayees = ["CAMP BOW WOW", "VET", "PET STORE", "GROOMER"];
const tagTransaction = (transaction) => {
// Note that this is checking transactions by exact payee name match;
// "MCDONALDS #372" and "MCDONALDS #778" would not match.
if (R.contains(transaction['Payee Name'], restaurantPayees)) {
transaction.isRestaurant = true;
}
else if (R.contains(transaction['Payee Name'], petPayees)) {
transaction.isPet = true;
}
transaction.isIncome = parseFloat(transaction.Amount) > 0.0;
return transaction;
};

And then, just to confirm it’s working correctly, let’s also add a quick filter operation on the pet transactions to see how many there were.

const processTransactions = (transactionList) => {
const taggedTransactions = R.map(tagTransaction, transactionList);
const petTransactions = R.filter(t => t.isPet, taggedTransactions);
console.log('You had', petTransactions.length, 'purchases for the dog');
};
// > node parse.js
// You had 10 purchases for the dog

Quickly demonstrating currying

But wait! We also have an opportunity to use Ramda’s automatic currying. Put simply, this lets us call a function without supplying all of its arguments; we get back another function that we can apply repeatedly to different final arguments, reusing the same initial input. That would look like this:

const tagTransaction = (transaction) => {
// Note that this is checking transactions by exact payee name match;
// "MCDONALDS #372" and "MCDONALDS #778" would not match.
const checkPayeeMatch = R.contains(transaction['Payee Name']);
transaction.isRestaurant = checkPayeeMatch(restaurantPayees);
transaction.isPet = checkPayeeMatch(petPayees);
transaction.Amount = parseFloat(transaction.Amount);
return transaction;
};
// > node parse.js
// You had 10 purchases for the dog

As you can see, we built a curried checkPayeeMatch function by calling R.contains and only supplying the payee name field; we can then use it to check the same payee against both the restaurant and pet-related payee names. It’s a small thing, but an example of “don’t repeat yourself” in action.

So far, we’ve imported our transactions into an array (using async/await, even!), written a tagging function to add some additional metadata to our transactions, used Ramda to apply that mapping function to the transactions array and get back a new array of modified transactions, and used a simple currying example to reduce repetition in our code.

In the next article, we’ll look at using R.sum to total up transactions, outputting data with R.forEach, and a new way to implement conditional logic with R.cond.

Hello Hexo

Finally decided to get off of the Wordpress train here and move my blog over to a static site generator. After looking at a couple of different options, I decided to give Hexo a try and see how it worked out. Fairly happy with it so far - it seems to have very sensible defaults out of the box.

Could’ve gone with Middleman (used it on professional projects in the past) or Hugo, but I wanted to try something a little different - and something that didn’t require me to install Pygments to get working syntax highlighting. Pygments is a perfectly fine package to work with, but I currently do little or no Python and didn’t really feel like taking the time to get pip set up to install it properly. Laziness is sometimes a virtue. :)

Installing a list of extensions in Visual Studio Code

I’ve been setting up a new OS X installation and wanted to quickly get Visual Studio Code set back up. Atom has a really handy command line utility, apm, that lets you do useful things like export a list of extensions and reinstall them elsewhere by passing in that list as a command line argument.

Unfortunately, while Visual Studio Code’s command line utility lets you get a list of extensions with code --list-extensions, which you can pipe into a text file, it doesn’t appear to have any way to automatically install the extensions listed in that file.

Fortunately, a minute with Google and Stack Overflow turns up this very helpful answer to run a command for each line in a text file. From there, some quick trial and error got me to this:

while read in; do code --install-extension "$in"; done < ~/vscode-extensions.txt

Not quite as convenient as Atom’s solution, but a nice way to make sure you don’t overlook anything.
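One handy variation: before running the real install, you can swap echo in for code to preview what the loop will do. A sketch of that dry run, using a couple of dummy extension IDs in a temp file for illustration:

```shell
# Normally you'd generate the list with: code --list-extensions > extensions.txt
# Using dummy extension IDs in a temp file here for illustration:
printf 'ms-python.python\nesbenp.prettier-vscode\n' > /tmp/vscode-extensions.txt

# Dry run: print each extension ID instead of installing it.
while read in; do echo "would install: $in"; done < /tmp/vscode-extensions.txt
```

Once the output looks right, substitute code --install-extension "$in" back in for the echo.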