Flattening a List

Getting back to writing articles after spending the better part of a month fighting off a sinus infection and helping my wife get over a nasty cold. Normally I love northeast Ohio, but I’m so over winter right now.

I read a post a month or so ago asking why it’s so difficult for programmers to write code to flatten a list… so naturally, this got me thinking about it and I wanted to tackle it. To start with, let’s try tackling it in JavaScript.

For this sort of question, I actually think a dynamic language like JavaScript or Ruby is probably the ideal choice, and the reason for that is hinted at in the post when he remarks that:

“Candidates fail to write proper method signatures. They get confused about what type of list they should use. Some start with List of integers List<Integer> ints. They fail to see how they will store a List<Integer> to a List<Integer>.”

Since JavaScript lets you stuff pretty much anything into an array - for better or sometimes worse, as the case may be - the problem is much more straightforward to approach there.

He asks candidates to write test cases. Let’s start with a simple stub file and a test - I’m using Mocha here.

flatten.js
function flatten(inputArray) {
  return inputArray;
}

module.exports.flatten = flatten;
flatten_test.js
var assert = require('assert');
var f = require('./flatten');

describe("Flatten", () => {
  it("should return a flattened array when passed a nested array", () => {
    var testArray = [1, [2, 3], [4, [5, 6]]];
    var result = f.flatten(testArray);
    assert.equal(result, [1, 2, 3, 4, 5, 6]);
  });
});
// [~/source/js_tests]$ mocha flatten_test.js -R list
//
// 1) Flatten Should return a flattened array when passed a nested array:
//
// AssertionError: [ 1, [ 2, 3 ], [ 4, [ 5, 6 ] ] ] == [ 1, 2, 3, 4, 5, 6 ]

Of course, since we’re just throwing the original array right back out, it immediately bombs. Not so useful! But on the plus side, this gives us a nice test harness for executing the file, so let’s dig into this further. I’m going to it.skip() our first test here to reduce test clutter while we work our way back into this. First, let’s make sure we’re handling a couple of straightforward base cases correctly, such as single-element arrays whose only element isn’t a sub-array:

flatten.js
function flatten(inputArray) {
  var result = [];
  if (inputArray instanceof Array) {
    if (inputArray.length === 1 && !(inputArray[0] instanceof Array)) {
      // Nothing further to do. Return the original array.
      return inputArray;
    }
  }
  return result;
}
flatten_test.js
it("should return the original array if it's a single element with no nested arrays", () => {
  var testArray = [1];
  var result = f.flatten(testArray);
  assert.equal(testArray, result);
});
// [~/source/js_tests]$ mocha flatten_test.js
//
// Flatten
// ✓ should return the original array if it's a single element with no nested arrays
// - should return a flattened array when passed a nested array
//
//
// 1 passing (8ms)
// 1 pending

Okay. Now let’s try a multi-element array with no sub-arrays:

flatten.js
// R is Ramda, loaded at the top of the file: var R = require('ramda');
if (inputArray instanceof Array) {
  if (inputArray.length === 1 && !(inputArray[0] instanceof Array)) {
    // Nothing further to do. Return the original array.
    return inputArray;
  }
  var processArrayItem = (item) => {
    result.push(item);
  };
  R.forEach(processArrayItem, inputArray);
}
flatten_test.js
it("should return a two element array with no sub-arrays", () => {
  var testArray = [1, 2];
  var result = f.flatten(testArray);
  assert.equal(testArray, result);
});
// [~/source/js_tests]$ mocha flatten_test.js
//
// Flatten
// ✓ should return the original array if it's a single element with no nested arrays
// 1) should return a two element array with no sub-arrays
// - should return a flattened array when passed a nested array
//
// 1 passing (8ms)
// 1 pending
// 1 failing
//
// 1) Flatten should return a two element array with no sub-arrays:
//
// AssertionError: [ 1, 2 ] == [ 1, 2 ]

Alrighty then… wait, what? Why is it saying that [1, 2] doesn’t equal [1, 2]? We need to check using deepEqual instead of just equal. It’s a bit much to get into here, but the short version is that equal compares with ==, which for two arrays only checks whether they’re the same object, while deepEqual compares their contents. Interestingly, equal worked fine on the single-element test, but only because flatten handed back the very same array object there.

Change the assertion and the tests pass:

flatten_test.js
assert.deepEqual(testArray, result);
// [~/source/js_tests]$ mocha flatten_test.js
//
// Flatten
// ✓ should return the original array if it's a single element with no nested arrays
// ✓ should return a two element array with no sub-arrays
// - should return a flattened array when passed a nested array
//
// 2 passing (10ms)
// 1 pending

Moving on, now we should be able to focus all our attention on the processArrayItem function since that’s going to be doing most of the rest of the work. Currently, it naively pushes each array item into the result array, assuming that it’s a single element and not a sub-array. Clearly not what we want here so let’s see what we can do about that. We’ll add a test and then write some code to see if we can get it to go green.

flatten_test.js
it("should flatten simple sub-arrays", () => {
  var testArray = [1, [2, 3]];
  var result = f.flatten(testArray);
  assert.deepEqual(result, [1, 2, 3]);
});
flatten.js
var processArrayItem = (item) => {
  if (item instanceof Array) {
    var res = flatten(item);
    console.log('flattened subarray:', res);
    result.push(res);
  } else {
    result.push(item);
  }
};
// [~/source/js_tests]$ mocha flatten_test.js
//
// Flatten
// ✓ should return the original array if it's a single element with no nested arrays
// ✓ should return a two element array with no sub-arrays
// flattened subarray: [ 2, 3 ]
// 1) should flatten simple sub-arrays
// - should return a flattened array when passed a nested array
//
// 2 passing (10ms)
// 1 pending
// 1 failing
//
// 1) Flatten should flatten simple sub-arrays:
//
// AssertionError: [ 1, [ 2, 3 ] ] deepEqual [ 1, 2, 3 ]
// + expected - actual

So yeah, that didn’t quite work. The recursive call worked perfectly, but it returned its results all at once, as an array. I anticipated this and added a bit of console.log-ing; you can see we got back an array that got pushed into the results array as a single element, which puts us right back around where we started.

So… how do we attack this? Since we’re already using Ramda, we could just use the concat method for this:

flatten.js
var processArrayItem = (item) => {
  if (item instanceof Array) {
    result = R.concat(result, flatten(item));
  } else {
    result.push(item);
  }
};
// [~/source/js_tests]$ mocha flatten_test.js
//
// Flatten
// ✓ should return the original array if it's a single element with no nested arrays
// ✓ should return a two element array with no sub-arrays
// ✓ should flatten simple sub-arrays
// - should return a flattened array when passed a nested array
//
// 3 passing (7ms)
// 1 pending

That works! Of course, if we’re using Ramda… we could’ve also just called R.flatten on this and then called it a day. :)
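
To see side by side why push kept the nesting and concat didn’t, here’s a small vanilla sketch. It uses the array’s own concat method rather than Ramda’s, which behaves the same way for this case, and the variable names are made up for illustration.

```javascript
const flattened = [2, 3];

// push adds its argument as ONE element, so the array stays nested.
const pushed = [1];
pushed.push(flattened);

// concat returns a NEW array with the elements spliced in.
const concatenated = [1].concat(flattened);

// Spreading into push appends each element to the existing array.
const spread = [1];
spread.push(...flattened);

console.log(pushed);        // [ 1, [ 2, 3 ] ]
console.log(concatenated);  // [ 1, 2, 3 ]
console.log(spread);        // [ 1, 2, 3 ]
```

Note that concat leaves the original array alone, which is why the flatten code has to reassign result.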

I’m also looking at my original flatten function and realizing it’s a bit redundant. We’re flattening an array; of course it’s going to take array objects! So… we don’t actually need to have everything inside a nested “is this an array?” check. Let’s remove that and replace it with a simple guard clause.

flatten.js
var R = require('ramda');

function flatten(inputArray) {
  var result = [];
  if (!inputArray instanceof Array) {
    return result.push(inputArray);
  }
  if (inputArray.length === 1 && !(inputArray[0] instanceof Array)) {
    // Nothing further to do. Return the original array.
    return inputArray;
  }
  var processArrayItem = (item) => {
    if (item instanceof Array) {
      result = R.concat(result, flatten(item));
    } else {
      result.push(item);
    }
  };
  R.forEach(processArrayItem, inputArray);
  return result;
}

module.exports.flatten = flatten;
// [~/source/js_tests]$ mocha flatten_test.js
//
// Flatten
// ✓ should return the original array if it's a single element with no nested arrays
// ✓ should return a two element array with no sub-arrays
// ✓ should flatten simple sub-arrays
// - should return a flattened array when passed a nested array
//
// 3 passing (7ms)
// 1 pending

Since we have tests, we can see we didn’t break anything there. Let’s go ahead and try un-skipping that more complicated test and applying that deepEqual fix to it and see if we get the behavior we’re wanting.

flatten_test.js
it("should return a flattened array when passed a nested array", () => {
  var testArray = [1, [2, 3], [4, [5, 6]]];
  var result = f.flatten(testArray);
  assert.deepEqual(result, [1, 2, 3, 4, 5, 6]);
});
// [~/source/js_tests]$ mocha flatten_test.js
//
// Flatten
// ✓ should return the original array if it's a single element with no nested arrays
// ✓ should return a two element array with no sub-arrays
// ✓ should flatten simple sub-arrays
// ✓ should return a flattened array when passed a nested array
//
// 4 passing (7ms)

Oh - but we don’t have a test case for that guard clause! We know everything works when we pass in an array. Let’s also make sure we get the expected behavior if we just pass in a bare element - it should “flatten” it into a single element array. I’ll note I’ve broken with good practice here; by the strict approach to TDD, I should have written a test for this prior to even adding that guard clause. However - I think it can be okay to be a little lax while you’re spiking concepts (or blogging :)), but even so, let’s clean this up.

flatten_test.js
it("should return an array when passed a bare object", () => {
  var result = f.flatten(5);
  assert.deepEqual(result, [5]);
});
// 1) Flatten should return an array when passed a bare object:
//
// AssertionError: [] deepEqual [ 5 ]

Huh… well, crap. Let’s tweak that guard clause:

flatten.js
if (!(inputArray instanceof Array)) {
  return [inputArray];
}

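… and that gets it. Note that the instanceof check needs to be wrapped in extra parentheses to negate it within an if statement; it took me a while to run that one down. The ! operator binds more tightly than instanceof, so !inputArray instanceof Array really evaluates (!inputArray) instanceof Array, which is always false. The other half of the fix is returning [inputArray] directly, since push returns the array’s new length rather than the array itself.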
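
Here’s a minimal sketch of that precedence pitfall, runnable on its own; value is just a stand-in for whatever gets passed to flatten.

```javascript
const value = 5;

// Without parentheses, `!` applies first: (!value) instanceof Array.
// !value is a boolean, and a boolean is never an Array instance.
const unparenthesized = !value instanceof Array;

// With parentheses, the instanceof result is what gets negated.
const parenthesized = !(value instanceof Array);

// Array.isArray is a built-in alternative that sidesteps the pitfall.
const viaIsArray = !Array.isArray(value);

console.log(unparenthesized, parenthesized, viaIsArray); // false true true
```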

All that aside, we’ve accumulated a bit of clutter here; we don’t actually need the initial check for single-length arrays, since our processArrayItem function will handle that scenario just fine. Let’s go ahead and strip that out and see what happens:

flatten.js
function flatten(inputArray) {
  var result = [];
  if (!(inputArray instanceof Array)) {
    return [inputArray];
  }
  var processArrayItem = (item) => {
    if (item instanceof Array) {
      result = R.concat(result, flatten(item));
    } else {
      result.push(item);
    }
  };
  R.forEach(processArrayItem, inputArray);
  return result;
}
// [~/source/js_tests]$ mocha flatten_test.js
//
// Flatten
// 1) should return the original array if it's a single element with no nested arrays
// ✓ should return a two element array with no sub-arrays
// ✓ should flatten simple sub-arrays
// ✓ should return a flattened array when passed a nested array
// ✓ should return an array when passed a bare object
//
// 4 passing (10ms)
// 1 failing
//
// 1) Flatten should return the original array if it's a single element with no nested arrays:
//
// AssertionError: [ 1 ] == [ 1 ]

We’re not returning the same object anymore, so we need to fix our test to use deepEqual there too… and that gets it.

Next up - we don’t actually need to use Ramda’s forEach function here; JavaScript itself has forEach built into the array object. It also has a concat method, so we can write this entirely in vanilla JavaScript:

flatten.js
function flatten(inputArray) {
  var result = [];
  if (!(inputArray instanceof Array)) {
    return [inputArray];
  }
  inputArray.forEach((item) => {
    if (item instanceof Array) {
      result = result.concat(flatten(item));
    } else {
      result.push(item);
    }
  });
  return result;
}

module.exports.flatten = flatten;

And to recap, here’s what our tests look like right now - I made a minor tweak to keep the result versus test values consistently ordered across tests, since that’s more readable while debugging.

flatten_test.js
var assert = require('assert');
var f = require('./flatten');

describe("Flatten", () => {
  it("should return the original array if it's a single element with no nested arrays", () => {
    var testArray = [1];
    var result = f.flatten(testArray);
    assert.deepEqual(result, testArray);
  });

  it("should return a two element array with no sub-arrays", () => {
    var testArray = [1, 2];
    var result = f.flatten(testArray);
    assert.deepEqual(result, testArray);
  });

  it("should flatten simple sub-arrays", () => {
    var testArray = [1, [2, 3]];
    var result = f.flatten(testArray);
    assert.deepEqual(result, [1, 2, 3]);
  });

  it("should return a flattened array when passed a nested array", () => {
    var testArray = [1, [2, 3], [4, [5, 6]]];
    var result = f.flatten(testArray);
    assert.deepEqual(result, [1, 2, 3, 4, 5, 6]);
  });

  it("should return an array when passed a bare object", () => {
    var result = f.flatten(5);
    assert.deepEqual(result, [5]);
  });
});

All of the tests run and pass. It’s a relatively functional approach; our flatten function doesn’t have any external side effects and we’re not using any explicit ‘for’ loops with an index (I generally dislike these). We could replace the if statement in the forEach with a Ramda cond matcher, but honestly - that feels like massive overkill, especially since we don’t actually need Ramda for anything else. So, in this case, I think I’d advocate keeping it simple and not introducing a substantial dependency that I don’t really need.

The only thing I don’t particularly care for is using the concat method; it feels like we’re cheating a bit since we’re getting another array back in some cases and shoving it into our results array. This works, and it solves the problem - but I don’t totally like it.

One approach would be adding an accumulator to our flatten function allowing us to pass in the original result array, if it exists, and simply append more values to that. Let’s try it! We’ll write a quick accumulatingFlatten function and then just change the module.exports to return that for the flatten function instead; that’ll let us see if all our tests still work.

flatten.js
function accumulatingFlatten(inputArray, acc) {
  if (!(inputArray instanceof Array)) { return [inputArray]; }
  acc = acc || [];
  inputArray.forEach((item) => {
    if (item instanceof Array) {
      accumulatingFlatten(item, acc);
    }
    else {
      acc.push(item);
    }
  });
  return acc;
}

module.exports.flatten = accumulatingFlatten;
// [~/source/js_tests]$ mocha flatten_test.js
//
// Flatten
// ✓ should return the original array if it's a single element with no nested arrays
// ✓ should return a two element array with no sub-arrays
// ✓ should flatten simple sub-arrays
// ✓ should return a flattened array when passed a nested array
// ✓ should return an array when passed a bare object
//
// 5 passing (7ms)

Awesome. Still works.
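
One wrinkle with exposing acc as a second parameter is that a caller could hand in their own array and have it mutated. A possible variation, sketched here with hypothetical names (flattenInto and flattenAll aren’t part of the code above), keeps the accumulator in a private helper and gives every top-level call a fresh array.

```javascript
// Private helper: appends the flattened contents of input onto acc.
function flattenInto(input, acc) {
  input.forEach((item) => {
    if (Array.isArray(item)) {
      flattenInto(item, acc);  // recurse, appending into the same array
    } else {
      acc.push(item);
    }
  });
  return acc;
}

// Public entry point: callers never see or supply the accumulator.
function flattenAll(input) {
  if (!Array.isArray(input)) { return [input]; }
  return flattenInto(input, []);  // fresh accumulator per top-level call
}

console.log(flattenAll([1, [2, 3], [4, [5, 6]]])); // [ 1, 2, 3, 4, 5, 6 ]
console.log(flattenAll(5));                        // [ 5 ]
```

Whether the extra function is worth it for a toy problem is a judgment call; it mostly matters if flatten ends up as a callback somewhere that passes extra arguments.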

This post got a bit long - I try to post fairly substantial code examples, because I find posts with lots of small snippets hard to follow. Hopefully it’s a useful discussion of how you’d reason your way through a toy problem like this, with some demonstration of testing thrown in for good measure. It bears repeating, though - this is a code golf problem, but please don’t write your own array flattening function for production use. Lodash and Ramda are both well-written, extensively-tested libraries that already have a method for this, and you’re better off just using that.
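
On newer runtimes there’s also a built-in option, Array.prototype.flat (part of ES2019, available in Node 11 and later). A minimal sketch, assuming one of those runtimes:

```javascript
const nested = [1, [2, 3], [4, [5, 6]]];

// flat() flattens one level by default; Infinity flattens all the way down.
const oneLevel = nested.flat();
const allLevels = nested.flat(Infinity);

console.log(oneLevel);   // [ 1, 2, 3, 4, [ 5, 6 ] ]
console.log(allLevels);  // [ 1, 2, 3, 4, 5, 6 ]
```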

Bank Transactions with Ramda, part 2

When we left off, we’d gotten our data imported from the CSV, run a map operation on it to add some extra metadata, demonstrated how to use Ramda to filter it, and had a quick demonstration of simple currying. Let’s move on. To recap, this is where we’re starting from:

parse.js
const fs = require('fs');
const csv = require('fast-csv');
const R = require('ramda');

const restaurantPayees = ["MCDONALDS", "CHIPOTLE", "JACKS DELI"];
const petPayees = ["CAMP BOW WOW", "VET", "PET STORE", "GROOMER"];

function getTransactionsFromFile(fname) {
  return new Promise((resolve, reject) => {
    if (!fs.existsSync(fname)) {
      return reject('file does not exist!');
    }
    let transactions = [];
    let stream = fs.createReadStream(fname);
    csv.fromStream(stream, { headers: true })
      .on('data', (data) => { transactions.push(data); })
      .on('error', (err) => { return reject(err); })
      .on('end', () => { return resolve(transactions); });
  });
}

async function run() {
  try {
    let transactionList = await getTransactionsFromFile('./data.csv');
    processTransactions(transactionList);
  }
  catch (ex) {
    console.log('I caught an error:', ex);
    return;
  }
}

const tagTransaction = (transaction) => {
  // Note that this is checking transactions by exact payee name match;
  // "MCDONALDS #372" and "MCDONALDS #778" would not match.
  const checkPayeeMatch = R.contains(transaction['Payee Name']);
  transaction.isRestaurant = checkPayeeMatch(restaurantPayees);
  transaction.isPet = checkPayeeMatch(petPayees);
  transaction.Amount = parseFloat(transaction.Amount);
  return transaction;
};

const processTransactions = (transactionList) => {
  const taggedTransactions = R.map(tagTransaction, transactionList);
  const petTransactions = R.filter(t => t.isPet, taggedTransactions);
  console.log('You had', petTransactions.length, 'purchases for the dog');
};

run();
// > node parse.js
// You had 10 purchases for the dog

Summing up

Okay, so let’s dive in. Next up, we’d like to see how much we spent taking care of the dog in the past couple of months - day care, vet bills, purchases at the pet store, stuff like that. This is pretty straightforward in the transaction processing block, fortunately:

const processTransactions = (transactionList) => {
  const taggedTransactions = R.map(tagTransaction, transactionList);
  const petTransactions = R.filter(t => t.isPet, taggedTransactions);
  console.log('You had', petTransactions.length, 'purchases for the dog');
  let petTotal = R.sum(R.map(t => t.Amount, petTransactions));
  console.log("You spent", petTotal.toFixed(2), "on the dog last year.");
};
// > node parse.js
// You had 10 purchases for the dog
// You spent -924.60 on the dog last year.

Note that since we wanted to total up the amount specifically and what we have is an array of transaction objects, we first had to map the transactions collection to extract just the value of the Amount property. Afterward, it’s very straightforward and we just drop it straight into R.sum, which takes a single array argument.

That said, we can apply some additional tools here to create a general summarizer function that will take any array of objects that have an Amount property and generate a total for it - so if we had collections of both pet transactions and restaurant transactions, we could create the function once and run either array through it, like so:

const summarizer = R.pipe(R.map(t => t.Amount), R.sum);
let petTotal = summarizer(petTransactions);
let restaurantTotal = summarizer(restaurantTransactions);
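
To make the pipe step a bit more concrete, here’s a vanilla sketch of the same idea. The pipe, map, and sum below are hand-rolled stand-ins written for illustration, not Ramda’s implementations, and the sample data is made up.

```javascript
// pipe: feed the input through each function, left to right.
const pipe = (...fns) => (input) => fns.reduce((acc, fn) => fn(acc), input);

// Data-last helpers: configure first, supply the list later.
const map = (fn) => (list) => list.map(fn);
const sum = (list) => list.reduce((total, n) => total + n, 0);

// Same shape as the Ramda version: extract the amounts, then add them up.
const summarizer = pipe(map(t => t.Amount), sum);

const sample = [{ Amount: -20.5 }, { Amount: -4.5 }, { Amount: 10 }];
console.log(summarizer(sample)); // -15
```

The point is that summarizer doesn’t mention any particular collection, which is what lets it be reused for pets, restaurants, or anything else with an Amount.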

Simple printer functions

So that’s pretty cool. Next, let’s put together a simple printer function to generate a one-line summary of a transactions block, again using some of the same principles we’ve seen illustrated here so far:

const processTransactions = (transactionList) => {
  const taggedTransactions = R.map(tagTransaction, transactionList);
  const petTransactions = R.filter(t => t.isPet, taggedTransactions);
  const summarizer = R.pipe(R.map(t => t.Amount), R.sum);
  const generateSummary = (transactions) => {
    return `${transactions.length} transactions for a total of \$${Math.abs(summarizer(transactions)).toFixed(2)}`;
  };
  console.log('Pet expenses:', generateSummary(petTransactions));
};
// > node parse.js
// Pet expenses: 10 transactions for a total of $924.60

Pattern matching-ish with cond

Finally, let’s take a look at using R.cond to achieve something that behaves kind of like pattern matching in functional languages like Elixir. It’s not a perfect translation of the concept, but the technique does work pretty well. For a contrived example, let’s say we wanted to break out the different sorts of pet-related transactions: food transactions from the pet store, versus “care” things like vet visits, grooming, and day care.

const classifyPetTransactions = (transactionList) => {
  let care = [];
  let food = [];
  const classifyCare = (t) => R.contains(t['Payee Name'], ["CAMP BOW WOW", "VET", "GROOMER"]);
  const classifyFood = (t) => t['Payee Name'] === "PET STORE";
  const classifier = R.cond([
    [classifyFood, (t) => food.push(t)],
    [classifyCare, (t) => care.push(t)]
  ]);
  R.forEach(classifier, transactionList);
  return [care, food];
};

const processTransactions = (transactionList) => {
  const taggedTransactions = R.map(tagTransaction, transactionList);
  const petTransactions = R.filter(t => t.isPet, taggedTransactions);
  const summarizer = R.pipe(R.map(t => t.Amount), R.sum);
  const generateSummary = (transactions) => {
    return `${transactions.length} transactions for a total of \$${Math.abs(summarizer(transactions)).toFixed(2)}`;
  };
  const [petCare, petFood] = classifyPetTransactions(petTransactions);
  console.log('Pet expenses:', generateSummary(petTransactions));
  console.log('Food expenses:', generateSummary(petFood));
  console.log('Care expenses:', generateSummary(petCare));
};
// > node parse.js
// Pet expenses: 10 transactions for a total of $924.60
// Food expenses: 3 transactions for a total of $174.90
// Care expenses: 7 transactions for a total of $749.70

In classifyPetTransactions we’re running each item through R.cond; what happens there is that cond takes an array of [predicate, transformer] pairs and returns a function. You then pass an object to the function which was returned, and that object will be passed to each predicate, in the order they were defined, until it hits the first one to return a “truthy” value; cond then applies the matching transformer function to the object. In our case, the transformer doesn’t actually transform, but instead pushes the transaction into a predefined array. We could, of course, also have set a property on the transaction, or performed some other action:

const classifier = R.cond([
  [classifyFood, (t) => t.tag = "food"],
  [classifyCare, (t) => t.tag = "care"]
]);

All of this is, of course, a relatively simple and contrived example, but I find that’s often helpful. I found cond a bit difficult to understand at first, and it was by working with relatively simple examples like this one that I finally gained an understanding of it.
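
If it helps to see the first-match behavior spelled out, here’s a hand-rolled stand-in for cond. It’s a sketch for illustration only, not Ramda’s implementation, and the payee field and sample values are made up. The catch-all pair at the end plays the role that R.T usually plays in real Ramda code.

```javascript
// Try each [predicate, transformer] pair in order; the first truthy
// predicate decides which transformer runs.
const cond = (pairs) => (value) => {
  for (const [predicate, transformer] of pairs) {
    if (predicate(value)) {
      return transformer(value);
    }
  }
  return undefined;  // nothing matched
};

const classify = cond([
  [(t) => t.payee === 'PET STORE', () => 'food'],
  [(t) => ['VET', 'GROOMER'].includes(t.payee), () => 'care'],
  [() => true, () => 'other'],  // catch-all
]);

console.log(classify({ payee: 'PET STORE' })); // food
console.log(classify({ payee: 'VET' }));       // care
console.log(classify({ payee: 'CHIPOTLE' }));  // other
```

Without the catch-all pair, an unmatched value simply comes back as undefined, which is worth keeping in mind if the transformers are doing real work.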

In Conclusion

Some questions you may be asking: why is this useful? Why would I bother with this when existing language constructs allow me to accomplish all of this already?

That’s a fair question! For a very simple and contrived example like this, it’s true - you can do this very easily without needing any of the functionality offered by Ramda or Lodash. Even with more complicated stuff, you can still get by without doing it. However, I’ve applied these techniques in production applications and found that they result in logic that’s significantly easier to understand and modify.

One great example of that: I worked on a system with a requirement that if the customer selected a package of services, we should show them similar packages that would be upgrades in some way from what they had selected. The initial solution that comes to mind is, of course, to have some kind of table of packages and how they related to each other. The problem with that approach was that packages were unique to geographic areas and changed frequently - so the preferred solution was building that information solely from the package web service.

In fact, in the original version of the application, that’s exactly how my coworker implemented it. Unfortunately, when we were rewriting that portion of the application backend, we weren’t able to use the original code as written. Since I thought it was rather confusing as implemented, I took a crack at rewriting it along the principles described in these two posts - though in this case, since we’d already taken a dependency on Lodash, I used that and a heavy dose of the Lodash curry method. What I found was that this broke the code down into a series of straightforward steps that were easy to reason about.

It’s not going to be a hammer for every nail that you run across, but when you have these sorts of problems that you need to break down, it’s a fantastic tool to have in the box. Even if it doesn’t end up being the approach I use, I’ve found it influences the way I think about and break down problems.

It’s also worth noting this doesn’t have to be exclusive to JavaScript, either - the approach will work just as well in other languages! Ruby has many of the same functions (map, find, reduce, sum, etc. as well as currying) on its own enumerable methods. C# and LINQ also allow this (Select, Where, Sum, Aggregate) - though I haven’t tried to do currying in C# and it appears to be a bit more complex there.

To recap, here’s what our final file looks like:

const fs = require('fs');
const csv = require('fast-csv');
const R = require('ramda');

const restaurantPayees = ["MCDONALDS", "CHIPOTLE", "JACKS DELI"];
const petPayees = ["CAMP BOW WOW", "VET", "PET STORE", "GROOMER"];

function getTransactionsFromFile(fname) {
  return new Promise((resolve, reject) => {
    if (!fs.existsSync(fname)) {
      return reject('file does not exist!');
    }
    let transactions = [];
    let stream = fs.createReadStream(fname);
    csv.fromStream(stream, { headers: true })
      .on('data', (data) => { transactions.push(data); })
      .on('error', (err) => { return reject(err); })
      .on('end', () => { return resolve(transactions); });
  });
}

async function run() {
  try {
    let transactionList = await getTransactionsFromFile('./data.csv');
    processTransactions(transactionList);
  }
  catch (ex) {
    console.log('I caught an error:', ex);
    return;
  }
}

const tagTransaction = (transaction) => {
  // Note that this is checking transactions by exact payee name match;
  // "MCDONALDS #372" and "MCDONALDS #778" would not match.
  const checkPayeeMatch = R.contains(transaction['Payee Name']);
  transaction.isRestaurant = checkPayeeMatch(restaurantPayees);
  transaction.isPet = checkPayeeMatch(petPayees);
  transaction.Amount = parseFloat(transaction.Amount);
  return transaction;
};

const classifyPetTransactions = (transactionList) => {
  let care = [];
  let food = [];
  const classifyCare = (t) => R.contains(t['Payee Name'], ["CAMP BOW WOW", "VET", "GROOMER"]);
  const classifyFood = (t) => t['Payee Name'] === "PET STORE";
  const classifier = R.cond([
    [classifyFood, (t) => food.push(t)],
    [classifyCare, (t) => care.push(t)]
  ]);
  R.forEach(classifier, transactionList);
  return [care, food];
};

const processTransactions = (transactionList) => {
  const taggedTransactions = R.map(tagTransaction, transactionList);
  const petTransactions = R.filter(t => t.isPet, taggedTransactions);
  const summarizer = R.pipe(R.map(t => t.Amount), R.sum);
  const generateSummary = (transactions) => {
    return `${transactions.length} transactions for a total of \$${Math.abs(summarizer(transactions)).toFixed(2)}`;
  };
  const [petCare, petFood] = classifyPetTransactions(petTransactions);
  console.log('Pet expenses:', generateSummary(petTransactions));
  console.log('Food expenses:', generateSummary(petFood));
  console.log('Care expenses:', generateSummary(petCare));
};

run();

Thanks for reading, and I hope you found it useful!

Parsing Bank Transactions with ramda.js

Recently, I’d wanted to sort through a bunch of transaction data from my bank to figure out what our spending trends were in a couple of areas. I suppose I could’ve done this quite effectively with Excel or Apple Numbers, but then I said, hey, that’s boring. :) I’ve been doing a lot of documentation and research stuff at work lately and really wanted to get my hands on a little toy project for a change of pace.

That said, I didn’t want to spend too much time getting bogged down in stuff, so I decided to do it with node.js; node 7.6.0 was released recently and has native async/await, so it seemed like a great choice for a couple hours of hacking.

Setting up

First thing to do was to get a bunch of transaction data from my bank’s web site; they make it easy to download in CSV. That returns a file with the following format:

"Date","Reference Number","Payee Name","Memo","Amount","Category Name"

The payee name field seems to have a hard limit of 15 characters for most entries, unless they originated within the bank’s system. Annoying and not totally descriptive, but whatever, we can deal with it.

First order of business: get the file in. I’m using fast-csv to parse the files. The actual process of that is pretty straightforward. Since this is a small file, rather than process the stream events individually, I’d rather just append them all to a collections object, read ‘em in, and then return them all at once. If you were doing this on a huge file, that’s probably a bad idea. :) However, for ~500 transactions or so, it’s no big deal. So let’s do that: we’re going to read in the file, append all the transactions into an array, then finally resolve the promise and return the data once we hit the end of the file. Along the way, we’re also going to reject the promise if the CSV parsing throws an error.

const fs = require('fs');
const csv = require('fast-csv');

function getTransactionsFromFile(fname) {
  return new Promise((resolve, reject) => {
    if (!fs.existsSync(fname)) {
      return reject('file does not exist!');
    }
    let transactions = [];
    let stream = fs.createReadStream(fname);
    // Note: newer fast-csv releases rename fromStream to parseStream.
    csv.fromStream(stream, { headers: true })
      .on('data', (data) => { transactions.push(data); })
      .on('error', (err) => { return reject(err); })
      .on('end', () => { return resolve(transactions); });
  });
}

async function run() {
  try {
    let transactionList = await getTransactionsFromFile('./data.csv');
    processTransactions(transactionList);
  }
  catch (ex) {
    console.log('I caught an error:', ex);
    return;
  }
}

let processTransactions = (transactionList) => {
  console.log('got', transactionList.length, 'transactions');
};

run();
// > node parse.js
// got 478 transactions
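
The collect-then-resolve pattern can be tried without a CSV file at all. Here’s a minimal sketch that swaps the fast-csv stream for a plain EventEmitter from Node’s standard library; collectRows, fakeStream, and the sample rows are made up for illustration.

```javascript
const { EventEmitter } = require('events');

// Gathers 'data' events into an array and settles the promise
// on 'end' (resolve with the rows) or 'error' (reject).
function collectRows(source) {
  return new Promise((resolve, reject) => {
    const rows = [];
    source
      .on('data', (row) => { rows.push(row); })
      .on('error', (err) => { reject(err); })
      .on('end', () => { resolve(rows); });
  });
}

// Stand-in for the CSV stream, emitting two made-up rows.
const fakeStream = new EventEmitter();
const pending = collectRows(fakeStream);
fakeStream.emit('data', { Amount: '-12.50' });
fakeStream.emit('data', { Amount: '-3.25' });
fakeStream.emit('end');

pending.then((rows) => console.log('got', rows.length, 'rows')); // got 2 rows
```

The listeners are attached as soon as collectRows is called, because a promise’s executor function runs synchronously; that’s why emitting right afterward works here.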

Operating on data with map

Excellent. Data. Now how to parse it?

Traditionally, this sort of thing would involve writing a bunch of looping code to iterate through the transactions array, examining each one, and possibly appending it to another array if it was a transaction that was of interest to us. Then we’d write more loops to do other operations, like summing up all the transactions of a given type. Pretty straightforward, but honestly, I find this approach tedious.

This is where Ramda comes in. You can use lodash as well, but I’ve become rather fond of Ramda over the past year or so. Though I wouldn’t go adding it indiscriminately into existing projects that already use lodash, it’s been my library of choice for doing map/filter/reduce/sort type operations in my personal projects. I like that it automatically curries functions when you don’t supply all the arguments, and that it in fact encourages this use by making the collection/array object the last argument supplied to the function.
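
For anyone who hasn’t run into currying before, here’s a rough vanilla sketch of why putting the collection last matters. The curry2 and filter below are hand-rolled for illustration (Ramda’s own currying is more general than this), and the sample data is made up.

```javascript
// Two-argument curry: call with both arguments to get a result,
// or with just the first to get a function waiting for the second.
const curry2 = (fn) => (a, ...rest) =>
  rest.length > 0 ? fn(a, rest[0]) : (b) => fn(a, b);

// Data-last filter: predicate first, collection second.
const filter = curry2((predicate, list) => list.filter(predicate));

// Supplying only the predicate yields a reusable function.
const onlyExpenses = filter((t) => t.Amount < 0);

const january = [{ Amount: -5 }, { Amount: 100 }];
const february = [{ Amount: -7 }, { Amount: -9 }];

console.log(onlyExpenses(january).length);                // 1
console.log(onlyExpenses(february).length);               // 2
console.log(filter((t) => t.Amount > 0, january).length); // 1
```

If the collection came first, you couldn’t build onlyExpenses without already having the data in hand.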

I particularly like this approach because I think it’s more explicit. If you’re using a loop, anyone else reading the code has to take the time to inspect the loop and understand what the code in it is doing. When I see R.map called, I immediately think “Okay, this is transforming data somehow.” I will acknowledge this may be a little confusing at first to developers who aren’t experienced with the concepts. This is where mentoring comes in, though, and I think it’s much more straightforward once it “clicks” with folks.
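To make that comparison concrete, here is a minimal sketch of the same transformation written both ways. It uses the built-in Array.prototype.map rather than R.map so it runs without any dependencies, and the sample amounts are made up for illustration rather than taken from the real CSV.

```javascript
// Hypothetical sample data; not taken from the article's CSV file.
const amounts = ['-12.50', '40.00', '-3.25'];

// Loop version: you have to read the body to learn that this is a transform.
const parsedWithLoop = [];
for (let i = 0; i < amounts.length; i++) {
  parsedWithLoop.push(parseFloat(amounts[i]));
}

// map version: the call itself signals "one output value per input value".
const parsedWithMap = amounts.map((amount) => parseFloat(amount));

console.log(parsedWithLoop); // [ -12.5, 40, -3.25 ]
console.log(parsedWithMap);  // [ -12.5, 40, -3.25 ]
```

Both produce the same array; the difference is only in how quickly the intent is visible to the next reader.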

So let’s say we want to figure out how much we spent on the dog, and also how much we spent eating out. First, we need to tag our transactions with some additional information:

  • Was it income or an expense?
  • Was it a restaurant?
  • Was it pet-related?

We don’t have to do it this way, but I’m going to run the transactions array through a map operation that transforms them by adding additional data to them. First we need a function that takes a single transaction, inspects the payee and payment amounts, and tags them with some data:

const R = require('ramda');
const restaurantPayees = ["MCDONALDS", "CHIPOTLE", "JACKS DELI"];
const petPayees = ["CAMP BOW WOW", "VET", "PET STORE", "GROOMER"];
// Tag a single transaction based on its payee name and amount.
const tagTransaction = (transaction) => {
  // Note that this is checking transactions by exact payee name match;
  // "MCDONALDS #372" and "MCDONALDS #778" would not match.
  if (R.contains(transaction['Payee Name'], restaurantPayees)) {
    transaction.isRestaurant = true;
  }
  else if (R.contains(transaction['Payee Name'], petPayees)) {
    transaction.isPet = true;
  }
  // Positive amounts are income; everything else is an expense.
  transaction.isIncome = parseFloat(transaction.Amount) > 0.0;
  return transaction;
};
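As an aside on that exact-match caveat: if your bank appends store numbers to payee names, one possible workaround is to match on a prefix instead. This is only a sketch in plain JavaScript; matchesAnyPrefix and restaurantPrefixes are hypothetical names that don't appear in the rest of this article, and a short prefix like "VET" could easily match payees you didn't intend.

```javascript
// Hypothetical list of payee prefixes, mirroring the restaurant payees above.
const restaurantPrefixes = ['MCDONALDS', 'CHIPOTLE', 'JACKS DELI'];

// Hypothetical helper: true when the payee name starts with any known prefix.
const matchesAnyPrefix = (payeeName, prefixes) =>
  prefixes.some((prefix) => payeeName.startsWith(prefix));

console.log(matchesAnyPrefix('MCDONALDS #372', restaurantPrefixes)); // true
console.log(matchesAnyPrefix('PET STORE', restaurantPrefixes));      // false
```

I'm sticking with exact matches for the rest of this article to keep things simple.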

And then, just to confirm it’s working correctly, let’s also add a quick filter operation on the pet transactions to see how many there were.

const processTransactions = (transactionList) => {
  const taggedTransactions = R.map(tagTransaction, transactionList);
  const petTransactions = R.filter(t => t.isPet, taggedTransactions);
  console.log('You had', petTransactions.length, 'purchases for the dog');
};
// > node parse.js
// You had 10 purchases for the dog

Quickly demonstrating currying

But wait! We also have an opportunity to use Ramda’s automatic currying of functions. Put simply, this allows us to call a function without supplying its final arguments and get back another function, one that remembers the arguments we did supply and that we can apply repeatedly against different final values. That would look like this:

const tagTransaction = (transaction) => {
  // Note that this is checking transactions by exact payee name match;
  // "MCDONALDS #372" and "MCDONALDS #778" would not match.
  const checkPayeeMatch = R.contains(transaction['Payee Name']);
  transaction.isRestaurant = checkPayeeMatch(restaurantPayees);
  transaction.isPet = checkPayeeMatch(petPayees);
  transaction.Amount = parseFloat(transaction.Amount);
  return transaction;
};
// > node parse.js
// You had 10 purchases for the dog

As you can see, we built a curried checkPayeeMatch function by calling R.contains and only supplying the payee name field; we can then use it to check the same payee against both the restaurant and pet-related payee names. It’s a small thing, but an example of “don’t repeat yourself” in action.
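If the mechanics of that feel a bit magical, here is a rough sketch of the same idea with a hand-written curried function in plain JavaScript. The contains function below is my own stand-in, not Ramda's implementation (Ramda's version also accepts both arguments at once, which this one doesn't), and it hard-codes 'VET' as the payee purely for illustration.

```javascript
// Hand-written curried stand-in for a "contains" check: value first, list last.
const contains = (value) => (list) => list.includes(value);

const restaurantPayees = ['MCDONALDS', 'CHIPOTLE', 'JACKS DELI'];
const petPayees = ['CAMP BOW WOW', 'VET', 'PET STORE', 'GROOMER'];

// Supplying only the first argument gives back a function that remembers it.
const checkPayeeMatch = contains('VET');

// The remembered payee can now be checked against any number of lists.
console.log(checkPayeeMatch(restaurantPayees)); // false
console.log(checkPayeeMatch(petPayees));        // true
```

Putting the list last is what makes this work: the part that changes between calls is the part you leave off.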

So far, we’ve imported our transactions into an array (using async/await, even!), written a tagging function to add some additional metadata to each transaction, used Ramda to apply that tagging function to the transactions array and get back a new array of modified transactions, and used a simple currying example to reduce repetition in our code.

In the next article on this, we’ll look at using R.sum to total up transactions, outputting data with R.forEach, and a new way to implement conditional logic with R.cond.

Hello Hexo

Finally decided to get off of the WordPress train here and move my blog over to a static site generator. After looking at a couple of different options, I decided to give Hexo a try and see how it worked out. Fairly happy with it so far - it seems to have very sensible defaults out of the box.

Could’ve gone with Middleman (used it on professional projects in the past) or Hugo, but I wanted to go with something a little different and something that didn’t require me to install Pygments to get working syntax highlighting. It’s a perfectly fine package to work with, but I currently do little or no Python and didn’t really feel like taking the time to get pip set up to install Pygments properly. Laziness is sometimes a virtue. :)

Installing a list of extensions in Visual Studio Code

I’ve been setting up a new OS X installation and wanted to quickly get Visual Studio Code set back up. Atom has a really handy command line utility, apm, that lets you do useful things like export a list of extensions and reinstall them elsewhere by passing in that list as a command line argument.

Unfortunately, while Visual Studio Code’s command line utility allows you to get a list of extensions with code --list-extensions, which you can pipe into a text file, it doesn’t appear to have any way to automatically install the extensions from that file.

Fortunately, a minute with Google and Stack Overflow turns up this very helpful answer to run a command for each line in a text file. From there, some quick trial and error got me to this:

while read in; do code --install-extension "$in"; done < ~/vscode-extensions.txt

Not quite as convenient as Atom’s solution, but a nice way to make sure you don’t overlook anything.