Francis Stokes: Blog

Programming ❤️ Math

Exploring the Peano Axioms With Algebraic Data Types

The Peano Axioms are 8 simple rules that can define the natural number system. What are the natural numbers? They're just the numbers 0, 1, 2, 3, … and so on, continuing forever.

This system is interesting because in mathematics, you need to be rigorous: you can't go around saying numbers (or any mathematical object) just are; you have to be able to really describe what you mean when you talk about zeroness or oneness.

The Axioms

The Peano Axioms describe a system with two elements: an object 0, and a function S. I'm using the words object and function quite loosely. As you will see below, 0 could actually be any symbol you choose, and the function S (known as the successor function) also essentially acts as just a symbol - that is to say, it is not necessary to define the S function any further than what is shown below.

These are the axioms as stated on wikipedia:

  1. 0 is a natural number
  2. For every natural number x, x = x (reflexivity)
  3. For all natural numbers x and y, if x = y, then y = x (symmetry)
  4. For all natural numbers x, y and z, if x = y and y = z, then x = z (transitivity)
  5. For all a and b, if b is a natural number and a = b, then a is also a natural number (closure under equality)
  6. For every natural number n, S(n) is a natural number
  7. For all natural numbers m and n, m = n if and only if S(m) = S(n) (injection)
  8. For every natural number n, S(n) = 0 is false. That is, there is no natural number whose successor is 0

What these axioms state is that we have an initial number 0, which is not the successor of any other number (i.e. it is the smallest), and that we can create new numbers by applying a successor function to any number. So there is a successor of 0 called S(0), and a successor to that called S(S(0)), and so on.

Practically speaking, you can read a Peano encoded natural number as simply being the number of Ss it has. So S(0) is 1, S(S(0)) is 2, and so on.

Here’s a table of the first few Peano numbers

N Peano
0 0
1 S(0)
2 S(S(0))
3 S(S(S(0)))
4 S(S(S(S(0))))

Algebraic Data Types

We can represent this system in types by utilising something called an Algebraic Data Type. An Algebraic type is either:

  • a product type, which combines several types into one (like a tuple or a record), or
  • a sum type, which represents a choice between several alternative types

For this article we will focus on the sum type. There are many common sum types that we use every day. Boolean is a sum type of the types True and False. In Haskell it might be defined as

data Boolean = True | False

In JavaScript, we can use a library like daggy or algebraic-types to create a sum type. For these examples I will use algebraic-types because it provides a mechanism to properly define recursive types.

const {createUnionType} = require('algebraic-types');

const Boolean = createUnionType('Boolean', {
  True: [],
  False: []
});

True and False are both types that belong to the type Boolean. This is a simple but powerful concept, because it implies that meaning can be encoded directly on the type level. To illustrate this further, let's take a brief look at a slightly more complex sum type, Maybe, which encodes possible nothingness.

In Haskell:

data Maybe a = Just a | Nothing

In JavaScript:

const {createUnionType, variable} = require('algebraic-types');

const Maybe = createUnionType('Maybe', {
  Nothing: [],
  Just: [variable('a')]
});

This should be read "A Maybe of 'a' is either Just 'a' or it is Nothing, where 'a' can be any value". You never need to write a function or method that returns null to indicate no value, because you can do it on the type level.
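
As a quick illustrative sketch (hypothetical code, assuming the Just and Nothing constructors can be called in the same way as the S and O constructors shown later in this article), a "safe head" function could return a Maybe instead of null:

const {Just, Nothing} = Maybe;

// A hypothetical safeHead: Just the first element, or Nothing for an empty array
const safeHead = list => list.length > 0 ? Just(list[0]) : Nothing();

safeHead([1, 2, 3]); // -> a Just holding 1
safeHead([]);        // -> a Nothing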

One more thing that we can do with sum types is define recursive structures. A binary tree is laughably trivial to implement as a sum type:

In Haskell:

data BinaryTree a = Leaf a | Branch (BinaryTree a) (BinaryTree a)

In JavaScript:

const {createUnionType, variable, recursive} = require('algebraic-types');

const BinaryTree = createUnionType('BinaryTree', {
  Leaf: [variable('a')],
  Branch: [recursive('left'), recursive('right')]
});

Which is to say, a BinaryTree is either a Leaf of some value 'a', or it's a Branch of two more BinaryTrees (both of which can be either a Leaf or a Branch).
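
To get a feel for the constructors, here is a small hypothetical construction, following the same calling convention that the Peano example below uses:

const {Leaf, Branch} = BinaryTree;

// A tree with three leaves: one branch holding 1 and 2, and a top-level leaf holding 3
const tree = Branch(
  Branch(Leaf(1), Leaf(2)),
  Leaf(3)
);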

Back to Peano

Now that we have a basic understanding of sum types, we can apply the concepts of encoding meaning and recursive definition to creating a type for Peano Numbers. I will use JavaScript for the rest of the examples, but the translation to Haskell is trivial.

const {createUnionType, recursive} = require('algebraic-types');

const Peano = createUnionType('Peano', {
  O: [],
  S: [recursive('x')]
});

The Peano type is made of two types: O, which has no variables of its own, and S, which has one variable that must be another Peano type. The recursive definition ensures that we can only have axiomatically correct constructions:

const {S, O} = Peano;

// All the following are valid
const zero = O();
const one = S(zero);
const two = S(one);
const three = S(S(S(O())));

// But this is not
const invalid = S("hello");
// -> Error: Variable x on constructor S must be a Peano

So just by creating this type, we have the first axiom down. The next three axioms are related to the properties of equality in this system. We can encode them by adding a method eq to the prototype of the Peano constructor. Since methods are defined on the Peano type rather than on the child types S or O, there must be a mechanism to check which type the Peano actually is inside a method. algebraic-types adds a method called match to every type you create with it, which can perform this operation (called pattern matching). We can use this to build a recursive definition for eq.

Peano.prototype.eq = function (y) {
  const x = this;

  return x.match({
    O: () => y.match({
      O: () => true,
      S: () => false
    }),
    S: innerX => y.match({
      O: () => false,
      S: innerY => innerX.eq(innerY)
    })
  });
}

const zero = O();
const one = S(zero);

const x = S(one);
const y = S(S(zero));
const z = S(S(O()));

// Reflexivity
x.eq(x); // -> true

// Symmetry
x.eq(y); // -> true
y.eq(x); // -> true

// Transitivity
x.eq(y); // -> true
y.eq(z); // -> true
x.eq(z); // -> true

The rest of the axioms fall out of either the type definition itself, or the implementation of equality.

Let’s break down the recursive definition for equality a little bit, as the same principle will be used to define more operations like addition and multiplication.

To compare whether two Peanos, x and y, are equal, we simply have to look at their types with pattern matching. If x is an O and y is an S of anything, then they cannot be equal, but if y is also an O then we can say that they are equal. If both x and y are an S of something, then we must recursively call eq on their inner values. If they are indeed equal, this process will continue until both inner values reach O. If they are not equal, then one of the values will reach O while the other is an S of something, which we have already defined as non-equal.

More operations

Addition

Even though we can construct Peano numbers, and check if they are equal to each other, this can be described as neither particularly useful nor particularly interesting. What would be interesting is the ability to add one Peano to another, yielding a new Peano.

Peano.prototype.add = function (y) {
  const x = this;

  return x.match({
    O: () => y,
    S: innerX => innerX.add(S(y))
  });
};

// 2 + 3
S(S(O())).add(S(S(S(O()))));
// -> S(S(S(S(S(O())))))

Adding x and y is another recursive operation. Since for any number we know that y + 0 = y, pattern-matching an O value for x means that we can just return the other value (even if that value was a zero itself). If the value of x is an S, however, we wrap the other value y in an S and add that to the inner value of x; the effect is to add one to y and subtract one from x. x will eventually match O, and the final accumulated value of y will be returned.

This is kind of like having two piles of blocks that you want to add. If you just move one block at a time from the first pile into the second pile until there are no blocks left in the first pile, then the second pile will of course end up containing the blocks from both piles.

Subtraction

A very similar trick can be used for subtraction, but first it’s good to mention that subtraction in the Peano Axioms is not really well defined, since there are no negative numbers (axiom #8). For this reason we will say that an expression like 4 - 5 in our world will simply equal 0, rather than -1.

Going back to the idea of two piles, if we remove one block from each pile until one of the piles is empty, then whatever is left on the first pile is the result of (Pile #1) - (Pile #2) (and if the first pile empties before the second, the result is simply 0). You can try it for yourself if you need some convincing.

Peano.prototype.sub = function (y) {
  const x = this;

  return x.match({
    // 0 minus anything is truncated to 0
    O: () => O(),
    S: innerX => y.match({
      O: () => S(innerX),
      S: innerY => innerX.sub(innerY)
    })
  });
};

// 3 - 2
S(S(S(O()))).sub(S(S(O())));
// -> S(O())

Notice that we need to put the inner value of x back into an S if y is an O, because otherwise we would be off by one.
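
To see the truncation rule in action, here is a quick sketch using the definitions above, where the second number is larger than the first:

// 2 - 3 is truncated to 0, as per our convention
S(S(O())).sub(S(S(S(O()))));
// -> O()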

Multiplication

Multiplication is just glorified addition, so we should be able to build it on top of the existing add method.

Peano.prototype.mul = function (y) {
  const x = this;

  return x.match({
    O: () => O(),
    S: innerX => y.match({
      O: () => O(),
      S: innerY => S(innerX).add(
        innerY.mul(S(innerX))
      )
    })
  });
};

Anything times 0 is always 0, so both of those cases are easy. To multiply two numbers larger than zero, we just keep adding x while y is greater than zero, unwrapping a level of y every time.
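
Here is a quick example of the method in use, as a sketch based on the definition above:

// 2 * 3
S(S(O())).mul(S(S(S(O()))));
// -> S(S(S(S(S(S(O()))))))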

Converting to and from Integers

It could be useful (as far as this system is useful in a practical sense) to be able to convert Peano numbers to regular Numbers, and to convert positive integers to Peano numbers. We can write a toInteger method like:

Peano.prototype.toInteger = function () {
  const x = this;
  return x.match({
    O: () => 0,
    S: innerX => 1 + innerX.toInteger()
  });
};

S(S(S(S(O())))).toInteger();
// -> 4

Which is basically just a recursive operation that unfolds to something like 1 + 1 + 1 + 1 + ... + 0.

fromInteger can be a static method on the Peano object which works in the other direction, by recursively wrapping n layers of S around an O.

Peano.fromInteger = (n) => {
  const f = (total, n) => {
    if (n > 0) {
      return f(S(total), n - 1);
    }
    return total;
  };

  return f(O(), n);
};

Peano.fromInteger(5);
// -> S(S(S(S(S(O())))))

This has the nice effect of not blowing up if we throw in a negative number - we’d simply get 0 in those cases.
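
For example, a small sketch of that behaviour:

Peano.fromInteger(-3);
// -> O()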

Now we can perform advanced calculations like

const three = S(S(S(O())));

Peano
  .fromInteger(50)
  .mul(three)
  .sub(S(three))
  .add(O())
  .toInteger();
// -> 146

Conclusion

The Peano Axioms have fascinated me ever since I learned about them because they are a digestible piece of mathematics that is somehow apart from the rote concepts you learn while making your way through the education system. The system shines a subtle light on the real underpinnings of mathematics: the rigor of defining the properties of a mathematical object. At school you might be tricked into thinking that numbers just are, but really they are just one of many kinds of mathematical object, each with different properties. If you're like me, and perhaps never knew about this side of math, you can check out the field of Abstract Algebra, which classifies objects and systems of this kind based on the properties they have.

But from a practical development perspective, as a number system, this isn't particularly useful in everyday life. That said, you might find some real value in modeling with types. It's a really powerful concept when applied correctly, especially when combined with functions that operate on types. You can discern a lot about a program by just looking at the signatures of such functions - what types do they take as parameters and what types do they return. If you've read this article coming from the JavaScript world, I advise you to check out Haskell (especially if you've enjoyed the benefits of TypeScript). It takes programming with types to a fine art.

If you’ve found this article interesting, I’d really love to hear from you about it! Send me a tweet @fstokesman

Arcsecond: Parsing in JavaScript Made Easy


I recently started making a serious attempt at learning Haskell (anyone who has tried before will probably sympathise that it usually takes a couple of tries to crack it). Amongst the many cool things Haskell has to offer is an amazing parsing library called Parsec, which comes with the standard set of packages and lets you describe how to parse complex grammars in what essentially looks like natural language.

Here is how a CSV parser is implemented using Parsec. Don’t worry if you don’t understand all the syntax, the point is that the whole parser is specified in just four lines.

-- Taken from http://book.realworldhaskell.org/read/using-parsec.html

import Text.ParserCombinators.Parsec

csvFile = endBy line eol
line = sepBy cell (char ',')
cell = many (noneOf ",\n")
eol = char '\n'

parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" input

If you want to read more, check out http://book.realworldhaskell.org

This post is not about Haskell, however, but rather about a library I wrote called arcsecond, which is based on Parsec, with the goal of bringing that same expressivity to JavaScript.

Parser combinators

arcsecond is a parser combinator library, in which complex parsers can be built by composing simple parsers. The simplest parsers just match specific strings or characters:

const { digit, letter, char, str, regex } = require('arcsecond');

// digit will parse any number 0-9
// letter will parse any letter a-z or A-Z

// char and str will parse specific sequences
const getA = char ('a');
const getHello = str ('hello');

// regex will parse any given regular expression
const getLowNumbers = regex (/^[0-4]+/);

These can then be combined together with the combinators in the library.

const { char, str, sequenceOf, choice } = require('arcsecond');

const worldOrMarsOrVenus = choice ([
  str ('world'),
  str ('mars'),
  str ('venus')
]);

const combinedParser = sequenceOf ([
  str ('hello'),
  char (' '),
  worldOrMarsOrVenus
]);

Then the new parsers can be used with text:

const { parse, char, str, sequenceOf, choice } = require('arcsecond');

// parse creates a function we can use for parsing text
const parseText = parse (combinedParser);

console.log(
  parseText ('hello world')
)
// -> [ 'hello', ' ', 'world' ]

Combinators

The combinators are where it gets cool. In arcsecond, a combinator is a higher order parser, which takes one or more parsers as its input and gives back a new parser that combines them in some way. If you’ve used higher order components in React like connect, withRouter, or withStyles, then you’re already familiar with the idea.

As shown above, sequenceOf is a combinator that will parse text using each of the parsers in order, collecting their results into an array.

choice, on the other hand, will try each of its parsers in order and use the first one that matches. The library contains more, like many, which takes a parser as an argument and matches as much as it can using that parser, collecting up the results in an array:

const { parse, many, choice, str } = require('arcsecond');

const animal = choice ([
  str ('dog'),
  str ('whale'),
  str ('bear'),
  str ('ant')
]);

const manyAnimals = many (animal);

// parse creates a function we can use for parsing text
const parseText = parse (manyAnimals);

console.log(
  parseText ('dogantwhaledogbear')
)
// -> [ 'dog', 'ant', 'whale', 'dog', 'bear' ]

You can use sepBy to create a parser that matches items separated by something matched by another parser.

const { parse, sepBy, choice, str, char } = require('arcsecond');

const animal = choice ([
  str ('dog'),
  str ('whale'),
  str ('bear'),
  str ('ant')
]);

const commaSeparated = sepBy (char (','));

const commaSeparatedAnimals = commaSeparated (animal);

// parse creates a function we can use for parsing text
const parseText = parse (commaSeparatedAnimals);

console.log(
  parseText ('dog,ant,whale,dog,bear')
)
// -> [ 'dog', 'ant', 'whale', 'dog', 'bear' ]

between will let you match items that occur between two other parsers.

const { parse, between, sepBy, choice, str, char } = require('arcsecond');

const animal = choice ([
  str ('dog'),
  str ('whale'),
  str ('bear'),
  str ('ant')
]);

const betweenBrackets = between (char ('[')) (char (']'));
const commaSeparated = sepBy (char (','));

const commaSeparatedAnimals = commaSeparated (animal);
const arrayOfAnimals = betweenBrackets (commaSeparatedAnimals);

// parse creates a function we can use for parsing text
const parseText = parse (arrayOfAnimals);

console.log(
  parseText ('[dog,ant,whale,dog,bear]')
)
// -> [ 'dog', 'ant', 'whale', 'dog', 'bear' ]

Curried Functions

arcsecond makes use of curried functions. If you don’t know what a curried function is, give my article “Making Functional Programming Click” a read. If you really can’t be bothered right now, open that in another tab and read this executive summary.

A curried function is one that, instead of taking all of its arguments at once, takes its first argument and returns a new function that takes the next argument (and so on, until all the arguments have been supplied). It’s easier to see with some code:

// Normal function, just a single function taking both arguments and returning a result
function normalAdd(a, b) {
  return a + b;
}

normalAdd(1, 2);
// -> 3


// Curried function, a function that takes the first argument and then returns
// *another function*, which takes the next argument.
function curriedAdd(a) {
  return function(b) {
    return a + b;
  }
}

curriedAdd(1)(2);
// -> 3

// We can use curried functions to easily make new, more specific functions
const addOne = curriedAdd(1);

addOne(2);
// -> 3

As you can see above, curriedAdd is first called with 1. Since it then returns a function, we can go ahead and call that with 2, which finally returns the actual result.

We can use curriedAdd to create a new function by calling it with just one argument and then assigning the result to a variable. As a language, JavaScript treats functions as first class citizens, meaning they can be passed around and assigned to variables.

This principle lies at the heart of arcsecond, and every function in the library is defined this way. sepBy takes two parsers — first a separator parser, and second a value parser. Because it is curried, it is easy to create a more specific combinator like commaSeparated by only supplying the first argument.
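
As a small sketch of that idea, reusing only combinators already shown above:

const { sepBy, between, char } = require('arcsecond');

// Partially applying curried combinators gives us new, more specific combinators
const commaSeparated = sepBy (char (','));
const betweenParens = between (char ('(')) (char (')'));

// These can later be applied to any value parser,
// e.g. betweenParens (commaSeparated (someValueParser))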

If you’re not used to it, it will probably seem strange. But a good lesson to learn as a software developer is not to have knee-jerk bad reactions to things that you don’t immediately understand. There is usually a reason, and you’ll find a lot more value by discovering that reason rather than dismissing it.

Error Handling

If you try to parse a string which is not correctly formatted you would expect to get some kind of error message. arcsecond uses a special data type called an Either, which is either a value or an error. It’s like a Promise, which can be either Rejected or Resolved, but without the implication of being asynchronous. In an Either however, the “resolved” type is called a Right, and the “rejected” type is called a Left.

The return type of parse is an Either. You can get at the value or error like this:

const { parse, sequenceOf, str, char } = require('arcsecond');

const parser = sequenceOf ([
  str ('hello'),
  char (' '),
  str ('world')
]);

const parseResult = parse(parser)('hello world');

// We can use the cata method, which is basically a switch/case but for types
parseResult.cata({
  Left: err => {
    // do something here with the error
  },
  Right: result => {
    // result -> [ 'hello', ' ', 'world' ] 
  }
});

However, this might not fit well into your codebase if the rest of your code isn’t written in a functional style. For that reason there are two other options. The first is to convert the Either into a Promise:

const { parse, sequenceOf, str, char, toPromise } = require('arcsecond');

const parser = sequenceOf ([
  str ('hello'),
  char (' '),
  str ('world')
]);

const parseResult = parse(parser)('hello world');

toPromise(parseResult)
  .catch(err => {
    // do something here with the error
  })
  .then(result => {
    // result -> [ 'hello', ' ', 'world' ] 
  });

Or you can use toValue, which must be wrapped in a try/catch block:

const { parse, sequenceOf, str, char, toValue } = require('arcsecond');

const parser = sequenceOf ([
  str ('hello'),
  char (' '),
  str ('world')
]);

const parseResult = parse(parser)('hello world');

try {
  const result = toValue(parseResult);
  // -> [ 'hello', ' ', 'world' ] 
} catch (err) {
  // do something with err
}

Something more complex: JSON

Let’s put arcsecond through its paces by using it to create a parser for JSON.

Click here to skip ahead and see the full JSON parser in one file

Values

JSON only has 7 possible kinds of value:

  • String
  • Number
  • true
  • false
  • null
  • Array
  • Object

So to write a JSON parser, we just need to write parsers for all these values.

Types

In order for our parser to be useful we need to be able to identify what we’ve parsed, and the best way to do that is to put the results into a data type, which will provide a common interface for us to interact with the JSON tree. Every type has a type name, a value, and a toString function to pretty print the structure.

// Function to create a single-valued "type constructor"
const makeBasicType = typeName => value => ({
  type: typeName,
  value,
  toString: () => `${typeName}(${value})`
});

// Function to create a multi-valued "type constructor"
const makeMultiType = typeName => values => ({
  type: typeName,
  value: values,
  toString: () => `${typeName}(${values.map(v => v.toString()).join(', ')})`
});



const stringType = makeBasicType('String');
const numberType = makeBasicType('Number');
const booleanType = makeBasicType('Boolean');

const arrayType = makeMultiType('Array');
const objectType = makeMultiType('Object');

// For the object, we can store the key/value pairs in their own type
const keyValuePair = (key, value) => ({
  type: 'KeyValuePair',
  value: [key, value],
  toString: () => `KeyValuePair(${key.toString()}, ${value.toString()})`
});

const nullType = () => ({
  type: 'Null',
  value: null,
  toString: () => 'Null'
});



arrayType([
  stringType("hello"),
  numberType(42),
  stringType("world"),
  numberType(84),
]).toString();
// -> Array(String(hello), Number(42), String(world), Number(84))

With our types in hand, let’s start with the absolute simplest of parsers: true, false and null. These are just literal strings:

const { parse, str, choice } = require('arcsecond');

const nullParser = str ('null') .map(nullType);
const trueParser = str ('true') .map(booleanType);
const falseParser = str ('false') .map(booleanType);

const jsonParser = choice ([
  nullParser,
  trueParser,
  falseParser
]);

The parsers in arcsecond have a map method — just like arrays — which allows you to transform the value the parser matched. With map we can put the matched values into the data types defined above.
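
As a small sketch of map in isolation, using the digits parser that also appears in the number parser below:

const { digits } = require('arcsecond');

// digits matches one or more digit characters (as a string);
// map turns that string into an actual Number
const integerParser = digits.map(Number);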

Numbers are a bit more complex. The JSON specification has this railroad diagram showing how a number can be parsed:

[Railroad diagram: JSON number grammar]

The forks in the railroads show optionality, so the simplest number that can be matched is just 0.

Basically, numbers like:

  • 1
  • -0.2
  • 3.42e2
  • -0.4352E-235

are all valid in JSON, while something like 03.46 is not, because no path in the railroad allows for it.

const {
  choice,
  sequenceOf,
  char,
  digits,
  regex,
  possibly,
  anyOfString
} = require('arcsecond');

// The possibly parser either returns what it matches or null
// We can use the possibly parser to make one that returns empty string
// instead of null, which is more useful in this case
const orEmptyString = parser => possibly (parser) .map(x => (x) ? x : '');

// We also want to get sequences of matches, but then join them back
// together afterwards, instead of being in an array
const joinedSequence = parsers => sequenceOf (parsers) .map(x => x.join(''));

const numberParser = joinedSequence ([
  orEmptyString (char ('-')),
  choice ([
    char ('0'),
    regex (/^[1-9][0-9]*/)
  ]),
  orEmptyString (joinedSequence ([
    char ('.'),
    digits
  ])),
  orEmptyString (joinedSequence([
    anyOfString ('eE'),
    orEmptyString (anyOfString ('-+')),
    digits
  ]))
]) .map (x => numberType(Number(x)));


// Now we can add the numberParser to the jsonParser
const jsonParser = choice ([
  nullParser,
  trueParser,
  falseParser,
  numberParser
]);

If you take the time to read through the numberParser, you’ll see it lines up pretty much 1:1 with the diagram above.
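
As a quick sanity check (a hypothetical snippet, reusing the toValue helper shown earlier), we could run the number parser over one of the valid examples from the list:

const { parse, toValue } = require('arcsecond');

const result = toValue(parse (numberParser) ('3.42e2'));
result.type;  // -> 'Number'
result.value; // -> 342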

Let’s try strings next. A string is anything between double quotes, but it can also contain escaped quotes.

const {
  between,
  many,
  choice,
  sequenceOf,
  char,
  anythingExcept
} = require('arcsecond');

// Like joinedSequence, we want to get many of the same matches and then join them
const joinedMany = parser => many (parser) .map(x => x.join(''));

const betweenQuotes = between (char ('"')) (char ('"'));

const stringParser = betweenQuotes (joinedMany (choice ([
  joinedSequence ([
    char ('\\'),
    char ('"')
  ]),
  anythingExcept (char ('"'))
]))) .map(stringType);

// Adding the stringParser to the jsonParser
const jsonParser = choice ([
  nullParser,
  trueParser,
  falseParser,
  numberParser,
  stringParser
]);

The anythingExcept parser comes in very handy here, and is especially expressive when compared with the image on the JSON spec website.

[Railroad diagram: JSON string grammar]

That only leaves Array and Object, which can both be pitfalls because they are basically just containers of other JSON values. To illustrate how this might go wrong, we can write the Array parser the “wrong” way first and then see how to address it.

We can use the whitespace parser — which matches zero or more whitespace characters — to ensure that the array brackets and comma separator allow for any (optional) whitespace that might be there.

const {
  between,
  char,
  choice,
  sepBy,
  whitespace
} = require('arcsecond');

const whitespaceSurrounded = between (whitespace) (whitespace)
const betweenSquareBrackets = between (whitespaceSurrounded (char ('['))) (whitespaceSurrounded(char (']')));
const commaSeparated = sepBy (whitespaceSurrounded (char (',')));

const arrayParser = betweenSquareBrackets (commaSeparated (jsonParser)) .map(arrayType);

const jsonParser = choice ([
  nullParser,
  trueParser,
  falseParser,
  numberParser,
  stringParser,
  arrayParser
]);

// ReferenceError: jsonParser is not defined

Because arrayParser is defined in terms of jsonParser, and jsonParser is defined in terms of arrayParser, we run into a ReferenceError. If we moved the definition of arrayParser below jsonParser, we’d still have the same problem. We can fix this by wrapping the jsonParser in a special parser, aptly named recursiveParser. The argument to recursiveParser is a thunk, which will allow us to reference variables that are not yet in scope.

const {
  between,
  char,
  choice,
  sepBy,
  whitespace,
  recursiveParser
} = require('arcsecond');

const whitespaceSurrounded = between (whitespace) (whitespace)
const betweenSquareBrackets = between (whitespaceSurrounded (char ('['))) (whitespaceSurrounded(char (']')));
const commaSeparated = sepBy (whitespaceSurrounded (char (',')));

// Move this parser to the top, and wrap it in a recursiveParser,
// which takes a function that returns a parser
const jsonParser = recursiveParser (() => choice ([
  nullParser,
  trueParser,
  falseParser,
  numberParser,
  stringParser,
  arrayParser
]));

const arrayParser = betweenSquareBrackets (commaSeparated (jsonParser)) .map(arrayType);

Implementing the arrayParser is actually quite trivial — as simple as JSON values, separated by commas, in square brackets.

Object is only marginally more complex. The values in an object are pairs of a string key and some other JSON value, with a colon as a separator.

const {
  between,
  char,
  choice,
  sepBy,
  sequenceOf,
  whitespace,
  recursiveParser
} = require('arcsecond');

const betweenCurlyBrackets = between (whitespaceSurrounded (char ('{'))) (whitespaceSurrounded(char ('}')));

const jsonParser = recursiveParser (() => choice ([
  nullParser,
  trueParser,
  falseParser,
  numberParser,
  stringParser,
  arrayParser,
  objectParser
]));

const keyValuePairParser = sequenceOf ([
  stringParser,
  whitespaceSurrounded (char (':')),
  jsonParser
]) .map(([key, _, value]) => keyValuePair(key, value));

const objectParser = betweenCurlyBrackets (commaSeparated (keyValuePairParser)) .map(objectType);

And that’s it. The jsonParser can be used to parse a JSON document in its entirety. The full parser can be found here as a gist.
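
As a rough usage sketch (assuming the parsers above are assembled into a single file, and reusing the toValue helper shown earlier), parsing a small document might look like this:

const { parse, toValue } = require('arcsecond');

const document = '{ "name": "arcsecond", "stars": 100 }';
const result = toValue(parse (jsonParser) (document));

console.log(result.toString());
// -> (roughly) Object(KeyValuePair(String(name), String(arcsecond)), KeyValuePair(String(stars), Number(100)))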

Bonus Parser: CSV

Since I opened up with a CSV parser in Haskell, let’s see how that would look in arcsecond. I’ll keep it minimal and forgo creating data types to hold the values, as well as any extra strengthening that the Parsec version also doesn’t have.

The result should be an array of arrays — the outer array holds “lines” and the inner arrays contain the elements of the line.

const {
  parse,
  char,
  many,
  regex,
  anythingExcept,
  sepBy
} = require('arcsecond');

const joinedMany = parser => many (parser) .map(x => x.join(''));

const cell = joinedMany (anythingExcept (regex (/^[,\n]/)));
const cells = sepBy (char (',')) (cell);
const parser = sepBy (char ('\n')) (cells);

const data = `
1,React JS,"A declarative efficient and flexible JavaScript library for building user interfaces"
2,Vue.js,"Vue.js is a progressive incrementally-adoptable JavaScript framework for building UI on the web."
3,Angular,"One framework. Mobile & desktop."
4,ember.js,"Ember.js - A JavaScript framework for creating ambitious web applications"`;

console.log(
  parse (parser) (data)
);
// -> [
//   [ '' ],
//   [ '1', 'React JS', '"A declarative efficient and flexible JavaScript library for building user interfaces"' ],
//   [ '2', 'Vue.js', '"Vue.js is a progressive incrementally-adoptable JavaScript framework for building UI on the web."' ],
//   [ '3', 'Angular', '"One framework. Mobile & desktop."' ],
//   [ '4', 'ember.js', '"Ember.js - A JavaScript framework for creating ambitious web applications"' ],
// ]

Conclusion

There are quite a few key features of arcsecond that didn’t get a mention in this article, including the fact that it can parse context-sensitive languages, and that the parsing model is based on a Fantasy Land-compliant Monad. My main goal with this project was to bring the same level of expressivity that Parsec has to JavaScript, and I hope I’ve been able to do that.

Please check out the project on github along with all the API docs and examples, and think of arcsecond the next time you find yourself writing an incomprehensible spaghetti regex — you might be using the wrong tool for the job! You can install the latest version with:

npm i arcsecond

Infinite Data Structures in JavaScript

Languages like Haskell are able to deal directly with Infinite Lists, but can that ability be brought to the world of JavaScript?



In the real world we deal with “infinite” ideas all the time. The set of all positive numbers is an example of an infinite concept we wouldn’t bat an eye at. But typically when we are programming, we have to think of these infinite concepts in a finite way. You can’t have an Array of all the positive numbers (at least not in JavaScript!).

What I want to introduce in this article is the idea of an Infinite List data structure, which can represent some never ending sequence, and lets us use common operations like map and filter to modify and create new sequences.

const allPositiveNumbers = /* somehow we can create this */;

// We can create new Infinite structures using map and filter
const allEvenNumbers = allPositiveNumbers.filter(x => x % 2 === 0);
const allOddNumbers = allEvenNumbers.map(x => x - 1);

When we want to actually use the data, we can just take some concrete amount of it.

allPositiveNumbers.take(10);
// -> [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]

allEvenNumbers.take(10);
// -> [ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 ]

allOddNumbers.take(10);
// -> [ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19 ]

Creating an Infinite List

To create the kind of structure defined above, we first need some way to describe an infinite thing without actually evaluating it (which would kill the computer really fast). To do that, we need to understand two things:

  • The iterator pattern
  • Generator functions

The iterator pattern

Let’s start with the iterator pattern first. The iterator pattern is a kind of design pattern where you can produce a lot of values, one at a time. It’s basically just an object with a .next() method. When that method is called, it returns another object with 2 keys: value and done. As we call .next(), the done property indicates whether the iterator has more values to give us, and the value is of course the value. Below is a simple iterator that returns values 0 to 3:

const createIterator = () => {
  let x = 0;
  return {
    next: () => {
      if (x > 3) {
        return { value: undefined, done: true };
      } else {
        const currentValue = x;
        x += 1;
        return { value: currentValue, done: false };
      }
    }
  }
};

const iterator = createIterator();
iterator.next();
// -> { value: 0, done: false }

iterator.next();
// -> { value: 1, done: false }

iterator.next();
// -> { value: 2, done: false }

iterator.next();
// -> { value: 3, done: false }

iterator.next();
// -> { value: undefined, done: true }

This kind of pattern has a first-class place in JavaScript. Arrays, Maps, and Sets are iterable out of the box, and any object (including a plain Object) can implement the iterator pattern by conforming to an interface called Iterable.
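
For example, we can pull the built-in iterator straight out of an array (a small sketch of the same pattern):

// Arrays expose their iterator through the Symbol.iterator method
const arrayIterator = [10, 20][Symbol.iterator]();

arrayIterator.next(); // -> { value: 10, done: false }
arrayIterator.next(); // -> { value: 20, done: false }
arrayIterator.next(); // -> { value: undefined, done: true }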

Generator functions

Generator functions are a special kind of function in JavaScript, declared with the function* syntax. Generator functions are used to create iterator objects (ones with a .next() method), but in a much clearer and more concise way. Below is a finite generator that creates an equivalent iterator:

const createIterator = function* () {
  let x = 0;
  while (x < 4) {
    yield x;
    x += 1;
  }
};

Here, the yield keyword is used to indicate the value that is returned each time the iterator’s .next() method is called. You can think of the function as pausing after every yield, and picking up where it left off when .next() is called again.

All it would take to make this an infinite generator is to turn the while (x < 4) loop into a while (true) loop. For a better feel, here’s an infinite generator for the famous fibonacci sequence:

const createFibSeqIterator = function* () {
  let a = 0;
  let b = 1;
  while (true) {
    yield b;
    const tmp = a;
    a = b;
    b = b + tmp;
  }
};

const fibSeq = createFibSeqIterator();

fibSeq.next() // -> { value: 1, done: false }
fibSeq.next() // -> { value: 1, done: false }
fibSeq.next() // -> { value: 2, done: false }
fibSeq.next() // -> { value: 3, done: false }
fibSeq.next() // -> { value: 5, done: false }

Putting it together

Iterators seem like a good representation of something infinite because they let us ask for the .next() element indefinitely. Furthermore, a generator seems like a good way to specify the iterator, because we can write succinct algorithms for infinite series without the boilerplate of manually crafting iterators.

But this still isn’t enough, because as powerful as generators are, they’re not really compositional. If I wanted to create a generator that filtered to only the fibonacci numbers ending with a 5, I couldn’t easily use my existing createFibSeqIterator() to do that — i.e. I can’t compose the idea of the original generator + some new operation on it.

To remedy this, we first need to encapsulate the generator into a data type, which we can do with a class:

class Infinite {
  constructor (generatorFn) {
    this.generator = generatorFn;
  }
}

It’s on this class that we will implement our operations like filter, map, and take.

Infinitely Lazy

You might be puzzled when thinking about how we could implement an operation such as filter. The answer is simple: we do it lazily. Instead of actually trying to filter our list, we just make a note of the filter inside the Infinite class. Then, when the user wants to concretely .take() some elements of it, we can do the actual business of filtering.

class Infinite {
  constructor (generatorFn) {
    this.generator = generatorFn;
    this.transformations = [];
  }

  filter(filterFn) {
    const newInfinite = new Infinite(this.generator);
    newInfinite.transformations = this.transformations.slice();
    newInfinite.transformations.push({
      type: 'filter',
      fn: filterFn
    });

    return newInfinite;
  }
}

The Infinite class gets a new transformations property, and filter creates a new Infinite with the same generator, copies the existing transformations array, and pushes a new transformation onto the copy.

We now have all the components we need to write a .take() that will make the list concrete.

class Infinite {
  constructor (generatorFn) {
    this.generator = generatorFn;
    this.transformations = [];
  }

  filter(filterFn) {
    const newInfinite = new Infinite(this.generator);
    newInfinite.transformations = this.transformations.slice();
    newInfinite.transformations.push({
      type: 'filter',
      fn: filterFn
    });

    return newInfinite;
  }

  take(n) {
    const iterator = this.generator();
    const concrete = new Array(n);
    let index = 0;

    while (index < n) {
      const {value, done} = iterator.next();

      if (done) {
        // The generator wasn't infinite, return what we got up until this point
        return concrete;
      }

      // Loop over the transformations and apply them to the value
      let x = value;
      let filtered = false;
      for (let i = 0; i < this.transformations.length; i++) {
        const T = this.transformations[i];

        if (T.type === 'filter') {
          if (!T.fn(x)) {
            filtered = true;
          }
        }
      }

      // After the transformations are complete, if the value wasn't filtered out,
      // we can add x to the concrete list
      if (!filtered) {
        concrete[index] = x;
        index++;
      }
    }

    return concrete;
  }
}

When we call .take(n), we can create an iterator out of the generator, and then initialize a fixed-length array with n elements. This will be our concrete list. An index variable can be used to keep track of how many concrete values we’ve collected so far. Using a while loop, we can get a value out of the iterator, and then run our list of transformations on it. If one of those transformations is a filter, and it doesn’t pass the test, we simply don’t add it to our concrete list, and repeat the loop. When we’ve collected n elements, we’re done and can return the concrete list.

Let’s see how that looks with the fibonacci example from before:

const fibonacciSequence = new Infinite(function* () {
  let a = 0;
  let b = 1;
  while (true) {
    yield b;
    const tmp = a;
    a = b;
    b = b + tmp;
  }
});

const fibsEndingWith5 = fibonacciSequence.filter(x => {
  const str = x.toString();
  return str[str.length - 1] === '5';
});

fibonacciSequence.take(5);
// -> [ 1, 1, 2, 3, 5 ]

fibsEndingWith5.take(5);
// -> [ 5, 55, 6765, 75025, 9227465 ]

To implement map, we could use much the same approach as with filter. The map method itself just clones the current Infinite and adds a new transformation to the list. Inside take, it’s enough to just add an else-if inside the transformation loop.
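
Here is a minimal sketch of how that could look (not code from the implementation above; the change to take is shown only as a comment):

// map clones the Infinite and records a 'map' transformation, mirroring filter
Infinite.prototype.map = function (mapFn) {
  const newInfinite = new Infinite(this.generator);
  newInfinite.transformations = this.transformations.slice();
  newInfinite.transformations.push({
    type: 'map',
    fn: mapFn
  });

  return newInfinite;
};

// Inside take's transformation loop, a map simply replaces the current value:
// } else if (T.type === 'map') {
//   x = T.fn(x);
// }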

Conclusion

In this article we’ve explored the iterator pattern and generator functions in order to build a compositional and lazily-evaluated Infinite List data structure.

The Infinite List in this article is a bit limited however, because it lacks some operations that would make it truly useful, like a map implementation, dependent filtering, or the ability to combine Infinites together (like pairing every fibonacci number with a prime number, for example).

For these I created lazy-infinite, a more powerful Infinite List structure, that conforms to the Fantasy-Land specification. I’d love for you to take a look on github, or to give it a try in your next project with

npm i lazy-infinite