Category Archives: Computation

Meeting Julia

In my last post I mentioned the new language Julia. It deserves more than a single paragraph, so I thought I’d walk through a problem, and tackle it with the language.

The problem is a stock standard one: investigating the fitting of an SIR model of disease spread to an outbreak of influenza at an English boarding school. Of the 763 pupils, on the first day only one was infected, and the number of infected pupils changed over 14 days as:

Day:       1   2   3   4    5    6    7    8    9   10   11  12  13  14
Infected:  3   6  25  73  222  294  258  237  191  125   69  27  11   4

We will try to find parameters \beta and \gamma so that when the equations

\displaystyle{\frac{dS}{dt}=-\beta IS}

\displaystyle{\frac{dI}{dt}=\beta IS-\gamma I}

\displaystyle{\frac{dR}{dt}=\gamma I}

are solved, the values of I fit the data.

Note that the sum of all the right hand sides is zero; this indicates that S+I+R is constant, which for this model is the population. (Other models allow for births, deaths and other population changes).

Finding the parameters can be done in several steps:

  1. Set up a function for the system of differential equations.
  2. From that function, create a second function which is the sum of squared differences between the data, and the computed values from the equations.
  3. Minimize that sum.

The first function can be created in Julia (following the example here) as:

function SIR(t,x,p)
     S = x[1]
     I = x[2]
     R = x[3]
     beta = p[1]
     gamma = p[2]
     [-beta*S*I, beta*S*I-gamma*I, gamma*I]
end

The default behaviour of a Julia function is to return the value of the last expression evaluated before the “end” statement; thus in a function such as this one there is no need for a “return” statement (although the return keyword exists).

To solve a system of equations we need access to the ODE package, which I’ll assume you’ve installed with

Pkg.add("ODE")

This can be loaded into a running Julia session either with

using ODE

which adds all the ODE functions into the top namespace, or as

import ODE

which keeps all the functions inside the ODE namespace. So to call the Dormand-Prince method, you would use

ode45

in the first case, or

ODE.ode45

in the second case.

So, for example, we could attempt to solve the equations as:

p = [0.01, 0.1]
t, x = ODE.ode45((t,x)->SIR(t,x,p),[0:0.1:14],[762.,1.,0.]);

To plot the functions, we’ll use the plotting library Winston (Winston, Julia from 1984 – geddit?), which is fairly basic, but enough for our simple needs:

import Winston
wn = Winston

data = [3 6 25 73 222 294 258 237 191 125 69 27 11 4];   # the observed infection counts
wn.plot(t,x[:,1],"b",t,x[:,2],"g",t,x[:,3],"r",[1:14],data',"k*")

[Figure: plot of the S (blue), I (green) and R (red) curves, with the data shown as asterisks]

The green curve corresponds to I, the number of infected individuals at a given time, and the asterisks to the data. Clearly these parameters do not provide a very good fit.

The next step is to create a sum of squares function to minimize. To do this, we will solve the ODE system with the ode4 method which uses a fixed step size. This means we can ensure that there are computed values for I at all the integer points:

function ss(b::Vector)
    data = [3 6 25 73 222 294 258 237 191 125 69 27 11 4];
    t,x = ODE.ode4((t,x)->SIR(t,x,b),[0:0.05:14],[762.,1.,0.]);
    sum((data-x[21:20:281,2]').^2)
end

Now this function can be optimized, using a method from the Optim package:

import Optim

Optim.optimize(ss,[0.001,1],method=:cg)

The “method” keyword here defaults to the Nelder-Mead method, but for this particular function I found that the conjugate-gradient method gave the best results.

This produces quite a bit of information, of which the salient bits are:

 * Minimum: [0.0021806376138654117, 0.44517197444683443]

 * Value of Function at Minimum: 4507.078964

So, let’s try these parameters and see what happens:

p = [0.0021806376138654117, 0.44517197444683443];
t, x = ODE.ode45((t,x)->SIR(t,x,p),[0:0.1:14],[762.,1.,0.]);
wn.plot(t,x[:,1],"b",t,x[:,2],"g",t,x[:,3],"r",[1:14],data',"k*")

And here’s the result:
[Figure: the fitted SIR curves, with the data again shown as asterisks]

As you can see, the fit is remarkably good.
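As a quick cross-check of the whole procedure, here is a rough Python/SciPy sketch of the same fit (purely illustrative: the helper names sir and ss, the time grid and the optimizer settings below are just one reasonable choice, not the Julia code translated line by line). It should recover parameter values close to those found above.

import numpy as np
from scipy.integrate import odeint
from scipy.optimize import minimize

data = np.array([3, 6, 25, 73, 222, 294, 258, 237, 191, 125, 69, 27, 11, 4])
days = np.arange(1, 15)

def sir(x, t, beta, gamma):
    S, I, R = x
    return [-beta*S*I, beta*S*I - gamma*I, gamma*I]

def ss(p):
    # solve from t = 0 and compare the computed I(t) with the data at t = 1, ..., 14
    sol = odeint(sir, [762.0, 1.0, 0.0], np.concatenate(([0.0], days)), args=tuple(p))
    return np.sum((sol[1:, 1] - data)**2)

res = minimize(ss, [0.001, 0.5], method='Nelder-Mead')
print(res.x, res.fun)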

As far as problems go, this is not particularly hard – conceptually speaking it’s probably at an undergraduate level. On the other hand, it’s not completely trivial, and does seem to me to give the computational environment a bit of a work-out. And Julia solves it with ease.

The best Matlab alternative (3)

Over two years ago I wrote The best Matlab alternative with a follow-up a bit later, which seem to have engendered a fair amount of discussion.

Well, things have moved on in the computational world, and a user is now spoiled for choice.

Octave has just had version 3.8 released; this version comes with a GUI (which is not enabled by default). However, the GUI is a very fine thing indeed:

[Figure: screenshot of the new Octave GUI]

As you see, it has all the bells and whistles you may want: command window and history, variable browser, file browser, editor, and a documentation reader. The GUI will become standard as of version 4.0.

I think that Octave has come along in leaps and bounds over the past few years, and as a drop-in Matlab replacement (if that’s what you want) I don’t think it can be bettered.

Scilab is at version 5.4.1, and is a mature product. I have come across a few comments in newsgroups to the effect that Scilab is better than Octave for some advanced numerical work. However, I’m not sure if these are just opinions or are backed up with solid data and timings. I used to like Scilab a great deal, but now I don’t see that it offers anything over Octave other than Xcos (a dynamic systems modeller, similar to Simulink), and a few niche extras, such as solutions of boundary value problems.

Python was a language I hadn’t considered in my first post, but with the addition of NumPy and SciPy (plus numerous add-on packages) it becomes a definite contender for serious number-crunching. The web is full of stories about people who have dumped Matlab for Python. And of course with Python you get a lovely programming language, and extensions such as Cython, which gives you C-like speed. To get an idea of the packages, see here. Python also has an interactive shell, IPython, which adds enormously to its ease of use.

Julia is the “new kid on the block”: a language which has been designed from the ground up for efficiency and speed, as well as convenience and ease of use. It is supposed to have the power of Matlab, the speed of C, and the elegance of Python. It is still in the early stages (the most recent version is 0.3.0), but it shows enormous promise, and has garnered an impressive list of users and developers. And there is already a small – but growing – list of add-on packages. The Julia web pages show some timings which seem to indicate that Julia is faster (sometimes by several orders of magnitude) than its contenders. (Well, they would, wouldn’t they? They would hardly show timings which were slower than other environments.)

My particular interest is image processing, in which Octave does very well (its image package is quite mature now), followed by Python (with its Mahotas and scikit-image packages). Scilab has its SIP toolbox, but I spent several days trying to install it and gave up. There is an Images package for Julia, but at first glance it doesn’t seem to have the breadth of the others.

I ran a single test myself: to create a 1000 x 1000 random matrix, invert it, and find the trace of the product of the original and its inverse. In Octave:

tic; A = rand(1000,1000);Ai = inv(A);trace(A*Ai);toc

Scilab:

tic; A = rand(1000,1000);Ai = inv(A);trace(A*Ai);toc

Python (importing numpy as np):

%timeit A = np.matrix(np.random.rand(1000,1000));Ai = A.I;np.trace(A*Ai)

Julia:

tic();A=rand(1000,1000);Ai = inv(A);trace(A*Ai);toc()

Note that Octave, Scilab and Julia all have similar syntax; Python is slightly different because functions don’t all exist in the top namespace. This means, for example, that you can easily use functions called “rand” from different packages, each with different functionality.

And here are the results (running the code three times each):

Octave: 1.08677; 1.04735; 1.10616 seconds

Scilab: 3.502; 3.425; 3.663 seconds

Python: 2.33; 2.28; 2.39 seconds

Julia: 0.47034; 0.35776; 0.36403 seconds

This is hardly definitive, but it does show how fast Julia can be, and in relation how slow Scilab can be.

These four environments are by no means the end. R is extensively used, and has over 5000 add-on packages at the CRAN repository. However, I’ve never used R myself. I believe it has branched out from its statistical foundations, and is now more of a general use environment, but still with the best statistical functionality in the business. Then there’s the Perl Data Language, apparently used by astrophysicists, and the GNU Data Language, about which I know nothing. See the Wikipedia page for a list of others.

(Note that I’ve not discussed Freemat this time; I don’t believe it’s a serious contender. I had a look through some of its source code, and most of its files seem to be cut-down Matlab/Octave files without any error checking. Anyway, there seems to be no need to use niche software when Octave is so mature and so well-supported by its developers and users.)

My recommendations? Use Octave, unless you have some niche requirements, in which case you may find Scilab more suitable. If you’re prepared to sacrifice maturity for speed, give Julia a go. If your wish is a mature programming language as your base, then Python.

A very long-running program

In the interests of random number generation, I’ve been experimenting with the iteration

x\to g^x\pmod{p}

for a prime p and a primitive root g\mod p.  Now, it turns out that for some primes there is a primitive root for which this iteration generates all non-zero residues, and for others there is not.  For example, consider the prime p=19, which has primitive roots 2, 3, 10, 13, 14, 15.  Starting with x=1, we find the iterations produce:

\displaystyle{\begin{array}{ll}2:&2, 4, 16, 5, 13, 3, 8, 9, 18, 1\\  3:& 3,8,6,7,2,9,18,1\\  10:&10,9,18,1\\  13:&13,15,8,16,9,18,1\\  14:&14,9,18,1\\  15:&15,8, 5, 2, 16, 6, 11, 3, 12, 7, 13, 10, 4, 9, 18, 1  \end{array}}

We see that no primitive root generates all non-zero residues. However, for p=23, we find that the primitive root 20 does generate all non-zero residues:

20,18,2, 9, 5, 10, 8, 6, 16, 13, 14, 4, 12, 3, 19, 17, 7, 21, 15, 11, 22, 1

Let’s call a prime such as 23 an “exponentially complete” prime. You can see a list of the first few such primes at http://oeis.org/A214876 (which I contributed).

My question of the moment is: “Is 2^{31}-1 exponentially complete?” The prime 2^{31}-1 is one beloved by random number theorists, because modular arithmetic can be performed very efficiently using bit shifts. In C, for example, z \mod p can be computed (up to one further subtraction of p) with

(z&p)+(z>>31)
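As a quick illustration (a Python sketch of mine, not the C/PARI program used for the search described below), the same reduction is easy to check: the shift-and-mask expression is congruent to z modulo p, and at most one further subtraction of p completes the reduction.

import random

p = 2**31 - 1

def mod_p(z):
    # (z & p) is z mod 2^31 and (z >> 31) is the quotient; since 2^31 = p + 1,
    # their sum is congruent to z mod p, and for z < (p-1)^2 one subtraction suffices
    s = (z & p) + (z >> 31)
    return s - p if s >= p else s

for _ in range(10000):
    a, b = random.randrange(p), random.randrange(p)
    assert mod_p(a*b) == (a*b) % p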

However, I can’t think of any method of finding this out except by testing each primitive root, to determine the length of its cycle. Even using GNU C, and the PARI library, and some speed-ups (for example, replacing modular exponentiation with just one multiplication from pre-computed arrays), my desktop computer has been grinding away for two days now, and is up to primitive root 1678; that is, the program has taken 2 days to test 380 primitive roots. Since there are \phi(2^{31}-2)=534600000 primitive roots, at this rate it will take something like 534600000/380\times 2 = 2813684.21 days, or about 7703.66 years. That’s a very long time.

Easy Simpson’s rule

Simpson’s rule, as I’m sure all readers of this blog will know, is a simple rule for numerical integration (or quadrature). It can be obtained in two ways, either by determining the interpolating polynomial over three equally spaced points and integrating that polynomial, or by finding the values w_i for which

\displaystyle{\int_a^bf(x)\,dx = w_0f(a)+w_1f(a+h)+w_2f(b)}

(where h=(b-a)/2) is accurate for f(x)=1,x,x^2.  Over the interval [0,2]\, Simpson’s rule is

\displaystyle{\int_0^2f(x)\,dx = \frac{1}{3}\Bigl(f(0)+4f(1)+f(2)\Bigr)}

and over a general interval

\displaystyle{\int_a^bf(x)\,dx = \frac{b-a}{6}\Bigl(f(a)+4f(a+h)+f(b)\Bigr)}.

Simpson’s rule has the added advantage that it is also accurate for cubic polynomials.

On its own the rule is not hugely accurate, so one way of increasing the accuracy is to divide the interval into smaller, equally sized regions and apply Simpson’s rule to each of them. This provides the composite Simpson’s rule:

\displaystyle{\int_a^bf(x)\,dx \approx \frac{b-a}{3n}\Bigl(f(a)+4f(a+h)+2f(a+2h)+4f(a+3h)+\cdots\Bigr.}

\displaystyle{\Bigl.+2f(a+(n-2)h)+4f(a+(n-1)h)+f(b)\Bigr)}

where n is even and h=(b-a)/n. Sometimes this rule is written

\displaystyle{\int_a^bf(x)\,dx \approx \frac{b-a}{6n}\Bigl(f(a)+4f(a+h)+2f(a+2h)+4f(a+3h)+\cdots\Bigr.}

\displaystyle{\Bigl.+2f(a+(2n-2)h)+4f(a+(2n-1)h)+f(b)\Bigr)}

where there is no restriction on n and h=(b-a)/2n. For a spirited criticism of this rule, both from practical and pedagogical perspectives, see here.

However, in a standard generic mathematics course, Simpson’s rule is as far as most students get with quadrature. The trouble is that the order of the weights (the coefficients of the function values):

1,4,2,4,2,4,...,2,4,1

is tricky to program. Either you have to run two loops, one for the 4’s and one for the 2’s; or you have to do a little test at each stage to see whether you are at a 2 point or a 4 point; or you might invoke the modulus function. None of these ways is particularly straightforward, especially if, like many beginning students, you have little programming experience.

So here’s a super easy way. We start with end points a and b and an even integer n. We first create a list of all the values at which the function is to be computed:

xs=[a+kh \mbox{\; for\;} k=0,1,2,\ldots n]

where h=(b-a)/n, and then a list of the weights:

w=[3-(-1)^k \mbox{\; for\;} k=0,1,2,\ldots n].

This last list is [2,4,2,4,\ldots,2,4,2].

We then compute

\sum w_if(xs_i)

However! Note that our list of weights starts and finishes with 2’s instead of 1’s. We correct for this by simply subtracting f(a) and f(b) from the sum. Our “easy” Simpson’s rule is then:

\displaystyle{\frac{b-a}{3n}\Bigl(\sum w_if(xs_i)-f(a)-f(b)\Bigr)}.

This is almost trivial to compute. For example, on a TI-nspire CAS calculator, we might have:

a:=0

b:=1

\mbox{Define\;}f(x)=e^{-x^2}

n:=16

h:=(b-a)/n

xs:=\mbox{seq}(a+k\cdot h,k,0,n)

w:=\mbox{seq}(3-(-1)^k,k,0,n)

(\mbox{sum}(w\cdot f(xs))-f(a)-f(b))\cdot h/3

The TI-nspire has the nice attribute that a function applied to a list will produce a list of all function values, and the product of two lists is a list containing all the products of corresponding elements. The Casio Classpad has the same functionality with lists, and the above commands can be used.
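For those who would rather see it in a conventional programming language, here is a minimal Python sketch of the same rule (the function name easy_simpson is just for illustration):

import math

def easy_simpson(f, a, b, n):
    # composite Simpson's rule using the weights 3 - (-1)^k; n must be even
    h = (b - a)/n
    xs = [a + k*h for k in range(n + 1)]
    w = [3 - (-1)**k for k in range(n + 1)]          # [2, 4, 2, 4, ..., 4, 2]
    total = sum(wk*f(x) for wk, x in zip(w, xs))
    return (total - f(a) - f(b))*h/3                 # correct the end weights, scale by h/3

print(easy_simpson(lambda x: math.exp(-x*x), 0, 1, 16))    # about 0.746824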

This approach of using 3-(-1)^k to obtain the weights for Simpson’s rule seems not to be widely used, but it seems to me to be absurdly simple. I’ve only come across it in one other place.

Initial comparison of TI-nspire CAS and Casio Classpad 330 calculators

The TI-nspire CAS and the Classpad 330 are the top of the line CAS calculators, and are used in both secondary and tertiary education worldwide.  And each one has its passionate adherents and equally passionate detractors.  However, it’s hard to find a balanced comparison (would you expect to find the TI calculator favourably reviewed in a Casio forum?) TI have released a color version: the TI-nspire CX CAS; and Casio have announced a color version of the ClassPad, to be available in the second half of this year.

For the purposes of this comparison, I’ll use the current monochrome TI-nspire CAS, and the current ClassPad 330.  Here is what they look like:

Also this post will be entirely based on using the handheld calculators, not the computer software which is also available for each of them. And it will be by its very nature superficial; I hope to cover particular topics in more detail in later posts.

Interface

The TI-nspire has a screen size of 320×240 pixels; the ClassPad’s is 160×240, exactly half the size. This means that the TI-nspire display is sharper. The TI-nspire has a menu system driven entirely by button presses; the ClassPad uses a touch screen controlled with a stylus. I find the TI-nspire much easier to use; I don’t particularly like jabbing a stylus at a screen with such poor resolution. It’s worth noting, as a side issue, that the stylus is easily lost; they seem to get looser in their housing with age, and many second-hand ClassPads are advertised with “stylus missing”.

The ClassPad also falls down on readability. Here are a couple of screenshots from Casio:

Note that at the top of the screen there is a menu bar, underneath which is a “toolbar”. At the low resolution of the screen, I find the toolbar icons all but unreadable. Some of them are menu headers, which when selected produce a tools menu. This is a list of tiny icons, which to my eyes could mean anything. (And that’s with my reading glasses; my eyesight, though not as sharp as it once was, is not that bad.) It may have been a cost issue – producing a touch screen cheaply – but I find the low resolution of the ClassPad a real hindrance to its use.

Commands on the TI-nspire are usually obtained with at most 4 keystrokes: a menu key, then at most 3 menu choices. Commands on the ClassPad, resolution notwithstanding, are also fairly easy to obtain, either with the menu bar, or from the onscreen keyboard. This image:

shows that Casio have in fact designed a very clever interface which makes good use of the limited space to obtain most commands and functions easily.

For comparison, here is a TI screenshot, showing a linear function and some values:

Note that the screen, though maybe cluttered for some, is still quite readable and sharp.

The TI-nspire uses a “Documents” system, and a document may have any number of different “pages”: calculation, geometry, lists & spreadsheet, data & statistics and so on. Variables may be transferred between pages, so that a variable, say a list created and named in a spreadsheet, can be opened in a calculator page. The ClassPad has “eActivities”; here is a description from a Casio page:

An eActivity is a new, exciting innovation. It is small packet of learning that can be written by anyone, even yourself! It is loaded into the calculator and a student is then free to work through it. It is normally a scripted learning experience that has all the functions of the calculator at its disposal. eActivities can be used to present students with assignments, learning opportunities or in fact to build your own simple software…

Mathematical strength

I think that the calculators are much of a muchness here. True, the ClassPad offers some goodies such as the Laplace and Fourier transforms, which might give it a slight edge. However, there are also a few anomalies. The TI-nspire provides LU factorization of a matrix M in the form of the matrices P,\,L,\,U, where P is a permutation matrix for which PM=LU; the ClassPad just returns L and U. This means that given a matrix M the ClassPad will return L and U matrices for which possibly LU\ne M. An example is the matrix

\displaystyle{M=\begin{bmatrix}0&4&3\\ 1&2&-1\\ 3&1&2\end{bmatrix}}

for which

\displaystyle{P=\begin{bmatrix}0&0&1\\ 1&0&0\\ 0&1&0\end{bmatrix},\,L=\begin{bmatrix}1&0&0\\ 0&1&0\\ \frac{1}{3}&\frac{5}{12}&1\end{bmatrix},\,U=\begin{bmatrix}3&1&2\\ 0&4&3\\ 0&0&-\frac{35}{12}\end{bmatrix}}

This may be a small worry, but surely it’s a simple thing and worth getting right?
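For what it’s worth, the factorization above is easy to check numerically; SciPy uses partial pivoting and should reproduce the same matrices (a sketch of mine: note that scipy.linalg.lu returns P, L, U with M = PLU, so the permutation matrix in the PM = LU convention used here is the transpose of SciPy’s).

import numpy as np
from scipy.linalg import lu

M = np.array([[0, 4, 3],
              [1, 2, -1],
              [3, 1, 2]], dtype=float)

P, L, U = lu(M)
print(P.T)                          # the permutation matrix, in the PM = LU convention
print(L)                            # unit lower triangular, with 1/3 and 5/12 below the diagonal
print(U)                            # upper triangular, with final pivot -35/12
print(np.allclose(P.T @ M, L @ U))  # True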

I haven’t applied (as yet) a CAS test-suite to either calculator. You have to remember that in terms of raw computing power, these calculators are lightweights, and can only be expected to solve relatively small problems. But here’s one test I did yesterday. On the TI-nspire, I found the factorization of Cole’s number:

2^{67}-1=193707721 \cdot 761838257287

performed almost immediately. Impressive! Just to check this wasn’t a fluke, I obtained the following factorization:

24!-1=625793187653 \cdot 991459181683

in a little under three minutes. I tried the first factorization on the ClassPad and got nothing in a reasonable time (several minutes). So either the TI-nspire uses a better factorizing algorithm, or it has a more powerful processor.

This is an issue I’ll try exploring in more detail in later posts.

Variables and programming

The TI-nspire has three ways of assigning a value to a variable:

  • Define x=3
  • 3 \rightarrow x (the arrow is obtained by using the “sto” key)
  • x:=3 (The “:=” is obtained with two key presses: ctrl, :=)

The ClassPad has two ways:

  • 3 \Rightarrow x
  • x:=3

without, however, any single key to obtain “:=”. And only the arrow method can be used in programs.

Both calculators use a version of Basic on the handhelds. Apparently the scripting language Lua can be used with both, but I suspect only on the associated computer software, and not directly on the handhelds. And they both provide the full gamut of control statements (loops, tests etc). Both calculators distinguish between functions and programs, although in slightly different ways.

The ClassPad offers very nice sequence handling: you can set up a number of interlinked sequences, and set ‘em off. For example, you can enter

\displaystyle{\begin{array}{rl}a_{n+1}&=(4-b_n+c_n)/4\\ a_0&=1\end{array}}

\displaystyle{\begin{array}{rl}b_{n+1}&=(2-2\times a_n-c_n)/5\\ b_0&=1\end{array}}

\displaystyle{\begin{array}{rl}c_{n+1}&=(11-a_n)/3\\ c_0&=1\end{array}}

(which are the sort of equations which may be encountered in Jacobi’s iteration method for linear equations), in the Sequence page, and just run them. The TI-nspire has a “Lists & Spreadsheet” page, in which you can perform similar calculations, but not as elegantly as this.
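To see what these sequences compute, here is a rough Python equivalent (an illustrative sketch, not anything produced on either calculator): it is just Jacobi iteration for the system 4a + b - c = 4, 2a + 5b + c = 2, a + 3c = 11, and it settles down quickly towards the exact solution (2, -1, 3).

a, b, c = 1.0, 1.0, 1.0
for _ in range(30):
    # simultaneous assignment uses the old values, as Jacobi iteration requires
    a, b, c = (4 - b + c)/4, (2 - 2*a - c)/5, (11 - a)/3
print(a, b, c)    # approaches (2, -1, 3)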

As a mini-comparison, here are for-loops on each:

TI-nspire:

For i,1,10
x:=x+1
EndFor

ClassPad:

For 1\Rightarrow i to 10
x+1\Rightarrow x
Next

Graphics

Both calculators support 2D graphing (they would hardly have a market if they didn’t). The ClassPad also supports 3D graphing, which is supposed to be a huge feather in its cap, but in fact I find this to be of limited use. Just as an experiment, I attempted to plot the monkey saddle surface z=x^3-3xy^2 and couldn’t make any sense of the result. Even with zooming and moving the graph around with the stylus, I couldn’t obtain any intuitive feel for the shape of the graph, partly because there doesn’t seem to be any hidden-line algorithm available. However, drag and drop with the stylus is one area in which the ClassPad shines. The TI-nspire’s touchpad, although it works, is clumsy in comparison.

Which is better?

I much prefer (for my purposes) the TI-nspire’s sharper screen and interface. However, when the new colour model fx-CP400 is released I will certainly reconsider. I have no loyalty to either brand. In the end, it comes down to what you and everybody else is using. My daughter is using a ClassPad 330 for her year 11 mathematics (its use is mandated by her school) and is very satisfied with it. I would imagine that even with the models discussed in this post (the monochrome TI-nspire already superseded, the ClassPad 330 about to be), either one would happily manage pretty much all school, and most college, mathematics.

Maxima on Android

Such a long long time since I last posted!  Very poor blogging etiquette.  Anyway, to break the drought, here’s news of the best thing to come my way in the last six months – fully featured Maxima running on Android (this screenshot from its website):

[Figures: two screenshots of Maxima running on Android, from its website]

At a conference I attended late last year, sponsored by calculator companies, the Maple reps (both of whom I knew) claimed that calculators were an old-fashioned, outmoded technology, especially now that with smartphones, iPods and the like you can have the Internet in your pocket (Wolfram Alpha, anybody?).  I still think there’s a lot to be said, though, for a device which you can use independently of a network, and in fact I believe most calculator users use very little of the available functionality.  How many CAS calculator users, I wonder, program on or with their machines?

But now here’s a fully featured, mature CAS running on a portable device – the first such one, as far as I know.  There are in fact other CAS’s available for portable systems: PocketCAS, Mathomatic, Yacas (which I’ve written about before), no doubt others.  None of them, as far as I can tell, work particularly well, or are in any way complete.  It doesn’t take long before you run up against a wall of something they are unable to do.

So I see Maxima as being the first complete portable CAS – and I’m delighted to see it on Android.  It works fine on a smartphone, and is quite wonderful on a tablet.

Its home page is here.

The best Matlab alternative (2)

A year or so ago, I published a blog post called The best Matlab Alternative.  In that post, I discussed the merits of Scilab, Octave and Freemat, and decided that Octave suited me best.  However, Scilab is probably the more full-featured and powerful.

I’ve since discovered that there’s another contender, in some ways better than all of the above: Python.  My requirements for a “Matlab alternative” were that:

  1. It must install under Windows. This is absolutely necessary for me, at least, since most of my students use Windows. A very few use OSX, and hardly any use Linux.
  2. It must, as far as possible, be identical in use with Matlab. The commands should be the same, and simple scripts and functions should transfer between the two with no effort.
  3. It should have a reasonably nice interface, as similar as possible to Matlab’s own. A basic command line only interface running in a terminal won’t cut it with my students.

If we drop condition 2, and require only software which can do pretty much everything Matlab can do, in a nice visual environment, then Python comes out looking very good indeed.  Now, I have been a lover of Matlab, and its open source alternatives, for many years, and I do like the rapid prototyping possible with Matlab, as well as its ease of dealing with large data types.  And vectorization (where a single command is automatically applied to every element of a multi-dimensional array) provides some wonderfully elegant and simple code.  (It can also, of course, produce dense obtuseness.)

However, I have also become aware of some criticisms of Matlab.  A major one is vendor lock-in: the code, the language, the IDE are all owned by The Mathworks.  If they decide to radically change something, you have no alternative but to keep paying the license fees to keep up.  Second is that the language has no object-oriented capabilities, at least in a natural way (see comment below), and no standard as such – it’s just whatever The Mathworks decide to provide.  Third is the namespace: all functions exist as top-level objects, and all functions (“m-files”) in the Matlab path (which will usually include multiple directories) are available when you start up.  Fourth is that an m-file can only contain one function (and maybe some sub-functions), whose name must be the same as that of the file.  This means that if you create some new functions – and most users do – you need to be sure they don’t conflict with any existing functions.  This problem is exacerbated if you include lots of toolboxes: the Image Processing Toolbox alone adds 270 new functions to the Matlab namespace, and there are 38 other toolboxes available.  That’s a lot of functions.

One list of criticisms is available here; another (called “Abandon Matlab”) is here.  Note that the second site recommends (as well as Python), R, which the first site also pans.  There’s no accounting for tastes.

So, back to Python.

Python is a deservedly popular language, supported by a huge user base, tons of books, articles, websites, conferences, seminars, and so on.  It is fully open source, and it is managed by a non-profit organization called the Python Software Foundation.

Python has lots of libraries for scientific computing, starting with numpy (“numeric python”) and scipy (“scientific python”).   And there are others – such as scikit-image, which extends scipy’s image processing capabilities (the Python Imaging Library does not seem to be in active development at present).  Many more are listed here.

What’s more, Python has lots of IDEs, which means that you can use it pretty much as you would use Matlab, Octave or Scilab.  For a nice visual IDE with all the trimmings, there are Eric, Spyder, and PyDev (an Eclipse plugin), plus lots more, both open source and commercial.

For those of us who are satisfied with less, there’s IPython, an enhanced shell (which is what I like to use).  And there are again lots more, including editors and all sorts of other goodies for development, here.

Let’s finish with a tiny example: create a 1500 x 1500 random matrix, invert it, and check the trace of the product of the matrix and its inverse.  I’ll do all this in the IPython shell, which I invoke from my Linux command line with ipython.

Once in the iPython shell, we import the pylab modules; this includes numpy, scipy, and the plotting library matplotlib:

In[1]: from pylab import *

(In fact this can be done from the command line when starting IPython, but the command above should work in any Python environment.)

Now to define the matrix, compute its inverse, and check the trace of their product:

In [2]: m = mat(rand(1500,1500))
In [3]: mi = linalg.inv(m)
In [4]: np.trace(m*mi)
Out[4]: 1500.0

And of course all this is nice and fast (on my Linux laptop):

In [5]: %time m = mat(rand(1500,1500)); mi = linalg.inv(m); np.trace(m*mi)
CPU times: user 6.52 s, sys: 0.21 s, total: 6.73 s
Wall time: 3.78 s

Note: there is now a follow up to this post called (now this is original and clever): The best Matlab alternative (3).

Sarrus rule, and extensions to higher orders

Sarrus’ rule is a handy method of computing 3\times 3 determinants. As it is not immediately extendable to higher order determinants, I’m wary of teaching it to students, who don’t see why its use is restricted to 3\times 3. It is basically a shorthand version of the standard formula for determinants:

\displaystyle{\mbox{det}(A) = \sum_{\pi\in S_n}\mathrm{sgn}(\pi)a_{1,\pi(1)}a_{2,\pi(2)}\cdots a_{n,\pi(n)}}.

This can be proved to be the unique function which is linear in each row, alternating in the rows (in other words, is zero if two rows are equal) and has the value 1 if A=I. So for a 3\times 3 determinant, we have

\displaystyle{\mbox{det}(A) = a_{11}a_{22}a_{33}+a_{12}a_{23}a_{31}+a_{13}a_{21}a_{32}}

       \displaystyle{-a_{11}a_{23}a_{32}-a_{12}a_{21}a_{33}-a_{13}a_{22}a_{31}.}

The six permutations, with their signs, are

\displaystyle{\begin{array}{ccc|c}\multicolumn{3}{c|}{\pi} &\mbox{sgn}\\ \hline 1&2&3&1\\ 2&3&1&1\\ 3&1&2&1\\ 1&3&2&-1\\ 2&1&3&-1\\ 3&2&1&-1\end{array}}

Note that all cyclic permutations of 1,2,3 have the same sign, and so do all cyclic permutations of 3,2,1.

This brings us to Sarrus’s rule:

The first two columns of the determinant are copied down to the right, and all diagonals multiplied. The blue diagonals are positive, and the red ones negative. In this case the forward diagonals are all positive, and the back diagonals negative. All the diagonal products, positive and negative, are added to produce the determinant. (By “negative” I mean that the value of the product is multiplied by -1.)

Note that in the 3\times 3 case all the six diagonals include all the six possible permutations of the column indices.

If we tried to do this for a 4\times 4 determinant, we would run into trouble, as we would only have 8 possible diagonal products, and we need 24. Let us define a “Sarrus array” to be an n\times n array, with the first n-1 columns repeated, and the product of all diagonals computed, with regard to the sign of the permutation. So the image above is a 3\times 3 Sarrus array, and it completely determines the 3\times 3 determinant.

A 4\times 4 Sarrus array is

Note that because we have an even number of objects to permute (4), a cyclic permutation changes the sign. But we still only have 8 of the 24 permutations needed. To obtain the other 16, we need two more Sarrus arrays: one for which columns 3 and 4 of the original array are switched:

and another array for which columns 2 and 3 of the original array are switched:

These three arrays allow us to use a Sarrus-type rule to compute 4\times 4 determinants. (Warning: don’t try this at home, unless you have a responsible adult to help you.)
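As a check that the three arrays really do capture all 24 permutations, here is a small Python sketch (entirely illustrative, and using 0-based column orders): it forms every diagonal of every array, signs each product by the parity of the corresponding permutation, and recovers the usual Leibniz expansion.

from itertools import permutations

def parity(p):
    # +1 for an even permutation of 0, ..., n-1, -1 for an odd one
    sign, seen = 1, [False]*len(p)
    for i in range(len(p)):
        if seen[i]:
            continue
        j, cycle_len = i, 0
        while not seen[j]:
            seen[j] = True
            j = p[j]
            cycle_len += 1
        if cycle_len % 2 == 0:
            sign = -sign
    return sign

def sarrus_det(A, arrays):
    n, total = len(A), 0
    for order in arrays:
        for base in (order, order[::-1]):        # forward and backward diagonals
            for shift in range(n):               # the 2n diagonals of one Sarrus array
                perm = [base[(i + shift) % n] for i in range(n)]
                prod = 1
                for i, col in enumerate(perm):
                    prod *= A[i][col]
                total += parity(perm)*prod
    return total

A = [[0, 4, 3, 1], [1, 2, -1, 2], [3, 1, 2, 0], [2, 0, 1, 1]]
arrays = [[0, 1, 2, 3], [0, 1, 3, 2], [0, 2, 1, 3]]   # the three 4x4 arrays, 0-based
leibniz = sum(parity(list(p))*A[0][p[0]]*A[1][p[1]]*A[2][p[2]]*A[3][p[3]]
              for p in permutations(range(4)))
print(sarrus_det(A, arrays) == leibniz)               # True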

These rules don’t originate with me, of course; you can see the same rule here. I’m sure I’m the seven millionth person to have done this. However, the pictures here are all mine – created with TikZ in LaTeX.

Higher orders…

Note that a Sarrus array is completely determined by the order of the first n columns. So for 3\times 3 determinants we need the single array [1, 2, 3], but for 4\times 4 determinants we need three arrays [1,2,3,4],[1,2,4,3],[1,3,2,4].

For higher orders, we need n! permutations, and a Sarrus array produces only 2n diagonals. So we’re going to need (n-1)!/2 arrays. To find them, we simply make a list of all permutations, take the first permutation in the list, remove all its cyclic permutations and all cyclic permutations of its reverse, and continue.

This can be encapsulated in this little Sage program:

def shift(x): return x[1:]+[x[0]]

def sarrus(n):
    L = []
    PL = [[p[i] for i in range(n)] for p in Permutations(n)]
    for i in range((factorial(n-1))/2):
        p = PL[0]
        L += [p]
        pr = p[::-1]
        for j in range(n):
            PL.remove(p)
            PL.remove(pr)
            p = shift(p)
            pr = shift(pr)
    return L

Then:

sage: sarrus(4)

[[1, 2, 3, 4], [1, 2, 4, 3], [1, 3, 2, 4]]

sage: s5 = sarrus(5)
sage: for i in s5: print i

[1, 2, 3, 4, 5]
[1, 2, 3, 5, 4]
[1, 2, 4, 3, 5]
[1, 2, 4, 5, 3]
[1, 2, 5, 3, 4]
[1, 2, 5, 4, 3]
[1, 3, 2, 4, 5]
[1, 3, 2, 5, 4]
[1, 3, 4, 2, 5]
[1, 3, 5, 2, 4]
[1, 4, 2, 3, 5]
[1, 4, 3, 2, 5]

Note that for five elements a cyclic permutation preserves sign, and a reversal also preserves sign. So each array is either positive or negative:

sage: for i in s5: print i,Permutation(i).is_even()

[1, 2, 3, 4, 5] True
[1, 2, 3, 5, 4] False
[1, 2, 4, 3, 5] False
[1, 2, 4, 5, 3] True
[1, 2, 5, 3, 4] True
[1, 2, 5, 4, 3] False
[1, 3, 2, 4, 5] False
[1, 3, 2, 5, 4] True
[1, 3, 4, 2, 5] True
[1, 3, 5, 2, 4] False
[1, 4, 2, 3, 5] True
[1, 4, 3, 2, 5] False

It’s all rather silly, really…

Computing determinants: elimination and condensation

Elimination

Laplace expansion along one row is the first method we might learn, but the second is of course elimination: using row operations to put the determinant into triangular form, from whence the computation consists of multiplying the elements on the diagonal. This is of course much more efficient than Laplace expansion, because it is of polynomial order, in fact of order n^3.

Without going into all the sordid detail, here is a brief precis of the method for an n\times n determinant |A|:

Step 1: Rearrange the rows if necessary so that a non-zero element is at position (1,1). (If no non-zero element exists, then the value of the determinant will be zero, and we have no need of any further work.) For each row i with 2\le i\le n replace row R_i with

\displaystyle{R_i-\frac{a_{i1}}{a_{11}}R_1}.

This will have the effect of making all elements in the first column (except the first) zero.

Step k: Rearrange the rows R_k,R_{k+1},\ldots ,R_n if necessary so that a non-zero element is at position (k,k). (As before, if no non-zero element exists, then the value of the determinant will be zero, and we have no need of any further work.) For each row i with k+1\le i\le n replace row R_i with

\displaystyle{R_i-\frac{a_{ik}}{a_{kk}}R_k}.

This will have the effect of making all elements in the k-th column beneath the main diagonal zero. Note that for any row rearrangement we must consider the sign of the determinant: an odd number of row swaps (as we know) will change its sign.

At step k, the number of elements which will be altered is (n-k)^2. So the total number of alterations is

\displaystyle{\sum_{k=1}^{n-1}(n-k)^2 = \frac{n(n-1)(2n-1)}{6}}.

Each “alteration” consists of a multiplication and a subtraction (together with one division per row to form the multiplier). Clearly the total number of operations is O(n^3). And this is the method we teach our students and program into our computers (with a little extra: choosing the largest available pivot to put on the main diagonal, so as to minimize rounding errors).
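Here is a bare-bones Python sketch of the method just described (purely illustrative, not production code): reduce to upper triangular form with partial pivoting, track the sign from any row swaps, and multiply up the diagonal.

def det_elimination(A):
    A = [[float(x) for x in row] for row in A]   # work on a copy, in floating point
    n, sign = len(A), 1
    for k in range(n):
        # partial pivoting: bring the largest remaining entry in column k up to row k
        pivot = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[pivot][k] == 0:
            return 0.0
        if pivot != k:
            A[k], A[pivot] = A[pivot], A[k]
            sign = -sign                         # each row swap changes the sign
        for i in range(k + 1, n):
            m = A[i][k]/A[k][k]
            for j in range(k, n):
                A[i][j] -= m*A[k][j]
    prod = float(sign)
    for k in range(n):
        prod *= A[k][k]
    return prod

print(det_elimination([[2, 1, 3], [4, -1, 1], [1, 5, 2]]))   # 42.0, up to rounding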

Chio condensation

For most people, elimination is all that’s needed: a fast, efficient method for computing determinants of arbitrary size. However, it is not necessarily the best method for every case. Suppose, for example, we wish to compute the determinant of

\displaystyle{A=\left[\begin{array}{cccc}17& 25& 30& 14\\ 26& 19& 11& 16\\ 15 &17& 26& 23\\ 11& 26& 11& 27\end{array}\right]}.

Since all elements of A are integers, its determinant will also be an integer. However, introducing fractions with elimination will almost certainly result in loss of precision, and the final value will be at best a numerical approximation. So what we need here is a method which will allow us to compute a determinant without all the division. The trick is to multiply all lower rows by the current diagonal element before the elimination step. This will change the value of the determinant, so we’ll adjust for that later.

Here’s a 4\times 4 determinant:

\displaystyle{D = \left|\begin{array}{cccc}a&b&c&d\\ e&f&g&h\\ i&j&k&l\\ m&n&o&p\end{array}\right|}

Multiply all rows (except the first), by a:

\displaystyle{a^3D = \left|\begin{array}{cccc}a&b&c&d\\ ae&af&ag&ah\\ ai&aj&ak&al\\ am&an&ao&ap\end{array}\right|}

Now for the elimination:

R_2 \leftarrow R_2-eR_1

R_3 \leftarrow R_3-iR_1

R_4 \leftarrow R_4-mR_1

This produces:

\displaystyle{a^3D = \left|\begin{array}{cccc}a&b&c&d\\ 0&af-eb&ag-ec&ah-ed\\ 0&aj-ib&ak-ic&al-id\\ 0&an-mb&ao-mc&ap-md\end{array}\right|}

Notice that all elements (from row and column 2 onwards) have the form of 2\times 2 determinants. In fact,we can write this as:

\displaystyle{a^3D = a\left|\begin{array}{ccc} \displaystyle{\left|\begin{array}{cc}a&b\\ e&f\end{array}\right|}  &\displaystyle{\left|\begin{array}{cc}a&c\\ e&g\end{array}\right|} & \displaystyle{\left|\begin{array}{cc}a&d\\ e&h\end{array}\right|}\\[5mm]  \displaystyle{\left|\begin{array}{cc}a&b\\ i&j\end{array}\right|}  &\displaystyle{\left|\begin{array}{cc}a&c\\ i&k\end{array}\right|} & \displaystyle{\left|\begin{array}{cc}a&d\\ i&l\end{array}\right|}\\[5mm]  \displaystyle{\left|\begin{array}{cc}a&b\\ m&n\end{array}\right|}  &\displaystyle{\left|\begin{array}{cc}a&c\\ m&o\end{array}\right|} & \displaystyle{\left|\begin{array}{cc}a&d\\ m&p\end{array}\right|}\\\end{array}\right|}

or as

\displaystyle{D = \frac{1}{a^2}\left|\begin{array}{ccc} \displaystyle{\left|\begin{array}{cc}a&b\\ e&f\end{array}\right|}  &\displaystyle{\left|\begin{array}{cc}a&c\\ e&g\end{array}\right|} & \displaystyle{\left|\begin{array}{cc}a&d\\ e&h\end{array}\right|}\\[5mm]  \displaystyle{\left|\begin{array}{cc}a&b\\ i&j\end{array}\right|}  &\displaystyle{\left|\begin{array}{cc}a&c\\ i&k\end{array}\right|} & \displaystyle{\left|\begin{array}{cc}a&d\\ i&l\end{array}\right|}\\[5mm]  \displaystyle{\left|\begin{array}{cc}a&b\\ m&n\end{array}\right|}  &\displaystyle{\left|\begin{array}{cc}a&c\\ m&o\end{array}\right|} & \displaystyle{\left|\begin{array}{cc}a&d\\ m&p\end{array}\right|}\\\end{array}\right|}

This is an example of condensation, where an n\times n determinant is reduced to an (n-1)\times(n-1) determinant. And this can be expressed more formally as follows:

If A is an n\times n matrix, then |A| = (1/a_{11}^{n-2})|B|, where B is the (n-1)\times(n-1) matrix for which

\displaystyle{b_{ij} = \left|\begin{array}{cc}a_{11}&a_{1,j+1}\\ a_{i+1,1}&a_{i+1,j+1}\end{array}\right|}

Note that since, by construction, |B| is divisible by a_{11}^{n-2} (indeed |B| = a_{11}^{n-2}|A|), then if the a_{ij} are integers, all operations will produce integer outputs.

This particular condensation can be traced back to an 1853 article by somebody called F. Chio, about whom I can find nothing. However, according to Howard Eves, the approach goes back further, to an 1849 paper by Hermite.

Note that the k-th condensation requires (n-k)^2 2\times 2 determinants, so the number of operations is the same order as elimination, except that the divisions have been vastly reduced. And all that’s required to make this condensation work is to ensure a non-zero value in the top left.
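A recursive Python sketch of Chio condensation might look like this (again just an illustration; it swaps rows only to obtain a non-zero top left entry, and relies on the exactness of the divisions noted above):

def det_chio(A):
    A = [row[:] for row in A]
    n = len(A)
    if n == 1:
        return A[0][0]
    if A[0][0] == 0:
        for r in range(1, n):
            if A[r][0] != 0:
                A[0], A[r] = A[r], A[0]
                return -det_chio(A)     # a row swap changes the sign
        return 0                        # the whole first column is zero
    a = A[0][0]
    # b_ij is the 2x2 determinant with rows (a_11, a_1,j+1) and (a_i+1,1, a_i+1,j+1)
    B = [[a*A[i+1][j+1] - A[0][j+1]*A[i+1][0] for j in range(n-1)] for i in range(n-1)]
    return det_chio(B) // a**(n-2)      # exact integer division, by construction

print(det_chio([[17, 25, 30, 14],
                [26, 19, 11, 16],
                [15, 17, 26, 23],
                [11, 26, 11, 27]]))     # -146710, an exact integer throughout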

Dodgson condensation

For some reason, this is much better known than Chio’s. Probably because Charles Lutwidge Dodgson is himself so much more famous. Apparently Queen Victoria, having read the Alice books, and enjoyed them, requested more of the author’s books, and was astonished to be presented with “ELEMENTARY TREATISE ON DETERMINANTS” which you can read online here. To my mind, Dodgson condensation is clumsier than Chio’s; on the other hand, it does have the cachet of Dodgson’s name attached to it. And happily you can read Dodgson’s original description here.

The method is based around what is now called the Desnanot-Jacobi identity. Let A be an n\times n matrix. From A we extract five matrices: A_{\mathrm{INNER}}, being the (n-2)\times(n-2) matrix consisting of all rows and columns from 2 to n-1, and the upper left, upper right, bottom left and bottom right (n-1)\times (n-1) submatrices. That is, A_{\mathrm{UL}}, A_{\mathrm{UR}}, A_{\mathrm{BL}}, A_{\mathrm{BR}} have top left corners a_{11}, a_{12}, a_{21}, a_{22} respectively. Then the identity states that

|A||A_{\mathrm{INNER}}| = |A_{\mathrm{UL}}||A_{\mathrm{BR}}|-|A_{\mathrm{UR}}||A_{\mathrm{BL}}|.

You can read a charmingly witty proof of this identity here.

The actual condensation works like this. Suppose A[1] is the n\times n matrix whose determinant we wish to compute. Define A[0] to be the (n+1)\times (n+1) matrix consisting entirely of ones. Then for k = 2,3,\ldots,n define A[k] by

\displaystyle{a[k]_{ij} = \frac{1}{a[k-2]_{i+1,j+1}}\left|\begin{array}{cc}a[k-1]_{ij}&a[k-1]_{i,j+1}\\ a[k-1]_{i+1,j}&a[k-1]_{i+1,j+1}\end{array}\right|}

In other words, A[k] is constructed using the determinants of all the adjacent 2\times 2 submatrices of A[k-1], and each determinant is divided by the corresponding term from the inner matrix of A[k-2]. Each condensation reduces the size by one, so A[k] is an (n-k+1)\times(n-k+1) matrix, and the single entry of A[n] is the determinant |A|.

So for example if

\displaystyle{A[k] = \left[\begin{array}{cccc}a&b&c&d\\ e&f&g&h\\ i&j&k&l\\ m&n&o&p\end{array}\right]}

then we first produce the matrix

\displaystyle{\left[\begin{array}{ccc} \displaystyle{\left|\begin{array}{cc}a&b\\ e&f\end{array}\right|}  &\displaystyle{\left|\begin{array}{cc}b&c\\ f&g\end{array}\right|} & \displaystyle{\left|\begin{array}{cc}c&d\\ g&h\end{array}\right|}\\[5mm]  \displaystyle{\left|\begin{array}{cc}e&f\\ i&j\end{array}\right|}  &\displaystyle{\left|\begin{array}{cc}f&g\\ j&k\end{array}\right|} & \displaystyle{\left|\begin{array}{cc}g&h\\ k&l\end{array}\right|}\\[5mm]  \displaystyle{\left|\begin{array}{cc}i&j\\ m&n\end{array}\right|}  &\displaystyle{\left|\begin{array}{cc}j&k\\ n&o\end{array}\right|} & \displaystyle{\left|\begin{array}{cc}k&l\\ o&p\end{array}\right|}\\\end{array}\right]}

Then A[k+1] is obtained by dividing each element above by the inner elements of A[k-1].

The difficulty of applying this method is dealing with zeros in the inner elements of A[k-1]. Dodgson suggests reordering the rows of the matrix to remove all such zeros, and then performing any such recalculations as are required. Anyway, you can see some examples in Dodgson’s original paper, or here.
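Here is a short Python sketch of the condensation (purely illustrative; it assumes that no zeros ever turn up in the interior entries being divided by, so Dodgson’s row-reordering fix is not implemented):

def det_dodgson(M):
    n = len(M)
    prev = [[1]*(n + 1) for _ in range(n + 1)]   # the all-ones matrix playing the role of A[0]
    curr = [row[:] for row in M]                 # A[1] is the matrix itself
    for _ in range(n - 1):                       # condense n x n -> (n-1) x (n-1) -> ... -> 1 x 1
        m = len(curr)
        nxt = [[(curr[i][j]*curr[i+1][j+1] - curr[i][j+1]*curr[i+1][j])
                // prev[i+1][j+1]                # divide by the interior of the step before last
                for j in range(m - 1)]
               for i in range(m - 1)]
        prev, curr = curr, nxt
    return curr[0][0]

print(det_dodgson([[17, 25, 30, 14],
                   [26, 19, 11, 16],
                   [15, 17, 26, 23],
                   [11, 26, 11, 27]]))           # -146710 again, agreeing with Chio's method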

It seems curious that Dodgson – not known for scholarly reading of other mathematicians’ work – should know and apply the Desnanot-Jacobi identity. If you or your library has access to it, a fascinating account of this method and its history is given in “‘Shutting up like a telescope’: Lewis Carroll’s ‘Curious’ Condensation Method for Evaluating Determinants”, by Adrian Rice and Eve Torrence, The College Mathematics Journal, Vol. 38, No. 2, March 2007, pp. 85–95.

Computing determinants: expansions

I’ve spoken vehemently before about my dislike of determinants, and in a straightforward computational lower-level matrices-and-vectors subject I’ll have nothing to do with them. However, I am currently teaching engineers, who seem enamoured of determinants. They love things like Cramer’s rule (hideously inefficient as it is), and it must be admitted that determinants do make their presence felt in other places: Jacobians for changes of variables in multiple integrals, Wronskians, etc. So I’ve been teaching determinants and how to compute them by hand.

I assumed that, having taught determinants off and on for years, I’d come across most methods for their computation. But no!

So this post gathers a few methods together. I’ll be brief; plenty of sordid detail on each method can be found online.

Laplace expansion
This is the one we learn at our mothers’ knees, so to speak, as mathematical toddlers. It can be defined recursively as follows:

|a| = a

For an n\times n matrix, pick any row or column. For simplicity’s sake, suppose we pick row i, with elements a_{i1},a_{i2},\ldots,a_{in}. Then

\displaystyle{|A| = \sum_{j=1}^n(-1)^{i+j}a_{ij}M_{ij}}

where M_{ij} is the minor of a_{ij}, which is defined to be the determinant of what’s left when the row and column containing a_{ij} are removed.

It’s instructive to determine the cost of this calculation. Suppose M_n, A_n are the numbers of multiplications and additions, respectively, required to compute an n\times n determinant. We have M_1=A_1=0 and M_2 = 2, A_2=1. (We shall treat subtraction as having the same cost as addition.)

By the expansion rule above, an n\times n determinant requires the computation of n different (n-1)\times (n-1) determinants, with n extra multiplications and n-1 additions. Thus:

M_n = n(M_{n-1}+1)

A_n = n(A_{n-1}+1)-1.

The following table shows how these values increase:

\displaystyle{\begin{array}{c|c|c}n&M_n&A_n\\ \hline 2&2&1\\ 3&9&5\\ 4&40&23\\ 5&205&119\\ 6 &1236 &719\\ 7 &8659& 5039\\ 8 &69280 &40319\\ 9 &623529 &362879\\ 10 &6235300 &3628799\end{array}}
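The table is easy to reproduce directly from the recurrences (a quick Python check, illustrative only):

M, A = {1: 0}, {1: 0}
for n in range(2, 11):
    M[n] = n*(M[n-1] + 1)
    A[n] = n*(A[n-1] + 1) - 1
print(M[10], A[10])    # 6235300 3628799, as in the last row of the table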

It’s clear from the definition, and rammed home by this table, that the work required to compute a determinant by this method is of order n!. Thus, very inefficient.

General Laplace expansion
Some authors call this the Laplace expansion, and the version above is then just a particular case of it. Curiously enough I’d never come across it until quite recently. The idea is quite simple; the main difficulty is finding good notation to describe it. Right, here goes.

Suppose we have an n\times n matrix A, and suppose we pick k rows, with indices

R = 1\le r_1< r_2<\ldots< r_k\le n

and also pick k columns, with indices

C = 1\le c_1< c_2<\ldots< c_k\le n.

So R,C are simple lists of length k containing strictly increasing row and column indices. We then define the matrix

A[RC]

to be the k\times k matrix whose ij-th element is A[r_i][c_j]. That is, A[RC] consists of all the "intersections" of rows in R with columns in C. We also define \overline{A}[RC] to be the (n-k)\times(n-k) matrix obtained when all the rows in R and columns in C are removed.

For example, suppose

\displaystyle{A = \left[\begin{array}{ccccc}1&2&3&4&5\\ 6&7&8&9&10\\ 11&12&13&14&15\\ 16&17&18&19&20\\ 21&22&23&24&25\end{array}\right]}

Let R = 1,4 and C = 2, 3. Then

\displaystyle{A[RC]=\left[\begin{array}{cc}2&3\\ 17&18\end{array}\right]}

and

\displaystyle{\overline{A}[RC]=\left[\begin{array}{ccc}6&9&10\\ 11&14&15\\ 21&24&25\end{array}\right]}.

Given all this, the generalized version of the Laplace expansion theorem claims that

\displaystyle{\mbox{det}(A) = \sum (-1)^\sigma\mbox{det}(A[RC])\mbox{det}(\overline{A}[RC])}

where the sum is taken over all {n\choose k} possible sets C of k columns (the rows R being fixed), and \sigma is the sum of all the row and column indices in R and C. Clearly the original Laplace expansion is just this with k=1. For some nice examples, with pictures, see this elegant exposition by David Eberly.
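A quick numerical sanity check of the identity is easy with NumPy (an illustrative sketch: fix two rows, sum over all column pairs, and compare with NumPy’s own determinant; with 0-based indices the parity of the sign is unchanged):

import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
A = rng.integers(-5, 6, size=(5, 5)).astype(float)

n, R = 5, (0, 1)
total = 0.0
for C in combinations(range(n), len(R)):
    rows_left = [i for i in range(n) if i not in R]
    cols_left = [j for j in range(n) if j not in C]
    sign = (-1)**(sum(R) + sum(C))
    total += sign*np.linalg.det(A[np.ix_(R, C)])*np.linalg.det(A[np.ix_(rows_left, cols_left)])

print(np.isclose(total, np.linalg.det(A)))   # True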

But is this generalized method any better than simply using k=1? Well, suppose that the least possible number of multiplications and additions to obtain an n\times n determinant using any Laplace expansion are M_n, A_n respectively.

Suppose we perform the generalized expansion for an n\times n determinant with a particular 1\le k<n. There will be {n\choose k} k\times k determinants to evaluate, and the same number of (n-k)\times(n-k) determinants. These all need to be multiplied, and the products added. This means that

\displaystyle{(M_n,A_n) = {n\choose k}\Bigl((M_k,A_k)+(M_{n-k},A_{n-k})\Bigr)+\left({n\choose k},{n\choose k}-1\right)}

More neatly:

\displaystyle{M_n = {n\choose k}(M_k+M_{n-k}+1)}

\displaystyle{A_n = {n\choose k}(A_k+A_{n-k}+1)-1}

By symmetry, these values will be the same for k and n-k. So the smallest value will be obtained by computing these values for all k = 1,2,\ldots,\lfloor n/2\rfloor and choosing the value which produces the smallest values for M_n (since multiplication is more computationally expensive than addition, we must minimize the number of multiplications).

This gives the following table:

\displaystyle{\begin{array}{c|c|c}n&M_n&A_n\\ \hline 1&0&0\\2&2&1\\ 3&9&5\\ 4&30&17\\ 5&120&69\\6&380&219\\ 7&1400&804\\ 8&4270&2449\\ 9&19026&10961\\ 10&60732&35027\end{array}}
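Again, this table can be reproduced from the recurrences with a few lines of Python (an illustrative sketch), choosing at each n the value of k which minimizes the number of multiplications:

from math import comb

M, A = {1: 0}, {1: 0}
for n in range(2, 11):
    k = min(range(1, n//2 + 1), key=lambda kk: comb(n, kk)*(M[kk] + M[n-kk] + 1))
    M[n] = comb(n, k)*(M[k] + M[n-k] + 1)
    A[n] = comb(n, k)*(A[k] + A[n-k] + 1) - 1
    print(n, M[n], A[n])    # reproduces the rows of the table above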

We can see that these values are smaller than those in the first table above, but they are still not sub-exponential (at least, I don’t think they are).

I think I’ll split this into multiple posts. That’s enough for one.