# Environment for creative processing of text and numerical data ## Seppo Mustonen (9 July, 2007): On the swapping method

Solution of Survo puzzles by using the products of marginal sums

```It has been seen that in practice Survo puzzles (without a real
solver program) are treated in surprisingly different means.
Most often the solution is found by pure logical reasoning but
sometimes computational support offered by Survo (COMB operation,
The standard procedure is to fill empty cells in the table gradually so
that in each stage (when reasoning is optimal) the selected number
is the only possible one. However, many people find the solution
at least partially by trial and error.

Here a different approach is introduced. It is based on some previously
presented ideas and it transforms the task into a form resembling
tackling of typical chess problems.
By this 'swapping method' many hard Survo puzzles may be solved
in a few steps very easily, but on the other hand there are many
puzzles where the method is inefficient. Another weakness is that by
this method the uniqueness of the solution cannot be proved.

The swapping method is related to the main idea of my original
solver program where

the table (originally filled by numbers in random order) is
improved by swapping numbers step by step so that at the end
the computed sums become identical to the given ones.

and to the common observation that

small numbers are located in crossings of small sums and
large numbers in crossings of large ones

or more precisely to remarks made e.g. by Joonas Kauppinen and
Markku Verkasalo that

the products of the marginal sums crudely indicate the positions
of the correct numbers in the final solution.

Usage of those products in this way has a natural statistical basis.
If the table of the Survo puzzle is interpreted as a contingency
table computed for two independent variables, the expected frequency
in any cell is the product of marginal sums divided by the sample
size. Thus those products as such are proportional to expected
frequencies.

By combining the principles given above, many of the hard (especially
open) Survo puzzles can be solved easily according to the style of
"Mate-in-N-move" problems in chess.

Let us see how it happens in an open 4x4 problem I presented and solved
in my expository paper (on pages 17-21)

*   *   *   *  14
*   *   *   *  28
*   *   *   *  44
*   *   *   *  50
16  33  40  47

The table of the products of the marginal sums is then

224  462  560  658  14
448  924 1120 1316  28
704 1452 1760 2068  44
800 1650 2000 2350  50
16   33   40   47

and when the original table is filled by numbers 1-16 according to
sizes of these products, the following preliminary solution is obtained:

1   3   4   5  14
2   8   9  10  28
6  11  13  15  44
7  12  14  16  50
16  33  40  47

It is not a correct solution. For example, the sum in the first row
is 1+3+4+5=13 and not 14, but all the sums are at least close to the
true ones as one can see in the following table.

Sum  OK   D
1   3   4   5  13  14  -1
2   8   9  10  29  28   1
6  11  13  15  45  44   1
7  12  14  16  49  50  -1
Sum  16  34  40  46
OK   16  33  40  47
D     0   1   0  -1         W=6

Here row/column 'Sum' tell the computed sums, 'OK' the correct sums,
and 'D' their differences (errors) D=Sum-OK. The sum of absolute values
of errors is W=6.
The task thereafter is trying to reduce the error W by swapping
two numbers at a time. A swap (5,6) seems now to be good since it leads
to a better setup

Sum  OK   D
1   3   4   6  14  14   0
2   8   9  10  29  28   1
5  11  13  15  44  44   0
7  12  14  16  49  50  -1
Sum  15  34  40  47
OK   16  33  40  47
D    -1   1   0   0         W=4

having a smaller total error W=4.
Then it is really easy to see that a swap (7,8) leads to the correct
solution

Sum  OK   D
1   3   4   6  14  14   0
2   7   9  10  28  28   0
5  11  13  15  44  44   0
8  12  14  16  50  50   0
Sum  16  33  40  47
OK   16  33  40  47
D     0   0   0   0         W=0

Thus the problem is solved by two steps and the solution seems to be
much shorter than my original solution. However, in the original
solution also the uniqueness was proved.

In most problems, more swaps are needed but often the solution can
still be found without difficulties.

For example, a harder Survo puzzle presented in the same paper
(on page 24)

A  B  C  D
1  *  *  *  * 51
2  *  *  *  * 36
3  *  *  *  * 32
4  *  *  *  * 17
51 42 26 17

is solvable by the swapping method in five simple steps.
The solution can be seen as a Flash demo.

In order to find out how the swapping method works generally
in all 5327 uniquely solvable, open 4x4 Survo puzzles, I have
computed (by a special program) the shortest solution for them each.

The numbers of swaps are distributed according to the table

Swaps         0     1     2     3     4     5     6     7     8     9
frequency    85   262   522   977  1271  1222   696   255    36     1
cumulative   85   347   869  1846  3117  4339  5035  5290  5326  5327
%           1.6   6.5  16.3  34.7  58.5  81.5  94.5  99.3       100.0

The products of sums give the right solution immediately in 85 cases
and 81.5 per cent of these puzzles can be solved by less than 6 swaps.

I have tried to solve rather many puzzles of this type by the
swapping method manually. For creating the original setup defined
by the products of sums and for keeping record of swaps and their
influence in the setup, a special sucro /SP_SWAP has been made.
According to my experience, the solution is found rather easily
at least when the number of swaps needed does not exceed 5.

However, when starting to solve a problem, one cannot know in
advance the number of swaps needed. Then the original W value
predicts roughly this number. W varies between 0 and 22 (in these
4x4 cases) and it is related to the number of swaps as follows:

W       swaps<6   Frequencies of W values
0-4      100.0 %       467
6        99.5 %       417
8        99.0 %       609
10        93.6 %       879
12        85.8 %       896
14        70.6 %       929
16        56.6 %       681
18        49.6 %       312
20        46.8 %       126
22        54.5 %        11

all       81.5 %      5327

Thus, if the original sum of errors is 10 or less, this kind of puzzle
is solvable by less than 6 swaps in 93.6 per cent of cases.

The mean and the standard deviation of the number of swaps are
about 4 and 1.6, respectively. If the swapping procedure is
started from a randomly initiated table, the corresponding statistics
are 12.6 and 1.3 which numbers are calculated here.
So the mean is much higher.

The success of the swapping method (at least in many 4x4 cases)
is largely based also on the fact that if chi-square values are computed
for all these 5327 tables, they are located in the interval (0.25,8.25).
So even the greatest of them is smaller than the degrees of freedom 9
which is the expected value for independent variables. Thus, in these
tables "the independence of variables" is exaggerated and therefore
the products of the marginal sums predict well the correct solution.

It is typical when using the swapping method that in each step the error
sum W decreases and then a smart solver sees the best choices
immediately. Sometimes, however, the situation is more complicated.
For example, the starting value of W may be so small that this strategy
will not work. Then one must make "sacrifices" (as in chess problems)
and allow swaps where W grows temporarily. In such cases also other
kind of logical reasoning must be empoyed as in this example.

* * *

For simplifying usage of the swapping technique, I have made a sucro
/SP_SWAP which at first initiates the table according to the products
of sums and lets thereafter the user repeatedly to swap numbers
by using the mouse and X and Y keys. After each swap the sucro updates
the setup and displays a list of swaps used so far. It is also
possible to solve non-open Survo puzzles having some fixed numbers
in the table. In these cases the initialization program (a small
Survo module SPGUESS) used by /SP-SWAP employs a generalized
products of sums technique. In these cases the numbers given readily
in the table are displayed with a blue background and then the player
knows to avoid swapping those "fixed points".

/SP_SWAP requires the complete SURVO MM version.

A special Java applet is also available
for solving randomly created Survo puzzles by the swapping technique.

* * *

When using /SP_SWAP the Survo puzzle must be saved as a Survo matrix
file. For example, the problem

*   *   *   *  14
*   *   *   *  28
*   *   *   *  44
*   *   *   *  50
16  33  40  47

is saved in the form

MAT SAVE AS A
0   0   0   0  14
0   0   0   0  28
0   0   0   0  44
0   0   0   0  50
16  33  40  47   0

and the sucro is activated by the command
/SP_SWAP A

Here are some Survo puzzles which can be solved by the swapping method.

Problem S0: (no swaps needed)
MAT SAVE AS A
0   0   0   0  15
0   0   0   0  32
0   0   0   0  41
0   0   0   0  48
14  33  38  51   0

Problem S1: (three swaps needed)
MAT SAVE AS A
0   0   0   0  48
0   0   0   0  43
0   0   0   0  34
0   0   0   0  11
47  41  30  18   0

Problem S2: (four swaps needed)
MAT SAVE AS A
0   0   0   0  55
0   0   0   0  44
0   0   0   0  23
0   0   0   0  14
45  39  29  23   0

Problem S3:
MAT SAVE AS A
0   0   0   0   0  27
0   0   0   0   0  37
0   0   0   0   0  56
9  17  24  31  39   0

Problem S4:
MAT SAVE AS A
0   0   0   0  57
0   0   0   0  41
0   0   0   0  27
0   0   0   0  11
44  36  31  25   0

Problem S5:
MAT SAVE AS A
0   0   0   0  56
0   0   0   0  38
0   0   0   0  31
0   0   0   0  11
45  37  32  22   0

Distribution of swaps for a random permutation

It is not reasonable to find this distribution by computing the
number of swaps needed separately for each permutation since
in the case of 4x4 Survo puzzles there are fact(16)=20922789888000
different permutations. Although one could calculate a million
values in a second, the entire process would last about 240 days.

It is now shown how this distribution can be found exactly and very
quickly by using the matrix and polynomial interpreter of Survo.
It is a good mathematical excercise to explain, why these calculations
give the correct result.

Let's study the product (x-1)*(x-2)*(x-3)*...*(x-15) and expand it to
a polynomial s0+s1*x+s2*x^2+...+s15*x^15.
Then the coefficients s0,s1,s2,s3,...,s15 are Stirling numbers of the
first kind and their absolute values divided by 16! give the proba-
bilities of the sought distribution in the reverse order.

The essential calculations are done by Survo as follows.

The coefficients of the binomials of in the product are saved as
vectors (it would be easy to make a short sucro for the general case):

MAT SAVE AS P
-1
1

MAT SAVE AS P2
-2
1

MAT SAVE AS P3
-3
1

the same for values 4-14

MAT SAVE AS P15
-15
1

The product of binomials:
POL P=P*P2
POL P=P*P3
POL P=P*P4
POL P=P*P5
POL P=P*P6
POL P=P*P7
POL P=P*P8
POL P=P*P9
POL P=P*P10
POL P=P*P11
POL P=P*P12
POL P=P*P13
POL P=P*P14
POL P=P*P15
MAT TRANSFORM P BY abs(X#)

Now the vector P gives the absolute values of coefficients
s0,s1,...,s15. These values are listed by MAT LOAD, their sum
(equal to 16!) is checked by the touch mode computation of Survo
and by dividing by this factorial we get the probabilities

MATRIX P
Polynom
///                real   probability         number ow swaps
0       1307674368000   0.06250000000000    15
1       4339163001600   0.20738931207681    14
2       6165817614720   0.29469385525189    13
3       5056995703824   0.24169796336407    12
4       2706813345600   0.12937153028299    11
5       1009672107080   0.04825704948933    10
6        272803210680   0.01303856761648     9
7         54631129553   0.00261108245341     8
8          8207628000   0.00039228171979     7
9           928095740   0.00004435812552     6
10            78558480   0.00000375468474     5
11             4899622   0.00000023417632     4
12              218400   0.00000001043838     3
13                6580   0.00000000031449     2
14                 120   0.00000000000574     1
15                   1   0.00000000000005     0
20922789888000

Again by using the touch mode it is easy to compute the mean and
the standard deviation
m=12.619271006771 (mean)
S2=161.04238320212  (weighted sum of squares)
s=sqrt(S2-m^2)  s=1.3402919308079 (std.dev.)

A problem requiring a sacrifice

*   *   *   *  49   MD=1700
*   *   *   *  39
*   *   *   *  33
*   *   *   *  15
49  43  29  15

The initial setup of the swapping method is

C1  C2  C3  C4 Sum  OK   D
R1        16  15  11   7  49  49   0
R2        14  13   9   4  40  39   1
R3        12  10   8   3  33  33   0
R4         6   5   2   1  14  15  -1
Sum       48  43  30  15
OK  49  43  29  15     W=4
D  -1   0   1   0

W is only 4, but the puzzle cannot be solved directly by one or two
steps.
Since both minimal sums are equal to 15, it is worthwhile to study
their possible partitions.

COMB P,CUR+1 / P=PARTITIONS,15,4 DISTINCT=1
Partitions 4 of 15: N[P]=6
1 2 3 9
1 2 4 8
1 2 5 7
1 3 4 7
1 3 5 6
2 3 4 6

Then the only permitted combinations of these partitions are
8 4 2 1 - 6 5 3 1  (1 as the common element)
7 5 1 2 - 6 4 3 2  (2 as the common element)
and the first pair is closer to the initial setup.
Therefore it is reasonable to start by swaps (2,3) and (7,8)
and getting correct values for the sums of last row and column,
but W grows to 6 (sacrifice!)

C1  C2  C3  C4 Sum  OK   D
R1        16  15  11   8  50  49   1
R2        14  13   9   4  40  39   1
R3        12  10   7   2  31  33  -2
R4         6   5   3   1  15  15   0
Sum       48  43  30  15
OK  49  43  29  15     W=6
D  -1   0   1   0

Now it is easy to detect that swaps (12,13) and (10,11)

C1  C2  C3  C4 Sum  OK   D
R1        16  15  10   8  49  49   0
R2        14  12   9   4  39  39   0
R3        13  11   7   2  33  33   0
R4         6   5   3   1  15  15   0
Sum       49  43  29  15
OK  49  43  29  15     W=0
D   0   0   0   0

and the problem has been solved by 4 swaps.
```