Seppo Mustonen : Programming Survo in C

3. Example of a SURVO 84C module

The idea and practice of making SURVO 84C modules is first illustrated by an example. To save space and to highlight the main principles, we shall describe the coding of a simple module for calculating weighted means from statistical data.
     Usually it is good to start by making a synopsis from the user's point of view and imagining how things should look if we already had the new operation. In this case we could type the following text in the edit field:

13  1 SURVO 84C EDITOR Wed Feb 15 11:46:19 1989         D:\C\PROG\ 100 100 0
 1 *SAVE TEST1
 2 *
 3 *Here is our data set:
 4 *DATA TEST
 5 *Name     Sex   Test1   Test2   Test3
 6 *Karen     F     1.45    3.46     5
 7 *Charles   M     3.22    2.43     3
 8 *Anthony   M     5.00    3.27     2
 9 *Lisa      F    -0.76    4.03     3
10 *Mike      M     1.37    1.88     3
11 *William   M     4.65    -        2
12 *Ann       F     2.16    4.98     2
13 *
14 *MASK=--AAW   / to indicate selection of variables (columns)
15 *CASES=Sex:M  / to indicate selection of observations (lines)
16 *
17 *MEAN TEST,19_
18 *
19 * Means of variables in TEST N=4 Weight=Test3
20 * Variable     Mean     N(missing)
21 * Test1     3.307000       0
22 * Test2     2.433750       1
23 *

Here we have a small application where the data set is on edit lines 4-12, the MEAN operation is on line 17, and the results (which we hope to receive after activating the MEAN line) are on lines 19-22.

We assume that the MEAN operation has the following syntax:

MEAN <SURVO_84C_data>,<first_line_for_the_results> 

To select variables and observations, we have used two extra specifications (on lines 14-15). Here MASK=--AAW selects only columns #3 and #4 (Test1, Test2) for the analysis, and column #5 (Test3) is used as the weight variable. CASES=Sex:M indicates that only observations with Sex=M are selected.

We shall see that even more options become available if the MEAN module is written according to the standards of SURVO 84C, and all this is achieved with minimal effort by using the ready-made tools of the SURVO 84C libraries.

It should also be noted that the structure of more complicated modules does not differ from that of this example.

The !MEAN module consists of only one compiland. Its source code is listed below in several parts, starting with the main function. The line numbers have been added for easier reference.

  1 /* !mean.c 21.2.1986/SM (19.3.1989)
  2 */
  3 #include <string.h>
  4 #include <stdio.h> 
  5 #include <stdlib.h> 
  6 #include <conio.h> 
  7 #include <malloc.h> 
  8 #include "survo.h"
  9 #include "survoext.h"
 10 #include "survodat.h"
 11
 12 SURVO_DATA d;
 13 double *sum;       /* sums of active variables */
 14 long   *f;         /* frequencies */
 15 double *w;         /* sums of weights */
 16
 17 long n;
 18 int weight_variable;
 19 int results_line;
 20
 21 main(argc,argv)
 22 int argc; char *argv[];
 23         {
 24         int i;
 25
 26         if (argc==1)
 27             {
 28             printf("This program can be used as a SURVO 84C module only.");
 29             return;
 30             }
 31         s_init(argv[1]);
 32         if (g<2)
 33             {
 34             init_remarks();
 35             rem_pr("MEAN <data>,<output_line>        / S.Mustonen 4.3.1989");
 36             rem_pr("computes means of active variables. Cases can be limited");
 37             rem_pr("by IND and CASES specifications. The observations can be");
 38             rem_pr("weighted by a variable activated by 'W'.");
 39             wait_remarks(2);
 40             return;
 41             }
 42         results_line=0;
 43         if (g>2)
 44             {
 45             results_line=edline2(word[2],1,1);
 46             if (results_line==0) return;
 47             }
 48         i=data_open(word[1],&d); if (i<0) return;
 49         i=sp_init(r1+r-1); if (i<0) return;
 50         i=mask(&d); if (i<0) return;
 51         weight_variable=activated(&d,'W');
 52         i=test_scaletypes(); if (i<0) return;
 53         i=conditions(&d); if (i<0) return;  /* permitted only once */
 54         i=space_allocation(); if (i<0) return;
 55         compute_sums();
 56         printout();
 57         free(sum); free(f); free(w);
 58         data_close(&d);
 59         }

Among the include lines, 8-10 refer to special SURVO 84C include files. Lines 8-9 should always be present in modules. Line 10 (survodat.h) is needed in modules that employ SURVO 84C data sets and data files.

Line 12 declares the SURVO_DATA structure d which may represent either a data set in the edit field (as DATA TEST in our example) or a SURVO 84C data file or part of it or even a matrix file. The writer of the module has no need to know the actual form of the data set. By using the tools provided by the SURVO 84C library (like data_open on line 48), all these alternatives can be handled similarly. In rare cases where a distinction has to be made, the d.type member of the SURVO_DATA structure d gives the type of the data set at hand.
     On lines 13-15, pointers to the various arrays used in MEAN are declared. To keep the modules general and flexible, we avoid fixed limits in arrays. Therefore all arrays whose sizes depend on the application (like the number of variables in the analysis) should be allocated dynamically. This is done with the standard space allocation function malloc, which is used here for all space reservations through the space_allocation call on line 54.
     Finally, before the main function starts, certain global variables are declared on lines 17-19. To shorten the function calls, we usually prefer such global variables (with static storage duration) to long parameter lists.

When the main program of SURVO 84C calls the !MEAN module as a child process, it passes only one parameter (the address of the pointer to the array of system pointers, given as a string). In the main function of !MEAN this parameter (argv[1]) is needed in the s_init call (line 31), which makes all the important SURVO 84C system parameters and variables available to !MEAN. Thereafter, writing code in !MEAN is like writing additional functions for the main program.

Before the s_init call, however, lines 26-30 prevent misuse of !MEAN (calling !MEAN directly from the MS-DOS level).
     After the s_init call we have, for example, r = the current line on the screen and r1 = the first edit line visible on the screen. Hence r1+r-1 is the current (activated) edit line. See the library reference of s_init for the complete list of system variables initialized by s_init.
     The s_init function also analyzes the edit line (MEAN TEST,19) activated by the user and splits it into parts word[0]="MEAN", word[1]="TEST" and word[2]="19", storing the total number of `words' found in g (in this case g=3).

Lines 32-41 test the completeness of the user's call. Note that MEAN TEST without an edit line for the results is allowed, so only the case g<2 (MEAN activated alone) is treated as incomplete and leads to a message.
     In such a case, the standard modules typically give a short notice of their usage like "Usage: MEAN <data>, L" and the user can get more information by consulting the inquiry system of SURVO 84C.
     For a new module written by the user, the inquiry system cannot provide any information. Therefore it is important that the module itself gives longer explanations covering all its essential features. This should be done with the functions init_remarks, rem_pr, and wait_remarks as shown on lines 32-41. These functions emulate the behaviour of the inquiry system; for example, the user can load the explanations appearing on the screen into the edit field.

The next section in the main function (lines 42-47) deals with output in the edit field. As pointed out earlier, the line label (or number) for the results in the edit field may be omitted (case results_line=0). If the line for the results is given (i.e. g>2), it is found by the SURVO 84C library function edline2 (line 45). If no edit line corresponding to the user's command is found, edline2 gives an error message and returns 0 instead of the line number.

Line 48 (i=data_open(word[1],&d); if (i<0) return;) opens the data set and initializes several variables (members of the structure SURVO_DATA d) describing the size and structure of the data set. For example, the following information is readily available for subsequent processing:
   d.m            number of variables in the data (int)
   d.m_act        number of active variables (int)
   d.n            number of observations in the data (long)
   d.l1           first active observation (long)
   d.l2           last active observation (long)
   d.varname[0], ..., d.varname[d.m-1]
                  names of the variables (char **)
   d.vartype[0], ..., d.vartype[d.m-1]
                  types of the variables (char **), one string per variable:
                    byte 0:  type 1, 2, 4, 8 or S
                    byte 1:  activation
                    byte 2:  protection
                    byte 3:  scale type
                    byte 4-: other mask bytes
   d.v[0], ..., d.v[d.m_act-1]
                  indices of the active variables (int *)

If the data is not available, data_open displays an error message and returns -1. In that case there is an immediate return to the main program of SURVO 84C.

In SURVO 84C, operations are controlled not only by parameters written on the activated line (like TEST and 19 in our example); the modules can also be guided by various specifications written around the activated line anywhere in the edit field. In our example, such specifications are MASK=--AAW and CASES=Sex:M.
     To take their effects into account, we must first read all the specifications written in the current edit field. This is done by calling the sp_init function once (line 49: sp_init(r1+r-1);), where the argument refers to the currently activated line. This makes sp_init look for specifications primarily around that line. Later the spfind function is called repeatedly to find specifications in the list generated by sp_init.
     The mask function (on line 50) analyses the VARS specification (or, if it does not appear, the MASK specification) through the spfind function. If VARS or MASK exists, mask corrects the activation status of each variable accordingly. If neither VARS nor MASK is given, the status of the data set itself determines which variables are active.

Line 51 checks whether any of the variables in the data set have been activated by `W' (using the activated function). If such a variable is found (as Test3 in our example) the index of that variable is returned and it serves as a weight variable in the computations. Otherwise activated returns -1.

One of the unique features of SURVO 84C is the possibility to assess the validity of various statistical methods by checking the scale types of variables. Scale types can be declared for variables in data files only, and the user is free to use this facility or not. When scale types have been declared, the test_scaletypes call on line 52 performs the checks.
     The observations may be restricted by the CASES and IND specifications. The conditions function (called on line 53) tests that those specifications, if used at all, are written correctly and initializes system variables which are used for scanning data during the computation (through a function called unsuitable).
     After these preliminary checks, we are ready to allocate space for frequencies, sums of weights and weighted sums of observations. The dimension of these arrays must be d.m_act. This happens by calling space_allocation (line 54).
     If the space is successfully allocated (there is no negative response), the actual computations can start (compute_sums) and the results are printed (printout).
     Finally (on lines 57-58), the allocated space is freed and the data set closed before returning to the main program of SURVO 84C and to the normal editing mode.

Most of the functions called by the main function of !MEAN are either in the Microsoft C run-time library or in the SURVO 84C libraries. The descriptions of the SURVO 84C library functions will be given later in this paper.
     Only four of the functions called in the main function are specific to the !MEAN module, namely test_scaletypes, space_allocation, compute_sums, and printout. Since !MEAN is a very small module, all of them are in the same compiland as the main function.

The test_scaletypes function has the following form:

 61 test_scaletypes()
 62         {
 63         int i,scale_error;
 64
 65         scales(&d);
 66         if (weight_variable>=0)
 67             {
 68             if (!scale_ok(&d,weight_variable,RATIO_SCALE))
 69                 {
 70                 sprintf(sbuf,"\nWeight variable %.8s must have ratio scale!",
 71                           d.varname[weight_variable]); sur_print(sbuf);
 72                 WAIT; if (scale_check==SCALE_INTERRUPT) return(-1);
 73                 }
 74             }
 75         scale_error=0;
 76         for (i=0; i<d.m_act; ++i)
 77             {
 78             if (!scale_ok(&d,d.v[i],SCORE_SCALE))
 79                 {
 80                 if (!scale_error)
 81                     sur_print("\nInvalid scale in variables: ");
 82                 scale_error=1;
 83                 sprintf(sbuf,"%.8s ",d.varname[d.v[i]]); sur_print(sbuf);
 84                 }
 85             }
 86         if (scale_error)
 87             {
 88             sur_print("\nIn MEAN score scale at least is expected!");
 89             WAIT; if (scale_check==SCALE_INTERRUPT) return(-1);
 90             }
 91         return(1);
 92         }

The task of this function is to check the scale types of the variables selected for the analysis. In small data sets written in the edit field, the scale types of the variables (columns) cannot be given, so no checks are performed and test_scaletypes simply returns 1, meaning that everything is OK. However, in data sets saved in SURVO 84C data files, each variable can be labelled with a one-character label (mask column #3) which tells the scale type. For example, variables with a ratio scale are labelled with `R' (discrete), `r' (continuous) or `F' (variable is a frequency). If the user omits these labels (each scale label is then blank), SURVO 84C skips all scale checks.
     In any case, the scales function is called first to remove variables with the scale type label `-', which means that the variable in question has no scale at all. For example, `names' and `addresses' are typically variables (fields) without a scale. Of course, a careful user does not select such variables for computations, but it is safer to have an extra check by the scales function to avoid harmful consequences.
     On lines 66-74 the program tests the scale of the weight variable (if it is used). It is done by using the scale_ok function which is set to require RATIO_SCALE for the weight variable. RATIO_SCALE is a predefined (in survodat.h) string constant "  RrF" telling the permitted scale type alternatives.
     If the scale is not OK, an error message is displayed (on lines 70-71). What happens next depends on the value of the SURVO 84C system parameter scale_check, which the user can set to 0, 1 or 2. The value 0 means that scale_ok always returns 1 and no warnings or error messages are given, i.e. everything is accepted. The value scale_check=1 means that messages are given as warnings, but the analysis can be continued. At the strictest level (value SCALE_INTERRUPT=2) the process is actually interrupted, as we can see on line 72.
     The remaining lines of test_scaletypes are devoted to corresponding checks for active variables which now should have a SCORE_SCALE at least. See how the d.v[] array selects the d.m_act variables from all d.m variables. (In our example d.m=5, d.m_act=3 and d.v[0]=2, d.v[1]=3, d.v[2]=4.)

The error messages and warnings are given by producing an output string by the standard sprintf function (usually to a global buffer sbuf of max. 256 characters) and then yielding the output by sur_print(sbuf).

The next function to be introduced is space_allocation:

 94 space_allocation()
 95         {
 96         sum=(double *)malloc(d.m_act*sizeof(double));
 97         if (sum==NULL) { not_enough_memory(); return(-1); }
 98         f=(long *)malloc(d.m_act*sizeof(long));
 99         if (f==NULL) { not_enough_memory(); return(-1); }
100         w=(double *)malloc(d.m_act*sizeof(double));
101         if (w==NULL) { not_enough_memory(); return(-1); }
102         return(1);
103         }
104
105 not_enough_memory()
106         {
107         sur_print("\nNot enough memory! (MEAN)");
108         WAIT;
109         }

This function allocates memory for arrays sum, f and w, which all should have d.m_act elements.
     It is strongly recommended to use dynamic memory allocation for all working space that depends on the size of the data set. Then no theoretical limits appear for the number of variables, etc. In practice there are always some limits. On 16-bit microcomputers there is typically still a 64KB limit for a single array unless the huge memory model is used.
     Since errors in memory allocation may have very surprising consequences, it is, of course, possible to start with fixed dimensions and switch to dynamic arrays later, when all the space requirements are clear.
     For example, lines 13-16 of the listing could read:

 13 #define MAX 100
 14 double sum[MAX];       /* sums of active variables */
 15 long f[MAX];           /* frequencies */
 16 double w[MAX];         /* sums of weights */

and space_allocation is not needed at all, but this should be a temporary arrangement only.

The data set will be scanned by the compute_sums function:

111 compute_sums()
112         {
113         int i;
114         long l;
115
116         n=0L;
117         for (i=0; i<d.m_act; ++i)
118             { f[i]=0L; w[i]=0.0; sum[i]=0.0; }
119
120         sur_print("\n");
121         for (l=d.l1; l<=d.l2; ++l)
122             {
123             double weight;
124
125             if (unsuitable(&d,l)) continue;
126             if (weight_variable==-1) weight=1.0;
127             else
128                 {
129                 data_load(&d,l,weight_variable,&weight);
130                 if (weight==MISSING8) continue;
131                 }
132             ++n;
133             sprintf(sbuf,"%ld ",l); sur_print(sbuf);
134             for (i=0; i<d.m_act; ++i)
135                 {
136                 double x;
137
138                 if (d.v[i]==weight_variable) continue;
139                 data_load(&d,l,d.v[i],&x);
140                 if (x==MISSING8) continue;
141                 ++f[i]; w[i]+=weight; sum[i]+=weight*x;
142                 }
143             }
144         }

At first, the work space is cleared (lines 116-118), and the rest of the function consists of a loop over the active observations (from d.l1 to d.l2). In this loop the function unsuitable checks (line 125) whether the conditions (set by conditions in the main function) are met by the current observation l. If not, the rest of the loop is skipped.
     If the observation is accepted, first the value of the possible weight variable is read by the data_load function (line 129). If weight is missing (line 130), the rest of the loop is skipped. If there is no weight variable, weight=1.0 is selected (line 126).
     Thereafter the number of cases n is increased by one, and the index of the current observation is displayed on the screen to indicate that the run is progressing (lines 132-133).
     In the inner loop (lines 134-142) all the active variables are scanned and the cumulative sums updated. However, the weight variable is skipped (on line 138). Similarly, possible missing values of active variables are omitted. By comparing n to f[i] we can see the number of missing observations in each variable separately.

The final task of the !MEAN module is to give the results by calling the printout function:

146 printout()
147         {
148         int i;
149         char line[LLENGTH];
150         char mean[32];
151
152         output_open(eout);
153         sprintf(line," Means of variables in %s N=%ld%c",
154                           word[1],n,EOS);
155         if (weight_variable>=0)
156             {
157             strcat(line," Weight=");
158             strncat(line,d.varname[weight_variable],8);
159             }
160         print_line(line);
161         strcpy(line," Variable     Mean     N(missing)");
162         print_line(line);
163         for (i=0; i<d.m_act; ++i)
164             {
165             if (d.v[i]==weight_variable) continue;
166             if (w[i]==0.0)
167                 sprintf(line," %-8.8s            -  %6ld",d.varname[d.v[i]],
168                          n-f[i]);
169             else
170                 {
171                 fnconv(sum[i]/w[i],accuracy+2,mean);
172                 sprintf(line," %-8.8s %s  %6ld",d.varname[d.v[i]],
173                              mean,n-f[i]);
174                 }
175             print_line(line);
176             }
177         output_close(eout);
178         }
179
180 print_line(line)
181 char *line;
182         {
183         output_line(line,eout,results_line);
184         if (results_line) ++results_line;
185         }

First, the output file or device eout is opened by the output_open function. Thereafter lines can be written to eout by the output_line function (called in the function print_line on line 183). The lines are appended to the file, so no previous results are overwritten.
     The SURVO 84C library function output_line also writes the lines in the current edit field, provided that its third argument (here results_line) gives a valid line number. Remember that the first line for the results was optional in the MEAN operation, and we set results_line=0 (on line 42) if that line label was missing.
     print_line (lines 180-185) is only an auxiliary function to keep an eye on the current output line in the edit field.
     It is a practice in SURVO 84C that the numerical accuracy of printed numbers can be controlled by the user. This is done through the system parameter accuracy (typically set to 7 in SURVO.APU), which gives the desired number of significant digits. Module writers must take the current value of accuracy into account when selecting the printout parameters. The library function fnconv is often useful in this task. Here (on line 171) it formats the means: accuracy+2 gives the total length of the resulting string mean, since we need one extra position for the sign and one for the decimal point.

These 185 lines constitute the whole !MEAN module in source form. Since several library functions were employed and many `hidden' or optional properties are included, the total size of the code after compiling and linking is about 60KB. However, as a module grows, its code size does not grow proportionally. For example, !MEAN can be considered a tiny special case of the !CORR module, which computes standard deviations and correlations in addition to means, but !CORR is only 6KB larger than !MEAN. Thus it pays to create modules with several tasks and options.

All compilands of SURVO 84C modules have to be compiled in the large memory model because the SURVO 84C libraries (SURVO.LIB, SURVOMAT.LIB, etc.) are available in this model only. Thus the !MEAN.C file is compiled by the command

   CL /c /AL !MEAN.C

and it is linked by

   LINK !MEAN,,NUL.MAP,SURVO /STACK:4000 /NOE

!MEAN was made and presented only for illustration. Source codes for selected true SURVO 84C modules are available separately.

Each module (as an .EXE file) is normally saved in the SURVO 84C system directory (typically C:\E) and activated by the user as MEAN. During the testing stage, it can be activated from any disk or path. For example, if !MEAN.EXE is on the disk A:,

   A:!MEAN DATA1,11
is a valid command in SURVO 84C.

