RO / EN

Chapter 19

Estimated time of reading: 12-13 minutes

“The Eternal Algorithm”

After speaking at such length about languages, about dumps, about cardboard folders full of listings, and about nuts and bolts, it is time to show concretely what it means to say “the same thing in five different ways.”

I have chosen a single algorithm. I call it “the eternal algorithm” because it is, in all likelihood, the oldest idea in programming that has functioned unchanged for over sixty years: binary search.

The principle is of a simplicity that makes you suspect something must be hiding: divide the space in two, eliminate the wrong half, repeat. With each step, the problem is halved. In a table of one million elements, you find what you are looking for in at most twenty steps. It is not magic. It is mathematics, and it is beautiful.

I have written the same routine in five languages that I used over the course of the years. All of them search for a key (KEY) in a table sorted in ascending order (TABLE) of N integer elements. They return the index of the element found, or zero if it does not exist. The same truth, five grammars.

1. FORTRAN IV (IBM, 1962)

FORTRAN was the language of my serious beginnings. It was the language of calculations — conceived by engineers, for engineers. Variable names: six characters at most. Arrays indexed from 1, not from 0. There was no DO WHILE; you had IF and GO TO, the only tools for a conditional loop. And the emblematic construct was the arithmetic IF: a single instruction that evaluated an expression and branched along three different paths — negative, zero, positive. Three destinies from a single line. Elegant, cryptic, efficient.

The subroutine call: CALL BSRCH(TABLE, N, KEY, IPOS). Parameters are passed by reference — that is, addresses, not values.

C-------------------------------------------------------------

C     BINARY SEARCH IN A SORTED INTEGER TABLE

C     FORTRAN IV - IBM SYSTEM/360

C

C     PARAMETERS:

C       TABLE  - SORTED INTEGER ARRAY (ASCENDING)

C       N      - NUMBER OF ELEMENTS

C       KEY    - VALUE TO SEARCH FOR

C       IPOS   - RESULT: INDEX (1..N) OR 0 IF NOT FOUND

C-------------------------------------------------------------

     SUBROUTINE BSRCH(TABLE, N, KEY, IPOS)

     INTEGER TABLE(N), N, KEY, IPOS

     INTEGER LOW, IHI, MID

C

C     INITIALIZE BOUNDS

C

     LOW = 1

     IHI = N

     IPOS = 0

C

C     MAIN SEARCH LOOP

C

  10 IF (LOW .GT. IHI) GO TO 30

     MID = (LOW + IHI) / 2

     IF (KEY - TABLE(MID)) 20, 25, 22

C

C     KEY < TABLE(MID): SEARCH LEFT HALF

C

  20 IHI = MID - 1

     GO TO 10

C

C     KEY > TABLE(MID): SEARCH RIGHT HALF

C

  22 LOW = MID + 1

     GO TO 10

C

C     KEY = TABLE(MID): FOUND

C

  25 IPOS = MID

C

  30 RETURN

     END

2. COBOL (IBM, 1960)

COBOL was the other world. If FORTRAN spoke the language of engineers, COBOL spoke the language of accountants and economists. Verbose by nature, but perfectly readable fifty years later — an auditor in 1975 could read the program without any training in programming. It had no natural arithmetic on indices; everything went through COMPUTE. The loop was PERFORM … UNTIL, and the equivalent of switch/case was EVALUATE TRUE, with each WHEN testing a condition. Arrays, as in FORTRAN, were indexed from 1. Data was declared in WORKING-STORAGE, with types described through PIC — picture clause — as though every field had to present its papers at the door.

      IDENTIFICATION DIVISION.

      PROGRAM-ID. BSEARCH.

     *---------------------------------------------------------

     * BINARY SEARCH IN A SORTED TABLE

     * COBOL-68 / IBM OS/360

     *---------------------------------------------------------

      DATA DIVISION.

      WORKING-STORAGE SECTION.

      01  WS-TABLE.

          05  WS-ENTRY        PIC 9(8)

                               OCCURS 1000 TIMES.

      01  WS-TABLE-SIZE       PIC 9(4).

      01  WS-SEARCH-KEY       PIC 9(8).

      01  WS-RESULT-POS       PIC S9(4) VALUE ZERO.

      01  WS-FOUND-FLAG       PIC X(1)  VALUE 'N'.

      01  WS-LOW              PIC 9(4).

      01  WS-HIGH             PIC 9(4).

      01  WS-MID              PIC 9(4).

      01  WS-DONE-FLAG        PIC X(1)  VALUE 'N'.

      PROCEDURE DIVISION.

      BINARY-SEARCH-PARA.

          MOVE 1              TO WS-LOW

          MOVE WS-TABLE-SIZE  TO WS-HIGH

          MOVE ZERO           TO WS-RESULT-POS

          MOVE 'N'            TO WS-FOUND-FLAG

          MOVE 'N'            TO WS-DONE-FLAG

          PERFORM SEARCH-LOOP

              UNTIL WS-DONE-FLAG = 'Y'

          STOP RUN.

      SEARCH-LOOP.

          IF WS-LOW > WS-HIGH

              MOVE 'Y' TO WS-DONE-FLAG

          ELSE

              COMPUTE WS-MID =

                  (WS-LOW + WS-HIGH) / 2

              EVALUATE TRUE

                WHEN WS-SEARCH-KEY <

                     WS-ENTRY(WS-MID)

                  COMPUTE WS-HIGH =

                      WS-MID - 1

                WHEN WS-SEARCH-KEY >

                     WS-ENTRY(WS-MID)

                  COMPUTE WS-LOW =

                      WS-MID + 1

                WHEN WS-SEARCH-KEY =

                     WS-ENTRY(WS-MID)

                  MOVE WS-MID TO WS-RESULT-POS

                  MOVE 'Y'    TO WS-FOUND-FLAG

                  MOVE 'Y'    TO WS-DONE-FLAG

              END-EVALUATE

          END-IF.

3. ALGOL 60 (1960)

ALGOL was the academic language — published in journals, rarely compiled on the same machine twice in the same way. I did not use it in production, but I studied it and respected it. Its syntax of begin/end and := influenced Pascal, C, and everything that followed. The ÷ operator was integer division. The distinction between assignment (:=) and comparison (=) — which FORTRAN ignored, generating bugs for decades on end — was already clear in ALGOL. The function returned its value through the procedure name: bsearch := mid. Pure mathematical elegance.

comment -------------------------------------------------

 BINARY SEARCH IN A SORTED INTEGER ARRAY

 ALGOL 60

 Returns index (1..n) if found, 0 if not found

---------------------------------------------------------;

integer procedure bsearch(t, n, key);

 value n, key;

 integer array t;

 integer n, key;

begin

 integer low, high, mid;

 low  := 1;

 high := n;

 bsearch := 0;

 while low <= high do

 begin

   mid := (low + high) ÷ 2;

   if key < t[mid] then

     high := mid - 1

   else if key > t[mid] then

     low := mid + 1

   else

   begin

     bsearch := mid;

     low := high + 1       comment force exit;

   end

 end

end bsearch;

4. ASSEMBLER IBM System/360 — With Standard OS/360 Linkage

This is the central routine — the one I was speaking about in the preceding chapters. Written in Assembler, callable from both FORTRAN and COBOL, without knowing and without caring who is calling it. It respects only the convention.

The OS/360 linkage convention: on entry, R1 points to the parameter list (four consecutive addresses), R13 to the caller’s Save Area, R14 is the return address, R15 is the entry point address. On exit, R15 contains the return code: 0 means found, 4 means not found. The result is deposited directly into the fullword pointed to by the fourth parameter. The last pointer has its sign bit set — the standard OS/360 marker for “last parameter in the list.”

Key instructions: SRA R10,1 — arithmetic shift right by one bit, that is, division by 2, faster than the D (divide) instruction. BCTR R4,0 — decrements R4 by 1, without branching. SLA R4,2 — shift left by two bits, that is, multiplication by 4 (the size of a fullword). C R7,0(R4,R2) — compares R7 with the fullword at address R2+R4, indexed addressing.

BSRCH    CSECT

*=======================================================

*  BINARY SEARCH - SORTED FULLWORD TABLE

*  IBM SYSTEM/360 ASSEMBLER - OS/360 STANDARD LINKAGE

*

*  CALLABLE FROM FORTRAN IV AND COBOL

*

*  PARAMETERS VIA R1 (STANDARD OS/360):

*    +0  A(TABLE)  - ADDRESS OF SORTED FULLWORD ARRAY

*    +4  A(N)      - ADDRESS OF FULLWORD: ELEMENT COUNT

*    +8  A(KEY)    - ADDRESS OF FULLWORD: SEARCH KEY

*    +12 A(IPOS)   - ADDRESS OF FULLWORD: RESULT INDEX

*                    (1..N IF FOUND, 0 IF NOT FOUND)

*

*  RETURN CODE IN R15: 0=FOUND, 4=NOT FOUND

*=======================================================

*

*--- STANDARD ENTRY LINKAGE ---

*

        STM   R14,R12,12(R13)    SAVE CALLER'S REGISTERS

        LR    R12,R15            ESTABLISH BASE REGISTER

        USING BSRCH,R12

        LA    R11,SAVEAREA       OUR SAVE AREA

        ST    R13,4(,R11)        BACKWARD CHAIN

        ST    R11,8(,R13)        FORWARD CHAIN

        LR    R13,R11            R13 -> OUR SAVE AREA

*

*--- LOAD PARAMETER ADDRESSES ---

*

        L     R2,0(,R1)          R2 = A(TABLE)

        L     R3,4(,R1)          R3 = A(N)

        L     R4,8(,R1)          R4 = A(KEY)

        L     R5,12(,R1)         R5 = A(IPOS) (HI BIT ON)

        LA    R5,0(,R5)          CLEAR HIGH-ORDER BIT

*

*--- LOAD VALUES ---

*

        L     R6,0(,R3)          R6 = N (ELEMENT COUNT)

        L     R7,0(,R4)          R7 = KEY (SEARCH VALUE)

*

*--- INITIALIZE: LOW=1, HIGH=N ---

*

        LA    R8,1               R8 = LOW = 1

        LR    R9,R6              R9 = HIGH = N

*

*--- MAIN LOOP ---

*

LOOP     CR    R8,R9              LOW > HIGH?

        BH    NOTFND             YES: NOT FOUND

*

*        MID = (LOW + HIGH) / 2

*

        LR    R10,R8             R10 = LOW

        AR    R10,R9             R10 = LOW + HIGH

        SRA   R10,1              R10 = (LOW+HIGH)/2 = MID

*

*        COMPUTE TABLE OFFSET: (MID-1)*4

*

        LR    R4,R10             R4 = MID

        BCTR  R4,0               R4 = MID - 1

        SLA   R4,2               R4 = (MID-1) * 4

*

*        COMPARE KEY WITH TABLE(MID)

*

        C     R7,0(R4,R2)        COMPARE KEY : TABLE(MID)

        BL    GOLEFT             KEY < TABLE(MID)

        BH    GORIGHT            KEY > TABLE(MID)

*

*--- FOUND: STORE MID IN IPOS, RC=0 ---

*

        ST    R10,0(,R5)         IPOS = MID

        SR    R15,R15            RC = 0 (FOUND)

        B     RETURN

*

*--- KEY < TABLE(MID): HIGH = MID - 1 ---

*

GOLEFT   LR    R9,R10             HIGH = MID

        BCTR  R9,0               HIGH = MID - 1

        B     LOOP

*

*--- KEY > TABLE(MID): LOW = MID + 1 ---

*

GORIGHT  LR    R8,R10             LOW = MID

        LA    R8,1(,R8)          LOW = MID + 1

        B     LOOP

*

*--- NOT FOUND: IPOS = 0, RC=4 ---

*

NOTFND   SR    R10,R10            R10 = 0

        ST    R10,0(,R5)         IPOS = 0

        LA    R15,4              RC = 4 (NOT FOUND)

*

*--- STANDARD EXIT LINKAGE ---

*

RETURN   L     R13,4(,R13)        RESTORE CALLER'S R13

        L     R14,12(,R13)       RESTORE R14 (RETURN ADDR)

        LM    R0,R12,20(R13)     RESTORE R0-R12

        BR    R14                RETURN TO CALLER

*

*--- CONSTANTS AND WORK AREAS ---

*

SAVEAREA DS    18F                OUR SAVE AREA

*

*--- REGISTER EQUATES ---

*

R0       EQU   0

R1       EQU   1

R2       EQU   2

R3       EQU   3

R4       EQU   4

R5       EQU   5

R6       EQU   6

R7       EQU   7

R8       EQU   8

R9       EQU   9

R10      EQU   10

R11      EQU   11

R12      EQU   12

R13      EQU   13

R14      EQU   14

R15      EQU   15

        END   BSRCH

Calling from FORTRAN:

     INTEGER TABLE(1000), N, KEY, IPOS

     EXTERNAL BSRCH

C     ...

     CALL BSRCH(TABLE, N, KEY, IPOS)

C     IPOS = INDEX (1..N) SAU 0 DACA NU EXISTA

Calling from COBOL:

      01  WS-TABLE.

          05  WS-ENTRY   PIC S9(8) COMP

                          OCCURS 1000 TIMES.

      01  WS-N           PIC S9(8) COMP.

      01  WS-KEY         PIC S9(8) COMP.

      01  WS-IPOS        PIC S9(8) COMP.

     *

      PROCEDURE DIVISION.

          CALL 'BSRCH' USING WS-TABLE

                              WS-N

                              WS-KEY

                              WS-IPOS.

     * WS-IPOS = INDEX (1..N) OR 0

A critical observation regarding dual linkage: COBOL PIC S9(8) COMP is identical to FORTRAN’s INTEGER — a signed binary fullword. Both languages pass parameters by reference. The Assembler routine does not know, and does not need to know, who is calling it.

5. MACRO-11 (DEC PDP-11, 1970)

A different machine, a different world. The PDP-11: a 16-bit architecture, little-endian, eight registers (R0–R7, where R6 is the stack pointer and R7 is the program counter). Instructions operate on words of 2 bytes, not fullwords of 4. There is no Save Area as on the /360 — the stack is used instead. Parameters are passed on the stack; the result is returned in R0.

Key instructions: ASR — arithmetic shift right, the equivalent of SRA on the /360. ASL — shift left, multiplication by 2 (one word, not 4 bytes). CMP R1,(R3) — compares R1 with the contents at the address held in R3; the parentheses mean indirect, that is, dereferencing. MOV R3,-(SP) — push onto the stack with auto-decrement. MOV (SP)+,R0 — pop with auto-increment. And one detail that can drive you to distraction if you are not aware of it: the numbers are in octal. 10(SP) means octal 10, which is decimal 8.

A typical call:

       MOV     #TABLE,-(SP)    ; PUSH ADDRESS OF TABLE

       MOV     N,-(SP)         ; PUSH ELEMENT COUNT

       MOV     KEY,-(SP)       ; PUSH SEARCH KEY

       JSR     PC,BSRCH        ; CALL

       ADD     #6,SP           ; CLEAN STACK (3 PARAMS * 2)

       ; R0 = RESULT (INDEX OR 0)

;=======================================================

;  BINARY SEARCH - SORTED WORD TABLE

;  MACRO-11 / PDP-11 / RT-11

;

;  STACK PARAMETERS (PUSHED RIGHT TO LEFT):

;    6(SP) = ADDRESS OF SORTED WORD ARRAY

;    4(SP) = N (ELEMENT COUNT)

;    2(SP) = KEY (SEARCH VALUE)

;

;  RETURNS: R0 = INDEX (1..N) IF FOUND, 0 IF NOT

;=======================================================

BSRCH:  MOV     R1,-(SP)        ; SAVE REGISTERS

       MOV     R2,-(SP)

       MOV     R3,-(SP)

       MOV     R4,-(SP)

;

; LOAD PARAMETERS (OFFSET ADJUSTED FOR 4 SAVED REGS)

;

       MOV     14(SP),R4       ; R4 = ADDRESS OF TABLE

       MOV     12(SP),R3       ; R3 = N

       MOV     10(SP),R1       ; R1 = KEY

;

; INITIALIZE: LOW=1, HIGH=N

;

       MOV     #1,R0           ; R0 = LOW = 1

       MOV     R3,R2           ; R2 = HIGH = N

;

; MAIN LOOP

;

LOOP:   CMP     R0,R2           ; LOW > HIGH?

       BGT     NOTFND          ; YES: NOT FOUND

;

; MID = (LOW + HIGH) / 2

;

       MOV     R0,R3           ; R3 = LOW

       ADD     R2,R3           ; R3 = LOW + HIGH

       ASR     R3              ; R3 = (LOW+HIGH)/2 = MID

;

; COMPUTE TABLE OFFSET: (MID-1)*2  (WORD SIZE)

;

       MOV     R3,-(SP)        ; SAVE MID ON STACK

       DEC     R3              ; R3 = MID - 1

       ASL     R3              ; R3 = (MID-1) * 2

       ADD     R4,R3           ; R3 = ADDR OF TABLE(MID)

;

; COMPARE KEY WITH TABLE(MID)

;

       CMP     R1,(R3)         ; KEY : TABLE(MID)

       BLT     GOLEFT

       BGT     GORIGHT

;

; FOUND: R0 = MID

;

       MOV     (SP)+,R0        ; R0 = MID (POP SAVED MID)

       BR      DONE

;

; KEY < TABLE(MID): HIGH = MID - 1

;

GOLEFT: MOV     (SP)+,R2        ; R2 = MID (POP)

       DEC     R2              ; HIGH = MID - 1

       BR      LOOP

;

; KEY > TABLE(MID): LOW = MID + 1

;

GORIGHT:MOV     (SP)+,R0        ; R0 = MID (POP)

       INC     R0              ; LOW = MID + 1

       BR      LOOP

;

; NOT FOUND: R0 = 0

;

NOTFND: CLR     R0              ; R0 = 0

;

; RESTORE AND RETURN

;

DONE:   MOV     (SP)+,R4        ; RESTORE REGISTERS

       MOV     (SP)+,R3

       MOV     (SP)+,R2

       MOV     (SP)+,R1

       RTS     PC              ; RETURN

;

       .END

The same truth, five grammars

Each language imposes its own constraints, but the algorithm remains identical: initialise the bounds, calculate the midpoint, compare, adjust, repeat. From FORTRAN’s arithmetic IF to the /360’s SRA, from COBOL’s EVALUATE to the PDP-11’s ASR, the same logic traverses machines and decades, unchanged.

It is, indeed, an eternal algorithm.

See Appendix →

- End of Chapter 19-