Estimated time of reading: 12-13 minutes
“The Eternal Algorithm”
After speaking at such length about languages, about dumps, about cardboard folders full of listings, and about nuts and bolts, it is time to show concretely what it means to say “the same thing in five different ways.”
I have chosen a single algorithm. I call it “the eternal algorithm” because it is, in all likelihood, the oldest idea in programming that has functioned unchanged for over sixty years: binary search.
The principle is of a simplicity that makes you suspect something must be hiding: divide the space in two, eliminate the wrong half, repeat. With each step, the problem is halved. In a table of one million elements, you find what you are looking for in at most twenty steps. It is not magic. It is mathematics, and it is beautiful.
I have written the same routine in five languages that I used over the course of the years. All of them search for a key (KEY) in a table sorted in ascending order (TABLE) of N integer elements. They return the index of the element found, or zero if it does not exist. The same truth, five grammars.
1. FORTRAN IV (IBM, 1962)
FORTRAN was the language of my serious beginnings. It was the language of calculations — conceived by engineers, for engineers. Variable names: six characters at most. Arrays indexed from 1, not from 0. There was no DO WHILE; you had IF and GO TO, the only tools for a conditional loop. And the emblematic construct was the arithmetic IF: a single instruction that evaluated an expression and branched along three different paths — negative, zero, positive. Three destinies from a single line. Elegant, cryptic, efficient.
The subroutine call: CALL BSRCH(TABLE, N, KEY, IPOS). Parameters are passed by reference — that is, addresses, not values.
C-------------------------------------------------------------
C BINARY SEARCH IN A SORTED INTEGER TABLE
C FORTRAN IV - IBM SYSTEM/360
C
C PARAMETERS:
C TABLE - SORTED INTEGER ARRAY (ASCENDING)
C N - NUMBER OF ELEMENTS
C KEY - VALUE TO SEARCH FOR
C IPOS - RESULT: INDEX (1..N) OR 0 IF NOT FOUND
C-------------------------------------------------------------
SUBROUTINE BSRCH(TABLE, N, KEY, IPOS)
INTEGER TABLE(N), N, KEY, IPOS
INTEGER LOW, IHI, MID
C
C INITIALIZE BOUNDS
C
LOW = 1
IHI = N
IPOS = 0
C
C MAIN SEARCH LOOP
C
10 IF (LOW .GT. IHI) GO TO 30
MID = (LOW + IHI) / 2
IF (KEY - TABLE(MID)) 20, 25, 22
C
C KEY < TABLE(MID): SEARCH LEFT HALF
C
20 IHI = MID - 1
GO TO 10
C
C KEY > TABLE(MID): SEARCH RIGHT HALF
C
22 LOW = MID + 1
GO TO 10
C
C KEY = TABLE(MID): FOUND
C
25 IPOS = MID
C
30 RETURN
END
2. COBOL (IBM, 1960)
COBOL was the other world. If FORTRAN spoke the language of engineers, COBOL spoke the language of accountants and economists. Verbose by nature, but perfectly readable fifty years later — an auditor in 1975 could read the program without any training in programming. It had no natural arithmetic on indices; everything went through COMPUTE. The loop was PERFORM … UNTIL, and the equivalent of switch/case was EVALUATE TRUE, with each WHEN testing a condition. Arrays, as in FORTRAN, were indexed from 1. Data was declared in WORKING-STORAGE, with types described through PIC — picture clause — as though every field had to present its papers at the door.
IDENTIFICATION DIVISION.
PROGRAM-ID. BSEARCH.
*---------------------------------------------------------
* BINARY SEARCH IN A SORTED TABLE
* COBOL-68 / IBM OS/360
*---------------------------------------------------------
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-TABLE.
05 WS-ENTRY PIC 9(8)
OCCURS 1000 TIMES.
01 WS-TABLE-SIZE PIC 9(4).
01 WS-SEARCH-KEY PIC 9(8).
01 WS-RESULT-POS PIC S9(4) VALUE ZERO.
01 WS-FOUND-FLAG PIC X(1) VALUE 'N'.
01 WS-LOW PIC 9(4).
01 WS-HIGH PIC 9(4).
01 WS-MID PIC 9(4).
01 WS-DONE-FLAG PIC X(1) VALUE 'N'.
PROCEDURE DIVISION.
BINARY-SEARCH-PARA.
MOVE 1 TO WS-LOW
MOVE WS-TABLE-SIZE TO WS-HIGH
MOVE ZERO TO WS-RESULT-POS
MOVE 'N' TO WS-FOUND-FLAG
MOVE 'N' TO WS-DONE-FLAG
PERFORM SEARCH-LOOP
UNTIL WS-DONE-FLAG = 'Y'
STOP RUN.
SEARCH-LOOP.
IF WS-LOW > WS-HIGH
MOVE 'Y' TO WS-DONE-FLAG
ELSE
COMPUTE WS-MID =
(WS-LOW + WS-HIGH) / 2
EVALUATE TRUE
WHEN WS-SEARCH-KEY <
WS-ENTRY(WS-MID)
COMPUTE WS-HIGH =
WS-MID - 1
WHEN WS-SEARCH-KEY >
WS-ENTRY(WS-MID)
COMPUTE WS-LOW =
WS-MID + 1
WHEN WS-SEARCH-KEY =
WS-ENTRY(WS-MID)
MOVE WS-MID TO WS-RESULT-POS
MOVE 'Y' TO WS-FOUND-FLAG
MOVE 'Y' TO WS-DONE-FLAG
END-EVALUATE
END-IF.
3. ALGOL 60 (1960)
ALGOL was the academic language — published in journals, rarely compiled on the same machine twice in the same way. I did not use it in production, but I studied it and respected it. Its syntax of begin/end and := influenced Pascal, C, and everything that followed. The ÷ operator was integer division. The distinction between assignment (:=) and comparison (=) — which FORTRAN ignored, generating bugs for decades on end — was already clear in ALGOL. The function returned its value through the procedure name: bsearch := mid. Pure mathematical elegance.
comment -------------------------------------------------
BINARY SEARCH IN A SORTED INTEGER ARRAY
ALGOL 60
Returns index (1..n) if found, 0 if not found
---------------------------------------------------------;
integer procedure bsearch(t, n, key);
value n, key;
integer array t;
integer n, key;
begin
integer low, high, mid;
low := 1;
high := n;
bsearch := 0;
while low <= high do
begin
mid := (low + high) ÷ 2;
if key < t[mid] then
high := mid - 1
else if key > t[mid] then
low := mid + 1
else
begin
bsearch := mid;
low := high + 1 comment force exit;
end
end
end bsearch;
4. ASSEMBLER IBM System/360 — With Standard OS/360 Linkage
This is the central routine — the one I was speaking about in the preceding chapters. Written in Assembler, callable from both FORTRAN and COBOL, without knowing and without caring who is calling it. It respects only the convention.
The OS/360 linkage convention: on entry, R1 points to the parameter list (four consecutive addresses), R13 to the caller’s Save Area, R14 is the return address, R15 is the entry point address. On exit, R15 contains the return code: 0 means found, 4 means not found. The result is deposited directly into the fullword pointed to by the fourth parameter. The last pointer has its sign bit set — the standard OS/360 marker for “last parameter in the list.”
Key instructions: SRA R10,1 — arithmetic shift right by one bit, that is, division by 2, faster than the D (divide) instruction. BCTR R4,0 — decrements R4 by 1, without branching. SLA R4,2 — shift left by two bits, that is, multiplication by 4 (the size of a fullword). C R7,0(R4,R2) — compares R7 with the fullword at address R2+R4, indexed addressing.
BSRCH CSECT
*=======================================================
* BINARY SEARCH - SORTED FULLWORD TABLE
* IBM SYSTEM/360 ASSEMBLER - OS/360 STANDARD LINKAGE
*
* CALLABLE FROM FORTRAN IV AND COBOL
*
* PARAMETERS VIA R1 (STANDARD OS/360):
* +0 A(TABLE) - ADDRESS OF SORTED FULLWORD ARRAY
* +4 A(N) - ADDRESS OF FULLWORD: ELEMENT COUNT
* +8 A(KEY) - ADDRESS OF FULLWORD: SEARCH KEY
* +12 A(IPOS) - ADDRESS OF FULLWORD: RESULT INDEX
* (1..N IF FOUND, 0 IF NOT FOUND)
*
* RETURN CODE IN R15: 0=FOUND, 4=NOT FOUND
*=======================================================
*
*--- STANDARD ENTRY LINKAGE ---
*
STM R14,R12,12(R13) SAVE CALLER'S REGISTERS
LR R12,R15 ESTABLISH BASE REGISTER
USING BSRCH,R12
LA R11,SAVEAREA OUR SAVE AREA
ST R13,4(,R11) BACKWARD CHAIN
ST R11,8(,R13) FORWARD CHAIN
LR R13,R11 R13 -> OUR SAVE AREA
*
*--- LOAD PARAMETER ADDRESSES ---
*
L R2,0(,R1) R2 = A(TABLE)
L R3,4(,R1) R3 = A(N)
L R4,8(,R1) R4 = A(KEY)
L R5,12(,R1) R5 = A(IPOS) (HI BIT ON)
LA R5,0(,R5) CLEAR HIGH-ORDER BIT
*
*--- LOAD VALUES ---
*
L R6,0(,R3) R6 = N (ELEMENT COUNT)
L R7,0(,R4) R7 = KEY (SEARCH VALUE)
*
*--- INITIALIZE: LOW=1, HIGH=N ---
*
LA R8,1 R8 = LOW = 1
LR R9,R6 R9 = HIGH = N
*
*--- MAIN LOOP ---
*
LOOP CR R8,R9 LOW > HIGH?
BH NOTFND YES: NOT FOUND
*
* MID = (LOW + HIGH) / 2
*
LR R10,R8 R10 = LOW
AR R10,R9 R10 = LOW + HIGH
SRA R10,1 R10 = (LOW+HIGH)/2 = MID
*
* COMPUTE TABLE OFFSET: (MID-1)*4
*
LR R4,R10 R4 = MID
BCTR R4,0 R4 = MID - 1
SLA R4,2 R4 = (MID-1) * 4
*
* COMPARE KEY WITH TABLE(MID)
*
C R7,0(R4,R2) COMPARE KEY : TABLE(MID)
BL GOLEFT KEY < TABLE(MID)
BH GORIGHT KEY > TABLE(MID)
*
*--- FOUND: STORE MID IN IPOS, RC=0 ---
*
ST R10,0(,R5) IPOS = MID
SR R15,R15 RC = 0 (FOUND)
B RETURN
*
*--- KEY < TABLE(MID): HIGH = MID - 1 ---
*
GOLEFT LR R9,R10 HIGH = MID
BCTR R9,0 HIGH = MID - 1
B LOOP
*
*--- KEY > TABLE(MID): LOW = MID + 1 ---
*
GORIGHT LR R8,R10 LOW = MID
LA R8,1(,R8) LOW = MID + 1
B LOOP
*
*--- NOT FOUND: IPOS = 0, RC=4 ---
*
NOTFND SR R10,R10 R10 = 0
ST R10,0(,R5) IPOS = 0
LA R15,4 RC = 4 (NOT FOUND)
*
*--- STANDARD EXIT LINKAGE ---
*
RETURN L R13,4(,R13) RESTORE CALLER'S R13
L R14,12(,R13) RESTORE R14 (RETURN ADDR)
LM R0,R12,20(R13) RESTORE R0-R12
BR R14 RETURN TO CALLER
*
*--- CONSTANTS AND WORK AREAS ---
*
SAVEAREA DS 18F OUR SAVE AREA
*
*--- REGISTER EQUATES ---
*
R0 EQU 0
R1 EQU 1
R2 EQU 2
R3 EQU 3
R4 EQU 4
R5 EQU 5
R6 EQU 6
R7 EQU 7
R8 EQU 8
R9 EQU 9
R10 EQU 10
R11 EQU 11
R12 EQU 12
R13 EQU 13
R14 EQU 14
R15 EQU 15
END BSRCH
Calling from FORTRAN:
INTEGER TABLE(1000), N, KEY, IPOS
EXTERNAL BSRCH
C ...
CALL BSRCH(TABLE, N, KEY, IPOS)
C IPOS = INDEX (1..N) SAU 0 DACA NU EXISTA
Calling from COBOL:
01 WS-TABLE.
05 WS-ENTRY PIC S9(8) COMP
OCCURS 1000 TIMES.
01 WS-N PIC S9(8) COMP.
01 WS-KEY PIC S9(8) COMP.
01 WS-IPOS PIC S9(8) COMP.
*
PROCEDURE DIVISION.
CALL 'BSRCH' USING WS-TABLE
WS-N
WS-KEY
WS-IPOS.
* WS-IPOS = INDEX (1..N) OR 0
A critical observation regarding dual linkage: COBOL PIC S9(8) COMP is identical to FORTRAN’s INTEGER — a signed binary fullword. Both languages pass parameters by reference. The Assembler routine does not know, and does not need to know, who is calling it.
5. MACRO-11 (DEC PDP-11, 1970)
A different machine, a different world. The PDP-11: a 16-bit architecture, little-endian, eight registers (R0–R7, where R6 is the stack pointer and R7 is the program counter). Instructions operate on words of 2 bytes, not fullwords of 4. There is no Save Area as on the /360 — the stack is used instead. Parameters are passed on the stack; the result is returned in R0.
Key instructions: ASR — arithmetic shift right, the equivalent of SRA on the /360. ASL — shift left, multiplication by 2 (one word, not 4 bytes). CMP R1,(R3) — compares R1 with the contents at the address held in R3; the parentheses mean indirect, that is, dereferencing. MOV R3,-(SP) — push onto the stack with auto-decrement. MOV (SP)+,R0 — pop with auto-increment. And one detail that can drive you to distraction if you are not aware of it: the numbers are in octal. 10(SP) means octal 10, which is decimal 8.
A typical call:
MOV #TABLE,-(SP) ; PUSH ADDRESS OF TABLE
MOV N,-(SP) ; PUSH ELEMENT COUNT
MOV KEY,-(SP) ; PUSH SEARCH KEY
JSR PC,BSRCH ; CALL
ADD #6,SP ; CLEAN STACK (3 PARAMS * 2)
; R0 = RESULT (INDEX OR 0)
;=======================================================
; BINARY SEARCH - SORTED WORD TABLE
; MACRO-11 / PDP-11 / RT-11
;
; STACK PARAMETERS (PUSHED RIGHT TO LEFT):
; 6(SP) = ADDRESS OF SORTED WORD ARRAY
; 4(SP) = N (ELEMENT COUNT)
; 2(SP) = KEY (SEARCH VALUE)
;
; RETURNS: R0 = INDEX (1..N) IF FOUND, 0 IF NOT
;=======================================================
BSRCH: MOV R1,-(SP) ; SAVE REGISTERS
MOV R2,-(SP)
MOV R3,-(SP)
MOV R4,-(SP)
;
; LOAD PARAMETERS (OFFSET ADJUSTED FOR 4 SAVED REGS)
;
MOV 14(SP),R4 ; R4 = ADDRESS OF TABLE
MOV 12(SP),R3 ; R3 = N
MOV 10(SP),R1 ; R1 = KEY
;
; INITIALIZE: LOW=1, HIGH=N
;
MOV #1,R0 ; R0 = LOW = 1
MOV R3,R2 ; R2 = HIGH = N
;
; MAIN LOOP
;
LOOP: CMP R0,R2 ; LOW > HIGH?
BGT NOTFND ; YES: NOT FOUND
;
; MID = (LOW + HIGH) / 2
;
MOV R0,R3 ; R3 = LOW
ADD R2,R3 ; R3 = LOW + HIGH
ASR R3 ; R3 = (LOW+HIGH)/2 = MID
;
; COMPUTE TABLE OFFSET: (MID-1)*2 (WORD SIZE)
;
MOV R3,-(SP) ; SAVE MID ON STACK
DEC R3 ; R3 = MID - 1
ASL R3 ; R3 = (MID-1) * 2
ADD R4,R3 ; R3 = ADDR OF TABLE(MID)
;
; COMPARE KEY WITH TABLE(MID)
;
CMP R1,(R3) ; KEY : TABLE(MID)
BLT GOLEFT
BGT GORIGHT
;
; FOUND: R0 = MID
;
MOV (SP)+,R0 ; R0 = MID (POP SAVED MID)
BR DONE
;
; KEY < TABLE(MID): HIGH = MID - 1
;
GOLEFT: MOV (SP)+,R2 ; R2 = MID (POP)
DEC R2 ; HIGH = MID - 1
BR LOOP
;
; KEY > TABLE(MID): LOW = MID + 1
;
GORIGHT:MOV (SP)+,R0 ; R0 = MID (POP)
INC R0 ; LOW = MID + 1
BR LOOP
;
; NOT FOUND: R0 = 0
;
NOTFND: CLR R0 ; R0 = 0
;
; RESTORE AND RETURN
;
DONE: MOV (SP)+,R4 ; RESTORE REGISTERS
MOV (SP)+,R3
MOV (SP)+,R2
MOV (SP)+,R1
RTS PC ; RETURN
;
.END
The same truth, five grammars
Each language imposes its own constraints, but the algorithm remains identical: initialise the bounds, calculate the midpoint, compare, adjust, repeat. From FORTRAN’s arithmetic IF to the /360’s SRA, from COBOL’s EVALUATE to the PDP-11’s ASR, the same logic traverses machines and decades, unchanged.
It is, indeed, an eternal algorithm.