Estimated time of reading: 12-13 minutes
“The routine that calls itself”
The iterative variant of binary search — the one presented in the chapter — uses a loop: adjust the bounds, recalculate the midpoint, repeat. It is direct, economical, without surprises.
But there is another way, more elegant mathematically: recursion. Instead of adjusting the bounds and returning to the beginning of the loop, the routine calls itself with new bounds. Each call solves a smaller problem, until the answer is trivial: either we have found it, or the space has been exhausted. It is the same algorithm, only instead of a loop that revolves, you have a function that mirrors itself.
In high-level languages, recursion is simple: the compiler takes care of the stack, the context, the return. In Assembler, you must do everything yourself. And that is precisely why it is instructive: you see with your own eyes how each recursive call deposits its small universe on the stack — registers, parameters, return address — and how, upon returning, each level lifts its universe back up, untouched.
MACRO-11: Natural recursion
The PDP-11 might have been designed with recursion in mind. The hardware stack, with auto-decrement on push and auto-increment on pop, means that each recursive call is merely a sequence of MOVs onto the stack followed by a JSR. On return, RTS PC restores everything automatically.
The interface: four parameters on the stack — the table address, the key, the lower bound (LOW) and the upper bound (HIGH). On the first call, LOW is 1 and HIGH is N. The result is returned in R0: the index found, or 0 if the element does not exist.
The base case: if LOW > HIGH, the space has been exhausted; return 0. Otherwise, calculate MID, compare, and call oneself with adjusted bounds. Note that the table address and the key are retransmitted identically at every level — only LOW and HIGH change.
A PDP-11 detail that can drive you to distraction if you are unaware of it: the stack offsets are in octal. 10(SP) means octal 10, that is, decimal 8. 12(SP) is decimal 10, 14(SP) is decimal 12, and so on. Once you count in octal, the stack becomes a perfect map.
The initial call:
; INITIAL CALL:
;
MOV N,-(SP) ; PUSH HIGH = N
MOV #1,-(SP) ; PUSH LOW = 1
MOV KEY,-(SP) ; PUSH KEY
MOV #TABLE,-(SP) ; PUSH TABLE ADDRESS
JSR PC,RBSRCH ; CALL
ADD #10,SP ; CLEAN STACK (4 PARAMS * 2 = 10 OCTAL)
; R0 = INDEX (1..N) OR 0
The recursive routine:
;=======================================================
; RECURSIVE BINARY SEARCH - MACRO-11 / PDP-11 / RT-11
;
; PARAMETERS ON STACK (PUSHED RIGHT TO LEFT):
; 2(SP) = TABLE ADDRESS (pointer to sorted word array)
; 4(SP) = KEY (search value)
; 6(SP) = LOW (lower bound, 1-based)
; 10(SP) = HIGH (upper bound, 1-based)
; (10 octal = 8 decimal)
;
; RETURNS: R0 = INDEX (1..N) IF FOUND, 0 IF NOT
;
; FIRST CALL:
; MOV HIGH,-(SP) ; push HIGH (=N)
; MOV LOW,-(SP) ; push LOW (=1)
; MOV KEY,-(SP) ; push KEY
; MOV #TABLE,-(SP) ; push TABLE address
; JSR PC,RBSRCH
; ADD #10,SP ; clean 4 params (octal 10 = 8)
; ; R0 = result
;=======================================================
RBSRCH: MOV R1,-(SP) ; SAVE REGISTERS
MOV R2,-(SP)
MOV R3,-(SP)
MOV R4,-(SP)
;
; LOAD PARAMETERS
; 4 saved regs + return addr = 5 words = 12 octal offset
;
MOV 12(SP),R1 ; R1 = TABLE ADDRESS
MOV 14(SP),R2 ; R2 = KEY
MOV 16(SP),R3 ; R3 = LOW
MOV 20(SP),R4 ; R4 = HIGH
;
; BASE CASE: LOW > HIGH? => NOT FOUND
;
CMP R3,R4 ; LOW > HIGH?
BGT RNFND ; YES: RETURN 0
;
; MID = (LOW + HIGH) / 2
;
MOV R3,R0 ; R0 = LOW
ADD R4,R0 ; R0 = LOW + HIGH
ASR R0 ; R0 = (LOW+HIGH)/2 = MID
;
; COMPUTE ADDRESS OF TABLE(MID)
;
MOV R0,-(SP) ; SAVE MID ON STACK
DEC R0 ; R0 = MID - 1
ASL R0 ; R0 = (MID-1) * 2 (WORD SIZE)
ADD R1,R0 ; R0 = ADDR OF TABLE(MID)
;
; COMPARE KEY WITH TABLE(MID)
;
CMP R2,(R0) ; KEY : TABLE(MID)
BLT RLEFT ; KEY < TABLE(MID)
BGT RRIGHT ; KEY > TABLE(MID)
;
; FOUND: RETURN MID
;
MOV (SP)+,R0 ; R0 = MID (POP)
BR RDONE
;
; KEY < TABLE(MID): RECURSE WITH HIGH = MID - 1
;
RLEFT: MOV (SP)+,R0 ; R0 = MID (POP)
DEC R0 ; R0 = MID - 1
;
; PUSH PARAMS FOR RECURSIVE CALL (RIGHT TO LEFT)
;
MOV R0,-(SP) ; PUSH NEW HIGH (MID-1)
MOV R3,-(SP) ; PUSH LOW (UNCHANGED)
MOV R2,-(SP) ; PUSH KEY
MOV R1,-(SP) ; PUSH TABLE ADDRESS
JSR PC,RBSRCH ; RECURSE!
ADD #10,SP ; CLEAN 4 PARAMS (OCTAL 10 = 8)
BR RDONE ; R0 = RESULT FROM RECURSION
;
; KEY > TABLE(MID): RECURSE WITH LOW = MID + 1
;
RRIGHT: MOV (SP)+,R0 ; R0 = MID (POP)
INC R0 ; R0 = MID + 1
;
; PUSH PARAMS FOR RECURSIVE CALL (RIGHT TO LEFT)
;
MOV R4,-(SP) ; PUSH HIGH (UNCHANGED)
MOV R0,-(SP) ; PUSH NEW LOW (MID+1)
MOV R2,-(SP) ; PUSH KEY
MOV R1,-(SP) ; PUSH TABLE ADDRESS
JSR PC,RBSRCH ; RECURSE!
ADD #10,SP ; CLEAN 4 PARAMS (OCTAL 10 = 8)
BR RDONE ; R0 = RESULT FROM RECURSION
;
; NOT FOUND: RETURN 0
;
RNFND: CLR R0
;
; RESTORE AND RETURN
;
RDONE: MOV (SP)+,R4 ; RESTORE REGISTERS
MOV (SP)+,R3
MOV (SP)+,R2
MOV (SP)+,R1
RTS PC ; RETURN (R0 = RESULT)
;
.END
ASSEMBLER IBM System/360: Recursion built by hand
On the /360, recursion is not impossible — but no one hands it to you on a plate. There is no hardware stack. There is the Save Area — a block of 18 fullwords in which each routine saves its registers. The Save Areas are chained: each one points to the preceding one (backward) and to the next (forward). It is like a chain of railway wagons, each with its place in the consist.
The problem: in a normal call, each routine has a single, static Save Area. If the routine calls itself, the second invocation overwrites the Save Area of the first — and upon return, the registers are lost. Disaster.
The solution: a simulated stack. I pre-allocated a memory area (STACK) with 20 frames of 96 bytes each — 18 fullwords for the Save Area plus 6 fullwords for local variables and the parameter list. A pointer (STKPTR) keeps track of the current frame. On each recursive call, the pointer advances by 96 bytes; on return, it recedes. Twenty levels are sufficient for 2 to the power of 20, that is, over one million elements — more than any disk in 1972 could hold.
The key instruction is BALR R14,R15 — Branch And Link Register: it saves the return address in R14 and branches to the address in R15. It is the equivalent of JSR on the PDP-11, except that it does not use the stack — it uses a register. Everything else — saving the context, passing parameters, restoration — is done by hand, with care and discipline.
RBSRCH CSECT
*=============================================================
* RECURSIVE BINARY SEARCH - IBM SYSTEM/360
* OS/360 STANDARD LINKAGE WITH DYNAMIC SAVE AREAS
*
* PARAMETERS VIA R1 (STANDARD OS/360):
* +0 A(TABLE) - ADDRESS OF SORTED FULLWORD ARRAY
* +4 A(KEY) - ADDRESS OF FULLWORD: SEARCH KEY
* +8 A(LOW) - ADDRESS OF FULLWORD: LOWER BOUND
* +12 A(HIGH) - ADDRESS OF FULLWORD: UPPER BOUND
* +16 A(IPOS) - ADDRESS OF FULLWORD: RESULT INDEX
*
* RETURN CODE IN R15: 0=FOUND, 4=NOT FOUND
*
* NOTE: EACH RECURSIVE CALL USES A FRAME FROM STACK AREA.
* REGISTER R11 IS THE STACK POINTER, ADVANCING BY
* FRAME SIZE (96 BYTES) ON EACH CALL.
* MAX DEPTH = 20 (ENOUGH FOR 2^20 = 1M ELEMENTS)
*=============================================================
*
FSIZE EQU 96 FRAME SIZE (18F SAVE + 6F WORK)
*
*--- STANDARD ENTRY LINKAGE ---
*
STM R14,R12,12(R13) SAVE CALLER'S REGISTERS
LR R12,R15 ESTABLISH BASE REGISTER
USING RBSRCH,R12
*
*--- ALLOCATE FRAME FROM STACK ---
*
L R11,=A(STKPTR) R11 -> CURRENT STACK PTR
L R10,0(,R11) R10 = CURRENT FRAME ADDRESS
LA R9,FSIZE(,R10) R9 = NEXT FRAME ADDRESS
ST R9,0(,R11) ADVANCE STACK POINTER
*
*--- CHAIN SAVE AREAS ---
*
ST R13,4(,R10) BACKWARD CHAIN
ST R10,8(,R13) FORWARD CHAIN
LR R13,R10 R13 -> OUR FRAME
*
*--- LOAD PARAMETERS ---
*
L R2,0(,R1) R2 = A(TABLE)
L R3,4(,R1) R3 = A(KEY)
L R4,8(,R1) R4 = A(LOW)
L R5,12(,R1) R5 = A(HIGH)
L R6,16(,R1) R6 = A(IPOS)
LA R6,0(,R6) CLEAR HI BIT IF LAST PARM
*
L R7,0(,R3) R7 = KEY VALUE
L R8,0(,R4) R8 = LOW VALUE
L R9,0(,R5) R9 = HIGH VALUE
*
*--- BASE CASE: LOW > HIGH? ---
*
CR R8,R9 LOW > HIGH?
BH RNOTFND YES: NOT FOUND
*
*--- MID = (LOW + HIGH) / 2 ---
*
LR R10,R8 R10 = LOW
AR R10,R9 R10 = LOW + HIGH
SRA R10,1 R10 = MID
*
*--- COMPARE KEY WITH TABLE(MID) ---
*
LR R4,R10 R4 = MID
BCTR R4,0 R4 = MID - 1
SLA R4,2 R4 = (MID-1) * 4
C R7,0(R4,R2) KEY : TABLE(MID)
BL RLEFT KEY < TABLE(MID)
BH RRIGHT KEY > TABLE(MID)
*
*--- FOUND: IPOS = MID, RC = 0 ---
*
ST R10,0(,R6) IPOS = MID
SR R15,R15 RC = 0
B REXIT
*
*--- KEY < TABLE(MID): RECURSE(LOW, MID-1) ---
*
RLEFT LR R5,R10 R5 = MID
BCTR R5,0 R5 = MID - 1
ST R8,72(,R13) STORE LOW IN WORK AREA
ST R5,76(,R13) STORE NEW HIGH (MID-1)
*
* BUILD PARAMETER LIST IN WORK AREA
*
L R1,=A(SAVTBL) RELOAD A(TABLE) FROM SAVE
ST R1,80(,R13) PARM+0: A(TABLE)
L R1,=A(SAVKEY)
ST R1,84(,R13) PARM+4: A(KEY)
LA R1,72(,R13)
ST R1,88(,R13) PARM+8: A(LOW) -> WORK+0
LA R1,76(,R13)
ST R1,92(,R13) PARM+12: A(HIGH) -> WORK+4
*
* NOTE: IPOS ADDRESS STAYS IN R6, PASSED THROUGH
*
B RCALL
*
*--- KEY > TABLE(MID): RECURSE(MID+1, HIGH) ---
*
RRIGHT LA R5,1(,R10) R5 = MID + 1
ST R5,72(,R13) STORE NEW LOW (MID+1)
ST R9,76(,R13) STORE HIGH (UNCHANGED)
*
L R1,=A(SAVTBL)
ST R1,80(,R13) PARM+0: A(TABLE)
L R1,=A(SAVKEY)
ST R1,84(,R13) PARM+4: A(KEY)
LA R1,72(,R13)
ST R1,88(,R13) PARM+8: A(LOW) -> WORK+0
LA R1,76(,R13)
ST R1,92(,R13) PARM+12: A(HIGH) -> WORK+4
*
*--- COMMON RECURSIVE CALL ---
*
RCALL LA R1,80(,R13) R1 -> PARAMETER LIST
* SET LAST PARM INDICATOR
L R5,92(,R13) LOAD LAST PARM ADDR
O R5,=X'80000000' SET HI BIT
ST R5,92(,R13) STORE BACK
*
L R15,=A(RBSRCH) R15 = ENTRY POINT
BALR R14,R15 CALL SELF (RECURSE!)
* R15 = RETURN CODE FROM RECURSIVE CALL
B REXIT
*
*--- NOT FOUND: IPOS = 0, RC = 4 ---
*
RNOTFND SR R10,R10
ST R10,0(,R6) IPOS = 0
LA R15,4 RC = 4
*
*--- STANDARD EXIT: DEALLOCATE FRAME ---
*
REXIT L R11,=A(STKPTR) RESTORE STACK POINTER
L R10,0(,R11)
S R10,=F'96' POP ONE FRAME
ST R10,0(,R11)
*
L R13,4(,R13) RESTORE CALLER'S R13
L R14,12(,R13) RESTORE R14
LM R0,R12,20(R13) RESTORE R0-R12
BR R14 RETURN
*
*--- CONSTANTS AND WORK AREAS ---
*
SAVTBL DS A SAVED A(TABLE) FOR RECURSION
SAVKEY DS A SAVED A(KEY) FOR RECURSION
STKPTR DC A(STACK) CURRENT STACK POINTER
STACK DS 20CL96 STACK: 20 FRAMES * 96 BYTES
* (MAX DEPTH 20 = 2**20 ELEMS)
*
R0 EQU 0
R1 EQU 1
R2 EQU 2
R3 EQU 3
R4 EQU 4
R5 EQU 5
R6 EQU 6
R7 EQU 7
R8 EQU 8
R9 EQU 9
R10 EQU 10
R11 EQU 11
R12 EQU 12
R13 EQU 13
R14 EQU 14
R15 EQU 15
END RBSRCH
Two machines, the same idea
On the PDP-11, recursion flows naturally: the hardware stack makes everything transparent. Each call is merely a JSR, each return merely an RTS. The machine was conceived for this.
On the /360, recursion is like pitching a tent by hand in a storm: it can be done, but you must know where to drive each stake. The simulated stack, the pre-allocated frames, the chain of Save Areas — all are constructions that the PDP-11 had for free in its hardware.
But the algorithm? The algorithm is the same. Divide, compare, call yourself with a smaller problem. The same logic, regardless of whether the stack is made of silicon or of code written by hand.
The eternal algorithm takes no notice of architecture. It simply works.