“Does anyone still remember direct file organisation on disk in FORTRAN?”
Do not imagine I was some devotee of note-taking. With most lecture courses I had a simple relationship: they pursued me, I fled. But the ones related to programming I kept. Not out of piety, but out of self-interest. Programming, in those days, was not merely a “trade.” It was also considered “an art,” and it served one well to be a good artist-programmer.
It was the beginning of 1972. At the institute we ran our work on the Siemens 4004, but my serious schooling had been on the IBM 360 at the Academy of Economic Studies. That matters. In that era, “the art of programming” was not a pretty metaphor for conference speeches. It was a necessity: memory was scarce, disk was expensive, CPU time was rationed, and every mistake was paid for in printed paper and in nerves.
Today, when something does not work, you hit refresh. Back then, when something did not work, you waited for the listing. We worked in batch. You left your box of punched cards at “Reception” and waited for the run to see what results you got. And the wait for the listing was, more often than not, longer than one’s patience.
That is why I kept not only my notes, but also the listings of programs that had come out right. They were “sources” in the most concrete sense: printed sheets bearing the program, held together with a rubber band and placed in a cardboard folder. There was no Git, no cloud, no “I’ve put it on Drive.” The backup was the folder. If you dropped it on the floor, you had a full-blown merge conflict, except that the resolution was done by hand, not with a mouse.
In that folder I had two kinds of treasures.
Some were algorithms: sorting a table in memory, binary search (which sounds like a medical procedure, but was merely the civilised method of not searching blindly).
Others were technological “tricks”: how to link a FORTRAN program to a COBOL routine — that is, how to make two worlds that regarded each other with suspicion work together. Today you call it interfacing. Back then, you said to yourself, under your breath, “let’s hope it doesn’t crash at link-edit.”
And now comes the question I pose as one might display a period photograph:
Does anyone still remember direct file organisation on disk in FORTRAN?
In those days, if you wanted to find a record quickly in a large file, you had no “database,” no “index,” no “search.”
You had the disk.
A disk with tracks and sectors, which spun gently but had no inclination whatsoever to be read like a novel from the first page to the last. I wanted to go directly to a record — say, to a wagon code or to a serial number.
And so I used a trick that is once again in fashion today, only wrapped in more elegant packaging: hashing.
Hashing, in terms everyone can understand, is a rule by which you transform a key — a number, a code, a name — into a position where you should be able to find it. As if you were to take “Popescu, Ion” and, without opening the telephone directory, know from the start on which shelf it sits. Not because you are a sorcerer, but because you use a formula. In the seventies, the classic formula was simple and practical: you divided the key by a prime number and took the remainder.
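For readers who prefer to see the formula rather than take it on faith, here is a minimal sketch of the division method in Python. The prime 997, the sample keys and the name home_address are assumptions chosen for illustration; they are not taken from the original FORTRAN program.

```python
# Division-method hashing: key -> remainder -> position.
# The prime and the sample keys below are illustrative assumptions.
BUCKETS = 997  # a prime number of home positions in the file


def home_address(key: int, buckets: int = BUCKETS) -> int:
    """Return the remainder of key divided by buckets, used as the record's position."""
    return key % buckets


for key in (315742, 315743, 880120):
    print(key, "->", home_address(key))
```

Two neighbouring keys land in neighbouring positions, and a distant key lands somewhere else entirely. No directory is consulted: the arithmetic is the directory.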
That remainder was not poetry. It was an address. A real place on the disk, with an almost geographical sense: “here, in this area.” Why a prime number? Because keys have bad habits. They come in series, they have patterns, they end in zero, they are multiples of two or five — exactly like people who, if given a single entrance, all crowd through the same door. A prime number reduces that sort of “congregation.” It does not work miracles, but it distributes more evenly. It is a gentle discipline, like a good doorman: he does not stop you, he merely disperses you.
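The doorman's effect can be checked with a small experiment. The sketch below assumes a thousand keys that all end in zero and compares a round divisor (100) with a nearby prime (97); the numbers and the helper name positions_used are hypothetical, picked only to make the pattern visible.

```python
# Keys with a "bad habit": every one of them ends in zero (illustrative data).
keys = range(0, 10_000, 10)  # 1000 keys


def positions_used(keys, divisor: int) -> int:
    """Count how many distinct remainders the keys actually occupy."""
    return len({key % divisor for key in keys})


round_divisor = positions_used(keys, 100)  # shares the factor 10 with every key
prime_divisor = positions_used(keys, 97)   # shares no factor with the step of 10

print("divisor 100:", round_divisor, "positions used out of 100")
print("divisor  97:", prime_divisor, "positions used out of 97")
```

With the round divisor, the thousand keys crowd into ten positions. With the prime, they spread over all ninety-seven. The prime is not magic: keys that happened to be multiples of 97 would crowd just as badly. It is simply much less likely to share a factor with whatever pattern the keys carry.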
Collisions arose, of course: two different keys could yield the same remainder. In that case, you had a simple, era-appropriate solution: the overflow area. An annex to the file, a kind of storeroom where you put the “duplicates.” It was not pretty, but it worked.
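A rough sketch of the idea, in memory rather than on disk, might look like this. It assumes one record per home position, a deliberately tiny prime so that collisions are easy to provoke, and keys that are stored only once; the names primary, overflow, store and fetch are invented for the illustration, not recovered from the old listings.

```python
# In-memory stand-in for a direct-access file with an overflow area.
PRIME = 7  # deliberately tiny (assumed value) so that collisions happen

primary = [None] * PRIME  # one record slot per home address
overflow = []             # the "storeroom" for records whose home slot is taken


def store(key: int, data: str) -> None:
    """Put the record in its home slot, or in the overflow area if the slot is occupied."""
    slot = key % PRIME
    if primary[slot] is None:
        primary[slot] = (key, data)
    else:
        overflow.append((key, data))


def fetch(key: int):
    """Look in the home slot first; if another key lives there, scan the overflow area."""
    record = primary[key % PRIME]
    if record is not None and record[0] == key:
        return record[1]
    for stored_key, data in overflow:
        if stored_key == key:
            return data
    return None


store(10, "wagon A")  # 10 % 7 == 3, slot 3 is free
store(17, "wagon B")  # 17 % 7 == 3 as well, so this one goes to overflow
store(5, "wagon C")   # 5 % 7 == 5, slot 5 is free
print(fetch(17), "| overflow holds", len(overflow), "record(s)")
```

The price is visible in fetch: a record at home is found in one step, while a record in the storeroom costs a sequential search. That is why the divisor and the file size were worth choosing carefully.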
And in those days, prettiness was optional; functioning was compulsory.
Today, when you use a dictionary in Python or a hash map in Java, you are doing essentially the same thing, only without seeing the mechanism. The key is transformed into an index within a table in memory. The address is no longer physical (track and sector); it is logical (a place within a structure).
If the structure fills up, the system enlarges its own space and moves everything, recalculates, rearranges. Rehashing. In short: a furniture rearrangement carried out in the night, without asking your permission.
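The night-time furniture removal can be imitated with a toy table. This is only a sketch under stated assumptions: chained buckets, a starting size of 4, a doubling rule and a 0.75 fill threshold, all chosen for the example. It is not a description of how Python or Java implement their tables internally, and TinyHashTable is a made-up name.

```python
class TinyHashTable:
    """Toy hash table with chained buckets that grows when it gets crowded."""

    def __init__(self, size: int = 4):
        self.buckets = [[] for _ in range(size)]
        self.count = 0

    def _index(self, key, size: int) -> int:
        return hash(key) % size

    def put(self, key, value) -> None:
        bucket = self.buckets[self._index(key, len(self.buckets))]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self.count += 1
        # Assumed rule: grow once the table is more than 75% full.
        if self.count > 0.75 * len(self.buckets):
            self._rehash(2 * len(self.buckets))

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key, len(self.buckets))]:
            if existing_key == key:
                return value
        return None

    def _rehash(self, new_size: int) -> None:
        """Allocate a larger table and recompute every record's position."""
        old_buckets = self.buckets
        self.buckets = [[] for _ in range(new_size)]
        for bucket in old_buckets:
            for key, value in bucket:
                self.buckets[self._index(key, new_size)].append((key, value))


table = TinyHashTable()
for n in range(10):
    table.put(n, f"record {n}")
print(len(table.buckets), "buckets after 10 insertions;", table.get(7))
```

The caller only ever wrote put and get. The table grew from 4 positions to 8 and then to 16 on its own, and every record was moved to its new address without anyone being asked.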