Together, Krapivin (now a graduate student at the University of Cambridge), Farach-Colton (now at New York University) and Kuszmaul demonstrated in a January 2025 paper that this new hash table can indeed find elements faster than was considered possible. In so doing, they disproved a conjecture long held to be true.
“It’s an important paper,” said Alex Conway of Cornell Tech in New York City. “Hash tables are among the oldest data structures we have. And they’re still one of the most efficient ways to store data.” Yet open questions remain about how they work, he said. “This paper answers a couple of them in surprising ways.”
Hash tables have become ubiquitous in computing, partly because of their simplicity and ease of use. They’re designed to allow users to do exactly three things: “query” (search for) an element, delete an element, or insert one into an empty slot. The first hash tables date back to the early 1950s, and computer scientists have studied and used them ever since. Among other things, researchers wanted to figure out the speed limits for some of these operations. How fast, for example, could a new search or insertion possibly be?
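To make those three operations concrete, here is a minimal Python sketch of a toy fixed-size table that uses open addressing, assuming keys are ordinary hashable Python objects. The class name `OpenAddressingTable` and its methods are hypothetical, chosen only for illustration. The per-key shuffled probe order is also an assumption: it mimics random probing, whereas real implementations use far cheaper probe sequences.

```python
import random
from typing import Optional

_EMPTY = object()    # marker for a slot that has never held a key
_DELETED = object()  # "tombstone" left behind when a key is deleted


class OpenAddressingTable:
    """Toy fixed-capacity hash table supporting query, insert and delete."""

    def __init__(self, capacity: int) -> None:
        self.slots: list = [_EMPTY] * capacity

    def _probe_sequence(self, key: object) -> list:
        # Every slot index exactly once, in a pseudo-random order derived
        # from the key's hash, so the same key always probes the same way.
        order = list(range(len(self.slots)))
        random.Random(hash(key)).shuffle(order)
        return order

    def query(self, key: object) -> Optional[int]:
        """Return the index of the slot holding key, or None if absent."""
        for i in self._probe_sequence(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                return None  # an insert of key would have stopped here
            if slot is not _DELETED and slot == key:
                return i
        return None

    def insert(self, key: object) -> int:
        """Place key in the first free slot along its probe sequence."""
        existing = self.query(key)
        if existing is not None:
            return existing  # already stored; nothing to do
        for i in self._probe_sequence(key):
            if self.slots[i] is _EMPTY or self.slots[i] is _DELETED:
                self.slots[i] = key
                return i
        raise RuntimeError("table is full")

    def delete(self, key: object) -> bool:
        """Remove key; return True if it was present."""
        i = self.query(key)
        if i is None:
            return False
        self.slots[i] = _DELETED  # tombstone keeps later lookups from stopping early
        return True


table = OpenAddressingTable(8)
table.insert("apple")
table.insert("pear")
print(table.query("apple") is not None)  # True
table.delete("apple")
print(table.query("apple"))  # None
```

Deletion leaves a tombstone rather than an empty slot because a later query for some other key may need to walk past that position to reach its own slot.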
The answer generally depends on the amount of time it takes to find an empty spot in a hash table. This, in turn, typically depends on how full the hash table is. Fullness can be described in terms of an overall percentage — this table is 50% full, that one’s 90% — but researchers often deal with much fuller tables. So instead, they may use a whole number, denoted by x, to specify how close the hash table is to 100% full. If x is 100, then the table is 99% full. If x is 1,000, the table is 99.9% full. This measure of fullness offers a convenient way to evaluate how long it should take to perform actions like queries or insertions.
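The examples above imply a simple relationship: a table that is x away from full still has a fraction 1/x of its slots empty, so its fullness is 1 - 1/x, and conversely x = 1/(1 - fullness). The short Python sketch below performs that conversion. The function names are hypothetical, and exact fractions are used only to sidestep floating-point rounding.

```python
from fractions import Fraction


def fullness_from_x(x: int) -> Fraction:
    """Fraction of slots occupied when 1/x of the table is still empty."""
    return 1 - Fraction(1, x)


def x_from_fullness(fullness: Fraction) -> Fraction:
    """Inverse conversion: x = 1 / (1 - fullness)."""
    return 1 / (1 - fullness)


for x in (2, 10, 100, 1000):
    print(f"x = {x:>4}: {float(fullness_from_x(x)):.1%} full")
# x =    2: 50.0% full
# x =   10: 90.0% full
# x =  100: 99.0% full
# x = 1000: 99.9% full
```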
Researchers have long known that for certain common hash tables, the expected time required to make the worst possible insertion — putting an item into, say, the last remaining open spot — is proportional to x. “If your hash table is 99% full,” Kuszmaul said, “it makes sense that you would have to look at around 100 different positions to find a free slot.”
In a 1985 paper, the computer scientist Andrew Yao, who would go on to win the A.M. Turing Award, asserted that among hash tables with a specific set of properties, the best way to find an individual element or an empty spot is to just go through potential spots randomly — an approach known as uniform probing. He also stated that, in the worst-case scenario, where you’re searching for the last remaining open spot, you can never do better than x. For 40 years, most computer scientists assumed that Yao’s conjecture was true.
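A rough simulation shows why x is the natural cost under uniform probing. The Python sketch below, with hypothetical function names and an arbitrarily chosen table size and trial count, fills a table until only a 1/x fraction of its slots is empty, then counts how many distinct, randomly chosen slots an insertion examines before it finds an empty one. For a random probe order over n slots of which m are empty, the expected count is (n + 1)/(m + 1), which is close to x when m = n/x.

```python
import random


def probes_to_find_empty(occupied: list, rng: random.Random) -> int:
    """Uniform probing: examine distinct slots in random order until one is empty."""
    n = len(occupied)
    seen = set()
    while True:
        i = rng.randrange(n)
        if i in seen:
            continue  # already examined; draw again so no slot is probed twice
        seen.add(i)
        if not occupied[i]:
            return len(seen)  # number of distinct slots examined


def average_probes(x: int, n: int = 100_000, trials: int = 1_000, seed: int = 0) -> float:
    """Average probes for an insertion into a table that is (1 - 1/x) full.

    The table is never modified, so every trial sees the same fullness.
    """
    empty = n // x
    if empty == 0:
        raise ValueError("table must have at least one empty slot")
    rng = random.Random(seed)
    occupied = [True] * (n - empty) + [False] * empty
    rng.shuffle(occupied)  # scatter the empty slots at random positions
    total = sum(probes_to_find_empty(occupied, rng) for _ in range(trials))
    return total / trials


for x in (10, 100, 1000):
    print(f"x = {x:>4}: about {average_probes(x):.1f} probes")
```

The printed averages should land roughly at 10, 100 and 1,000, with the exact figures depending on the random seed: each tenfold increase in x costs about ten times as many probes.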
Krapivin was not held back by the conventional wisdom for the simple reason that he was unaware of it. “I did this without knowing about Yao’s conjecture,” he said. His explorations with tiny pointers led to a new kind of hash table — one that did not rely on uniform probing. And for this new hash table, the time required for worst-case queries and insertions is proportional to (log x)² — far faster than x. This result directly contradicted Yao’s conjecture. Farach-Colton and Kuszmaul helped Krapivin show that (log x)² is the optimal, unbeatable bound for the popular class of hash tables Yao had written about.
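To get a feel for the size of the gap, the sketch below tabulates x against (log x)² for a few values of x. Base-2 logarithms are an assumption made for illustration, and both quantities describe growth rates up to constant factors, so the numbers show how the two bounds scale rather than actual probe counts. For very small x, such as 8, (log₂ x)² is actually the larger of the two; the advantage appears as tables get fuller.

```python
import math


def log_squared(x: float) -> float:
    """(log x)^2 using a base-2 logarithm; constant factors are ignored."""
    return math.log2(x) ** 2


for x in (10**2, 10**3, 10**6, 10**9):
    ratio = x / log_squared(x)  # how many times larger x is than (log x)^2
    print(f"x = {x:>13,}   (log2 x)^2 = {log_squared(x):7.1f}   ratio = {ratio:12,.1f}")
```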
“This result is beautiful in that it addresses and solves such a classic problem,” said Guy Blelloch of Carnegie Mellon.
“It’s not just that they disproved [Yao’s conjecture], they also found the best possible answer to his question,” said Sepehr Assadi of the University of Waterloo. “We could have gone another 40 years before we knew the right answer.”