I've been working on a telemetry tool for analyzing applications. Specifically, this is for game engines, but realistically it could be applied to any software. One of the plugins I'm working on analyzes memory usage patterns in real time. It does so by getting basic heap and memory allocation information from the client and rebuilding the data on the server (the telemetry tool).
To store the pointer information, I initially used a non-recursive AVL tree as a test. AVL trees guarantee O(log n) operations because they're always balanced, so inserting, removing, and finding the data will be fast, right?
After running the application with a heavy client (the game) memory allocation count, I found that the AVL tree's performance was pretty bad. In fact, it was unusable. As the count of allocations increased, the time taken to rebalance the tree grew and grew. In short, the telemetry tool simply couldn't keep up with the game.
So I replaced the AVL tree with a simple hash map. It's a non-optimal implementation: it has a statically set bucket count, it's not lock-free, and it's a simple first-pass algorithm I'd quickly implemented for something else. It's definitely not built for speed or efficiency.
But it was faster. Not only was it faster, it blew away the AVL implementation. My CPU usage went from 100% to 27%, and the telemetry tool was back to waiting for data to arrive rather than struggling to process it. I could see a massive difference just by watching the application, without even running timing tests.
The reason the hash map, even in a non-optimal implementation, is faster comes down to the way memory allocations typically occur. A standard memory allocator creates a large block of memory and then hands out pieces of it. This typically happens linearly, handing out allocations from the start of the block to the end. AVL trees perform best when random key values are inserted into them, not organized, linear values. When the AVL tree inserts data keyed by these steadily increasing addresses, it rebalances very often, shuffling entire branches of the tree around.
The hash map, on the other hand, doesn't care whether allocations arrive in order, out of order, or randomly. It masks out portions of the key, jumps directly to the correct insertion list, and pushes the data into the list. Searches and removals are then linear scans through the keyed lists.
So the moral of the story is: don't assume that because your algorithm is efficient, your use of it is too. Know your data; applying the right algorithm depends on what your data is doing, not how awesome your O notation is.