I'm actually using 3 data structures to perform add, cancel, and match efficiently (rough Rust sketches below):
orders_by_id: map<u64, order>
can also be split by side (separate bid and ask books)
levels: BST keyed by price
e.g. a red-black tree
each level holding references to its orders in a FIFO deque (growable ring buffer), plus the total order size for the level
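Roughly what those structures look like in Rust (a sketch, not production code: BTreeMap stands in for the red-black tree, the Order/Level/BookSide names are just illustrative, and prices are already integer ticks per the note at the end):

```rust
use std::collections::{BTreeMap, HashMap, VecDeque};

type OrderId = u64;
type Price = u64; // price in ticks (see the note on integers at the end)
type Qty = u64;

struct Order {
    id: OrderId,
    price: Price,
    qty: Qty,
    cancelled: bool, // tombstone flag set by cancel
}

// One price level: FIFO queue of resting order IDs + total size at the level.
struct Level {
    orders: VecDeque<OrderId>, // VecDeque = growable ring buffer
    total_qty: Qty,
}

// One side of the book (bids or asks).
struct BookSide {
    orders_by_id: HashMap<OrderId, Order>, // O(1) lookup for cancel
    levels: BTreeMap<Price, Level>,        // ordered by price
}
```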
cancel operation:
1. find order by ID in map, mark as cancelled
That's it. Searching for the order in levels would be far too inefficient; since the level's queue holds a reference to the order, it's already marked as cancelled and can simply be popped from the queue when it's encountered during matching.
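A minimal sketch of that tombstone-style cancel (function and field names are mine):

```rust
use std::collections::HashMap;

struct Order {
    cancelled: bool, // plus price, qty, etc.
}

// O(1) cancel: just flag the order; the price levels are never touched here.
// The flagged order gets popped lazily when matching reaches it.
fn cancel(orders_by_id: &mut HashMap<u64, Order>, id: u64) -> bool {
    match orders_by_id.get_mut(&id) {
        Some(order) => {
            order.cancelled = true;
            true
        }
        None => false, // unknown ID or already removed
    }
}
```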
add:
if the incoming order is not fully filled by matching, then:
1. Add to orders_by_id
2. Find the price level in levels (create it if it doesn't exist) and append the order to the deque
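Sketch of resting the remainder on the book (BTreeMap's entry API handles the "create level if missing" part; names are illustrative):

```rust
use std::collections::{BTreeMap, HashMap, VecDeque};

struct Order { id: u64, price: u64, qty: u64 }
struct Level { orders: VecDeque<u64>, total_qty: u64 }

// Rest the unfilled remainder of an incoming order on the book.
fn add(
    orders_by_id: &mut HashMap<u64, Order>,
    levels: &mut BTreeMap<u64, Level>,
    order: Order,
) {
    // Find the price level, creating an empty one if it doesn't exist yet.
    let level = levels.entry(order.price).or_insert_with(|| Level {
        orders: VecDeque::new(),
        total_qty: 0,
    });
    level.orders.push_back(order.id); // FIFO: new orders go to the back
    level.total_qty += order.qty;
    orders_by_id.insert(order.id, order); // index by ID for O(1) cancels
}
```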
match:
1. Find the best price level and iterate through its deque as needed. Pop off matched orders and remove them from orders_by_id
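Sketch of the ask-side match for an incoming buy, including the lazy cleanup of cancelled orders (match_buy is my name; total_qty bookkeeping for cancelled orders is omitted for brevity):

```rust
use std::collections::{BTreeMap, HashMap, VecDeque};

struct Order { qty: u64, cancelled: bool }
struct Level { orders: VecDeque<u64>, total_qty: u64 }

// Match an incoming buy against resting asks; returns the unfilled remainder.
fn match_buy(
    orders_by_id: &mut HashMap<u64, Order>,
    asks: &mut BTreeMap<u64, Level>,
    limit_price: u64,
    mut qty: u64,
) -> u64 {
    while qty > 0 {
        // Best ask = lowest key in the ask-side map.
        let (&price, level) = match asks.iter_mut().next() {
            Some(best) if *best.0 <= limit_price => best,
            _ => break, // book is empty or doesn't cross the limit
        };
        while qty > 0 {
            let id = match level.orders.front() {
                Some(&id) => id,
                None => break, // level exhausted
            };
            let resting = orders_by_id.get_mut(&id).expect("order in level but not in map");
            if resting.cancelled {
                // Lazy cleanup: drop tombstoned orders as we reach them.
                level.orders.pop_front();
                orders_by_id.remove(&id);
                continue;
            }
            let traded = qty.min(resting.qty);
            qty -= traded;
            resting.qty -= traded;
            level.total_qty -= traded;
            if resting.qty == 0 {
                // Fully matched: pop from the queue and remove from orders_by_id.
                level.orders.pop_front();
                orders_by_id.remove(&id);
            }
        }
        if level.orders.is_empty() {
            asks.remove(&price); // drop the now-empty price level
        }
    }
    qty // remainder to rest on the book via add() if non-zero
}
```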
An alternative to a BST for the price levels could be a pre-allocated contiguous array, if you accept some limits like a 0.1 min tick size and a 1 million max price = 10 million price levels to allocate. This allows instant lookup and doesn't take up much RAM, though when matching you would have to iterate over empty price levels (probably very fast, since orders cluster close to the best bid/ask).
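A rough sketch of that flat-array variant, assuming prices are already integer ticks so the tick price is the index (FlatBook and the cursor field are illustrative):

```rust
// Assumed limits from above: 0.1 tick, 1,000,000 max price => 10,000,000 levels.
const MAX_LEVELS: usize = 10_000_000;

#[derive(Clone, Default)]
struct Level {
    total_qty: u64, // plus the FIFO queue of orders, omitted here
}

struct FlatBook {
    levels: Vec<Level>, // index == price in ticks; ~80 MB for this toy Level
    best_ask: usize,    // cursor kept near the touch so scans start close to it
}

impl FlatBook {
    fn new() -> Self {
        FlatBook {
            levels: vec![Level::default(); MAX_LEVELS],
            best_ask: MAX_LEVELS, // "no asks yet"
        }
    }

    // O(1) lookup: the tick price is the array index, no tree walk needed.
    fn level_mut(&mut self, price_ticks: usize) -> &mut Level {
        &mut self.levels[price_ticks]
    }

    // Matching scans upward, skipping the empty levels mentioned above.
    fn next_nonempty_ask(&self, from: usize) -> Option<usize> {
        (from..MAX_LEVELS).find(|&i| self.levels[i].total_qty > 0)
    }
}
```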
Another note: it's best to scale your doubles/floats up to unsigned ints, as you don't want any loss of precision doing arithmetic in the order book.
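For example, with a 0.1 tick you'd convert once at the boundary and keep everything inside the book as integer ticks (names here are just for illustration):

```rust
const TICKS_PER_UNIT: f64 = 10.0; // 1 / 0.1 tick

// Convert an external decimal price to integer ticks once, at the edge.
fn price_to_ticks(price: f64) -> u64 {
    (price * TICKS_PER_UNIT).round() as u64
}

// Convert back only when formatting output; all book arithmetic stays integer.
fn ticks_to_price(ticks: u64) -> f64 {
    ticks as f64 / TICKS_PER_UNIT
}

fn main() {
    assert_eq!(price_to_ticks(123.4), 1234);
    assert_eq!(ticks_to_price(1234), 123.4);
    // 1234 - 1233 is exact; 123.4 - 123.3 in floating point is not.
}
```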