You’re right, the basics are quite easy. Having just written a virtual grid (with horizontal virtualisation too), here’s a tip: position items with transform: translate(x, y) so the browser can composite them on the GPU rather than re-running layout. It makes a huge difference when animating, e.g. on sort and filter, and probably helps when unanimated too.
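To make the transform tip concrete, here is a minimal sketch of how a cell's style could be built. The helper names (itemTransform, cellStyle) are made up for illustration and are not taken from any library; the idea is just that every cell sits at top/left 0 and only its transform changes as it moves.

```typescript
// Hypothetical helper: builds the transform for an item at pixel position (x, y).
function itemTransform(x: number, y: number): string {
  return `translate(${x}px, ${y}px)`;
}

// Hypothetical helper: inline style for one absolutely positioned cell.
// The cell is anchored at the container's origin and moved only via transform,
// so repositioning it does not change any layout-affecting property.
function cellStyle(x: number, y: number, width: number, height: number) {
  return {
    position: "absolute",
    top: "0",
    left: "0",
    width: `${width}px`,
    height: `${height}px`,
    transform: itemTransform(x, y),
  };
}
```

In a browser you would assign these values to element.style (or pass the object as a style prop); adding a CSS transition on transform is one way to get the sort/filter animation.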
As you said, things become significantly harder with variable-sized items. As you guessed, the approach that react-window and react-virtualized use is to lazily cache each item's size and cumulative offset. I measure up to the largest offset required, then do an exponential search backwards from there, followed by an inner binary search to find the other side of the visible range. If everything up to that offset is already measured, I just binary search for it. (They do pretty much the same.)
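Roughly, the lazy cache plus the two searches could look like the sketch below. This is my own simplified version under assumed names (OffsetCache, measure, rangeFor), not the actual react-window or react-virtualized code, and it only handles one axis.

```typescript
// Hypothetical callback: returns the measured size of the item at `index`.
type MeasureFn = (index: number) => number;

class OffsetCache {
  // offsets[i] is the start offset of item i; offsets[i + 1] is its end.
  private offsets: number[] = [0];
  private count: number;
  private measure: MeasureFn;

  constructor(count: number, measure: MeasureFn) {
    this.count = count;
    this.measure = measure;
  }

  // Number of items measured so far.
  private get measured(): number {
    return this.offsets.length - 1;
  }

  // Lazily measure items until the cache extends past `offset` or items run out.
  private measureUntil(offset: number): void {
    while (this.measured < this.count && this.offsets[this.measured] <= offset) {
      const index = this.measured;
      this.offsets.push(this.offsets[index] + this.measure(index));
    }
  }

  // Last index in [lo, hi] whose start offset is <= offset.
  private binarySearch(lo: number, hi: number, offset: number): number {
    while (lo < hi) {
      const mid = (lo + hi + 1) >> 1;
      if (this.offsets[mid] <= offset) lo = mid;
      else hi = mid - 1;
    }
    return lo;
  }

  // Index of the item containing `offset` (clamped to the last item).
  indexAt(offset: number): number {
    this.measureUntil(offset);
    const last = Math.max(0, this.measured - 1);
    return this.binarySearch(0, last, Math.max(0, offset));
  }

  // Visible range [start, end]: find `end` first, then gallop backwards
  // from it with doubling steps and finish with a binary search.
  rangeFor(startOffset: number, endOffset: number): [number, number] {
    const end = this.indexAt(endOffset);
    const target = Math.max(0, startOffset);
    let lo = end;
    let hi = end;
    let step = 1;
    while (lo > 0 && this.offsets[lo] > target) {
      hi = lo;
      lo = Math.max(0, lo - step);
      step *= 2;
    }
    return [this.binarySearch(lo, hi, target), end];
  }
}
```

The backwards gallop is worth it because the start of the visible range is usually only a handful of items before the end, so it tends to bracket the answer in a few probes instead of searching the whole measured prefix.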
I have also implemented pinned left/right columns and pinned top/bottom rows, plus cells that span multiple columns and rows. That adds more complexity. At first I pinned items by offsetting them with the scrollLeft and scrollTop values to give the illusion of being static. This works perfectly on desktop but looks messed up on mobile: scroll acceleration isn't accurately reflected in the scroll event, so items jitter or move in a wave pattern. So I ended up having to render into 9 possible containers and use position: sticky for the pinned ones.
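As a rough illustration of the 9-container split, the sketch below maps a cell to the container it would render into. The names (PinConfig, containerFor) and the container keys are invented for this example; the only point is that each axis has three bands, and every container except the middle one would get position: sticky.

```typescript
// Hypothetical configuration for a grid with pinned rows and columns.
interface PinConfig {
  rowCount: number;
  colCount: number;
  pinnedTop: number;
  pinnedBottom: number;
  pinnedLeft: number;
  pinnedRight: number;
}

// Which of the three bands along one axis an index falls into (0, 1 or 2).
function band(index: number, count: number, pinnedStart: number, pinnedEnd: number): number {
  if (index < pinnedStart) return 0;
  if (index >= count - pinnedEnd) return 2;
  return 1;
}

const ROW_BANDS = ["top", "middle", "bottom"];
const COL_BANDS = ["left", "center", "right"];

// Key of the container a cell belongs to, e.g. "top-left" or "middle-center".
function containerFor(row: number, col: number, cfg: PinConfig): string {
  const r = band(row, cfg.rowCount, cfg.pinnedTop, cfg.pinnedBottom);
  const c = band(col, cfg.colCount, cfg.pinnedLeft, cfg.pinnedRight);
  return `${ROW_BANDS[r]}-${COL_BANDS[c]}`;
}

// Only the unpinned "middle-center" container scrolls normally.
function isSticky(containerKey: string): boolean {
  return containerKey !== "middle-center";
}
```

Containers that end up with no cells (e.g. no pinned bottom rows) can simply be skipped, which is why it is 9 "possible" containers rather than always 9.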
You should also support RTL scroll behaviour, which is handled differently in different browsers. See react-window for a technique to normalise the scrollLeft value.
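For reference, the normalisation boils down to something like the sketch below. The three behaviour names follow what react-window uses as far as I recall, but treat the code as my paraphrase rather than the library's source; detecting which behaviour the current browser has (react-window does this by probing a temporary RTL element) is left out.

```typescript
// The three ways browsers have reported scrollLeft in an RTL container:
// - "negative": 0 at the start (right) edge, going negative as you scroll left
// - "positive-ascending": 0 at the start edge, growing as you scroll left
// - "positive-descending": maximum at the start edge, shrinking towards 0
type RTLOffsetType = "negative" | "positive-ascending" | "positive-descending";

// Hypothetical helper: converts a raw RTL scrollLeft into a non-negative
// distance scrolled away from the start (right) edge.
function normaliseRTLScrollLeft(
  scrollLeft: number,
  scrollWidth: number,
  clientWidth: number,
  type: RTLOffsetType
): number {
  switch (type) {
    case "negative":
      return -scrollLeft;
    case "positive-descending":
      return scrollWidth - clientWidth - scrollLeft;
    case "positive-ascending":
      return scrollLeft;
  }
}
```

Once normalised, the rest of the virtualisation code can treat the value the same way as an LTR offset, just measured from the right-hand side.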
So adding all that together, it becomes a less than trivial task!