Thanks for building this tool; I found it very interesting and useful.
I have a question about how the underlying calculation works.
In particular, I see in intel/*.c that memory accesses are profiled using the
PMU counters OFFCORE_RESPONSE_0 and OFFCORE_RESPONSE_1 for memory nodes 0
and 1. With that, I can calculate the number of local and remote accesses
of a process if it always runs on the same CPU socket.
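To make that concrete, here is a minimal sketch of the calculation I have in mind for the pinned case. The struct and function names are hypothetical and not taken from the tool, and the mapping of OFFCORE_RESPONSE_0 to node 0 and OFFCORE_RESPONSE_1 to node 1 is my reading of the code, so it may be wrong.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two per-node counts; not a type from the tool. */
struct node_counts {
    uint64_t node0; /* accesses served by memory node 0 (assumed: OFFCORE_RESPONSE_0) */
    uint64_t node1; /* accesses served by memory node 1 (assumed: OFFCORE_RESPONSE_1) */
};

/* Result of splitting the counts into local and remote accesses. */
struct locality {
    uint64_t local;
    uint64_t remote;
};

/*
 * Classify the counts as local or remote, assuming the process stayed on
 * the socket attached to home_node for the whole measurement interval.
 */
static struct locality classify(struct node_counts c, int home_node)
{
    assert(home_node == 0 || home_node == 1);

    struct locality r;
    r.local  = (home_node == 0) ? c.node0 : c.node1;
    r.remote = (home_node == 0) ? c.node1 : c.node0;
    return r;
}
```

This only holds when home_node is fixed for the whole interval, which is exactly the assumption my question is about.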
However, how does this work if the process is not pinned to a particular
CPU socket, or if it is a multi-threaded program whose threads span both
sockets?
Thank you, and I look forward to hearing from you!