On Wed, May 09, 2018 at 03:41:44PM +0000, Stephen Bates wrote:
> Interesting point, give me a moment to check that. That finally makes
> all the hardware I have standing around here valuable :)
Yes. At the very least it provides an initial standards-based path
for P2P DMAs across RPs, which is something we have discussed on this
list in the past as being desirable.
BTW I am trying to understand how an ATS-capable EP function determines
when to perform an ATS Translation Request (ATS TR). Is there an
upstream example of the driver for your APU that uses ATS? If so, can
you provide a pointer to it? Do you provide some type of entry in the
submission queues for commands going to the APU to indicate if the
address associated with a specific command should be translated using
ATS or not? Or do you simply enable ATS and then all addresses passed
to your APU that miss the local cache result in an ATS TR?
On GPUs, ATS is always tied to a PASID. You do not do the former without
the latter (AFAICT this is not doable, except maybe through some JTAG,
but not in normal operation).
GPUs are like CPUs: you have GPU threads that run against an address
space. This address space uses a page table (very much like the CPU page
table). Inside that page table you can point a GPU virtual address
at GPU memory or at system memory. Those system memory entries can
also be marked as ATS against a given PASID.
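
To make the page table case concrete, here is a toy model of the three
kinds of entries described above. It is only a sketch, not how any real
GPU or driver lays this out: the names (Backing, Pte, resolve), the
4 KiB page size, the single-level table, and the choice to pass the GPU
virtual address unchanged as the untranslated address in the ATS request
are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple

PAGE_SHIFT = 12  # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class Backing(Enum):
    """Hypothetical entry kinds, following the description above."""
    GPU_MEMORY = "gpu_memory"        # entry points at device-local memory
    SYSTEM_MEMORY = "system_memory"  # entry points at system memory directly
    SYSTEM_ATS = "ats_request"       # entry is marked ATS against a PASID


@dataclass(frozen=True)
class Pte:
    backing: Backing
    frame: int = 0               # page frame, used by the two direct kinds
    pasid: Optional[int] = None  # only meaningful for SYSTEM_ATS


def resolve(page_table: Dict[int, Pte], gpu_va: int) -> Tuple[str, Optional[int], int]:
    """Return (action, pasid, address) for one GPU virtual address."""
    pte = page_table.get(gpu_va >> PAGE_SHIFT)
    if pte is None:
        return ("fault", None, gpu_va)
    if pte.backing is Backing.SYSTEM_ATS:
        # The address is not in the entry; the device would ask the IOMMU,
        # and the request always carries the PASID.
        return (pte.backing.value, pte.pasid, gpu_va)
    # Direct mapping: combine the frame with the offset inside the page.
    return (pte.backing.value, None, (pte.frame << PAGE_SHIFT) | (gpu_va & PAGE_MASK))
```

In this model a command does not need to say whether its address uses
ATS; the page table entry that covers the address decides it.
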
On some GPUs you define a window of GPU virtual addresses that goes
through PASID & ATS (so accesses in that window do not go through the
page table but directly through PASID & ATS).
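
The window variant can be sketched the same way. Again this is only an
illustration under assumptions: route, window_base and window_size are
hypothetical names, and real hardware would program the window through
registers rather than function arguments.

```python
from typing import Optional, Tuple


def route(gpu_va: int, window_base: int, window_size: int,
          pasid: int) -> Tuple[str, Optional[int], int]:
    """Decide how one GPU virtual address is handled.

    Addresses inside [window_base, window_base + window_size) bypass the
    GPU page table and go straight to PASID & ATS; everything else takes
    the normal page table walk.
    """
    if window_base <= gpu_va < window_base + window_size:
        return ("ats_request", pasid, gpu_va)
    return ("page_table_walk", None, gpu_va)
```

Here the decision depends only on the address range, so no per-page
marking is needed for addresses inside the window.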