Zhiyuan and Intel folks,
This is a follow-up on the va-api h264 hardware accelerated encoding
performance gap with the igvt-g kernel in the host OS.
We ran the same perf test with the 2016Q2 release and the performance
degradation is about the same.
kernel: built from igvt-g 2016Q2 release
CPU: skylake i7-6700K
motherboard: asus z170-e
va-api stack: libva 1.7.2 with libva-intel-driver-1.7.2 stable release
perf test tool: h264encode in libva-1.7.2/test/encode
perf test command line: sudo ./h264encode --intra_period 30
--idr_period 90 --ip_period 1 --entropy 1 --rcmode CQP --initialqp 28
--minqp 22 --profile MP -srcyuv /run/shm/test.yuv -f 25 -w 1920 -h
1080 -n 0 --syncmode
Note: test.yuv is copied to ramdisk (/run/shm/test.yuv) to rule out
disk read ops.
Performance summary with vgt turned on:
PERFORMANCE: Frame Rate : 44.46 fps (300 frames, 6747 ms (22.49 ms per frame))
PERFORMANCE: Compression ratio : 38821:1
PERFORMANCE: UploadPicture : 3967 ms (13.22, 58.80% percent)
PERFORMANCE: vaBeginPicture : 0 ms (0.00, 0.00% percent)
PERFORMANCE: vaRenderHeader : 2 ms (0.01, 0.03% percent)
PERFORMANCE: vaEndPicture : 1913 ms (6.38, 28.35% percent)
PERFORMANCE: vaSyncSurface : 850 ms (2.83, 12.60% percent)
PERFORMANCE: SavePicture : 3 ms (0.01, 0.04% percent)
PERFORMANCE: Others : 12 ms (0.04, 0.18% percent)
Performance summary with vgt turned off:
PERFORMANCE: Frame Rate : 183.82 fps (300 frames, 1632 ms (5.44 ms per frame))
PERFORMANCE: Compression ratio : 38821:1
PERFORMANCE: UploadPicture : 557 ms (1.86, 34.13% percent)
PERFORMANCE: vaBeginPicture : 0 ms (0.00, 0.00% percent)
PERFORMANCE: vaRenderHeader : 0 ms (0.00, 0.00% percent)
PERFORMANCE: vaEndPicture : 715 ms (2.38, 43.81% percent)
PERFORMANCE: vaSyncSurface : 346 ms (1.15, 21.20% percent)
PERFORMANCE: SavePicture : 4 ms (0.01, 0.25% percent)
PERFORMANCE: Others : 10 ms (0.03, 0.61% percent)
With vgt on, encoding is about 4 times slower (44.46 fps vs. 183.82 fps).
As shown by the performance breakdown, the slowdown comes mainly from
UploadPicture, vaEndPicture and vaSyncSurface.
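
For reference, the per-stage slowdown can be worked out from the two
summaries above. This is only a small arithmetic sketch; the stage totals
are copied from the logs, and the function names are made up for
illustration.

```python
# Stage totals in ms over 300 frames, copied from the two summaries above.
VGT_ON = {"UploadPicture": 3967, "vaEndPicture": 1913, "vaSyncSurface": 850}
VGT_OFF = {"UploadPicture": 557, "vaEndPicture": 715, "vaSyncSurface": 346}
TOTAL_ON_MS = 6747
TOTAL_OFF_MS = 1632


def slowdown(on_ms, off_ms):
    """Return how many times slower the vgt-on run is for one measurement."""
    return on_ms / off_ms


def stage_slowdowns(on, off):
    """Slowdown factor per stage, keyed by stage name."""
    return {stage: slowdown(on[stage], off[stage]) for stage in on}


def share_of_extra_time(on, off, total_on_ms, total_off_ms):
    """Fraction of the total added time that each stage accounts for."""
    extra_total = total_on_ms - total_off_ms
    return {stage: (on[stage] - off[stage]) / extra_total for stage in on}


if __name__ == "__main__":
    print(f"overall: {slowdown(TOTAL_ON_MS, TOTAL_OFF_MS):.2f}x slower")
    ratios = stage_slowdowns(VGT_ON, VGT_OFF)
    shares = share_of_extra_time(VGT_ON, VGT_OFF, TOTAL_ON_MS, TOTAL_OFF_MS)
    for stage in VGT_ON:
        print(f"{stage}: {ratios[stage]:.2f}x slower, "
              f"{shares[stage]:.1%} of the added time")
```

UploadPicture is both the stage with the largest ratio and the one that
accounts for most of the added time.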
One thing we noticed with vgt on is that GPU DVFS is rather
conservative: the GPU stays at the minimum frequency (350MHz) during the
perf test, while with vgt off the GPU boosts up to 1150MHz almost
immediately. After we force the GPU to run at max speed, vaEndPicture and
vaSyncSurface behave the same as without vgt. However, UploadPicture
(copying input data from CPU mem to GPU mem) remains just as slow and
becomes the key performance problem.
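
The message above does not say how the frequency was checked or forced.
One way to do it, assuming the igvt-g host kernel exposes the usual i915
sysfs frequency files (gt_min_freq_mhz, gt_cur_freq_mhz, gt_max_freq_mhz)
under /sys/class/drm/card0, is sketched below. The card index and the
helper names are assumptions, and writing the files needs root.

```python
from pathlib import Path

# Assumption: standard i915 sysfs location; the card index may differ.
DEFAULT_CARD = Path("/sys/class/drm/card0")
FREQ_FILES = ("gt_min_freq_mhz", "gt_cur_freq_mhz", "gt_max_freq_mhz")


def read_gpu_freqs(card_dir=DEFAULT_CARD):
    """Return the GPU frequency files that exist as {file name: MHz}."""
    freqs = {}
    for name in FREQ_FILES:
        path = Path(card_dir) / name
        if path.exists():
            freqs[name] = int(path.read_text().strip())
    return freqs


def pin_min_to_max(card_dir=DEFAULT_CARD):
    """Raise the minimum GPU frequency to the configured maximum.

    This keeps DVFS from dropping below the maximum; it needs root when
    run against the real sysfs files.
    """
    card = Path(card_dir)
    max_mhz = int((card / "gt_max_freq_mhz").read_text().strip())
    (card / "gt_min_freq_mhz").write_text(f"{max_mhz}\n")
    return max_mhz


if __name__ == "__main__":
    # Read-only by default; call pin_min_to_max() explicitly to force max.
    for name, mhz in read_gpu_freqs().items():
        print(f"{name}: {mhz} MHz")
```

Polling gt_cur_freq_mhz while h264encode runs should show whether the GPU
leaves its minimum frequency.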
Performance summary with vgt turned on and GPU forced to run at max (1150MHz):
PERFORMANCE: Frame Rate : 59.46 fps (300 frames, 5045 ms (16.82 ms per frame))
PERFORMANCE: Compression ratio : 38821:1
PERFORMANCE: UploadPicture : 3950 ms (13.17, 78.30% percent)
PERFORMANCE: vaBeginPicture : 0 ms (0.00, 0.00% percent)
PERFORMANCE: vaRenderHeader : 0 ms (0.00, 0.00% percent)
PERFORMANCE: vaEndPicture : 729 ms (2.43, 14.45% percent)
PERFORMANCE: vaSyncSurface : 353 ms (1.18, 7.00% percent)
PERFORMANCE: SavePicture : 1 ms (0.00, 0.02% percent)
PERFORMANCE: Others : 12 ms (0.04, 0.24% percent)
My questions based on the data are:
1. Does the GPU DVFS algorithm work as intended w/ vgt on? If so, can
we tune any parameter so DVFS works the same as when vgt is off?
2. Why does the data transfer from CPU mem to GPU mem become so much
slower (~7 times slower)? Is there a way to fix it?
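
To put question 2 in bandwidth terms, the UploadPicture times can be
converted to an effective copy rate. This is a rough sketch: it assumes
the input is 8-bit YUV 4:2:0 (12 bits per pixel) and treats the whole
UploadPicture time as copy time, neither of which is confirmed above.

```python
def frame_bytes(width, height, bits_per_pixel=12):
    """Size of one frame in bytes; 12 bpp assumes 8-bit YUV 4:2:0 input."""
    return width * height * bits_per_pixel // 8


def upload_mb_per_s(width, height, ms_per_frame):
    """Effective upload rate in MB/s (1 MB = 10**6 bytes)."""
    seconds = ms_per_frame / 1000.0
    return frame_bytes(width, height) / seconds / 1e6


if __name__ == "__main__":
    # Per-frame UploadPicture times (ms) taken from the summaries above.
    for label, ms in (("vgt on", 13.22), ("vgt off", 1.86)):
        rate = upload_mb_per_s(1920, 1080, ms)
        print(f"{label}: {rate:.0f} MB/s")
```

Under these assumptions the vgt-on rate is far below what a plain memory
copy should reach, so it may be worth checking how the mapped surface is
being accessed.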
Best,
Kristine
On Mon, Jun 20, 2016 at 7:44 PM, Zhiyuan Lv <zhiyuan.lv(a)intel.com> wrote:
Hi Kristine,
Sorry for the late reply! We had planned to reproduce your
performance data before getting back to you, but have not done so yet.
I will answer your questions inline below:
On Wed, Jun 08, 2016 at 10:18:40PM -0500, Kristine Ferrell wrote:
> Hi Zhiyuan,
>
> Thank you for providing the details. Will the next release include the
> optimization you mentioned?
This has been included in our previous release. In code, it is
controlled by an option "spt_out_of_sync" which can be found in vgt.c.
So it should not be the root cause of the performance gap. We will try
to run your case and see.
>
> Regarding MSDK, I'm quite curious as the latest MSDK release is based
> on an earlier version of the Linux kernel than igvt-g 2016Q1 and some work is
> needed to merge the MSDK kernel patches into the igvt-g kernel. So is the
> media performance test with MSDK done with an internal codebase or did
> I miss something?
The igvt-g release is mainly about the host kernel implementation of the
virtual GPU. When running MSDK, we do that inside a guest VM, and its kernel
is not necessarily the same as the host's. The only thing needed is to
backport our guest kernel patches to the MSDK kernel. Thanks!
Regards,
-Zhiyuan
>
> Best,
> Kristine
>
> On Wed, Jun 8, 2016 at 6:27 PM, Zhiyuan Lv <zhiyuan.lv(a)intel.com> wrote:
> > Hi Kristine,
> >
> > Internally we test media performance with MSDK, not the open source
> > media driver, and the MSDK data is close to native on Haswell. If
> > it is three times slower, we need to investigate the reason.
> >
> > According to our earlier experience, there are some factors impacting
> > the encoding performance:
> >
> > - The semaphore feature. Disabling it has a bigger performance impact
> > under virtualization than on native.
> > - The frequency of PPGTT page table modifications. We have an
> > optimization for that.
> > - The frequency of MMIO operations. If there are very frequent
> > small workload submissions, the performance will be impacted.
> >
> > Thanks!
> > -Zhiyuan
> >
> >
> >
> > On Tue, Jun 07, 2016 at 11:53:12AM -0500, Kristine Ferrell wrote:
> >> Hi Terrence,
> >>
> >> With vgt turned off (i915.vgt=0) the performance numbers are the same as
> >> with 4.2.0-27-generic #32~14.04.1-Ubuntu kernel. The numbers are shown
> >> below:
> >>
> >> 4.3.0-rc6-vgt-g3f44e6a with vgt turned on:
> >>
> >> PERFORMANCE: Frame Rate : 45.69 fps (360 frames, 7880 ms (21.89 ms per frame))
> >> PERFORMANCE: Compression ratio : 118:1
> >> PERFORMANCE: UploadPicture : 4278 ms (11.88, 54.29% percent)
> >> PERFORMANCE: vaBeginPicture : 0 ms (0.00, 0.00% percent)
> >> PERFORMANCE: vaRenderHeader : 4 ms (0.01, 0.05% percent)
> >> PERFORMANCE: vaEndPicture : 86 ms (0.24, 1.09% percent)
> >> PERFORMANCE: vaSyncSurface : 3487 ms (9.69, 44.25% percent)
> >> PERFORMANCE: SavePicture : 14 ms (0.04, 0.18% percent)
> >> PERFORMANCE: Others : 11 ms (0.03, 0.14% percent)
> >>
> >> 4.3.0-rc6-vgt-g3f44e6a with vgt turned off:
> >>
> >> PERFORMANCE: Frame Rate : 181.82 fps (360 frames, 1980 ms (5.50 ms per frame))
> >> PERFORMANCE: Compression ratio : 118:1
> >> PERFORMANCE: UploadPicture : 445 ms (1.24, 22.47% percent)
> >> PERFORMANCE: vaBeginPicture : 0 ms (0.00, 0.00% percent)
> >> PERFORMANCE: vaRenderHeader : 3 ms (0.01, 0.15% percent)
> >> PERFORMANCE: vaEndPicture : 38 ms (0.11, 1.92% percent)
> >> PERFORMANCE: vaSyncSurface : 1477 ms (4.10, 74.60% percent)
> >> PERFORMANCE: SavePicture : 6 ms (0.02, 0.30% percent)
> >> PERFORMANCE: Others : 11 ms (0.03, 0.56% percent)
> >>
> >> The test utility is libva-1.7.0/test/encode/h264encode.
> >>
> >> Best,
> >> Kristine
> >>
> >> On Mon, Jun 6, 2016 at 1:48 AM, Xu, Terrence <terrence.xu(a)intel.com> wrote:
> >> > Hi Ferrell,
> >> >
> >> >
> >> >
> >> > Can you also provide the 4.3 native data for comparison? (You can boot
> >> > the host igvt-g kernel with i915.vgt=0 set in grub.)
> >> >
> >> >
> >> >
> >> > Thanks
> >> >
> >> > Terrence
> >> >
> >> >
> >> >
> >> > From: iGVT-g [mailto:igvt-g-bounces@lists.01.org] On Behalf Of Kristine Ferrell
> >> > Sent: Monday, June 06, 2016 10:10 AM
> >> > To: igvt-g(a)lists.01.org
> >> > Subject: [iGVT-g] Fwd: va-api h264 hw accelerated encoding performance issue
> >> > with igvt-g 2016Q1 release kernel/driver
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > ---------- Forwarded message ----------
> >> > From: Kristine Ferrell <dallasbixin(a)gmail.com>
> >> > Date: Sunday, June 5, 2016
> >> > Subject: va-api h264 hw accelerated encoding performance issue with igvt-g
> >> > 2016Q1 release kernel/driver
> >> > To: igvt-g(a)lists.01.org
> >> >
> >> >
> >> > Hi Guys,
> >> >
> >> > I noticed that when using va-api to accelerate h264 encoding on an Intel
> >> > platform with i7-4790K (HD Graphics 4600), for 1080P video frames, the
> >> > average per-frame encoding time is 3 times that measured under the
> >> > 4.2.0-27-generic #32~14.04.1-Ubuntu kernel.
> >> >
> >> > The latest libva and Intel vaapi driver (1.7.0) are used in both tests.
> >> >
> >> > Is this a known side effect caused by the igvt-g feature? If yes, may
> >> > I know when this is expected to be fixed?
> >> >
> >> > Best,
> >> > Kristine
> >> >
> >> >
> >> >
> >> > --
> >> > Sent from Gmail Mobile
> >> _______________________________________________
> >> iGVT-g mailing list
> >> iGVT-g(a)lists.01.org
> >> https://lists.01.org/mailman/listinfo/igvt-g