Case Study: Optimizing Cyberlink PowerDVD to improve battery life on Intel devices

HIGHLIGHTS

This case study demonstrates how to improve battery life on Intel-based devices by optimizing applications with Intel tools.

Introduction

Short battery life is one of the most serious issues affecting mobile devices in general, and Ultrabook™ devices and tablets in particular. Users have become accustomed to streaming multimedia content to their mobile devices on demand from content servers in the cloud. Because these devices have limited battery capacity, energy efficiency is important. Cyberlink PowerDVD 10* (PowerDVD*) is one of the top players in the industry for HD and 3D movie playback, and OEMs often pre-bundle it on their devices. In this case study, we show how Intel and Cyberlink collaborated to optimize the PowerDVD* application to deliver a best-in-class experience on Intel devices.

First, we'll talk about the challenges that Cyberlink encountered when adding content streaming features to PowerDVD and the tools and techniques Intel used to improve the power consumption of PowerDVD.

Then, we'll discuss the power consumption profile of a Cyberlink PowerDVD streaming media application and its impact on battery life for mobile devices. We also provide an analysis of PowerDVD behavior to identify issues such as decoding on CPU, large numbers of context switches, high interrupt rates, etc., causing increased power consumption. Finally, we'll provide the data that shows the reduced power consumption following optimization.

The optimization was a huge success. The Intel team was able to make the following improvements to PowerDVD:

  • Package C0 reduced to 20% from 100% during media playback
  • SoC power reduced from ~6 W to ~1.8 W, as measured with Intel® Power Gadget
  • Intel® VTune™ analyzer reported CPU utilization of 25% down from 70%
  • The Windows* Performance Analyzer showed the wakeup interval lengthened from 5 msec to 10 msec for local or streaming media playback

Definitions

Acronym        Definition
BLA            Battery Life Analyzer
GPU            Graphics processing unit
WPA            Windows Performance Analyzer
DLNA Server    Digital Living Network Alliance Server
HD             High definition
SoC            System on Chip
FPS            Frames per second
SDK            Software development kit
SKU            Stock Keeping Unit

The Challenges of Optimizing Battery Life

PowerDVD offers new features for organizing and streaming media, mobile devices, and social media. In addition to functioning on a client, the latest software can turn a device into a DLNA server and stream multimedia content from a PC across a network to other devices. It can also stream content from external content servers. Adding content streaming came at a price, however. New capabilities, such as HD streaming, required running more processes, which consumed much more memory and many more CPU cycles. This took a toll on battery life. We needed to answer the following questions:

  1. What is the power consumption from PowerDVD during a 1080p streaming media playback?
  2. Why was PowerDVD able to play back only an hour of media on a fully charged battery?

After two months and three iterations of analysis and validation, the engineering teams improved battery life by making the following changes:

  • Offloaded graphics to the GPU (using the Intel® Media SDK)
  • Removed the sleep loop calls from two threads
  • Used an overlay to reduce extra memory copies

The following describes the process and tools that resulted in the optimized version of PowerDVD.

Optimization of Cyberlink PowerDVD for Power Consumption

Test System Configuration:

  • 4th generation Intel® Core™ i7 processor
  • Lenovo Yoga* 2 Pro
  • CPU speed: 1.4 GHz non-turbo frequency
  • Memory: 4 GB
  • Display: 1920x1080p HD panel
  • Cyberlink PowerDVD 10 and Cyberlink PowerDVD 12

Validation and analysis showed:

  • Package C0 was pegged 100% during media playback, while we expected it to be at 20%.
  • Intel Power Gadget showed SoC power to be ~6 W. It should be ~1.7 W on a 4th generation Intel processor.
  • Intel VTune results revealed no offloading of graphics to the GPU and high CPU utilization of 70% (we expected about 10%)
  • The Windows Performance Analyzer tests revealed frequent wakeups (5 msec). The normal frequency is 10 msec with audio playback.

First Step – Validation

To understand and address PowerDVD's impact on battery life, we used Intel Power Gadget and Battery Life Analyzer (BLA) to validate the application's SoC power usage. Figure 1 shows the Intel Power Gadget's UI on a Windows platform.

 


Figure 1. Intel® Power Gadget UI on Windows* Platform

As part of our validation of PowerDVD, we used Intel Power Gadget to determine power impacts during playback. Figure 2 shows the power output Intel Power Gadget recorded.

PowerDVD's power usage was ~6 W of SoC power during playback. Intel recommends a maximum of ~2.0 W on 4th generation Intel processors (low power processors typically used in Ultrabook devices).


Figure 2. Processor Power Usage during PowerDVD* Playback

To gain deeper insight into what other activities were affecting power, we used the Battery Life Analyzer (BLA) tool to understand the impact of media playback on residencies. Understanding residency is important as changing the SoC SKU can impact power.

BLA is a power management analysis tool developed by Intel to identify issues that impact battery life. BLA helps to identify a wide range of issues during software analysis such as:

  • Software CPU utilization
  • OS timer resolution changes
  • Frequent C state transitions
  • Excessive ISR/DPC activity

Figure 3 shows package residency during 1080p HD video playback using Cyberlink PowerDVD.


Figure 3. Package Residency during 1080p HD Video Playback using PowerDVD*

The package residency includes CPU, graphics, and uncore events. More time in package C0 results in higher SoC power. The expected package C0 residency for Cyberlink PowerDVD 1080p playback is ~20% on a 4th generation U-processor. As Figure 3 shows, package C0 residency is far higher than it should be.

Both Intel Power Gadget and BLA confirmed the high power usage. With ~6 W for the SoC, ~3 W for the display, and 2+ W for other components, a 42 Whr (watt-hour) battery lasts only ~4 hours.
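The ~4 hour figure follows from dividing battery capacity by total platform power. The short Python sketch below reproduces that arithmetic. The function name and dictionary keys are illustrative, and the "2+ W" for other components is taken as exactly 2 W, so the result is a rough upper estimate rather than a measurement.

```python
def battery_life_hours(capacity_wh, component_watts):
    """Estimate runtime as battery capacity divided by total platform power."""
    total_watts = sum(component_watts.values())
    if total_watts <= 0:
        raise ValueError("total power draw must be positive")
    return capacity_wh / total_watts


# Figures quoted in the text for the unoptimized player.
# "other" is assumed to be exactly 2 W; the text only says "2+ W".
baseline_watts = {"soc": 6.0, "display": 3.0, "other": 2.0}
baseline_hours = battery_life_hours(42.0, baseline_watts)
print(f"Estimated battery life: {baseline_hours:.1f} h")  # 42 / 11 is about 3.8 h
```

Because the SoC accounts for more than half of the total in this budget, it is the natural first target for optimization.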

Our next step was to analyze the application for power optimization.

Second Step – Analysis

For the analysis phase, we used two tools: the Intel VTune analyzer and the Windows Performance Analyzer (WPA).

The following tables summarize the results of the analysis, which showed definite room for improvement.

Table 1. Intel® Power Gadget and BLA Results

Actual Results                                        Expected Results
Package C0 is pegged at 100% during media playback    Package C0 should be at 20% during media playback
SoC power using Intel® Power Gadget is ~6 W           SoC power should be ~1.7 W on 4th generation Intel processor

 

Table 2. Intel® VTune™ and WPA Results

Analysis Tool                   Observations
Intel VTune results             1. Since the app had no codecs, there was no offloading to graphics
                                2. High CPU utilization (70% vs. the expected 10%)
Windows Performance Analyzer    Frequent wakeups (5 msec) occurred; expected frequency is 10 msec with audio playback

The next figures provide a walkthrough of some of the important screenshots from our analysis.

Intel VTune analyzer was used to validate the PowerDVD application for the presence of spin waits, the presence of hardware acceleration, and hotspots (a micro-architecture issue). Figure 4 shows the steps for collecting the graphics call stacks.


Figure 4. VTune™ UI for Analyzing DirectX* Pipeline Events

Figure 5 shows the VTune summary, with significant time spent in a spin loop. GPU Usage shows no codec usage. Most of the time spent in the GPU is for display and other pre-processing algorithms during playback.


Figure 5. VTune™ Summary showing Spin Loop time

Digging deeper into the analysis, Intel VTune shows high CPU utilization during media playback, along with instances where VSync (the red highlights in Figure 5) and GPU software queue events do not occur every ~33 msec, the frame interval for 30 FPS playback. These gaps indicate software glitches during media playback.


Figure 6. VTune™ Summary Report

The summary report in Figure 7 confirms an inconsistent frame rate over time. For a 30 FPS movie, the measured rate varies between 0 and 60 FPS. The chart shows the total number of frames the application executed at each frame rate. A high number of slow or fast frames signals a performance bottleneck. The goal is to optimize the code so that the frame rate stays constant within a target range, for example, 30 to 60 FPS.


Figure 7. VTune™ analysis of Frame Rates
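The slow/fast frame breakdown described above can be approximated from a list of frame presentation times. The following Python sketch is only an illustration of the idea, not how VTune computes its chart. The function names, the 10% tolerance band, and the sample timestamps are all assumptions, and timestamps are expected to be strictly increasing.

```python
def frame_rates(timestamps_ms):
    """Instantaneous FPS for each pair of consecutive frame timestamps."""
    return [1000.0 / (later - earlier)
            for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]


def classify_frames(timestamps_ms, target_fps=30.0, tolerance=0.10):
    """Count frames that are slow, on target, or fast relative to target_fps."""
    low = target_fps * (1.0 - tolerance)
    high = target_fps * (1.0 + tolerance)
    counts = {"slow": 0, "on_target": 0, "fast": 0}
    for fps in frame_rates(timestamps_ms):
        if fps < low:
            counts["slow"] += 1
        elif fps > high:
            counts["fast"] += 1
        else:
            counts["on_target"] += 1
    return counts


# Hypothetical presentation times in msec: steady ~33 msec frames with one
# late frame (67 msec gap) followed by one early frame (17 msec gap).
sample_timestamps_ms = [0, 33, 66, 133, 150, 183]
print(classify_frames(sample_timestamps_ms))
```

For smooth 30 FPS playback, nearly every frame should land in the on-target bucket, with gaps close to 1000 / 30, or about 33 msec.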

Next, we used the Windows Performance Analyzer (WPA) tool to analyze the application for wakeup activities, interrupts, and context switches. Figure 8 shows PowerDVD using CPU-based Intel® SSE instructions for H.264 decode. It is more efficient to offload this work onto the GPU than to run it on the CPU.


Figure 8. WPA Analysis of Wakeup Activities, Interrupts, and Context Switches

WPA also shows wakeup activities from PowerDVD during playback. Figure 9 displays the two PowerDVD threads, each waking every 10 msec. The two threads are not coalesced, which causes the overall system to wake up at a 5 msec timer interval. Figure 10 shows the call stack, with the Win32* sleep API being called in a loop every 10 msec.


Figure 9. WPA thread analysis


Figure 10. WPA call stack with sleep loop analysis
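To see why two 10 msec threads can wake the system every 5 msec, consider the combined timeline of their timers. The Python sketch below models this under one assumption that the text does not state explicitly: the second thread's timer is offset by 5 msec from the first. The function names are illustrative.

```python
def system_wakeups(timers, duration_ms):
    """Return the sorted distinct times (msec) at which any timer fires.

    timers is a list of (period_ms, offset_ms) pairs. Timers that fire at
    the same instant share a single wakeup.
    """
    fire_times = set()
    for period_ms, offset_ms in timers:
        fire_times.update(range(offset_ms, duration_ms, period_ms))
    return sorted(fire_times)


def shortest_gap(times):
    """Smallest interval between consecutive wakeups."""
    return min(later - earlier for earlier, later in zip(times, times[1:]))


# Two 10 msec timers, the second offset by 5 msec (assumed phase).
uncoalesced = system_wakeups([(10, 0), (10, 5)], duration_ms=100)
# The same timers aligned to a shared 10 msec tick.
coalesced = system_wakeups([(10, 0), (10, 0)], duration_ms=100)

print(len(uncoalesced), shortest_gap(uncoalesced))  # 20 wakeups, 5 msec apart
print(len(coalesced), shortest_gap(coalesced))      # 10 wakeups, 10 msec apart
```

Aligning the timers halves the number of wakeups in this model, which gives the package more uninterrupted time in deeper C-states.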

Table 3 reveals significant reduction in package residency after optimization.

Table 3. Validating Package Residency after Optimization

C-state Counters    Average (%) Before Optimization    Average (%) After Optimization
PackageC0-C1        100%                               20.18%
PackageC0-C2        0%                                 8.29%
PackageC0 C3        0%                                 0.19%
PackageC0 C6        0%                                 1.91%
PackageC0 C7        0%                                 69.43%

Optimization Results/Validation

The following tables show the “before” and “after” results:

Table 4. Intel® Power Gadget and BLA: Before and After¹

Before Optimization                                   After Optimization
Package C0 is pegged at 100% during media playback    Package C0 is reduced to 20%
SoC power is ~6 W                                     SoC power reduced to ~1.8 W on test system

Table 5. Intel® VTune™ Amplifier and WPA Results: Before and After¹

Analysis Tool                   Before                                                                After
Intel® VTune™ Amplifier         • Since the app had no codecs, there was no offloading to graphics    • Video codecs now reported
                                • High CPU utilization (70% vs. the expected 10%)                     • CPU utilization decreased by 25%
Windows Performance Analyzer    Frequent wakeups (5 msec); expected frequency is 10 msec              Sleep thread removed; wakeups reduced by 2x (5 msec to 10 msec)
                                with audio playback
Battery Life Analyzer           Package residency 100%                                                Package residency ~20%

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance

We optimized by:

  1. Offloading to Intel® HD Graphics using Intel Media SDK
  2. Optimizing Win32 API calls that cause periodic wakeup on CPU
  3. Using an overlay to save one memory copy per frame

The first task was to use the Intel Media SDK to offload decode to graphics, which provides better performance per watt on Intel HD Graphics. The pseudo code in Figure 11 provides a simple example of using the Intel Media SDK to offload a stream of frames to graphics.


Figure 11. Intel® Media SDK code snippet – offloading a frame to graphics.

Once we offloaded to graphics using the Intel Media SDK, we ran PowerDVD and measured the results using Intel VTune Amplifier. Compared to Figure 5 where we didn't see any codec usage, we now see Video Enhancement in the summary (Figure 12).


Figure 12. Intel® VTune™ Amplifier Summary result

Examining other Intel VTune graphics views, we verified that with the Intel Media SDK, frames were being decoded on the GPU rather than on the CPU. Figure 13 shows a batch of frames being decoded after ~20 msec on the GPU. Offloading the decode work to the GPU helped to reduce CPU utilization by ~25% on the test system.


Figure 13. Frame decoding after ~20 msec on the GPU

To verify our optimization of offloading graphics, we ran Intel Power Gadget. Compared to the baseline result shown in Figure 2, we saw ~2 W of power saving just by performing graphics offloading (Figure 14).


Figure 14. Power Savings resulting from Graphics Offload

We made some good progress, but ~4 W was not low enough. As stated earlier, the goal for streaming media 1080p playback is ~1.7 W of SoC/package power.

The next step was to find other CPU-based optimizations. Initial analysis showed sleep loop calls from two non-coalesced threads waking the CPU every 5 msec. Cyberlink engineers needed to remove the sleep threads from their application. This was one of the most difficult changes, however, since it required modifying the structure of the application. Figure 15 shows the wakeup interval increasing to 10 msec after the periodic activities were removed.


Figure 15. Optimized Cyberlink PowerDVD* after removing periodic activities
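The difference between a sleep loop and an event-driven wait can be illustrated in a few lines of Python. This is only an analogy: PowerDVD uses Win32 APIs, and the source does not show what replaced the sleep loops. The function names and the 200 msec delay are hypothetical, and the counters track loop iterations in Python rather than actual OS-level wakeups.

```python
import threading
import time


def wait_by_polling(event, poll_interval_s=0.010):
    """Sleep-loop pattern: wake on a fixed interval just to check a flag."""
    wakeups = 0
    while not event.is_set():
        time.sleep(poll_interval_s)
        wakeups += 1
    return wakeups


def wait_by_blocking(event):
    """Event-driven pattern: the thread sleeps until it is signaled."""
    wakeups = 0
    while not event.is_set():
        event.wait()  # no timeout: returns only once the event is set
        wakeups += 1
    return wakeups


def run(waiter, delay_s=0.2):
    """Signal an event after delay_s and count how often the waiter woke."""
    event = threading.Event()
    timer = threading.Timer(delay_s, event.set)
    timer.start()
    try:
        return waiter(event)
    finally:
        timer.join()


polling_wakeups = run(wait_by_polling)
blocking_wakeups = run(wait_by_blocking)
print(polling_wakeups, blocking_wakeups)
```

The polling waiter wakes many times before the work arrives, while the blocking waiter wakes once. Restructuring code around the second pattern is what makes this kind of change invasive.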

Removing the periodic activities yielded a saving of ~800 mW. With the optimizations so far, SoC power for 1080p HD streaming playback went from ~6 W to ~2.8 W, but further optimization was still needed to reach the 1.7 W goal seen in best-in-class applications.


Figure 16. Power Optimizations down to ~2.8 W

The next step was to reduce extra memory copies using an overlay. With the overlay, the overall package power was reduced by ~400 mW. Figure 17 shows power was reduced to ~1.8 W from ~6 W.


Figure 17. Cyberlink PowerDVD* at final Power Consumption (1.8 W)
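The SoC power levels reported after each stage can be expressed as a cumulative reduction against the ~6 W baseline. The Python sketch below uses only the approximate wattages quoted in the text. The stage labels and function name are illustrative.

```python
# SoC power after each stage, as reported in the text (watts, approximate).
stages = [
    ("baseline", 6.0),
    ("decode offloaded to GPU", 4.0),
    ("sleep loops removed", 2.8),
    ("overlay in place", 1.8),
]


def reduction_vs_baseline(stage_watts):
    """Percent reduction of each stage relative to the first entry."""
    baseline = stage_watts[0][1]
    return [(name, round(100.0 * (baseline - watts) / baseline, 1))
            for name, watts in stage_watts]


for name, percent in reduction_vs_baseline(stages):
    print(f"{name}: {percent}% below baseline")
```

On these figures, the final ~1.8 W corresponds to a reduction of about 70% in SoC power relative to the unoptimized player.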

With that, the most important optimization goals had been achieved, and Intel and Cyberlink engineers deemed the project a success.

Close collaboration between Cyberlink and Intel helped to complete the optimization in two months with full validation. The final product with all optimizations was released to OEMs six months from when we started.

Conclusion

The Intel and Cyberlink engineers used several tools, including Intel VTune and Microsoft Windows Performance Analyzer, to reach optimal low-power playback. The collaboration included knowledge sharing on tools and weekly analysis meetings to meet the battery life goal before the release deadline.

Several iterations were completed before the team was satisfied with the results: PowerDVD now consumes ~1.8 W, down from ~6 W. Intel and Cyberlink engineers faced the challenge of keeping playback quality the same before and after optimization. Each optimization required a validation and analysis process before it could pass the Cyberlink team's internal quality tests. Thus, every change was tracked, and user experience metrics (power and performance) were evaluated.

The following optimizations were found to work the best for achieving the optimization goals, but as noted above, these were accomplished over several iterations:

  • Offloading graphics to the GPU (using the Intel Media SDK)
  • Removing sleep loop calls from two threads
  • Using an overlay to reduce extra memory copies

The combined efforts of the Intel and Cyberlink PowerDVD teams optimized the streaming media playback application to reach the best-in-class goal.

Source: https://software.intel.com/en-us/articles/optimizing-cyberlink-powerdvd-10-improves-battery-life

