Measuring Performance

February 22, 2021 16:36

Using the Storyboard logger plugin, it is possible to capture metrics detailing various aspects of a Storyboard application's performance. These metrics include screen, layer and control redraw times, action execution times and general event processing times. If a performance log file is captured and saved with the file extension .plog (for performance log), then Storyboard Designer will automatically recognize it and open a log file viewer that provides an organized display and basic analysis of the performance events.

For more information on options to configure and control the performance monitoring of the engine, refer to the Logger plugin section of this document and the gra.perf_state action.

The Storyboard Embedded Engine runtime also provides a number of API functions that can be used at runtime to extract and display performance information.

grd_fps (string, 1s0)

The frame rate of display updates averaged over the last 5 seconds of display. This variable is only created and filled in if the -oscreen_mgr,fps option is passed to the Storyboard Engine.

Storyboard display updates are entirely event driven, so unless the application that is being run is continuously changing content or generating redraw events such as is frequently done by benchmarking applications, this value may not reflect the true drawing performance of the system.

gre.env("mem_stats") (Lua)

On systems where this information is available this returns the amount of process and heap memory that the Storyboard Engine is using.

collectgarbage("count") (Lua)

This is a standard Lua API call that reports how much memory (in kilobytes) the Lua script interpreter is consuming. This will be a subset of the information returned by gre.env("mem_stats").

This sample demonstrates how a periodic Lua script can be used to extract the local FPS and memory usage. The FPS value is stored as a string variable and may not exist until enough frames of data have been generated to derive a value.

-- Take a snapshot of the current execution metrics
function snapshot_metrics()
    -- grd_fps is a string and may not exist yet, so convert defensively
    local fps = gre.get_value("grd_fps")
    fps = fps and tonumber(fps)
    local mem = gre.env("mem_stats") or {}
    -- Memory in use by the Lua interpreter, in kilobytes
    local lua_mem = collectgarbage("count")

    -- Normalize, not all systems have all data
    if(fps ~= nil) then
        print(string.format("FPS  : %d", math.floor(fps)))
    end
    if(mem.process_used) then
        print(string.format("MEM  : %d", mem.process_used))
    end
    if(mem.heap_used) then
        print(string.format(" HEAP: %d", mem.heap_used))
    end
    print(string.format(" LUA : %d", math.floor(lua_mem)))
end

-- Set up a periodic timer to snapshot execution metrics every 1s
function init_metric_snapshot(mapargs)
    gre.timer_set_interval(snapshot_metrics, 1000)
end

Action Execution Performance Considerations

All actions are executed within the context of an event delivery and as such their execution will have an impact on the overall throughput and responsiveness of the system. In particular with Lua scripts, it is important to limit the length of time that functions take to perform their work or to separate lengthy operations into separate tasks, threads or processes depending on the operating environment being used. The performance logs will provide a detailed account of how long various actions take to execute, but within your Lua scripts a profiling tool such as Lua Profile can provide additional insight into your script execution.
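As a lightweight complement to the performance logs, an individual Lua function can be wrapped so that its execution time is reported on every call. The following is a minimal sketch that uses only the standard Lua os.clock function, which measures CPU time rather than wall-clock time. The names timed and report are hypothetical helpers and are not part of the Storyboard API.

```lua
-- Works on both Lua 5.1 (unpack) and later versions (table.unpack)
local unpack = unpack or table.unpack

-- Collect all return values, including nils, along with their count
local function pack(...)
    return { n = select("#", ...), ... }
end

-- Wrap 'fn' so that each call reports its CPU time in milliseconds.
-- 'timed' and 'report' are illustrative names, not Storyboard APIs.
function timed(name, fn, report)
    report = report or function(n, ms)
        print(string.format("%s took %.2f ms", n, ms))
    end
    return function(...)
        local start = os.clock()
        local results = pack(fn(...))
        local elapsed_ms = (os.clock() - start) * 1000
        report(name, elapsed_ms)
        return unpack(results, 1, results.n)
    end
end
```

A callback could then be wrapped once at load time, for example my_callback = timed("my_callback", my_callback), where my_callback stands in for one of your own functions. Functions that consistently report long times are candidates for being split into smaller pieces of work.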

The screen manager listens for data changes and checks the state of controls to determine when the display needs to be refreshed. In general practice the screen manager throttles data updates in a way that batches changes together to prevent visual flicker and excessive updating of the display (in particular in situations where the display is not double buffered). However, there may be sequences of events or data changes that do not occur atomically and may result in excessive work being performed and unnecessary CPU cycles being consumed. In situations where multiple events are going to be changing data values, moving controls, or generating more events that would cause the display to be updated, and they are known to occur in a particular sequence, it is advisable to hold the screen manager updates until all changes have been made. Once the modifications are complete, the screen manager can be released and the display updated if needed. The actions are as follows:

gra.screen.hold
gra.screen.release
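
A related habit on the Lua side is to gather a group of related changes into one table and apply them with a single gre.set_data call instead of many separate calls. The sketch below illustrates the idea; the variable keys and function names are hypothetical, and whether this reduces redraw work on a given target is an assumption that should be confirmed with the performance log.

```lua
-- Build one table holding every change for a single logical update.
-- The keys below are hypothetical examples, not variables from a real model.
function build_gauge_update(speed, rpm)
    local data = {}
    data["gauge_layer.speed.text"] = string.format("%d", speed)
    data["gauge_layer.rpm.text"] = string.format("%d", rpm)
    data["gauge_layer.needle.angle"] = speed * 1.5
    return data
end

-- Apply all of the changes in one call.
function apply_gauge_update(speed, rpm)
    local data = build_gauge_update(speed, rpm)
    -- 'gre' only exists inside the Storyboard Engine; the guard lets the
    -- sketch also be exercised in a plain Lua interpreter.
    if gre and gre.set_data then
        gre.set_data(data)
    end
    return data
end
```

When the changes are spread across several events or actions and cannot be collected into one call, the hold and release actions listed above remain the appropriate tool.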

Choosing the Right Image Format(s) and Bit Depth

When creating an application the developer must define the target system screen resolution and color depth. This color depth information is used internally to decide how to create and render display elements in an efficient manner. When adding images to the user interface it is always preferable to create them in the desired color depth. If the application will be running in 16-bit color then the most efficient image to render will be a 16-bit image. If alpha blending/transparency is not required when an image is rendered, then it is advisable to create the image in the application color depth or at least remove its alpha channel.
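
To get a feel for what bit depth costs, the decoded size of an image can be estimated from its dimensions. This is a simplified model that assumes tightly packed pixels with no row padding; the engine's actual in-memory representation may differ, and image_bytes is a hypothetical helper name.

```lua
-- Estimate the decoded size of an image in bytes.
-- Assumes tightly packed pixels with no per-row padding.
function image_bytes(width, height, bits_per_pixel)
    return width * height * bits_per_pixel / 8
end

-- Compare a full-screen 800x480 image at two bit depths
local opaque_16 = image_bytes(800, 480, 16)  -- 16-bit, no alpha channel
local alpha_32 = image_bytes(800, 480, 32)   -- 32-bit, with alpha channel
print(string.format("16-bit: %d bytes, 32-bit: %d bytes", opaque_16, alpha_32))
```

Under these assumptions the 16-bit image occupies 768,000 bytes and the 32-bit image 1,536,000 bytes, so dropping an unneeded alpha channel halves the amount of pixel data that has to be stored and moved.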

Framerate (Frames Per Second)

Selecting an appropriate framerate for your screen transitions and animations will depend on your target system. You may think that selecting a higher framerate will make your animations run smoother, however, if your system can’t keep pace with the selected framerate, Storyboard Engine will drop the frames it can’t display in a timely manner. This will result in the engine having to do more work to achieve a lower framerate than intended and will look worse than originally setting a lower framerate that the target could handle.

A framerate of 30 frames per second will look good for the majority of simple animations. The results may vary, though, depending on what is being animated, how long it is being animated for, and what the content beneath the animated element is composed of. The best plan is to evaluate your design and animations on your target hardware, and tune your settings appropriately.
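
One way to reason about a candidate framerate is to convert it into a per-frame time budget and compare that with the redraw times reported in the performance log. The helpers below are a hypothetical sketch of that arithmetic and are not part of the Storyboard API.

```lua
-- Time available to render one frame, in milliseconds.
function frame_budget_ms(fps)
    return 1000 / fps
end

-- True when a measured redraw time fits within the budget for 'fps'.
function fits_framerate(redraw_ms, fps)
    return redraw_ms <= frame_budget_ms(fps)
end
```

For instance, a screen whose redraw takes 40 ms fits within the 50 ms budget of 20 frames per second, but not within the roughly 33 ms budget of 30 frames per second, so requesting 30 on that target would lead to dropped frames.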

Scaling Images

If you are only ever going to load an image once in your application, don't scale the image; scaling is a performance hit at image render time. It's far better to use your favorite image editor to resize the image to the exact size you intend to use and turn the scale flag off.

Reducing Output Verbosity

Increasing the verbosity on sbengine is insightful when trying to track down behavioral issues and to gain a better understanding of the system behavior. However, don't forget to turn off the verbosity for release since the process of outputting diagnostic messages to a console or serial terminal can cause significant slowdown due to the limited bandwidth of the output devices.

Adjusting Engine Rendering Options

The Storyboard Engine provides a number of different global rendering defaults that can be adjusted via command line options at execution time.

If your application contains a number of rotated images, then the -orender_mgr,quality option can be used to trade between higher execution performance (0) and better visual interpolation (2).

If your application is using an OpenGL renderer, then the -orender_mgr,multisample option can be adjusted to favour lower GPU consumption with less anti-aliasing (0) or a smoother visual presentation that takes longer to render (4 or more).

Managing Resource Memory

By default sbengine uses as much memory as it requires to load all the assets that the application requires (images, fonts, scripts,...) but this can be tuned to save memory. Here are some options to help with this.

Remove any unused plugins from the plugins directory if you are simply setting a directory for the SB_PLUGINS environment variable. The plugins that are available and being loaded will be shown by passing the -i option to the sbengine command line utility.
Set the resource_mgr options for image and font cache to appropriate values. Remember the caches must be large enough to fit all the images and fonts for your most resource intensive screen.
Use the Load Scaled flag in image render extension options if you are loading a scaled version of an image (e.g., an image thumbnails screen). If you are only ever loading the image once you should resize the image before deployment to avoid the runtime cost of image scaling.

OpenGL Scene Graph Optimization

The OpenGL scene graph introduced in Storyboard 6.0 turns the old synchronous immediate mode OpenGL render manager into an asynchronous deferred OpenGL renderer. The new rendering paradigm allows the Engine to re-sequence, sort and combine GL calls in ways that are more efficient for both the CPU and GPU. For example, with the ability to sort, the depth buffer can be leveraged for all rendering elements on the current screen, which when combined with separate opaque and alpha operation ordering allows the engine to use hardware accelerated depth testing to greatly reduce overdraw on the GPU. This is functionality that should improve an application's performance without any change to the application itself.

The OpenGL scene graph also adds support for batching some common elements and state transitions. This allows previously separate elements, each with their own expensive GL draw calls, to be combined together, which greatly reduces the overall number of GL draw calls. Currently fills can be batched, along with glyphs and images, for which we leverage our image atlas to access multiple images or glyphs from a single texture.

To maximize the benefit of batching, an application design should try to keep as many common classes of rendering elements together within the Z-order of the design to help improve the probability that they will be batched together. For example, try to avoid a scenario where controls with single render extensions are organized (in a stacked Z order) as image, fill, image, fill. A more ideal scenario (where possible) would be image, image, fill, fill.
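
The effect of Z-order on batching can be illustrated by counting runs of consecutive elements of the same type. This is a deliberately simplified model and not the engine's actual batching algorithm; count_runs is a hypothetical helper.

```lua
-- Count runs of consecutive identical render types in a Z-ordered list.
-- Each run is a group that could, at best, be drawn as one batch.
-- Simplified model only; not the engine's actual batching algorithm.
function count_runs(z_order)
    local runs = 0
    local previous = nil
    for _, kind in ipairs(z_order) do
        if kind ~= previous then
            runs = runs + 1
            previous = kind
        end
    end
    return runs
end

local interleaved = count_runs({ "image", "fill", "image", "fill" })
local grouped = count_runs({ "image", "image", "fill", "fill" })
print(string.format("interleaved: %d runs, grouped: %d runs", interleaved, grouped))
```

In this model the interleaved ordering produces four separate runs, while the grouped ordering produces two, which is why grouping like elements improves the chance of batching.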

In order to offer some additional configurability for design scenarios where the new rendering optimizations are not aligned with the application design, it is possible to control the batching and sorting behaviour with the engine options -orender_mgr,no_batch or -orender_mgr,no_z_sort.

Other configuration options that can play a significant role in rendering performance are the resource manager options that control the size of the font and image caches. The size of these caches plays a significant role in how much content can be batched together for draw operations. The -oresource_mgr,image_block_size={-1|0|X} and -oresource_mgr,font_block_size={-1|0|X} options can be used to configure the size and behaviour of the caches. Generally, the larger the block sizes, the greater the chance that things will be batched, with a large potential gain in performance at the cost of a potential increase in memory usage.