Performance, Methods, and Practices of DirectX* 11 Multithreaded Rendering


Abstract

Rendering is usually the main performance bottleneck of PC games on the CPU; multithreaded rendering is an effective way to eliminate the bottleneck. This article investigates the performance scalability of DirectX* 11 multithreaded rendering, discusses two basic methods for multithreaded rendering, and introduces the case of traditional multithreading deferred shading pipelines in a large-scale online game, Conqueror's Blade*.

Background


Over the past ten years, CPUs in the PC market have improved greatly. According to the Steam* hardware and software survey2, 4-core processors (usually 8 logical cores) have become mainstream in the current PC game market, and the 6-core processor (usually 12 logical cores) is on its way to becoming the mainstream next-generation CPU. For example, the Intel® Core™ i7-8700K processor with 6 physical cores has been available since late 2017. We expect this trend to continue: in the next few years, 6-core and 8-core CPUs will become the most popular processors for gamers.

In many PC games, rendering is single-threaded and easily becomes the biggest performance bottleneck. This makes it difficult for games to use the extra idle cores of a multicore processor to improve performance or enrich game content. Although DirectX 12* has been available for a few years, most games currently under development, especially the most popular online games, still use DirectX 11. DirectX 11 was designed to support multithreading from the beginning1. Therefore, investigating the performance scalability of DirectX 11 multithreaded rendering on current mainstream multicore platforms, and studying how to make full use of this feature, has important reference value for the development and optimization of many games.

DirectX* 11 Multithreaded Rendering Model

First, let's briefly review the DirectX 11 multithreaded rendering model (see Figure 1). DirectX 11 supports two types of rendering, immediate and deferred, based on two kinds of Direct3D* 11 device context: the immediate context and the deferred context. Immediate rendering calls draw APIs through the immediate context, and the generated commands are immediately sent to the graphics processing unit (GPU). Deferred rendering calls draw APIs through a deferred context, but only records the draw commands in a command list, which the immediate context submits to the GPU at a later point in time. DirectX 11 supports using multiple deferred contexts simultaneously, one per thread. This strategy allows the rendering of a complex scene to be divided into multiple concurrent tasks; that is, multithreaded rendering.
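
A minimal sketch of the two context types in code (assuming `device` and `immediateContext` already exist; error handling omitted):

```cpp
// Worker thread: record draw commands into a command list via a deferred context.
ID3D11DeviceContext* deferredContext = nullptr;
device->CreateDeferredContext(0, &deferredContext);

// ... set pipeline state and issue draw calls on deferredContext ...

ID3D11CommandList* commandList = nullptr;
deferredContext->FinishCommandList(FALSE, &commandList);

// Main thread: submit the recorded commands to the GPU through the immediate context.
immediateContext->ExecuteCommandList(commandList, FALSE);
commandList->Release();
```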


Figure 1. DirectX* 11 multithreaded rendering model.

Evaluate DirectX 11 Multithreading Performance Scalability

Based on the hardware and software configurations listed in Table 1, we evaluate the performance scalability of DirectX 11 multithreaded rendering on multicore CPUs.

Table 1. Hardware and software configurations for performance scalability evaluation.

Configuration       Description
CPU                 Intel® Core™ i7-6950X processor @ 3.00 GHz (10 cores)
Memory              2 x 16 GB RAM
GPU                 NVIDIA GeForce* GTX 1080; AMD Radeon* RX Vega 64
Driver Version      22.21.13.8494 (NVIDIA); 22.19.677.257 (AMD)
Operating System    Windows® 10 Professional 64-bit
Test Program        Microsoft DirectX* SDK (June 2010) sample: MultithreadedRendering11.exe

The evaluation uses the Intel Core i7-6950X processor (10 physical cores; that is, 20 logical cores) to simulate CPUs with different numbers of cores. To ensure that the GPU does not become a performance bottleneck for the test program, the test uses two high-performance discrete GPUs: the NVIDIA GeForce* GTX 1080 and the AMD Radeon* RX Vega 64. The test program is the MultithreadedRendering11 sample from the Microsoft DirectX* SDK4, chosen mainly for the following reasons. First, the program's performance is CPU-bound, and it was developed to demonstrate the DirectX 11 multithreaded rendering feature, which helps maximize the potential for performance scalability. Second, the program's main function is rendering (each frame contains more than 4,000 draw calls), with no animation, physics, or other load, so that any scalability observed can be attributed as much as possible to DirectX 11 multithreaded rendering. In addition, the program's scene complexity and rendering techniques are common in games, making the test results representative. Last but not least, the program's source code is open, making it easy to analyze and understand the DirectX 11 multithreaded rendering methods and their impact on performance scalability.


Figure 2. Test program

When running the test program, we chose the MT Def/Chunk mode, because the scalability in this mode is not limited by the number of game rendering passes (or scenes), but only by the number of CPU cores. The workload of each thread is relatively balanced, which can make full use of the computing power of the multicore CPU. During the test, we adjusted the CPU's active core number through the BIOS and tested the program's frame rate at each of these different core numbers. In order to compare the effects of different GPUs on DirectX 11 multithreaded rendering scalability, we divided the multithreaded frame rate on the same GPU by the single-threaded frame rate (immediate mode) under the same configuration, to obtain a normalized relative performance metric. The test results are shown in Figure 3.


Figure 3. Multicore performance scalability of DirectX* 11 multithreaded rendering.

As we can see from Figure 3, with two CPU cores, no matter which GPU we use, multithreaded rendering (MT Def/Chunk mode) performance is lower than single-threaded rendering (immediate mode). What leads to this result? According to the source code of the test program, the number of working threads is the number of CPU physical cores minus one. In other words, on a two-core CPU in multithreaded rendering mode, a single working thread processes all scene draw calls through deferred rendering, while the main thread handles no scene draw calls at all. In single-threaded rendering mode, all draw calls are processed by the main thread through immediate rendering. This result therefore means that, for an equal number of draw calls, the overhead of deferred rendering is slightly larger than that of immediate rendering.

However, when the number of CPU cores is greater than two, DirectX 11 multithreaded rendering performs significantly better than single-threaded rendering, regardless of which GPU is used, and performance increases with the number of cores. When paired with the NVIDIA GeForce GTX 1080, multicore performance scales very well: the increase is almost linear from 2 to 6 cores, and even from 6 to 10 cores it remains significant. When paired with the AMD Radeon RX Vega 64, scalability is much worse; in particular, once the number of CPU cores exceeds 4, the performance increase is almost negligible.

Why does the test program show such a large difference in multicore performance scalability on different GPUs? We used Microsoft GPUView* to capture the multithreaded activities of the test program (see Figure 4) and found that its bottleneck is on the CPU with either the NVIDIA GeForce GTX 1080 or the AMD Radeon RX Vega 64. However, multithreaded concurrency is better with the NVIDIA GPU, and the time during which the main thread blocks the working threads is significantly longer with the AMD graphics card.


Figure 4. DirectX* 11 multithreaded rendering parallelism with different GPUs.

From the source code, we know that each working thread has a deferred context, and all draw calls for scene rendering go through the deferred contexts. The main thread owns an immediate context, which is responsible for submitting the command lists generated in the deferred contexts to the GPU. Using Windows* Performance Analyzer to further analyze the modules called by the working threads, we found that, on the NVIDIA GPU, all the working threads call the graphics driver module (see Figure 5). This means that the deferred contexts share part of the driver load, leaving the immediate context with less driver load and thereby shortening the periods during which the main thread blocks the working threads. On the AMD GPU, the graphics driver module does not appear in the working threads but is concentrated in the main thread (see Figure 6), which means that the single immediate context bears a large amount of driver load, increasing the time the working threads spend waiting for the main thread.


Figure 5. Working thread (deferred context) represents some of the NVIDIA driver load.


Figure 6. The main thread (immediate context) represents a large amount of the driver load.

By checking the GPU driver support for DirectX 11 multithreaded rendering features3 (see Figure 7) through the DirectX Caps Viewer, we learn that the NVIDIA GPU driver supports driver command lists, while the AMD GPU driver does not support them. This explains why the driver modules on different GPUs appear in different contexts. When paired with the NVIDIA GPU, working threads can build driver commands in parallel in a deferred context; while when paired with the AMD GPU, the driver commands are all built in serial in the immediate context of the main thread.
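
The same capability that DirectX Caps Viewer displays can be queried at run time; a minimal sketch, assuming `device` is a valid `ID3D11Device`:

```cpp
// Ask the driver whether it natively supports command lists and
// concurrent resource creation.
D3D11_FEATURE_DATA_THREADING threadingCaps = {};
if (SUCCEEDED(device->CheckFeatureSupport(
        D3D11_FEATURE_THREADING, &threadingCaps, sizeof(threadingCaps))))
{
    if (threadingCaps.DriverCommandLists)
    {
        // The driver builds command lists in parallel on deferred contexts.
    }
    if (threadingCaps.DriverConcurrentCreates)
    {
        // Resources can be created concurrently on multiple threads.
    }
}
```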


Figure 7. Support for DirectX* 11 multithreaded rendering by different GPU drivers.

Based on the above tests and analysis, we can draw the following conclusions:

  • Although deferred rendering carries more overhead than immediate rendering, DirectX 11 multithreaded rendering can perform significantly better than single-threaded rendering, especially on current mainstream CPUs with four or more cores, when an appropriate method of dividing rendering tasks is used, such as evenly distributing draw calls across more than two Direct3D* device contexts.
  • The performance scalability of DirectX 11 multithreaded rendering is GPU-related. When the GPU driver supports driver command lists, DirectX 11 multithreaded rendering can achieve good performance scalability; when it does not, scalability is easily constrained by a driver bottleneck. Fortunately, NVIDIA GPUs2, which hold the largest share of the current game market, support driver command lists.

Multithreaded Rendering Method

The performance scalability evaluation above shows that, on current mainstream multicore CPUs and GPUs, DirectX 11 games may achieve significant performance improvements from multithreaded rendering. So how do you effectively tap this performance potential? The MultithreadedRendering11 sample demonstrates two basic methods of dividing a rendering task among multiple threads:

1) Assign each thread a rendering Pass.

2) Assign each thread an equal number of Chunks.

It should be noted that the multithreaded rendering methods described here apply not only to DirectX 11 but also to DirectX 12. In fact, we can treat the DirectX 11 deferred context as a DirectX 12 command list, and the DirectX 11 immediate context as a combination of the DirectX 12 command list and command queue.

Figure 8 shows a multithreaded rendering method that divides the rendering task by Pass. A Pass is a relatively independent rendering task; typical Passes include the generation of the pre-Z buffer, shadow maps, reflection maps, G-buffers, the UI, and the main Pass that produces the final frame buffer. With this method, each Pass is assigned to a working thread, which builds the command list for that Pass. The main thread is responsible for distributing the Passes and submitting, in order, the command lists completed by the working threads. In the MultithreadedRendering11 sample, the main thread submits only after all the working threads have completed their command lists. Figure 8 shows a better way: as soon as a command list is completed, it should be submitted to the GPU, as long as the rendering order permits, as sketched below. Since command-list submission is serial and carries some overhead, the earlier the submission, the more of this serial time can be hidden and the earlier the GPU can start processing.
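
A minimal sketch of this scheme, with illustrative names and simplified thread handling (not the sample's actual code):

```cpp
// Illustrative sketch: one worker thread records the command list of each
// Pass; the main thread submits the lists in Pass order as they complete.
#include <d3d11.h>
#include <thread>
#include <vector>

struct PassJob
{
    ID3D11DeviceContext* deferredContext;      // one deferred context per Pass
    ID3D11CommandList*   commandList = nullptr;
};

void RecordPass(PassJob& job /*, plus the scene data for this Pass */)
{
    // ... set state and issue all draw calls of this Pass on
    //     job.deferredContext ...
    job.deferredContext->FinishCommandList(FALSE, &job.commandList);
}

void RenderFrame(ID3D11DeviceContext* immediateContext,
                 std::vector<PassJob>& passes)
{
    std::vector<std::thread> workers;
    for (PassJob& pass : passes)
        workers.emplace_back(RecordPass, std::ref(pass));

    // Submit in Pass order; each list is executed as soon as it is ready,
    // rather than after all workers have finished.
    for (size_t i = 0; i < passes.size(); ++i)
    {
        workers[i].join();
        immediateContext->ExecuteCommandList(passes[i].commandList, FALSE);
        passes[i].commandList->Release();
        passes[i].commandList = nullptr;
    }
}
```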


Figure 8. Divide the rendering task by Pass.

Dividing rendering tasks by Pass is easy to apply to the multi-pass rendering techniques commonly used in modern games. As long as each Pass contains a relatively large rendering load (number of draw calls), this method is usually effective at improving game performance. Its shortcomings are that performance scalability is limited by the number of Passes, and that load balance between Passes is not easy to achieve.

Figure 9 shows a multithreaded rendering method that divides rendering tasks by Chunk. A Chunk is a rendering task of finer granularity than a Pass; a typical Chunk can be a set of draw calls, a mesh, or a larger rendering unit such as a separate rendering object containing multiple meshes. In this method, each Pass is divided into Chunks, which the main thread distributes evenly to multiple working threads, each responsible for building a command list; a sketch follows below. After the command lists are completed, the main thread submits them in order. The number of working threads is determined by the number of physical cores, rather than logical cores, to avoid the excessive overhead of too many command-list submissions. Using the Pass as the unit of command-list submission helps unify the render state of each command list, lets the GPU start processing earlier, and allows command lists to be reused between Passes.
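
A minimal sketch of the Chunk distribution, shown serially for brevity (in a real engine each worker context is driven by its own thread):

```cpp
// Illustrative sketch: distribute the Chunks of one Pass evenly across
// worker contexts, then submit one command list per worker in fixed order.
#include <d3d11.h>
#include <vector>

struct Chunk { /* a small batch of draw calls */ };

void RenderPassByChunks(ID3D11DeviceContext* immediateContext,
                        std::vector<ID3D11DeviceContext*>& workerContexts,
                        const std::vector<Chunk>& chunks)
{
    const size_t numWorkers = workerContexts.size(); // physical cores - 1

    // Chunks are dealt out round-robin for load balance.
    std::vector<ID3D11CommandList*> lists(numWorkers, nullptr);
    for (size_t w = 0; w < numWorkers; ++w)
    {
        for (size_t i = w; i < chunks.size(); i += numWorkers)
        {
            // ... record chunks[i]'s draw calls on workerContexts[w] ...
        }
        workerContexts[w]->FinishCommandList(FALSE, &lists[w]);
    }

    // Submit the per-worker lists in a fixed order to keep rendering deterministic.
    for (ID3D11CommandList* list : lists)
    {
        immediateContext->ExecuteCommandList(list, FALSE);
        list->Release();
    }
}
```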


Figure 9. Dividing rendering tasks by Chunk.

The Chunk-based method can achieve a significant performance improvement; its performance is not limited by the number of Passes and increases with the number of CPU cores. Its shortcoming is that in situations that require ordered rendering (such as rendering semi-transparent objects), the strategy for distributing Chunks is constrained, and load balance among the threads is easily lost, which hurts performance scalability.

No matter which of the above multithreaded rendering methods is used, the following points should be noted:

  • Since command-list submission is serial and carries a certain amount of overhead, each command list should be submitted as soon as it is completed and the rendering order allows, rather than waiting for the other command lists. Early submission helps hide the serial time and relieves the GPU of burst load pressure.
  • To amortize the overhead of using deferred contexts, each deferred context, whether for a Pass or a Chunk, should contain enough draw calls. If a deferred context would process only a few draw calls, consider handling them in the immediate context or combining them with others.
  • Try to balance the load between different contexts to maximize the advantages of multithreaded rendering.

Case Study

Here we introduce the multithreaded rendering method and the results achieved in a real DirectX 11 game. Conqueror's Blade5 is a large-scale online game developed by NetEase*. The game has large outdoor battle scenes, a large number of characters on screen at once, and rich visual effects, all of which demand considerable CPU resources. To give players on low-end CPU platforms a smooth gaming experience, the developers continually apply multithreading optimizations to the game engine, using the available CPU resources fully to improve performance or enrich game detail.


Figure 10. Single-threaded rendering causes CPU performance bottlenecks.

Before this optimization, the engine had already achieved a certain amount of multithreading: CPU-intensive tasks such as game logic, physics, particles, and animation run on separate threads. The rendering thread is mainly responsible for visibility detection and for running the entire rendering pipeline. Nevertheless, the rendering thread was still a performance bottleneck for the game (see Figure 10). A typical combat scene issues more than 5,000 draw calls per frame, which also results in considerable Direct3D runtime and driver overhead. The game uses the DirectX 11 API and a typical deferred shading pipeline. The task pipeline of the rendering thread is shown in Figure 11.


Figure 11. The game's task pipeline of rendering thread.

Given considerations such as legacy code and limited implementation time, the game chose the multithreading optimization that divides rendering tasks by Pass, which was the easier to implement in the time available. The specific implementation scheme is shown in Figure 12.

In the optimization scheme, visibility detection is removed from the rendering thread and split into two jobs: eye visibility and light visibility. The G-buffer generation, shadow map generation, and forward and transparent Passes, which have (or may dynamically have) a large number of draw calls, are also moved out of the rendering thread and encapsulated as jobs dispatched to working threads. Because it contains so many draw calls, G-buffer generation is further divided into three jobs: GBuffer Terrain, GBuffer Static, and GBuffer Dynamic. The rendering thread retains only the Scaleform UI, Deferred Shading, and Post Process Passes, which either must use immediate-context rendering or contain only a few draw calls.


Figure 12. Multithreaded rendering flowchart after game optimization.

At run time, the working threads first process the two visibility-check jobs in parallel; these two jobs then spawn the six rendering-pass jobs, and the working threads build the DirectX 11 command list of each Pass in deferred contexts. The rendering thread executes, in order, the Passes it retains and, through the immediate context, the command lists completed by the working threads.

After the multithreading optimization, the rendering-thread bottleneck is eliminated, multicore utilization improves significantly (see Figure 13), and the frame rate rises to an average of 1.7 times its pre-optimization value, achieving the performance target that was set.


Figure 13. Eliminate bottlenecks after multithreaded rendering to improve multicore utilization.

Although the current solution significantly improves performance, uneven load between Passes still leaves considerable room for improvement in CPU utilization. To put the idle CPU cycles to work on further performance and detail improvements, the developers plan to restructure the engine's rendering code and try dividing rendering tasks by Chunk.

Summary

On the multicore CPUs and the GPUs with the largest share of the current game market, DirectX 11 games whose performance bottleneck is CPU-side rendering may realize significant performance improvements from multithreaded rendering. Although the multicore performance scalability of DirectX 11 multithreaded rendering is limited on some GPUs with limited driver support, a reasonable implementation will still outperform single-threaded rendering. The key to benefiting from multithreaded rendering is the division and scheduling of rendering tasks; to that end, this article introduced the Pass-based and Chunk-based methods. These two methods apply not only to DirectX 11 but also to DirectX 12, so the multithreaded rendering optimizations of DirectX 11 games can be readily ported to future DirectX 12 games. In Conqueror's Blade, the Pass-based multithreaded method was successfully applied to a traditional deferred shading pipeline, demonstrating the effectiveness of DirectX 11 multithreaded rendering.

Footnotes

1. Introduction to Multithreading in Direct3D 11

2. Steam Hardware and Software Survey

3. How To: Check for Driver Support

4. Microsoft DirectX SDK (June 2010)

5. Conqueror's Blade official website

Rendering in DirectX

Windows Mixed Reality is built on DirectX to produce rich, 3D graphical experiences for users. The rendering abstraction sits just above DirectX and lets an app reason about the position and orientation of one or more observers of a holographic scene, as predicted by the system. The developer can then locate their holograms relative to each camera, letting the app render these holograms in various spatial coordinate systems as the user moves around.

Note: This walkthrough describes holographic rendering in Direct3D 11. A Direct3D 12 Windows Mixed Reality app template is also supplied with the Mixed Reality app templates extension.

Update for the current frame

To update the application state for holograms, once per frame the app will:

  • Get a HolographicFrame from the display management system.
  • Update the scene with the current prediction of where the camera view will be when rendering is completed. Note that there can be more than one camera in the holographic scene.

To render to holographic camera views, once per frame the app will:

  • For each camera, render the scene for the current frame, using the camera view and projection matrices from the system.

Create a new holographic frame and get its prediction

The HolographicFrame has information that the app needs in order to update and render the current frame. The app begins each new frame by calling the CreateNextFrame method. When this method is called, predictions are made using the latest sensor data available and encapsulated in the CurrentPrediction object.

A new frame object must be used for each rendered frame as it is only valid for an instant in time. The CurrentPrediction property contains information such as the camera position. The information is extrapolated to the exact moment in time when the frame is expected to be visible to the user.

The following code is excerpted from AppMain::Update:
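
A minimal sketch of this step (C++/CX; member names such as `m_holographicSpace` are illustrative):

```cpp
// Begin the frame; the prediction is made from the latest sensor data.
HolographicFrame^ holographicFrame = m_holographicSpace->CreateNextFrame();

// The prediction is extrapolated to the moment the frame will be visible.
HolographicFramePrediction^ prediction = holographicFrame->CurrentPrediction;
```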

Process camera updates

Back buffers can change from frame to frame. Your app needs to validate the back buffer for each camera, and release and recreate resource views and depth buffers as needed. Notice that the set of poses in the prediction is the authoritative list of cameras being used in the current frame. Usually, you use this list to iterate on the set of cameras.

From AppMain::Update:

From DeviceResources::EnsureCameraResources:
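
A minimal sketch of the validation step (C++/CX; illustrative names, with the actual view and buffer recreation elided):

```cpp
// The pose list in the prediction is the authoritative set of cameras.
for (HolographicCameraPose^ cameraPose : prediction->CameraPoses)
{
    HolographicCameraRenderingParameters^ parameters =
        holographicFrame->GetRenderingParameters(cameraPose);

    // The back buffer can change between frames; if it differs from the
    // cached surface, recreate the render target view and depth buffer.
    IDirect3DSurface^ backBuffer = parameters->Direct3D11BackBuffer;
    // ... compare with the cached back buffer and recreate views as needed ...
}
```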

Get the coordinate system to use as a basis for rendering

Windows Mixed Reality lets your app create various coordinate systems as needed, such as the attached reference frame and the stationary reference frame, that track locations in the physical world. Your app can then use these coordinate systems to reason about where to render holograms each frame. When requesting coordinates from an API, you will always pass in the SpatialCoordinateSystem within which you want those coordinates to be expressed.

From AppMain::Update:
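
A minimal sketch, assuming a stationary reference frame was created at startup (`m_stationaryReferenceFrame` is an illustrative member):

```cpp
// The coordinate system used as the rendering basis for this frame.
SpatialCoordinateSystem^ currentCoordinateSystem =
    m_stationaryReferenceFrame->CoordinateSystem;
```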

These coordinate systems can then be used to generate stereo view matrices when rendering the content in your scene.

From CameraResources::UpdateViewProjectionBuffer:

Process gaze and gesture input

Gaze and hand input are not time-based, and thus do not have to be updated in the StepTimer function. However, this input is something that the app needs to look at each frame.

Process time-based updates

Any real-time rendering app will need some way to process time-based updates; we provide a way to do this in the Windows Holographic app template via a StepTimer implementation. This is similar to the StepTimer provided in the DirectX 11 UWP app template, so if you already have looked at that template you should be on familiar ground. This StepTimer sample helper class is able to provide fixed time-step updates, as well as variable time-step updates, and the default mode is variable time steps.

In the case of holographic rendering, we've specifically chosen not to put too much into the timer function. This is because you can configure it to be a fixed time step, in which case it might get called more than once per frame – or not at all, for some frames – and our holographic data updates should happen once per frame.

From AppMain::Update:
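
A minimal sketch of the tick call (the lambda body is illustrative):

```cpp
// Run time-based updates inside the timer callback; per-frame holographic
// updates stay outside it, since the callback may run zero or many times.
m_timer.Tick([&]()
{
    m_spinningCubeRenderer->Update(m_timer);
});
```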

Position and rotate holograms in your coordinate system

If you are operating in a single coordinate system, as the template does with the SpatialStationaryReferenceFrame, this process isn't different from what you're otherwise used to in 3D graphics. Here, we rotate the cube and set the model matrix relative to the position in the stationary coordinate system.

From SpinningCubeRenderer::Update:
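
A minimal sketch using DirectXMath (member names are illustrative; `totalRotation` would be derived from the timer):

```cpp
// Rotate about the Y axis, then place the cube at m_position in the
// stationary coordinate system.
const float    radians          = static_cast<float>(fmod(totalRotation, XM_2PI));
const XMMATRIX modelRotation    = XMMatrixRotationY(-radians);
const XMMATRIX modelTranslation = XMMatrixTranslationFromVector(XMLoadFloat3(&m_position));
XMStoreFloat4x4(&m_modelConstantBufferData.model,
                XMMatrixTranspose(modelRotation * modelTranslation));
```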

Note about advanced scenarios: The spinning cube is a very simple example of how to position a hologram within a single reference frame. It's also possible to use multiple SpatialCoordinateSystems in the same rendered frame, at the same time.

Update constant buffer data

Model transforms for content are updated as usual. By now, you will have computed valid transforms for the coordinate system you'll be rendering in.

From SpinningCubeRenderer::Update:
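
A minimal sketch of the upload (illustrative member names):

```cpp
// Push the updated model transform to the GPU constant buffer.
context->UpdateSubresource(
    m_modelConstantBuffer.Get(),  // ID3D11Buffer holding the model transform
    0, nullptr,
    &m_modelConstantBufferData,
    0, 0);
```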

What about view and projection transforms? For best results, we want to wait until we're almost ready for our draw calls before we get these.

Render the current frame

Rendering on Windows Mixed Reality is not much different from rendering on a 2D mono display, but there are some differences you need to be aware of:

  • Holographic frame predictions are important. The closer the prediction is to when your frame is presented, the better your holograms will look.
  • Windows Mixed Reality controls the camera views. You need to render to each one because the holographic frame will be presenting them for you later.
  • Stereo rendering should be accomplished using instanced drawing to a render target array. The holographic app template uses this recommended approach, drawing instanced geometry to a render target view onto a Texture2DArray.
  • If you want to render without using stereo instancing, you will need to create two non-array RenderTargetViews (one for each eye) that each reference one of the two slices in the Texture2DArray provided to the app from the system. This is not recommended, as it is typically significantly slower than using instancing.

Get an updated HolographicFrame prediction

Updating the frame prediction enhances the effectiveness of image stabilization and allows for more accurate positioning of holograms due to the shorter time between the prediction and when the frame is visible to the user. Ideally update your frame prediction just before rendering.
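
A minimal sketch of the refresh (illustrative):

```cpp
// Refresh the prediction as close to render time as possible.
holographicFrame->UpdateCurrentPrediction();
HolographicFramePrediction^ prediction = holographicFrame->CurrentPrediction;
```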

Render to each camera

Loop on the set of camera poses in the prediction, and render to each camera in this set.

Set up your rendering pass

Windows Mixed Reality uses stereoscopic rendering to enhance the illusion of depth, so both the left and the right displays are active. With stereoscopic rendering there is an offset between the two displays, which the brain can reconcile as actual depth. This section covers stereoscopic rendering using instancing, with code from the Windows Holographic app template.

Each camera has its own render target (back buffer), and view and projection matrices, into the holographic space. Your app will need to create any other camera-based resources - such as the depth buffer - on a per-camera basis. In the Windows Holographic app template, we provide a helper class to bundle these resources together in DX::CameraResources. Start by setting up the render target views:

From AppMain::Render:
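
A minimal sketch of binding and clearing this camera's targets (the `CameraResources` accessors are illustrative):

```cpp
ID3D11RenderTargetView* const targets[1] =
    { pCameraResources->GetBackBufferRenderTargetView() };
context->OMSetRenderTargets(1, targets, pCameraResources->GetDepthStencilView());

context->ClearRenderTargetView(targets[0], DirectX::Colors::Transparent);
context->ClearDepthStencilView(pCameraResources->GetDepthStencilView(),
                               D3D11_CLEAR_DEPTH | D3D11_CLEAR_STENCIL, 1.0f, 0);
```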

Use the prediction to get the view and projection matrices for the camera

The view and projection matrices for each holographic camera will change with every frame. Refresh the data in the constant buffer for each holographic camera. Do this after you update the prediction, and before you make any draw calls for that camera.

From AppMain::Render:

Here, we show how the matrices are acquired from the camera pose. During this process we also obtain the current viewport for the camera. Note how we provide a coordinate system: this is the same coordinate system we used to understand gaze, and it's the same one we used to position the spinning cube.

From CameraResources::UpdateViewProjectionBuffer:
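
A minimal sketch (C++/CX; `coordinateSystem` is the one chosen earlier):

```cpp
// View transforms may be unavailable if the coordinate system cannot be
// located this frame, hence the boxed (nullable) return value.
Platform::IBox<HolographicStereoTransform>^ viewTransformContainer =
    cameraPose->TryGetViewTransform(coordinateSystem);

HolographicStereoTransform cameraProjection = cameraPose->ProjectionTransform;

// The camera also supplies this frame's viewport.
m_d3dViewport = CD3D11_VIEWPORT(
    cameraPose->Viewport.X, cameraPose->Viewport.Y,
    cameraPose->Viewport.Width, cameraPose->Viewport.Height);

if (viewTransformContainer != nullptr)
{
    HolographicStereoTransform view = viewTransformContainer->Value;
    // view.Left and view.Right hold the per-eye view matrices.
}
```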

The viewport should be set each frame. Your vertex shader (at least) will generally need access to the view/projection data.

From CameraResources::AttachViewProjectionBuffer:
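
A minimal sketch (the constant buffer slot is an assumption):

```cpp
// Set the viewport for this camera and bind the view/projection data
// where the vertex shader expects it.
context->RSSetViewports(1, &m_d3dViewport);

ID3D11Buffer* const constantBuffers[1] = { m_viewProjectionConstantBuffer.Get() };
context->VSSetConstantBuffers(1, 1, constantBuffers);
```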

Render to the camera back buffer and commit the depth buffer:

It's a good idea to check that TryGetViewTransform succeeded before trying to use the view/projection data, because if the coordinate system is not locatable (e.g., tracking was interrupted) your app cannot render with it for that frame. The template only calls Render on the spinning cube if the CameraResources class indicates a successful update.

To keep holograms where a developer or a user puts them in the world, Windows Mixed Reality includes features for image stabilization. Image stabilization helps hide the latency inherent in a rendering pipeline to ensure the best holographic experiences for users; a focus point may be specified to enhance image stabilization even further, or a depth buffer may be provided to compute optimized image stabilization in real time.

For best results, your app should provide a depth buffer using the CommitDirect3D11DepthBuffer API. Windows Mixed Reality can then use geometry information from the depth buffer to optimize image stabilization in real time. The Windows Holographic app template commits the app's depth buffer by default, helping optimize hologram stability.

From AppMain::Render:
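
A minimal sketch of the commit (the depth surface accessor is illustrative):

```cpp
// Hand the camera's depth buffer to the system so it can optimize
// image stabilization using real scene geometry.
HolographicCameraRenderingParameters^ renderingParameters =
    holographicFrame->GetRenderingParameters(cameraPose);

IDirect3DSurface^ depthSurface = pCameraResources->GetDepthStencilSurface();
renderingParameters->CommitDirect3D11DepthBuffer(depthSurface);
```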

Note

Windows will process your depth texture on the GPU, so it must be possible to use your depth buffer as a shader resource. The ID3D11Texture2D that you create should be in a typeless format and it should be bound as a shader resource view. Here is an example of how to create a depth texture that can be committed for image stabilization.

Code for Depth buffer resource creation for CommitDirect3D11DepthBuffer:
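
A minimal sketch, assuming a 16-bit depth format and illustrative member names:

```cpp
// A typeless format plus the shader-resource bind flag lets the same
// texture serve as a depth target and as a committable shader resource.
CD3D11_TEXTURE2D_DESC depthStencilDesc(
    DXGI_FORMAT_R16_TYPELESS,
    static_cast<UINT>(renderTargetSize.Width),
    static_cast<UINT>(renderTargetSize.Height),
    m_isStereo ? 2 : 1,   // one array slice per eye
    1,                    // a single mip level
    D3D11_BIND_DEPTH_STENCIL | D3D11_BIND_SHADER_RESOURCE);

Microsoft::WRL::ComPtr<ID3D11Texture2D> depthStencil;
device->CreateTexture2D(&depthStencilDesc, nullptr, &depthStencil);

// The depth-stencil view picks a concrete format for the typeless resource.
CD3D11_DEPTH_STENCIL_VIEW_DESC dsvDesc(
    m_isStereo ? D3D11_DSV_DIMENSION_TEXTURE2DARRAY : D3D11_DSV_DIMENSION_TEXTURE2D,
    DXGI_FORMAT_D16_UNORM);
device->CreateDepthStencilView(depthStencil.Get(), &dsvDesc, &m_d3dDepthStencilView);
```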

Draw holographic content

The Windows Holographic app template renders content in stereo by using the recommended technique of drawing instanced geometry to a Texture2DArray of size 2. Let's look at the instancing part of this, and how it works on Windows Mixed Reality.

From SpinningCubeRenderer::Render:
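
A minimal sketch of the draw call:

```cpp
// One call renders both eyes: instance 0 targets the left array slice,
// instance 1 the right.
context->DrawIndexedInstanced(
    m_indexCount,  // indices per instance
    2,             // instance count: one per eye
    0, 0, 0);      // start index, base vertex, start instance
```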

Each instance accesses a different view/projection matrix from the constant buffer. Here's the constant buffer structure, which is just an array of 2 matrices.

From VertexShaderShared.hlsl, included by VPRTVertexShader.hlsl:
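
A sketch of such a constant buffer (the register assignment is an assumption):

```hlsl
// One view/projection matrix per eye.
cbuffer ViewProjectionConstantBuffer : register(b1)
{
    float4x4 viewProjection[2];
};
```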

The render target array index must be set for each primitive being rendered. In the following snippet, output.viewId is mapped to the SV_RenderTargetArrayIndex semantic. Note that this requires support for an optional Direct3D 11.3 feature, which allows the render target array index semantic to be set from any shader stage.

From VPRTVertexShader.hlsl:

From VertexShaderShared.hlsl, included by VPRTVertexShader.hlsl:
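
A sketch of a VPRT-style vertex shader, rather than the template's exact listing (structure and buffer names are illustrative):

```hlsl
cbuffer ModelConstantBuffer : register(b0)          { float4x4 model; };
cbuffer ViewProjectionConstantBuffer : register(b1) { float4x4 viewProjection[2]; };

struct VertexShaderInput
{
    float3 pos    : POSITION;
    float3 color  : COLOR0;
    uint   instId : SV_InstanceID;
};

struct VertexShaderOutput
{
    float4 pos    : SV_POSITION;
    float3 color  : COLOR0;
    uint   viewId : SV_RenderTargetArrayIndex; // requires the optional feature
};

VertexShaderOutput main(VertexShaderInput input)
{
    VertexShaderOutput output;
    int idx = input.instId % 2;                 // even instances: left eye; odd: right
    float4 pos = mul(float4(input.pos, 1.0f), model);
    output.pos = mul(pos, viewProjection[idx]);
    output.color = input.color;
    output.viewId = idx;                        // route to the matching array slice
    return output;
}
```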

If you want to use your existing instanced drawing techniques with this method of drawing to a stereo render target array, all you have to do is draw twice the number of instances you normally have. In the shader, divide input.instId by 2 to get the original instance ID, which can be indexed into (for example) a buffer of per-object data: int actualIdx = input.instId / 2;

Important note about rendering stereo content on HoloLens

Windows Mixed Reality supports the ability to set the render target array index from any shader stage; normally, this is a task that could only be done in the geometry shader stage due to the way the semantic is defined for Direct3D 11. Here, we show a complete example of how to set up a rendering pipeline with just the vertex and pixel shader stages set. The shader code is as described above.

From SpinningCubeRenderer::Render:
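
A minimal sketch of the VPRT draw path (member names follow the template's conventions but are assumptions; vertex and index buffer setup elided):

```cpp
context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
context->IASetInputLayout(m_inputLayout.Get());
// ... IASetVertexBuffers / IASetIndexBuffer as usual ...

context->VSSetShader(m_vertexShader.Get(), nullptr, 0);
context->VSSetConstantBuffers(0, 1, m_modelConstantBuffer.GetAddressOf());

// No geometry shader is bound on this path.
context->PSSetShader(m_pixelShader.Get(), nullptr, 0);

// Two instances: one per render target array slice (left and right eye).
context->DrawIndexedInstanced(m_indexCount, 2, 0, 0, 0);
```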


Important note about rendering on non-HoloLens devices

Setting the render target array index in the vertex shader requires that the graphics driver support an optional Direct3D 11.3 feature, which HoloLens does support. If your app only needs to run on Microsoft HoloLens, it can safely rely on that technique alone.

You may also want to use the HoloLens emulator, which can be a powerful development tool for your holographic app, and to support Windows Mixed Reality immersive headsets attached to Windows 10 PCs. Support for the non-HoloLens rendering path, and therefore for all of Windows Mixed Reality, is built into the Windows Holographic app template as well. In the template, you will find code that enables your holographic app to run on the GPU in your development PC. Here is how the DeviceResources class checks for this optional feature support.

From DeviceResources::CreateDeviceResources:
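
A minimal sketch of the check (member names are illustrative):

```cpp
// Query the optional feature that lets the vertex shader write
// SV_RenderTargetArrayIndex without a geometry shader.
D3D11_FEATURE_DATA_D3D11_OPTIONS3 options = {};
m_d3dDevice->CheckFeatureSupport(
    D3D11_FEATURE_D3D11_OPTIONS3, &options, sizeof(options));

m_supportsVprt = (options.VPAndRTArrayIndexFromAnyShaderFeedingRasterizer == TRUE);
```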

To support rendering without this optional feature, your app must use a geometry shader to set the render target array index. This snippet would be added after VSSetConstantBuffers, and before PSSetShader, in the code example shown in the previous section on rendering stereo on HoloLens.

From SpinningCubeRenderer::Render:
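
A minimal sketch of the fallback binding (the flag name is illustrative):

```cpp
// On drivers without the optional feature, bind the pass-through
// geometry shader between the vertex and pixel shader stages.
if (!m_usingVprtShaders)
{
    context->GSSetShader(m_geometryShader.Get(), nullptr, 0);
}
```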

HLSL NOTE: In this case, you must also load a slightly modified vertex shader that passes the render target array index to the geometry shader using an always-allowed shader semantic, such as TEXCOORD0. The geometry shader does not have to do any work; the template geometry shader passes through all data, with the exception of the render target array index, which is used to set the SV_RenderTargetArrayIndex semantic.

App template code for GeometryShader.hlsl:
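
A sketch of such a pass-through geometry shader (structure names are illustrative):

```hlsl
// Per-vertex data from the vertex shader; the slice index arrives in TEXCOORD0.
struct GeometryShaderInput
{
    float4 pos    : SV_POSITION;
    float3 color  : COLOR0;
    uint   instId : TEXCOORD0;
};

// Identical data on output, with the index moved to SV_RenderTargetArrayIndex.
struct GeometryShaderOutput
{
    float4 pos   : SV_POSITION;
    float3 color : COLOR0;
    uint   rtvId : SV_RenderTargetArrayIndex;
};

[maxvertexcount(3)]
void main(triangle GeometryShaderInput input[3],
          inout TriangleStream<GeometryShaderOutput> outStream)
{
    GeometryShaderOutput output;
    [unroll]
    for (int i = 0; i < 3; ++i)
    {
        output.pos   = input[i].pos;
        output.color = input[i].color;
        output.rtvId = input[i].instId; // the vertex shader already computed the slice
        outStream.Append(output);
    }
}
```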

Present

Enable the holographic frame to present the swap chain

With Windows Mixed Reality, the system controls the swap chain. The system then manages presenting frames to each holographic camera to ensure a high quality user experience. It also provides a viewport update each frame, for each camera, to optimize aspects of the system such as image stabilization or Mixed Reality Capture. So, a holographic app using DirectX doesn't call Present on a DXGI swap chain. Instead, you use the HolographicFrame class to present all swapchains for a frame once you're done drawing it.

From DeviceResources::Present:
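
A minimal sketch (C++/CX):

```cpp
// Present all camera swap chains for this frame through the holographic frame.
HolographicFramePresentResult presentResult =
    holographicFrame->PresentUsingCurrentPrediction();
```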

By default, this API waits for the frame to finish before it returns. Holographic apps should wait for the previous frame to finish before starting work on a new frame, because this reduces latency and allows for better results from holographic frame predictions. This isn't a hard rule; if you have frames that take longer than one screen refresh to render, you can disable this wait by passing the HolographicFramePresentWaitBehavior parameter to PresentUsingCurrentPrediction. In this case, you would likely use an asynchronous rendering thread to maintain a continuous load on the GPU. Note that the refresh rate of the HoloLens device is 60 Hz, where one frame has a duration of approximately 16 ms. Immersive headsets can range from 60 Hz to 90 Hz; when refreshing the display at 90 Hz, each frame has a duration of approximately 11 ms.

Handle DeviceLost scenarios in cooperation with the HolographicFrame

DirectX 11 apps would typically want to check the HRESULT returned by the DXGI swap chain's Present function to find out if there was a DeviceLost error. The HolographicFrame class handles this for you. Inspect the HolographicFramePresentResult it returns to find out if you need to release and recreate the Direct3D device and device-based resources.

Note that if the Direct3D device was lost, and you did recreate it, you have to tell the HolographicSpace to start using the new device. The swap chain will be recreated for this device.

From DeviceResources::InitializeUsingHolographicSpace:
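
A minimal sketch of the hand-off (the interop device member is illustrative):

```cpp
// After recreating the Direct3D device, give it to the holographic space;
// the swap chains will be recreated on the new device.
m_holographicSpace->SetDirect3D11Device(m_d3dInteropDevice);
```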

Once your frame is presented, you can return back to the main program loop and allow it to continue to the next frame.

Hybrid graphics PCs and mixed reality applications

Windows 10 Creators Update PCs may be configured with both discrete and integrated GPUs. With these types of computers, Windows chooses the adapter the headset is connected to. Applications must ensure that the DirectX device they create uses the same adapter.

Most general Direct3D sample code demonstrates creating a DirectX device using the default hardware adapter, which on a hybrid system may not be the same as the one used for the headset.

To work around any issues this may cause, use the HolographicAdapterId from either HolographicSpace.PrimaryAdapterId() or HolographicDisplay.AdapterId(). This adapter ID can then be used to select the right IDXGIAdapter using IDXGIFactory4::EnumAdapterByLuid, as sketched below.

From DeviceResources::InitializeUsingHolographicSpace:

Code to update DeviceResources::CreateDeviceResources to use IDXGIAdapter
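
A minimal sketch of the adapter selection (error handling omitted):

```cpp
// Translate the holographic adapter ID into a LUID and find the matching
// DXGI adapter, so the D3D device is created where the headset is attached.
Windows::Graphics::Holographic::HolographicAdapterId id =
    m_holographicSpace->PrimaryAdapterId;

LUID adapterLuid;
adapterLuid.LowPart  = id.LowPart;
adapterLuid.HighPart = id.HighPart;

Microsoft::WRL::ComPtr<IDXGIFactory4> dxgiFactory;
CreateDXGIFactory2(0, IID_PPV_ARGS(&dxgiFactory));

Microsoft::WRL::ComPtr<IDXGIAdapter3> dxgiAdapter;
dxgiFactory->EnumAdapterByLuid(adapterLuid, IID_PPV_ARGS(&dxgiAdapter));

// Pass dxgiAdapter (instead of nullptr) to D3D11CreateDevice.
```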

Hybrid graphics and Media Foundation

Using Media Foundation on hybrid systems may cause issues where video does not render, or where the video texture is corrupt. This can occur because Media Foundation defaults to the system behavior mentioned above. In some scenarios, creating a separate ID3D11Device is required, to support multithreading and to ensure the correct creation flags are set.

When initializing the ID3D11Device, the D3D11_CREATE_DEVICE_VIDEO_SUPPORT flag must be included among the D3D11_CREATE_DEVICE_FLAG values. Once the device and context are created, call SetMultithreadProtected to enable multithreading. To associate the device with the IMFDXGIDeviceManager, use the IMFDXGIDeviceManager::ResetDevice function.

Code to associate a ID3D11Device with IMFDXGIDeviceManager:
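
A minimal sketch of the association (error handling omitted; `dxgiAdapter` as selected above; link against mfplat.lib):

```cpp
// Create a video-capable D3D11 device on the chosen adapter.
UINT flags = D3D11_CREATE_DEVICE_VIDEO_SUPPORT | D3D11_CREATE_DEVICE_BGRA_SUPPORT;
Microsoft::WRL::ComPtr<ID3D11Device>        device;
Microsoft::WRL::ComPtr<ID3D11DeviceContext> context;
D3D11CreateDevice(dxgiAdapter.Get(), D3D_DRIVER_TYPE_UNKNOWN, nullptr, flags,
                  nullptr, 0, D3D11_SDK_VERSION, &device, nullptr, &context);

// Opt the device into multithread protection.
Microsoft::WRL::ComPtr<ID3D10Multithread> multithread;
device.As(&multithread);
multithread->SetMultithreadProtected(TRUE);

// Associate the device with the Media Foundation DXGI device manager.
UINT resetToken = 0;
Microsoft::WRL::ComPtr<IMFDXGIDeviceManager> deviceManager;
MFCreateDXGIDeviceManager(&resetToken, &deviceManager);
deviceManager->ResetDevice(device.Get(), resetToken);
```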
