When a system is not rendering as fast as expected, there can be many different reasons.
In our experience, a common mistake is to assume that "it must be for the same reason as the previous time". This is a mistake because the same symptom can be related to many different causes. We strongly recommend approaching this type of problem with an open mind and not prejudging the cause, as prejudging has been shown to cause long delays in finding solutions.
There is always one component limiting the render speed, but it can be a different one on each render job. The first step is to identify the category it belongs to:
A - Hardware resources:
- Storage access
- CPU
- GPU
- RAM Memory
- Power supply & Temperature
B - Software related issues and operation
- Image codec
- Configuration Settings
- Others (operational issues...)
Recommended procedure to find which category is limiting the render speed
For persistent problems, the first recommendation is to reboot the system so that you start from a clean state. Problems related to RAM swapping, GPU memory fragmentation, and some others may produce a permanent slowdown of the system once they have been triggered, whatever you try afterwards, so it is important to start the diagnostics with a fresh boot.
Also check for obvious error messages.
- Launch Mistika in a console and keep it visible while rendering to check for abnormal errors.
- Check the content of the /var/log/messages file for hardware errors. You can also track the hardware feedback during rendering by opening a console and executing this:
su
tail -f /var/log/messages
Any hardware issue that happens afterwards, while rendering, may provide useful feedback in that console.
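The log check above can be narrowed to hardware-related lines with a small filter. This is only a minimal sketch: the function name scan_hw_log and the keyword list are illustrative assumptions, not an official list of kernel messages, and reading /var/log/messages usually requires root.

```shell
# Print the lines of a syslog-style file that mention common hardware trouble.
# The keyword list is an illustrative assumption; extend it for your hardware.
# usage: scan_hw_log [LOGFILE]
scan_hw_log() {
    log_file="${1:-/var/log/messages}"
    grep -i -E 'temperature|thermal|throttl|i/o error|nvrm|xid|hardware error|machine check' "$log_file"
}
```

For example, running "scan_hw_log /var/log/messages | tail -n 20" right after a slow render shows the most recent matching lines.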
If the cause of the problem is not evident, we recommend following these steps:
1 - Keep it simple. Reduce the timeline to the minimum number of items necessary to reproduce the problem, both in terms of the number of clips and the number of effects applied to them.
- Remove effects one by one to check their contribution to the render slowdown. If you find the effect that is causing a major slowdown, go to point 2.
- If it is still too slow with no effects applied, then the problem must be related to the source media files or the rendered media files. You can find out which as follows:
* If the playback is equally slow, then the probable cause is related to the source codec or its storage unit.
* Otherwise (if it is only slow at render time) then the probable cause is related to the rendered codec or its storage unit
In both cases, you can also use a simple format both as media source and as render format to confirm the cause. We recommend progressively replacing the media clips and the render format with the Mistika .js format, as it is the most predictable format. It does not require any GPU or CPU resources, so it depends only on storage resources. Then:
* If it is still too slow after changing all media and the render format to Mistika .js, then the problem is clearly related to the storage (or to the networked drives, if that is what is used). Use disk benchmark tools to confirm, and contact support for your storage or network devices. At this point the diagnosis is finished and the next points do not apply.
* Otherwise (if the render is fast with the Mistika format), the problem is related to the specific codec. If it is a compressed codec, go to mConfig and try increasing PerformanceOptions->PipeUnits/Threads or, if it was only slowing down at render time, increase PerformanceOptions->RenderUnits and try again. If this does not solve the speed problem, the next point may also help to study the problem further.
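As a rough stand-in for a disk benchmark tool, the storage check above can be approximated with dd, which is available on most Linux systems. This is a sketch under assumptions: the function name disk_speed_test and the default 1024 MB size are hypothetical, conv=fsync is a GNU dd option, and a stream of zeros may give optimistic figures on storage that compresses data. A dedicated benchmark tool remains the better choice for a final verdict.

```shell
# Write a test file and read it back, letting dd report the throughput.
# usage: disk_speed_test DIRECTORY [SIZE_MB]
disk_speed_test() {
    dir="$1"
    size_mb="${2:-1024}"
    test_file="$dir/render_speed_test.$$"
    # conv=fsync flushes the data to the device before dd reports its timing.
    dd if=/dev/zero of="$test_file" bs=1M count="$size_mb" conv=fsync 2>&1 | tail -n 1
    # The read may be served from the page cache, so treat this figure as an
    # upper bound unless the test file is larger than the installed RAM.
    dd if="$test_file" of=/dev/null bs=1M 2>&1 | tail -n 1
    rm -f "$test_file"
}
```

Run it against the directory that holds the source media and against the render destination, and compare both figures with the data rate that the chosen format requires.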
2 - Check CPU, GPU, and RAM resources.
At this point we should have ruled out storage and network hardware issues, and we should have a simple timeline with which to study the problem.
Now open the following diagnostic tools and keep them visible:
- In Mistika, open Edit->Extras->SystemMonitor->SystemLoad. It will show you the CPU activity and RAM usage.
- In Mistika, open Edit->Extras->NVidiaSettings. Among other settings, it will show you the GPU activity percentage, GPU memory usage and temperature.
Now start the render and, when the render speed drops below your expectations, check the following values in those tools.
- RAM usage: It should never go over 100%, and the swap indicator should remain at zero or very low. Otherwise the problem is related to RAM, and you should check these points:
* If the RAM usage curve always advances in a smooth growing diagonal while rendering the same clip, it looks like a memory leak (a potential software bug; in this case please contact Mistika support and provide an example timeline). In general, the RAM usage should grow in a staircase shape and then stay more or less constant for the render duration of a given clip.
* Otherwise, if the RAM usage curve grows in a staircase shape until it gets too close to 100% or goes over this value (swapping), then the timeline is too complex for the amount of RAM in your system. You can still apply one or more of the following strategies:
* Put more RAM in your system
* Decrease parameters in mConfig->PerformanceOptions (each one has a description, all of them have an impact in RAM usage)
* Use background render instead of foreground render. For example, send the render job to BatchManager and exit from Mistika. In this way, with Mistika closed, more RAM is available for the render.
* Reduce the complexity of the timeline (render by parts, and decrease unnecessary effect parameters related to RAM).
- GPU activity: All Mistika effects are GPU based (the CPUs are dedicated to compressed codecs), so the GPU is the limiting factor for most Mistika internal processing. Keep looking at the GPU usage percentage and the graphics memory usage (both are visible in the nvidia-settings tool). If either of those values goes over 90% too often, then the only solution is to install a faster GPU or one with more memory (except for the considerations about GPU memory in the next point). Also check the quality parameters of the effects to see if you can do the same job with simpler settings.
Pay special attention to mConfig->Codecs->RED R3D: Cudanew is the fastest option, but it can use a lot of GPU memory if GPUframes and GPUmemory are set high.
- CPU activity curves: The CPU activity curves (one curve per core) can produce many different patterns and it may be difficult to tell whether a pattern is normal or not, but some patterns are very indicative:
* No CPU core should be at 100% for a long period of time. If one is, it will be creating a bottleneck for the whole render process. If you see only one or a few cores at top usage but not the others, then the cause of the problem is a lack of parallelization. It is probably a camera source or render format that is not optimal at all, and there is not much that you can do (apart from using a different format whenever possible). Using CPUs with more cores will not help at all.
* A wave pattern with most cores going into high peaks on each wave. That is the normal behaviour for compressed codecs. If the peaks are too high (over 80%) and no cores remain inactive, then the only way to accelerate it is to use faster CPUs, as it means that you have reached the limits of the system for that particular codec. Also check that hyper-threading is active in the BIOS.
* Even when using well parallelized CPU based codecs it is rare to see total usage values over 70% or so. This is because the system threads (with hyper-threading) are not true CPU cores but virtual ones, and they share some resources. So if you cannot get more than 70% or so, it is perfectly normal.
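The RAM, swap and GPU values described in this step can also be sampled from a console, which is convenient when the render runs in BatchManager with Mistika closed. This is a minimal sketch, not an SGO tool: the function names are hypothetical, mem_usage assumes a kernel that reports MemAvailable in /proc/meminfo, and gpu_usage assumes the nvidia-smi utility from the NVIDIA driver is installed.

```shell
# Print RAM and swap usage from a /proc/meminfo-style file.
# usage: mem_usage [FILE]
mem_usage() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        /^SwapTotal:/    { stotal = $2 }
        /^SwapFree:/     { sfree = $2 }
        END {
            if (total > 0)
                printf "ram_used_pct=%d swap_used_mb=%d\n",
                       (total - avail) * 100 / total, (stotal - sfree) / 1024
        }
    ' "${1:-/proc/meminfo}"
}

# Print GPU load, GPU memory and temperature, if nvidia-smi is installed.
gpu_usage() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total,temperature.gpu \
            --format=csv,noheader
    else
        echo "nvidia-smi not found"
    fi
}

# Print one sample per interval until interrupted with Ctrl+C.
# usage: watch_render [SECONDS]
watch_render() {
    while :; do
        echo "$(date +%T) $(mem_usage) gpu: $(gpu_usage)"
        sleep "${1:-2}"
    done
}
```

Start "watch_render 2" in a console before launching the render. A ram_used_pct value that climbs steadily during a single clip, or a swap_used_mb value that keeps growing, corresponds to the RAM symptoms described above.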
3 - GPU Memory and RAM memory management
In complicated projects, if the same part of the timeline sometimes crashes and sometimes does not (during playback or rendering), it points to a memory management issue.
This is because Mistika manages memory in a special way to avoid memory fragmentation, which can reduce performance a lot (this applies both to RAM and to GPU memory):
- When it allocates memory, it will not return this memory until exiting the application.
- A memory buffer that has been allocated for a particular image size and bit depth will only be reused for images of the same size and bit depth.
Mistika needs to do this to avoid memory fragmentation, which is critical in order to maintain performance at high resolutions. But if you use several different large resolutions in the same session (that is, if the record monitor or a single render process passes through the various image formats), then you could end up running out of graphics memory or out of RAM, at which point Mistika will experience an extreme slowdown or will directly crash. (In case of doubt, you can track the memory allocation actions in the Mistika console output and, in the case of BatchManager, in the .log files.)
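To get a feeling for why mixing large formats in one session adds up quickly, the size of an uncompressed frame buffer can be estimated with simple arithmetic. This is an illustration only: the function names, the 4 channels, the 16 bits per channel and the count of 10 buffers are assumptions chosen for the example, and the real number and layout of the buffers that Mistika allocates are not documented here.

```shell
# Approximate size in bytes of one uncompressed frame buffer.
# usage: frame_bytes WIDTH HEIGHT CHANNELS BITS_PER_CHANNEL
frame_bytes() {
    echo $(( $1 * $2 * $3 * $4 / 8 ))
}

# Size in MiB of COUNT buffers of the same format (integer division, rounds down).
# usage: pool_mib WIDTH HEIGHT CHANNELS BITS_PER_CHANNEL COUNT
pool_mib() {
    echo $(( $(frame_bytes "$1" "$2" "$3" "$4") * $5 / 1048576 ))
}

frame_bytes 4096 2160 4 16      # prints 70778880
pool_mib 4096 2160 4 16 10      # prints 675
pool_mib 1920 1080 4 16 10      # prints 158
```

Because buffers are only reused for the same size and bit depth, a session that touches both formats in this example would hold roughly 675 + 158 = 833 MiB instead of the larger figure alone, and every additional format adds its own pool on top.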
When working with big resolutions it is important to keep an eye on both memory values (as explained in the previous point), and as a rule of thumb:
- If you are running out of memory during an interactive session, just exit and launch Mistika again, and try to avoid evaluating different parts of the timeline that use different formats in the same session.
- Always render using the BatchManager, and always do it in Split by segments mode. This technique executes a separate render per shot, so all the memory is freed and reallocated between segments. For complex projects Mistika is designed to render on a per shot basis. Doing long renders in one go, with many consecutive shots based on complex formats, is not efficient at all and it will create system instabilities.
4 - Other potential hardware issues that can produce render performance problems and crashes
If none of the above was conclusive, then check these points.
- Temperature problems in the GPU or CPU, and other hardware issues. These can happen due to inadequate ventilation, but also due to damaged hardware. In general, temperature issues can manifest themselves as system crashes, but also as very obvious performance problems without crashing.
* Use the nvidia-settings tool mentioned above to check for temperature problems during heavy rendering. The temperature should not go into the red zone, but the yellow zone is normal.
* Check the /var/log/messages file. Check for temperature issues on the CPUs, and also for other hardware warnings appearing under heavy rendering.
- Insufficient electrical power is also a known cause of both severe performance issues and crashes. SGO turnkey systems are tested to work with all components at 100%, but if you have added extra components or your workstation did not come from SGO, then we recommend testing with reduced power usage. If all other diagnostics have failed, remove all the unnecessary boards, install a smaller GPU and test again.
If there is a UPS, do a test bypassing the UPS and using a reliable power outlet. Most UPS units degrade over time, and some models cannot provide all the power that is required, which is known to affect workstation performance.
- Electrical issues: earth levels and phase differences. Another well known issue that can have a high impact on performance or stability is a difference in earth levels or in electrical phase between pieces of equipment (workstation, monitors (GUI and SDI), external hard disks, etc.). Try to organise a test with everything connected to the same power strip (only equipment connected through optical fiber does not need to be taken into account, as these cables do not carry electricity).
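When comparing test runs for the temperature and power checks in this step (for example with and without the UPS, or before and after removing boards), it can help to record the GPU temperature and power draw during each render and compare the peaks afterwards. This is a sketch under assumptions: the function names are hypothetical, it relies on the nvidia-smi utility being installed, and some GPUs do not report power draw.

```shell
# Append one CSV line (timestamp, GPU temperature in C, power draw in W)
# every INTERVAL seconds until interrupted with Ctrl+C.
# usage: log_gpu_health LOGFILE [INTERVAL]
log_gpu_health() {
    nvidia-smi --query-gpu=timestamp,temperature.gpu,power.draw \
        --format=csv,noheader,nounits -l "${2:-5}" >> "$1"
}

# Print the highest value found in one column of that CSV log.
# usage: max_column LOGFILE COLUMN_NUMBER
max_column() {
    awk -F', *' -v col="$2" '
        NR == 1 || $col + 0 > max { max = $col + 0 }
        END { if (NR > 0) print max }
    ' "$1"
}
```

After a render, "max_column gpu.log 2" gives the peak temperature and "max_column gpu.log 3" the peak power draw recorded in gpu.log, which can then be compared between test runs.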