Totem Configuration Procedure


Introduction


Totem is only available on Linux platforms.


Totem is a cluster render technology that allows several render nodes to render the same clip.


It also allows two GPUs in the same system to render the same clip, by sending different frame packets to each one.


The main purpose is to control all the available systems (or GPUs) from a hero suite during client-attended sessions, in order to render one clip as fast as possible for realtime playback. This tool makes it possible to use all the available hardware resources at the same time.

Totem's design also allows the clip to be played back while it is still being rendered.


Please note that the Mistika Totem tool is different from Mistika BatchManager, which is a render dispatcher:

 

- BatchManager takes independent render jobs from render queues and sends each one to the next available node. Unlike Totem, it supports all render formats (including audio) and is optimized for the best global render speed (the best total render time for the total number of clips). To achieve this, each render job is sent to only one node. BatchManager can also work as a background service (unattended), while Totem only renders a single clip, and only at user request.


- Totem, meanwhile, only supports the Mistika .js image format (and a few enumerated formats) and cannot render audio, but in exchange it can use all GPUs of all nodes (including the local system) to collaborate on the same clip at the same time.


Totem can only be launched at user request from the Mistika interface:


 -  Render->Totem:  


In Background mode it uses all GPUs from all systems except those in the local system. Background mode is typically used when you want to play back the clip while it is being rendered by other systems, or when you want to send the job to other nodes while you continue working on the local system unaffected.


In Foreground mode it also adds the local GPUs to the render pool, including the one used by your Mistika session. This is logically the fastest way to render a clip, but depending on the nature of the job the local system may become much less responsive and slower until the render is finished.


 - Edit->PlaybackCache->RenderWithTotem: provides similar functions, but renders a Playback Cache (a temporary .js file) for the current effect (where the monitor mark is located), rather than rendering a new user-defined clip.

 


Notes


A "totem render node" is not necessarily a computer, although that is the normal case for single-GPU systems. If one computer has more than one GPU, the Totem interface allows configuring each GPU as a separate render node (recommended).


This document focuses on the Totem configuration, but it is recommended to configure BatchManager first. BatchManager will help you configure all the basic network settings (/etc/hosts, ssh service, path shares...), making it easier to configure Totem later.


Totem can accelerate render speed in a linear manner (for example, using 5 render nodes can really render a clip up to 5 times faster), but only if the storage can provide enough bandwidth for all nodes. Storage access is the most important point when planning a Totem infrastructure, because you will have many computers reading and writing uncompressed files at the same time.
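As a back-of-envelope sizing sketch (the frame format, frame rate and node count below are assumed example figures, not Mistika defaults):

```shell
# Assumed figures: 4096x2160 frames, RGB, 16 bits per channel, 24 fps, 5 nodes.
W=4096; H=2160; CHANNELS=3; BYTES_PER_CHANNEL=2; FPS=24; NODES=5

FRAME_BYTES=$(( W * H * CHANNELS * BYTES_PER_CHANNEL ))   # bytes per uncompressed frame
PER_NODE=$(( FRAME_BYTES * FPS ))                         # what a single node must read per second
TOTAL=$(( PER_NODE * NODES ))                             # what the storage must sustain overall

echo "Per frame:  $FRAME_BYTES bytes"
echo "Per node:   $PER_NODE bytes/s"
echo "All nodes:  $TOTAL bytes/s"
```

With these numbers each node alone needs roughly 1.3 GB/s, so five nodes will saturate any storage that cannot sustain around 6.4 GB/s.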


In general, MistikaConfig->Totem will configure most aspects automatically, so we recommend simply trying it and seeing what happens. Then, if it does not work or if you need a customized configuration, you will need to check the following aspects:

The following points are only for diagnostics and deeper understanding; they should not be necessary on turnkey installations.



Check Totem license


The Totem license is installed as a separate line in the same license file as Mistika (/var/flexlm/sgoLicenseV5.dat), clearly identified with the "TOTEM" feature. If you do not have this line, that is the first thing to solve.



Check hostname


The hostname of a render node (or the local system) cannot contain a domain (no dot "." characters are permitted in the hostname). You can execute "hostname" in a console to see the current hostname of a system.


For example:


mistika1 is a correct hostname


mistika1.sgo.es is not a correct hostname, because it has the domain (sgo.es) as part of the hostname


Instead, if you need to resolve a name like that, we recommend keeping the hostname as mistika1 and simply putting both names on the same line in the /etc/hosts file:


185.114.227.32 mistika1 mistika1.sgo.es
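A quick way to test the no-dot rule from a console (a small sketch; the function name is ours, not a Mistika tool):

```shell
# Prints OK when a name is a valid Totem hostname (no dots), INVALID otherwise.
check_hostname() {
  case "$1" in
    *.*) echo "INVALID" ;;
    *)   echo "OK" ;;
  esac
}

check_hostname "$(hostname)"
```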



Check hosts file


On all computers (both the Mistika local system and the render nodes), edit the /etc/hosts file as follows:


Check the local hostname and local IP with the commands "hostname" and "hostname -i".


Edit the /etc/hosts file to ensure there is a line matching the IP with the hostname. For example, if the hostname is mistika1 and the IP is 185.114.227.32, all systems will need to have this line in /etc/hosts:


185.114.227.32 mistika1


And make sure that mistika1 does not appear on any other line.


Add a new line for each remote machine.


NOTE: IPs should be assigned statically, not via DHCP.


These steps have to be done on all systems, with exactly the same lines in /etc/hosts. (Each computer's hostname must appear with an identical name in the /etc/hosts of the other computers.)
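The one-line-per-host rule can be spot-checked with a sketch like this (the node names are examples):

```shell
# Prints the number of /etc/hosts lines that mention a given name as a whole word.
count_host_lines() {
  grep -cw "$2" "$1"
}

# Each node name should be reported on exactly 1 line:
for n in mistika1 render1 render2; do
  echo "$n: $(count_host_lines /etc/hosts "$n") line(s)"
done
```

Any name reported on 0 lines is missing, and any name on 2 or more lines is the duplicate case warned about above.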


Check that all computers can "ping" the others. In the previous example, this command:


ping mistika1


must work from all computers (including mistika1).
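To check all nodes in one go, a small sketch (the node list is an example):

```shell
# Pings each given hostname once and reports whether it answered.
check_nodes() {
  for n in "$@"; do
    if ping -c 1 -W 2 "$n" >/dev/null 2>&1; then
      echo "$n: reachable"
    else
      echo "$n: FAILED"
    fi
  done
}

check_nodes mistika1 render1 render2
```

Run it from every computer in the pool; any FAILED line points to a missing /etc/hosts entry or a network problem.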



ssh configuration


Once the network is working at "ping" level and all hostnames are resolved, you need to make sure that the "ssh" service allows the main Mistika user to connect to all the other computers, in order to log in on them and launch remote commands.


For example, if you want to use Totem in a Mistika session running on the mistika1 system, and you want to use the render1 and render2 nodes for rendering, these commands need to work without error when executed from mistika1:


ssh render1

ssh render2


If they don't, the easiest way to fix it is to configure BatchManager on all the systems, which will create ssh keys for all of them and exchange the keys between the systems, so they can send render commands to each other.


To do that, open the BatchManager tab in mConfig and activate the Use Batch Manager checkbox. This triggers an automatic process to enable communication between the render machines. You need to do it on all computers, indicating the same folder as the render queues root folder. This prepares all the computers to act as render nodes and creates ssh keys for them.


Once this is finished, open mConfig a second time on all systems. In this second round, mConfig will detect that new render nodes are available (when opening mConfig, a dialog will pop up indicating the new render nodes that were found) and will get the ssh keys from them.


Note: For more details check BatchManager or ssh documentation.
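If you prefer to set up the keys by hand instead of letting BatchManager do it, the standard ssh procedure looks like this sketch (render1/render2 are example hostnames):

```shell
# On the workstation: create a key pair once (no passphrase), then copy the
# public key to each render node so ssh stops asking for a password.
mkdir -p -m 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa -q

for n in render1 render2; do
  ssh-copy-id "$n" || echo "could not reach $n"
done

# Verify: should print the remote hostname without prompting for a password.
ssh -o BatchMode=yes render1 hostname || echo "render1 still needs a password"
```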



Totem settings in mConfig


Open the TOTEM tab and activate the Use Totem checkbox.


Click on Manage Totem Nodes


Add the new nodes by typing the hostname, choosing the GPU unit and the number of parallel Instances, activating Use, and pressing Add.


Note: If a system has two GPUs, add one Totem node for each of them, both with the same hostname, but selecting GPU_0 for one and GPU_1 for the other.


The system will check the status of that node. In case of errors, a See log button allows analysing what happened in more detail.


Note: There are two parameters affecting render performance.


- ClusterRenderUnits is the number of frames sent to each node in one packet (default is 4). Grouping a few frames helps optimise disk IO performance and optical flow effects. But setting it too high is not efficient at all, as each node may receive a very different load (especially with small clips), and because you will need more time to start a playback (Totem permits playback while the clip is still rendering, but all nodes need to have finished a packet of frames before you can go through the whole segment that they form).


- Instances. This is normally set to 1 and very rarely changed. It is the number of independent render processes created on the same computer; in a sense, it is like having several virtual render nodes on one machine. Using more than one may only help on systems with many CPU cores, and only if the codec used by the source images does not parallelise well. However, parallel instances need to share the GPU at the same time, so this value should not be increased without a reason: in most cases doing so will reduce performance significantly rather than the opposite.
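The effect of ClusterRenderUnits on load balancing can be illustrated with a quick calculation (a sketch, not Mistika code):

```shell
# Number of frame packets a clip produces: ceiling(frames / ClusterRenderUnits).
packets_for() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

packets_for 100 4    # 100-frame clip, default units=4: 25 packets to distribute
packets_for 100 25   # same clip with units=25: only 4 packets in total
```

With 5 render nodes, units=25 leaves one node idle (only 4 packets exist), while the default units=4 gives every node several packets over which the load can balance.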



Using Totem in Mistika


In the render panel, choose Mistika .js as the render format (some enumerated formats will also work, but they will be slower, and movie formats will not work at all).


The Totem tab on the left side of the Output panel shows the list of nodes and their status (one per instance). They should appear as "Ready".


To use the available Totem nodes, activate the Totem button on the right side of the Output->Render panel. While it is active, subsequent renders will try to use all the active Totem nodes in this way:


- When doing a foreground render, all the Totem nodes will be used, including the GPUs of the local node where Mistika is running.


- When doing a background render, the GPUs of all nodes are used except the ones in the local node (the one where Mistika is running).


In addition to render processes launched from the render panel, Totem can also be used to render PlaybackCache files (via the Totem submenu options under the Edit->PlaybackCache menu). However, it is recommended to test Totem with the render panel before trying to make it work with the playback cache, as it is easier and more interactive to see whether it works or what is going on.