Information to improve game engine

Maybe someone could use this information to improve the BGE.

/uploads/default/original/4X/a/1/e/a1e41479e76934613804294e27407d40cce5c403.png
(Source Image from xkcd)

Threading is not a simple problem. Reading a single stackexchange post won’t make it any simpler.

You read the whole thing?

It would help if you could point out what exactly you mean. I read it and don’t know what you are actually referring to.

I did read the whole thing. In summary:
“Threading is hard.”
“There are three competing ways to do it depending on the situation”
“Make sure the overhead from context switching doesn’t outweigh the performance gains”
“How the heck can we get data from thread A to thread B without things blowing up?”

On a side note, the only time I’ve ever managed to segfault Python (not BGE, plain ol’ Python) was doing some threading stuff. Threading is the fastest route to program instability, lockups, logic loops, race conditions and a whole host of other nasty things.

A final thing: having the information is different from actually doing it. The engineer who designs an aircraft probably doesn’t know enough about a milling machine to make the parts he’s designed.

I am not coding anything with this.
Here are some neat parts.

However, sometimes it’s just easiest, due to politics, existing code, or other frustrating circumstances, to give each subsystem a thread. In that case, it’s best to avoid making more OS threads than cores for CPU heavy workloads (if you have a runtime with lightweight threads that just happen to balance across your cores, this isn’t as big of a deal). Also, avoid excessive communication. One nice trick is to try pipelining; each major subsystem can be working on a different game state at a time. Pipelining reduces the amount of communication necessary among your subsystems since they don’t all need access to the same data at the same time, and it also can nullify some of the damage caused by bottlenecks. For example, if your physics subsystem tends to take a long time to complete and your rendering subsystem ends up always waiting for it, your absolute frame rate could be higher if you run the physics subsystem for the next frame while the rendering subsystem is still working on the previous frame. In fact, if you have such bottlenecks and can’t remove them any other way, pipelining may be the most legitimate reason to bother with subsystem threads.
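To make the pipelining idea concrete, here is a minimal Python sketch of the scheme described above: the physics stage works on the next frame while the rendering stage finishes the previous one, handing state across through a bounded queue. The `simulate_physics` and `render` functions are hypothetical stand-ins for real subsystem work.

```python
import queue
import threading

# Hypothetical stand-ins for real subsystem work.
def simulate_physics(frame):
    return {"frame": frame, "positions": [frame * 0.1] * 4}

def render(state):
    return f"rendered frame {state['frame']}"

def pipeline(num_frames):
    # A bounded queue of depth 1 lets physics run at most one frame
    # ahead of rendering; put() blocks if rendering falls behind.
    handoff = queue.Queue(maxsize=1)
    rendered = []

    def physics_stage():
        for frame in range(num_frames):
            handoff.put(simulate_physics(frame))
        handoff.put(None)  # sentinel: no more frames

    def render_stage():
        while True:
            state = handoff.get()
            if state is None:
                break
            rendered.append(render(state))

    threads = [threading.Thread(target=physics_stage),
               threading.Thread(target=render_stage)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rendered
```

With this layout the two subsystems only ever touch the state object at different times, which is what cuts down the communication the quote is talking about.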

It can take a bit of mind twisting, but if you can break things up as a job queue with a set of worker threads it will scale much better in the long run. As the latest and greatest chips come out with a gazillion cores, your game’s performance will scale along with it, just fire up more worker threads.

So basically, if you’re looking to bolt on some parallelism to an existing project, I’d parallelize across subsystems. If you’re building a new engine from scratch with parallel scalability in mind, I’d look into a job queue.
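The job-queue-plus-worker-threads shape mentioned above can be sketched in a few lines of Python. This is only an illustration of the scaling idea (`run_jobs` is a made-up name, and in CPython the GIL limits the speedup for pure-Python work, so a real engine would do this in native code):

```python
import os
import queue
import threading

def run_jobs(jobs, num_workers=None):
    """Drain a queue of callables with a pool of worker threads."""
    if num_workers is None:
        # One worker per logical core is a common starting point,
        # matching the "fire up more worker threads" advice.
        num_workers = os.cpu_count() or 4

    job_queue = queue.Queue()
    for job in jobs:
        job_queue.put(job)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = job_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, worker retires
            result = job()
            with lock:
                results.append(result)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results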

I would suggest designing each system as a self-contained module that you could thread if you wanted. This usually means having very clearly defined communication pathways between the module and the rest of the engine. I particularly like read-only processes like rendering and audio, as well as “are we there yet” processes like reading player input, as candidates for threading off. To touch on the answer given by AttackingHobo: when you are rendering at 30-60 fps, if your data is 1/30th-1/60th of a second out of date, it really is not going to detract from the responsive feel of your game. Always remember that the main difference between application software and video games is doing everything 30-60 times a second. On that same note, however, input may be one of the things you want to keep on the main thread so the rest can react to it as soon as it appears :)

If you design your engine’s systems well enough any of them can be moved from thread to thread to load balance your engine more appropriately on a per-game basis and the like. In theory you could also use your engine in a distributed system if need be where entirely separate computer systems run each component.

I create one thread per logical core (minus one, to account for the main thread, which incidentally is responsible for rendering, but otherwise acts as a worker thread too).

I collect input device events in real time throughout a frame, but don’t apply them until the end of the frame: they take effect in the next frame. And I use similar logic for rendering (old state) versus updating (new state).

I use atomic events to defer unsafe operations until later in the same frame, and I use more than one event queue (job queue) in order to implement a memory barrier that gives an iron-clad guarantee regarding order of operations, without locking or waiting (lock-free concurrent queues in order of job priority).

It is worth mentioning that any job can issue sub-jobs (which are finer-grained, and approach atomicity) to the same priority queue or to a higher one (served later in the frame).

Given I have three such queues, all threads except one can potentially stall exactly three times per frame (while waiting for other threads to complete all outstanding jobs issued at the current priority level).

This seems an acceptable level of thread inactivity!
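As a rough illustration of the deferred-operations idea described above (not the poster’s actual lock-free implementation — this sketch drains on a single thread, and the class and method names are made up), the priority-ordered queues could look like this:

```python
import queue

class DeferredOps:
    """Collect operations during the parallel phase and apply them
    in priority order at a safe point in the frame. A simplified
    sketch of the scheme described above."""
    NUM_PRIORITIES = 3

    def __init__(self):
        # One thread-safe queue per priority level; queue.Queue may
        # be fed from any worker thread at any time.
        self.queues = [queue.Queue() for _ in range(self.NUM_PRIORITIES)]

    def defer(self, priority, op):
        self.queues[priority].put(op)

    def drain(self):
        # Served in priority order: every level-0 op completes before
        # any level-1 op runs, giving the ordering guarantee. While
        # draining, an op may enqueue sub-jobs to the same level or a
        # higher (later-served) one, and they will still be picked up.
        executed = []
        for level in range(self.NUM_PRIORITIES):
            q = self.queues[level]
            while not q.empty():
                op = q.get()
                executed.append(op(self))
        return executed
```

Each op receives the `DeferredOps` instance so it can issue sub-jobs, mirroring the sub-job rule mentioned above.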

My original post still stands:
/uploads/default/original/4X/a/1/e/a1e41479e76934613804294e27407d40cce5c403.png

I suggest you go and play with Python’s threading module and see how much havoc that causes. It’s not something that will take a person a day to do, or even a week or a month. Adding threading to a game engine takes man-years of work.

Things seem simple when presented in a well written manner, but in reality it is often far harder.

@Lostscience, that’s not very unusual information. What @sdfgeoff wrote is very much true in this context. It is well known that the Blender Game Engine has some architectural issues, and it is also well known that this is not a usable basis for multithreading. Multithreaded code is far more delicate to deal with and very complicated to debug. At the moment, this is simply not a realistic goal to achieve.

More generally, finding some abstractly related code and hoping that it will improve the game engine is misguided. Code isn’t itself that useful - most people can write it, and few ideas are novel.

I thought the issue with using threads in Python was its unusable implementation rather than the concept of threading itself. It is well known that the Python developers do not see threads (or even high performance) as a priority and are not planning to do any work on it.

From what I’ve gleaned about threading, the main thing is that a separate thread always has to reach a condition where it can come to a stop. You also have to remember that you can only do efficient threading if there are clear parts of the program that can be computed separately or in parallel (before sending the result to the main thread). In most cases, multithreading in game engines nowadays is used in the areas that can easily be divided, such as physics (each thread having its own batch of objects to calculate).
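The batch-per-thread idea for physics can be sketched like this in Python (the `integrate` toy function and object fields are hypothetical, and in CPython the GIL limits the speedup for pure-Python work — a real engine does this in native code, but the partitioning is the same):

```python
from concurrent.futures import ThreadPoolExecutor

def integrate(obj, dt):
    # Toy "physics": advance position by velocity (made-up fields).
    return {"pos": obj["pos"] + obj["vel"] * dt, "vel": obj["vel"]}

def step_physics(objects, dt, num_threads=4):
    # Split objects round-robin into one batch per thread; batches
    # are independent, so no locking is needed during the step.
    batches = [objects[i::num_threads] for i in range(num_threads)]

    def step_batch(batch):
        return [integrate(obj, dt) for obj in batch]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = pool.map(step_batch, batches)  # preserves batch order
    return [obj for batch in results for obj in batch]
```

The results are only merged back after every batch finishes, which is the per-frame synchronization point.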

@acedragon:
When I was working with Python threads, I used the producer->consumer model. Some things would create data and pass it on. Other things would take and consume data. This way threads don’t have to come to a stop. They may stop processing their data and sit there doing nothing (e.g. no data has been passed to the consumer). This works well with external operations such as networking, calling system programs etc., but in a game engine you have the issue that things have to synchronize every frame. It is this synchronization that makes life hard, as suddenly the threads are no longer independent processes, but depend heavily on data from other threads. As a result, scheduling becomes an issue.
In game engines, there is also the problem that many things cannot be broken up into small chunks easily. One for physics, one for rendering, maybe 1 or 2 for logic and then what? My i7 has 8 logical cores. What do we do with them all?
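The producer->consumer model described above looks roughly like this in Python (a minimal sketch with made-up names; the consumer never has to reach a stop condition on its own, it just blocks on the queue until data or a shutdown sentinel arrives):

```python
import queue
import threading

def producer_consumer(items):
    channel = queue.Queue()  # thread-safe handoff between the two
    consumed = []

    def producer():
        for item in items:
            channel.put(item)     # create data and pass it on
        channel.put(None)         # sentinel tells the consumer to quit

    def consumer():
        while True:
            item = channel.get()  # blocks, doing nothing, while idle
            if item is None:
                break
            consumed.append(item.upper())  # "consume" the data

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```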

Originally the problem with Python threading (the threading module) was that it was ‘fake’ threading. It still operated in a single OS thread, but allowed blocking operations (such as networking waits and raw_input()) to execute without blocking the other “thread(s)”. In the BGE, this meant that you could run parallel threads, but all the threads still had to finish inside a single logic tick, and because you generally aren’t using blocking operations inside a logic tick anyway, there was no advantage.
This was addressed by Python’s ‘multiprocessing’ module, whereby the workers become separate OS processes that are truly independent. This is what I used, and they worked fine, though you still hit the normal logistical issues of parallel code. With the advent of queues, a lot of the locking issues can be avoided.
There is a great slideshow over here which is what introduced me to multi-threading in Python.
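For reference, the standard-library module that gives real OS-level parallelism is `multiprocessing`; a minimal sketch (the `parallel_squares` name is made up, and the `__main__` guard matters on platforms that spawn rather than fork):

```python
import multiprocessing

def square(n):
    # CPU-bound work runs in a separate OS process, so the parent
    # interpreter's GIL does not serialize it.
    return n * n

def parallel_squares(numbers, workers=2):
    # Pool.map preserves input order in its results.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(square, numbers)

if __name__ == "__main__":
    print(parallel_squares([1, 2, 3, 4]))
```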

I fear that using the collection of mumbo-jumbo written in that stack exchange discussion will do more harm than good.
Some parts are exquisitely laughable (especially the networking stuff).
The answer to both questions, “how many threads should I have” and “for what”, is in Amdahl’s law (also known as the “it depends on what the heck you’re doing” law).
Game engines spend most of their time in the rendering stage, which is not parallelizable due to the way OpenGL and DirectX are/were implemented.
Because of that, throwing threads around can only have a significant impact if the game is not really rendering much stuff.
That’s what the participants in that discussion who claimed miraculous benefits to their engine experienced: not enough polygons to saturate the rendering pipeline.
It’s also the reason why Vulkan, despite being a godawful thing to look at, is good news: it allows the parallel execution of the most expensive part of the computation.

Tangent note: the threading issue with Python exists not in the language but in the CPython implementation. That virtual machine has a lock (the infamous Global Interpreter Lock) that prevents two threads from executing bytecode at the same time. Because of that, CPython threads offer zero parallelism. It is often cited that you can still achieve parallelism for I/O-bound tasks, which is an obscenity. Jython and IronPython (JVM and .NET implementations of the 2.7 version of the same language) do not share the same issue.