Emulator
Started by carroacelera




79 posts in this topic
montcer9012
Guest


 
11-11-2012, 06:46 PM -
#21
(11-11-2012, 05:59 PM)Runo Wrote: CPU technology really advanced exponentially in the past decade.
Agreed.

Thanks for the info anyway!


Runo
Guest


 
11-12-2012, 04:29 AM -
#22
(11-11-2012, 06:22 PM)virgil94 Wrote: You're right. Great response. Anyway, I know that this emu will become faster in the future; however, it will be harder for RPCS3 because only one person is working on it now. I didn't know about interpreters and recompilers.

It is in its early stages. Emu cores normally aren't coded by large groups (partially because you have to be a little insane to do it :p). Once it is working somewhat, other devs will cheer up and come to work on stuff like sound, video, input, etc. The core is a bit punk :D

Just one last example of what a recompiler can do: if I turn off the JIT in Dolphin (the GC and Wii emu), most games I could run at a steady 60fps come down to 5-15fps. A recompiler can be very powerful for emulation, and it is interesting to note that we've been able to run the Tetris homebrew on RPCS3's interpreter at somewhat playable speeds.

A proper JIT would give us dozens more fps. Of course, the simplest PS3 game will be much, much heavier than that, not to mention that we are still emulating mere fractions of the core, which should be much heavier when done, but this actually makes me feel more hopeful about where this is going.
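
To make the interpreter/recompiler difference concrete, here is a minimal sketch in Python. The two-opcode instruction set and every name in it (`PROGRAM`, `interpret`, `recompile`, `run_recompiled`) are invented for illustration; this is not how RPCS3 or Dolphin are implemented. The interpreter decodes every instruction each time it runs, while the toy "recompiler" translates the block to host code once and then just calls it.

```python
# Toy guest program: each instruction is (opcode, operand).
# The instruction set is made up for this sketch.
PROGRAM = [("addi", 5), ("muli", 3), ("addi", -1)]


def interpret(program, acc, iterations):
    """Decode and dispatch every instruction each time it executes."""
    for _ in range(iterations):
        for op, arg in program:
            if op == "addi":
                acc += arg
            elif op == "muli":
                acc *= arg
            else:
                raise ValueError(f"unknown opcode {op!r}")
    return acc


def recompile(program):
    """Translate the block once into host (Python) code and return a callable."""
    lines = ["def block(acc):"]
    for op, arg in program:
        if op == "addi":
            lines.append(f"    acc += {arg}")
        elif op == "muli":
            lines.append(f"    acc *= {arg}")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    lines.append("    return acc")
    namespace = {}
    exec("\n".join(lines), namespace)  # host code is generated only once
    return namespace["block"]


def run_recompiled(program, acc, iterations):
    block = recompile(program)  # decode cost is paid a single time
    for _ in range(iterations):
        acc = block(acc)
    return acc
```

Both paths compute the same result; the recompiled path simply skips the per-instruction decode and dispatch on every pass, which is where a real JIT gets most of its speedup.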
hlide
Guest


 
11-12-2012, 11:36 PM -
#23
(11-07-2012, 04:39 PM)montcer9012 Wrote: No, I just use the emulators, and that's why I don't understand why PCSX2 gives all the work to the CPU.

GPUs have improved a lot over the last few years, so yes, I guess it could help to avoid overloading the CPU with processing that the GPU can handle. However, there must be a reason why emu developers don't do it; maybe it can't be used; I am not sure.

On old graphics cards, the GPU is too dedicated and specific to be used as a general-purpose GPU (aka GPGPU).

Keep in mind that "GPU" is a general term for a set of graphics processors working in parallel, so you cannot simply decide to recompile a single-core program into a myriad of small independent programs to run through a GPGPU, because their nature and goals differ. It would be totally inefficient because of the overhead of calling GPU kernel functions one after another from a CPU function, since that is how a single-core program runs. So you are better off having a dedicated thread to interpret or execute the recompiled code, another thread dedicated to a sound synthesizer, another thread to emulate a GPU using OpenGL/DirectX, another thread to do IO operations, etc.
Then let all the cores of the CPU handle those threads, so that their dedicated tasks run in parallel as much as possible.

Making a multi-threaded program efficient is another challenge, because it is not easy to find the best way to run all the threads smoothly with efficient synchronization (for instance, abusing mutexes tends to be counter-productive; that is why I mostly avoid them). Now try to be portable across several platforms and you will lose your hair.
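
The "one dedicated thread per emulated component" layout could be sketched roughly as follows. This is only an illustration of the structure: the names `component_worker`, `run_components` and `STOP` are hypothetical, and the "work" is just appending to a log. In CPython the global interpreter lock also limits how much pure-Python code actually runs in parallel, so do not read any performance claim into it.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker thread to exit


def component_worker(name, inbox, log):
    """Run one emulated component (sound, GPU, IO, ...) on its own host thread."""
    while True:
        command = inbox.get()  # blocks until the component has something to do
        if command is STOP:
            break
        log.append((name, command))  # stand-in for the component's real work


def run_components(commands_by_component):
    """Start one thread per component, feed each its commands, and wait for them."""
    logs = {name: [] for name in commands_by_component}
    inboxes = {name: queue.Queue() for name in commands_by_component}
    threads = [
        threading.Thread(target=component_worker, args=(name, inboxes[name], logs[name]))
        for name in commands_by_component
    ]
    for t in threads:
        t.start()
    for name, commands in commands_by_component.items():
        for command in commands:
            inboxes[name].put(command)
        inboxes[name].put(STOP)
    for t in threads:
        t.join()
    return logs
```

For example, `run_components({"sound": ["mix"], "gpu": ["draw", "flip"]})` runs the two components on separate threads and returns what each one processed, in order.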
Runo
Guest


 
11-13-2012, 12:45 AM -
#24
Quote: you cannot simply decide to recompile a single-core program into a myriad of small independent programs to run through a GPGPU, because their nature and goals differ. It would be totally inefficient because of the overhead of calling GPU kernel functions one after another from a CPU function, since that is how a single-core program runs.

I've never heard of this way of accessing the GPU; is it even practical? The closest thing I know of for direct GPU programming is libraries like PhysX, which of course must be supported by the GFX driver.

Quote: Making a multi-threaded program efficient is another challenge, because it is not easy to find the best way to run all the threads smoothly with efficient synchronization (for instance, abusing mutexes tends to be counter-productive; that is why I mostly avoid them). Now try to be portable across several platforms and you will lose your hair.

Some things can be multithreaded relatively easily, though. PCSX2, for example, has a multithreaded recompiler for its VU chip, which is kinda simple since the VU itself is a multicore chip. You just create a thread for each core, and since the real hardware is made of separate cores, they don't need synchronization. The sync problem comes when you try to split one emulation task into two or more threads.
montcer9012
Guest


 
11-13-2012, 01:40 AM -
#25
(11-12-2012, 11:36 PM)hlide Wrote: Now try to be portable across several platforms and you will lose your hair.
Hahahahaha, it's true.

Well, I guess all emu devs do what they do because they enjoy it; in that sense, making a multithreaded emu is just a matter of time, however long it takes, and in the end it will be done. I remember that a long time ago people were saying that making PCSX2 a multithreaded emu wouldn't be possible, and at that time the emu ran my games at 30~40 FPS; a few months later PCSX2 started using multithreading, and now I can run games at a constant 50~60 FPS on the same computer.
hlide
Guest


 
11-13-2012, 01:45 AM -
#26
(11-13-2012, 12:45 AM)Runo Wrote:
Quote: you cannot simply decide to recompile a single-core program into a myriad of small independent programs to run through a GPGPU, because their nature and goals differ. It would be totally inefficient because of the overhead of calling GPU kernel functions one after another from a CPU function, since that is how a single-core program runs.

I've never heard of this way of accessing the GPU; is it even practical? The closest thing I know of for direct GPU programming is libraries like PhysX, which of course must be supported by the GFX driver.
Using OpenCL or DirectCompute, you can write your kernel functions in a shader-like file, compile it, and have it run on the GPU as general-purpose code. OpenGL 4.3 now has compute shaders, which are similar to OpenCL's general-purpose kernels.

(11-13-2012, 12:45 AM)Runo Wrote:
Quote: Making a multi-threaded program efficient is another challenge, because it is not easy to find the best way to run all the threads smoothly with efficient synchronization (for instance, abusing mutexes tends to be counter-productive; that is why I mostly avoid them). Now try to be portable across several platforms and you will lose your hair.

Some things can be multithreaded relatively easily, though. PCSX2, for example, has a multithreaded recompiler for its VU chip, which is kinda simple since the VU itself is a multicore chip. You just create a thread for each core, and since the real hardware is made of separate cores, they don't need synchronization. The sync problem comes when you try to split one emulation task into two or more threads.
You are wrong when you say they do not need synchronization. There is always a need for synchronization between threads (usually something like a command/event queue, for instance, to tell the thread what to do in batches). It is not easy to find the most efficient mechanism to sync between threads, especially when portability is a concern. When you want to be portable, you will use generic synchronization, which may have major drawbacks (abusing mutexes, for instance, instead of using a well-designed command queue). The point is: not all emulator coders are pros in the multi-tasking paradigm.
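
As a rough illustration of telling a thread "what to do in batch", the sketch below collects commands locally and hands them to the worker in one queue operation per batch instead of one per command. `BatchingSender`, `consumer` and `demo` are hypothetical names, and Python's `queue.Queue` itself uses a lock internally, so this only shows the idea of reducing the number of synchronization points, not a lock-free design.

```python
import queue
import threading


class BatchingSender:
    """Collect commands locally and hand them over in one queue operation."""

    def __init__(self, channel, batch_size):
        self.channel = channel
        self.batch_size = batch_size
        self.pending = []
        self.handoffs = 0  # how many synchronized hand-overs were needed

    def send(self, command):
        self.pending.append(command)  # no synchronization here
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.channel.put(self.pending)  # one synchronized operation per batch
            self.pending = []
            self.handoffs += 1


def consumer(channel, results):
    """Worker thread: process whole batches until the None sentinel arrives."""
    while True:
        batch = channel.get()
        if batch is None:
            break
        results.extend(batch)  # stand-in for executing the commands


def demo(commands, batch_size):
    channel = queue.Queue()
    results = []
    worker = threading.Thread(target=consumer, args=(channel, results))
    worker.start()
    sender = BatchingSender(channel, batch_size)
    for command in commands:
        sender.send(command)
    sender.flush()     # push out any partial batch
    channel.put(None)  # tell the worker to stop
    worker.join()
    return results, sender.handoffs
```

With ten commands and a batch size of four, the worker still sees all ten commands in order, but only three hand-overs cross the thread boundary.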
Runo
Guest


 
11-13-2012, 10:29 AM -
#27
(11-13-2012, 01:45 AM)hlide Wrote: Using OpenCL or DirectCompute, you can write your kernel functions in a shader-like file, compile it, and have it run on the GPU as general-purpose code. OpenGL 4.3 now has compute shaders, which are similar to OpenCL's general-purpose kernels.

Meh, I should have thought about shader languages :D Still, I can't see how this can be used to emulate hardware as was suggested before. A GPU is designed for intensive parallel processing, and emulation is very much a serialized process; as you said yourself, there are big limits on what can be run in parallel. All I can think of is using it for texture decoding and other small stuff like that.
Still, you've fired up my curiosity, sir; I'll dig into this a bit.

(11-13-2012, 01:45 AM)hlide Wrote: You are wrong when you say they do not need synchronization. There is always a need for synchronization between threads (usually something like a command/event queue, for instance, to tell the thread what to do in batches). It is not easy to find the most efficient mechanism to sync between threads, especially when portability is a concern. When you want to be portable, you will use generic synchronization, which may have major drawbacks (abusing mutexes, for instance, instead of using a well-designed command queue). The point is: not all emulator coders are pros in the multi-tasking paradigm.

Hmm, I see. Still, I'm pretty sure there is some serious problem with splitting an emulated core into more than one thread, and it has to do with syncing problems. From what I recall, synchronization between those threads would have to be so tight that you wouldn't gain speed at all, while putting different chips or cores on different threads seems to get you some speedups if done properly. I don't remember the details; multithreading has never been my area of expertise :P (aka I suck hard at multithreaded programming :D)


hlide
Guest


 
11-13-2012, 02:02 PM -
#28
(11-13-2012, 10:29 AM)Runo Wrote: Hmm, I see. Still, I'm pretty sure there is some serious problem with splitting an emulated core into more than one thread, and it has to do with syncing problems. From what I recall, synchronization between those threads would have to be so tight that you wouldn't gain speed at all, while putting different chips or cores on different threads seems to get you some speedups if done properly. I don't remember the details; multithreading has never been my area of expertise :P (aka I suck hard at multithreaded programming :D)
That is precisely why I say "not all emulator coders are proficient in the multi-tasking paradigm".

From what I have seen of PCSX2, they use shaders massively for the graphics part. The plugin under OpenGL (not sure if it works) has very complex generated shaders.


Runo
Guest


 
11-13-2012, 03:07 PM -
#29
But then are you saying someone with the right skills could multithread a single-chip emulation without losing performance across cores? Because I've never heard of anyone who did this. (A dev from the Dolphin emulator team tried a while back: he wanted to write a multithreaded JIT that split the emulated CPU thread into two logical threads, but he gave up, and he told us over the IRC channel that he wasn't gaining much speed even though it was using all of his CPU cores, because of the need for tight thread syncing.)

The emulated CPU is single-core, so it works in a serialized manner. Even if you parallelize the emulation, one thread will need to wait for the other; I don't see how that could be optimized.
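
A small sketch of why the waiting is hard to avoid, assuming a made-up chain of operations where every step needs the previous result (the names `run_serial` and `run_split` are hypothetical, and this is not the Dolphin experiment). Splitting the chain across two threads forces one hand-over per step, so the threads take turns instead of overlapping.

```python
import threading


def run_serial(values, acc=0):
    """Reference: a serial chain where every step needs the previous result."""
    for v in values:
        acc = acc * 2 + v
    return acc


def run_split(values, acc=0):
    """Alternate the same steps between two threads, handing off after each one."""
    state = {"acc": acc, "index": 0, "handoffs": 0}
    # turn[n] is released when thread n is allowed to execute the next step.
    turn = [threading.Semaphore(1), threading.Semaphore(0)]

    def worker(me):
        other = 1 - me
        while True:
            turn[me].acquire()  # wait for the other thread's result
            i = state["index"]
            if i >= len(values):
                turn[other].release()  # let the other thread exit too
                return
            state["acc"] = state["acc"] * 2 + values[i]
            state["index"] = i + 1
            state["handoffs"] += 1
            turn[other].release()  # hand the dependency chain over

    threads = [threading.Thread(target=worker, args=(n,)) for n in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["acc"], state["handoffs"]
```

The split version gives the same answer as the serial one, but it needs as many hand-overs as there are steps, and at any moment only one of the two threads is doing useful work.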
gid15
Guest


 
11-13-2012, 05:17 PM -
#30
(11-13-2012, 03:07 PM)Runo Wrote: But then are you saying someone with the right skills could multithread a single-chip emulation without losing performance across cores? Because I've never heard of anyone who did this. (A dev from the Dolphin emulator team tried a while back: he wanted to write a multithreaded JIT that split the emulated CPU thread into two logical threads, but he gave up, and he told us over the IRC channel that he wasn't gaining much speed even though it was using all of his CPU cores, because of the need for tight thread syncing.)

The emulated CPU is single-core, so it works in a serialized manner. Even if you parallelize the emulation, one thread will need to wait for the other; I don't see how that could be optimized.
I'm planning to add real multi-threading for the PSP threads in Jpcsp. Jpcsp is already using multiple cores for the emulation of the PSP hardware, but the PSP threads currently always run one at a time (like on a real PSP). Most of the applications coded for the PSP do, however, already use thread synchronization (Mutex/Sema/EventFlag/...) internally. But they were never tested with real hardware multi-threading, as the PSP has a single MIPS CPU ;). This feature would be available as an option, for PSP applications programmed in a thread-safe way :).
The most difficult part of this approach will be making the Jpcsp HLE calls thread-safe (or they could be serialized automatically). The implementation of sceKernelCpuSuspendIntr/sceKernelCpuResumeIntr, which are often used by PSP applications to protect a critical section, is also a challenge. On a PSP, these two calls are lightweight and do not involve much overhead. On a truly multi-threaded system, they would require stopping all the other running threads, which is not easy to do without adding too much overhead.

The point is, when going beyond the capabilities of the emulated system, a solution doesn't have to be 100% perfect. It can be offered as an option or can work only under some conditions.
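
One imperfect approximation of that kind can be sketched in Python (this is not Jpcsp code, which is written in Java). The functions `cpu_suspend_intr` and `cpu_resume_intr` are hypothetical, simplified stand-ins for the two PSP calls: instead of stopping every other thread, they take a single global lock, which only protects against other guest threads that use the same pair of calls.

```python
import threading

# Hypothetical global lock standing in for "interrupts disabled".
_intr_lock = threading.RLock()


def cpu_suspend_intr():
    """Simplified stand-in: enter the global critical section."""
    _intr_lock.acquire()


def cpu_resume_intr():
    """Simplified stand-in: leave the global critical section."""
    _intr_lock.release()


def guest_thread(counter, iterations):
    """A guest thread that protects a shared counter the way PSP code often does."""
    for _ in range(iterations):
        cpu_suspend_intr()
        try:
            counter["value"] += 1  # critical section
        finally:
            cpu_resume_intr()


def run_guest_threads(thread_count, iterations):
    counter = {"value": 0}
    threads = [
        threading.Thread(target=guest_thread, args=(counter, iterations))
        for _ in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

This keeps the calls cheap, at the price of correctness for applications that rely on suspended interrupts to exclude threads that never call them, which is why it would only make sense as an option.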

