Are you asking that people should decompile game code to something workable and then recompile it to x86? That is an enormous task. And it is not generic either; you have to do the work for every game. Basically porting a huge collection of PS3 games.
(05-17-2014, 07:33 PM)gamenoob Wrote: [ -> ]Are you asking that people should decompile game code to something workable and then recompile it to x86? That is an enormous task. And it is not generic either; you have to do the work for every game. Basically porting a huge collection of PS3 games.
Isn't this what the SPU recompiler already does? I see a ton of .log files with x86 assembler in them
(05-17-2014, 09:53 PM)ssshadow Wrote: [ -> ] (05-17-2014, 07:33 PM)gamenoob Wrote: [ -> ]Are you asking that people should decompile game code to something workable and then recompile it to x86? That is an enormous task. And it is not generic either; you have to do the work for every game. Basically porting a huge collection of PS3 games.
Isn't this what the SPU recompiler already does? I see a ton of .log files with x86 assembler in them
I think he meant statically (and I doubt RPCS3 is going to suddenly gain a static PPU recompiler), whereas the SPU recompiler is a JIT.
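The practical difference is when the translation happens: a JIT translates guest code blocks lazily, as execution reaches them, while a static recompiler would have to translate the whole binary up front. A very rough C++ sketch of the JIT side, purely for illustration (none of these names are actual RPCS3 code):
Code:
#include <cstdint>
#include <unordered_map>

// Hypothetical illustration only, not actual RPCS3 code.
// A JIT translates a guest code block the first time execution reaches it
// and caches the host version; a static recompiler would have to fill this
// cache for the entire binary before the game ever runs.
using HostBlock = void(*)();

std::unordered_map<uint32_t, HostBlock> block_cache;

// Placeholder: the real version would emit x86 for the PPU/SPU code at guest_pc.
HostBlock translate_block(uint32_t /*guest_pc*/) { return [] {}; }

void run_from(uint32_t guest_pc)
{
    auto it = block_cache.find(guest_pc);
    if (it == block_cache.end())
        it = block_cache.emplace(guest_pc, translate_block(guest_pc)).first; // translate on demand
    it->second(); // run the cached host (x86) code
}

int main() { run_from(0x10000); }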
(05-18-2014, 12:01 AM)derpf Wrote: [ -> ] (05-17-2014, 09:53 PM)ssshadow Wrote: [ -> ] (05-17-2014, 07:33 PM)gamenoob Wrote: [ -> ]Are you asking that people should decompile game code to something workable and then recompile it to x86? That is an enormous task. And it is not generic either; you have to do the work for every game. Basically porting a huge collection of PS3 games.
Isn't this what the SPU recompiler already does? I see a ton of .log files with x86 assembler in them
I think he meant statically (and I doubt RPCS3 is going to suddenly gain a static PPU recompiler), whereas the SPU recompiler is a JIT.
I grab the executable, let it run, and analyze the PPU/SPU instructions and data pointers. Once I have all the instructions, I trace them back to something in a high-level language, i.e. decompiling. Then, depending on the target platform (x86, ARM, etc.), I recompile them, and finally run the result on the host hardware. Yes, it is a static thing. The problem is that every game will have different instructions, so you have to do it for every game. The payoff is that a system of similar raw power, or even less, should be able to emulate the PS3 pretty nicely.
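To make the static approach concrete, here is a rough outline of what such an ahead-of-time pass would do; all names and the placeholder bodies are invented for illustration, nothing here comes from RPCS3:
Code:
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical outline of the static pass described above; none of these
// names exist in RPCS3, and the placeholder bodies stand in for real tools.
struct GuestInstr { uint32_t addr; uint32_t raw; };

// Placeholder: a real pass would scan the executable's code sections.
std::vector<GuestInstr> disassemble(const std::vector<uint8_t>& /*elf*/) { return {}; }

// Placeholder: lift one PPU/SPU instruction to portable high-level code.
std::string lift(const GuestInstr& i) { return "/* instr at " + std::to_string(i.addr) + " */\n"; }

// Walk the whole binary once and emit code that can then be compiled
// for whatever host you want (x86, ARM, ...) - no runtime translation.
std::string recompile_statically(const std::vector<uint8_t>& elf)
{
    std::string out;
    for (const GuestInstr& i : disassemble(elf))
        out += lift(i);
    return out;
}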
There is another "probable" problem. I say "probable" because I am not sure of it. Let's consider a Unity3D game that contains a script, with this pseudocode snippet:
Code:
if(boolA){assemblyFuncA();}
else if(boolB){assemblyFuncB();}
else{assemblyFuncC();}
Here there are three functions. For simplicity, let's assume the body of each function is pure assembly. Depending on the situation (the bools), a different set of instructions will run. When the game engine compiles the whole project, one or more executables are formed. Now suppose you are analyzing this game. Which of the following can you do?
Case 1: We grab the executable and decompile it statically, so we get the instructions inside every function.
Case 2: We grab the executable and let it run. Only when the appropriate condition is met does the corresponding set of instructions run. For example, if "boolB" is never true during an emulator run of that game, the instructions inside "assemblyFuncB()" will never be collected either. So how can you decode everything in case 2?
If you can do case 1, then decoding ahead of time will be great. If not, then I guess we have to do something else.
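For clarity, this is roughly what the case 2 gap looks like in code: a purely trace-based approach only ever collects the blocks that actually ran, so the emulator always needs a slow fallback for everything else. Hypothetical sketch, not RPCS3 code:
Code:
#include <cstdint>
#include <unordered_set>

// Hypothetical sketch of the case 2 gap: a run-time trace only contains the
// branches that were actually taken, so something like assemblyFuncB() stays
// uncollected until a later run happens to hit it, and the emulator always
// needs a slow fallback for those blocks.
std::unordered_set<uint32_t> traced_blocks; // guest addresses collected so far

void run_translated(uint32_t /*guest_pc*/) {} // fast path, only valid for traced code
void interpret_block(uint32_t /*guest_pc*/) {} // slow but always-correct fallback

void execute(uint32_t guest_pc)
{
    if (traced_blocks.count(guest_pc))
        run_translated(guest_pc);       // already collected in an earlier run
    else
    {
        traced_blocks.insert(guest_pc); // record it for the next translation pass
        interpret_block(guest_pc);      // fall back so the game keeps running
    }
}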
Wouldn't AOT compilation make PS3 games much faster through the emulator? If a game is already compiled to a targetable execution format that needs no decoding, it would make sense that it would run almost like a native executable on the target system... and since it is all taken care of ahead of time, all the emulator has to do is provide a light abstraction layer to handle graphics, emulated RAM ranges, sound, etc. Or do I have this wrong?
So... what we will get is some cool ass decoder or something like that? Since I get kinda lost without tutorials, I hope we get some "click here, load this" button, so people like me can just test games without knowing about this decoding magic.
(06-07-2014, 01:19 AM)Threule Wrote: [ -> ]So... what we will get is some cool ass decoder or something like that? Since I get kinda lost without tutorials, I hope we get some "click here, load this" button, so people like me can just test games without knowing about this decoding magic.
That's what's already in place. Anyone who cannot figure it out probably should not be using it at this point in time, though.
mhmm, nice discussion, guess devs should know best
btw, did any of you ever look at the Cell BE "simulator"? Whether the developers of the Cell, or a team close to them with full insight into the docs, did it AOT or JIT or maybe some mix - that should be the "golden" way, as they should know best.
Back in 2007 (the time when I played around with this nice piece of software), as far as I remember, YDL ran at decent speed - I programmed some little speed-testing stuff in it (no gfx/audio, certainly), and I was impressed by how fast it was. OK, that was 7 years ago, so that's as far as my recollection goes...
Hey all ...
Well... let me first say that I already read all the stuff about ideas being just words in the air and about doing a POC before posting.
But I have had this idea on my mind for a good amount of time, and I have not read too many lines of code from rpcs3 yet. This is more of a brainstorm, where I don't know exactly what to do.
Okay... so all those discussions were about how to fetch, transcode, interpret, etc. from the Cell instruction set to x86. Good... If I'm right, all of that workload is done by the CPU, and the GPU is used only for the OpenGL backend. Is that right?
What I'm raising here is just an organization and architecture idea. Straight to the point: what if a part of the ISA were *executed* on the GPU - mainly (or only, or partly) the SIMD instructions dispatched to the SPEs? They could be cached until X number of instructions have accumulated and then dispatched to the GPU.
I'm not saying it's better or easily done... I'm just raising the idea: if some of the workload is done by OpenCL on the GPU, that releases or reduces the workload on the CPU. And I know about all the problems with accessing memory outside the GPU, and the gap between dispatching and executing per instruction or per block of instructions.
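Just to make the batching part concrete, something along these lines is the idea; every type here is invented for illustration and this is not how RPCS3 is structured:
Code:
#include <cstddef>
#include <cstdint>
#include <vector>

// Abstract sketch of the batching idea above - all of these types are
// invented for illustration. Queue SIMD-style work on the CPU side and only
// flush it to the GPU once the batch is big enough to amortize the
// host<->device transfer cost.
struct SimdOp { uint32_t opcode, src_a, src_b, dst; };

struct GpuBackend
{
    // Placeholder: a real backend would enqueue an OpenCL buffer write + kernel here.
    void upload_and_execute(const std::vector<SimdOp>& /*ops*/) {}
};

class SimdBatcher
{
    GpuBackend& gpu;
    std::vector<SimdOp> pending;
    static constexpr std::size_t flush_threshold = 4096; // the "X instructions" knob

public:
    explicit SimdBatcher(GpuBackend& g) : gpu(g) {}

    void push(const SimdOp& op)
    {
        pending.push_back(op);
        if (pending.size() >= flush_threshold)
            flush();
    }

    void flush()
    {
        if (pending.empty())
            return;
        gpu.upload_and_execute(pending); // one big transfer instead of many tiny ones
        pending.clear();
    }
};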
I have a little HPC experience on a hybrid cluster, using OpenMP and OpenCL (for fun... even though it was a project financed by my university :}), and *maybe* something can be achieved by dividing the workload between CPU and GPU for emulation purposes.
Right now I'm reading "Emu / CPU /", "Emu / Cell /" and "Emu / Memory"... so I'm posting this before actually understanding all the code, because I will do that in my extra free time ^^ and it can take some days.
Well, thanks
As far as I remember, the Dolphin devs experimented with OpenCL, and even something as well suited to it as texture decoding did not yield significant speed-ups and more often than not slowed things down.
I don't think the GPU is going to help us a lot with CPU emulation for the time being. I can imagine doing game-specific "SPU kernel replacements" later on, similar to the way PPSSPP, for example, detects some copy functions by hashing them and then does the copying natively.
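Roughly, such a replacement could look like the sketch below; the hash constant, the choice of FNV hash and the replacement routine are placeholders for illustration, not real detections from RPCS3 or PPSSPP:
Code:
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Rough sketch of the "kernel replacement" idea, in the spirit of what
// PPSSPP does for known functions: hash the guest code bytes and, if they
// match a known routine, call a hand-written native version instead.
using NativeFunc = void(*)(uint8_t* local_store);

// Simple FNV-1a over the guest code bytes.
uint64_t fnv1a(const uint8_t* data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ull;
    for (size_t i = 0; i < len; ++i)
        h = (h ^ data[i]) * 0x100000001b3ull;
    return h;
}

// Placeholder: a hand-written host version of some well-known SPU routine.
void native_copy_kernel(uint8_t* /*local_store*/) {}

std::unordered_map<uint64_t, NativeFunc> replacements = {
    { 0x0123456789abcdefull, native_copy_kernel }, // placeholder hash
};

// If the SPU code at hand matches a known kernel, run the native version
// and skip emulating it entirely; otherwise report that it wasn't replaced.
bool try_replace(const uint8_t* spu_code, size_t len, uint8_t* local_store)
{
    auto it = replacements.find(fnv1a(spu_code, len));
    if (it == replacements.end())
        return false;
    it->second(local_store);
    return true;
}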
But I can't imagine a general-purpose SPU-to-OpenCL translation working out too well - you're free to try, though.
As a side note, if anyone has some real-world compute-heavy SPU kernels, you could post them here for s8box to do some preliminary tests (just hand-write a CPU and an OpenCL implementation outside of rpcs3 and benchmark those to see whether the CPU time spent copying and retrieving the data is even worth it).
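For reference, a minimal benchmark harness along those lines could look like this; the kernel bodies are placeholders to be swapped for the real hand-written CPU and OpenCL versions, and the point is that the OpenCL path gets timed including the buffer upload and readback:
Code:
#include <chrono>
#include <cstdio>
#include <vector>

// Minimal benchmark harness. The kernel bodies below are placeholders to be
// swapped for the real hand-written CPU and OpenCL versions; what matters is
// that the OpenCL path is timed *including* the buffer upload and readback,
// since that transfer cost decides whether offloading an SPU kernel pays off.
void cpu_kernel(std::vector<float>& d)
{
    for (float& x : d)
        x = x * 2.0f + 1.0f; // placeholder work
}

void opencl_kernel(std::vector<float>& d)
{
    cpu_kernel(d); // placeholder: replace with clEnqueueWriteBuffer / NDRange / ReadBuffer
}

template <typename F>
double time_ms(F&& f)
{
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    std::vector<float> data(1 << 20, 1.0f); // ~4 MB test buffer
    auto cpu_copy = data, gpu_copy = data;
    std::printf("cpu:    %.3f ms\n", time_ms([&] { cpu_kernel(cpu_copy); }));
    std::printf("opencl: %.3f ms\n", time_ms([&] { opencl_kernel(gpu_copy); }));
}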