(04-24-2014, 10:59 PM)fenix0082 Wrote: Amm...i bad know english, and know little bit about Rpcs3, but if I understand this text, this guy wants to help Rpcs3?
(Sorry, for this terrible english)
He is just telling the devs about his ideas. But I guess the devs already know about these methods and know better what they are doing.
1. A PPU recompiler (or two or three?) is planned, and an SPU recompiler is already in place. I am very certainly no expert on the matter, but going through an IL will just be extra indirection when you could optimize the target assembly anyway.
2. The only reason to do this would be for an academic research project. If you feel you are an academic, feel free to try doing it, but that is not the goal of most or all existing emulation projects.
3. Inline assembly is almost never worth it. rpcs3 already uses SSE intrinsics on some platforms. Assembly just complicates the code while simultaneously making it unportable, for extremely little to no benefit whatsoever. You will have a much better time improving cache locality or, even better, fixing the larger architectural or implementation issues (like the memory manager, as Bigpet points out).
Thanks for the write-up. Anything else in that head of yours?
But building your own recompiler from scratch, like PCSX2 did, seems really, extremely legit. In some cases, even though the PS2 is different from the Wii, PCSX2 runs better on most AMD CPUs than Dolphin does. And this time, AMD can really be helpful with their AMD FX 6300 / 8350. That way, developers won't just go around telling people with $500-$600 budgets, "Go get an Intel Core i7-4960X (6-core) with a GeForce GTX 750 / Radeon HD 6850 (something that possibly won't bottleneck the i7-4960X)."
The PS2 has several dedicated cores, so I guess PCSX2 is exploiting the cores of a PC to make the emulator run faster. There is a small difference, though. There is a huge gap in CPU frequency between the PS2 and the PS3, and the PS3 can be considered to have 7 or 8 real cores (it is not clear whether the 7th SPU can be used by a game), whereas the AMD FX and the 4-core i7 offer only 4 real physical cores (8 logical processors, which cannot all run fully in parallel).
Yeah, multiple things are slow. It was just that when I was profiling, the memory reads pretty much obscured any other slow instructions: it was spending 80-95% of its time in the loop from "isMyMemory" or whatever it's called. I don't even remember whether that profiling info came from a release or a debug build, though (but if I remember right, these reads were also triggered per CPU instruction read).
(05-13-2014, 08:11 AM)Bigpet Wrote: Yeah, multiple things are slow. It was just that when I was profiling, the memory reads pretty much obscured any other slow instructions: it was spending 80-95% of its time in the loop from "isMyMemory" or whatever it's called. I don't even remember whether that profiling info came from a release or a debug build, though (but if I remember right, these reads were also triggered per CPU instruction read).
I got the same when profiling in release mode -- and yeah, every memory block read/write uses IsMyMemory/IsInRange etc. They are triggered per instruction decode.
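To illustrate why a per-access range check can dominate a profile, here is a minimal sketch. It is not rpcs3's actual memory manager: `MemBlock`, `ReadScan`, `PageTable`, and the 4 KiB page size are all hypothetical names and choices made up for this example. It contrasts scanning a block list on every read with a single page-table lookup built once up front.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one emulated memory region (not rpcs3's real class).
struct MemBlock {
    std::uint32_t base;
    std::vector<std::uint8_t> data;

    MemBlock(std::uint32_t base_addr, std::uint32_t size) : base(base_addr), data(size, 0) {}

    // The kind of range test that ends up running on every guest access.
    bool IsInRange(std::uint32_t addr) const {
        return addr >= base && addr - base < data.size();
    }
};

// Slow path: walk the block list on every single read.
// `checks` counts how many range tests were needed.
std::uint8_t ReadScan(const std::vector<MemBlock>& blocks, std::uint32_t addr, std::size_t& checks) {
    for (const MemBlock& block : blocks) {
        ++checks;
        if (block.IsInRange(addr)) return block.data[addr - block.base];
    }
    return 0; // unmapped; a real emulator would report an access violation instead
}

constexpr std::uint32_t kPageBits = 12; // 4 KiB pages, an arbitrary choice for this sketch
constexpr std::uint32_t kPageSize = 1u << kPageBits;

// Faster path: one table lookup per read, with the table built once from the block list.
// The table stores pointers into the blocks, so the blocks must outlive it.
struct PageTable {
    std::vector<const std::uint8_t*> pages; // nullptr means unmapped

    PageTable(const std::vector<MemBlock>& blocks, std::uint32_t address_space_size)
        : pages(address_space_size >> kPageBits, nullptr) {
        for (const MemBlock& block : blocks) {
            // This sketch assumes page-aligned blocks that fit in the address space.
            assert(block.base % kPageSize == 0 && block.data.size() % kPageSize == 0);
            assert(block.base + block.data.size() <= address_space_size);
            for (std::size_t off = 0; off < block.data.size(); off += kPageSize)
                pages[(block.base + off) >> kPageBits] = block.data.data() + off;
        }
    }

    std::uint8_t Read(std::uint32_t addr) const {
        const std::size_t index = addr >> kPageBits;
        if (index >= pages.size() || pages[index] == nullptr) return 0;
        return pages[index][addr & (kPageSize - 1)];
    }
};
```

With the scan, the cost of each read grows with the number of blocks; with the table, it stays at one index and one pointer check, at the price of extra memory for the table.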
Ok, I just thought of a new idea for speeding up RPCS3. You continue down the same path of dynamic recompilation for speed improvements, of course, but here's a new thought:
You pre-decode the entire binary step by step, either while the game is running or by decoding the entire game into an intermediate language before running it. By doing this, you can cache the decoded "trigger instructions" that directly modify RPCS3's state info. You are essentially caching and/or storing the instructions; however, the "instructions" will be in a form that the emu can read and act on immediately, without having to decode them.
You can of course run the game while doing this, but it will be slower, so the emu can decode the binary in pieces and load it in completed segments.
Or you can just decode it all and wait a while (it depends on the game; some are over 10 GB, I think, and can take a while), but it would be worth it because you'll notice the speed. Of course RPCS3 will need to be optimized for this, but nonetheless I would love to see what others think of this, as I really feel it could make a bigger difference than dynamic recompilation alone.
On another note, I've tried this before, and while I did so on something much, much simpler than the PS3's architecture, I noticed a performance increase because fewer clock cycles are needed. Much more RAM could be needed, though (RPCS3 may need to cache gigabytes of data in this fashion to see big speed-ups).
I know this all is probably too much to read and my English isn't quite perfect yet, but if you could see this the way I am seeing it working I think you'd at least see some benefit in this. I feel that this is one of the only methods where high-end games will see closer to full-speed on high-end computers on the market today and in the next 5 years.
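For readers who want to picture the pre-decoding idea, here is a rough sketch on a toy instruction set. Everything in it is invented for illustration (the 32-bit encoding, the three opcodes, and the names `Decoded`, `Predecode`, and `Run`); it is not PPU/SPU code and not how RPCS3 works. Each instruction word is decoded once into a handler pointer plus operands, and the run loop then only dispatches through the cached entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy guest CPU state (hypothetical, far simpler than a PS3 core).
struct Cpu {
    std::uint32_t reg[8] = {};
    std::size_t pc = 0;   // index into the instruction stream
    bool halted = false;
};

// Invented encoding: [31:24] opcode, [23:16] rd, [15:8] ra, [7:0] rb or imm8.
enum Opcode : std::uint8_t { OP_HALT = 0, OP_LI = 1, OP_ADD = 2 };

struct Decoded;
using Handler = void (*)(Cpu&, const Decoded&);

// One pre-decoded instruction: the handler to call plus its operands.
struct Decoded {
    Handler run;
    std::uint8_t rd, ra, rb;
};

void DoHalt(Cpu& cpu, const Decoded&) { cpu.halted = true; }
void DoLi(Cpu& cpu, const Decoded& d) { cpu.reg[d.rd] = d.rb; ++cpu.pc; }
void DoAdd(Cpu& cpu, const Decoded& d) {
    cpu.reg[d.rd] = cpu.reg[d.ra] + cpu.reg[d.rb & 7];
    ++cpu.pc;
}

constexpr std::uint32_t Encode(Opcode op, unsigned rd, unsigned ra, unsigned rb) {
    return (static_cast<std::uint32_t>(op) << 24) | ((rd & 0xFFu) << 16) |
           ((ra & 0xFFu) << 8) | (rb & 0xFFu);
}

// The decode work happens here, once per instruction word.
Decoded Decode(std::uint32_t word) {
    Decoded d{DoHalt,
              static_cast<std::uint8_t>((word >> 16) & 7),  // register indices are masked to 0..7
              static_cast<std::uint8_t>((word >> 8) & 7),
              static_cast<std::uint8_t>(word & 0xFF)};
    switch (word >> 24) {
        case OP_LI:  d.run = DoLi;  break;
        case OP_ADD: d.run = DoAdd; break;
        default:     break; // unknown opcodes are treated as HALT in this sketch
    }
    return d;
}

// Pre-decode a whole code segment ahead of time.
std::vector<Decoded> Predecode(const std::vector<std::uint32_t>& code) {
    std::vector<Decoded> out;
    out.reserve(code.size());
    for (std::uint32_t word : code) out.push_back(Decode(word));
    return out;
}

// The hot loop only dispatches; it never looks at the raw instruction bits.
void Run(Cpu& cpu, const std::vector<Decoded>& program) {
    while (!cpu.halted && cpu.pc < program.size()) {
        const Decoded& d = program[cpu.pc];
        assert(d.run != nullptr);
        d.run(cpu, d);
    }
}
```

Note the memory trade-off mentioned in the post: each `Decoded` entry is larger than the 4-byte word it came from (typically 16 bytes on a 64-bit host in this sketch), so the cache grows to several times the size of the code it covers.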
(05-14-2014, 05:15 PM)Ontakeio Wrote: Ok, I just thought of a new idea for speeding up RPCS3. You continue down the same path of dynamic recompilation for speed improvements, of course, but here's a new thought:
You pre-decode the entire binary step by step, either while the game is running or by decoding the entire game into an intermediate language before running it. By doing this, you can cache the decoded "trigger instructions" that directly modify RPCS3's state info. You are essentially caching and/or storing the instructions; however, the "instructions" will be in a form that the emu can read and act on immediately, without having to decode them.
You can of course run the game while doing this, but it will be slower, so the emu can decode the binary in pieces and load it in completed segments.
Or you can just decode it all and wait a while (it depends on the game; some are over 10 GB, I think, and can take a while), but it would be worth it because you'll notice the speed. Of course RPCS3 will need to be optimized for this, but nonetheless I would love to see what others think of this, as I really feel it could make a bigger difference than dynamic recompilation alone.
On another note, I've tried this before, and while I did so on something much, much simpler than the PS3's architecture, I noticed a performance increase because fewer clock cycles are needed. Much more RAM could be needed, though (RPCS3 may need to cache gigabytes of data in this fashion to see big speed-ups).
I know this all is probably too much to read and my English isn't quite perfect yet, but if you could see this the way I am seeing it working I think you'd at least see some benefit in this. I feel that this is one of the only methods where high-end games will see closer to full-speed on high-end computers on the market today and in the next 5 years.
So you are basically saying that rpcs3 should create some kind of bytecode that can be executed faster? Really this is sort of like what a recompiler would do, only slower.
I think the fastest way is an optimized recompiler that translates the game assembly into optimized x86 assembly. Optimizing is the hard part though, but it would in principle give you native speed on the code execution. (There is of course a fair bit of other stuff going on that makes things slower.)
Unknown, I think you are confusing game size with RAM size. Almost every PS3 game is certainly larger than the PS3's maximum RAM, since game size is not 1:1 with RAM; even many PS1 games took up more than 256 MB on their CD-ROMs.
For example, Final Fantasy XIII-2's image is roughly 14 GB on a single-layer BD; however, upon execution, it will be limited to 256 MB of main RAM.
Much of that is likely game data rather than pure code as well; not all of the game binary's code is loaded into RAM at once anyway.
And if I understand Ontakeio correctly, they mean that the whole game can be decoded into calls to functions that change RPCS3's state and update it without having to go through the decoding phase at all, or while skipping some of it.
(05-15-2014, 09:24 PM)[Unknown] Wrote: I'm sorry, but it's pretty clear you did not understand my post. I discussed binaries, dynamic loading, and also the decoding stage.
-[Unknown]
Well, going by what you wrote, it's obvious that what you're saying is wrong:
Quote: Certainly no game binary is 10GB.
http://www.examiner.com/article/file-siz...d-xbox-360
Perhaps we have a different idea of "game binary" here. The game "binary" is everything that makes up the game on the medium, not the max capacity of the medium itself. You also have a skewed view if you think that RAM size must be 1:1 with the size of the binary itself, which wouldn't make any sense.
Final Fantasy 7 for PSX takes up over 700 MB for disc 1's total binary image, including all code and data of the entire game; PSX has 2 MB of RAM total. Based on your assumption that a binary can't be larger than the size of RAM, how could the Playstation run the game then?