Not all homebrew has to be emulators. Not all homebrew we talk about has to come from here.
A very interesting project has fired up over the past 3 weeks, starting on IRC as a small project and growing into a large organization on GitHub. Because the Diablo 3 beta was leaked to the public very early, a strong desire to play was fired up in many. As with many games today, and particularly games of the Diablo genre, almost everything in the game requires a server to communicate with. So for those unlucky few who got shafted on the beta invites and wish to indulge in some Diablo 3 sights, check out Mooege. It is a private server emulator for Diablo 3. No, it won’t be the real game, and no, you can’t do quests yet. You can run around the Diablo 3 universe, kill random mobs you spawn for 9001 damage and play with some of the skills for each class. Surely this is enough to satiate your desires for Diablo 3, or exacerbate them, until the release date that is.
See Mooege and read the FAQ for installation info.
Over the past year I was on the intern team for a Canadian research network called SurfNet. SurfNet primarily concerns itself with interactive surfaces and things you can touch like a touch table or SMART boards. The intern team gets projects submitted to them by professors that are part of the research network. It was fun to work on the intern team but unfortunately it was not a lot of team work. The majority of the projects came from my local university and I worked alone. Even after finishing with them September 1, 2011 they are still keeping me on the payroll to work unofficially for them. They wish to put the entire team on a project I have been working on over the past few months.
Every year they hold a conference and gather all the people involved in SurfNet. This year it was held in Calgary, Alberta at the University of Calgary. Excellent establishment, great people and an all around fun time. This year they invited students to submit ideas for workshops. I decided to submit a workshop on Android development. Although it is not too closely related to SurfNet, an Android device is still an interactive surface, and almost everyone is beginning to pick up on Android as the best way to develop quick research projects targeting mobile devices. So I gave a 1.5hr talk on Android and it generated some pretty good interest. Here are the slides, purely for interest’s sake.
This summer I’ve been working on campus as some-sort-of-research-assistant. When I showed up for my first day, my supervisor asked me what specific topics in Computer Science I was interested in. Naturally I replied “Gaming, Graphics and Artificial Intelligence”. She said I could use this summer as a learning experience in XNA (Microsoft’s Object Oriented DirectX Libraries, used by PCs, Xboxes and Windows Phones). Basically, I stumbled upon the greatest job ever. I could work independently on a project of my choice, which may or may not end up being used in a study about how games “reward” players.
The next problem was the standard: “What the hell am I going to make?”. Halsafar suggested an Asteroids Clone, “Whenever I need to learn a new language or library, I make Asteroids”.
In a week I had a white triangle firing orange bullets at white circles. A week later I had a ship with rainbow colour particles flying out the back of it. I spent a 2 week period developing a modular “ParticleLibrary” in the Factory Pattern. As expected, it not only had a ton of problems, but wasn’t even close to being in the Factory Pattern. However, the learning experience was invaluable, and I am now spending time finalizing ParticleLibrary2.0, which will probably end up on Homebrew Cafe as well.
Here is what I have come up with in ~3 months of XNA self-teaching, and a lil bit of help from the Halsafar.
With the blessing of the author of libgambatte I am able to publicly announce GambatteDroid. Gambatte is a cycle accurate open source Gameboy and Gameboy Color emulation library. As a developer I must say that libgambatte is very well written. Kudos to all the authors involved in the project (Project Home). As a Gameboy fan, nothing is more nostalgic than playing it on a mobile device. Go ahead and visit the forums if you have any comments to make. Please rate or comment on the market page!
To continue the last post about GENPlusDroid real time rewind: NESDroid rewind uses the exact same method. It required only some small changes to the emulation core, nothing major, just how it loads/saves its state. The FCEU state saves are around 75Kb per frame. We are using a 20MB heap for the NES version, which only provides 15-20s of rewind, as you see in the final segment of the video.
Certainly more fun to real time rewind NES than Genesis, in my honest opinion.
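As a rough sanity check on those numbers, here is a small sketch of how long the heap would last if every frame stored a full state with no diffing. It assumes that "75Kb" means 75 KiB, that "20MB" means 20 MiB, and that the core runs at 60 frames per second; the function name is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def raw_rewind_seconds(heap_bytes, state_bytes, fps=60):
    """Seconds of rewind if every frame kept a full, uncompressed state."""
    frames = heap_bytes // state_bytes  # whole states that fit in the heap
    return frames / fps


# 20 MiB heap, 75 KiB per state: only a few seconds of history.
seconds = raw_rewind_seconds(20 * MIB, 75 * KIB)
print(f"raw states would give about {seconds:.2f} s of rewind")
```

Under these assumptions the raw figure lands well below five seconds, so the 15-20s quoted above comes from storing much less than a full state per frame.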
Not too long ago I adopted a method created by Maister into the genplus emulator that allows for real time rewinding (think Prince of Persia or Braid). The method is somewhat like P-frames in video compression, where you store only the diff against the last frame. Emulators like FCEU use state saving to disk to pull this off, which is very expensive. The individual state frames are often large, ranging anywhere from 75Kb to 300Kb. Storing the diffs in memory instead allows us to provide almost 30 seconds of rewind (varies depending on the game). You can do the math to get a rough estimate of how much space 30 seconds at 60fps worth of frames would take in raw form. The only real performance hit is the serialization and unserialization of the state itself, which proves fairly negligible. This is really just a for-fun feature, but it is neat nonetheless.
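To make the idea concrete, here is a minimal sketch of a diff-based rewind buffer. It is not Maister's actual implementation: it assumes every serialized state has the same length, stores each diff as a plain list of (offset, old byte) pairs, and all names are invented for the example.

```python
from collections import deque


def make_delta(prev, curr):
    """Record (offset, old_byte) for every byte that changed from prev to curr."""
    return [(i, prev[i]) for i in range(len(curr)) if prev[i] != curr[i]]


def apply_delta(curr, delta):
    """Undo one frame by restoring the old bytes recorded in delta."""
    state = bytearray(curr)
    for offset, old in delta:
        state[offset] = old
    return bytes(state)


class RewindBuffer:
    def __init__(self, max_frames):
        self.deltas = deque(maxlen=max_frames)  # oldest deltas fall off the front
        self.current = None

    def push(self, state):
        """Call once per emulated frame with the serialized state."""
        if self.current is not None:
            self.deltas.append(make_delta(self.current, state))
        self.current = state

    def pop(self):
        """Step back one frame; returns None when the history is exhausted."""
        if not self.deltas:
            return None
        self.current = apply_delta(self.current, self.deltas.pop())
        return self.current
```

Only the latest full state is kept; everything older exists as diffs, which stay small as long as little of the state changes between frames.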
One of my code projects is RSound, a simple and lightweight networked audio system. The main focus is providing a fully documented protocol, client library and server for transferring audio over LAN to other computers, on-the-fly.
It was originally designed for music and video players, but turned out to be reasonably suited for games as well if used correctly.
Unlike many networked audio systems, RSound solely focuses on the LAN part. There is no mixing component, as that’s what the backend driver is supposed to do.
If possible, audio data will pass through completely unaltered to the backend sound driver. If desired by the user, or absolutely required by the audio driver, audio will be resampled and/or converted before being sent.
Since the whole thing is somewhat specialized, it does not add itself to the plethora of sound APIs. I’ve written drivers for MPlayer, VLC and MPD, as I use these quite often, but there’s no way other programs will care much about this API.
I’ve put down a lot of work attempting to hook under existing APIs instead:
There is a reasonably mature ALSA plugin. It does work well with music and video players, but is useless for gaming due to some quirks with the plugin API.
OSS has a userspace emulation via CUSE (FUSE). To my knowledge, CUSE is only supported on Linux, but it works quite well. Even PulseAudio can be rerouted this way via the OSS sink, although doing that is surely overkill and pointless. As long as mmap() isn’t used, games should be somewhat feasible to play over it.
This covers most applications on Linux.
General support exists for Linux, OSX, Windows and BSD systems. There is also additional client support on the PlayStation 3, used in most of the PS3 emulators.
I wanted to enable applications in Windows to take advantage of RSound. Since getting direct support for it is not going to happen, I wanted to take a jab at emulating a sound API, similar to ALSA and OSS in Linux.
Hooking APIs in Windows
In Windows there would be three ways (that I know of) to achieve RSound rerouting without the application having explicit support for it.
Hook into the mixed stream of the audio card, and pass that over the network, without any application knowing of it. Either the driver has to provide this as a virtual capture stream or you can use stuff like Virtual Audio Cable.
Use DLL injection hooks on applications, overriding calls to DirectSound/XAudio/whatever, and reroute these to RSound.
Use a clean implementation of a system .dll.
The first alternative relies on either driver support or installing commercial software, which isn’t acceptable unless it’s the only way out. There is also a high possibility this will add lots of additional latency which cannot be compensated for by applications, since it cannot know about this extra layer of buffering. The result would be video horribly out of sync, which is bad :[
Second alternative is novel, but perhaps too hacky to really implement in practice.
I chose to use a clean implementation, and the easiest and most widely used target would be DirectSound. DirectSound is a stone age API widely used for audio applications. It has since been superseded by WASAPI and XAudio2, but support for DS still remains strong due to the massive amount of programs using it.
DirectSound
Applications using DSound load a DLL, dsound.dll, call DirectSoundCreate() in it, which gives you an IDirectSound COM interface back, which is then used to create buffers (IDirectSoundBuffer), etc. Essentially, if you can override DirectSoundCreate, you can pass it a custom implementation, which here is ... RSound, and take it from there. This is certainly fine and dandy. How can we force programs to load our custom dll rather than the one found in system32?
Lucky for us, Windows searches in the same directory as the .exe for .dlls. We just pop the custom .dll into the main directory of the program we want to hook and voila. (Did anyone mention security issues with this? :])
Next part is actually emulating this API in a reasonable way. DirectSound is a highly hardware centric API, thinking in terms of a raw ring buffer, found typically on actual hardware, or very close to it. This made a lot of sense in the 90s where sound cards were often able to mix audio in hardware, and CPUs were too slow. Sadly, this model caused a lot of bluescreens of death due to misbehaving drivers, so more modern audio stacks do a lot more work in software instead, making the SW <=> HW interaction less error prone.
Implementation
In Windows 7 and Vista, DirectSound doesn’t talk directly to hardware like it used to, but is emulated in WASAPI, losing its hardware accelerated features, aww. However, it’s still a rather straight path to the HW compared to going over the network. Since DirectSound really wants that ring buffer model the audio chain will look like this:
Once everything is set up, DirectSound is quite simple in concept. It’s a ring buffer where you can query HW pointers for play and write, then you write into the ring buffer as you see fit. The HW reads in a circular fashion, so if you stop writing it will play the same sound over and over again. No automatic underrun handling for you, sir!
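The ring buffer model can be illustrated with a toy sketch. This is not the DirectSound API, only the concept described above: the application writes wherever it likes, and the "hardware" side reads circularly whether or not anything new was written. Class and method names are hypothetical.

```python
class RingBuffer:
    def __init__(self, size):
        self.data = bytearray(size)
        self.play = 0  # position the "hardware" will read next

    def write(self, offset, payload):
        """Application side: copy payload in at offset, wrapping at the end."""
        for i, byte in enumerate(payload):
            self.data[(offset + i) % len(self.data)] = byte

    def consume(self, count):
        """Hardware side: read count bytes circularly and advance the play position."""
        size = len(self.data)
        out = bytes(self.data[(self.play + i) % size] for i in range(count))
        self.play = (self.play + count) % size
        return out


rb = RingBuffer(4)
rb.write(0, b"abcd")
first = rb.consume(6)  # reads past the end and wraps into old data
```

Because nothing refilled the buffer, the read wraps and replays the start of the same data, which is the "same sound over and over" behaviour.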
It’s clear that librsound has to run on its own, in a thread, constantly eating from the ring buffer once started and pushing the read pointer forward. librsound has a callback mode which matches this use case quite well. It’s also possible to set a latency target so it will add a certain amount of milliseconds of latency to the audio (default is 128 ms) so it doesn’t fill up the whole network buffer (1-2 seconds depending on OS and setup). This is achieved with timers and async messages from the server back and forth. If the network is stable enough, this works quite well.
If we know the latency, we can compensate for it. As the librsound callback is pulling data from our ring buffer, we steadily increment the play pointer. When the application calls GetCurrentPosition(), it expects to get the position played right now, so it can figure out the latency.
We can “rewind” the pointer, say, 128ms, and give the application that. However, we cannot do this if the buffer is too small, say, only 64ms long. It doesn’t make sense to rewind more than the buffer can handle anyways. It will just wrap around twice and end up where we started.
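A small sketch of that rewind arithmetic, assuming 44.1 kHz 16-bit stereo audio and byte offsets into the ring buffer. The function names and the clamping policy are illustrative assumptions, not what the DirectRSound code necessarily does.

```python
def ms_to_bytes(ms, rate=44100, channels=2, bytes_per_sample=2):
    """Convert a latency in milliseconds to a whole number of audio frames, in bytes."""
    return ms * rate // 1000 * channels * bytes_per_sample


def rewound_position(read_pos, latency_bytes, buffer_size):
    """Play position to report: the callback's read position moved back by the latency."""
    return (read_pos - latency_bytes) % buffer_size


def clamped_latency(latency_bytes, buffer_size, frame_bytes=4):
    """One possible policy: never rewind by a full buffer or more."""
    return min(latency_bytes, buffer_size - frame_bytes)


latency = ms_to_bytes(128)
reported = rewound_position(1000, latency, 65536)
```

Rewinding by exactly the buffer size returns the position you started from, which is why a latency longer than the buffer cannot be represented.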
DirectSound defines a write pointer. This write pointer allows the driver to communicate to the program that a certain part of the ring buffer is currently off-limits, i.e. writing between the play pointer and write pointer is a bad idea, since it might currently be playing on the DSP, and writing to this region would probably give funny and/or broken results. This is pretty good since we are now able to lock off a certain region in the ring buffer! 128ms of our audio data is on the network and/or buffered up in the server audio card. That is the region in between the rewound read pointer and the actual read pointer the callback is eating from. It does not make sense to write to that buffered region since the latency you expect will not be correct at all. So we set the write pointer just after the callback read pointer.
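A well behaved application effectively performs a check like the following before writing. This is a sketch of the circular arithmetic only, with hypothetical names; it treats the span from the play pointer up to the write pointer as locked and everything from the write pointer around to the play pointer as writable.

```python
def circular_distance(start, end, size):
    """Bytes from start forward to end, wrapping at size."""
    return (end - start) % size


def is_safe_to_write(offset, length, play, write, size):
    """True if [offset, offset + length) stays out of the locked play..write region."""
    # Writable bytes run from the write pointer around to the play pointer.
    # Note: play == write is treated here as "nothing writable" (an assumption).
    free = circular_distance(write, play, size)
    start = circular_distance(write, offset, size)  # offset relative to the write pointer
    return start + length <= free
```

For example, with a 1000 byte buffer, play at 100 and write at 400, a 150 byte write at offset 900 wraps around and ends at 50, which is still before the play pointer and therefore fine, while any write starting at 200 lands in the locked region.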
However, there is a problem here. Some programs simply don’t care about the write pointer at all. They just happily write without checking if they can safely write to this region. The DirectSound API does not seem to allow drivers to enforce this behavior, and I’m sure I’ll break tons of applications by enforcing it. It might not have been a real issue for most hardware devices out there, but at the very least DirectSound thought about this possibility. Too bad that not all applications care.
At any rate, there are fortunately some well behaved applications out there. I’ve been testing some, and here are some applications that seem to work well, with correct latency measurement (video sync! :]):
Media Player Classic – Home Cinema
MPlayer
VLC
Foobar 2000
pSX
ZSNES
Spotify
I’m sure there are more applications working, but my DirectSound implementation surely isn’t good enough to cover all applications. Only the basic functionality is implemented, and it might not be implemented properly even :] If the application forces use of WinMM, XAudio2 or whatever, it will not be hooked :[
Binaries
As a reward for reading (or tl;dr'in) your way down here, I'll put up some test binaries for this.
DirectRSound 0.01 - Win32 binaries. Includes RSD server.
Let's say you have Foobar running on machine A, and you want to send that audio to machine B, which could be for example a media box connected to some beefy speakers.
You start up the RSD server (rsd.exe if Windows) on machine B.
On machine A you put dsound.dll and rsound.dll in the same folder as your Foobar executable, typically found in C:\Program Files\foobar2000 or something like that.
Figure out the IP address of machine B on the network. ipconfig in command line on machine B will tell you. Or ifconfig if the machine is Unix based.
On machine A, create a new environment variable RSD_SERVER which you set to the IP address of machine B. Hopefully, machine B has a static IP on the network so you only have to set this once ...
Start Foobar. Look at output drivers, you should see "DS: RSound networked audio" somewhere listed as an available driver. Select that.
Play some songs, it should hopefully work. :]
Other environment variables to try
If sound is choppy you can try to tweak the latency added, by setting the RSD_LATENCY environment variable to the desired number of milliseconds. The default is 128ms if RSD_LATENCY isn’t set. Going under 64ms isn’t recommended.
If the application for some reason crashes or misbehaves, you can generate a log by setting RSD_LOG_PATH.
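If you would rather not set these variables system-wide, a small launcher script can set them for a single program. The sketch below only builds the environment; the IP address, latency value and log path are placeholders, and it assumes the variables are read as plain strings.

```python
import os


def rsound_env(server_ip, latency_ms=None, log_path=None, base=None):
    """Return a copy of the environment with the RSound variables filled in."""
    env = dict(os.environ if base is None else base)
    env["RSD_SERVER"] = server_ip
    if latency_ms is not None:
        env["RSD_LATENCY"] = str(latency_ms)  # milliseconds
    if log_path is not None:
        env["RSD_LOG_PATH"] = log_path
    return env


# Placeholder values; substitute the address of your own machine B.
env = rsound_env("192.168.1.20", latency_ms=96, base={})

# To launch a program with it (path is only an example):
# import subprocess
# subprocess.Popen([r"C:\Program Files\foobar2000\foobar2000.exe"], env=rsound_env("192.168.1.20"))
```

Passing no base copies the current environment, so the launched program still sees everything else it normally would.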
With the release of bSNES v064, libsnes was born. libsnes is essentially a quite simple API that exposes the functionality of a SNES emulator, allowing various new front-ends to be made. A libsnes library has been developed for bSNES, obviously, and for SNES9x.
SSNES is one such front end, which I feel is now somewhat mature. It does sport some features not often found in other emulators, such as a complex multipass shader implementation in Cg and GLSL (XML), and frame-by-frame rewind.
Now, there’s one interesting aspect of libsnes. It is just a streamlined interface to perform common emulator things such as:
Init console
Load ROM
Set callbacks for video, audio and input
Run main loop
This is practically how every emulator works. The difference lies in the details. The question now is, can we implement a libsnes which emulates a different system than SNES for shits and giggles? It’s not like SSNES verifies the ROM image as a SNES game at any rate. It just calls snes_load_cartridge_normal(). It’s essentially emulator agnostic, which we’ll use to our advantage.
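The four steps above can be sketched as a frontend loop that knows nothing about which console sits behind the interface. This is a Python mock-up for illustration only; the class and method names are invented and do not mirror the real libsnes C function names.

```python
class DummyCore:
    """Stand-in for any emulator core; the frontend never asks which console it is."""

    def __init__(self):
        self.rom = None
        self.frame = 0
        self.video_cb = self.audio_cb = self.input_cb = None

    def init(self):
        self.frame = 0

    def load_rom(self, data):
        self.rom = data  # no check that this is a SNES image
        return True

    def set_callbacks(self, video, audio, input_state):
        self.video_cb, self.audio_cb, self.input_cb = video, audio, input_state

    def run_frame(self):
        self.input_cb()                       # poll the frontend for buttons
        self.frame += 1
        self.video_cb(f"frame {self.frame}")  # hand a finished picture back
        self.audio_cb((0, 0))                 # hand one stereo sample back


def run_frontend(core, rom, frames):
    shown = []
    core.init()
    core.load_rom(rom)
    core.set_callbacks(video=shown.append,
                       audio=lambda sample: None,
                       input_state=lambda: set())
    for _ in range(frames):
        core.run_frame()
    return shown


shown = run_frontend(DummyCore(), b"any rom image", 3)
```

Swapping DummyCore for a core that emulates a different machine requires no change to run_frontend, which is the property the rest of this post leans on.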
Now obviously, libsnes is SNES specific. For example it only has concepts of things that the SNES (emulator) can do. However, there is a console which is quite similar to the SNES in terms of interfacing, the GBA.
GBA fits quite easily into the model of SNES. It has the exact same buttons, sans the X and Y buttons, so that maps perfectly over from a configuration standpoint. Graphics are 15-bit XBGR, just like SNES, 240×160 resolution (except for the weird mode 5), etc. Audio is also signed 16-bit. We have to resample the audio on the GBA to ~32kHz to match SNES, but hey.
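For the resampling step, a very simple approach is linear interpolation. The sketch below handles one channel of signed 16-bit samples; the source rate is left as a parameter because it depends on the core, and this is not necessarily how the VBANext build does it.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Linear-interpolation resampler for one channel of signed 16-bit samples."""
    if not samples:
        return []
    out_len = len(samples) * dst_rate // src_rate
    out = []
    for n in range(out_len):
        pos = n * src_rate / dst_rate  # fractional index into the input
        i = int(pos)
        frac = pos - i
        a = samples[i]
        b = samples[min(i + 1, len(samples) - 1)]  # hold the last sample at the end
        out.append(int(round(a + (b - a) * frac)))
    return out


halved = resample_linear([0, 100, 200, 300], src_rate=2, dst_rate=1)
doubled = resample_linear([0, 100], src_rate=1, dst_rate=2)
```

Linear interpolation is cheap but not high quality; a proper resampler would low-pass filter before reducing the rate.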
I mocked up a VBANext build which is essentially built as a libsnes shared library. Very incomplete, but it runs the two games I care about on GBA, Golden Sun, and Golden Sun 2. Then add in support for rewind and multipass shaders and have a blast. DSP audio effects are nice too, which might help the godawful sound quality of the GBA.
One added bonus is that we’re now able to test VBANext progress and regressions easily on the PC, since we actually have, you know, decent debuggers on the PC.
I do plan on making a GBA specific libgba at some point, but I’ll see how deep the libsnes rabbit hole goes.
If you want to test VBANext/libsnes, you can build it from our git repo in platforms/libgba, and just hit make. Makefile is only set up for *nix as of writing, but hey. People compiling themselves probably use *nix anyways. :p
Since I did a similar post for the Xbox 360 XDK a while back and since this is intended to be mostly a developer-oriented blog, I thought it would be in keeping with that spirit to let people have a general idea of what to expect from a debug PS3 – which recently arrived.
Photo album: All of the screenshots linked to below (and more) can be viewed separately in this photo album here.
Profiling/debugging
There are some limitations as to what you can do with regards to profiling – even on a Debugging Station. Code instrumentation through gcov is entirely possible, but this was not very appealing to me anymore given that we have been able to do this more or less even on Retail with Themaister’s net-stdio implementation. Gprof seems to have been made unavailable as Sony started moving more and more away from the open source GNU toolchain they originally based their development environment on. What’s left is a proprietary profiler that will not work on Debugging Stations but requires a Reference Tool instead – it has 256MB extra main RAM, bumping up the main system RAM in total to 512MB (plus 256MB RAM for the RSX) – which, together with the unit’s Communication Processor, gives it the extra horsepower it needs to do real-time profiling on-the-fly amongst some other things like realtime video capturing and graphical analysis through GPAD – which I’ll touch on in a moment.
LibGCM HUD
A very nice feature which seems to be only available on debug PS3s is the ability to run code with an RSX profiling tool called ‘GCM HUD’. As the name would imply, this is a Heads-Up Display overlaid on top of the application you’re running that neatly provides you with a point-and-click interface giving you access to features such as:
Fragment/Vertex program debugger (with the ability to set breakpoints and step through the code line by line from the console itself, with no PC having to be involved or connected to the PS3).
RSX Performance Counters (telling you how effectively you’re utilizing the RSX).
RSX Command Buffer log (showing you the workload of the RSX in real-time)
The whole interface is mouse-driven – so to interact with it, you have to hook up a USB mouse to be able to control it. This is not where the usefulness of this tool ends, however – GPAD (Graphics Performance Analyzer and Debugger) allows you to interface your PC with this HUD and dump all the performance data to your PC. However, again, some options – such as live video capturing – can only be done with the Reference Tool.
GPAD on a Windows PC interfacing with a debug PS3 running GCM HUD
Samples
A brief rundown of the samples that struck my eye -
Deferred shading
Deferred shading lends itself well to the architecture of the PS3 where you have a comparatively humble GPU (RSX) and up to 6 SPEs each running at 3.2GHz – the idea is to essentially subdivide the result of a shading algorithm into multiple parts that can be spread across different render targets/CPUs only to combine them at the end into one composite whole. Using this approach, the RSX can simply offload a lot of the vertex and fragment computations that are done in shaders to the SPEs which in turn crunch through the calculations (the original shader algorithm having been subdivided into parts for each SPE to chew through) only for the RSX to combine all these separate parts and render the picture.
Commercial game developers like DICE have started using this approach for games like Battlefield 3 to achieve graphical results and performance which normally would have been unattainable if all that was available to them was the RSX alone. A link to a slideshow presentation is available here.
Deferred shading sample pics
Yes, the FPS count starts to drop heavily once you start adding extra spot lights.
PSGL samples
At one point, Sony was asking developers whether they would be interested in having PSGL conform to the OpenGL ES 2.0 specs (link here). Unfortunately this never happened, as developers seem to have mostly preferred to go with libGCM as their main graphics API of choice on PS3. This has meant that the development environment has become more libGCM-centric over the years, with PSGL eventually becoming a second-class citizen – in fact, new features like 3D stereo mode are not even possible unless you are using libGCM directly.
However, this has not deterred them from still making available a couple of nice samples that illustrate what PSGL is capable of in conjunction with Cg shaders (which is mostly what we use with the homebrew emulators up until now). Some nice examples show Cg being applied to render hair, skin mapping, normal mapping, parallax mapping, and sophisticated water effects (the latter ones definitely being a cut above our own water.cg shader – made by Themaister).
Below you can see some screenshots illustrating nice tech demos using Cg and SPUs in tandem -
Cg being used for hair specular highlights
Cg being used for water ripple effects
Conclusion
What the debug PS3 will allow us to do is to finally start getting rid of bugs – memory leaks that were simply impossible to flush out with a retail box because of the inherent limitations of printf debugging. There are quite a few memory leaks that can now be tracked down in FBANext PS3 through crash dumps and live TTY logging.
Most ambitious of all, it will finally allow us to start writing code for SPUs – which I didn’t even want to attempt doing before because of the pain that was the retail environment. It remains to be seen whether we can really offload much if anything in the emulators to the SPUs – however, I have seen many creative uses for PPU-to-SPU offloading in some samples already, including function/virtual function call offloading to specific SPEs amongst a host of other things I didn’t even consider before.
Expect to see some real solid progress on MAMENext PS3 very soon, and for the emulators to improve immeasurably overall. Still, there are some inherent limitations even to the Debugging Station – the proprietary profiler will only allow me to do static analysis right now with a Debugging Station – if we wanted real-time profiling, we would have to be in possession of a Reference Toolkit, which is simply above anybody’s budget to be honest. This means that it might be impossible for any homebrew dev to ever do a serious attempt at porting a demanding emulator such as PCSX2 or Dolphin even with a debugging station because libgcov will only carry them so far – definitely forget it altogether on the retail units and the limited development environment they have available to them (whether it be the PS3 SDK or PSL1GHT).
The debugger that comes with it however is a seriously powerful tool and can carry us a long ways since it shows us live disassemblies of the code – meaning we can at least set a breakpoint somewhere, look at the ASM that the ’shitty’ PS3 compiler turned our C/C++ code into, and then rewrite the code ad nauseam until the code generation starts to make more sense of our code and translate it to faster ASM code.
While I don’t wish for this site to start devolving into scene drama, I felt a post was necessary on why it is that I’ve recently decided to part ways with psx-scene and what led me to taking the drastic decision I felt was necessary in order to make a statement. This affects all other homebrew developers in the grand scheme of things – and I acted accordingly as the situation demanded it.
New TOS waiving your rights
After the acquisition by QJ.Net, they quietly injected into the already existing Terms of Services a new law that expressly authorizes PSX-scene to effectively assume ownership of any and all material you make available to the public on their community site. I will let this quote do the talking so that there can be no debate about this -
“User Provided Content, License. You are solely responsible for all content or materials that you post, submit to, or transmit through the Service. By submitting materials or content to PSX-SCENE.COM, you grant PSX-SCENE.COM a license to copy, use, display and create derivative works of the material or content submitted for any purpose, including, without limitation, the promotion and marketing of the Service and the operation of the PSX-SCENE.COM system. By submitting materials or content, you automatically agree (or, to the extent you do not own all rights to such materials or content, you represent and warrant that the owner of the content or materials has expressly agreed) that without any particular time limit, and without the payment of any fees, PSX-SCENE.COM and anyone it permits may reproduce, display, distribute and create new works of authorship based on and including the content or materials. You may not submit content or materials trademarked or copyrighted by anyone other than yourself.”
Now, obviously, this is disingenuous on its face. Setting aside for a moment the fact that they cannot simply waive your rights like this – the fact remains that there are licences to obey when you as a porter decide to port an emulator to another system. Those licences are very clear on ownership rights and your ability or permission to be able to buy or resell them. For psx-scene to put themselves into the enviable position where they believe they can ‘reproduce, display, distribute and create new works of authorship’ speaks to the audacity and clear disrespect of these new site owners who happen to view developers as a bunch of schmucks to be hoodwinked and cajoled into servitude.
‘Successful business != Conning people into waiving their rights’
As a response to that, I took down all emulator threads on their forum that I have personally worked on (seeing as the previous owner on there granted me Moderator status with no strings attached). Also, I complied with the wishes of many former members who wanted all their content to be removed because they have similar grievances about this new ‘clause’ added to the terms and they don’t like this site profiteering or benefiting in any way from it. Now, obviously, that resulted in my ban – it couldn’t have gone any other way – you know, once you throw down the gauntlet like this, what the consequences of that are going to be. I personally requested for them to cast the first stone and ban me – I had no more desire to stay on there and feel no need to associate myself with that site anymore.
Now the sad thing is – this kind of profiteering has become fashionable now over the past few years – when you mistake being a ‘good entrepreneur’ with being a ‘con artist’ and a ‘profiteer’ and, worst of all, you don’t even care about any of the amoral connotations that brings along with it – then obviously you’re going to have friction between developers who are doing this mainly out of a passion and then these kinds of cynical businessmen following in the grand footsteps of PT Barnum and his ilk.