Thursday, 13 March 2014

etcpak 0.2.2

This version contains some minor performance improvements and a benchmark mode, which can be activated with the -b parameter. It performs 50 compression passes and prints the average time for one pass. This should provide a better environment for measurements, as the PNG decode is the slowest component during normal operation.

I've also made an example 8192x8192 image available for test purposes. It is based on the Carina Nebula shot from Hubble.

For comparison, here's the previous method of speed measurement, heavily influenced by the PNG decoder:
$ time etcpak.exe 8192.png

real    0m1.471s
user    0m0.000s
sys     0m0.030s
And here's the new benchmark mode:
$ etcpak.exe 8192.png -b
Image load time: 1330.949 ms
Mean compression time for 50 runs: 631.308 ms
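
The idea behind the benchmark mode can be sketched in a few lines: decode the image once, then time repeated compression passes and report the mean. This is only an illustration, not etcpak's actual code; the `MeanPassTimeMs` helper and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not taken from etcpak): runs `pass` `runs` times and
// returns the mean wall-clock time of a single pass in milliseconds.
// The image is assumed to be decoded already, so PNG load time is excluded.
double MeanPassTimeMs(const std::function<void()>& pass, int runs = 50)
{
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < runs; i++) pass();
    const std::chrono::duration<double, std::milli> total = Clock::now() - start;
    return total.count() / runs;
}
```

Passing the compression step as a callable keeps the timing loop independent of what is being measured.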

Download: https://bitbucket.org/wolfpld/etcpak/downloads

Saturday, 10 August 2013

etcpak 0.2.1

A new version of etcpak has been published today. What's new:
  • Reduced number of spawned threads and context switches.
  • Memory mapped files are used for output. This allows writing compressed data to disk during compression. The downside is that writing PVR output files can no longer be disabled.
  • The 32-bit version has been discontinued. From now on only the 64-bit version will be provided. It was always the recommended one to use anyway, as it performed much better than the 32-bit one.
  • Various optimizations.
etcpak 0.2.1 is 10% faster than etcpak 0.2, with the compression time measured at 0.08 s (after deducting PNG load time).
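
Memory-mapped output works well here because ETC1 stores every 4x4 pixel block in 8 bytes, so the size of the compressed data is known before compression starts. The following is a minimal sketch of the approach, assuming POSIX calls (`open`, `ftruncate`, `mmap`); the published etcpak builds are Windows executables, so the real code would use a different API. The function names are hypothetical and the file header is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// ETC1 stores every 4x4 pixel block in 8 bytes, so the payload size is known
// up front (file header not included).
size_t Etc1DataSize(size_t width, size_t height)
{
    assert(width % 4 == 0 && height % 4 == 0);
    return (width / 4) * (height / 4) * 8;
}

// Hypothetical helper, POSIX only. Creates `path` with its final size and
// maps it into memory. Returns nullptr on failure.
uint8_t* MapOutputFile(const char* path, size_t size)
{
    assert(size > 0);
    const int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) { close(fd); return nullptr; }
    void* ptr = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return ptr == MAP_FAILED ? nullptr : static_cast<uint8_t*>(ptr);
}

// Flushes and releases a mapping created by MapOutputFile.
void UnmapOutputFile(uint8_t* ptr, size_t size)
{
    msync(ptr, size, MS_SYNC);
    munmap(ptr, size);
}
```

Compression threads can then write each finished block straight into the mapping at its final offset, and the operating system takes care of getting the pages to disk.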

Download: https://bitbucket.org/wolfpld/etcpak/downloads

Sunday, 7 July 2013

etcpak 0.2

It would appear that etcpak was programmed in quite an inefficient way up until now. That's a funny thing to say about a program which was an order of magnitude faster than the competing ones. And it's one of those obvious things that make you wonder afterwards how you didn't think of it in the first place.

etcpak will no longer wait until all image data is available before it starts compressing. Data processing is now performed simultaneously with PNG decoding, which basically means that by the time the source image is fully loaded, we're almost done with the compression.
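
One way to picture this overlap is a producer/consumer queue: the decoder hands over each band of 4 scanlines (one row of 4x4 ETC blocks) as soon as it is ready, and a worker compresses it while decoding continues. The sketch below is an assumption about the general technique, not etcpak's code; `BandQueue`, `DecodeAndCompress` and the two callbacks are hypothetical stand-ins.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical thread-safe queue of band indices.
class BandQueue
{
public:
    void Push(int band)
    {
        { std::lock_guard<std::mutex> lock(m_mutex); m_bands.push(band); }
        m_cv.notify_one();
    }
    void Finish()
    {
        { std::lock_guard<std::mutex> lock(m_mutex); m_done = true; }
        m_cv.notify_all();
    }
    // Blocks until a band is available. Returns false once the queue is
    // drained and the producer has finished.
    bool Pop(int& band)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_bands.empty() || m_done; });
        if (m_bands.empty()) return false;
        band = m_bands.front();
        m_bands.pop();
        return true;
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<int> m_bands;
    bool m_done = false;
};

// decodeBand(i) stands in for decoding scanlines 4*i .. 4*i+3 of the PNG;
// compressBand(i) stands in for compressing that row of blocks.
// Returns the number of bands compressed.
int DecodeAndCompress(int height,
                      const std::function<void(int)>& decodeBand,
                      const std::function<void(int)>& compressBand)
{
    assert(height > 0 && height % 4 == 0);
    const int bands = height / 4;
    BandQueue queue;
    int compressed = 0;
    std::thread worker([&] {
        int band;
        while (queue.Pop(band)) { compressBand(band); compressed++; }
    });
    // The calling thread plays the role of the PNG decoder.
    for (int i = 0; i < bands; i++) { decodeBand(i); queue.Push(i); }
    queue.Finish();
    worker.join();
    return compressed;
}
```

With a single worker the bands are compressed in decode order; a real tool would likely use several workers, since each band can be compressed independently.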

Some numbers.

Test | Time (full) | Time (minus PNG load)
etcpak 0.1 RGB | 1.12 s | 0.45 s
etcpak 0.1 RGB + alpha | 1.36 s | 0.69 s
etcpak 0.2 RGB | 0.83 s | 0.16 s
etcpak 0.2 RGB + alpha | 1.00 s | 0.33 s

This new version can be downloaded from https://bitbucket.org/wolfpld/etcpak/downloads.

Sunday, 9 June 2013

Fastest ETC compressor on the planet: etcpak

My new (mobile) game has quite a lot of assets. Four different resolution sets, more than 23000 source images in each of them. After packing everything into atlases, the resulting data set is about 1106 Mpixels big. Since each pixel occupies 4 bytes (RGB + alpha channel), the raw data would roughly fit on a DVD disc.
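
The arithmetic behind that estimate is easy to check. The small helper below is a hypothetical illustration (the name `SizeInMB` is not from any tool); it converts a pixel count and a bit depth into decimal megabytes.

```cpp
#include <cassert>
#include <cstdint>

// Size in decimal megabytes of `megapixels` million pixels stored at
// `bitsPerPixel` bits each (1 Mpixel at 8 bits per pixel = 1 MB).
uint64_t SizeInMB(uint64_t megapixels, uint64_t bitsPerPixel)
{
    assert(bitsPerPixel > 0);
    return megapixels * bitsPerPixel / 8;
}
```

For the raw data, `SizeInMB(1106, 32)` gives 4424 MB, just under the 4.7 GB of a single-layer DVD. ETC1 uses 4 bits per pixel (64 bits per 4x4 block), so `SizeInMB(1106, 4)` gives 553 MB for the same atlases without alpha.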

Of course, 3D hardware has supported texture compression basically since the beginning of the 3D revolution, which is quite some time. In the mobile space there are two major texture compression formats. The first one (and the only one supported on iOS) is PVRTC. The second one is ETC (Ericsson Texture Compression) and it's supported by virtually all OpenGL ES devices. It's also here to stay, as ETC compression support is mandatory in OpenGL ES 3.0 and OpenGL 4.3.

Now, the problem is that compression takes time. A looong time. On a dedicated i7 CPU it takes about 3 hours to compress all my atlases to PVRTC format, using ImgTec's PVRTexTool utility. ETC is better at about half an hour, but that's still unacceptable for quick, iterative development. There are various quality settings, and you can choose between perceptual and non-perceptual processing, but even in the fastest mode the compression is still unbearably slow.

There are other compression utilities available, but they fare no better. The ones I am aware of are listed in the table below.

I have a test image: a real-life 4096x4096 RGBA texture atlas, filled to about 87%. Some tools load PNG files, while others require PPM input, which basically means streaming raw image data from the disk. I have measured the load time of the PNG test image on my i3 540 to be 0.67 seconds. Every utility that loads a PNG image has that time deducted from its total, even if the program itself reports a longer load (for example, crunch says it loads the texture in 1.029 s).

Tool | Command line | Time
PVRTexToolCL 3.40 | PVRTexTool.exe -i atlas-base1.png -o pvr.pvr -f ETC1 -q etcfast | 24.71 s
ericsson ETCPACK 1.06 | etcpack.exe -s fast -e nonperceptual atlas-base1.ppm etc.ktx | 23.86 s
mali etcpack 4.0.1 | etcpack.exe atlas-base1.ppm . -s fast -e nonperceptual -c etc1 | 19.20 s
crunch (rg-etc1) 1.04 | crunch_x64 -ETC1 -fileformat KTX -mipMode none -uniformMetrics -dxtQuality superfast -file atlas-base1.png | 4.41 s

So, crunch is really fast, isn't it? Well, I didn't know that before I set out to write my own compression utility. And my tool runs circles around crunch. The compression time is 0.45 s. That's not a typo, it's 10x as fast as the fastest utility previously available. It's 50x as fast as PVRTexTool. And it has a special mode for processing textures with an alpha channel. Creating two ETC textures, one with RGB data and a second one with the alpha channel, takes 0.69 s. That's about the time it takes to decompress the PNG image. And it's so fast you will be limited by HDD I/O wait.
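
ETC1 has no alpha channel of its own, which is why the alpha mode produces two textures. A minimal sketch of the split is shown below. How etcpak actually stores the alpha data is not stated here, so replicating alpha into all three channels of a second RGB image is an assumption, and `SplitImage` and `SplitAlpha` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SplitImage
{
    std::vector<uint8_t> rgb;    // 3 bytes per pixel, colour only
    std::vector<uint8_t> alpha;  // 3 bytes per pixel, alpha replicated as grey
};

// Splits tightly packed RGBA8 pixels into two RGB images that can each be
// fed to an ETC1 compressor.
SplitImage SplitAlpha(const std::vector<uint8_t>& rgba)
{
    assert(rgba.size() % 4 == 0);
    const size_t pixels = rgba.size() / 4;
    SplitImage out;
    out.rgb.reserve(pixels * 3);
    out.alpha.reserve(pixels * 3);
    for (size_t i = 0; i < pixels; i++)
    {
        const uint8_t* px = &rgba[i * 4];
        out.rgb.insert(out.rgb.end(), px, px + 3);
        out.alpha.insert(out.alpha.end(), size_t(3), px[3]);
    }
    return out;
}
```

At render time a shader would sample both textures and take the alpha value from any channel of the second one.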

As for the resulting image quality, my tool was never intended for production usage. And for testing during development it doesn't look that bad. Take a look.

[Image comparison: Original | Compressed]

You can download the Windows executables (both 32 and 64 bit, but use the 64 bit one, as it's a lot faster) from https://bitbucket.org/wolfpld/etcpak/downloads. As usual, the MSVC redist is required.

[edit: new version is available]

Source code can be found at bitbucket.

Wednesday, 24 October 2012

N900 software rendering

Some time ago I wrote a software renderer and presented a video of it running on an N73. Then I ported it to the N900, updated the model and lighting, but never actually published a video of it in action. Well, here it is:

I think the low FPS values (around 12) were the reason it was not published. They are due to the number of triangles the new model consists of. The old Caesar model was much simpler and thus rendered faster. With a proper low-poly model the above animation would run at 30 FPS or more without any problems.

Fun side note. The software renderer on N900 was actually faster than running the hardware accelerated version. Well, hardware and/or drivers sucked greatly on that phone.

Tuesday, 2 October 2012

CRT-like rendering on LCD monitors followup

Apparently some folks on some strange forum-like site have been wondering how the CRT effect works. Next time you should write a comment to the entry instead of relying on me watching site traffic analysis.

Anyway. I have prepared a stripped down version of the code and it should be simple enough for anybody competent to replicate the effect in his own code. As the ReadMe file says, the shaders are not optimized in any way whatsoever. Some of them are written in a blatantly bad way. But it's a good starting point for anyone interested.

Windows binary: http://team.pld-linux.org/~wolf/CRT%20demo.7z. You will probably need MSVC 2012 redistributable package.
Source code: http://team.pld-linux.org/~wolf/CRT%20demo%20src.7z

Sunday, 6 May 2012

CRT-like rendering on LCD monitors

The advances in technology over the past few years have given us quite a nice improvement in the quality of images displayed by our monitors. Thanks to RAMDACs that don't suck, LCD monitors, digital video interfaces, and then LCD monitors that don't suck, we're now able to display sharp visuals of unprecedented quality, at rather big resolutions too.

But there is a problem. Some types of content looked great in the past, but there is something missing when they're viewed nowadays. There are people who may not even know how it's supposed to look, because the old technology has become obsolete. Text mode looks different. When you want to use an 8x8 font it ends up looking either blurred or super blocky. Use modern TTF fonts and you get nice curved shapes, antialiasing, etc., but it looks just wrong. It's not how it should be anymore. Another good example is 8-bit emulators. What has happened? These games used to look good, but now they are ugly in their perfectness!

Well, some things need to look bad to look good. Thanks to programmable GPUs we can now re-introduce all these bad things that were plaguing us in the CRT days, so that we can be happy once more. Let me show some pictures, each split in half. The left side is post-processed and the right side is the original content.





Some notes:
  1. The effect is dynamic and looks better when it's watched live.
  2. This is not an emulator, these are just screenshots of 8-bit games. I am using the post-processing for other purposes.
  3. This is based on what I thought would look good, not on any comparison to a real CRT, or analysis of errors happening in the VRAM -> analog -> CRT path.