Boost without Bullshit (boost-wobs)

January 25th, 2015

After some friends were ranting about Boost on twitter, I decided to do something about the single Boost dependency in my game code… but not the usual thing (rewriting it).

I’ve been using Boost’s operators.hpp for nearly a decade, because it actually makes writing operator code cleaner. Alas, my trimmed down subset of Boost was bringing nearly 60 source files with it, just for one file. That’s dumb. Thus, Boost without Bullshit was born:

https://github.com/povrazor/boost-wobs

The idea behind boost-wobs is that yes, there is some good stuff in Boost, but many developers refuse to use it because of the “BS” that comes with it. You can’t just add a single header to your project and get only the feature you want. It’s all or nothing.

I’ve now fixed operators.hpp to make it standalone with zero dependencies. Optionally, it also supports operations on std::iterators, which is now enabled using a special define. It used to be boost::iterator, but boost/iterator.hpp is deprecated, and is actually just std::iterator, so why not just use std::iterator instead?
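
For anyone who hasn’t used operators.hpp, here is a minimal sketch of the kind of thing it saves you from writing. This is NOT Boost’s code or API: LessThanComparableSketch and Score are hypothetical names made up for illustration. The idea is that you write operator< once and the other three comparisons are generated for you.

```cpp
#include <cassert>

// Hypothetical stand-in for the idea behind operators.hpp (not Boost's code).
// Derive from this, supply operator<, and the other comparisons come for free.
template <class T>
struct LessThanComparableSketch {
    friend bool operator>(const T& a, const T& b)  { return b < a; }
    friend bool operator<=(const T& a, const T& b) { return !(b < a); }
    friend bool operator>=(const T& a, const T& b) { return !(a < b); }
};

// Example type: only operator< is written by hand.
struct Score : LessThanComparableSketch<Score> {
    int value;
    explicit Score(int v) : value(v) {}
    friend bool operator<(const Score& a, const Score& b) { return a.value < b.value; }
};

// Usage: every comparison below is derived from the single operator< above.
inline void score_example() {
    assert(Score(3) > Score(2));
    assert(Score(2) <= Score(2));
    assert(Score(2) >= Score(1));
}
```

The generated operators are found through argument-dependent lookup because Score derives from the helper, so nothing needs to be pulled into scope by hand.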

When and why use UTF-8 and UTF-16? Stringy thoughts.

January 12th, 2015

This was a bit of a shower thought, but until now I didn’t have a good reason to choose UTF-16 for anything.

UTF-8 makes a lot of sense. It has all the benefits of ASCII text formatting, and the ability to support additional characters above and beyond the 128 of ASCII or the 256 of extended ASCII. It’s very similar to ASCII, you just have to be careful with your null and extended codes. No arguments here.

The main problem with UTF-8 is that you can’t just iterate as you do with ASCII (ch++). You need to check for special characters every step (just in case). On the plus side, every byte of a multi-byte character is 0x80 or above, so the initial test is cheap. It’s decoding the variations that is a little costly.
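
A minimal sketch of what that per-step check looks like, assuming well-formed, zero-terminated UTF-8. The function names are made up for illustration, and invalid or stray bytes are simply stepped over one at a time rather than reported.

```cpp
#include <cassert>
#include <cstddef>

// How many bytes the UTF-8 sequence starting with `lead` occupies.
// Assumption: input is well-formed; anything unexpected counts as 1 byte.
inline std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // plain ASCII: the cheap test
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;                             // continuation or invalid byte
}

// Counts characters (code points) in a zero-terminated UTF-8 string.
inline std::size_t utf8_count_code_points(const char* s) {
    assert(s != nullptr);
    std::size_t count = 0;
    while (*s) {
        const std::size_t len = utf8_sequence_length(static_cast<unsigned char>(*s));
        // Step over the whole sequence, but never past the terminator.
        for (std::size_t i = 0; i < len && *s; ++i) ++s;
        ++count;
    }
    return count;
}
```

The ASCII branch is a single compare, which is the cheap part. Everything after it is the extra cost you pay compared to a plain ch++.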

UTF-16 makes a lot of sense for the actual storage format of strings in RAM. Most characters are a single 16-bit Glyph (in any product I plan on working on anyway), but there is support for 32-bit Glyphs made of two 16-bit units (surrogate pairs) as well. UCS-2 is the name of the legacy fixed-width 16-bit encoding that predates UTF-16.

With UTF-16, if you prepare for it, you can get away with simple iterations (short* ch++). Not to mention, it’s a 2-byte read, which should effectively be faster/less wasteful than multiple 1-byte reads. It would be wise to just discard/replace characters above the 16-bit range (no Emoji). China may have a problem (GB 18030), but it’s a controlled situation, and most glyphs will be on the Basic Multilingual Plane anyway. Plus there’s a whole 6k of Private Use glyphs if needed anyway (item iconography like swords, shields, potions, etc). That’s kind-of a nice feature.
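
Here is a rough sketch of the “prepare for it” step, assuming you are happy to lose anything outside the 16-bit range. The function name is hypothetical. Each surrogate pair (and any lone surrogate) is swapped for U+FFFD, the standard replacement character, so that afterwards every 16-bit unit is exactly one character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns a copy of `in` where everything outside the Basic Multilingual
// Plane has been replaced by U+FFFD. Hypothetical helper for illustration.
inline std::u16string utf16_squash_to_bmp(const std::u16string& in) {
    std::u16string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t u = in[i];
        const bool high = (u >= 0xD800 && u <= 0xDBFF);
        const bool low  = (u >= 0xDC00 && u <= 0xDFFF);
        if (high && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            out.push_back(u'\uFFFD');  // valid pair: one replacement for both units
            ++i;
        } else if (high || low) {
            out.push_back(u'\uFFFD');  // lone surrogate
        } else {
            out.push_back(u);          // ordinary BMP character
        }
    }
    assert(out.size() <= in.size());   // squashing never grows the string
    return out;
}
```

Once a string has been through something like this, a plain ch++ over 16-bit units really does visit one character per step.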

ASCII still makes a lot of sense for keys, script code, and file names. Since UTF-8 is backwards compatible with 7-bit ASCII, if we impose the restriction that all keys will be 7-bit ASCII, then our strings are an optimal/smaller size. Also, with UTF-16, though each character is wider, the values are the same as 7-bit ASCII as long as it’s the correct endianness. With the exception of strings and comments, it’s reasonable to impose a 7-bit ASCII restriction. After all, this is still a UTF-8 compatible file.

16bit String Lengths are a reasonable limitation. 0 to 65535 will cover 99.99% of string lengths. The only thing that will push it over is if you happen to have a novel’s worth of text, or a large body of text that’s heavily tagged (HTML/XML). So it’s bad for a web browser or text file, but fine for anything else. In an optimal use case, this means you’re using 3 extra bytes per UTF-8 string (16bit size, 8 bit terminator) or 4 extra bytes per UTF-16 string (16bit size, 16bit terminator).

This aligns well if you NEVER plan to take advantage of 32bit or 64bit reads. If you do however, then using a larger string length field means the string data itself will start on your preferred boundary. Use the platform’s size_t for optimal usage.

Most standard zero-checking string functions are better fed a pointer to the data directly. This lets them work exactly like normal C strings, looping until they hit a zero terminator. But a smarter string function may want to know the size faster (e.g. when testing for equality, compare sizes first).
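
A minimal sketch of that layout for the UTF-8 case, assuming one contiguous block of [16-bit size][text][zero terminator]. PrefixedString and prefixed_equal are hypothetical names, and a std::vector stands in for whatever allocator you would really use.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout: [uint16 size][size bytes of text][one 0 terminator].
class PrefixedString {
public:
    explicit PrefixedString(const char* text) {
        const std::size_t len = std::strlen(text);
        assert(len <= 0xFFFF);  // the 16-bit length limit discussed above
        const std::uint16_t size = static_cast<std::uint16_t>(len);
        block_.resize(sizeof(size) + len + 1);  // 3 bytes of overhead in total
        std::memcpy(block_.data(), &size, sizeof(size));
        std::memcpy(block_.data() + sizeof(size), text, len + 1);  // includes the 0
    }

    // Stored length, read back out of the block header.
    std::uint16_t size() const {
        std::uint16_t s;
        std::memcpy(&s, block_.data(), sizeof(s));
        return s;
    }

    // Pointer straight to the data, usable by any zero-checking C function.
    const char* c_str() const { return block_.data() + sizeof(std::uint16_t); }

    // Bytes spent on bookkeeping (size field plus terminator).
    std::size_t overhead() const { return block_.size() - size(); }

private:
    std::vector<char> block_;
};

// The "smarter" comparison: sizes first, bytes only if the sizes match.
inline bool prefixed_equal(const PrefixedString& a, const PrefixedString& b) {
    return a.size() == b.size() && std::memcmp(a.c_str(), b.c_str(), a.size()) == 0;
}
```

Handing out c_str() keeps the plain C functions working, while prefixed_equal can reject most mismatches without touching the text at all.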

A pre-padded string length can be made mostly compatible with a pre-padded datablock type. The string version will be one character longer, but any functions that deal with copies will be (mostly) the same (one extra action, pad with a 0 at the end).

Line Chunked, in addition to whole strings, is a useful format. A text editor would want a fast way to go from line 200 to 201. Always iterating until you find a newline is slow, so it’s best to find all the line starts once, up front. If the line sizes don’t need to change, you can butcher the source string by replacing CR and LF with 0, and having a pointer to each line start. If you need to change lines, and those lines will definitely grow larger than some maximum, then each line should be separately allocated, and be capable of re-allocating.
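
A rough sketch of the “butcher the source string” variant, for the case where line sizes don’t change. LineChunkedText is a hypothetical name. One deliberate difference from the description above: it stores an offset for each line start rather than a raw pointer, so the object can be copied or moved without the line table going stale.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Owns the text, with every CR and LF overwritten by 0, plus a table of
// where each line starts. Hypothetical type for illustration.
struct LineChunkedText {
    std::string buffer;
    std::vector<std::size_t> starts;

    explicit LineChunkedText(std::string text) : buffer(std::move(text)) {
        if (buffer.empty()) return;
        starts.push_back(0);
        for (std::size_t i = 0; i < buffer.size(); ++i) {
            // Treat CRLF as one line break: zero the CR, then handle the LF.
            if (buffer[i] == '\r' && i + 1 < buffer.size() && buffer[i + 1] == '\n')
                buffer[i++] = '\0';
            if (buffer[i] == '\r' || buffer[i] == '\n') {
                buffer[i] = '\0';
                if (i + 1 < buffer.size()) starts.push_back(i + 1);
            }
        }
    }

    std::size_t line_count() const { return starts.size(); }

    // Each line is an ordinary zero-terminated C string inside the buffer.
    const char* line(std::size_t n) const {
        assert(n < starts.size());
        return buffer.c_str() + starts[n];
    }
};
```

Going from line 200 to 201 is then just line(200) and line(201), with no scanning for newlines.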

Unfortunately this doesn’t help for word wrap. Word wrapping is completely dependent on the size of the box the text is being fit into. Ideally, you probably want some sort of array of linked lists containing wrap points. Each index should know how many wraps it has, meaning you’ll need to track both what line and what wrap you are on. Process the text the same way as Line Chunked, but you wrap at wraps.

Text Interchange formats like JSON can be padded with spaces/control characters. A padded JSON file won’t be as tightly packed as a whitespace removed one, but the fastest way to read/write string data is when it’s 32-bit aligned. On many ARM chips, it’s actually a requirement that you do aligned 32-bit reads. On Intel chips it wastes fewer cycles.
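
The padding arithmetic is tiny. A minimal sketch, assuming the buffer itself already starts on an aligned address (padding the length alone doesn’t guarantee that). Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes needed to round `size` up to the next multiple of `alignment`.
inline std::size_t padding_needed(std::size_t size, std::size_t alignment) {
    assert(alignment != 0);
    return (alignment - size % alignment) % alignment;
}

// Appends spaces (legal JSON whitespace) until the length is a multiple of
// `alignment`; 4 bytes by default, for 32-bit reads.
inline std::string pad_json(std::string json, std::size_t alignment = 4) {
    json.append(padding_needed(json.size(), alignment), ' ');
    return json;
}
```

Trailing spaces are ignored by any JSON parser, so the padded text still means exactly the same thing.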

Or if all you care about is EFIGS (English, French, Italian, German, Spanish), then just use ASCII and be done with it. There should be some rarely/never used characters under the Extended ASCII set, which gives room for your custom stuff.

Long story short:

Use UTF-8 as an external storage format. Use UTF-16 as an in-memory storage format. Use ASCII for keys, script code, file names, etc. Or be lazy, do ASCII for EFIGS.

cJSON-LAX (Relaxed) on GitHub

January 12th, 2015

Today I made some modifications to cJSON, a JSON parsing C library. And like a good boy, I’ve made them available on GitHub.

https://github.com/povrazor/cjson-lax

The changes go against the strict JSON spec, but they are usability improvements:

  • Added C style /* Block Comments */ and C++ style // Line Comments.
  • Added support for Tail Commas on the LAST LINES of Arrays and Objects.

The benefits of comments in JSON should be obvious, but Tail Commas are quite the nicety. When manually editing JSON files, you sometimes re-order your lines with copy+paste. According to the spec, all members of an Object or Array are followed by a comma, except the last one. Removing that comma, or making sure it exists on all lines after copy+pasting, is an unnecessary pain. Now it’s optional.
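
To give a feel for how little it takes, here is a rough sketch of both relaxations. This is NOT the code from cjson-lax, just an illustration with made-up function names: one helper skips whitespace and both comment styles, the other decides whether a comma is a Tail Comma by looking at what follows it.

```cpp
#include <cassert>

// Skips whitespace/control characters, // line comments and /* block comments */.
// Returns a pointer to the next significant character (or the terminator).
inline const char* skip_lax(const char* p) {
    assert(p != nullptr);
    for (;;) {
        while (*p && static_cast<unsigned char>(*p) <= ' ') ++p;
        if (p[0] == '/' && p[1] == '/') {
            while (*p && *p != '\n') ++p;  // run to the end of the line
        } else if (p[0] == '/' && p[1] == '*') {
            p += 2;
            while (*p && !(p[0] == '*' && p[1] == '/')) ++p;
            if (*p) p += 2;                // step over the closing */
        } else {
            return p;
        }
    }
}

// A comma is a Tail Comma if the next significant character closes the
// Array or Object, in which case a relaxed parser can simply ignore it.
inline bool is_tail_comma(const char* p) {
    if (*p != ',') return false;
    const char* next = skip_lax(p + 1);
    return *next == ']' || *next == '}';
}
```

An unterminated block comment just runs to the end of the input here. A real parser would probably want to report that as an error.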

Linux (Ubuntu) 2015 Setup Notes

January 5th, 2015

Hello 2015. The laptop I’ve been using the past few years actually fell apart, so I bought a new one. I started using Linux (Ubuntu) almost exactly 1 year ago, and as much as I like it, it’s not always the most logical and obvious OS to use, so I take notes. Here are notes.

The machine is a Lenovo X230 Tablet. It’s actually an older laptop, a 2013 model, but to put things mildly 2014 was NOT a good year for Lenovo Thinkpads. Fortunately, as of CES (today/yesterday), Thinkpads will return to being useful.

Notable specs: Core i5 Processor with Hyperthreading. Intel HD 4000 GPU. 2x USB 3.0 ports, 1x USB 2.0 (powered). Oh, and of course it has a TrackPad, an IBM TrackPoint (red nub joystick mouse), a Touch Screen and a Wacom Digitizer (Pen).

I dropped a 500 GB Samsung SSD into it, dedicated a 200 GB partition to Windows 7 Pro, and the rest to Ubuntu 14.10 (~260 GB, though I’m thinking about giving Windows a little more). I’ve installed various dev tools and SDKs on both. As expected, Windows has used about 100 GB of its space, and Ubuntu about 20 GB. Typical. ;)

Return of Oibaf (bleeding edge Video drivers)

Setting up the machine went very smoothly, except for two issues:

– The Steam UI was … strange. Not slow, but unresponsive (had to right click to refresh)
– The Print Screen key (and scrot) could not take screenshots (the captured images came out wrong)

The solution was to upgrade the video driver. The very latest Intel and OSS drivers are always available as part of the Oibaf graphics driver package.

https://launchpad.net/~oibaf/+archive/ubuntu/graphics-drivers

To add them, add the PPA, and do an upgrade:

To remove them, use ppa-purge:

The latest-and-greatest drivers can sometimes be risky to use. I normally don’t use them, but for a time I did, and all was fine until Mesa mainline got busted. Sadly oibaf doesn’t keep “last known good drivers” around, only the bleeding edge, so I only recommend it if all else fails.

Skype Tray Icon Fix

Skype is a 32bit app running on 64bit Linux. This package makes its tray icon work correctly.

Download it, and restart Skype.

TLP – Advanced Linux Power Management

This little piece of software dramatically improves my Linux battery life. I had used it in the past, but for a time I was using pre-release builds of Ubuntu, and no TLP update was available (I hadn’t learned about how to grab old versions of software from PPA’s yet).

http://linrunner.de/en/tlp/docs/tlp-linux-advanced-power-management.html

Installation is pretty easy. First add the PPA.

Then grab the packages.

I run Thinkpad laptops, for which there is extra software.

SDL2 Setup

I always seem to forget the essentials needed to build SDL on Linux. Hopefully this list is correct.

SDL_Mixer 2.0 Prerequisites on Ubuntu

For OGG support do:

And the rest:

Running Emscripten

The emrun tool can be used to quickly test Emscripten compiled apps.

First, compile with the command line option "--emrun".

Then run the script:

More details.

SDL2 and Emscripten

NOTE: SDL2 Emscripten support is brand-new. It was just added to mainline SDL on Xmas eve.

There are two ways to build SDL2 apps using Emscripten. The SDL way, and Emscripten way.

The SDL way is building SDL2 from source, using various Emscripten tools. There are notes about how to build SDL_mixer, but I was unable to get this working.

https://hg.libsdl.org/SDL/file/817656bd36ec/docs/README-emscripten.md

The Emscripten way uses Emscripten-Ports. Details can be found here:

http://kripken.github.io/emscripten-site/docs/compiling/Building-Projects.html

As of this writing, only SDL and SDL_image are available in ports (not SDL_mixer).

https://github.com/emscripten-ports

Fixing the Dokuwiki

As root, give ownership of the data directory to nobody.

AMD Graphics and Steam

My other PC decided to be a pain.

If you’re having troubles running Steam, you may need to:

– Switch to the Open Source driver
– Install Steam updates
– Switch back to the proprietary driver

You can do this with the “Additional Drivers” program.

Open it, wait a moment for it to show up, and do the above.

SauceDriver

You can quickly restart the graphics driver by logging out (instead of rebooting).

PHP Mad Notebook

October 21st, 2014

This isn’t a blog. It’s a notebook.

APCu Functions

Arguments in []’s are optional. Cross reference with APC docs. PECL Page. Github.

Iterator functions are omitted, but also available.

The above is a cleaned up version of what’s output by "php --re apcu".

Perl-like ?: Operator

From Tips.

Data Format: Raw PHP variables (var_export)

To serialize something to disk in the fastest way PHP can read it, you make it source code by calling var_export. Whenever the file changes, it should cause a cache miss with OPcache.

To use it:

Alternatively, if you store it a different way:

But this will cause OPcache to miss every time.

Data Format: JSON (decode, encode)

Apparently this is the fastest encoder, as of PHP 5.3. Benchmarks.

json_decode, json_encode, json_last_error

Data Format: Serialize, Unserialize

A faster decoder (slower encoder), and types/classes persist.

serialize, unserialize

Data Format: IGbinary

An alternative, external binary encoder/decoder. According to benchmarks, the fastest.

https://github.com/igbinary/igbinary (PECL)

Smaller too.

Tips and Tricks: http://ilia.ws/files/zendcon_2010_hidden_features.pdf

Data Format: CSV

Reading Only.

http://php.net/manual/en/function.str-getcsv.php

Data Format: XML

Reading Only (there is writing, but it seems more difficult).

If you prefer Array format (like me), here’s a function.

Then simply

Data Format: HTML

Use Simple HTML DOM parser.

http://simplehtmldom.sourceforge.net/ (Manual)

Data Format: String Delimiter

explode, implode

unset

val and type juggling

Shutting Down ludumdare.com/planet!! (Just the planet)

September 4th, 2014

Most people don’t know we have a blog syndication sub-site on Ludum Dare called the Ludum Dare Planet. It’s an extra WordPress blog and site that eats very few resources, but still needs to be regularly updated.

For the sake of security, I’ll be taking down the Planet website. I’m going to wait a few days first, but if nobody responds, I’ll just take it down. If you need time to grab your favorite RSS feeds, come to my blog (http://toonormal.com) and let me know.

EDIT: And the Planet is now gone. Nothing to see here. Move along.