Drive ya Nuts
November 25th, 2010
Last time I was home I found my old "Drive Ya Nuts" puzzle. I don't remember ever solving it. Here it is, unsolved, in all its frustrating glory.
This Thanksgiving I'm on a long overnight in Nashville, TN so I wrote a prolog program to end this once and for all. New puzzles come solved in the retail box but there is no good reason to believe that this is the only solution.
use_module(library(lists)).
% Nuts.
nut(t, 1, 2, 3, 4, 5, 6).
nut(u, 1, 2, 5, 6, 3, 4).
nut(v, 1, 3, 5, 2, 4, 6).
nut(w, 1, 3, 5, 4, 2, 6).
nut(x, 1, 4, 2, 3, 5, 6).
nut(y, 1, 5, 3, 2, 6, 4).
nut(z, 1, 6, 5, 4, 3, 2).
% Rotation successors.
succ(r0, r1).
succ(r1, r2).
succ(r2, r3).
succ(r3, r4).
succ(r4, r5).
% A nut is zero rotated if marks match
% a nut definition.
rotate(r0, nut(Name,A,B,C,D,E,F)) :-
nut(Name,A,B,C,D,E,F).
% A nut is rotated R if there is a
% sequence of preceding rotations.
rotate(R, nut(Name,A,B,C,D,E,F)) :-
succ(PrevR, R),
rotate(PrevR, nut(Name,F,A,B,C,D,E)).
% Solution for the seven (*) pegs.
% Letter variables are matching edges.
% * I *
% J H
% C B
% * D * A *
% E F
% K G
% * L *
% Nuts are marked CCW starting from the
% right edge.
solution(Center, Right, TopRight, TopLeft, Left, BottomLeft, BottomRight) :-
Center = nut(N1, A,B,C,D,E,F),
Right = nut(N2, _,_,H,A,G,_),
TopRight = nut(N3, _,_,_,I,B,H),
TopLeft = nut(N4, I,_,_,_,J,C),
Left = nut(N5, D,J,_,_,_,K),
BottomLeft = nut(N6, L,E,K,_,_,_),
BottomRight = nut(N7, _,G,F,L,_,_),
rotate(r0, Center),
rotate(_, Right),
rotate(_, TopRight),
rotate(_, TopLeft),
rotate(_, Left),
rotate(_, BottomLeft),
rotate(_, BottomRight),
permutation([t,u,v,w,x,y,z], [N1,N2,N3,N4,N5,N6,N7]).
So....
?- [drive_ya_nuts].
% drive_ya_nuts compiled 0.00 sec, -48 bytes
true.
?- solution(A,B,C,D,E,F,G).
A = nut(w, 1, 3, 5, 4, 2, 6),
B = nut(y, 2, 6, 4, 1, 5, 3),
C = nut(t, 5, 6, 1, 2, 3, 4),
D = nut(v, 2, 4, 6, 1, 3, 5),
E = nut(z, 4, 3, 2, 1, 6, 5),
F = nut(u, 1, 2, 5, 6, 3, 4),
G = nut(x, 3, 5, 6, 1, 4, 2) ;
false.
?-
Yup, only one solution. I bought this puzzle used at a garage sale around 1990 so it hasn't been arranged for at least 20 years. I'd love to get some insight that makes the computer unnecessary but this is good enough for now. Happy Thanksgiving!
Named and optional parameters in C
July 20th, 2010
/*
This might be language abuse. Variadic macros, designated
initializers, structs, and compound literals makes it possible to
have named and optional parameters. Tested in gcc 4.5 and clang 1.1.
*/
#include <stdio.h>
#define foo(...) foo_((struct foo){__VA_ARGS__})
struct foo { const char *name; int number; };
void foo_(struct foo foo)
{
printf("%s: %d\n", foo.name ?: "untitled", foo.number);
}
int main(int argc, char **argv)
{
foo("three", 3);
foo("hello");
foo(.name = "zero");
foo(.number = argc, .name = "argc",);
foo(.number = 42);
return 0;
}
/* Output
three: 3
hello: 0
zero: 0
argc: 1
untitled: 42
*/
Tagged pointers: a fresh look at runtime dispatch in C
July 20th, 2010
I stumbled upon a technique that I think is quite good. It requires no casts, allows untagged structures and primitive types to be polymorphic variants, does not depend on structural inheritance, can statically overload a function name to operate on multiple types, can inline if the type is known at compile time, and will warn during compilation if a case is missing for a particular type.
Here's an example of a graph with different node types. I left out the obvious details for clarity.
/* The type tags */
typedef enum {
NODE_GEOMETRY,
NODE_TRANSFORM,
NODE_GROUP
} node_type;
typedef struct node_geometry node_geometry;
typedef struct node_transform node_transform;
typedef struct node_group node_group;
/* This is the tagged pointer representation */
typedef struct {
node_type type;
union {
node_geometry *geometry;
node_transform *transform;
node_group *group;
} raw_pointer;
} node_pointer;
/*
These structs are disjoint plain old data.
No structural inheritance. Yay.
*/
typedef struct {
mat4 matrix;
} node_transform;
typedef struct {
geometry_model *model;
} node_geometry;
typedef struct {
int member_count;
node_pointer members[NODE_GROUP_SIZE];
} node_group;
/* Regular monomorphic functions accepting a plain old pointer */
static void node_geometry_draw(node_geometry *);
static void node_transform_draw(node_transform *);
static void node_group_draw(node_group *);
/* Dispatching on the tagged pointer */
void node_draw(node_ptr node)
{
switch (node.type) {
case NODE_GEOMETRY:
node_geometry_draw(node.raw_pointer.geometry);
break;
case NODE_TRANSFORM:
node_transform_draw(node.raw_pointer.transform);
break;
case NODE_GROUP:
/* Notice the error: the type signature of node_group_draw() will catch it. */
node_group_draw(node.raw_pointer.transform);
break;
}
};
We typically tag an object for runtime polymorphism. C++ adds a vtable. GObject copies C++. Numerous tutorials on the web store type information with the object or struct.
But type information isn't always needed in a struct because the struct's complete type (including size and member offsets) is known by the compiler. Type information is only needed when an unknown structure is accessed at runtime through a pointer. From this perspective it makes at least equal sense to store type information with pointers and leave structures uncluttered.
This technique uses a tagged union to represent a pointer to plain old data. Functions are dispatched on the type contained in the pointer representation. Dispatching with a switch statement gives us the benefit of some compiler optimizations and heuristics. If the pointer's type tag is proved to be constant the switch can be completely elided and the case statement inlined. If there are a large number of type tags or they are numerically all over the place, the compiler can emit the best code for that situation.
This is a multimethod approach to polymorphism. It can result in a more fragile make tree, where changes to a single structure will cause more recompilation. That is also a result of code organization so I might be a tad hasty with that criticism. On the upside, dispatching on multiple types is obvious and I get warnings when there are missing cases.
Because runtime type is stored externally in pointers, data objects can take any form in memory. This is advantages if they are members in other structures since the containing structure does not need the type tag. It also allows for representations defined in other libraries or having constraints like vertex and connectivity data for OpenGL.
Lastly, structural inheritance is overrated. I like different structural types that can share an interface. This technique does that efficiently while using the language instead of abusing it.
Ruby on Rails 3.0.beta Knows Cows
February 6th, 2010
>> RAILS_GEM_VERSION
=> "2.3.5"
>> "cow".pluralize
=> "cows"
>> Rails.version
=> "3.0.0.beta"
>> "cow".pluralize
=> "kine"
>> # Huh?
?> "kine".singularize
=> "cow"
I'm Still Seventeen or Older, Apple
January 27th, 2010
I love my iPhone. It just can't believe I'm 30 years old despite me repeatedly telling it otherwise. Continually misjudging me for a 16 year old is slightly less flattering than say a 25 year old, at least for me. I'd like to think of myself as fully developed but as my youth slips away that distinction matters less.

Apple, always attuned to workflow efficiency, couldn't have designed a better dialog. Microsoft might have placed the affected apps in one list that appeared at the start of an upgrade. Linux, well pacman and apt-get at least, aren't polished enough to even confirm my age! That's something Linux needs to fix if it ever wants to be taken seriously. Meanwhile Apple makes the dialog appear several times over the duration of an update process, presumably to accentuate the compliment. They even pause the update to make sure I get the message and unimportant details, like which app is affected, are subdued for a simpler interface.
One way to improve the feature for maximum flattery would default to age 25 for men and age 20 for women. Unfortunately the iPhone is callously insensitive to gender when it comes to detecting perverse material, both in the binary sense and in the liberal high-dimensional, continuous-field kind of gender. Usually only naked people and bloody murder are considered inappropriate for children. The iPhone detects a much wider set of perversions and four of my apps are deemed perverse. Could it be the Shakespeare ereader? Could it be the weather apps? Apple opts not to show the name of the perverse app in the dialog. It's simpler and it protects the children from knowledge, instant messages, clouds, and music.
I'm still 30 years old, which is greater than 17. Time hasn't reversed since the last dialog. Either Apple doubts these facts or it just likes flattering me. I think it's the latter. Thank you, Apple!
Irb is a Text Adventure
December 16th, 2009
Ruby's flavor of classed OO offers a thoughtful balance of rigidity and flexibility along with plenty of introspection and reflection. Over time I have come to experience this model less as a framework for thought and more as a living thing with which to converse. Dave Thomas brought up the prospect of a Ruby object browser over nine years ago. That thread raised questions that still haven't been answered.
Unfortunately for a browser the rigidity offered by Ruby is optional. I personally feel that HTML with links is the only graphical form that comes close to capturing the needed range. I just described a blank slate for a digraph so feel free to think it's lame. The object browser is possible and desirable but after all these years Irb is still the Ruby browser of choice.
My take is that every object in Irb is like a noun in a text adventure, and likewise every object is different. There are common verbs and uncommon verbs, and not every noun supports every verb. The components of trial, error, failure, and reward are the same. Nothing ruins my suspension of disbelief like having to consult documentation, and I like to believe that I am programming for as long as possible. The notable missing link to the underground world of Zork is "look". Here is "look". It is nothing more than a mashup of introspection methods but I've found it indispensable.
# Examples:
# foo.look
# foo.class.look
# foo.method(:meth).look
# foo.method(:meth).source_context
# foo.instance_variable_hash
Object.class_eval do
def instance_variable_hash
ret = {}
instance_variables.each { |v|
ret[v] = instance_variable_get(v)
}
ret
end
def look
{
:class => self.class,
:interesting_methods => (methods - Object.new.methods).sort,
:instance_variables => instance_variables.sort,
}
end
end
Class.class_eval do
def look
{
:class => self.class,
:ancestors => ancestors,
:included_modules => included_modules,
:interesting_methods => methods - Class.new.methods,
:instance_methods => instance_methods(false),
:class_variables => class_variables,
:instance_variables => instance_variables,
:constants => constants,
:source_files => methods(false).collect { |m| method(m).source_location.try(:first) }.compact.sort.uniq,
}
end
end
Module.class_eval do
def look
{
:class => self.class,
:ancestors => ancestors,
:included_modules => included_modules,
:class_variables => class_variables,
:interesting_methods => methods - Class.new.methods,
:instance_methods => instance_methods(false),
:instance_variables => instance_variables,
:constants => constants,
}
end
end
Method.class_eval do
if RUBY_VERSION < "1.9"
def source_location
end
end
def look
{
:class => self.class,
:arity => arity,
:name => name,
:owner => owner,
:receiver => receiver,
:source_location => source_location,
}
end
def source_context
return nil unless source_location
line_number = source_location.second
file = File.open(source_location.first, "r")
lines = file.lines.drop(line_number - 1)
indent = lines.first.scan(/^ */).first.length
end_marker = Regexp.new("^" << " " * indent << "end")
interesting_lines = []
begin
line = lines.shift
interesting_lines << line
end until line =~ end_marker or interesting_lines.count > 20 or line.nil?
file.close
source_location << interesting_lines.join
end
end
Plays well with 1. Wirble, 2. pretty printing, 3. Ruby 1.8, 4. Ruby 1.9, and OF COURSE your .irbrc.
GLib Interface Performance with Vala
September 7th, 2009
As I factored more class commonalities into interfaces I began to wonder about their runtime cost. A study of gtype.c and gtype.h revealed that glib appears to be doing quite a bit of work to find an interface method so I decided to benchmark them.
Beginning with a type instance glib must lookup the class, then binary search an array of interface entries for the id of the wanted interface. Once it finds a match it can finally dispatch via the vtable that is part of the interface belonging to that entry. This is much more work than a virtual call but it can be well worth the trouble. I should add that glib isn't a stupid implementation. It must do all this work because of design compromises. For comparison, the C++ version of an interface bloats each instance with a vtable pointer in exchange for fast method lookup. Calling a C++ interface method shouldn't be noticeably slower than any other virtual function since that's exactly what it is. There is some fixup arithmetic for multiple bases classes but arithmetic is pretty cheap. In contrast, an instance of a GType with interfaces does not own any interface vtables. Those belong to the class, the concrete type.
Assume only the interface type of an instance is known at compile time and the concrete type is unknown. Different concrete types are free to implement any number of interfaces and therefore an interface entry may have different offsets within each class. Given an interface id and an unknown instance, the only way to find the interface entry that corresponds to both that instance and the interface id is to search the interface entries in the instance's class for the id. If the id is found, bingo, the entry's interface's vtable is used for dispatch. In the classic space/time trade we spend time to buy space.
Here's a quick interface benchmark written in Vala.
using GLib;
// Our interface
public interface Foo : Object
{
public abstract int ifrob ();
}
// And a factory method so clients never touch
// the concrete type.
public Foo makeFoo ()
{
return new FooC ();
}
// Finally a concrete type.
public class FooC : Object, Foo
{
// Foo interface method
public int ifrob () {
return 1;
}
// virtual method
public virtual int vfrob () {
return 1;
}
// regular method
public int frob () {
return 1;
}
}
// Number of method calls
const long n = 10000000;
void main ()
{
// Interface var
var fi = makeFoo ();
// Concrete type
var fc = fi as FooC;
var timer = new Timer ();
print ("GLib Dispatch Mechanisms\n");
print ("%ld iterations\nTimes in seconds\n\n", n);
timer.start ();
for (long i = 0; i < n; ++i) {
fc.frob ();
}
print ("direct: %f\n", timer.elapsed ());
timer.start ();
for (long i = 0; i < n; ++i) {
fc.vfrob ();
}
print ("virtual: %f\n", timer.elapsed ());
timer.start ();
for (long i = 0; i < n; ++i) {
fi.ifrob ();
}
print ("interface: %f\n", timer.elapsed ());
}
Here's the output, compiled with --Xcc='-O3'. GLib is compiled with '-O2'.
GLib Dispatch Mechanisms 10000000 iterations Times in seconds direct: 0.000001 virtual: 0.060368 interface: 0.450858
See how -O3 inlines the direct call and then removes the useless loop. You gotta love C compilers, gcc 4.4.1 in this case. Direct calls take about 0.075 seconds with optimizations off. But look at the virtual vs. the interface call. It's over seven times slower. I added some dummy interfaces to FooC in different orders but there was virtually no effect compared to the hit for using an interface in the first place. The disassembler revealed a call to fixed offsets of eax in both virtual and interface cases so the only thing getting optimized out by the compiler is vala boilerplate. The real work is still there. '-O2' yielded interface calls approximately six times more costly than their virtual cousins.
In summary, expect interface calls to be six or seven times slower than a regular virtual one. Interfaces are still sort of new in GLib. Lets hope they get faster because I really like using them.
Comparing Schemes of Dynamic Dispatch in C
May 4th, 2009
Schemes of Dynamic Dispatch under Study
The performance of three dynamic dispatch schemes were examined. I also pontificate on the merits of my favorite schemes.
Virtual-table: Objects contain a pointer to a function table. Each method has a fixed offset within the table.
Function-table: There is one table for each method. Objects contain an integer corresponding to the object type that is used as an index in the table.
Switch-statement: Objects contain a type-integer like they do in the function-table scheme. The switch executes the correct function depending on the integer key.
Virtual-table dispatch is used in classed OO languages because the vtable corresponds to a class. It is a data-centric approach because the dispatch mechanism is associated with the data type and not with any specific function type. Method names in the vtable can be overloaded in different struct definitions of the vtable. Extending a copy of the table corresponds well to single inheritance.
The latter two schemes are more similar to each other than they are to the first. They are function-centric because the dispatch mechanism is associated with a function type rather than a data type. An object passed to these dispatchers contains an integer type-id that serves as an index in a dispatch table or a key in a switch statement. Objects in this scheme are unclassed, though there is nothing prohibiting an association on the type integer with a class-like structure. This class-like structure is actually one interpretation of the dispatch table.
Variations tested:
- Virtual-table.
- Function-table at a static address.
- Function-table at a dynamic address.
- Function-table with fix-up subtraction.
- Switch-statment with inline method definitions.
- Switch-statement with few, ordered keys.
- Switch-statement with few, dispersed keys.
- Switch-statement with many, dispersed keys.
- Direct call (for comparison).
The essence of each scheme is in the call instruction. '-S -fverbose-asm' is kind enough to show the names of static addresses.
Virtual table
| call *16(%edx)
The constant 16 corresponds to the offset of the method in the table. %edx holds the address of the table.
Static function table
| call *func_table-3996(,%edx,4)
The constant *func_table is the static offset of the function table. %edx contains the integer corresponding to the object's type. The constant 4 is the size of a pointer on my machine. Finally, 3996 is the index offset multiplied by 4. Index offset would be 0 for a table beginning at type 0, but a subtype might be found in a sub-range of indexes. Rather than build a sparsely populated table for this subtype, a smaller table can be used with a fix-up subtraction computed at compile time.
Dynamic function table
| call *(%edx,%eax,4)
%edx contains the address of the function table. %eax again contains the object's type-id. Fix-up subtractions would obviously need to be computed at run-time. I didn't test this combined with a fix-up subtraction because I consider it pretty useless anyway. I included this test for completeness since a virtual-table is inherently dynamic. Only the method offset is fixed in the virtual-table.
Switch statements
| call compiler black magic
It should be noted that this scheme semantically performs two direct calls for each dynamic method call. The implementation could be placed in the switch to eliminate function call overhead but it may be beneficial in some cases to have separate implementations and treat the switch statement like an interface declaration. In-lining removes the double-call overhead and can be counted on within a compilation unit.
Benchmark Implementation
Most popular benchmark algorithms do not depend on dynamic dispatch. I just did the naive thing and called a function recursively. It can be tricky to fool gcc into making a wheel-spinner that does no real work. For example, it is capable of rearranging the statements of a direct call that is not tail recursive into one that is, then unrolling it, then placing it in-line in the caller.
Timing points are in the code because instrumenting the executable for profiling caused the execution time to double. The key to using gprof might be to turn off instrumentation of the function calls under test and just time the top-level. However, measuring with the real-time clock gives results that are consistent enough to trust.
The Results
Times are in microseconds.
-O0 Classed virtual: 1404442 Function table: 1501014 Function table with subtraction: 1559695 Function table dynamic: 1607794 Switch inline code: 1823262 Function switch: 2612194 Function switch large: 3633052 Function switch dispersed: 3003073 Direct call: 1348344 -O1 Classed virtual: 1506692 Function table: 1506174 Function table with subtraction: 1506311 Function table dynamic: 1506651 Switch inline code: 1311783 Function switch: 2588107 Function switch large: 3026327 Function switch dispersed: 2549603 Direct call: 1368460 -O2 Classed virtual: 1505832 Function table: 1505468 Function table with subtraction: 1505384 Function table dynamic: 1506625 Switch inline code: 1376533 Function switch: 1982367 Function switch large: 2518238 Function switch dispersed: 2046317 Direct call: 1366636 -O3 Classed virtual: 1505480 Function table: 1506294 Function table with subtraction: 1505536 Function table dynamic: 1506019 Switch inline code: 1476872 Function switch: 1476428 Function switch large: 2517948 Function switch dispersed: 1647105 Direct call: 516581 -O3 -fomit-frame-pointer -mtune=native Classed virtual: 1379984 Function table: 1384040 Function table with subtraction: 1380380 Function table dynamic: 1505872 Switch inline code: 1439144 Function switch: 1438757 Function switch large: 2248245 Function switch dispersed: 1506101 Direct call: 509054
These represent a single run. I computed no averages and made no attempt at statistics. However, on repeated runs the times were consistent to within two milliseconds, which is significantly smaller than than the differences between dispatch schemes.
Conclusion, extrapolation, even some guessing and recommendations
Small switch statements do better than large ones, and dispersion of keys has only a small negative effect. For this reason it may be beneficial to reserve switch statements for dispersed keys while using table dispatch for grouped keys. The switch can actually best the table for a small number of keys but then its advantage depends on compiler in-lining or placing implementation code inside the switch. Even if the compiler succeeds in translating the switch to a jump table the double call elimination still depends on in-lining. A macro definition for explicit table dispatch is short and has guaranteed pretty-good performance. It could easily be converted to a switch, or even a switch macro, for the absolute best performance.
One technique might be to treat switch statement/function-table dispatch as one would an interface. Methods like "show", "display", "draw" and other things that usually go in base classes as stubs are good candidates for an interface.
Methods that need to be overloaded for each type definitely benefit from the virtual-table since the virtual-table can contain different types for the method name across different class structures. The advantage of the virtual-table is overloading and inheritance, but if a function is overloaded at the call site then the object type is probably known, and if its type is known then it should be a direct call anyway. Virtual-tables may also have a performance advantage for types that are dynamically created since the table is found through a pointer regardless. The performance of type-indexed table dispatch theoretically relies on having a static table address but my data doesn't show any difference in performance with optimizations turned on. Only -O0 shows any significant difference between method-indexed virtual-tables and different schemes of type-indexed tables.
Large switch statements with dispersed keys perform poorly but this case should not often arise in practice since type-id integers can be assigned sequentially. It is notable that the widest range in performance occurs in switch statements because they give the compiler the most room for optimization. These numbers show that a switch on type-id is the fastest possible dispatch scheme if implementation code is placed in the switch rather than in a separate method. But that's messy. In-lining in -O3 is capable of removing some of the function call overhead and the direct call-switch-call performs slightly better than a function-pointer. The best number here is 200,000,000 calls in 1376533 microseconds for a switch with explicit in-line code compiled with -O2. -O3 brings the performance down to 1476872 microseconds, which is nearly equivalent to the switch that calls a function for each case. The performance is identical in -O3 because of gcc in-lining. With some different tuning parameters the virtual-table and function-table are just as fast.
Personally, I prefer the interface style that jives with the type-indexed schemes. They are interchangeable, don't need a class, and perform as well as virtual-tables. The style lends itself more to multiple interface inheritance, similar to Haskell type-classes with monomorphism turned off. If a type (which is an integer index) has interfaces A and B (which are table or switch dispatched functions), then it may implement C (which is dispatched the same way).
This excellent style of multiple inheritance can also be done in C++ with one pure virtual base class for each set of functions that comprise an interface. But because C++ uses vtables, each object must keep a pointer for every interface from which it derives. I can't help but think a type-id implementation would make more sense in this case too. Tnen each object would have exactly one type-id and every pure virtual base class would be a table of implementations for those type-ids. Maybe I'm ignoring practical issues like linking and horribly sparse tables, but what about the practical issue of keeping objects small? Virtual tables yield big objects and small tables. Type-ids yield big tables and small objects. I ran this same test with C++ and the performance matched the vtable here. Automatic casting to the interface reference didn't appear to affect performance.
Techniques for local viewing with wget
December 20th, 2008
Wget is a good way to collect interesting websites or HTML e-books for off-line reading. The program is capable of pretty much anything and it took me some trial and error to get reliable results. Some hosts have unfavorable robots.txt files or send different results depending on the user-agent. I've even seen some hosts configured to deny the wget user-agent. After some iteration I have found a good set of options that just works all the time.
wget -rSNpk -np --execute robots=off -U "Mozilla/5.001 (windows; U; NT4.0; en-us) Gecko/25250101"
The above line contains the following options in -rSNpk -np. The rest of the line just tells wget to ignore robots.txt and to send a bogus user-agent.
- -r recursive fetching
- -S preserve server modified date
- -N download only if file on server is newer than local copy
- -p fetch page requisites (images, sounds, stylesheets)
- -k convert links for local viewing
- -np no parent, recurses only on siblings and children of a url
The --follow-ftp option is also useful for some sites having ftp assets, which is common for pdf and video.
If you don't want to swamp the host there are the following options.
- -w
wait n seconds between fetches - --random-wait modify wait time by 50% to 150%
The following line is in my ~/.bashrc
alias wgetr="wget -rSNpk -np --execute robots=off -U \"Mozilla/5.001 (windows; U; NT4.0; en-us) Gecko/25250101\" $@"
Usage: wgetr http://example.com/ebook-with-no-gzipped-download-link
And that is how I use wget. If it's worth reading it's worth saving.
Vala is Awesome
December 11th, 2008
The GUI Dilemma
Linux has many language implementations but I keep finding myself driven to C by quality documentation and faithful implementations. Python, Ruby, and Scheme are all great but docs for bindings are second-hand and I usually need to pull out the original library reference or look at C headers and test-cases to accomplish anything.
If I want to port to the maemo platform I have to port the bindings too. Just more version hell.
On top of that there is interpreter lock with internal threads and that isn't good for the GUI. Loading a GtkListStore is already slow enough when calling from C but doing it in an interpreted language means an inner loop with additional marshaling. Database extensions should also be written as efficiently as possible because that makes them more flexible as I don't have to tiptoe around them in SQL; worrying whether my slowness is getting called 10 or 10e6 times per query. Then there's callbacks, like for sorting, which are called nlogn times or worse. If I'm writing asynchronous IO, ListModel loading, database extensions, and sort callbacks in C then there's very little left to do in an interpreted HLL.
I hate GObject C but I like GObject enough to wish I had used it for some things that are slowly turning into their own private spaghetti hell. Refactoring creates even more opportunities for error and is time consuming. C is hard, let's go shopping :)
C++ would be a good option if I had a much higher IQ. I can use existing classes but writing a new class without memory leaks, using it properly with exceptions, and looking out for excessive copying is a mental exercise in itself. That's actually a languages strength, lots of meaning in a line, but it's not an abstraction as much as it is a complex macro expansion. It's a fun language but it doesn't offer the blissful ignorance I often want.
Introducing Vala
With Vala the blissful ignorance is there for the taking, up to a point of course. It is a modern but slightly crippled language that steals as much as possible from C#. It has all the OO features of GObject, including singe inheritance, multiple interfaces, runtime binding, and dynamic properties. Destructors and constructors are supported along with private/protected/public members. There is also a limited "compact" class that avoids GObject for better performance. Structs and types built on machine words like int, float, and bool do not have OO features and have an exact correspondence to C. There is a string type built on GString that supports some useful operators like + and +=. Type inference means variables can be declared simply with the "var" keyword. The type-system also differentiates between nullable and non-nullable types and checks GError exceptions at runtime. Garbage collection is accomplished via reference counting, which is good enough for most things. Vala supports generics with the familiar template-like syntax. All generics appear to be implemented with gpointer, i.e. (void *), instead of creating new code and structures for each instantiated type. For example a Point2<int> p = {1,2} would create a struct containing two gpointers. Function delegates can be defined inline and look like a lambda. Currently, only formal parameters of the lambda may appear in the body. Vala accepts full lexical closures syntactically but the generated C contains undeclared identifiers and does not compile. Vala will most likely implement ref-counted closures in the future via GClosure, or so I hope.
The language compiles through C and anyone familiar with GObject should be able to read the C output to easily figure out what's going on. I wouldn't hesitate to use a Vala library from C because it's really just GObject under the hood. Bindings are easily implemented in a separate "vapi" file. There is no documentation for vapi at the moment but it really doesn't need much. Marshaling is delegated to GObject at runtime and calling GObject code from Vala doesn't cost any more than calling from C. The whole language is actually syntax sugar for C. It's C++ reinvented with the benefit of C# and Java in the rear-view mirror.
GObject has a high runtime cost but the Vala designers have created some very useful pockets in the type system to mitigate that issue. Vtable lookup always occurs for interface methods, unfortunately even when the concrete type is known at compile time. Inherited methods are called statically through upcasting so better performance is possible by restricting OO usage to compact classes and inheritance-only method resolution. The pointer overhead in generic structures would probably offset any performance increase from static method binding unless it is used carefully. Unrestricted use of generics would imply using them in interface constraints and those entail virtual method calls anyway. The language also doesn't leverage stack allocation for objects like C++ so instantiation requires heap allocation. Structs have access-controlled members and are normally stack allocated but copy constructors and destructors are still generated for structs to support full-object members. The compiler currently accepts inheritance and interface syntax for struct declarations but ignores it when generating code.
The roadmap is set for a 1.0 release in March 2009; fingers crossed. Vala is promising because it's conceptually simple, as portable as glib, has decent performance, and plays well with C in both directions.
PS. I should warn the reader that I just discovered Vala yesterday and my knowledge of it is limited to examining the C output of the version 0.5.2 compiler. Some of this info is probably wrong so this article is As-Is, No Warranty. However, I'd appreciate a comment if I failed to understand something about Vala.
The Port Authority Draft
December 2nd, 2008
Yesterday I spoke with a man whose supposed infraction of Port Authority rules may lead to a personal fine of $1000. His crime was failing to challenge an under-cover officer's undisplayed ID in a secured area. To tell this story while protecting the innocent I'll just call him Joe Pilot and say he works for a made-up Westwind Airlines. The undercover officer is Decoy Dan.
Decoy Dan approached Joe Pilot in a hallway of a secured area and started a short conversation. After a few seconds of this Dan indicated that Joe should come with him because Joe had done something wrong. Joe obliged and the pair walked a few steps to the chief pilot's office where Joe learned the extent of his liability.
The secured area I'm referring to is one of offices, filing cabinets, desks, lounge chairs, vending machines, lockers, and break rooms. It's not a busy ramp where alertness is required for physical safety but a place where crew go to print paperwork for their next flight, relax, or take a nap. A terrorist would enjoy the area for its access to aircraft so security is very important. In this area people should voluntarily raise awareness to abnormal situations for the same reason they bring any other safety related issues to immediate attention. The issue here is not whether Joe should have challenged for ID but whether it was his legal duty.
Joe was allowed in the secured area because he had an ID badge issued by Westwind Airlines. The terms and duties of his employment with Westwind were governed solely by his union contract and there was no requirement therein for Joe to challenge undisplayed ID. Joe wasn't even on paid company time during the infraction. The Port Authority issued badges to vendors and other airport employees but not to Joe. Joe did not have a Port Authority ID, which is the one Decoy Dan should have been wearing when he spoke to Joe. Joe had only a company ID and was required to challenge a Port Authority officer for failure to display a Port Authority ID.
Joe's infraction is one of inaction. He didn't DO something illegal but FAILED TO DO something supposedly required of him. Pilots have grave responsibility and have agreed to perform certain duties in accordance with their company employment and FAA certification. Joe can be held responsible for e.g. failing to check fuel levels before departure by both the company and the FAA. He has studied his duties because he is a professional and he performs them voluntarily because he is a free man who agreed to perform them. The FAA certificate bearing his signature is proof of that agreement.
Joe has never had an agreement with the Port Authority and yet they are holding him responsible for failure to notice and enforce rule violations committed by their officers. Free people can do anything that does not infringe on the rights of others. Doing nothing does not infringe on the rights of others so free people are free to do nothing. If the Port Authority can conscript Joe into their service then Joe is not free. This is a back-door draft of formerly free people into government service.
Why Google Apps Fail
July 15th, 2008
Google Apps is(are) the new MS Office complete with spreadsheet, slide presenter, database, and word processor. They all suck. Commonly cited reasons for failure are that the browser is not an OS, Javascript lacks support for concurrency, Javascript is slow, businesses are slow to adapt to new technology, community authorship is already solved by VCS and wikis, etc. Here’s the real reason they fail: fighting the last war. Word processors and spreadsheets are generic, where “generic” carries the same negative connotation as when applied to performance art.
Google shouldn’t be trying to make generic apps easier to use on the web. Email, IRC, IM, wikis, forums, and social bookmarking have an evolved interface balancing generality with complexity. They are also not stand alone programs that can have the network part removed like a word processor. Even if all the technical browser/script problems are solved, what will Google Apps offer other than a plethora of features bolted on to something resembling a wiki? It’s generic.
Features can increase generality, but at some level of generality the competing technology becomes Sqlite, Linux, or Java; not MS Word. I’ve witnessed the humor of a Salesforce tech-support conference call. My client was using a form that posted to Salesforce and was trying to add some work-flow features on the Salesforce end. What I heard astounded me, the client was getting a lesson in scripting from the tech. I left the call early, glad my part was limited to creating the form. A solution can go in two directions as it becomes more general; more abstract, or more complex. MS Word and Powerpoint embrace complexity. Programming languages lean toward abstraction. Either way, generic solutions are always harder to use.
The MS Office suite thrived because it made it easy to share. Embrace, extend, extinguish. One app to open them all! The complexity was necessary for generality, and generality was necessary for sharing. On the isolated PC there wasn’t room for a special version of Powerpoint for teaching with another for pitching sales. Word had to handle letters, resumes, and instruction manuals, in formats including HTML, text, and binary. It had to interface with Access and embed Excel documents, and each of those programs had to share well with others in the suite. It was a very capable one-size-fits-all solution that was very complex as a result.
With the web offering effectively unlimited storage on top of easy distribution and sharing, there is no longer a need for one-size-fits-all suites. New apps can depart from the generic view of the suite and approach problems directly. There should not be a Powerpoint of the Web . Instead, applications should specialize so that one for e.g. teaching aircraft systems or training sales reps will bundle a tailored likeness of Powerpoint. Slides in this new hypothetical software might pull facts from manuals or link to simulations, each of which is another specialized application, reiterating the importance of sharing. Thus, The Web
Google Apps solve the problem of sharing by offering a uniform API (so 1990), on the web (so 2000). In contrast the new application suite will have as many APIs as there are applications, and as many applications as there are problems to solve. Google Apps are generic and lame.
Sqlpilot
April 15th, 2008
My aviation logbook project, Sqlpilot, recently become usable (for me). I was able to turn my old data into CSV with plenty of grep, sed, awk, sql, manual labor, trial and error, and luck. Sqlpilot is a C program linking to gtk and sqlite. Using a relational database feels like the right way to manage a logbook and I now have a whole new view of my flight log. I've been querying things like average leg time and distance for specific aircraft, number of days with over eight hours of flying, number of approaches flown grouped by month, total time in each aircraft and type, and other interesting trivial things that I would have never known had I not started this project.
Read the rest of this entryBeans worth Blogging About
March 3rd, 2008
Tara and I frequent a Thai Restaurant about a half mile from our apartment. Without too much thought I placed an order for something on the menu called Sator Shrimp since I like shrimp and spice doesn’t bother me. The waiter asked if I had ever had sator before and warned that the dish had a strong smell. That piqued my interest so I stuck with the order. The dish wreaked of odorized propane gas and the flavor of sator defies description. I didn’t know a plant could taste like that even though I ate things like pine needles and radish leaves as a kid. I finished the dish because I was extremely entertained that someone somewhere actually planted sator beans with the intent of selling them as food. Each bite was like the culinary equivalent of the Jerry Springer Show. After the beans were in my stomach and the sensory-masking effect of curry wore off I began to notice the smell again. It was my breath; onions are to sator what tictacs are to onions.
Googling after dinner I found that sator (parkia speciosa) is informally know as a Stink Bean
I can’t even believe a plant like sator is legal to grow in the USA considering growing relatively benign plants like Cannabis can win their owner a prison sentence. We need laws to protect our children from sator now :)
"Missing the point", Misses the Point!
February 7th, 2008
If there was ever a phrase which should be banned from all discourse it is “missing the point”. Missing the point means making an irrelevant conclusion. It’s similar to a non-sequitur with emphasis on the conclusion rather than the evidence. Instead of saying a conclusion is wrong because the evidence is irrelevant (non sequitur – does not follow), “missing the point” directly attacks the conclusion as irrelevant and therefore wrong.
Many Internet discussions have no point and therefore none to miss. Good discussions have many points and that’s why most comment systems are threaded – so different points can be clearly discussed! Don’t tell me I’m missing the point because I introduce a new spin on things. Instead, be a genius and fork the thread. If you disagree then say so, but how could you disagree if I was truly missing the point? If you rebut irrelevance with anything other than an explanation of why something is irrelevant then your clever rebuttal is non-sequitur since the chain of logic is broken. This is what downmods are for. Use them. “You’re missing the point”, is just a red flag that means irrelevant, obvious, or arrogant statements follow.
Which brings me to arrogance. By claiming that I am missing the point you have implied that there is some “one true” point that I have missed. How is that even possible? Does each cause have one and only one effect, and vice versa? It’s nearly the definition of simple-mindedness. Like thought police you seek to limit the scope of discussion to protect the logical purity of your narrow, self-idolizing views though mindlessly repeated idioms. It’s doubleplus ungood for real discussion.
Think I’m missing the point or full of shit? Consider that Google claims over three times the number of hits for missing the point then for the overused phrases full of shit and full of crap combined. Please stop polluting discussions with this trash.