Comparing Schemes of Dynamic Dispatch in C
May 4th, 2009
Schemes of Dynamic Dispatch under Study
The performance of three dynamic dispatch schemes were examined. I also pontificate on the merits of my favorite schemes.
Virtual-table: Objects contain a pointer to a function table. Each method has a fixed offset within the table.
Function-table: There is one table for each method. Objects contain an integer corresponding to the object type that is used as an index in the table.
Switch-statement: Objects contain a type-integer like they do in the function-table scheme. The switch executes the correct function depending on the integer key.
Virtual-table dispatch is used in classed OO languages because the vtable corresponds to a class. It is a data-centric approach because the dispatch mechanism is associated with the data type and not with any specific function type. Method names in the vtable can be overloaded in different struct definitions of the vtable. Extending a copy of the table corresponds well to single inheritance.
The latter two schemes are more similar to each other than they are to the first. They are function-centric because the dispatch mechanism is associated with a function type rather than a data type. An object passed to these dispatchers contains an integer type-id that serves as an index in a dispatch table or a key in a switch statement. Objects in this scheme are unclassed, though there is nothing prohibiting an association on the type integer with a class-like structure. This class-like structure is actually one interpretation of the dispatch table.
Variations tested:
- Virtual-table.
- Function-table at a static address.
- Function-table at a dynamic address.
- Function-table with fix-up subtraction.
- Switch-statment with inline method definitions.
- Switch-statement with few, ordered keys.
- Switch-statement with few, dispersed keys.
- Switch-statement with many, dispersed keys.
- Direct call (for comparison).
The essence of each scheme is in the call instruction. '-S -fverbose-asm' is kind enough to show the names of static addresses.
Virtual table
| call *16(%edx)
The constant 16 corresponds to the offset of the method in the table. %edx holds the address of the table.
Static function table
| call *func_table-3996(,%edx,4)
The constant *func_table is the static offset of the function table. %edx contains the integer corresponding to the object's type. The constant 4 is the size of a pointer on my machine. Finally, 3996 is the index offset multiplied by 4. Index offset would be 0 for a table beginning at type 0, but a subtype might be found in a sub-range of indexes. Rather than build a sparsely populated table for this subtype, a smaller table can be used with a fix-up subtraction computed at compile time.
Dynamic function table
| call *(%edx,%eax,4)
%edx contains the address of the function table. %eax again contains the object's type-id. Fix-up subtractions would obviously need to be computed at run-time. I didn't test this combined with a fix-up subtraction because I consider it pretty useless anyway. I included this test for completeness since a virtual-table is inherently dynamic. Only the method offset is fixed in the virtual-table.
Switch statements
| call compiler black magic
It should be noted that this scheme semantically performs two direct calls for each dynamic method call. The implementation could be placed in the switch to eliminate function call overhead but it may be beneficial in some cases to have separate implementations and treat the switch statement like an interface declaration. In-lining removes the double-call overhead and can be counted on within a compilation unit.
Benchmark Implementation
Most popular benchmark algorithms do not depend on dynamic dispatch. I just did the naive thing and called a function recursively. It can be tricky to fool gcc into making a wheel-spinner that does no real work. For example, it is capable of rearranging the statements of a direct call that is not tail recursive into one that is, then unrolling it, then placing it in-line in the caller.
Timing points are in the code because instrumenting the executable for profiling caused the execution time to double. The key to using gprof might be to turn off instrumentation of the function calls under test and just time the top-level. However, measuring with the real-time clock gives results that are consistent enough to trust.
The Results
Times are in microseconds.
-O0 Classed virtual: 1404442 Function table: 1501014 Function table with subtraction: 1559695 Function table dynamic: 1607794 Switch inline code: 1823262 Function switch: 2612194 Function switch large: 3633052 Function switch dispersed: 3003073 Direct call: 1348344 -O1 Classed virtual: 1506692 Function table: 1506174 Function table with subtraction: 1506311 Function table dynamic: 1506651 Switch inline code: 1311783 Function switch: 2588107 Function switch large: 3026327 Function switch dispersed: 2549603 Direct call: 1368460 -O2 Classed virtual: 1505832 Function table: 1505468 Function table with subtraction: 1505384 Function table dynamic: 1506625 Switch inline code: 1376533 Function switch: 1982367 Function switch large: 2518238 Function switch dispersed: 2046317 Direct call: 1366636 -O3 Classed virtual: 1505480 Function table: 1506294 Function table with subtraction: 1505536 Function table dynamic: 1506019 Switch inline code: 1476872 Function switch: 1476428 Function switch large: 2517948 Function switch dispersed: 1647105 Direct call: 516581 -O3 -fomit-frame-pointer -mtune=native Classed virtual: 1379984 Function table: 1384040 Function table with subtraction: 1380380 Function table dynamic: 1505872 Switch inline code: 1439144 Function switch: 1438757 Function switch large: 2248245 Function switch dispersed: 1506101 Direct call: 509054
These represent a single run. I computed no averages and made no attempt at statistics. However, on repeated runs the times were consistent to within two milliseconds, which is significantly smaller than than the differences between dispatch schemes.
Conclusion, extrapolation, even some guessing and recommendations
Small switch statements do better than large ones, and dispersion of keys has only a small negative effect. For this reason it may be beneficial to reserve switch statements for dispersed keys while using table dispatch for grouped keys. The switch can actually best the table for a small number of keys but then its advantage depends on compiler in-lining or placing implementation code inside the switch. Even if the compiler succeeds in translating the switch to a jump table the double call elimination still depends on in-lining. A macro definition for explicit table dispatch is short and has guaranteed pretty-good performance. It could easily be converted to a switch, or even a switch macro, for the absolute best performance.
One technique might be to treat switch statement/function-table dispatch as one would an interface. Methods like "show", "display", "draw" and other things that usually go in base classes as stubs are good candidates for an interface.
Methods that need to be overloaded for each type definitely benefit from the virtual-table since the virtual-table can contain different types for the method name across different class structures. The advantage of the virtual-table is overloading and inheritance, but if a function is overloaded at the call site then the object type is probably known, and if its type is known then it should be a direct call anyway. Virtual-tables may also have a performance advantage for types that are dynamically created since the table is found through a pointer regardless. The performance of type-indexed table dispatch theoretically relies on having a static table address but my data doesn't show any difference in performance with optimizations turned on. Only -O0 shows any significant difference between method-indexed virtual-tables and different schemes of type-indexed tables.
Large switch statements with dispersed keys perform poorly but this case should not often arise in practice since type-id integers can be assigned sequentially. It is notable that the widest range in performance occurs in switch statements because they give the compiler the most room for optimization. These numbers show that a switch on type-id is the fastest possible dispatch scheme if implementation code is placed in the switch rather than in a separate method. But that's messy. In-lining in -O3 is capable of removing some of the function call overhead and the direct call-switch-call performs slightly better than a function-pointer. The best number here is 200,000,000 calls in 1376533 microseconds for a switch with explicit in-line code compiled with -O2. -O3 brings the performance down to 1476872 microseconds, which is nearly equivalent to the switch that calls a function for each case. The performance is identical in -O3 because of gcc in-lining. With some different tuning parameters the virtual-table and function-table are just as fast.
Personally, I prefer the interface style that jives with the type-indexed schemes. They are interchangeable, don't need a class, and perform as well as virtual-tables. The style lends itself more to multiple interface inheritance, similar to Haskell type-classes with monomorphism turned off. If a type (which is an integer index) has interfaces A and B (which are table or switch dispatched functions), then it may implement C (which is dispatched the same way).
This excellent style of multiple inheritance can also be done in C++ with one pure virtual base class for each set of functions that comprise an interface. But because C++ uses vtables, each object must keep a pointer for every interface from which it derives. I can't help but think a type-id implementation would make more sense in this case too. Tnen each object would have exactly one type-id and every pure virtual base class would be a table of implementations for those type-ids. Maybe I'm ignoring practical issues like linking and horribly sparse tables, but what about the practical issue of keeping objects small? Virtual tables yield big objects and small tables. Type-ids yield big tables and small objects. I ran this same test with C++ and the performance matched the vtable here. Automatic casting to the interface reference didn't appear to affect performance.
on October 13th, 2011 at 04:13 AM
Really like your blog,thanks for sharing it with us.And I also llike 49ers jerseys.Despite your Super Bowl has ended, the Seattle Seahawks Jerseys article off calendar year auction can completely however accelerate your admirers and aswell accompany beatitude that will them. As we could see that you will discover multitudes of NFL jerseys of Reebok during the industry. The NFL will admission a 10-12 months absolute authorization to Reebok alpha during the 2002 NFL division to accomplish and promote NFL accountant merchandise, together with jerseys, amusement apparel, cossack and an NFL-branded accoutrement line. The acceding aswell offers Reebok absolute rights to advance a fresh band of NFL exercising devices. The 49ers jerseys aswell has the edge to entry an disinterestedness situation during the new small business.
on October 20th, 2011 at 04:48 AM
Really like your blog,thanks for sharing it with us.And I also like beats by dr dre. Monster hammers headphones is so accustomed in accompany offworld,beats by dr dre, it truly is abreast at audio followers for audio fanatics, what is more, it is acclimatized devising to buck your admired audio with antic accurateness and acclimatized audio. Get an accessory with the photo, you appetite applause it,beats by dr dre studio, having an ambrosial bloom training course,monster beats flat the Monster Beats abandoned has afflicted a allocation in the boyish ladies. The acceding a affiliate of Monster, Lady Gaga, and red-colored was show up afore these days.The Monster Beats Solo Acclimatized Version essentially appears absolute apparent, and is also no which the adapt individuality that youngster peoples ambition?
on October 25th, 2011 at 04:01 AM
Really like your blog,thanks for sharing it with us.And I also like beats by dr dre.If you are a songs fanatic admiring to apprehend your admired music most effective durations of time,beats by dr dre outfits you entirely properly. They take been meant to complete assertive a blood-tingling music ability for continued intervals without any causing affliction or conceivably affliction on the personal. A alluring ideal through the monster Bests is you put on take to assignment down into your wallet to monster abeyance or media mute. A rapid faucet into is the affair that it takes to complete the needful. You’ll be able to baddest from a arrangement of clear equipment to evaluation these monster earphones. Browse the net in your case to cross a alluring arrangement of admirable components around the internet.These are provided in awfully huge discounts.
on October 29th, 2011 at 03:48 AM
Really like your blog,thanks for sharing it with us.And I also like beats by dr dre.It can motion you the lightest fat as able-bodied because the admirable overall look. In the three areas of your architecture will allow the abandoned fits in the bunched accustomed situation. With Beats abandoned to acquaintance admirable songs has never been so straightforward. Administration Discussion: The ear plugs can complete abiding you feel well. Expend time over the proper beats by dr dre for auction to apprehend not alone new music to obtain additional acceptable alfresco sounds, but aswell accord on your own the affluent full you’d like. Splendidly acceptable for all kinds of gamers. Which has a prime ascendancy accidentally say pictures for that headphone cable Dre, you yield ascendancy of your accountable has moved. New aperture beatific additional than a actor gallons of oil into your river Kalamazoo, claimed Wednesday, letters AFP reported. The Environmental Protection (EPA) mentioned which the aperture commenced Monday, if the pipes accessibility from the Marshall Michigan, and spits on oil monster beats affordable. All within the abundance can be exhausted for quite a few hours.
on November 7th, 2011 at 04:53 AM
Really like your blog,thanks for sharing it with us.And I also like
beats by dr dre
.Is dollars accrued for you? Any individual may say “yes”. Within their thoughts, they can not abide afterwards dollars and capital of beforehand is accrued for them. They get they’re able to blot money to acquire gathered they allegation and want, spending it to order a abounding residence, a admirable garments, affluence purses, big-ticket makeup, a complete vehicle, activated beautification and much of casting items, including the angel acclaimed headsets of beats by dr dre, acclaimed watch-Rolex and many others. In acclimation to admission as abounding funds as possible, these receiving might beats headphones rob through the flush particular person, abduct a bazaar during the night, barefaced any one for capital or maybe accessory an aged man only for accepting tons of money for affluence items. A number of canicule in the past, I abutting my colleague’s alliance ceremony. The guy who she affiliated are sixty now accepting 3 sons and a woman. I used to be abashed afresh , brainwork what can make a 20 a long time previous admirable bairn ambition to accessory such a previous and beastly guy. Is it the authentic applause amidst them? I apprehension to get a linked time but however obtained no response. Afresh my accession abettor abreast to me “she you should not applause that person whatsoever, what she applause is his income. That aged man obtain added than 1million RMB. “Oh, my god, I am unable to purchase my ears. Afresh I get to amass that these receiving who forward funds is extra significant than abolishment away , such as their bodies, their soul. Will lose theirselves for money at last. How addled and inadequate they can be.on August 31st, 2012 at 09:03 PM
The Go programming language uses interface-based dispatch tables when the exact type isn’t know, and direct calls when it is. You might find it interesting. They are at golang.org.
Thanks for the benchmarks, —Glenn