Introduction to Isometric Avatars



Article

Scope

This article is the first in the series Designing Isometric Avatars. It discusses the principles of designing avatars for isometric worlds (as opposed to 3D worlds). The fundamental differences between the two techniques are outlined, and a guide to setting up a framework for your experiments is presented.

Many references are made to the article Isometric Projection; understanding it makes this text easier to follow.

Author

Herbert Glarner

Published

2007-Mar-30

Before Starting

2D, 3D, Isometric

The terms 2D and 3D have taken on a very distinct meaning, particularly in computer games. Since avatars are mainly used in such environments, it seems sensible to adopt this common usage when discussing the design of avatars, even if the terms are technically used incorrectly.

2D means two-dimensional. Of course, every drawing on a flat plane is necessarily two-dimensional, because flat surfaces lack a third dimension. No matter what canvas you use for your drawing, be it a sheet of paper or your monitor's screen, the outcome will be two-dimensional. Indeed, all current "3D games" that do not make use of a real third dimension (as holograms do, for example) are technically two-dimensional.

So, where does the notion of "3D" come from? It is the perceived quality of the perspective which led to the widespread usage of this term.

To qualify as a three-dimensional environment in "game language", the objects in the game's world must be rendered such that the third dimension can be "felt". Conceptually, this is achieved quite easily: the further away an object is from the observer (or implied camera), the smaller it must be drawn.

In a so-called 2D environment, this is not the case: all objects keep the same size, no matter how far away they are from a hypothetical (or real) observer. This does not imply, however, that 2D environments cannot provide a feeling of depth at all. In fact, every axonometric projection (of which the isometric projection is one kind) aims to provide the illusion of depth.

Although conceptually easy, implementing a proper 3D perspective requires a whole set of sophisticated techniques, both mathematical and technical. Applying these techniques is a heavy burden for the processor on your graphics card. Fortunately, current graphics cards have so-called hardware accelerators, which can perform a very large number of complex operations in a very short time. However, the data still needs to be passed to those accelerators, together with the instructions on what to do with it. No matter how clever a 3D design is, a PC's CPU will still be quite busy moving large quantities of data back and forth.

Compared with such a 3D environment, rendering a 2D environment is far less CPU-intensive. This is not due to different techniques per se, because the mathematical concepts behind the scenes are the same, but because they can be simplified to a very large extent. An example: instead of calculating trigonometric functions over and over again, their values can often be precalculated (if they are needed at all) and kept at hand for the whole lifetime of the application. In many cases, such precalculated values can even be coded as constants.
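The precalculation idea can be sketched as follows. This is a minimal Python sketch and not code from this series: the names DIRECTION_DEGREES, TRIG_TABLE and rotate are assumptions made for illustration, as is the choice of four viewing directions.

```python
import math

# Assumption for this sketch: four viewing directions, 90 degrees apart.
DIRECTION_DEGREES = (0, 90, 180, 270)

# Computed once at start-up and reused for the application's lifetime.
TRIG_TABLE = {
    deg: (math.cos(math.radians(deg)), math.sin(math.radians(deg)))
    for deg in DIRECTION_DEGREES
}

# Values that never change can simply be coded as constants,
# e.g. the cosine of the tile edge angle arctan(1/2).
EDGE_COS = 2 / math.sqrt(5)  # about 0.8944


def rotate(x, y, deg):
    """Rotate the point (x, y) by one of the precalculated angles."""
    cos_a, sin_a = TRIG_TABLE[deg]  # table lookup, no trigonometry here
    return (x * cos_a - y * sin_a, x * sin_a + y * cos_a)
```

A call such as rotate(1.0, 0.0, 90) then costs only a dictionary lookup and four multiplications.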

It is worth noting that most (if not all) of the 3D rendering is done by the graphics card's processor, which need not be the case for 2D rendering. This remark is not meant to qualify the speed advantage: what can be done in 3D can also be done in 2D, and in a simpler (i.e. faster) way. Rather, it points out that a 2D environment can very well be rendered without accelerating hardware, using only the PC's CPU. Given the number of operations involved, this is barely feasible for 3D environments.

To avoid the confusing term "2D" (which can also mean the 2D representation of a 3D environment), we will henceforth use "isometric" when referring to objects in an isometric environment, and keep the term "3D" for environments in which object sizes decrease with distance. For the theoretical concept and our usage of the term isometric (itself slightly misleading, as we should properly speak of dimetric projections), please refer to the article Isometric Projection.

Static Design

Although we said that everything that can be done in 3D could also be done (more efficiently) in 2D environments, in practice there is a big difference between designing avatars for isometric environments and for 3D environments.

In isometric worlds, objects (including avatars) do not change in size, no matter how far away they are from the observer. Therefore, there is no real need to deal with coordinates defining the possibly many subshapes (such as arms and legs) of those objects. Although such coordinates may still be useful in isometric environments, all that overhead can be omitted when the objects are designed statically.

"Static" means that an object's appearance is predetermined, i.e. "what you see (while designing) is what you get". This is a big advantage, because static objects can be rendered with ease: in the simplest case, the appropriate pixels just need to be copied from a ready-to-use template to the output device, served directly by the PC's CPU. This usually saves a great deal of computation (if done cleverly, that is: a badly implemented CPU renderer can very well be outperformed by good acceleration hardware).

However, there's also a drawback. There are several, actually.

Firstly, rather than defining an object just once and then rendering the finished product from whatever point of view the user (or operator) desires (as is possible in 3D thanks to massive computations), static objects need to be designed multiple times, i.e. once for each direction from which we wish the observer to look at them. This explains why we usually have a very limited choice of viewing directions (sometimes just 1, but usually 4, by allowing the user to rotate the view in steps of 90°).

Secondly, if the object is to move (which is usually the case for avatars), then different "frames" are needed.

Frames

An avatar should be able to move in an isometric world consisting of tiles. Depending on the size of a tile and the desired speed of our moving object, we need to calculate how many times per second the object must be rendered to achieve a more or less fluid motion from one tile to the next.

For each such rendering a "still picture" needs to be designed, showing the gradual completion of the desired motion. A single still picture is called a frame. If carefully designed frames are rendered fast enough, the motion appears fluid; the same is true of 3D renderings (and of TVs, for that matter). The frequency at which the frames follow each other is called the frame rate. It is measured in Hertz (occurrences per second; here: pictures per second). Note, however, that we are in no way bound to a duration of one second (or any other fixed time period): whatever the frame rate, a complete series of frames takes exactly as long as we define it to take, which can be anything from a few milliseconds to minutes or even hours.
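The relation between distance, speed, frame rate and number of frames can be sketched in a few lines of Python. The function names and the example numbers (a 60 cm step, a walking speed of 120 cm/s and a frame rate of 20 Hz) are assumptions chosen for illustration only; they are not values prescribed by this article.

```python
def duration_for_distance(distance_cm, speed_cm_per_s):
    """Time in seconds needed to cover the distance at the given speed."""
    return distance_cm / speed_cm_per_s


def frames_for_motion(duration_s, frame_rate_hz):
    """Number of still pictures needed to fill the given duration."""
    return round(duration_s * frame_rate_hz)


# Assumed example values, for illustration only.
step_duration = duration_for_distance(60.0, 120.0)    # seconds per step
step_frames = frames_for_motion(step_duration, 20.0)  # frames per step
```

With these assumed values a single step lasts half a second and needs 10 frames; doubling the frame rate doubles the number of pictures to design, which is why the frame rate is worth settling early.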

For obvious reasons we don't want to design individual pictures for a movement between every imaginable pair of points: once we have a particular "frame set" (determined by the start and the end of a distinct motion, say an avatar's step in a walking sequence), we naturally aim to apply it multiple times. Here another challenge becomes apparent: we must design the start and end of a set such that it can be repeated in an apparently seamless manner. We will see that this requires careful planning (and some math) even before thinking about designing the first frame.

We will also see that frame sets sometimes unavoidably depend on other frame sets. For instance, we are obviously not done once we have designed an avatar's step with its left leg and repeat that set over and over, as there must be a step with its right leg in between (unless we design the set to include both steps, with the drawback that we cannot bring the avatar to an instant halt, but first need to complete a double step that may only just have begun).

Transitions need care as well. It would be wrong to assume that we can simply launch into a frame set intended for repetition, such as "walking" as outlined above. Here, we need distinct (comparatively short) frame sets for the actual start of the walking process and for its actual end.
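How such frame sets might be chained can be sketched as follows. The frame set names ("start_walking", "step_left", "step_right", "stop_walking") and the function walk_sequence are hypothetical; the sketch also assumes, for simplicity, that a single stopping set fits after either leg, which a real design may not allow.

```python
def walk_sequence(steps):
    """Return the order in which frame sets are played for a walk.

    A short starting set comes first, then alternating left and right
    steps, then a short stopping set.
    """
    if steps <= 0:
        return []
    sequence = ["start_walking"]
    for i in range(steps):
        # Left and right steps must alternate; neither repeats alone.
        sequence.append("step_left" if i % 2 == 0 else "step_right")
    sequence.append("stop_walking")
    return sequence
```

Because each step is a separate set here, the avatar can be halted after any single step instead of having to finish a double step.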

Conclusion

To summarize: designing aesthetically appealing objects in a 2D environment is a very time-consuming task, far more so than for 3D objects.

What one gets in return for this effort is a massively reduced running time, provided no bad mistakes are made in the implementation.

And of course we need a carefully set up framework with precisely defined measures, so that we can calculate distances, required times, numbers of frames and more, and our design results in credible avatars. This framework is defined in the next section of this article.


A Framework

Tile Design Considerations

Before going into details, it is worthwhile to recall the tile properties outlined in the article Isometric Projection, section "Irregular Tile Properties", fig. 12 and 13.

Why is all this important, and how does it affect us? A tile is made up of pixels at discrete intervals, i.e. these pixels are countable and not continuous. If we want to be able to locate and address a tile's center (which is crucial for many reasons, as we will see), then we must make sure that such a center can be identified. Addressing the center is only possible if the center is indeed addressable, i.e. the center must not lie between two addressable elements, but must fall on one.

Let's visualize this by looking at the edges of two tiles, seen from the side. To save space, let's assume the edge has a length of 8 "addressable elements" (which you can think of as pixels for now; the details are discussed further below):

Fig. 1: An addressable tile edge's middle point

The figure makes clear that an edge's middle point is addressable only when the perceived edge length (i.e. including both end points) consists of an odd number of elements, so that the same number of elements lies on either side of the middle point. And because only one of these edges can actually be defined in any single tile (the opposite edge only being implied), the actual design of a tile must exclude that opposite edge, forcing us to design tiles with an even number of addressable edge elements.
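The parity argument can be checked with a small Python sketch. The function names are assumptions made for illustration; elements are simply numbered from 0 along the edge.

```python
def perceived_length(designed_length):
    """Designed edge elements plus the one implied opposite end point."""
    return designed_length + 1


def middle_index(element_count):
    """Index of the middle element, or None if no element is central."""
    if element_count % 2 == 0:
        return None  # the middle falls between two elements
    return element_count // 2


# A designed edge of 8 elements is perceived as 9 elements (0..8),
# so element 4 has four neighbours on either side.
middle_of_designed_8 = middle_index(perceived_length(8))
```

A perceived length of 8 elements, by contrast, has no addressable middle, and middle_index(8) returns None.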

Now, why did we all of a sudden begin to talk about "addressable elements" and not just about pixels? The short answer is: because these elements are not pixels. Recall how we defined the isometric flat surface: to obtain a visually appealing line we effectively introduced a dimetric (although nearly isometric) projection (see Isometric Projection, section "Dimetric Projection", fig. 3), with the consequence that the designed width of a tile is double its designed height. Because of this, an "addressable element" is a rectangle 2 pixels wide and 1 pixel high.
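The mapping from addressable elements to pixels can be sketched as follows. The function name and the assumption that elements form a plain rectangular grid with its origin at the top left are made for illustration only.

```python
ELEMENT_WIDTH_PX = 2   # an addressable element is 2 pixels wide
ELEMENT_HEIGHT_PX = 1  # and 1 pixel high


def element_to_pixels(col, row):
    """Return the (x, y) pixel coordinates covered by element (col, row)."""
    x0 = col * ELEMENT_WIDTH_PX
    y0 = row * ELEMENT_HEIGHT_PX
    return [
        (x0 + dx, y0 + dy)
        for dy in range(ELEMENT_HEIGHT_PX)
        for dx in range(ELEMENT_WIDTH_PX)
    ]
```

Element (3, 5), for example, covers the two horizontally adjacent pixels (6, 5) and (7, 5).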

Yet another property is important: the two topmost edges of a tile meet in the tile's topmost corner. This means that the addressable element in that corner is shared by both topmost edges [1]. When we speak of "edge lengths", we should keep this in mind (or we could end up doubling the addressable element at the top, thus generating two corners, which would of course be wrong). This implies that the designed tile's overall width needs to consist of an odd number of addressable elements [2].

So much for a tile's width. As for its height, I'd like to refer you to Isometric Projection, section "Spotlight on Tiles", fig. 8-10: that section visualizes how our tiles are laid out seamlessly to form a (flat) landscape without "holes". In essence, we need to make sure that the leftmost and rightmost corners of the tile are doubled in height, by stacking two addressable elements on top of each other.

Put all together, we arrive at the following constraints for a flat tile's design:

When following the above rules, we end up with tiles that are comfortable to work with.

Notes:

[1] Because the two topmost edges are themselves shared with the bottommost edges of the adjacent tiles, all corners (except those located along the lower edges of the whole landscape) are in fact shared by a total of 4 tiles each.

[2] Since an addressable element is 2 pixels wide, this amounts to an even number of pixels, but it would be wrong to define it that way, because an even number of addressable elements also results in an even number of pixels.

Dimensions and Measurements

We start out with a vague idea about the size of our future avatars: after all, it is primarily the visual impression that counts. From this, we will work out all measures by deduction. It is always a good idea not to begin with an actual drawing: just use a very primitive draft or, as I do here, just a box as a placeholder for the future design.

The only important properties of this sketch are the true "real-life" height of your avatars and the number of pixels used to display that height. Actually, it might even be a good idea to choose two different heights between which your avatars' heights can vary. For example, the lower bound could be used for females and the upper bound for males. If you do use multiple heights, their proportions must already be observed at this stage. In our example, the proportions are 180:160 = 72:64 (heights in cm and in pixels, respectively).

Fig. 2: Defining the effective height

Note that an empty box can easily give the impression of being "too small", but you might be surprised by the number of pixels that fit into comparatively small boxes. If you find this hard to imagine, it may be better to use a draft rather than a box to define the heights. Of course, the height ultimately depends on your application: feel free to halve or triple the size according to the requirements of your environment.

If you defined multiple heights as in our example, then you had to observe the proportions and already know that 1 height pixel corresponds to 2.5 cm. Otherwise it must be calculated now. Unsurprisingly, both 180 cm/72 px and 160 cm/64 px yield 2.5 cm/px.

So much for the trivial part of defining the measures. But don't worry, it doesn't get really complicated. The next step is to define how many centimeters a pixel represents along a tile's edge.

"Huh? 2.5 cm as well, no?" - Well, yes and no. Recall how the edge lengths were defined: we said that all 3 axes (the two defining the tile's surface in the x and z dimensions, plus the upward y dimension representing the height) have the property that equal segment lengths represent equal effective lengths. Unfortunately, the x and z axes are not orthogonal to the y axis: with an assumed horizontal line running through a tile's bottommost corner, these axes form an angle a of a = arctan(1/2) = 26.565°:

Fig. 3: The edge e is foreshortened

We can measure along the edge e, but it is not until we rotate the edge downward onto s that we "see" the real length. Whatever length we measure along the edge e, it is represented by a line of pixels, which can be thought of as the diagonal of a surrounding box with the width ew and the height eh. If we rotated e onto s, it would become sw = ew/cos(a) = ew/0.8944 pixels wide (with a height of 0 px). But when we want to transfer a length we already know, then we actually know sw (and not ew) and need to rotate s up to e, which makes ew shorter than sw: ew = sw×cos(a) = sw×0.8944. At the same time, the length represented by a pixel increases by a factor of 1/0.8944 = 1.118.

We found above that a height pixel represents 2.5 cm. The same must be true on its orthogonal, sw. Therefore, even without knowing anything about the tile's edge length yet, we know that a pixel on the x and z axes represents 2.5 cm/0.8944 = 2.8 cm (rounded). All in all, not that much of a difference, but it still needs to be taken into account if we want to work with a certain level of accuracy.
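The numbers above can be reproduced with a short Python sketch. The constant names are assumptions made for illustration; the input values (180 cm drawn as 72 px, and the edge angle arctan(1/2)) are the ones used in this article.

```python
import math

# Scale along the vertical (height) axis: 180 cm drawn as 72 px.
HEIGHT_CM_PER_PX = 180 / 72            # 2.5 cm per pixel

# Angle between a tile edge and the horizontal.
EDGE_ANGLE = math.atan(1 / 2)          # about 26.565 degrees

# Foreshortening: a known length sw shrinks to ew = sw * cos(a).
FORESHORTENING = math.cos(EDGE_ANGLE)  # about 0.8944

# Each horizontal pixel along an edge therefore represents more length.
EDGE_CM_PER_PX = HEIGHT_CM_PER_PX / FORESHORTENING  # about 2.795 cm


def edge_width_px(length_cm):
    """Horizontal pixel width ew of a real-life length laid along an edge."""
    return length_cm / EDGE_CM_PER_PX
```

Rounded to one decimal place, EDGE_CM_PER_PX gives the 2.8 cm per pixel used in the rest of the text.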

Now we have all the data needed to find a meaningful edge length.

"Find one? Let's just define one." - This would work well in quite a few cases. For example, if we were to render data obtained from a DEM (as described in the article Digital Elevation Models), we would presumably already know the grid distance (in the example followed in that article, 10 meters). It all depends on how you plan to render the environment and mix in your avatars. However, if your avatars play a crucial role in your landscape (as is most likely the case in games), then we can do better than just use a predefined measure.

For example, it would be a great advantage to have a walking avatar start and stop in the center of a tile. This would give us fixed positions: we know that a standing avatar is located on the center of a tile, and that point most likely also happens to be the world's focus (i.e. the middle point of the screen displaying the rendered landscape, which places the avatar in the middle as well and attracts the observer's attention; hence the term "focus"), at least if the avatar represents a real user (like a player in a game). This would simplify many computations. In our examples we want to benefit from this simplification, so we attempt to find a meaningful edge size.

That said, the question unavoidably arises: what exactly is meaningful? - Well, we said it would be nice if a walking avatar started and stopped in the center of a tile. Walking is a series of steps, so it would be most natural for an avatar-centered environment to take the avatar's step length into account.

It does not take much online research to find a (real-life) figure for a human's average step length. Depending on gender, weight and height, the step length at slow to medium walking speeds is usually in the range of 50 to 70 cm (Source: Basic gait parameters, PDF).

So our edge length should ideally be around 60±10 cm. Expressed in pixels, this is (60±10 cm) / (2.8 cm/px) = 21.43±3.57 px, i.e. 17.86 to 25 px. So we can choose between (rounded) 18 and 25 px.
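The admissible pixel range can be verified with a small Python sketch. To avoid floating-point rounding at the range limits, the sketch works in whole millimeters; the constant names and the search limit of 64 px are assumptions made for illustration.

```python
# All lengths in millimeters so that the comparison stays exact.
MM_PER_EDGE_PX = 28      # 2.8 cm per pixel along a tile edge
MIN_STEP_MM = 500        # 50 cm, lower bound of the step length
MAX_STEP_MM = 700        # 70 cm, upper bound of the step length

# Every whole pixel count whose real-life length lies within the range.
candidate_edge_lengths = [
    px for px in range(1, 65)
    if MIN_STEP_MM <= px * MM_PER_EDGE_PX <= MAX_STEP_MM
]
```

The list runs from 18 px (50.4 cm) to 25 px (70 cm), matching the rounded range given above.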

At first, one might be disappointed: software developers tend to favour powers of 2, because many calculations (especially divisions) can then be performed much faster. As such, 16 or 32 px would have been nice. However, they are out of range. That is not a big deal, though, as there is a more important consideration: the subdivision of tiles.

Doubtless, 16 or 32 is a nice number when it comes to splitting a tile into ever smaller halves until 1 is reached, but it has a really nasty property as well: the only numbers that divide a power of 2 are themselves powers of 2. If we wanted to split the tile into 3 parts, for example, we could not do so cleanly. In that respect, a multiple of the small numbers 1×2×3 = 6 would be desirable (a multiple of 12 would be even better), and here we are in luck: 24 pixels is such a multiple, and it is within the allowed range. Furthermore, 24 also fulfills all the constraints outlined above.
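The subdivision argument can be made concrete with a few lines of Python. The helper name divisors and the variable names are assumptions made for illustration; the range of 18 to 25 px is the one derived above.

```python
def divisors(n):
    """All positive whole numbers that divide n without remainder."""
    return [d for d in range(1, n + 1) if n % d == 0]


# Edge lengths allowed by the step length range (18 to 25 px).
allowed = range(18, 26)

# Lengths that can be split cleanly into halves and thirds ...
multiples_of_6 = [px for px in allowed if px % 6 == 0]
# ... and those that can additionally be split into quarters.
multiples_of_12 = [px for px in allowed if px % 12 == 0]
```

divisors(16) contains only powers of 2, whereas divisors(24) also contains 3, 6 and 12. Within the allowed range, 18 and 24 are multiples of 6, but only 24 is a multiple of 12.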

For these reasons, 24 pixels (corresponding to 24×2.8 cm = 67.2 cm along the edges) shall be the edge length of our tile.

We mentioned (see above, fig. 1) that a tile's perceived edge length includes both opposite edges. Does this mean that these 24 pixels include both edges, i.e. do we need to subtract one of them when designing our tile? - The answer is: "No". From the center of a tile to the center of its orthogonally adjacent tile the distance shall be 24 pixels. While walking from center to center on a tiled landscape, we cross just one edge (which happens to be both the end of one tile and the start of the other). But if the distance from center to center is 24 pixels, then the same must be true along a tile's edge. And since we start out at a corner, one (and only one) such edge is included.

With this we have covered everything and are ready to start designing our tile. To sum up our definitions:

The Complete Framework

According to our definitions, we would design a tile as follows (magnified 4 times):

Fig. 4: Zoomed-in tile with dimensions in pixels

Tiling then follows the rules outlined in the article Isometric Projection, section "Spotlight on Tiles", fig. 8-10:

Fig. 5: Tiling the defined tiles and outline of an avatar on a tile


Continuation

With this the introduction ends. The next article of the series Designing Isometric Avatars shows a practical approach to designing a first avatar that fits into our well-defined landscape: proceed to Components of Isometric Avatars after a break (it's a lengthy article).