Tuesday, March 13, 2018

typography - What's the practical difference between a 'glyph' and a 'character'?



I saw this question on the Typography site proposal and it bugged me that I didn't know the answer. I'd always treated 'glyph' and 'character' as interchangeable.




After reading an explanation on the Unicode Character Encoding Model page, my understanding is roughly this:



  • Characters are defined by their meaning in language; glyphs, by their appearance. So the ligature that aesthetically combines f and i is one glyph, but two characters.


So, my belief is (please correct me if I'm wrong) that the practical difference would be:



  • Text parsers that aren't interested in the aesthetics of text will read glyphs as their respective characters. So:


    • If you were to copy and paste text containing glyphs into a plain text editor, the glyphs would be converted to their respective characters (a ligature glyph would become f and i)

    • Any well-made automated system based on text parsing (e.g. search engine crawlers, screen readers, spell checkers) would interpret the glyphs as their respective characters.

    • One character can have many glyphs or glyph sets. I want to say one glyph can only have one character, but this clearly isn't right, as there's an example in the linked article of 3 glyphs and glyph sets that each seem to correspond to a character and a set of characters. I don't quite see how this could work: surely that means there will be inconsistency or ambiguity in how those glyphs are interpreted, varying by interpreter? (Or does it vary by language, or by font?)

    • While glyph browsers (e.g. the one in Illustrator) contain the full glyph set of a font, character maps (e.g. the Windows character map) only contain characters, not glyphs that are multiple characters like ligatures (something I'd not noticed before)




I feel like I'm nearly there but I've clearly misunderstood something somewhere along the line: not just the "One glyph multiple characters" thing, but also, copying and pasting behaviour with ligatures isn't quite what I expected:



  • Copy the ligature from Illustrator to this input box: pastes as fi (two characters) as expected.


  • Paste in the HTML code for it () - displays as the ligature when not in a code block (fi - which in this font doesn't look much like a ligature, but you'll see is one if you try to select just half of it), and the code when in a code block (), as expected.

  • Copy and paste the rendered non-code-block ligature back into the input box: pastes as the ligature character, and renders as the ligature regardless of whether it's in a code block or not (fi and ). Likewise words containing it: fit misfits (fit misfits) pastes as fit misfits (fit misfits). Maybe it depends on whether the place it's being pasted understands the encoding used?




How far wrong is my understanding of this? Can someone put me right by stating a clear definition of the difference between glyphs and characters (if mine is wrong or can be improved), and giving clearer, more accurate examples than mine of what that means in practice?



Answer



Glyphs relate to how text is rendered; characters, to how it is interpreted. When you copy and paste, the source application usually offers the clipboard contents in several formats. The plain-text format will decompose the fi ligature into f and i; the HTML format may translate it to the character entity you quoted, or also decompose it into f and i.
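As a rough illustration of that decomposition, here is a minimal Python sketch. It assumes the ligature reached the text as the single Unicode code point U+FB01 (LATIN SMALL LIGATURE FI) and uses compatibility normalization (NFKD) from the standard library to turn it back into f and i. The helper name to_plain_characters is made up for this example, and this is not necessarily what any particular application does when it copies text.

```python
import unicodedata

# U+FB01 is LATIN SMALL LIGATURE FI: one code point that looks like "fi".
LIGATURE_FI = "\ufb01"


def to_plain_characters(text: str) -> str:
    """Replace compatibility characters (such as ligatures) with their
    plain equivalents using Unicode compatibility decomposition (NFKD)."""
    return unicodedata.normalize("NFKD", text)


word = "mis" + LIGATURE_FI + "ts"   # "misfits" spelled with the ligature code point
plain = to_plain_characters(word)   # "misfits" spelled with separate f and i

print(len(word), len(plain))        # the plain version is one code point longer
print(plain == "misfits")
```

A search or spell-check routine that normalizes its input this way would treat both spellings of the word identically.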


In general, the relation between characters and glyphs is many-to-many (n:m). In Indic scripts, some characters split into two glyphs that are placed at different positions in the word. In Latin script, the closest equivalent would be rendering é as two glyphs (e and ´). In Arabic, each character has different glyphs depending on its position within a word: initial, medial, final or isolated.
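Unicode itself carries traces of both directions of this mapping, which makes them easy to poke at. The sketch below is only an illustration built on Python's standard unicodedata module. It shows é decomposing into a base letter plus a combining accent, and the four legacy "presentation form" code points of the Arabic letter beh all folding back to the one character U+0628. Modern fonts normally pick the positional glyph during shaping rather than through these compatibility code points, so treat them as a convenient stand-in for the glyph variants, not as how rendering works. The names BEH_FORMS and base_character are invented for the example.

```python
import unicodedata

# One character, potentially two glyphs: canonical decomposition (NFD)
# splits U+00E9 into "e" followed by U+0301 COMBINING ACUTE ACCENT.
e_acute = "\u00e9"
decomposed = unicodedata.normalize("NFD", e_acute)
print([f"U+{ord(ch):04X}" for ch in decomposed])

# One character, several glyph shapes: the Arabic letter beh (U+0628)
# has four positional presentation forms kept in Unicode for compatibility.
BEH = "\u0628"
BEH_FORMS = {
    "isolated": "\ufe8f",
    "final": "\ufe90",
    "initial": "\ufe91",
    "medial": "\ufe92",
}


def base_character(presentation_form: str) -> str:
    """Map a compatibility presentation form back to its base character."""
    return unicodedata.normalize("NFKC", presentation_form)


for position, form in BEH_FORMS.items():
    print(position, base_character(form) == BEH)
```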


The translation from characters to glyphs is specific to each application and the typographic features it supports. For Latin text this translation used to be straightforward, but OpenType fonts introduced additional features like ligatures, swashes, alternate forms, small caps etc.


For practical reasons you only concern yourself with glyphs when you implement how an application renders text, or when you design a font, or when you want to apply an OpenType feature that replaces some glyphs with others (e.g. ligatures). Otherwise Unicode code points are your friend.
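When a pasted string behaves unexpectedly, listing its code points is usually the quickest way to see which characters it really contains. A small sketch, again using only Python's standard library (the helper name describe_code_points is hypothetical):

```python
import unicodedata


def describe_code_points(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) pairs for every character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# Two separate letters followed by the single ligature code point U+FB01.
for code_point, name in describe_code_points("fi\ufb01"):
    print(code_point, name)
```

Two strings that render identically on screen can produce different listings here, which is exactly the character versus glyph distinction.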


