{"version": "1.0", "type": "rich", "title": "A Spectre is Haunting Unicode", "author_name": "kontextmaschine", "author_url": "https://kontextmaschine.com", "provider_name": "kontextmaschine", "provider_url": "https://kontextmaschine.com", "url": "https://kontextmaschine.com/post/182721138368/", "html": "<a href=\"https://www.dampfkraft.com/ghost-characters.html\">A Spectre is Haunting Unicode</a>\n<p><a href=\"https://allthingslinguistic.com/post/176636856514/a-spectre-is-haunting-unicode\" class=\"tumblr_blog\" target=\"_blank\">allthingslinguistic</a>:</p>\n<blockquote>\n<p><a href=\"https://www.dampfkraft.com/ghost-characters.html\" target=\"_blank\">An interesting article about Japanese and Unicode</a>. Excerpt:\u00a0</p>\n<blockquote>\n<p>In 1978 Japan\u2019s <a href=\"https://ja.wikipedia.org/wiki/%E7%B5%8C%E6%B8%88%E7%94%A3%E6%A5%AD%E7%9C%81\" target=\"_blank\">Ministry of Economy, Trade and Industry</a>\u00a0established the encoding that would later be known as JIS X 0208, which still serves as an important reference for all Japanese encodings. However, after the JIS standard was released people noticed something strange - several of the added characters had no obvious sources, and nobody could tell what they meant or how they should be pronounced. Nobody was sure where they came from. These are what came to be known as the ghost characters (<a href=\"https://ja.wikipedia.org/wiki/%E5%B9%BD%E9%9C%8A%E6%96%87%E5%AD%97\" target=\"_blank\">\u5e7d\u970a\u6587\u5b57</a>). [\u2026]</p>\n<p>By interviewing the catalogers involved in the creation of the standard, the investigators established that some characters were inadvertently invented as mistakes in the cataloging process. For example, \u599b was an error introduced while trying to record \u201c\u5c71 over \u5973\u201d. \u201c\u5c71 over \u5973\u201d occurs in the name of a particular place and was thus suitable for inclusion in the JIS standard, but because they couldn\u2019t print it as one character yet, \u5c71 and \u5973 were printed separately, cut out, and pasted onto a sheet of paper, and then copied. When reading the copy, the line where the two little pieces of paper met looked like a stroke and was added to the character by mistake. The original character (<a href=\"https://ja.wiktionary.org/wiki/%F0%A1%9A%B4\" target=\"_blank\">\ud845\udeb4</a>) was not added to JIS or Unicode until much later and doesn\u2019t display on most sites for me.<br/></p>\n<p><a href=\"https://www.dampfkraft.com/ghost-characters.html\" target=\"_blank\">Read the whole thing</a>.\u00a0</p>\n</blockquote>\n</blockquote>"}