Forums › Life › Computers, Gadgets & Technology › Testing UTF-8 console on Windows 10 Ubuntu shell
I couldn’t get this to work for shit in native Windows Visual Studio for a Windows console app, even though the characters seemed to cut and paste OK (although according to “various Chineses” on the MSDN forums this has always been a problem).
It is still a pain having to enter the UTF-8 codes by hand for the special characters.
I had to change the shell font to one which also contains the Asian characters, and add an extra space in the teapot for the tea to “breathe” (otherwise the console doesn’t display this character properly).
This also mangles the look of the extra European letters – unless that is meant to be ø and æ in Danish with a Chinese accent?
I can’t even tell as the PC brigade removed Shun Gon’s lyrics from the Danish version of the Aristocats :laugh_at:
There is probably a better way of doing this but I’m unlikely to end up coding apps for a Chinese takeaway in Denmark anyway as I am sure there are thousands of folk smarter than me who have already written much better ones…
This might be a stupid question but why is this necessary?
These days it’s not uncommon to have to write software that can process text data in more than one language. Although mixed European/Latin and Asian alphabets are not that common, accented European characters are widely used outside the UK, and even getting some symbols like the Euro sign to appear in command line apps can be a challenge. Original text mode consoles only used standard ASCII or IA5 encoding, which only contained American symbols (even the £ sign was either absent or swapped with $, making computer programming confusing).
Okay thanks, so it needs to be able to process different alphabets?
Just wondering where the word alphabet comes from, seeing as it is alpha plus most of beta, the first 2 letters of the Greek alphabet.
From Wikipedia:
The English word alphabet came into Middle English from the Late Latin word alphabetum, which in turn originated in the Greek ἀλφάβητος (alphabētos), from alpha and beta, the first two letters of the Greek alphabet.[6] Alpha and beta in turn came from the first two letters of the Phoenician alphabet, and originally meant ox and house respectively.
Informally the term “ABCs” is sometimes used for the alphabet as in the alphabet song (Now I know my ABCs …), and knowing one’s ABCs for literacy, or as a metaphor for knowing the basics about anything.[7]
Which doesn’t really put me much further on than I was before I read it.
I was just about to paste that very same text as an example of why Unicode UTF-8 encoding is a Good Thing (rather than the definition of the word itself) as until the mid 2000s it wasn’t guaranteed that transferring text containing different languages would work at all (the above contains both Greek letters and an unusual ‘e’ “ἀλφάβητος (alphabētos)” ). BTW ø and æ were once widely used in English especially the dialect of the kingdom of Northumbria.
There’s a reason I used that particular Chinese character.
char *teapot = "\xe8\x8c\xb6" " ";
Older people in London sometimes say they are having a “cup of char” when they drink a cup of tea – 茶 (chá) is the Chinese for tea 😉
Unfortunately it doesn’t cut and paste into the nano editor (at least on my versions). I don’t want to monkey around with Windows settings on this preview build and quite possibly end up with the whole of Windows in Chinese (already Office 2016 on my laptop is in Dutch), so I had to enter the character as the 3 separate octets (bytes).
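For anyone wondering where those 3 octets come from, here is a rough sketch in C of the standard UTF-8 bit packing. The function name utf8_encode is made up for illustration and is not from any library; it assumes the caller supplies a buffer of at least 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes up to 4 bytes into out and returns how many were written,
   or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx (plain ASCII) */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {   /* surrogates are not valid here */
        return 0;
    }
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0x8336, buf) for 茶 (U+8336) should return 3 and fill buf with e8 8c b6, the same octets typed by hand in the teapot string. The Euro sign (U+20AC) comes out as e2 82 ac by the same rule.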
UTF-8 allows all the world’s languages, plus other useful symbols and even stuff like emojis, to be encoded as octet strings, and it is the encoding most commonly used across the Internet. There are various other encodings, but for various (complicated) reasons UTF-8 works the best, as it is compatible with older protocols.
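One way to see the compatibility point: plain ASCII bytes are unchanged in UTF-8, and every byte of a multi-byte character has its top bit set, so old byte-oriented code never mistakes part of 茶 for an ASCII letter or a terminating NUL. A minimal sketch, assuming valid UTF-8 input (utf8_count is a hypothetical name, not a standard function):

```c
#include <stddef.h>

/* Hypothetical helper: count the code points in a NUL-terminated UTF-8
   string by skipping continuation bytes (bit pattern 10xxxxxx).
   Assumes the input is valid UTF-8; it does no validation. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* Lead bytes and ASCII bytes start a new character. */
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For the teapot string "\xe8\x8c\xb6" " " this counts 2 characters (the tea and its breathing space), whereas strlen sees 4 bytes. For a pure ASCII string such as "tea" both give 3.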
@General Lighting 984047 wrote:
I was just about to paste that very same text as an example of why Unicode UTF-8 encoding is a Good Thing (rather than the definition of the word itself) as until the mid 2000s it wasn’t guaranteed that transferring text containing different languages would work at all (the above contains both Greek letters and an unusual ‘e’ “ἀλφάβητος (alphabētos)” ). BTW ø and æ were once widely used in English especially the dialect of the kingdom of Northumbria.
Those characters are seen a lot in the dictionary as pronunciation notes.
Can imagine the Cyrillic alphabet is bad enough, but the Middle Eastern alphabets and also Chinese (extended Chinese?) can be a pain to implement.
Remember at school in maths class we learned how Morse code works and were then set a task to create a Morse encoding for the Russian alphabet, with the extension that 4 dots or dashes could be used instead of the normal 3. I had no difficulties with it, as working methodically made it simple, but quite a few people struggled. I’m not sure why, but even just 2 extra characters increased the complexity more than I’d imagined.