c++ - Properly print UTF-8 characters in the Windows console


This is the way I try it:

    #include <stdio.h>
    #include <windows.h>

    int main() {
      SetConsoleOutputCP(CP_UTF8);

      // German chars won't appear
      char const* text = "aäbcdefghijklmnoöpqrsßtuüvwxyz";
      int len = MultiByteToWideChar(CP_UTF8, 0, text, -1, 0, 0);
      wchar_t *unicode_text = new wchar_t[len];
      MultiByteToWideChar(CP_UTF8, 0, text, -1, unicode_text, len);
      wprintf(L"%s", unicode_text);
    }

And the effect is that only the ASCII chars are displayed. No errors are shown. The source file is encoded in UTF-8.

So, what am I doing wrong here?

To WouterH:

    int main() {
      SetConsoleOutputCP(CP_UTF8);
      const wchar_t *unicode_text = L"aäbcdefghijklmnoöpqrsßtuüvwxyz";
      wprintf(L"%s", unicode_text);
    }

  • This doesn't work either. The effect is the same. My font is of course Lucida Console.

Third take:

    #include <stdio.h>
    #define _WIN32_WINNT 0x05010300
    #include <windows.h>
    #define _O_U16TEXT  0x20000
    #include <fcntl.h>
    #include <io.h>   // declares _setmode

    int main() {
        // Switch stdout to UTF-16 text mode; only wide functions may be used afterwards
        _setmode(_fileno(stdout), _O_U16TEXT);
        const wchar_t *u_text = L"aäbcdefghijklmnoöpqrsßtuüvwxyz";
        wprintf(L"%s", u_text);
    }

OK, something begins to work, but the output is: ańbcdefghijklmno÷pqrs▀tuŘvwxyz.

By default the wide print functions on Windows do not handle characters outside the ASCII range.

There are a few ways to get Unicode data to the Windows console.

  • Use the console API directly, WriteConsoleW. You'll have to ensure you're actually writing to a console, and use other means when the output goes to something else.

  • Set the mode of the standard output file descriptors to one of the 'Unicode' modes, _O_U16TEXT or _O_U8TEXT. This causes the wide character output functions to correctly output Unicode data to the Windows console. If they're used on file descriptors that don't represent a console, they cause the output stream of bytes to be UTF-16 and UTF-8 respectively. N.B. after setting these modes, the non-wide character functions on the corresponding stream are unusable and result in a crash. You must use only the wide character functions.

  • UTF-8 text can be printed directly to the console by setting the console output codepage to CP_UTF8, if you use the right functions. Most of the higher level functions such as basic_ostream<char>::operator<<(char*) don't work this way, but you can either use lower level functions or implement your own ostream that works around the problem the standard functions have.
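As a rough illustration of the first option, here is a minimal sketch. The helper name write_utf8 is made up for this example, and the choice to pass the bytes through unchanged when stdout is not a console is an assumption, not something the Win32 API prescribes. It uses GetConsoleMode as the "is this really a console?" check, since that call fails for redirected handles.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not a library function): writes UTF-8 text to stdout.
// On a real Windows console it converts to UTF-16 and calls WriteConsoleW;
// otherwise (file, pipe, or non-Windows build) it writes the bytes unchanged.
bool write_utf8(const std::string& utf8) {
    if (utf8.empty()) return true;
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    // GetConsoleMode only succeeds for console handles.
    if (out != NULL && out != INVALID_HANDLE_VALUE && GetConsoleMode(out, &mode)) {
        const int in_len = static_cast<int>(utf8.size());
        const int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, NULL, 0);
        if (len <= 0) return false;
        std::wstring wide(static_cast<std::size_t>(len), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, &wide[0], len);
        std::fflush(stdout);  // keep ordering with earlier CRT output
        DWORD written = 0;
        return WriteConsoleW(out, wide.data(), static_cast<DWORD>(wide.size()),
                             &written, NULL) != 0;
    }
#endif
    // Not a console: emit the UTF-8 bytes as they are.
    return std::fwrite(utf8.data(), 1, utf8.size(), stdout) == utf8.size();
}
```

A call such as write_utf8("aäbcdefghijklmnoöpqrsßtuüvwxyz\n") would then be expected to go through WriteConsoleW on a console and through fwrite when redirected, provided the source file really is UTF-8 and the compiler keeps the literal's bytes unchanged.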

The problem with the third method is this:

    putc('\302', stdout); putc('\260', stdout); // doesn't work with CP_UTF8

    puts("\302\260"); // correctly writes UTF-8 data to Windows console with CP_UTF8

Unlike most operating systems, the console on Windows is not simply another file that accepts a stream of bytes. It's a special device created and owned by the program and accessed via its own unique Win32 API. The issue is that when the console is written to, the API sees exactly the extent of the data passed in that one use of the API, and the conversion from narrow characters to wide characters occurs without considering that the data may be incomplete. When a multibyte character is passed using more than one call to the console API, each separately passed piece is seen as an illegal encoding, and is treated as such.
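To make the failure concrete: "\302\260" is the two-byte UTF-8 encoding of U+00B0 (the degree sign), and neither byte is valid on its own. The sketch below is a plain structural check written for this illustration (the function name is made up, and it does not reject overlong three- and four-byte forms or surrogates). It is not what the console does internally, only a way to see why a converter that looks at one piece at a time has nothing valid to work with.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `chunk`, taken on its own, is a
// structurally well-formed UTF-8 byte sequence (lead bytes followed by the
// right number of continuation bytes). Simplified: overlong 3/4-byte forms
// and surrogate code points are not rejected.
bool is_well_formed_utf8(const std::string& chunk) {
    std::size_t i = 0;
    while (i < chunk.size()) {
        const unsigned char lead = static_cast<unsigned char>(chunk[i]);
        std::size_t extra = 0;                              // continuation bytes expected
        if (lead < 0x80) extra = 0;                         // ASCII
        else if (lead >= 0xC2 && lead <= 0xDF) extra = 1;   // 2-byte sequence
        else if ((lead & 0xF0) == 0xE0) extra = 2;          // 3-byte sequence
        else if (lead >= 0xF0 && lead <= 0xF4) extra = 3;   // 4-byte sequence
        else return false;                                  // stray continuation or bad lead
        if (chunk.size() - i <= extra) return false;        // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(chunk[i + k]);
            if ((c & 0xC0) != 0x80) return false;           // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

With this check, is_well_formed_utf8("\302\260") is true, while is_well_formed_utf8("\302") and is_well_formed_utf8("\260") are both false, which mirrors the putc versus puts difference above.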

It ought to be easy enough to work around this, but the CRT team at Microsoft views it as not their problem, whereas whatever team works on the console probably doesn't care.

You might solve it by implementing your own streambuf subclass which handles doing the conversion to wchar_t correctly, i.e. accounting for the fact that the bytes of multibyte characters may come separately, and maintaining conversion state between writes (e.g., std::mbstate_t).
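A minimal sketch of that idea follows. It swaps in a simpler technique than std::mbstate_t: the streambuf keeps a small buffer of pending bytes and forwards data only up to the last complete UTF-8 character. The class name Utf8BoundaryBuf and the Sink callback are hypothetical names for this example. On Windows the sink would presumably convert each chunk with MultiByteToWideChar and pass it to WriteConsoleW, but that part is left out so the sketch stays portable.

```cpp
#include <cstddef>
#include <functional>
#include <ostream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical class: holds back the bytes of an unfinished multibyte
// character so the sink only ever receives chunks that end on a character
// boundary. Invalid input is passed through untouched for the sink to handle.
class Utf8BoundaryBuf : public std::streambuf {
public:
    using Sink = std::function<void(const std::string&)>;
    explicit Utf8BoundaryBuf(Sink sink) : sink_(std::move(sink)) {}

protected:
    // Called for every single character, since no put area is set up.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            pending_.push_back(traits_type::to_char_type(ch));
            forward_complete();
        }
        return traits_type::not_eof(ch);
    }

    std::streamsize xsputn(const char* s, std::streamsize n) override {
        pending_.append(s, static_cast<std::size_t>(n));
        forward_complete();
        return n;
    }

private:
    static bool is_continuation(char c) {
        return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
    }

    // Total sequence length announced by a lead byte (1 for ASCII or invalid bytes).
    static std::size_t expected_length(char c) {
        const unsigned char b = static_cast<unsigned char>(c);
        if ((b & 0xE0) == 0xC0) return 2;
        if ((b & 0xF0) == 0xE0) return 3;
        if ((b & 0xF8) == 0xF0) return 4;
        return 1;
    }

    // Length of the longest prefix of pending_ that does not end mid-character.
    std::size_t complete_prefix() const {
        const std::size_t n = pending_.size();
        std::size_t back = 0;  // trailing continuation bytes (at most 3)
        while (back < n && back < 3 && is_continuation(pending_[n - 1 - back])) ++back;
        if (back == n) return n;  // no lead byte in sight: pass through
        const std::size_t lead_pos = n - 1 - back;
        return expected_length(pending_[lead_pos]) > back + 1 ? lead_pos : n;
    }

    void forward_complete() {
        const std::size_t k = complete_prefix();
        if (k == 0) return;
        sink_(pending_.substr(0, k));
        pending_.erase(0, k);
    }

    Sink sink_;
    std::string pending_;
};
```

Wrapping it in a std::ostream (std::ostream out(&buf);) and writing '\302' followed by '\260' as two separate put calls should then deliver the pair to the sink as one chunk rather than as two invalid pieces.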

