C++ - Properly print UTF-8 characters in the Windows console
This is the way I tried it:

    #include <stdio.h>
    #include <windows.h>

    using namespace std;

    int main()
    {
        SetConsoleOutputCP(CP_UTF8);
        //german chars won't appear
        char const* text = "aäbcdefghijklmnoöpqrsßtuüvwxyz";
        int len = MultiByteToWideChar(CP_UTF8, 0, text, -1, 0, 0);
        wchar_t *unicode_text = new wchar_t[len];
        MultiByteToWideChar(CP_UTF8, 0, text, -1, unicode_text, len);
        wprintf(L"%s", unicode_text);
    }

The effect is that only ASCII characters are displayed. No errors are shown. The source file is encoded in UTF-8.
So, what am I doing wrong here?
To WouterH:

    int main()
    {
        SetConsoleOutputCP(CP_UTF8);
        const wchar_t *unicode_text = L"aäbcdefghijklmnoöpqrsßtuüvwxyz";
        wprintf(L"%s", unicode_text);
    }

This doesn't work either. The effect is the same. The font is, of course, Lucida Console.
Third take:

    #include <stdio.h>
    #define _WIN32_WINNT 0x05010300
    #include <windows.h>
    #define _O_U16TEXT 0x20000
    #include <fcntl.h>
    #include <io.h>   // declares _setmode

    using namespace std;

    int main()
    {
        _setmode(_fileno(stdout), _O_U16TEXT);
        const wchar_t *u_text = L"aäbcdefghijklmnoöpqrsßtuüvwxyz";
        wprintf(L"%s", u_text);
    }

OK, it begins to work, but the output is: ańbcdefghijklmno÷pqrs▀tuŘvwxyz.
By default, the wide print functions on Windows do not handle characters outside the ASCII range.
There are a few ways to get Unicode data to the Windows console.
Use the console API directly, i.e. WriteConsoleW. You'll have to ensure that you're actually writing to a console, and use other means when the output goes somewhere else.
Set the mode of the standard output file descriptors to one of the 'Unicode' modes, _O_U16TEXT or _O_U8TEXT. This causes the wide character output functions to correctly output Unicode data to the Windows console. If they're used on file descriptors that don't represent a console, they cause the output stream of bytes to be UTF-16 and UTF-8 respectively. N.B. after setting these modes, the non-wide character functions on the corresponding stream are unusable and result in a crash. You must use only the wide character functions.
UTF-8 text can be printed directly to the console by setting the console output codepage to CP_UTF8, if you use the right functions. Most of the higher level functions, such as basic_ostream<char>::operator<<(char*), don't work this way, but you can either use lower level functions or implement your own ostream that works around the problem the standard functions have.
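As a rough illustration of the first approach, here is a minimal sketch. The helper names `utf16_to_utf8` and `print_unicode` are made up for this example, and using `GetConsoleMode` to decide whether stdout is a console is an assumption on my part, not something the answer prescribes. When stdout is not a console (a file or a pipe), the sketch falls back to writing plain UTF-8 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: encode UTF-16 as UTF-8 for the non-console path.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate: substitute U+FFFD
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: use WriteConsoleW when stdout is a real console,
// otherwise write UTF-8 bytes to the redirected stream.
void print_unicode(const std::u16string& text) {
#ifdef _WIN32
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    // Assumption: GetConsoleMode succeeds only for console handles.
    if (h != nullptr && h != INVALID_HANDLE_VALUE && GetConsoleMode(h, &mode)) {
        assert(text.size() <= 0xFFFFFFFFu);  // the count parameter is a DWORD
        DWORD written = 0;
        WriteConsoleW(h, text.data(), static_cast<DWORD>(text.size()),
                      &written, nullptr);
        return;
    }
#endif
    const std::string bytes = utf16_to_utf8(text);
    std::fwrite(bytes.data(), 1, bytes.size(), stdout);
}
```

Calling `print_unicode(u"aäbcdefghijklmnoöpqrsßtuüvwxyz\n")` would then go through the console API in a console window and produce UTF-8 bytes when the output is redirected.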
The problem with the third method is this:

    putc('\302', stdout); putc('\260', stdout); // doesn't work with CP_UTF8

    puts("\302\260"); // correctly writes UTF-8 data to the Windows console with CP_UTF8

Unlike most operating systems, the console on Windows is not simply another file that accepts a stream of bytes. It's a special device created and owned by the program and accessed via its own unique Win32 API. The issue is that when the console is written to, the API sees exactly the extent of the data passed in that one use of the API, and the conversion from narrow characters to wide characters occurs without considering that the data may be incomplete. When a multibyte character is passed using more than one call to the console API, each separately passed piece is seen as an illegal encoding, and is treated as such.
It ought to be easy enough to work around this, but the CRT team at Microsoft views it as not their problem, whereas whatever team works on the console doesn't care.
You might solve it by implementing your own streambuf subclass which handles doing the conversion to wchar_t correctly, i.e. accounting for the fact that the bytes of multibyte characters may come separately, and maintaining conversion state between writes (e.g., std::mbstate_t).
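A minimal sketch of that streambuf idea follows. Everything named here (`complete_utf8_prefix`, `Utf8ChunkingBuf`, `demo_split_write`, the `Sink` callback) is hypothetical. Instead of std::mbstate_t, the sketch simply holds back a trailing incomplete UTF-8 sequence until the remaining bytes arrive, which is a simpler stand-in for real conversion state. The sink is a placeholder: on Windows it would presumably convert the chunk with MultiByteToWideChar and pass it to WriteConsoleW.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ostream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Length of the longest prefix of s that does not end inside a UTF-8 sequence.
// Only an incomplete trailing sequence is held back; invalid bytes pass through.
std::size_t complete_utf8_prefix(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = n;
    std::size_t back = 0;
    // Step back over up to three continuation bytes (10xxxxxx).
    while (i > 0 && back < 3 &&
           (static_cast<unsigned char>(s[i - 1]) & 0xC0) == 0x80) {
        --i;
        ++back;
    }
    if (i == 0) return n;  // no lead byte found, nothing to hold back
    const unsigned char lead = static_cast<unsigned char>(s[i - 1]);
    std::size_t need = 1;
    if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    const std::size_t have = n - (i - 1);
    return have < need ? i - 1 : n;
}

// Unbuffered streambuf that forwards only whole UTF-8 characters to the sink.
class Utf8ChunkingBuf : public std::streambuf {
public:
    using Sink = std::function<void(const std::string&)>;
    explicit Utf8ChunkingBuf(Sink sink) : sink_(std::move(sink)) {}

protected:
    // Called for every single character, since no put area is set up.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            pending_.push_back(traits_type::to_char_type(ch));
            flush_complete();
        }
        return traits_type::not_eof(ch);
    }
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        pending_.append(s, static_cast<std::size_t>(n));
        flush_complete();
        return n;
    }

private:
    void flush_complete() {
        const std::size_t n = complete_utf8_prefix(pending_);
        assert(n <= pending_.size());
        if (n == 0) return;
        sink_(pending_.substr(0, n));
        pending_.erase(0, n);
    }
    Sink sink_;
    std::string pending_;  // bytes of a character that has not fully arrived
};

// The two bytes of U+00B0 are written separately but reach the sink together.
std::vector<std::string> demo_split_write() {
    std::vector<std::string> chunks;
    Utf8ChunkingBuf buf([&chunks](const std::string& s) { chunks.push_back(s); });
    std::ostream out(&buf);
    out.put('\xC2');  // held back: incomplete sequence
    out.put('\xB0');  // completes the character
    return chunks;
}
```

The design choice here is to never hand the console a partial character, so the per-call conversion described above always sees valid input. Bytes still pending when the stream is destroyed are dropped in this sketch.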