C# - Read localized PDF file using iTextSharp


I am trying to read a PDF file using iTextSharp. The issue is that when I read a PDF file in a language other than English (Hindi or Arabic, for example), the extracted words are not correct.

I am wondering: should I install a Hindi or Arabic font on the system, or do I need to handle the encoding?

    ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
    string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
    currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
    text.Append(currentText);

Edit:

Sample PDF image: (image not included)

Extracted text:

uxj ikfydk ifj"kn fuokzpd ukekoyh& 2011 i`"b la[;k % 1 1 1 1& & & & ftys dk uke ftys dk uke ftys dk uke ftys dk uke % % % % 0701-ò¶âã£ûæ– 2 2 2 2& & & & fudk fudk fudk fudk; ; ; ; dk uke dk uke dk uke dk uke % % % % 1-¢âî™ 3 3 3 3& & & & okmz la okmz la okmz la okmz la[ [ [ [; ; ; ;k o uke k o uke k o uke k o uke % % % % 1-¯â“¯â™®â£û¶âû §âîºâã®â£û¶âû Õô¯âû®â£û¶âû 4 4 4 4& & & & hkkx la hkkx la hkkx la hkkx la[ [ [ [; ; ; ;k k k k % % % %

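Do not apply any kind of encoding conversion, because you do not know which encoding the PDF file uses. GetTextFromPage already returns a .NET string, so converting it through Encoding.Default can only lose characters.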
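To illustrate why the conversion is risky, here is a minimal sketch that repeats the same round trip as the code in the question, with the code page passed in as a parameter. `Encoding.ASCII` is used here only as a stand-in for a system code page that has no Hindi characters; that choice, the `RoundTrip` helper, and the sample string are assumptions made for the demonstration, not part of the original code.

```csharp
using System;
using System.Text;

public static class EncodingRoundTrip
{
    // Same shape as the conversion in the question:
    // string -> bytes in codePage -> bytes in UTF-8 -> string.
    public static string RoundTrip(string text, Encoding codePage)
    {
        byte[] bytes = codePage.GetBytes(text);
        byte[] utf8 = Encoding.Convert(codePage, Encoding.UTF8, bytes);
        return Encoding.UTF8.GetString(utf8);
    }

    public static void Main()
    {
        // Hypothetical sample: a Hindi word written with Unicode escapes.
        string hindi = "\u0928\u092E\u0938\u094D\u0924\u0947";

        // Plain ASCII text survives the round trip unchanged.
        Console.WriteLine(RoundTrip("hello", Encoding.ASCII));

        // Hindi text does not: characters missing from the code page are replaced.
        Console.WriteLine(RoundTrip(hindi, Encoding.ASCII) == hindi);

        // With UTF-8 as the code page the round trip changes nothing.
        Console.WriteLine(RoundTrip(hindi, Encoding.UTF8) == hindi);
    }
}
```

In both cases the conversion either damages the text or does nothing, which is why the extracted string is best used as returned.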

I think this should work:

    ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
    string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
    text = text + currentText;
    // Do whatever you want with the text.
    MessageBox.Show(text);

If it is still not working, you may have to install the specific font.
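Before changing fonts, it may help to check what the extracted string actually contains. The sketch below tests whether a string has any characters from the Unicode Devanagari (U+0900 to U+097F) or Arabic (U+0600 to U+06FF) blocks. The `ScriptCheck` class and its method names are hypothetical helpers written for this illustration. If the extracted text contains no such characters, as appears to be the case for the sample above, the extraction did not yield Unicode Hindi or Arabic text in the first place, and re-encoding the string afterwards is unlikely to recover it.

```csharp
using System;
using System.Linq;

public static class ScriptCheck
{
    // True if any character lies in the Devanagari block (U+0900..U+097F).
    public static bool ContainsDevanagari(string s)
    {
        return s.Any(c => c >= '\u0900' && c <= '\u097F');
    }

    // True if any character lies in the basic Arabic block (U+0600..U+06FF).
    public static bool ContainsArabic(string s)
    {
        return s.Any(c => c >= '\u0600' && c <= '\u06FF');
    }

    public static void Main()
    {
        // Beginning of the extracted text shown in the question.
        string extracted = "uxj ikfydk ifj\"kn fuokZpd";

        Console.WriteLine(ContainsDevanagari(extracted));
        Console.WriteLine(ContainsArabic(extracted));

        // A genuine Unicode Hindi string, for comparison.
        Console.WriteLine(ContainsDevanagari("\u0928\u0917\u0930"));
    }
}
```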

