C# - Read localized PDF file using iTextSharp
I am trying to read a PDF file using iTextSharp. The issue is that when I try to read a PDF file in a language other than English (Hindi or Arabic, for example), the extracted words are not correct.
I am wondering: should I install a Hindi or Arabic font on my system, or do I need a specific encoding?
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
text.Append(currentText);
Edit:
Sample PDF image:

Extracted text:
uxj ikfydk ifj"kn fuokzpd ukekoyh& 2011 i`"b la[;k % 1 1 1 1& & & & ftys dk uke ftys dk uke ftys dk uke ftys dk uke % % % % 0701-ò¶âã£ûæ– 2 2 2 2& & & & fudk fudk fudk fudk; ; ; ; dk uke dk uke dk uke dk uke % % % % 1-¢âî™ 3 3 3 3& & & & okmz la okmz la okmz la okmz la[ [ [ [; ; ; ;k o uke k o uke k o uke k o uke % % % % 1-¯â“¯â™®â£û¶âû §âîºâã®â£û¶âû Õô¯âû®â£û¶âû 4 4 4 4& & & & hkkx la hkkx la hkkx la hkkx la[ [ [ [; ; ; ;k k k k % % % %
Do not apply any kind of encoding conversion, because you do not know which encoding the PDF file uses.
Try this; I think it will work.
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
text = text + currentText;
// do whatever you want with the text
MessageBox.Show(text);
If it is still not working, you may have to install the specific font.
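To illustrate why the re-encoding step from the question cannot repair the text, here is a minimal sketch that uses only System.Text and no iTextSharp. It assumes ISO-8859-1 as a stand-in for the ANSI code page that Encoding.Default returns on .NET Framework (on .NET Core and later, Encoding.Default is UTF-8 and the chain has no effect). The class name EncodingRoundTripDemo and the method RoundTrip are hypothetical names chosen for this example.

```csharp
using System;
using System.Text;

public static class EncodingRoundTripDemo
{
    // Mirrors the conversion chain from the question, with the legacy
    // encoding passed in explicitly instead of relying on Encoding.Default.
    public static string RoundTrip(string extracted, Encoding legacy)
    {
        // Lossy step: characters the legacy code page cannot represent
        // are replaced before any conversion to UTF-8 happens.
        byte[] legacyBytes = legacy.GetBytes(extracted);
        byte[] utf8Bytes = Encoding.Convert(legacy, Encoding.UTF8, legacyBytes);
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for a Western ANSI default code page.
        Encoding legacy = Encoding.GetEncoding("iso-8859-1");

        string asciiOnly = "uxj ikfydk";           // like the extracted text above
        string devanagari = "\u0928\u0917\u0930";  // three Devanagari letters

        // ASCII-only text passes through unchanged, so the chain does nothing useful.
        Console.WriteLine(RoundTrip(asciiOnly, legacy) == asciiOnly);

        // Devanagari text does not survive the legacy code page.
        Console.WriteLine(RoundTrip(devanagari, legacy) == devanagari);
    }
}
```

The string returned by GetTextFromPage is already a .NET (UTF-16) string, so pushing it through a legacy code page can only leave it unchanged or destroy characters; it cannot recover text that was extracted incorrectly.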