linux - Remove duplicate words/strings from a tab-separated file

April 15, 2013

I want to remove duplicate words/strings from a large tab-separated file using Linux commands.

names            john, cnn, mac, tommy, mac, patrick, ngc, discovery, john, cnn, adam, patrick
cities            san jose, santa clara, san franscisco, new york, san jose, santa clara

The above is the file format. I want to retain the tabs and commas after removing the duplicate words:

names            john, cnn, mac, tommy, patrick, ngc, discovery, adam
cities            san jose, santa clara, san franscisco, new york

Any help is appreciated.

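    awk 'BEGIN {
        # split fields on either ", " or a tab
        FS = ", |\t"
    }
    {
        # print the row label (first field) followed by a tab
        printf "%s\t", $1
        delim = ""
        for (i = 2; i <= NF; i++) {
            # print each value only the first time it appears on this line
            if (!($i in seen)) {
                printf "%s%s", delim, $i
                delim = ", "
            }
            seen[$i]
        }
        printf "\n"
        # reset the lookup table before the next line
        delete seen
    }' inputfile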

If you're not using GNU awk (gawk) and your awk can't delete a whole array with "delete seen", use split("", seen) instead.
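To try the approach end to end, here is a minimal sketch that recreates the sample data from the question and uses the split("", seen) form so it does not depend on gawk. The file name inputfile comes from the answer; the dedupe function name is only an illustrative wrapper, and the sketch assumes the first separator on each line is a real tab character.

```shell
# Recreate the sample input; printf expands \t to a tab and \n to a newline.
printf 'names\tjohn, cnn, mac, tommy, mac, patrick, ngc, discovery, john, cnn, adam, patrick\n' > inputfile
printf 'cities\tsan jose, santa clara, san franscisco, new york, san jose, santa clara\n' >> inputfile

# dedupe FILE: hypothetical helper wrapping the awk program from the answer.
dedupe() {
    awk 'BEGIN { FS = ", |\t" }
    {
        printf "%s\t", $1              # row label, then a tab
        delim = ""
        for (i = 2; i <= NF; i++) {
            if (!($i in seen)) {       # first occurrence on this line
                printf "%s%s", delim, $i
                delim = ", "
            }
            seen[$i]                   # referencing the key records it
        }
        printf "\n"
        split("", seen)                # empty the array for the next line
    }' "$1"
}

dedupe inputfile
```

With the sample data this should print the two deduplicated lines shown in the question, each with its label, a tab, and the remaining values still separated by ", ". Duplicates are tracked per line, so a value that appears on two different lines is kept on both.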
