linux - Remove duplicate words/strings from a tab separated file
I want to remove duplicate words/strings from a large tab-separated file using Linux commands.

names	john, cnn, mac, tommy, mac, patrick, ngc, discovery, john, cnn, adam, patrick
cities	san jose, santa clara, san franscisco, new york, san jose, santa clara

The above is the file format. I want to retain the tabs and commas after removing the duplicate words:

names	john, cnn, mac, tommy, patrick, ngc, discovery, adam
cities	san jose, santa clara, san franscisco, new york

Any help is appreciated.
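awk 'BEGIN { FS = ", |\t" }
{
    # print the label in the first column, followed by a tab
    printf "%s\t", $1
    delim = ""
    for (i = 2; i <= NF; i++) {
        # print each value only the first time it appears on this line
        if (!($i in seen)) {
            printf "%s%s", delim, $i
            delim = ", "
        }
        seen[$i]
    }
    printf "\n"
    # reset the lookup so duplicates are tracked per line
    delete seen
}' inputfile

If you're not using GNU AWK (gawk) and your awk can't delete a whole array, use split("", seen) instead.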
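The split("", seen) variant mentioned above could look roughly like the following sketch. The wrapper name dedupe_fields is hypothetical (it is not part of the original answer), and the sketch assumes the same layout as the sample: a label, one tab, then values separated by a comma and a space.

```shell
# dedupe_fields is a hypothetical helper name used only for illustration.
# It reads the named files, or standard input when no file is given.
dedupe_fields() {
    awk 'BEGIN { FS = ", |\t" }
    {
        split("", seen)              # portable way to empty the array for each line
        printf "%s\t", $1            # label column, then the tab
        delim = ""
        for (i = 2; i <= NF; i++) {
            if (!($i in seen)) {     # first occurrence on this line
                printf "%s%s", delim, $i
                delim = ", "
                seen[$i] = 1
            }
        }
        printf "\n"
    }' "$@"
}

# Small demonstration on two made-up lines in the same format:
printf 'names\tjohn, cnn, mac, tommy, mac\ncities\tsan jose, santa clara, san jose\n' | dedupe_fields
# names<TAB>john, cnn, mac, tommy
# cities<TAB>san jose, santa clara
```

Because the array is emptied at the start of every record, a word that appears on two different lines is kept on both; only repeats within the same line are dropped, which matches the expected output in the question.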