Bash (Scripting) - Remove duplicate lines in a file


Let's say you have a file named foo.txt that contains the following.

Line 1
Hello
Line 2
Hello
Line 3

 

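awk can be used to return the lines that have already been "seen" in a file. In the expression seen[$0]++, seen is an associative array keyed by the entire line ($0), and the post-increment returns the count as it was before being incremented. The first time awk parses the line containing "Hello", the count is 0 (false), so nothing is printed. When awk parses the second occurrence of "Hello", the count is 1 (true), so the line is printed because an identical line has already been "seen". Note that a line appearing three times would be printed twice.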

~]$ awk 'seen[$0]++' foo.txt
Hello
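
To see why only the second "Hello" is returned, it can help to print the counter value next to each line. The following is a minimal sketch, assuming the sample data above is recreated in a scratch directory; the function name show_counts and the variable n are made up for this illustration.

```shell
#!/usr/bin/env bash
# Recreate the sample file from the article in a scratch directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'Line 1' 'Hello' 'Line 2' 'Hello' 'Line 3' > "$workdir/foo.txt"

# show_counts is a hypothetical helper name.
# n holds the value that seen[$0]++ evaluates to (the count BEFORE the increment).
show_counts() {
  awk '{ n = seen[$0]++; print n, $0 }' "$1"
}

show_counts "$workdir/foo.txt"
```

Every line is printed with 0 except the second "Hello", which is printed with 1. Since awk treats 0 as false and any other number as true, that is the only line the pattern seen[$0]++ selects.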

 

Similarly, prefixing the expression with an exclamation point (logical NOT) returns only the first occurrence of each line, in other words the lines that have not yet been "seen". In this example, awk does not return the second occurrence of the line containing "Hello" because an identical line has already been "seen".

~]$ awk '!seen[$0]++' foo.txt 
Line 1
Hello
Line 2
Line 3
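
If the one-liner feels cryptic, the same logic can be written out in longhand. This is only a sketch of an equivalent form, not a different technique; the function name dedupe_longhand is hypothetical and the sample file is recreated in a scratch directory.

```shell
#!/usr/bin/env bash
# Recreate the sample file from the article in a scratch directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'Line 1' 'Hello' 'Line 2' 'Hello' 'Line 3' > "$workdir/foo.txt"

# Longhand equivalent of: awk '!seen[$0]++'
# dedupe_longhand is a name made up for this example.
dedupe_longhand() {
  awk '{
    if (!($0 in seen)) {   # this exact line has not appeared before
      print                # so print it
    }
    seen[$0] = 1           # remember the line for later comparisons
  }' "$1"
}

dedupe_longhand "$workdir/foo.txt"
```

Both forms keep the first occurrence of each line and preserve the original order, because awk reads the file from top to bottom and prints as it goes.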

 

Neither of the prior awk commands makes any changes to the original file. They only print to stdout the lines that have or have not been "seen".

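You could use redirection to write the output to a different file. Do not redirect to the same file that awk is reading, because the shell truncates that file before awk gets a chance to read it.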

awk '!seen[$0]++' /tmp/foo.txt > /tmp/bar.txt
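
If the goal is to end up with the deduplicated content under the original file name, one common approach is to write to a temporary file and then rename it over the original. The following is a sketch under that assumption; dedupe_in_place is a hypothetical helper name, and the demo runs against a scratch copy of the sample data rather than /tmp/foo.txt.

```shell
#!/usr/bin/env bash
# dedupe_in_place is a hypothetical helper name, not a standard command.
dedupe_in_place() {
  local file=$1 tmp
  # Create the temporary file next to the target so mv is a simple rename.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  # Only replace the original if awk succeeded.
  if awk '!seen[$0]++' "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Demo on a scratch copy of the sample data.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'Line 1' 'Hello' 'Line 2' 'Hello' 'Line 3' > "$workdir/foo.txt"
dedupe_in_place "$workdir/foo.txt"
cat "$workdir/foo.txt"
```

Be aware that the replaced file takes on the permissions of the temporary file (mktemp creates it readable and writable by the owner only), so you may want to restore the original mode afterwards.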

 

Or, you could use gawk (version 4.1.0 or later), which has the -i inplace option to update the original file.

gawk -i inplace '!seen[$0]++' /tmp/foo.txt
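
Not every system has gawk, or a gawk new enough to load the inplace extension. The following is a rough sketch of a wrapper that probes for it and otherwise falls back to a temporary file. The function name dedupe_file is hypothetical, and the probe (running an empty BEGIN block with -i inplace) is an assumption about how to detect support, not something taken from the gawk documentation.

```shell
#!/usr/bin/env bash
# dedupe_file is a hypothetical wrapper name used only for this example.
dedupe_file() {
  local file=$1 tmp
  # Probe (an assumption): is there a gawk that can load the inplace extension?
  if command -v gawk >/dev/null 2>&1 &&
     gawk -i inplace 'BEGIN { exit 0 }' >/dev/null 2>&1; then
    gawk -i inplace '!seen[$0]++' "$file"
  else
    # Portable fallback: write to a temporary file, then rename it.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if awk '!seen[$0]++' "$file" > "$tmp"; then
      mv "$tmp" "$file"
    else
      rm -f "$tmp"
      return 1
    fi
  fi
}

# Demo on a scratch copy of the sample data.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'Line 1' 'Hello' 'Line 2' 'Hello' 'Line 3' > "$workdir/foo.txt"
dedupe_file "$workdir/foo.txt"
cat "$workdir/foo.txt"
```

Either branch should leave the file with the duplicates removed and the order of the remaining lines unchanged.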

 

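I also had a coworker use the following, which also preserves the order of the lines in the file. It is not quite the same as the prior awk and gawk commands. The NF condition means that only lines containing at least one field are checked, so blank lines are never removed, and the first sed command restores duplicate lines that begin with #, so repeated comment lines are kept as well. Every other duplicate line is tagged with <REMOVE> by awk and then deleted by sed. I wanted to make note of this as something to try if the above commands remove more than you want them to.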

awk 'NF{x[$0]++; print (x[$0]>1?"<REMOVE>"$0:$0); next}1' /tmp/foo.txt | sed "s/^<REMOVE>#/#/" | sed "/^<REMOVE>/d" > /tmp/foo.txt.new && mv /tmp/foo.txt.new /tmp/foo.txt
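
The same behavior (keep blank lines, keep lines that begin with #, remove every other duplicate) can probably be expressed in a single awk command without the <REMOVE> marker. This is a sketch of my reading of the pipeline above, not the coworker's original; the function name dedupe_keep_comments and the sample file conf.txt are made up for the demonstration.

```shell
#!/usr/bin/env bash
# dedupe_keep_comments is a hypothetical helper name.
# A line is printed if any of these is true:
#   !NF           the line has no fields (blank or whitespace only)
#   /^#/          the line begins with # (treated as a comment)
#   !seen[$0]++   this is the first time the exact line has appeared
dedupe_keep_comments() {
  awk '!NF || /^#/ || !seen[$0]++' "$1"
}

# Demo file with a repeated comment, repeated blank lines and a repeated setting.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' '# comment' 'a' '' 'a' '# comment' '' 'b' > "$workdir/conf.txt"

dedupe_keep_comments "$workdir/conf.txt"
```

In the demo only the second "a" is dropped. One difference from the pipeline: a line that literally begins with the text <REMOVE> would be deleted by the pipeline but is left alone here.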


