How to merge similar lines in Perl


Merging lines usually means combining lines that share an identical value in one field. For example, suppose you want to combine the lines in example.txt that have an identical name (Jeremy in this example).

Jeremy, Engineer
Jeremy, Husband
Natalie, Wife
Winston, Child

 

The following script will merge lines on a common field.

#!/usr/bin/perl
use strict;
use warnings;

# --------------------------------------------
# CHANGE THESE VALUES !!!
# --------------------------------------------
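# $inputFile is the full path to the file that contains the lines you want to merge
# $outputFile is the full path to the file intended to hold the merged lines
#             output file can be the same as input file
#             output file will default to be input file if output file is empty
#             NOTE: as written, the script prints the merged lines to standard
#             output and does not write to $outputFile
# $inputDelimiter is used as a regular expression when splitting each line,
#             so escape special characters such as | or .
# $outputDelimiter is the delimiter written with the merged fields
#             output delimiter will default to input delimiter if empty
# $common must be an integer and represents the zero-based index of the common field in your file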
my $inputFile       = "example.txt";
my $outputFile      = "";
my $inputDelimiter  = ",";
my $outputDelimiter = ",";
my $common          = 0;


# ############################################
#
# DO NOT CHANGE ANYTHING BELOW ! ! !
#
# ############################################

# hash, array, and variables that will be used
my %hash;
my @field;
my $count          = 0;
my $current_count  = 0;
my $int            = 0;

# Set $outputFile, $outputDelimiter to contain the value of $inputFile, $inputDelimiter if empty
if ($outputFile eq "")      { $outputFile = $inputFile; }
if ($outputDelimiter eq "") { $outputDelimiter = $inputDelimiter; }

# The purpose of this block is to update the $count variable
# to contain the number of fields from the lines with the highest
# number of fields in the file.
#
# The reason count is needed is so that we loop through the file
# for each field of each line. 
#--------------------------------------------------------------
# open file for reading
open(FH, "<", $inputFile) or die "Cannot open $inputFile: $!";

# loop through each line in the file
while(<FH>) {

  # split each line into an array 
  @field = split(/$inputDelimiter/, $_);

  # count the number of fields in the array
  $current_count = @field;

  # set the count to be the line with the greatest number of fields
  if ($current_count >= $count) { $count = $current_count; }

}
close(FH);

# subtract 1 from count because the following section begins at 0 instead of 1
$count = $count - 1;


# The purpose of this block is to loop through each line in the input file.
# The outer loop is used so that we loop through the entire file one field at a time.
#--------------------------------------------------------------

# execute this loop once for every field index from 0 to $count
for ($int = 0; $int <= $count; $int++) {

  # the common field is the key, so it is not merged as a value
  next if $int == $common;

  # open file for reading
  open(FH, '<', $inputFile) or die "Cannot open $inputFile: $!";

  # loop through each line in the file
  while(<FH>) {

    # remove the trailing newline before splitting
    chomp;

    # split each line into an array
    @field = split(/$inputDelimiter/, $_);

    # skip lines that do not have the common field or the current field
    next unless defined $field[$common] && defined $field[$int];

    # run the mergeLines subroutine
    mergeLines($field[$common], $field[$int]);
  }
  close(FH);
}


sub mergeLines {

  # get the key and value passed to the mergeLines subroutine that was called above
  my ($key, $value) = @_;

  # the if statement prevents identical data from being pushed into the array
  # if the value is not already in the array (exact match), it is pushed into the key array
  if ( ! grep { $_ eq $value } @{$hash{$key}} ) { push @{$hash{$key}}, $value; }
}


# print each key followed by its merged values, joined with the output delimiter
foreach my $key (sort keys %hash) {
  print join($outputDelimiter, $key, @{$hash{$key}}), "\n";
}

 

The print statement in the foreach loop will produce the following result for the example file.

Jeremy, Engineer, Husband
Natalie, Wife
Winston, Child
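
The script reads the input file once per field. As a rough alternative, the same kind of merge can be sketched in a single pass by collecting the values in a hash of arrays. This is only a minimal sketch under a few assumptions: merge_lines is a hypothetical helper name that does not appear in the script above, the delimiter is treated as a literal string rather than a regular expression, and values are kept in the order they are first seen, line by line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# merge_lines is a hypothetical helper, not part of the script above.
# It takes a reference to an array of lines, a delimiter string, and the
# zero-based index of the common field, and returns the merged lines.
sub merge_lines {
  my ($lines, $delimiter, $common) = @_;

  my %values;  # key => array of unique values, in first-seen order
  my %seen;    # key => { value => count }, used for exact-match de-duplication

  foreach my $line (@$lines) {

    # work on a copy so the caller's data is not modified
    my $copy = $line;
    chomp $copy;
    next if $copy eq "";

    # \Q...\E makes split treat the delimiter as a literal string
    my @field = split(/\Q$delimiter\E/, $copy);
    next unless defined $field[$common];

    # take the key out of the list; what remains are the values
    my ($key) = splice(@field, $common, 1);
    $values{$key} ||= [];

    # keep each value only the first time it is seen for this key
    foreach my $value (@field) {
      push @{$values{$key}}, $value unless $seen{$key}{$value}++;
    }
  }

  # build one output line per key, sorted by key
  return map { join($delimiter, $_, @{$values{$_}}) } sort keys %values;
}

# the example data from this article
my @example = ("Jeremy, Engineer\n", "Jeremy, Husband\n", "Natalie, Wife\n", "Winston, Child\n");
print "$_\n" for merge_lines(\@example, ",", 0);
```

Run against the example data, this sketch prints "Jeremy, Engineer, Husband", "Natalie, Wife" and "Winston, Child", one per line. To merge a real file, the lines could be read into an array first and passed in by reference.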

 


