
Perl (Scripting) - Flatten lists (merge similar lines)

by Jeremy Canfield | Updated: March 22 2020 | Perl (Scripting) articles

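Let's say you want to flatten a list in which several lines share the same key (Jeremy in this example), so that each key ends up on a single line with all of its values. Here, example.txt contains the following lines.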

Jeremy, Engineer
Jeremy, Husband
Natalie, Wife
Winston, Child

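The following script flattens the list by merging all lines that share the same value in the common field (the first field, as configured below).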

#!/usr/bin/perl
use strict;
use warnings;

# --------------------------------------------
# CHANGE THESE VALUES !!!
# --------------------------------------------
# $inputFile       is the full path to the file that contains the lines you want to merge
# $outputFile      is the full path to the file that will contain the merged lines
#                  the output file can be the same as the input file
#                  if $outputFile is empty, it defaults to $inputFile
# $inputDelimiter  is the character that separates the fields in the input file
# $outputDelimiter is the delimiter used in the merged output
#                  if $outputDelimiter is empty, it defaults to $inputDelimiter
# $common          must be an integer: the zero-based position of the common (key)
#                  field, so 0 means the first field
my $inputFile       = "example.txt";
my $outputFile      = "";
my $inputDelimiter  = ",";
my $outputDelimiter = ",";
my $common          = 0;


# ############################################
#
# DO NOT CHANGE ANYTHING BELOW ! ! !
#
# ############################################

# hash, array, and variables that will be used
my %hash;
my @field;
my $count          = 0;
my $current_count  = 0;
my $int            = 0;

# Set $outputFile, $outputDelimiter to contain the value of $inputFile, $inputDelimiter if empty
if ($outputFile eq "")      { $outputFile = $inputFile; }
if ($outputDelimiter eq "") { $outputDelimiter = $inputDelimiter; }

# The purpose of this block is to set $count to the number of fields
# in the line that has the most fields in the file.
#
# $count is needed because the next block reads the file once for
# each field position, so it must know how many positions there are.
#--------------------------------------------------------------
# open file for reading
open(my $in, "<", $inputFile) or die "Cannot open $inputFile: $!";

# loop through each line in the file
while (my $line = <$in>) {

  # split each line into an array; \Q...\E treats the delimiter as
  # literal text, so characters such as | or . also work as delimiters
  @field = split(/\Q$inputDelimiter\E/, $line);

  # count the number of fields in the array
  $current_count = @field;

  # keep the count of the line with the greatest number of fields
  if ($current_count > $count) { $count = $current_count; }

}
close($in);

# field positions start at 0, so subtract 1 to turn the field count
# into the highest field position
$count = $count - 1;


# The purpose of this block is to read $inputFile once for each field position.
# On each pass, the value at the current position ($int) of every line is added
# to the list stored under that line's common field.
#--------------------------------------------------------------

# loop through every field position, from 0 up to the highest position ($count)
for ($int = 0; $int <= $count; $int++) {

  # the common field is the key, not a value, so skip its position
  next if $int == $common;

  # open file for reading
  open(my $fh, '<', $inputFile) or die "Cannot open $inputFile: $!";

  # loop through each line in the file
  while (my $line = <$fh>) {

    # remove the trailing newline, then split the line into an array
    chomp $line;
    @field = split(/\Q$inputDelimiter\E/, $line);

    # skip lines that have no key, or no value at this position
    next unless defined $field[$common] && defined $field[$int] && $field[$int] =~ /\S/;

    # run the mergeLines subroutine
    mergeLines($field[$common], $field[$int]);
  }
  close($fh);
}


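sub mergeLines {

  # get the key (common field) and value passed in by the call above,
  # and trim the whitespace around them (" Engineer" becomes "Engineer")
  my ($key, $value) = @_;
  s/^\s+|\s+$//g for $key, $value;

  # push the value into the key's array only if it is not already there;
  # eq compares whole strings, so "Engineer" is not mistaken for part of "Engineering"
  if ( ! grep { $_ eq $value } @{ $hash{$key} } ) { push @{ $hash{$key} }, $value; }
}


# write each merged line, formatted as "key, value value", to the screen
# and to $outputFile (reading is finished, so $outputFile may be $inputFile)
open(my $out, '>', $outputFile) or die "Cannot open $outputFile: $!";
foreach my $key (sort keys %hash) {
  my $line = "$key$outputDelimiter " . join(" ", @{ $hash{$key} }) . "\n";
  print $line;
  print {$out} $line;
}
close($out);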

The print statement in the foreach loop will produce the following result.

Jeremy, Engineer Husband
Natalie, Wife
Winston, Child
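The script reads the file once to count fields and then once more for every field position. If you don't need the values grouped column by column, a single pass is enough: a hash of arrays keyed on the common field, plus a %seen hash to skip duplicate values. The sketch below is one possible approach, not the author's method, and it makes a few assumptions that are not in the original: the key is always the first field, the sample data is embedded as a string and read through an in-memory filehandle so the example runs on its own, and merge_lines is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# merge_lines is a hypothetical helper name: it merges lines that share the
# same first field (the key) and returns one "key, value value" line per key.
sub merge_lines {
    my ($delimiter, @lines) = @_;
    my %values;    # key => array ref of that key's values, in first-seen order
    my %seen;      # "key\0value" => count, used to skip duplicate values

    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines

        # split on the literal delimiter, then trim whitespace around each field
        my @field = split /\Q$delimiter\E/, $line;
        s/^\s+|\s+$//g for @field;
        my ($key, @rest) = @field;

        for my $value (@rest) {
            # skip empty values and values already recorded for this key
            next if $value eq '' || $seen{"$key\0$value"}++;
            push @{ $values{$key} }, $value;
        }
    }

    # sort the keys and format each output line as "key, value value"
    my @out;
    for my $key (sort keys %values) {
        push @out, "$key$delimiter " . join(' ', @{ $values{$key} });
    }
    return @out;
}

# sample data from the article, read through an in-memory filehandle
my $text = "Jeremy, Engineer\nJeremy, Husband\nNatalie, Wife\nWinston, Child\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my @merged = merge_lines(',', <$fh>);
close($fh);

print "$_\n" for @merged;
```

Because each line is read only once, values come out in the order of the lines they appear on. The script above collects every value at position 1 first, then every value at position 2, and so on, so files with three or more fields per line can end up with a different value order.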
