Perl (Scripting) - Flatten lists (merge similar lines)

by Jeremy Canfield | Updated: March 22 2020
Let's say you want to flatten a list in which several lines share the same key (Jeremy in this example), so that each key ends up on a single line.
Jeremy, Engineer
Jeremy, Husband
Natalie, Wife
Winston, Child
The following script will flatten the list.
#!/usr/bin/perl
use strict;
use warnings;
# --------------------------------------------
# CHANGE THESE VALUES !!!
# --------------------------------------------
# $inputFile is the full path to the file that contains the lines you want to merge
# $outputFile is the full path to the file intended to hold the merged lines
# (note: the script as shown prints the merged lines to standard output and does not write to $outputFile)
# $outputFile can be the same as $inputFile
# $outputFile defaults to $inputFile if left empty
# $common must be an integer: the zero-based index of the common (key) field in your file
my $inputFile = "example.txt";
my $outputFile = "";
my $inputDelimiter = ",";
my $outputDelimiter = ",";
my $common = 0;
# ############################################
#
# DO NOT CHANGE ANYTHING BELOW ! ! !
#
# ############################################
# hash, array, and variables that will be used
my %hash;
my @field;
my $count = 0;
my $current_count = 0;
my $int = 0;
# If $outputFile or $outputDelimiter is empty, fall back to $inputFile or $inputDelimiter respectively
if ($outputFile eq "") { $outputFile = $inputFile; }
if ($outputDelimiter eq "") { $outputDelimiter = $inputDelimiter; }
# The purpose of this block is to update the $count variable
# to contain the number of fields from the lines with the highest
# number of fields in the file.
#
# The reason count is needed is so that we loop through the file
# for each field of each line.
#--------------------------------------------------------------
# open file for reading
open(FH, "<", $inputFile) or die "Cannot open $inputFile: $!";
# loop through each line in the file
while(<FH>) {
# split each line into an array, treating the delimiter as literal text rather than a regex
@field = split(/\Q$inputDelimiter\E/, $_);
# count the number of fields in the array
$current_count = @field;
# set the count to be the line with the greatest number of fields
if ($current_count >= $count) { $count = $current_count; }
}
close(FH);
# subtract 1 from count because array indexes in the following section begin at 0 instead of 1
$count = $count - 1;
# The purpose of this block is to read the input file once for each field position.
# The outer while loop selects the field position; the inner while loop reads every line in the file.
#--------------------------------------------------------------
# execute this while loop until $int has the same value as $count
while ($int < $count) {
# increment int by 1
$int = $int + 1;
# open file for reading
open(FH, '<', $inputFile) or die "Cannot open $inputFile: $!";
# loop through each line in the file
while(<FH>) {
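# remove the trailing newline before splitting
chomp;
# split each line into an array, treating the delimiter as literal text rather than a regex
@field = split(/\Q$inputDelimiter\E/, $_);
# skip lines that have no field at this position
next unless defined $field[$int];
# run the mergeLines subroutine
mergeLines($field[$common], $field[$int]);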
}
close(FH);
}
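sub mergeLines {
# get the key and value passed to the mergeLines subroutine
my ($key, $value) = @_;
# skip lines that have no value at this field position
return unless defined $value;
# remove leading and trailing whitespace (including newlines) from the key and value
s/^\s+|\s+$//g for ($key, $value);
return if $value eq "";
# exact string comparison prevents identical data from being pushed into the array
# if the value is not already in the array, the value is pushed into the key array
if ( ! grep { $_ eq $value } @{ $hash{$key} || [] } ) { push @{$hash{$key}}, $value; }
}
foreach my $key (sort keys %hash) {
# print the key, the output delimiter, and then the values separated by spaces
print "$key$outputDelimiter @{$hash{$key}}\n";
}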
The print statement in the foreach loop will produce the following result.
Jeremy, Engineer Husband
Natalie, Wife
Winston, Child
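The same merge can also be sketched in a single pass over the input, without re-reading the file once per field. The example below is only an illustration, not the script above: flatten_lines is a hypothetical helper name, it works on a list of lines rather than a file, and it assumes the key is always the first field. Unlike the script above, it joins every field with the delimiter and keeps keys in first-seen order instead of sorting them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# flatten_lines is a hypothetical helper (not part of the article's script).
# It takes a delimiter and a list of lines, and returns one merged line per key.
# Assumption: the key is the first field of each line.
sub flatten_lines {
    my ($delimiter, @lines) = @_;
    my (%values, %seen, @order);

    for my $line (@lines) {
        # work on a copy so the caller's data is not modified
        chomp(my $copy = $line);

        # \Q...\E treats the delimiter as literal text, not as a regex
        my @fields = split /\Q$delimiter\E/, $copy;

        # trim leading and trailing whitespace from every field
        s/^\s+|\s+$//g for @fields;

        my ($key, @rest) = @fields;
        next unless defined $key && length $key;

        # remember the order in which keys first appear
        push @order, $key unless exists $values{$key};
        $values{$key} ||= [];

        for my $value (@rest) {
            next unless length $value;
            # exact-match de-duplication using a per-key "seen" hash
            push @{ $values{$key} }, $value unless $seen{$key}{$value}++;
        }
    }

    # build one output line per key: key, then its values
    return map { join "$delimiter ", $_, @{ $values{$_} } } @order;
}

print "$_\n" for flatten_lines(",",
    "Jeremy, Engineer", "Jeremy, Husband", "Natalie, Wife", "Winston, Child");
```

With the sample list from this article, Jeremy's line comes out as "Jeremy, Engineer, Husband", because this sketch places the delimiter between every value.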