Perl scripts for removing duplicate content (duplicate lines + duplicate array elements)

admin Perl

2023-12-05

Suppose we have a file of number pairs like this:

    1 2
    1 2
    2 1
    1 3
    1 4
    1 5
    4 1

and we need to get the following result:

    1 3
    1 5
    2 1
    4 1

That is, exact duplicate lines are collapsed, and when both "a b" and "b a" occur, only one of the two is kept. The Perl scripts below do this.

Script 1:

The code is as follows:

    #!/bin/perl
    use strict;
    use warnings;

    my $filename;
    my %hash;
    my @information;
    my $key1;
    my $key2;

    print "please put in the file like this f:\\\\perl\\\\data.txt\n";
    chomp($filename = <STDIN>);

    # First pass: record every distinct "a b" pair.
    open(IN, "$filename") || die("can not open");
    while (<IN>) {
        chomp;
        @information = split /\s+/, $_;
        if (exists $hash{$information[0]}{$information[1]}) {
            next;
        }
        else {
            $hash{$information[0]}{$information[1]} = 'A';
        }
    }
    close IN;

    # Second pass: if the reversed pair "b a" is also present, drop "a b".
    open(IN, "$filename") || die("can not open");
    while (<IN>) {
        @information = split /\s+/, $_;
        if (exists $hash{$information[1]}{$information[0]}) {
            delete $hash{$information[0]}{$information[1]}
        }
        else {
            next;
        }
    }
    close IN;

    # Write the surviving pairs, sorted numerically by both fields.
    open(OUT, ">f:\\A_B_result.txt") || die("can not open");
    foreach $key1 (sort { $a <=> $b } keys %hash) {
        foreach $key2 (sort { $a <=> $b } keys %{ $hash{$key1} }) {
            print OUT "$key1 $key2\n";
        }
    }
    close OUT;

Script 2:

Suppose a file named data is 10 GB in size and many of its lines are duplicates. We need to merge each group of duplicate lines into a single line. One way to do it is:

    cat data | sort | uniq > new_data

This works, but on a file of that size it can take several hours to produce a result. Below is a small Perl tool for the same job. The principle is simple: build a hash whose keys are the line contents and whose values are the number of times each line occurs. The script is as follows:

    #!/usr/bin/perl
    # Author :CaoJiangfeng
    # Date:2011-09-28
    # Version :1.0
    use warnings;
    use strict;

    my %hash;
    my $script = $0;    # Get the script name

    sub usage {
        printf("Usage:\n");
        printf("perl $script <source_file> <dest_file>\n");
    }

    # If fewer than 2 parameters are given, exit the script
    if ($#ARGV + 1 < 2) {
        &usage;
        exit 0;
    }

    my $source_file = $ARGV[0];    # File to remove duplicate rows from
    my $dest_file   = $ARGV[1];    # File written after removing duplicate rows

    open(FILE,   "<$source_file") or die "Cannot open file $!\n";
    open(SORTED, ">$dest_file")   or die "Cannot open file $!\n";

    while (defined(my $line = <FILE>)) {
        chomp($line);
        $hash{$line} += 1;
        # print "$line,$hash{$line}\n";
    }

    foreach my $k (keys %hash) {
        # Print each distinct line and the number of times it occurs to the destination file
        print SORTED "$k,$hash{$k}\n";
    }
    close(FILE);
    close(SORTED);
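Note that the second script above writes "line,count" records in hash order, so its output is a frequency table and not a copy of the input with the repeats dropped. If the goal is the original lines with duplicates removed and first-occurrence order preserved, a minimal sketch along the following lines might do. The sub name `dedupe_lines` and the in-memory demo data are made up for illustration, and, like the script above, it assumes that all distinct lines fit in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original article): copy lines from
# $in to $out, skipping any line that has already been seen.
sub dedupe_lines {
    my ($in, $out) = @_;
    my %seen;
    while (defined(my $line = <$in>)) {
        chomp $line;
        # $seen{$line}++ returns the old count: 0 (false) the first time.
        print {$out} "$line\n" unless $seen{$line}++;
    }
    return scalar keys %seen;    # number of distinct lines
}

# Demo on in-memory filehandles so the sketch runs without any files.
my $input  = "b\na\nb\nc\na\n";
my $output = '';
open(my $in,  '<', \$input)  or die "Cannot open input: $!\n";
open(my $out, '>', \$output) or die "Cannot open output: $!\n";
my $distinct = dedupe_lines($in, $out);
close $in;
close $out;
print $output;    # the distinct lines, in first-seen order
```

To use it on real files, open the source and destination with the three-argument form of `open` and pass those handles in place of the in-memory ones.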

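Going back to the pair problem handled by the first script above: it reads the input twice and deletes "a b" whenever "b a" is also present. The same idea, treating "a b" and "b a" as one pair, can be sketched in a single pass by building an order-independent key. This is only a sketch under assumptions: `unique_pairs` is a hypothetical name, both fields are assumed to be numeric, and it keeps whichever orientation appears first. For the sample data it therefore returns 1 2, 1 3, 1 4, 1 5, whereas the first script writes 2 1 and 4 1 for the reversed pairs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given lines of the form "a b", return one line per
# unordered pair, keeping the orientation that was seen first.
sub unique_pairs {
    my @lines = @_;
    my (%seen, @kept);
    for my $line (@lines) {
        my ($x, $y) = split ' ', $line;
        next unless defined $y;    # skip blank or one-field lines
        # Sort the two fields numerically so "1 2" and "2 1" share a key.
        my $key = join ' ', sort { $a <=> $b } $x, $y;
        push @kept, "$x $y" unless $seen{$key}++;
    }
    return @kept;
}

my @sample = ("1 2", "1 2", "2 1", "1 3", "1 4", "1 5", "4 1");
print "$_\n" for unique_pairs(@sample);
```

Unlike the two-pass version, this also keeps a self-pair such as "1 1", which the first script would delete because the pair is its own reverse.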
Script 3: removing duplicate elements from an array with a Perl script

The code is as follows:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %hash;
    my @array = (1..10, 5, 20, 2, 3, 4, 5, 5);

    # grep keeps only the elements for which the condition is true
    @array = grep { ++$hash{$_} < 2 } @array;
    print join(" ", @array);
    print "\n";
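The grep line works because `++$hash{$_}` returns the count after incrementing: 1 the first time a value is seen (so it is kept) and 2 or more afterwards (so it is dropped). Since `%hash` lives at file level, running the same line over a second array would treat the first array's values as already seen. A small sketch that wraps the idiom in a sub with its own hash is shown below. The names `uniq_in_order` and `duplicated_values` are hypothetical and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the list with repeats removed, keeping the
# first occurrence of each value in its original position.
sub uniq_in_order {
    my %seen;    # fresh hash on every call
    return grep { !$seen{$_}++ } @_;
}

# Hypothetical helper: return each value that occurs more than once,
# listed once, in the order in which its second occurrence appears.
sub duplicated_values {
    my %count;
    # ++$count{$_} equals 2 exactly when a value is seen the second time.
    return grep { ++$count{$_} == 2 } @_;
}

my @array = (1..10, 5, 20, 2, 3, 4, 5, 5);
print join(' ', uniq_in_order(@array)), "\n";        # 1 2 3 4 5 6 7 8 9 10 20
print join(' ', duplicated_values(@array)), "\n";    # 5 2 3 4
```

Recent versions of the core `List::Util` module also export a `uniq` function that does the same job as `uniq_in_order`, so on a new enough Perl that may be the simpler choice.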
