Perl: comparing elements of arrays and grouping them
I have a question. I have a list of data:

    1 l dieltqspe h evqlqesdaelvkpgasvkisckasgytftdhe
    2 l divltqsprvt h evqlqqsgaelvkpgasikdty
    3 a alqltqspsslsas b ritlkesgpplvkptcs c eldkwan
    4 a alqltqspsslsas b ritlkesgpplvkptcs c eldkwag
    5 a alqltqspsslsas b ritlkesgpplvkptcs c leldkwasl
    6 l diqmtqipsslsaslsic h evqlqqsgvevkmsckasgytfts
    7 l syeltqppsvsvspgsit h qvqlvqsakgsgysfs p ynkrkafyttkniig
    8 l syeltqppsvsvspgrit h evqlvqsgaasgysfs p nntrkafyatgdiig
    9 a mpimgssvavlail b divmtqsptvti c evqlqqsgrgp
    10 a mpimgssvvlail b divmtqsptvti c evqlqqsgrgp
    11 l dvvmtqtplq h evkldesvtvtsstwpsqsitcnvahpasstkvdkkie
    12 a divmtqspdaqyystpysfgqgtkleikr

I want to compare the 3rd and 5th elements of each row, and group the rows that have the same 3rd and 5th elements. For example, for the data above, the results would be:

    3:
      3 a alqltqspsslsas b ritlkesgpplvkptcs c eldkwan
      4 a alqltqspsslsas b ritlkesgpplvkptcs c eldkwag
      5 a alqltqspsslsas b ritlkesgpplvkptcs c leldkwasl
    9:
      9 a mpimgssvavlail b divmtqsptvti c evqlqqsgrgp
      10 a mpimgssvvlail b divmtqsptvti c evqlqqsgrgp

FYI, in the actual data the 3rd, 5th, and 7th elements are very long. I have cut them short so that you can see each row as a whole.

This is what I have done. I know it is clumsy, but I am a beginner and I am doing my best. The problem is that it only shows the first set of the 'same' group. Could you show me where I went wrong, and/or other, prettier methods to solve this, please?

    use Data::Dumper;

    my $file = <>;
    open(IN, $file) || die "no $file: $!\n";
    my @arr;
    while (my $line = <IN>) {
        push @arr, [split(/\s+/, $line)];
    }
    close IN;

    my (@temp1, @temp2, %hash1);
    for (my $i = 0; $i <= $#arr; $i++) {
        push @temp1, [$arr[$i][2], $arr[$i][4]];
        for (my $j = $i + 1; $j <= $#arr; $j++) {
            push @temp2, [$arr[$j][2], $arr[$j][4]];
            if (($temp1[$i][0] eq $temp2[$j][0]) && ($temp1[$i][1] eq $temp2[$j][1])) {
                push @{$hash1{$arr[$i][0]}}, $arr[$i], $arr[$j];
            }
        }
    }
    print Dumper \%hash1;

You appear to have overcomplicated this a bit more than it needs to be, but that's common for beginners. Think more about how you would do this manually:
- Look at each line.
- See whether the third and fifth fields are the same as in the previous line.
- If so, print them.
All the extra looping and so on is unnecessary:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    # Start with empty strings so the first comparison is well defined
    my ($previous_row, $third, $fifth) = ('') x 3;

    while (<DATA>) {
      my @fields = split;
      if ($fields[2] eq $third && $fields[4] eq $fifth) {
        # Same key as the previous line: print the group's first row once,
        # then the current row
        print $previous_row if $previous_row;
        print "\t$_";
        $previous_row = '';
      } else {
        # New key: remember this row in case the next one matches it
        $previous_row = $fields[0] . "\t" . $_;
        $third = $fields[2];
        $fifth = $fields[4];
      }
    }

    __DATA__
    1 l dieltqspe h evqlqesdaelvkpgasvkisckasgytftdhe
    2 l divltqsprvt h evqlqqsgaelvkpgasikdty
    3 a alqltqspsslsas b ritlkesgpplvkptcs c eldkwan
    4 a alqltqspsslsas b ritlkesgpplvkptcs c eldkwag
    5 a alqltqspsslsas b ritlkesgpplvkptcs c leldkwasl
    6 l diqmtqipsslsaslsic h evqlqqsgvevkmsckasgytfts
    7 l syeltqppsvsvspgsit h qvqlvqsakgsgysfs p ynkrkafyttkniig
    8 l syeltqppsvsvspgrit h evqlvqsgaasgysfs p nntrkafyatgdiig
    9 a mpimgssvavlail b divmtqsptvti c evqlqqsgrgp
    10 a mpimgssvavlail b divmtqsptvti c evqlqqsgrgp
    11 l dvvmtqtplq h evkldesvtvtsstwpsqsitcnvahpasstkvdkkie
    12 a divmtqspdaqyystpysfgqgtkleikr

(Note that I changed line 10 so that its third field matches line 9, in order to get the same groups in the output as specified.)
Edit: One line of code was duplicated due to a copy/paste error.
Edit 2: In response to comments, here's a second version which doesn't assume that the lines which should be grouped are contiguous:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    # Store each line as [ whole line, field 1, field 2, ... ], so the third
    # field is at index 3 and the fifth field is at index 5
    my @lines;
    while (<DATA>) {
      push @lines, [ $_, split ];
    }

    # Sort @lines based on the third and fifth fields (alphabetically), then on
    # the first field/line number (numerically) when the third and fifth fields match
    @lines = sort {
      $a->[3] cmp $b->[3] || $a->[5] cmp $b->[5] || $a->[1] <=> $b->[1]
    } @lines;

    my ($previous_row, $third, $fifth) = ('') x 3;

    for (@lines) {
      if ($_->[3] eq $third && $_->[5] eq $fifth) {
        print $previous_row if $previous_row;
        print "\t$_->[0]";
        $previous_row = '';
      } else {
        $previous_row = $_->[1] . "\t" . $_->[0];
        $third = $_->[3];
        $fifth = $_->[5];
      }
    }

    __DATA__
    1 l dieltqspe h evqlqesdaelvkpgasvkisckasgytftdhe
    3 a alqltqspsslsas b ritlkesgpplvkptcs c eldkwan
    2 l divltqsprvt h evqlqqsgaelvkpgasikdty
    5 a alqltqspsslsas b ritlkesgpplvkptcs c leldkwasl
    7 l syeltqppsvsvspgsit h qvqlvqsakgsgysfs p ynkrkafyttkniig
    6 l diqmtqipsslsaslsic h evqlqqsgvevkmsckasgytfts
    9 a mpimgssvavlail b divmtqsptvti c evqlqqsgrgp
    8 l syeltqppsvsvspgrit h evqlvqsgaasgysfs p nntrkafyatgdiig
    11 l dvvmtqtplq h evkldesvtvtsstwpsqsitcnvahpasstkvdkkie
    10 a mpimgssvavlail b divmtqsptvti c evqlqqsgrgp
    12 a divmtqspdaqyystpysfgqgtkleikr
    4 a alqltqspsslsas b ritlkesgpplvkptcs c eldkwag

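Another common way to group records in Perl is to key a hash on the fields being compared, which avoids sorting and also copes with rows that are not contiguous. The following is only a sketch and is not part of the answer above: `group_rows` is a hypothetical helper name, the sample rows are shortened placeholders rather than the real data, and it assumes that rows with fewer than five fields can simply be skipped.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (an illustration, not from the original answer):
# group rows by their third and fifth whitespace-separated fields.
sub group_rows {
    my @rows = @_;
    my (%group, @order);
    for my $row (@rows) {
        my @fields = split ' ', $row;
        next if @fields < 5;                  # assumption: no fifth field, no group
        my $key = join "\0", @fields[2, 4];   # composite key: third + fifth field
        push @order, $key unless exists $group{$key};   # remember first-seen order
        push @{ $group{$key} }, $row;
    }
    # Return only the groups that contain at least two rows
    return map { $group{$_} } grep { @{ $group{$_} } > 1 } @order;
}

# Placeholder rows (shortened, made-up sequences), deliberately out of order
my @rows = (
    '1 l dielt h evqlq',
    '3 a alqlt b ritlk c eldkwan',
    '2 l divlt h evqlqq',
    '9 a mpimg b divmt c evqlq',
    '4 a alqlt b ritlk c eldkwag',
    '10 a mpimg b divmt c evqlq',
    '12 a divmtq',
);

for my $group (group_rows(@rows)) {
    my ($id) = split ' ', $group->[0];   # label the group by its first row's number
    print "$id:\n";
    print "  $_\n" for @$group;
}
```

With these placeholder rows, the script prints a group labelled `3:` containing rows 3 and 4, followed by a group labelled `9:` containing rows 9 and 10. Groups come out in the order in which their first member appears in the input, and the `"\0"` separator only keeps the two fields from running together inside the key.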