For the Wide Finder project, one way to tackle the task would be to split the large file, fork a sub-process to process each section in parallel, then join all the outputs together. Fork/join is Verilog speak. You might also call it mapping file processors onto the file sections in parallel and then reducing their outputs.
I have the following bash script, which I still need to test on a real machine:
#!/bin/bash -x
##
## Multi-subprocessing of Wide Finder file
##
# Number of sub processes
#subprocs="$1"
subprocs="4"
# Input file
#infile="$2"
infile="o10k.ap"
subprocscript="clv5.sh"
splitfileprefix=_multi_proc_tmp
rm -f ${splitfileprefix}*
# File size in bytes (GNU stat), then the per-chunk target size;
# the 1% padding keeps rounding from spilling into an extra chunk
size=$(stat -c%s "$infile")
splitsize=$(gawk -v x="$size" -v y="$subprocs" 'BEGIN{printf "%i", (x/y)*1.01}')
## Split into roughly $subprocs chunks; --line-bytes caps each chunk's
## size without breaking lines, so no log record straddles two chunks
split --line-bytes=$splitsize "$infile" $splitfileprefix
for f in ${splitfileprefix}*; do
    # Launch one worker per chunk in the background so they run in parallel
    $subprocscript "$f" > "$f.sub" &
done
# Show the background jobs, then block until every worker has finished
jobs
wait
## Join: sum the per-chunk counts keyed by the second field, then list the top ten
gawk '{x[$2]+=$1}END{for (n in x){print x[n], n}}' ${splitfileprefix}*.sub \
| sort -n \
| tail
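The worker script clv5.sh is referenced but never shown above. As a point of reference only, here is a minimal sketch of what such a worker might look like, assuming the standard Wide Finder task of counting article fetches in an Apache access log; the regex and the "count URL" output format are my assumptions, chosen to match what the join step consumes:

#!/bin/bash
## Hypothetical per-chunk worker (a stand-in for clv5.sh, which is not shown).
## Prints "count URL" pairs for one chunk, the shape the join step expects.
gawk '
    # Match the canonical Wide Finder pattern: article fetches under /ongoing/When/
    match($0, /GET \/ongoing\/When\/[0-9][0-9][0-9]x\/[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]\/[^ .]+/) {
        hits[substr($0, RSTART + 4, RLENGTH - 4)]++   # strip the leading "GET "
    }
    END { for (url in hits) print hits[url], url }
' "$1"

Saved alongside the driver and marked executable, the whole thing should run with the input file in the current directory; the final tail prints the ten most-fetched articles.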