needed to monitor how far it got in reading the files. In my own Python
version of the utility, reading the data was ten times faster than
the subsequent processing, and I wanted to find out whether this proprietary
solution, which was having performance problems, was spending
most of its time reading data.
The proprietary program took a list of 20,000 files to process as its
first argument. I remembered that on Linux the /proc
directory holds information on running processes and, sure enough, the
/proc/<process id>/fd directory shows every file descriptor the
process currently has open, each as a symbolic link to the open file. So by opening the
list of files to process in my editor and searching it for the file name
shown by one of those file descriptors, I could gauge how many
files had been read so far.
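To see those links from a program rather than from `ls -l`, here is a minimal, Linux-only sketch that maps each descriptor number of a process to its link target using `os.listdir` and `os.readlink`. The function name `open_files` is hypothetical, not part of the author's scripts, and reading another user's process generally needs matching permissions or root.

```python
import os


def open_files(pid):
    """Map each open descriptor number of process `pid` to its link target.

    Linux-specific: every entry in /proc/<pid>/fd is a symbolic link named
    after the descriptor number and pointing at the open file (or at a
    pseudo-target such as 'pipe:[1234]' for pipes and sockets).
    """
    fd_dir = "/proc/%d/fd" % pid
    links = {}
    for entry in os.listdir(fd_dir):
        try:
            links[int(entry)] = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return links
```

Calling it with your own `os.getpid()` is an easy way to try it before pointing it at another process.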
I decided to automate the check and wrote a shell script using
cat/fgrep/gawk/... that told me which line of the file list
the program was currently processing.
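The shell pipeline itself isn't shown, so as an assumption here is a minimal Python sketch of its core lookup: find the line of the file list that holds a given path, counted from 1 as `grep -n` does. `line_number_in_list` is a hypothetical name. It compares whole lines, because a plain substring search for /tmp/test/file1 would also match /tmp/test/file10 (`fgrep -x` does the same whole-line matching).

```python
def line_number_in_list(list_path, target):
    """Return the 1-based line number of `target` in the file list at
    `list_path`, or None if it is not listed."""
    with open(list_path) as lst:
        for lineno, line in enumerate(lst, start=1):
            # Compare the whole line, not a substring of it.
            if line.rstrip("\n") == target:
                return lineno
    return None
```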
I have since had time to refine things to use mainly Python, but to
demonstrate it I also need to set up a test environment.
First, create some test files to process:
bash$ mkdir -p /tmp/test
bash$ for ((i=0; i < 100; i++)); do touch /tmp/test/file$i; done
bash$ /bin/ls /tmp/test/file* > /tmp/test/all_files.lst
bash$ head /tmp/test/all_files.lst
/tmp/test/file0
/tmp/test/file1
/tmp/test/file10
/tmp/test/file11
/tmp/test/file12
/tmp/test/file13
/tmp/test/file14
/tmp/test/file15
/tmp/test/file16
/tmp/test/file17
bash$
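If you would rather stay in Python, the same test environment can be built with a sketch like the one below. `make_test_files` and its `directory` and `count` parameters are hypothetical; the directory is a parameter so the sketch need not write to /tmp/test. Sorting the paths as plain strings reproduces the order shown by `head` above (file0, file1, file10, ...), which is what puts /tmp/test/file29 at index 22 in the monitor output further down.

```python
import os


def make_test_files(directory, count=100):
    """Create empty files file0..file<count-1> in `directory`, write their
    sorted paths to all_files.lst there, and return the list's path."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for i in range(count):
        path = os.path.join(directory, "file%d" % i)
        open(path, "a").close()  # like `touch`: create the file if missing
        paths.append(path)
    # Plain string order: file0, file1, file10, file11, ..., file2, file20, ...
    paths.sort()
    list_path = os.path.join(directory, "all_files.lst")
    with open(list_path, "w") as lst:
        lst.write("\n".join(paths) + "\n")
    return list_path
```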
Now let's create a test program to monitor.
This script just holds each file open for reading for 45 seconds
before closing it.
bash$ python -c 'import sys, time
for name in open(sys.argv[1]):
    f = open(name.strip())
    time.sleep(45)
    f.close()
' /tmp/test/all_files.lst &
[2] 7984
bash$
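Starting the test program from Python instead of with `&` makes the process id available directly as `Popen.pid`, rather than reading it off bash's `[2] 7984` job line. This is a sketch under assumptions: `start_workload`, `WORKLOAD` and the `hold_seconds` parameter are hypothetical, and the workload is a Python 3 rewrite of the one-liner above that uses `with` to close each file.

```python
import subprocess
import sys

# Python 3 version of the dummy workload: hold each listed file open
# for `hold` seconds, one file at a time.
WORKLOAD = """
import sys, time
hold = float(sys.argv[2])
for name in open(sys.argv[1]):
    with open(name.strip()):
        time.sleep(hold)
"""


def start_workload(list_path, hold_seconds=45):
    """Start the workload in the background; the returned Popen's .pid
    plays the role of the 7984 printed by bash."""
    return subprocess.Popen(
        [sys.executable, "-c", WORKLOAD, list_path, str(hold_seconds)])
```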
Here is what the fd directory of process 7984 looks like:
bash$ ls -l /proc/7984/fd
total 0
lrwxrwxrwx 1 HP DV8025EA None 0 Apr 21 22:17 0 -> /dev/tty1
lrwxrwxrwx 1 HP DV8025EA None 0 Apr 21 22:17 1 -> /dev/tty1
lrwxrwxrwx 1 HP DV8025EA None 0 Apr 21 22:17 2 -> /dev/tty1
lrwxrwxrwx 1 HP DV8025EA None 0 Apr 21 22:17 3 -> /tmp/test/all_files.lst
lrwxrwxrwx 1 HP DV8025EA None 0 Apr 21 22:17 4 -> /tmp/test/file0
bash$
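In that listing, descriptors 0, 1 and 2 are standard input, output and error, 3 is the file list being iterated over, and 4 is the data file currently held open. Hardcoding fd 4 works for this test program, but a real program may open its data files on some other descriptor number. One hedged alternative, sketched here with the hypothetical function `files_from_list_open_in`, is to scan every descriptor and keep those whose target appears in the file list.

```python
import os


def files_from_list_open_in(pid, name2index):
    """Return sorted (index, path) pairs for every descriptor of `pid` whose
    target is a key of `name2index` (path -> position in the file list)."""
    fd_dir = "/proc/%d/fd" % pid
    hits = []
    for entry in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            continue  # descriptor closed while we were looking
        if target in name2index:
            hits.append((name2index[target], target))
    return sorted(hits)
```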
And here is a Python script that periodically checks which file link number 4
in that fd directory points to, and where that file sits in the list:
bash$ python -c 'import sys, time, os, datetime
name2index = dict((name.strip(), index)
                  for index, name in enumerate(open(sys.argv[1])))
total = len(name2index)
while True:
    path = os.path.realpath("/proc/7984/fd/4").strip()
    print("%d / %d %s %s" % (name2index[path], total, path,
                             datetime.datetime.now().isoformat()))
    time.sleep(30)
' /tmp/test/all_files.lst
22 / 100 /tmp/test/file29 2009-04-21T22:34:07.817750
23 / 100 /tmp/test/file3 2009-04-21T22:34:37.820750
24 / 100 /tmp/test/file30 2009-04-21T22:35:07.825750
24 / 100 /tmp/test/file30 2009-04-21T22:35:37.834750
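One weakness of the one-liner: `os.path.realpath` does not fail when the link is missing, it just returns `/proc/7984/fd/4` unchanged, so a sample that lands while fd 4 is closed between two files, or after the process has exited, ends the monitor with a `KeyError`. Below is a Python 3 sketch, using the hypothetical names `progress_line` and `monitor`, that calls `os.readlink` instead (it raises `OSError` for a missing link) and stops once `/proc/<pid>` disappears. It keeps the same output format; a path that is not in the list is shown with `?` as its index.

```python
import datetime
import os
import time


def progress_line(pid, fd, name2index):
    """Return one 'index / total path timestamp' line for descriptor `fd` of
    process `pid`, or None if that descriptor is not open right now."""
    try:
        path = os.readlink("/proc/%d/fd/%d" % (pid, fd))
    except OSError:
        # Missing link: fd closed between two files, or the process exited.
        return None
    index = name2index.get(path)  # None if fd points outside the list
    position = "?" if index is None else str(index)
    return "%s / %d %s %s" % (position, len(name2index), path,
                              datetime.datetime.now().isoformat())


def monitor(pid, list_path, fd=4, interval=30):
    """Print a progress line every `interval` seconds until `pid` exits."""
    with open(list_path) as lst:
        name2index = dict((name.strip(), index)
                          for index, name in enumerate(lst))
    while os.path.exists("/proc/%d" % pid):
        line = progress_line(pid, fd, name2index)
        if line is not None:
            print(line)
        time.sleep(interval)
    print("process %d has exited" % pid)
```

For the test setup above, the call would be `monitor(7984, "/tmp/test/all_files.lst")`.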
I watched the monitor's output over the next couple of hours and so found
out when the reading of files ended and the processing of the read data began.
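The monitor's samples also allow a rough estimate of when reading will finish. The hypothetical helper `seconds_remaining` below assumes every file takes about the same time (true for the test program, which holds each file for 45 seconds, but only an approximation for real work) and that the index is zero-based, as in the monitor output. It needs Python 3.7 or later for `datetime.fromisoformat`.

```python
from datetime import datetime


def seconds_remaining(first, last, total):
    """Linear estimate of the seconds until all `total` files have been read,
    from two (index, iso_timestamp) monitor samples taken in time order."""
    (i0, t0), (i1, t1) = first, last
    if i1 <= i0:
        raise ValueError("no progress between the two samples")
    elapsed = (datetime.fromisoformat(t1)
               - datetime.fromisoformat(t0)).total_seconds()
    per_file = elapsed / (i1 - i0)
    # Index i1 is zero-based, so total - i1 files (including the one
    # currently open) are still to be read.
    return (total - i1) * per_file
```

Taking the first and last sample lines above, `seconds_remaining((22, "2009-04-21T22:34:07.817750"), (24, "2009-04-21T22:35:37.834750"), 100)` works out to about 3420.6 seconds: 2 files in 90.017 s is roughly 45 s per file, times the 76 files still to go, or just under an hour.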
END.
strace -e open -p pid will tell you the same kind of information.