Issue2092

Title: Entire working copy traversed during pull -u
Priority: bug    Status: done-cbb
Superseder:      Nosy List: abuehl, ede, jglick, mpm, tonfa
Assigned To:     Topics: 1.5, merge, performance

Created on 2010-03-12.17:24:52 by jglick, last changed 2012-02-01.00:14:53 by mpm.

Messages
msg18856 (view) Author: mpm Date: 2012-02-01.00:14:53
No one knows how to fix this.
msg12040 (view) Author: jglick Date: 2010-03-12.23:20:02
Perhaps there could be a quick mode which applies when there are no renames
in the incoming csets, falling back to the current behavior when there are.
msg12039 (view) Author: tonfa Date: 2010-03-12.22:58:02
I managed to make _checkunknown not stat unknown files, but it changes the 
behaviour of hg.

1) merges with directory renames won't move unknown files anymore
2) the "%% merge of b expected" test of test-merge1 won't merge b with the 
unknown version of b anymore.
msg12038 (view) Author: ede Date: 2010-03-12.22:36:59
I've run into this too. Via strace, the following method calls in merge.update 
independently cause the files in the working directory to be stat'd.

_checkunknown
_forgetremoved
manifestmerge

I'm not terribly familiar with the code, but it looks like it will need some 
special casing to avoid the excessive stat calls for this situation.

pull -u is currently much faster than fetch, probably because fetch stats the
working directory multiple times (issue2085).
msg12034 (view) Author: jglick Date: 2010-03-12.18:14:15
I'm not sure if it is a regression. I thought I remembered pull -u being
much faster than fetch in the past (and thus useful when you have no
outgoing changesets), but I never bothered to trace it until now.
msg12032 (view) Author: tonfa Date: 2010-03-12.17:38:46
It's not a regression, right? We've always been inefficient on update, where
we check for unknown files even when we don't need to.

We could probably be a lot lazier in that regard.
msg12031 (view) Author: jglick Date: 2010-03-12.17:24:52
On a clone of a very large repository, with a fairly warm disk cache, I run
(Hg 1.5, Python 2.6.4, Ubuntu):

$ hg pull -u
pulling from http://....
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
1 files updated, 0 files merged, 0 files removed, 0 files unresolved

The pull part completes fairly quickly, but it takes a minute or so to
perform the update. This despite the fact that the new changeset involves
just a plain edit to a single file. Since pull -u is supposed to carry over
any local modification (in this case I had none), and I am using a
straightforward ext3 filesystem with no case-folding considerations, in
principle all Hg really needed to do here was:

1. Verify that this one file was not modified.

2. Update it to the new version as given in the new manifest.
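
The two steps above can be sketched roughly as follows. This is only an illustration of the idea, not Mercurial's merge code: the `recorded` mapping is a hypothetical stand-in for the dirstate, `new_contents` is a hypothetical stand-in for data taken from the new manifest, and comparing size and mtime is a simplification of a real modification check. The point is that the number of filesystem calls grows with the number of changed files, not with the size of the working copy.

```python
import os


def update_changed_files(root, changed, recorded, new_contents):
    """Touch only the files named in `changed`; never walk the working copy.

    root         -- working copy directory
    changed      -- relative paths modified by the incoming changeset
    recorded     -- hypothetical dirstate stand-in: path -> (size, mtime_ns)
    new_contents -- hypothetical manifest stand-in: path -> bytes
    """
    # Step 1: verify that each changed file has no local modification.
    for path in changed:
        st = os.lstat(os.path.join(root, path))
        if (st.st_size, st.st_mtime_ns) != recorded[path]:
            raise RuntimeError("locally modified, needs a real merge: %s" % path)

    # Step 2: write the new version and refresh the recorded state.
    for path in changed:
        full = os.path.join(root, path)
        with open(full, "wb") as fp:
            fp.write(new_contents[path])
        st = os.lstat(full)
        recorded[path] = (st.st_size, st.st_mtime_ns)
```

A sketch like this deliberately ignores renames, unknown files and case-folding, which is where a full walk may still be required.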

Instead, strace reveals it doing a full walk of the working copy, which is
of course orders of magnitude slower. For example, references to an
arbitrarily picked directory with one file in it, completely unrelated to
the new changeset, include (note also the useless double close()):

fstatat64(5, "TemplateCompletionTestCase", {st_mode=S_IFDIR|0755,
st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
.....later.....
open("...full path.../TemplateCompletionTestCase", O_RDONLY|O_LARGEFILE) = 5
fstat64(5, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl64(5, F_GETFL)                     = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl64(5, F_SETFD, FD_CLOEXEC)         = 0
getdents64(5, /* 3 entries */, 32768)   = 80
fstatat64(5, "template.cc", {st_mode=S_IFREG|0644, st_size=649, ...},
AT_SYMLINK_NOFOLLOW) = 0
getdents64(5, /* 0 entries */, 32768)   = 0
close(5)                                = 0
close(5)                                = -1 EBADF (Bad file descriptor)

Ideally the Hg test suite would permit a limited number of system calls
related to a given repository or working copy path during a given command.
Using a native instrumentation tool like strace is the strictest way to
enforce this, but satisfactory regression tests might be written using
decorators around Python functions that trigger system calls so long as it
is feasible to enumerate all such functions in use in Hg sources. (For
comparison, in Java it is possible to install a SecurityManager which
records every attempted java.io.File access.)
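
A minimal sketch of the decorator idea, assuming it is enough to count calls to a handful of `os` functions. The `SyscallBudget` class, its default list of function names and the `walk_everything` helper are all hypothetical and not part of Hg or its test suite. Code that reaches the filesystem through other routes (C extensions, `os.scandir`, `open`) would not be counted unless those entry points were wrapped too.

```python
import functools
import os


class SyscallBudget:
    """Count calls to selected os functions; fail once a limit is exceeded."""

    def __init__(self, limit, names=("lstat", "stat", "listdir")):
        self.limit = limit
        self.names = names
        self.count = 0
        self._originals = {}

    def _wrap(self, func):
        @functools.wraps(func)
        def counted(*args, **kwargs):
            self.count += 1
            if self.count > self.limit:
                raise AssertionError(
                    "filesystem call budget exceeded: %d > %d"
                    % (self.count, self.limit))
            return func(*args, **kwargs)
        return counted

    def __enter__(self):
        # Replace each named os function with a counting wrapper.
        for name in self.names:
            self._originals[name] = getattr(os, name)
            setattr(os, name, self._wrap(self._originals[name]))
        return self

    def __exit__(self, *exc_info):
        # Always restore the real functions, even if the budget was blown.
        for name, func in self._originals.items():
            setattr(os, name, func)
        self._originals.clear()
        return False


def walk_everything(root):
    """Hypothetical stand-in for code that stats every entry in a directory."""
    for name in os.listdir(root):
        os.lstat(os.path.join(root, name))
```

A regression test could then run the operation under test inside `with SyscallBudget(limit=N):` and let the `AssertionError` fail the test when a full walk sneaks back in.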
History
Date                 User    Action  Args
2012-02-01 00:14:53  mpm     set     status: chatting -> done-cbb
                                     nosy: mpm, tonfa, jglick, abuehl, ede
                                     messages: + msg18856
2010-03-12 23:20:02  jglick  set     nosy: mpm, tonfa, jglick, abuehl, ede
                                     messages: + msg12040
2010-03-12 22:58:19  tonfa   set     nosy: + mpm
2010-03-12 22:58:13  tonfa   set     topic: + merge
2010-03-12 22:58:02  tonfa   set     messages: + msg12039
2010-03-12 22:36:59  ede     set     messages: + msg12038
2010-03-12 18:14:15  jglick  set     messages: + msg12034
2010-03-12 17:59:48  ede     set     nosy: + ede
2010-03-12 17:54:07  abuehl  set     nosy: + abuehl
2010-03-12 17:38:46  tonfa   set     status: unread -> chatting
                                     nosy: + tonfa
                                     messages: + msg12032
2010-03-12 17:24:52  jglick  create