Issue2085

Title: 'hg fetch' makes four passes over working copy
Priority: bug
Status: done-cbb
Superseder:
Nosy List: djc, jglick, mpm, tonfa
Assigned To:
Topics: performance

Created on 2010-03-10.15:54:56 by jglick, last changed 2012-02-01.00:14:19 by mpm.

Messages
msg18855 Author: mpm Date: 2012-02-01.00:14:19
fetch -> no one cares
msg12007 Author: djc Date: 2010-03-10.15:59:34
fetch hasn't been touched in a long time... There's probably some
low-hanging fruit there.
msg12006 Author: jglick Date: 2010-03-10.15:56:13
This was using Hg 1.5, by the way, on Ubuntu with Python 2.6.4, ext3
(rw,noatime,nodiratime,relatime,data=writeback).
msg12005 Author: jglick Date: 2010-03-10.15:54:56
I do routine edits and fetches on a large repository
(http://hg.netbeans.org/main-silver/ or related clones). Until I am more
confident in the stability of the INotify extension, doing a fetch naturally
requires walking the working copy to make sure there are no uncommitted
modifications. But this can take a very long time; even with a warm disk
cache, one fetch can take a minute or more, and of course it is much worse
with a cold cache.
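
The working-copy check described above can be pictured as a single walk that stats each tracked file once and compares the result with previously recorded values. The following is a minimal sketch under stated assumptions, not Mercurial's actual dirstate code: `find_modified` and the `recorded` mapping (relative path to a `(size, mtime)` tuple) are hypothetical names used only for illustration.

```python
import os


def find_modified(root, recorded):
    """Return tracked paths whose size or mtime differs from `recorded`.

    `recorded` maps a path relative to `root` to a (size, mtime) tuple.
    It is a hypothetical stand-in for whatever state the tool stores.
    """
    modified = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the repository metadata directory.
        if ".hg" in dirnames:
            dirnames.remove(".hg")
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if rel not in recorded:
                continue  # untracked file, ignored in this sketch
            st = os.lstat(full)  # exactly one lstat per tracked file
            if (st.st_size, int(st.st_mtime)) != recorded[rel]:
                modified.append(rel)
    return sorted(modified)
```

In this sketch the cost is one `lstat` per tracked file per walk, so repeating the walk N times multiplies the stat count by N.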

I ran strace on a fetch in which I had a local changeset touching one file
and pulled down a modest number of new changesets:

pulling from http://hg.netbeans.org/core-main/
searching for changes
adding changesets
adding manifests
adding file changes
added 36 changesets with 146 changes to 144 files (+1 heads)
updating to 163149:87fb51b56b48
145 files updated, 0 files merged, 27 files removed, 0 files unresolved
merging with 163113:d43a0ef297af
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
new changeset 163150:ef713e5b161f merges remote changes with local

There were roughly 850k system calls. I picked one file at random which has not
been touched in a few years to see the overhead; it was stat'd four times!

lstat64("..../visualweb.libs.batik/library/src/org/apache/batik/css/engine/CSSEngineListener.java",
{st_mode=S_IFREG|0644, st_size=2982, ...}) = 0
fstatat64(6, "CSSEngineListener.java", {st_mode=S_IFREG|0644, st_size=2982,
...}, AT_SYMLINK_NOFOLLOW) = 0
fstatat64(6, "CSSEngineListener.java", {st_mode=S_IFREG|0644, st_size=2982,
...}, AT_SYMLINK_NOFOLLOW) = 0
lstat64("..../visualweb.libs.batik/library/src/org/apache/batik/css/engine/CSSEngineListener.java",
{st_mode=S_IFREG|0644, st_size=2982, ...}) = 0

Surely the fetch operation should not need to check the stat on this file
more than once to know it has not been modified locally?
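
One way to check whether the repeated stats affect every file, not just the one picked at random, is to tally stat-family calls per file from the saved strace output. This is a rough sketch, assuming the trace lines look like the excerpt above; `count_stats` is a hypothetical helper. It keys on the basename because `fstatat64` reports names relative to a directory descriptor, so files with the same name in different directories are counted together.

```python
import re
from collections import Counter

# Matches the stat-family calls seen in the trace excerpt and captures the
# first quoted string argument (the path).
STAT_CALL = re.compile(r'^(?:lstat64|stat64|fstatat64)\([^"]*?"([^"]*)"')


def count_stats(lines):
    """Count stat-family calls per file basename in strace output lines."""
    counts = Counter()
    for line in lines:
        m = STAT_CALL.match(line.strip())
        if m:
            basename = m.group(1).rsplit("/", 1)[-1]
            counts[basename] += 1
    return counts
```

Usage would be along the lines of `count_stats(open("trace.txt"))`, where `trace.txt` is an assumed file name holding the strace output with one call per line, followed by `most_common()` on the result to list the most frequently stat'd names.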

In general, the number of stats in the working copy seems to be closely
correlated with how slow Hg operations will feel on a large repository. 'hg
commit <some-small-subdir>' is usually fast enough despite a very large
manifest and changelog.

History
Date                 User    Action  Args
2012-02-01 00:14:19  mpm     set     status: chatting -> done-cbb; nosy: + mpm; messages: + msg18855
2010-03-12 22:58:40  tonfa   set     nosy: + tonfa
2010-03-10 15:59:34  djc     set     nosy: + djc; messages: + msg12007
2010-03-10 15:56:13  jglick  set     status: unread -> chatting; messages: + msg12006
2010-03-10 15:54:56  jglick  create