Multicore Plan
Adding multicore support for Mercurial
1. Introduction
Various Mercurial operations could be faster if they were able to run on multiple cores. These include:
- update/merge
- verify
- status (especially over NFS)
Given that CPU cores are no longer getting significantly faster, running on multiple cores lets Mercurial use a workstation's available processing power more efficiently.
2. Strategy
Unfortunately, there is no standard paradigm that lets us easily and efficiently use multiple cores.
- Python threads: subject to the infamous GIL (global interpreter lock), which lets only one thread execute Python bytecode at a time
- fork(): not available on Windows
- native threads: not well-supported in Python
- multiprocessing module: architecturally difficult, can't inherit significant state like a repo object
So our strategy will be to use a hybrid approach: fork() on Unix and Python threads on Windows. This hybrid will be called a "worker" and will be subject to the constraints of both models:
- read-only shared state
- no inter-worker locking
- very limited IPC
Workers are managed by a generic dispatcher function that takes a work function and a list of jobs. The dispatcher creates a worker pool, farms out jobs to available workers, collects and combines results, and shuts down the pool on completion.
Code that wants to support multicore should implement a single code path (i.e. one work function) and call it through the dispatcher for both single-threaded and multicore use.
