DOCtor-RST Speedup
This is another part in the performance series.
After I published my last performance article about Rector, Oskar Stark, one of my Twitter followers, got in touch with me:
@markusstaab we run OskarStark/doctor-rst on all PRs in symfony/symfony-docs 😃 maybe you will check this package for performance too 😍
He is a member of the Symfony core team and works on symfony-docs.
DOCtor-RST is a linter used in the symfony-docs repo to check *.rst files. Like other static analysis tools, it scans the sources at hand and provides feedback about common errors and best practices.
Disclaimer: I had never used this tool before and have zero experience with the RST file format.
At the time of writing, running the linter over the symfony-docs repo takes about 50 seconds in the GitHub Actions workflow. Let's run DOCtor-RST version 1.46.0 locally on my Mac against symfony-docs@ff62e1203 to get a baseline:
$ time php bin/doctor-rst analyze ../symfony-docs/ --no-cache
31.35s user 0.30s system 99% cpu 31.689 total
Let's profile it
As you may already know, my next step when investigating performance is running the Blackfire profiler on the workload.
$ blackfire run --ignore-exit-status php bin/doctor-rst analyze ../symfony-docs/ --no-cache
The profile will be stored in your Personal environment.
The "--environment" option can be used to specify the target environment.
Analyze *.rst(.inc) files in: /Users/staabm/workspace/symfony-docs
Used config file: /Users/staabm/workspace/symfony-docs/.doctor-rst.yaml

Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 4096 bytes) in /Users/staabm/workspace/doctor-rst/vendor/symfony/string/AbstractUnicodeString.php on line 236
PHP Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 4096 bytes) in /Users/staabm/workspace/doctor-rst/vendor/symfony/string/AbstractUnicodeString.php on line 236
It's not unusual for a profiler to require more memory on a workload, so I raised the PHP memory limit to 16 GB. Still, I kept running into out-of-memory errors… 🤔
For a sanity check, I added a debug output of the peak memory usage at the end of the analysis process in the AnalyzeCommand and ran it again without Blackfire:
$output->writeln(memory_get_peak_usage(true) / 1024 / 1024 . ' MB');
PHP reports a peak memory usage of 12 MB, which is not high at all. At this point I concluded that we were likely facing a memory issue in the profiler and reported it to the Blackfire team.
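The same kind of sanity check can be wrapped in a small reusable helper. This is a sketch, not code from DOCtor-RST: `measure()` is a hypothetical name, and it assumes PHP 7.4 or later (arrow functions, numeric literal separators). It records wall-clock time with `hrtime()` next to the peak memory reported by `memory_get_peak_usage(true)`, as in the debug line above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of DOCtor-RST): runs a callable and reports
 * its wall-clock time plus the process-wide peak memory.
 *
 * @return array{result: mixed, seconds: float, peakMb: float}
 */
function measure(callable $workload): array
{
    $start = hrtime(true); // monotonic clock in nanoseconds (PHP >= 7.3)
    $result = $workload();
    $seconds = (hrtime(true) - $start) / 1e9;

    // true = memory allocated from the system, as in the AnalyzeCommand snippet.
    // This is the peak of the whole process so far, not only of $workload.
    $peakMb = memory_get_peak_usage(true) / 1024 / 1024;

    return ['result' => $result, 'seconds' => $seconds, 'peakMb' => $peakMb];
}

$stats = measure(fn () => array_sum(range(1, 100_000)));
printf("%.3f s, %.1f MB peak\n", $stats['seconds'], $stats['peakMb']);
```

Because the peak is process-wide, comparing two variants fairly means running each one in its own PHP process.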
To get the analysis running nevertheless, I decided to reduce the number of *.rst files to analyse. I deleted *.rst files in my local symfony-docs checkout until Blackfire ran without memory issues. It's not a perfect situation, but it gave us at least a first idea of the performance characteristics of the workload.
As we already saw in previous investigations, reducing IO is a good place to start.
In the following graph you can see a lot of calls to
We just had to introduce a local variable and call it a day.
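As a hypothetical sketch of this pattern (the function names and the file read are stand-ins, not DOCtor-RST code), an IO call repeated inside a loop can be hoisted into a local variable so it runs only once:

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: reading a file stands in for "some IO-bound call
// made over and over". None of these names come from DOCtor-RST.

function readContents(string $path): string
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $contents;
}

/** Before: the file is read from disk once per needle. */
function countMatchesSlow(string $path, array $needles): int
{
    $count = 0;
    foreach ($needles as $needle) {
        $count += substr_count(readContents($path), $needle); // IO inside the loop
    }
    return $count;
}

/** After: read once into a local variable and reuse it inside the loop. */
function countMatchesFast(string $path, array $needles): int
{
    $contents = readContents($path); // single IO call
    $count = 0;
    foreach ($needles as $needle) {
        $count += substr_count($contents, $needle);
    }
    return $count;
}

$path = tempnam(sys_get_temp_dir(), 'rst');
if ($path === false) {
    throw new RuntimeException('Cannot create temp file');
}
register_shutdown_function(static fn () => @unlink($path)); // clean up at script end
file_put_contents($path, ".. code-block:: php\n\n    echo 1;\n\n.. note::\n");

echo countMatchesFast($path, ['..', 'echo']), PHP_EOL; // prints 3
```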
Excessive use of preg_match()
DOCtor-RST internally uses symfony/string, which relies heavily on multi-byte string functions. These functions are known to be slow in PHP, although they have improved considerably in recent PHP releases.
The profiles show a memory bottleneck in these calls:
In my experience, plain string functions are in most cases far more efficient.
I had a look at all ->matches(…) invocations and decided to concentrate on a few simple ones which can be expressed without regular expressions.
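To illustrate the kind of rewrite, here is a hypothetical sketch with two made-up patterns, not the expressions actually changed in DOCtor-RST. A literal prefix anchored with `^` maps to `str_starts_with()`, and a literal substring search maps to `str_contains()` (both require PHP 8.0+). The patterns were chosen so that both variants return exactly the same result for any input.

```php
<?php
declare(strict_types=1);

// Hypothetical patterns for illustration only. Requires PHP >= 8.0.

// Regex versions: every call goes through preg_match() and the PCRE engine.
function startsWithDirectiveRegex(string $line): bool
{
    return preg_match('/^\.\. /', $line) === 1;
}

function containsCodeBlockRegex(string $line): bool
{
    return preg_match('/code-block::/', $line) === 1;
}

// Plain string equivalents: simple byte comparisons, no PCRE call.
// Byte-wise checks are safe on UTF-8 input here because the needles are ASCII.
function startsWithDirective(string $line): bool
{
    return str_starts_with($line, '.. ');
}

function containsCodeBlock(string $line): bool
{
    return str_contains($line, 'code-block::');
}

$lines = ['.. code-block:: php', '    .. note::', 'Prose mentioning code-block:: inline', 'Ünïcödé prose'];
foreach ($lines as $line) {
    printf(
        "%-40s starts=%s contains=%s\n",
        $line,
        var_export(startsWithDirective($line), true),
        var_export(containsCodeBlock($line), true)
    );
}
```

Such rewrites are only safe when the pattern really is a literal: anchors like `$` (which also matches before a trailing newline) or classes like `\s` need extra care to stay equivalent.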
Rewriting these expressions already yielded a great improvement, as they were invoked quite frequently:
Another case where I was able to reduce the use of regular expressions was in the
In this case, a regular expression tried to match strings starting with certain characters.
I decided to add some quick checks which, in most cases, prevent the actual regular expression from being executed.
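A minimal sketch of such a quick check, assuming (hypothetically) that the expression detects RST directive lines like `.. code-block:: php`. The pattern and `isDirectiveLine()` are illustrative, not the actual DOCtor-RST code; `str_starts_with()` requires PHP 8.0+.

```php
<?php
declare(strict_types=1);

// Hypothetical pattern: an RST directive, optionally indented with spaces/tabs.
const DIRECTIVE_PATTERN = '/^[ \t]*\.\. [a-z][a-z-]*::/';

function isDirectiveLine(string $line): bool
{
    // Strip exactly the characters the pattern accepts as indentation.
    $trimmed = ltrim($line, " \t");

    // Quick check: the pattern can only match if the line starts with ".. "
    // after indentation. Most lines are prose and fail here, so preg_match()
    // never runs for them.
    if (!str_starts_with($trimmed, '.. ')) {
        return false;
    }

    // Only candidate lines pay for the full regular expression.
    return preg_match(DIRECTIVE_PATTERN, $line) === 1;
}

foreach (['.. code-block:: php', "\t.. versionadded:: 6.1", 'Plain prose.', '.. _label:'] as $line) {
    printf("%-28s %s\n", var_export($line, true), isDirectiveLine($line) ? 'directive' : 'no');
}
```

The guard checks a necessary condition of the pattern, so it never changes the result; it only skips `preg_match()` for lines whose answer is already known to be `false`.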
These yielded another great improvement in memory consumption and a small improvement in runtime:
Even though these optimizations focused on memory, they often turn out to improve runtime performance as well. When PHP has to handle large amounts of data in memory, managing that memory slows the script down, and the garbage collector has more work to do tracking allocations, which also takes time.
I did a few more performance-oriented pull requests, but nothing of big interest that needs further explanation.
After all the changes landed, let's have another look at the workload:
$ time php bin/doctor-rst analyze ../symfony-docs/ --no-cache
20.35s user 0.30s system 99% cpu 21.689 total
We are now able to run the workload ~10 seconds faster than the initial ~30 seconds. This should reduce wait time when contributing to symfony-docs.
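Using the `total` values of the two local `time` runs (31.689 s before, 21.689 s after), the improvement works out to roughly a third less wall time, or a speedup factor of about 1.46:

```latex
\frac{31.689\,\mathrm{s} - 21.689\,\mathrm{s}}{31.689\,\mathrm{s}} \approx 0.316,
\qquad
\frac{31.689\,\mathrm{s}}{21.689\,\mathrm{s}} \approx 1.46
```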
As always, these improvements were crafted in my free time. I am not a Symfony framework user either. Please consider supporting my work, so I can keep open source tools as fast as possible and help them evolve to the next level.
Happy documenting! 📖