My developer experience

Array Shapes For Preg Match Matches

2024-07-05T00:00:00+00:00

In August 2023, I set out on an adventure that ended up taking 10 months. It is another chapter in my ongoing effort to close blind spots in PHPStan’s type inference.

I did similar things before with phpstan-dba, which implements SQL based static analysis and type inference for the database access layer.

The journey to precise array-shapes for `preg_match` $matches

In its most basic form, we are looking for the answer to the following question: What does the $matches array look like after a preg_match call?

function doFoo(string $s): void {
    if (preg_match('/(?:(a)(\d))?(c)(\s)*/', $s, $matches)) {
        // what can $matches look like at this line?
    } else {
        // what can $matches look like at this line?
    }
    // what can $matches look like at this line?
}

I am not aware of any static analysis tool that can figure this out, so it was fairly clear that this would take a few experiments and time-consuming research. Play with the example in the PHPStan playground.

To explore a possible solution, I had to answer a few questions:

Which capturing groups (named vs. unnamed) are contained in the used pattern?
Which capturing groups are optional/conditional?
How do the capturing groups relate to the array-shape of $matches?
How can the $flags parameter influence the array-shape of $matches?
How do the resulting array-shapes flow through the branches of the if-else construct?
How can this type inference be implemented in a way that other preg_match-wrapping libraries (e.g. composer/pcre, nette/utils) can benefit from it?

Thanks to the great PHPStan community, a few other people stopped by and helped me with some very special corner cases. Adding more test cases to the initial prototype also helped a lot to get a high-quality implementation in the end.

TL;DR: The feature is already merged into PHPStan starting with 1.11.6 and can be enabled via Bleeding Edge. Most relevant pull requests along the road were…

the very first iteration
implementation of ParameterOutTypeExtensions
improvements in the type-specifier to handle non-strict comparisons
drop the regex pattern hack and do everything in the regex AST
handling of optional top level groups
handling of top level alternation groups

Figuring this one out was a joy, sometimes frustrating, and always time-consuming. It’s something no other static analyzer I am aware of can handle, and it will save any PHPStan user fiddling with preg_match a lot of time and effort. Please consider sponsoring my open-source efforts 💕.

TL;DR aside, let’s dive into it…

Which capturing groups are contained in the used pattern?

One of the easier questions at first sight, since the initial requester of the above feature provided a regex pattern hack which obviously provided this information. I went with this hack for a few months and moved along.

While adding more and more test cases with different patterns, we realized that the hack was not reliable. It needed a few tweaks to also work with named capturing groups, and it did not work consistently across PHP versions.

As an alternative, I started playing around with Hoa\Regex, a library already contained in PHPStan, to build an abstract syntax tree (AST) for regex patterns. It’s the only library I could find in the PHP ecosystem suitable for this task. An additional complication is that this library is no longer maintained and has a few bugs. To get the AST parsing up to speed, I had to backport a few as yet unreleased fixes from the upstream repository, and with the support of Michael Voříšek we were able to fix the grammar file so that named capturing groups were properly recognized.

In the end we decided to go with the AST parsing, since it was more reliable and was the only solution that works consistently for all PHP versions PHPStan 1.x supports (PHP 7.2+).

Which capturing groups are optional/conditional? How do the capturing groups relate to the array-shape of `$matches`?

In an early prototype stage, I had implemented a hybrid approach combining the regex pattern hack and the AST parsing. We used the AST to identify which capturing groups the pattern contains, and the pattern hack with PREG_UNMATCHED_AS_NULL to get an idea of the optional/conditional groups. PREG_UNMATCHED_AS_NULL only started working properly in PHP 7.4, so making this work consistently across PHP versions was another problem to solve.

Later I re-implemented the optional/conditional capturing group detection with plain AST-based logic, which was a hell of a ride on its own. The main quest was to figure out when preg_match leaves a capturing group out of $matches (trailing optional groups) and how to structure the shape properly when optional capturing groups appear before mandatory ones. Additionally, it’s not that easy to figure out whether a capturing group is optional or conditional. A group might be part of an alternation like (?:(\d)|(\w)) or (?:(\d)|(\w)|no-group). An alternation element might be optional on its own - as in (?:(\d)*|(\w)) - or the whole alternation might be optional, as in (?:(\d)|(\w))? - or a mix of all that. As you might already imagine, the field is pretty complex and doing the regex AST dance properly is quite a challenge.

You can find what was needed to get this working in the related classes: RegexArrayShapeMatcher, RegexCapturingGroup, RegexNonCapturingGroup.

At this point the implementation got simpler because we no longer had this hybrid thing.

Ondřej was also pretty happy about that.

How can the `$flags` parameter influence the array-shape of `$matches`?

That one was easier than the others. As a bonus challenge, the available flags are PHP-version specific. Flags like PREG_UNMATCHED_AS_NULL can cause $matches to contain null values. PREG_OFFSET_CAPTURE leads to a different array-shape, since each value is accompanied by its offset in the input string.

How do the resulting array-shapes flow through the branches of the if-else construct?

Let’s have a look back at our initial example:

function doFoo(string $s): void {
    if (preg_match('/(?:(a)(\d))?(c)(\s)*/', $s, $matches)) {
        // (a) what can $matches look like at this line?
    } else {
        // (b) what can $matches look like at this line?
    }
    // (c) what can $matches look like at this line?
}

One might think this should be an already solved puzzle. preg_match needs some special treatment though, because the by-ref $matches argument changes the variable outside the if-branch scope as well. See the following example, which asserts the expected PHPStan type inference within the given branches:

use function PHPStan\Testing\assertType;

function doFoo(string $s): void {
    if (preg_match('/(?:(a)(\d))?(c)(\s)*/', $s, $matches)) {
        // (a)
        assertType('array{0: string, 1: string, 2: string, 3: string, 4?: string}', $matches);
    } else {
        // (b)
        assertType('array{}', $matches);
    }
    // (c)
    assertType('array{}|array{0: string, 1: string, 2: string, 3: string, 4?: string}', $matches);
}

In the (a) branch, the pattern surely matches, so the array-shape consists of a mix of always-matched and sometimes-matched offsets.
In the (b) branch, the pattern surely does not match, so the array-shape is empty.
At (c), after the if/else, we don’t know whether the pattern matched, so the array-shape is either empty or that of a match.

If you are interested in other test-cases and the types PHPStan can understand in these situations, please consult the test-suite. Alternatively copy the example code, drop it into the PHPStan online playground (don’t forget to enable the ‘Bleeding Edge’ checkbox) and see the expected types.

In an early prototype I was using only a TypeSpecifyingExtension to override the type of $matches. This led to some follow-up problems though. A TypeSpecifyingExtension is meant to narrow an existing type for the if-branch and/or the else-branch. It will not change the types after the if/else construct.

We had to come up with a new type of PHPStan extension to properly handle the by-ref $matches argument. Up to this point, a param-out type could only be defined using phpDoc. So we implemented ParameterOutTypeExtensions, which allow defining param-out types programmatically and in a context-sensitive way.

The idea is to use a FunctionParameterOutTypeExtension to type $matches the way the outer scope expects it to be (see (c)). On top of that, we use a FunctionTypeSpecifyingExtension to narrow this type for the if-branch (a) and/or the else-branch (b).

How to implement this type-inference improving mechanism in a way, that other `preg_match` wrapping libraries could benefit from it?

In the previous chapter we learned what PHPStan needs to ship in its core to support $matches type-inference for the preg_match function. The mentioned FunctionParameterOutTypeExtension and FunctionTypeSpecifyingExtension both rely on the magic which happens in RegexArrayShapeMatcher, which is doing the heavy lifting.

This RegexArrayShapeMatcher-class is declared as @api which means it is meant for use by other extensions outside the phpstan-src repository. We use it to implement the same type inference capabilities in nette/utils or composer/pcre. You might also use this class to build custom extensions for your very own preg_match-wrapping API.

Future work

Planned for the future:

stabilize the implementation to make it generally available (without Bleeding Edge)
finalize the composer/pcre integration
finalize the PHP-CS-Fixer Preg::match integration
use similar type narrowing for preg_match_all and maybe other functions
use more precise types when possible

Support my open source work

In case this article was useful, or you want to honor the effort I put into one of the hundreds of pull-requests to PHPStan, please consider sponsoring my open-source efforts 💕.

Readable end-to-end tests for PHPStan with bashunit

2024-06-28T00:00:00+00:00

For a long time, the PHPStan repository has had isolated, highly parallel end-to-end tests, written in bash and run via GitHub Actions. The design and initial implementation - as far as I know - were done by Ondřej Mirtes.

I don’t know any other project doing end-to-end tests the way it is done in PHPStan. Since I have recently added bashunit to the end-to-end tests, I wanted to share some insights and the benefits of this approach.

NOTE: This post is only about end-to-end tests, not about unit tests or integration tests which would require a largely different setup. Also it is not about promoting this approach or comparing it against other mechanisms to implement end-to-end tests.

What’s an end-to-end test?

In the context of this article a PHPStan end-to-end test runs a previously compiled phar-file on the command line and asserts expectations based on the cli exit-code or the generated command output.

Example:

cd e2e/different-php-parser2 # change to test-directory
../../phpstan analyse -l 5 src/ # run the precompiled PHPStan

When these commands are executed within a GitHub Action, the test is considered successful when all commands exit with a 0 exit-code. As soon as a single command exits with a non-zero exit-code, the GitHub Action will stop executing and report an error - similar to how set -e works in bash scripts.

The PHPStan analyze command will return a non-zero exit-code when errors are found or internal errors happen. When the PHPStan analyze command ends without errors a 0 exit-code is returned.
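The stop-at-first-failure behaviour described above can be sketched in plain bash. This is a minimal illustration, not the actual workflow: `run_steps` is a hypothetical helper, and `true` / `exit 3` are stand-ins for real commands such as the PHPStan call.

```shell
#!/usr/bin/env bash

# run_steps is a hypothetical helper: it runs each argument as its own
# command line and stops at the first one that exits non-zero, handing
# that exit code back to the caller.
run_steps() {
  local step
  for step in "$@"; do
    bash -c "$step" || return $?
  done
}

# "exit 3" stands in for an analysis run that reported errors; the
# third step is never executed.
code=0
run_steps "true" "exit 3" "echo never-reached" || code=$?
echo "exit code: $code"
```

A test like this passes only when every step returns 0, which is why the later examples append `|| true` to a PHPStan call that is expected to report errors.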

GitHub Action based “data-provider”

Putting such a test into a GitHub Action is a great way to run it in a controlled environment. Every action run is isolated from others and depending on your GitHub pricing-plan the runner environment will execute even hundreds of these tests in parallel:

name: "E2E Tests"

on:
  pull_request:
     # … whatever event you want to trigger the tests

jobs:
  e2e-tests:
    name: "E2E tests"
    runs-on: "ubuntu-latest"
    timeout-minutes: 60

    strategy:
      matrix:
        include:
          - script: | # the actual test
              cd e2e/different-php-parser2
              composer install # install the tests' dependencies
              ../../phpstan analyse -l 5 src # run the precompiled PHPStan

          # … next test

    steps:
      - name: "Checkout" # checkout of the phpstan repository contains the test-source and a precompiled phar
        uses: actions/checkout@v4

      - name: "Install PHP"
        uses: "shivammathur/setup-php@v2"
        with:
          coverage: "none"
          php-version: "8.1"

      - name: "Test"
        run: ${{ matrix.script }}

Each end-to-end test in this case is a simple directory, which can contain anything a regular project could contain, like a composer.json, a phpstan.neon or a phpunit.xml.dist file. It means we can reproduce real-world issues that PHPStan users might face, even if they only happen in combination with other tools.

This setup also works for any other tool which has a command line interface.

For inspiration: Any subfolder below e2e/ in the PHPStan repository represents a single end-to-end test.

Since we are using a regular GitHub Action matrix in this scenario, we can easily add more test-parameters to the mix to cover other use-cases:

name: "E2E Tests"

on:
  pull_request:
     # … whatever event you want to trigger the tests

jobs:
  e2e-tests:
    name: "E2E tests"
    runs-on: "ubuntu-latest"
    timeout-minutes: 60

    strategy:
      matrix:
        include:
          - php-version: "8.1"
            script: |
              cd e2e/different-php-parser2
              composer install
              ../../phpstan analyse -l 5 src

          - php-version: "7.4"
            script: |
              cd e2e/another-test
              ../../phpstan analyse -l 5 src

          # … next test

    steps:
      - name: "Checkout"
        uses: actions/checkout@v4

      - name: "Install PHP"
        uses: "shivammathur/setup-php@v2"
        with:
          coverage: "none"
          php-version: "${{ matrix.php-version }}"

      - name: "Test"
        run: ${{ matrix.script }}

Using such parameters one could easily:

use a different operating system per test (nowadays bash even works on Windows)
use different PHP versions per test
use different PHP extensions per test
…

Utilize `bashunit` in end-to-end tests

I recently stumbled over an end-to-end test use-case in which I needed to assert a certain error message within the output of the PHPStan command.

My initial take on the reproducer, which got refined after great review feedback from Ondřej:

cd e2e/trait-caching
../../bin/phpstan analyze --no-progress --level 8 --error-format raw data/
patch -b data/TraitOne.php < TraitOne.patch
OUTPUT=$(../../bin/phpstan analyze --no-progress --level 8 --error-format raw data/ || true)
echo "$OUTPUT"
[ $(echo "$OUTPUT" | wc -l) -eq 1 ]
grep 'Method TraitsCachingIssue\\TestClassUsingTrait::doBar() should return stdClass but returns Exception.' <<< "$OUTPUT"

This particular test - while working correctly - had a few problems which made it hard to read, especially for people not used to bash.

What is [ $(echo "$OUTPUT" | wc -l) -eq 1 ] doing?
PHPStan error messages contain all kinds of characters, and some of them need special escaping in bash - e.g. doubling the \.
The grep command using input redirection with <<< "$OUTPUT", which handles multi-line strings, looks strange to the untrained eye.
Making bash scripts work across macOS, Linux and Windows sometimes requires ugly hacks.
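To unpack the first and third points, here is a minimal self-contained sketch of what those two assertions do, assuming bash. The OUTPUT value is a made-up one-line stand-in for real PHPStan output. `grep -F` (fixed-string matching) is shown as one alternative that avoids doubling the backslash; it is not what the original test used.

```shell
#!/usr/bin/env bash

# Made-up stand-in for the raw PHPStan output (a single line).
OUTPUT='Method Foo\Bar::doBar() should return stdClass but returns Exception.'

# Count the lines of the captured output; the arithmetic expansion also
# strips the padding some wc implementations add around the number.
line_count=$(( $(printf '%s\n' "$OUTPUT" | wc -l) ))

# "[ ... -eq 1 ]" exits 0 only when exactly one line was counted.
if [ "$line_count" -eq 1 ]; then
  echo "one line of output"
fi

# The here-string (<<<) feeds the variable to grep on stdin. With -F the
# pattern is a literal string, so the backslash needs no doubling.
if grep -qF 'Foo\Bar::doBar()' <<< "$OUTPUT"; then
  echo "message found"
fi
```

Both checks rely purely on exit codes, which is what makes them usable as assertions inside a script that stops on the first failing command.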

In the next iteration to improve the test, I added a small assert.sh wrapper script around bashunit, which allowed us to call the bashunit-assertion functions from the cli:

cd e2e/trait-caching
../../bin/phpstan analyze --no-progress --level 8 --error-format raw data/
patch -b data/TraitOne.php < TraitOne.patch
OUTPUT=$(../../bin/phpstan analyze --no-progress --level 8 --error-format raw data/ || true)
echo "$OUTPUT"
../assert.sh equals `echo "$OUTPUT" | wc -l` 1
../assert.sh contains 'Method TraitsCachingIssue\TestClassUsingTrait::doBar() should return stdClass but returns Exception.' "$OUTPUT"

Note the easily readable assertions without the need to escape certain characters.

At this point we got in contact with the bashunit maintainers, who immediately helped us with a few problems in the initial setup. They also liked the assert.sh script so much that they integrated the feature natively into bashunit as of version 0.13 (Release Post).

So the final test-case in the end looks like:

cd e2e/trait-caching
../../bin/phpstan analyze --no-progress --level 8 --error-format raw data/
patch -b data/TraitOne.php < TraitOne.patch
OUTPUT=$(../../bin/phpstan analyze --no-progress --level 8 --error-format raw data/ || true)
echo "$OUTPUT"
../bashunit -a line_count 1 "$OUTPUT"
../bashunit -a contains 'Method TraitsCachingIssue\TestClassUsingTrait::doBar() should return stdClass but returns Exception.' "$OUTPUT"

Using bashunit the tests get pretty easy to read and also remove the need for most operating system specific workarounds.

Support my open source work

In case this article was useful, or you want to honor the effort I put into one of the hundreds of pull-requests to PHPStan, please consider sponsoring my open-source efforts 💕.

Published: Open source contributions statistics generator

2024-01-10T00:00:00+00:00

In a recent blog post, we had a look back at 2023 and my personal highlights of these open source contributions.

To back up the message of the post, I used some contribution statistics similar to the ones shown below (excerpt):

|----------------------------------------------|-----------------------|--------------------|
| project                                      | merged pull requests  | addressed issues   |
|----------------------------------------------|-----------------------|--------------------|
| phpstan/phpstan*                             | ~116   (~188 in 2022) | 33    (83 in 2022) |
| rector/rector*                               | ~178                  | 13                 |
| FriendsOfREDAXO/rexstan                      | 88                    | 24                 |
| FriendsOfREDAXO/rexfactor                    | 55                    | 6                  |
| staabm/phpstandba                            | 44  (~300 in 2022)    | 8                  |
| redaxo/redaxo                                | 27   (70 in 2022)     | 4                  |
| TomasVotruba/unused-public                   | 25                    | 1                  |
…

These numbers were crunched with a small tool: staabm/oss-contribs

I just decided to make this tool available for anyone, so you can generate your own statistics.

simple contributions statistics generator

generates a list of merged pull requests in public repositories
generates a list of issues, these pull requests addressed
generates a count of user reactions on these pull requests and issues
takes referenced issues into account even for PRs not targeted to the default branch

The result is grouped by repository.

Find all the details in the tool’s repository README.

Enjoy.

Give back

In case you find my PHPStan contributions and/or this tool useful, please consider supporting my open source work.

PHPStan tailored to your needs

2024-01-01T00:00:00+00:00

Do you need help with PHPStan in some form? You can get me - one of the top contributors to PHPStan or related static analysis tooling - to support your team or project(s).

I have plenty of experience in contributing changes to PHPStan core, or implementing custom extensions.

As of now, I am available for hire to make the tooling fit your needs.

Fixing blockers

You are blocked by a reported issue in PHPStan or related tooling? Your projects would benefit from getting certain features implemented in PHPStan?

I can fix bugs or implement features that are blocking you to get the most out of PHPStan.

Investigate performance issues

PHPStan/Rector is running slow in your project? You need help to get a faster feedback loop?

Let me analyse your case at hand and investigate possible solutions. I love analysing php based tool performance problems.

Specific needs / tailored integration

I can help you build custom extensions and/or rules to seamlessly integrate PHPStan into your framework, libraries, and/or development workflow.

Reduce risk in your projects

PHPStan is critical for your business? Consider supporting my open source work with your sponsoring to reduce the PHPStan project’s bus factor.

Get in touch

Please reach me via E-Mail or contact me on Twitter or Mastodon for paid support.

Published: phpstan-todo-by

2023-12-17T00:00:00+00:00

Inspired by parker-codes/todo-by I recently created phpstan-todo-by - a new PHPStan extension to check for TODO comments with expiration.

The announcement tweet / toot got a lot of attention and I received a lot of positive feedback.

The project already got 50 stars within the first week after announcement.

Examples

The main idea is that comments within the source code are turned into PHPStan errors when a condition is satisfied, e.g. a date is reached or a version is met.



Supported todo formats

A todo comment can also consist of just a constraint without any text, like // @todo 2023-12-14.
When a text is given after the date, this text will be picked up for the PHPStan error message.


  the todo, TODO, tOdO keyword is case-insensitive
  the todo keyword can be suffixed or prefixed by a @ character
  a username might be included after the todo@
  the comment might be mixed with : or - characters
  multi line /* */ and /** */ comments are supported


The comment can expire by different constraints, examples are:

  by date with format of YYYY-MM-DD matched against the reference-time
  by a semantic version constraint matched against the project’s reference-version
  by a semantic version constraint matched against a Composer dependency (via composer.lock)
  by ticket reference, matched against the status of a ticket (e.g. in JIRA, GitHub issues, YouTrack)


Find more details and configuration options in the project’s README.

Give back

In case you find my PHPStan contributions and/or this tool useful, please consider supporting my open source work.



Contribution Summary 2023

2023-12-07T00:00:00+00:00

The year 2023 comes to an end and I want to have a look back at my open source contributions.

To be honest: The main motivation for this post is to raise awareness of all the open source work happening in my free time.
I am spending 20-40 hours per month and would love 💕 to even reduce hours on my primary job to support the open source community even more.

This will only be possible when more people support my open source work by becoming a sponsor.

Intro

First, let’s have a look back at 2022: I was able to create 967 pull requests, of which 831 got merged.
In comparison, at the time of writing I have created ~900 pull requests to 70 open-source repositories in 2023, of which 753 got merged.

As you can see, the numbers in 2023 are a bit lower than in 2022. I think this is due to the fact that last year the focus was on working through low-hanging fruit in PHPStan and Rector.
With the experience and knowledge gained while working on these projects, I was able to contribute more advanced features and fixes this year.

The following table shows the distribution of contributions across the different projects I am working on.


  
    
|----------------------------------------------|-----------------------|--------------------|
| project                                      | merged pull requests  | addressed issues   |
|----------------------------------------------|-----------------------|--------------------|
| phpstan/phpstan*                             | ~116   (~188 in 2022) | 33    (83 in 2022) |
| rector/rector*                               | ~178                  | 13                 |
| FriendsOfREDAXO/rexstan                      | 88                    | 24                 |
| FriendsOfREDAXO/rexfactor                    | 55                    | 6                  |
| staabm/phpstandba                            | 44  (~300 in 2022)    | 8                  |
| staabm/phpstan-todo-by                       | 33  (~300 in 2022)    | 7                  |
| redaxo/redaxo                                | 27   (70 in 2022)     | 5                  |
| TomasVotruba/unused-public                   | 28                    | 1                  |
| staabm/phpstan-baseline-analysis             | 22                    |                    |
| OskarStark/doctor-rst                        | 12                    | -                  |
| easy-coding-standard/easy-coding-standard    | 9                     | 1                  |
| staabm/annotate-pull-request-from-checkstyle | 8                     | -                  |
| PHP-CS-Fixer/PHP-CS-Fixer                    | 4                     | -                  |
| Roave/BetterReflection                       | 4                     | -                  |
| symfony/symfony                              | 3                     | -                  |
| qossmic/deptrac                              | 3                     | -                  |
| TomasVotruba/bladestan                       | 3                     | -                  |
| composer/composer                            | 2   (7 in 2022)       | -                  |
| sebastianbergmann/diff                       | 2                     | -                  |
| TomasVotruba/type-coverage                   | 2                     | -                  |
| vimeo/psalm                                  | 1 (4 in 2022)         | -                  |
| mautic/mautic                                | 1                     | -                  |
| TomasVotruba/cognitive-complexity            | 1                     | -                  |
| matomo-org/matomo                            | 1                     | -                  |
| nette/utils                                  | 1                     | -                  |
| nikic/PHP-Parser                             | 1                     | -                  |
| briannesbitt/Carbon                          | 1                     | -                  |
| doctrine/orm                                 | 1                     | -                  |
| … a lot more                                 | -                     | -                  |
    
  


numbers crunched with staabm/oss-contribs

In addition to source code contributions, I also took the time to blog about my work.
In these 8 posts, I try to explain what I did, how problems have been approached and what I have learned along the way.
That way I hope to inspire others to contribute to open source as well and share their journey.

If you don’t want to miss my articles, consider subscribing to my RSS feed, or follow me on Twitter or Mastodon.

Highlights 2023

Let’s have a closer look at my personal highlights of 2023.

PHPStan Highlight: Improved developer experience for the result cache

The PHPStan result cache is a key piece for a fast feedback loop. Why that is, how it works and how to debug problems with it is described in this blog post.
I have dumped everything I know about it into this article.

Highlight: rexstan & rexfactor

In June 2022 the first version of rexstan, a PHPStan-backed REDAXO CMS Addon, was released.
It’s open source from day 1 and supports developers working with REDAXO every day.

Since then I was able to publish 147 releases - what a ride.

Similar to rexstan, rexfactor is a new REDAXO CMS Addon. It’s backed by Rector and helps developers to migrate their codebase to newer REDAXO versions.
It’s open source from day 1 and was first released in March 2023.

The Addon allows using Rector with a simple web UI. Pick your rule/rule-set, define the target source code and get a nice preview of the changes.
Push the “Apply” button and the changes are applied to your codebase.

Highlight Podcast: “Könnte kaputt sein – Statische Code-Analyse mit Markus Staab”

Got interviewed by the Super Duper Developers Club about my open source work (German).



Rector Highlight: “Implement a max jobs per worker budget”

Running Rector on huge projects in a single run was not possible in the past. After implementing process and memory management, this problem is solved.
Even huge projects like the Mautic codebase can be refactored with Rector now without out-of-memory issues.



Highlight: phpstan-dba

phpstan-dba is one of my PHPStan extensions which got a bit of traction in 2023.
It’s a PHPStan based SQL static analysis and type inference for the database access layer.

I was even keen enough to talk about it at the PHPUGFFM usergroup and the unKonf Barcamp.
See the slides of said talk if you are curious.

Highlight: Performance improvements

As a regular reader of my blog you already know that I have spent a few months improving the performance of several well-known projects.
This includes PHPStan, PHPUnit, Symfony, Rector and more. All the details can be found in separate posts of my performance series.

Highlight: “Crafting a more performant Open Source landscape with Blackfire”

A summary of my performance work and my vita was published on the blackfire.io Blog.

PHPStan Highlight: Support for array shape covariance

One of the craziest contributions this year. After days of in-depth analysis, a one-line fix finally resolved 5 bugs.



PHPStan Highlight: “Fix !isset() with Variable”

As highlighted in various tweets I was working on falsey-context type inference improvements in PHPStan.
This was my most time-consuming and most rewarding contribution this year.
It took me several tries to finally get it into a mergeable state - this very first iteration closed 7 bugs, the oldest of them dating back to July 2020.

The main problem this contribution solves is that PHPStan becomes aware of when/if variables are defined after a !isset($variable) check.
To get this right, one needs to check whether the involved variables can be null and whether they are defined in the current scope.
Most interesting is the case where we figured out that a variable which can never be null can also never be defined in the falsey-context.



Getting this right additionally means that PHPStan gets smarter for the !empty($variable)-case and the null coalescing operator ??.

I have plans to work on !isset($array['offset']) and !isset($object->property) improvements in 2024.

2024 here we come

I wish you all the best for the upcoming year. I am looking forward to continue my open source work and I hope you will support me in doing so.

If one of those open source projects is critical for your business, please consider supporting my work with your sponsoring 💕


PHPStan Filter Baseline

2023-10-30T00:00:00+00:00

Having a week off from my primary job means more time for my open source projects :-).

In this post I will describe one way to work thru the sometimes huge PHPStan baseline.

Motivation

Not everyone has the luxury to use static analysis from the very start of a project.

When adding PHPStan to an existing project, you usually need to work thru the levels for an initial cleanup.
Oftentimes the initial budget to set up static analysis is not big enough to level up to a point you are happy with.

When running out of budget, I usually try to find a PHPStan config/rule-set,
which makes sure newly implemented code has a pretty high quality barrier.
At the same time this means I need to baseline a lot of errors, because pre-existing code likely does not match these criteria.

Now we need to figure out how and when to work thru the remaining errors in the daily job.
The bigger the baseline, the more important it is to have a good strategy for deciding which errors to work on first.

Let’s go

First, set up phpstan-baseline-analysis to keep track of the current state of the project.
Using this tool we can analyze the project and get an overview of the current error distribution.
In our projects we generate these numbers in a scheduled GitHub action and create trend reports for the dev-team.

Additionally, you may create graphs of the progress to have a visual representation.
It can be a good foundation for a conversation with management people, to give an idea where we are and where we are heading.

Tackle the problem / filter the baseline

Depending on your dev-team focus you might want to work on different PHPStan errors.

Starting with phpstan-baseline-analysis 0.12.4 you can filter the baseline by error classes.
This means we can quickly focus on a certain area of errors.

One common problem in legacy projects is related to invalid PHPDocs.
PHPStan might already be aware of said problems, but since you didn’t have the time yet to work on them, these errors are buried in your baseline.

Using the new filtering capabilities you can filter out these problems from your already existing baseline:

$ echo "$( phpstan-baseline-filter phpstan-baseline.neon --exclude=Invalid-Phpdocs )" > phpstan-baseline.neon


This means we take the project's baseline, run it through phpstan-baseline-filter, and keep all errors except those matching the --exclude filter.

Now you can trigger your regular phpstan analyze command, which no longer ignores the filtered errors.
That way you can work on the problems as you are used to, based on the PHPStan result list.
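
Overwriting the baseline in place is convenient, but a failing filter run could leave you with an empty file. A slightly more defensive variant could look like the sketch below. The helper name `filter_baseline_in_place` and the `FILTER_BIN` override are made up for this example, and the sketch assumes that phpstan-baseline-filter exits with a non-zero status when something goes wrong.

```shell
# Hypothetical helper: apply an --exclude filter without risking the baseline.
# Assumption: the filter command exits non-zero on failure.
# FILTER_BIN is an invented override hook, it defaults to the real command name.
filter_baseline_in_place() {
    baseline="$1"
    exclude="$2"
    tmp="$(mktemp)" || return 1
    # Write to a temp file first, so a failed run cannot truncate the baseline.
    if "${FILTER_BIN:-phpstan-baseline-filter}" "$baseline" --exclude="$exclude" > "$tmp"; then
        cp "$baseline" "$baseline.bak"   # keep the unfiltered baseline around
        cat "$tmp" > "$baseline"         # replace content, keep file permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Usage would be `filter_baseline_in_place phpstan-baseline.neon Invalid-Phpdocs`. The untouched baseline stays available as `phpstan-baseline.neon.bak`.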

You can use multiple filter keys at once by separating them with a comma (,).

As an alternative to --exclude, you can use --include to filter the baseline, which outputs only the errors matching the filter key.
This might be useful if you want to further process the filtered error list in a separate tool.

$ phpstan-baseline-filter phpstan-baseline.neon --include=Deprecations,Unknown-Types,Anonymous-Variables > result.neon
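
One way to process the --include output further is to count how many baselined errors fall under each filter key, which helps when deciding where to start. The following is a minimal sketch. `count_per_filter_key` and `FILTER_BIN` are invented names, and it assumes the filtered output is a regular NEON baseline with one `count:` line per ignored error.

```shell
# Hypothetical helper: print "<filter-key><TAB><number of errors>" per key.
# Assumption: the --include output is a NEON baseline with `count:` lines.
# FILTER_BIN is an invented override hook, it defaults to the real command name.
count_per_filter_key() {
    baseline="$1"
    shift
    for key in "$@"; do
        n="$("${FILTER_BIN:-phpstan-baseline-filter}" "$baseline" --include="$key" \
            | awk '$1 == "count:" { total += $2 } END { print total + 0 }')"
        printf '%s\t%s\n' "$key" "$n"
    done
}
```

For example: `count_per_filter_key phpstan-baseline.neon Deprecations Unknown-Types Invalid-Phpdocs`.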


Filter keys

If you are curious, just invoke the tool's help command to get an idea of which filter keys are supported.
At the time of writing it looks like:

$ phpstan-baseline-filter help

USAGE: phpstan-baseline-filter  [--exclude=,...] [--include=,...]

valid FILTER-KEYs: Classes-Cognitive-Complexity, Deprecations, Invalid-Phpdocs, Unknown-Types, Anonymous-Variables, Unused-Symbols


Give back

In case you find my PHPStan contributions and/or this content useful, please consider supporting my open source work.


Phpstan Result Cache Gotchas
2023-10-21T00:00:00+00:00
As part of the performance post series we had a look into a lot of profiling and detailed code optimizations.

In this post we will take a top-level look at PHPStan performance from an end-user perspective.

Goal

While we are working hard on squeezing every bit of performance out of PHPStan,
you as an end user should first and foremost make sure that PHPStan can benefit from its result cache as often as it can.

In the projects I am working on, we usually see PHPStan analysis times dropping from 5-10 minutes to 10-30 seconds
when everything is going according to plan and the tool can do its job utilizing the result cache.

But what could possibly go wrong?
In this post I will write down what I learned from setting up PHPStan in a lot of different projects and environments.

Let's go

You don’t need to enable result cache explicitly, as it’s enabled by default.
PHPStan tries to be as smart as possible about invalidating the cache when required.

How it works

To find out when/whether PHPStan is using the result cache, you can use the -vvv flag.


  Running it on a project for the very first time will always result in a full analysis:


$ phpstan -vvv
Result cache not used because the cache file does not exist.
 1562/1562 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓] 100% 20 secs/20 secs

Result cache is saved.


 [OK] No errors


Used memory: 2.13 GB


-> note the initial message, telling you about result cache usage.

-> note the analysis in this project is taking 20 seconds and 2.13 GB of memory.


  On a subsequent run, PHPStan will use the result cache:


$ phpstan -vvv
Note: Using configuration file /Users/staabm/workspace/phpstan-src/phpstan.neon.dist.
 1562/1562 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓] 100% < 1 sec/< 1 sec

Result cache is saved.


 [OK] No errors


Used memory: 133.88 MB


-> the analysis process finished in under 1 second in comparison to 20 seconds before.

-> it took 134 MB of memory in comparison to 2.13 GB before.


  In case you e.g. modify dependencies via composer, PHPStan invalidates the cache and triggers a full analysis scan:


$ phpstan -vvv
Note: Using configuration file /Users/staabm/workspace/phpstan-src/phpstan.neon.dist.
Result cache not used because the metadata do not match: projectConfig, composerLocks
1562/1562 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓] 100% 19 secs/19 secs

Result cache is saved.


[OK] No errors


Used memory: 2.14 GB


-> you can see PHPStan realized the composerLocks are different, which made it invalidate the cache.
Starting with PHPStan 1.10.36 we print the reason why invalidation happened.

-> There can be different reasons why the cache is invalidated or not used at all. Find all the details in the ResultCacheManager class.
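
In CI you usually don't read the -vvv output by hand. A small helper can pull out the relevant line so the job can surface it. This is a sketch that assumes the message starts with "Result cache not used", as in the outputs above. The function name `result_cache_skip_reason` is made up for this example.

```shell
# Hypothetical helper: read `phpstan -vvv` output on stdin and print the
# line(s) explaining why the result cache was skipped.
# Exit status 0 means such a line was found, 1 means there was none.
# Assumption: the wording matches the sample outputs shown in this post.
result_cache_skip_reason() {
    grep '^Result cache not used'
}
```

A possible usage is `phpstan -vvv 2>&1 | tee phpstan.log` followed by `result_cache_skip_reason < phpstan.log`, then turning the printed line into a warning in whatever format your CI understands.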


  If you want to invalidate the cache manually, you can use the clear-result-cache command. This will also reveal the location of the result cache files:


$ phpstan clear-result-cache -vvv
Note: Using configuration file /Users/staabm/workspace/phpstan-src/phpstan.neon.dist.
Result cache cleared from directory:
/Users/staabm/workspace/phpstan-src/tmp



  When running PHPStan with the --debug option, it will not use the result cache:


$ phpstan --debug -vvv
Note: Using configuration file /Users/staabm/workspace/phpstan-src/phpstan.neon.dist.
Result cache not used because of debug mode.
...



  Regeneration of the baseline with a warmed result cache should finish instantly starting with PHPStan 1.10.34:


$ phpstan -vvv --generate-baseline
Note: Using configuration file /Users/staabm/workspace/phpstan-src/phpstan.neon.dist.
 1562/1562 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓] 100% < 1 sec/< 1 sec

Result cache is saved.


 [OK] Baseline generated with 645 errors.


Used memory: 147.88 MB


Debugging the inner workings

Ondřej Pro-Tip: If you need to know in detail why PHPStan decided not to use the result cache, you can diff the result-cache file before and after the run.
That can be especially helpful in CI environments, when debugging the problem at hand is pretty hard.
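
The before/after comparison can be scripted. The sketch below snapshots the cache file, runs a command, and prints a unified diff. The helper name `diff_result_cache_around` is invented, and you need to pass the location of the cache file yourself (the clear-result-cache output above tells you the directory).

```shell
# Hypothetical helper: snapshot the result cache file, run a command,
# then show what changed in the cache file.
# Usage: diff_result_cache_around <cache-file> <command> [args...]
diff_result_cache_around() {
    cache_file="$1"
    shift
    snapshot="$(mktemp)" || return 1
    if [ -f "$cache_file" ]; then
        cp "$cache_file" "$snapshot"
    fi
    # The wrapped command may fail (e.g. PHPStan found errors); diff anyway.
    "$@" || true
    # diff exits with status 1 when the files differ, which is expected here.
    diff -u "$snapshot" "$cache_file" || true
    rm -f "$snapshot"
}
```

Assuming the cache lives in `tmp/resultCache.php`, a call could look like `diff_result_cache_around tmp/resultCache.php phpstan -vvv`. In CI you would typically redirect the output into a build artifact.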

Result cache on the developer machine

Dedicated resultCachePath

By default, PHPStan uses a single result cache file for all projects on your machine.
This means that when you switch between multiple projects, the very first run after the switch will need a full analysis scan.

To get a more efficient experience when switching between projects, you may consider using a different resultCachePath file name in each project's configuration file.

parameters:
    resultCachePath: %tmpDir%/resultCache-project-X.php


Result cache in CI

Dedicated resultCachePath

In case your CI server does not run projects in an isolated filesystem, you should use a dedicated resultCachePath.

GitHub Actions

When using GitHub Actions you should consider using a cache action to persist the result cache between runs.

  - name: "Cache result cache"
    uses: actions/cache@v3
    with:
      path: ./tmp
      key: "result-cache-v1-${{ matrix.php-version }}-${{ github.run_id }}"
      restore-keys: |
        result-cache-v1-${{ matrix.php-version }}-



  By default the cache is written to ./tmp on Linux-based systems
  Using ${{ github.run_id }} you can make sure to re-use the most recent result cache
  Use a separate result cache per php version, e.g. using ${{ matrix.php-version }}
  Use the push GitHub Actions event on the default-branch, to make sure newly created PRs will utilize a fresh cache from the default-branch.


In case you are working with long-running branches, you may consider using separate actions/cache/restore@v3 and actions/cache/save@v3 steps instead, to make sure the result cache is also persisted on failing jobs:

  - name: "Restore result cache"
    uses: actions/cache/restore@v3
    with:
      path: ./tmp
      key: "result-cache-v1-${{ matrix.php-version }}-${{ github.run_id }}"
      restore-keys: |
        result-cache-v1-${{ matrix.php-version }}-

  # … run phpstan

  - name: "Save result cache"
    uses: actions/cache/save@v3
    if: always()
    with:
      path: ./tmp
      key: "result-cache-v1-${{ matrix.php-version }}-${{ github.run_id }}"


Update: The above tip regarding GitHub Actions cache handling works also for other tools, like e.g. RectorPHP.

Give back

In case you find my PHPStan contributions and/or this content useful, please consider supporting my open source work.


Rector In Legacy Projects
2023-07-23T00:00:00+00:00
After collecting some experience with introducing Rector to legacy projects,
I want to write down what I have learned along the way.

Goal

The article describes how to utilize Rector to maximize type coverage of a legacy project.
The more types are defined in the codebase the better the results of your IDE or static analysis tools will be.

This is usually the first thing you should do, before applying more advanced Rector code transformations.
Rector can be used in a similar way to apply other Rules or Rulesets.

Additionally, more type coverage is a great first step after a PHPStan/Psalm setup, to make sure static analysis can find relevant bugs efficiently.
Otherwise adding types to an old codebase can take a lot of time. Doing it manually is also prone to errors.

Overall plan

These are the top level steps I try to follow:


  Setup
    
      make sure you have PHPStan configured for your project, at least at level 5.
      add TomasVotruba/type-coverage to create type coverage information
      create a PHPStan baseline with all existing errors
      analyze your baseline, to get an idea of the overall type coverage of the project
    
  
  Preparation
    
      Fix all “Implicit array creation is not allowed - variable … does not exist” PHPStan errors
      Fix all “Variable … might not be defined” PHPStan errors
    
  
  Adding Types - order is important!
    
      Add return types
      Add property types
      Add parameter types
    
  
  re-generate and re-analyze your baseline to see the improvements / you might create a PHPStan baseline trend report


Analysing the baseline is technically not required. Crunching the numbers can help keep a dev team motivated, or the numbers can be used to convince management people about your current state and potential goals.

Setup

The preparation steps and the linked articles in the “overall plan”-chapter should contain all you need.

Preparation

Fix the mentioned PHPStan errors to make sure Rector can trust your variables.

Adding Types with Rector

Start with Rector as described in the introduction.
Make sure you have all relevant source paths configured and the setup works as expected.

We will run Rector in the command line on your workstation.
Later on you may configure Rector as part of your CI pipeline, but that’s a topic for another article.

Working with Rector usually means you start by adding one Rector rule at a time.
Let the tool do its magic and review the generated changes. Make sure you feel confident with them.
If you get overwhelmed by the amount of changes,
revert the working state and run your current Rector rule only against a few paths instead of the whole project.

Repeat using smaller steps as long as you feel the result is not reviewable.
How often you need to divide the steps into smaller ones depends on the rule being applied and your codebase.

Between these steps you should commit the intermediate states. This also eases seeing the actual differences between the steps.
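
The "small steps with intermediate commits" routine can be scripted roughly as follows. This is a sketch only: `rector_commit_per_path` and `RECTOR_BIN` are invented names, it assumes Rector is installed at `vendor/bin/rector`, that you work on a throwaway branch, and that reviewing commit by commit afterwards is acceptable for your team.

```shell
# Hypothetical helper: run the currently configured Rector rule path by path
# and create one commit per path, so every step can be reviewed on its own.
# RECTOR_BIN is an invented override hook, it defaults to vendor/bin/rector.
rector_commit_per_path() {
    for path in "$@"; do
        "${RECTOR_BIN:-vendor/bin/rector}" process "$path" || return 1
        git add -A -- "$path" || return 1
        # Only commit when Rector actually changed something below this path.
        git diff --cached --quiet -- "$path" \
            || git commit -q -m "Rector: $path" \
            || return 1
    done
}
```

A call like `rector_commit_per_path src/Controller src/Entity` leaves you with one commit per directory. If a step turns out to be too big to review, drop its commit and split the path further.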

NOTE:
Especially in legacy projects it's important to make sure Rector is not relying on PHPDoc types. This is what the *Strict* Rector rules are for. If you apply non-Strict Rector rules, take special care that your PHPDocs are precise.

Add return types

It’s important to add return types first, as it’s the least risky change and should be backwards compatible most of the time.


  If your codebase is pretty large, you may start with final classes first.
  As long as you don’t add new return types to methods which get overridden in a subclass, you should be fine.
  Give classes some extra attention which somehow integrate with libraries you use, like e.g. Doctrine-Collections.
  If classes implement magic methods (e.g. __get), review related changes properly.
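
To start with final classes only, you first need a list of the files that declare one. A text-based heuristic is often good enough for that. The sketch below is not a real PHP parser: it assumes the class declaration starts on its own line, and `list_final_class_files` is a name made up for this example.

```shell
# Hypothetical helper: list PHP files below a directory which declare a
# final class. Pure text matching, so unusual formatting will be missed.
list_final_class_files() {
    find "$1" -type f -name '*.php' -exec \
        grep -lE '^[[:space:]]*final[[:space:]]+(readonly[[:space:]]+)?class[[:space:]]' {} + \
        | sort
}
```

The resulting list can be handed to Rector, e.g. `list_final_class_files src | xargs vendor/bin/rector process`, as long as your paths do not contain whitespace.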


If Rector changes things you don’t like, you may ignore source files for single rules or even skip the source file completely.
You can revisit the skipped cases later. You may feel more confident after the codebase got enriched with types and PHPStan can better understand the code in question.

I had the most success using the ReturnTypeFromStrict* Rector rules first.
Do so one rule at a time, like described above.

Add property types

In my experience, it’s best to add property types next.

Start with private properties and later move on to protected ones of final classes.
If you are not sure about nullability, keep using nullable types for now.

Lastly, add types to protected properties of non-final classes and to public properties.

Keep in mind that adding types to public/protected properties of classes which use inheritance can be a BC break.

  https://3v4l.org/IonFf
  https://3v4l.org/kTQ7q


I had the most success using the PropertyTypeFromStrict* Rector rules first.
After that try the TypedPropertyFrom* rules.

Add parameter types

Last but not least, add parameter types. Be careful, as adding parameter types usually breaks backwards compatibility.
That’s especially important in case you work on library code, as it might force you to create a new major version.

I had the most success using the *ParamType* Rector rules.

Give back

In case you find this content useful, please consider supporting my open source work.

project	merged pull requests	addressed issues
phpstan/phpstan*	~116 (~188 in 2022)	33 (83 in 2022)
rector/rector*	~178	13
FriendsOfREDAXO/rexstan	88	24
FriendsOfREDAXO/rexfactor	55	6
staabm/phpstan-dba	44 (~300 in 2022)	8
staabm/phpstan-todo-by	33 (~300 in 2022)	7
redaxo/redaxo	27 (70 in 2022)	5
TomasVotruba/unused-public	28	1
staabm/phpstan-baseline-analysis	22
OskarStark/doctor-rst	12	-
easy-coding-standard/easy-coding-standard	9	1
staabm/annotate-pull-request-from-checkstyle	8	-
PHP-CS-Fixer/PHP-CS-Fixer	4	-
Roave/BetterReflection	4	-
symfony/symfony	3	-
qossmic/deptrac	3	-
TomasVotruba/bladestan	3	-
composer/composer	2 (7 in 2022)	-
sebastianbergmann/diff	2	-
TomasVotruba/type-coverage	2	-
vimeo/psalm	1 (4 in 2022)	-
mautic/mautic	1	-
TomasVotruba/cognitive-complexity	1	-
matomo-org/matomo	1	-
nette/utils	1	-
nikic/PHP-Parser	1	-
briannesbitt/Carbon	1	-
doctrine/orm	1	-
… a lot more	-	-
