PHP: Symfony Demo meets Modular, Microservice-ready Architecture - Part 2
TL;DR
I've created a Symfony 6-based Application that can serve as an Enterprise Architecture reference for anyone who's planning to build Large Scale Applications with Symfony. It uses a 'Blog' theme similar to the official Symfony Demo's and can be found here.
In the previous part of the Article I covered the theoretical foundations that led me to create this project in the form it exists today.
In this part I'm going over the actual code and Module-level Architectural decisions.
Introduction
OK, after the long and possibly tedious theory in the first part, let's take a deep dive into the code. This second part of the Article is designed to be read alongside the code: each section describes a real part of the Demo App's code. I'm not going over the actual implementation of the Blog (which is simple enough for anyone interested to understand), focusing instead on the generic purpose of the code from the architectural standpoint.
For this part not to turn into another theoretical rant, it's best to browse the code as you read through its sections.
The Infrastructure
Every Application has some shared, feature-unrelated code. It deals with connectivity and integrations, and provides common utilities. It has no business context and, if needed, can be easily extracted into a separate, shared library. All that code resides in the Infrastructure package. I won't describe every little utility class that's in there, but I'd like to briefly talk about two very important Infrastructure-level design choices.
Transactions
Coming from Java's Spring and Hibernate, I've found Doctrine's Transaction support a little too simple. By default all logic in a Request / CLI Command runs in a Transaction that's either Committed or Rolled Back at the end. For many cases this approach is good, because it frees the Developers from having to think about Transaction boundaries.
It, however, comes with some inherent implications:
- There is no way to perform an action only after you're absolutely certain a Transaction has been Committed or Rolled Back.
- The Entity Manager lives throughout the entire life of a Request, which may be inefficient in terms of server resource consumption.
- Integration Tests that span multiple Requests may produce unrealistic results due to a single Transaction being re-used across those Requests.
- Entities loaded into the Entity Manager while handling Request N will still be available while handling Request N+1.
Doctrine doesn't leave us completely dead in the water - it provides a wrapInTransaction method in the Entity Manager. It's simple and it works, but I wanted a nice, easy and readable way of registering post-Transaction hooks. I've created a Transaction abstraction layer to register afterCommit and afterRollback handlers. Why did I need them exactly? For now I will tell you only that they are very important when dealing with Events.
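To give you an idea, here's a minimal sketch of such a hookable Transaction, built on top of Doctrine's wrapInTransaction - the class name and the exact signatures are my illustration here, not necessarily the Demo's actual API:

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

// Illustrative sketch only - names and structure are assumptions.
final class HookableTransaction
{
    /** @var callable[] */
    private array $afterCommit = [];
    /** @var callable[] */
    private array $afterRollback = [];

    public function __construct(private EntityManagerInterface $em) {}

    public function afterCommit(callable $handler): void
    {
        $this->afterCommit[] = $handler;
    }

    public function afterRollback(callable $handler): void
    {
        $this->afterRollback[] = $handler;
    }

    /** Runs the given logic in a Transaction, then fires the matching hooks. */
    public function execute(callable $logic): mixed
    {
        try {
            $result = $this->em->wrapInTransaction($logic);
        } catch (\Throwable $e) {
            foreach ($this->afterRollback as $handler) {
                $handler();
            }
            throw $e;
        }

        foreach ($this->afterCommit as $handler) {
            $handler($result);
        }

        return $result;
    }
}
```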
There was also the matter of what to do if I wanted to perform multiple Transactions in a single run (for example from a CLI Command). The default behavior is that once a Transaction fails, the Entity Manager is closed and all subsequent Transactions fail without even trying (due to an 'Entity Manager is closed' Exception). That's not what I wanted, so I've added an Entity-Manager-per-Transaction pattern, where each Transaction gets its own fresh EM that's cleared and closed upon finishing its work. I think it's a nice, clean and explicit pattern that helps to understand what's going on in the code and why.
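A rough sketch of that pattern could look like this (again, the names are mine; Doctrine's ManagerRegistry::resetManager is what hands out a fresh EM after a previous one has been closed):

```php
<?php

use Doctrine\ORM\EntityManagerInterface;
use Doctrine\Persistence\ManagerRegistry;

// Illustrative Entity-Manager-per-Transaction sketch (assumed names).
final class PerTransactionEntityManagerFactory
{
    public function __construct(private ManagerRegistry $registry) {}

    public function transactional(callable $logic): mixed
    {
        // A fresh EM per Transaction: a previously failed (and closed)
        // EM cannot poison this run.
        $em = $this->registry->resetManager();
        assert($em instanceof EntityManagerInterface);

        try {
            return $em->wrapInTransaction($logic);
        } finally {
            // Clear and close the EM once the Transaction is done,
            // so no Entities leak into the next one.
            $em->clear();
            $em->close();
        }
    }
}
```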
Events
One of Symfony's killer features is the Messenger: a robust Event/Queue abstraction with loads of available transports. I could have just used it directly, but I wanted my own little abstraction layer on top of it, for the following reasons:
- There is some common logic related to logging and error handling that I wanted to put in one place, without having to resort to AoP magic.
- The Application's behavior vastly differs depending on whether we use a synchronous or an asynchronous transport. One of the key assumptions of my Architecture was that there should be a way to quickly extract a Module without any consequences to the way the Application functions.
- My thin Events abstraction ensures that all Events are handled in the same way, regardless of the underlying transport type.
I also wanted a neat way to register my Event Subscribers without having to use any of the Messenger abstractions. Call me a freak - I just like to hide my implementation.
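The whole thin layer can boil down to a couple of interfaces like these (names are illustrative assumptions; a Messenger-backed implementation - with the shared logging and error handling - would live behind them):

```php
<?php

// Illustrative sketch of the thin Events abstraction (assumed names).

interface Event {}

interface EventPublisher
{
    /** Publishes an Outbound Event, typically from an afterCommit hook. */
    public function publish(Event $event): void;
}

interface EventSubscriber
{
    /** The Inbound Event class this Subscriber wants to receive. */
    public static function subscribedTo(): string;

    public function handle(Event $event): void;
}
```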
The Modules
Now that we have the Infrastructure out of the way, we can take a look at the actual Business Logic. As I stated in the previous part of the Article, there are four Modules:
- Posts
- Comments
- Tags
- Security
I believe that having things sorted, labeled and predictable is a big part of good Software Architecture. That's why the internal layout of every Module is exactly the same. Every single one contains the same, naturally sorted (highest -> lowest) set of Layers. The naming conventions used in the higher Layers are abstract enough not to be tied to any specific technology or implementation.
This may seem like a detail, but it's very important for the successful life of an Application in the long run:
- Developers who worked on one Module will automatically know their way around all the other ones and will be able to jump between them without a tedious discovery process.
- New members of the team will be on-boarded to the project quickly.
Even though different Modules provide different business value and contain vastly different Logic, the basic Building Blocks are the same. OK, so on to the actual Layers:
API
The highest Abstraction Layer describes the way one can communicate with the Module. Generally, every possible interaction in every Application on earth can be categorized as one of only three types (each is sketched in code after this list):
- Command
- Used to request an Action that will result in Data Modification within the Application
- May optionally return a Result of the Processing
- Syntax-wise always formulated as: DoSomethingCommand
- Query
- Used to request an Action that will retrieve Data from the Application
- Processing of a Query can never lead to any Data Modification within the Application
- Always returns a Result
- Syntax-wise always formulated as: Get/FindSomething(ByCriteria)Query
- Event
- Used to notify either external Systems or the Application itself that something has happened within the Application
- Should be published only if the changes related to the Event have been committed to the Data Store(s) and are final
- That's why I needed the afterCommit Transaction Hooks.
- Syntax-wise always formulated as a past-tense name: SomethingHappenedEvent
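Sketched in PHP (assuming PHP 8.1 readonly properties; the class names and fields are illustrative, not the Demo's exact code), the three types could look like this:

```php
<?php

// Illustrative API Objects following the naming conventions above.

final class CreatePostCommand
{
    /** @param string[] $tags */
    public function __construct(
        public readonly string $title,
        public readonly string $content,
        public readonly array $tags = [],
    ) {}
}

final class FindPostsByTagQuery
{
    public function __construct(public readonly string $tag) {}
}

final class PostCreatedEvent
{
    public function __construct(public readonly string $postId) {}
}
```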
Because the Application can both send and receive Events, I've created a further division between Inbound Events and Outbound Events. The API Layer contains all the Inbound Events - ones that are coming into the Module from the Event Bus.
The API Layer is a great place to find out what features are implemented by the Module. Apart from the above API Objects, it also contains:
- REST/Web Controllers
- Standard Symfony HTTP Controllers that serve as the REST API of the Module.
- Can contain Security Logic
- Module API Interface
- A single entry point to all of the features implemented by the Module.
- Implemented by the Domain Layer, which contains the actual Domain Logic.
- Can take only Commands, Queries and Events as its arguments. Optionally, it can also contain argument-less methods.
Interaction with a Module should always happen at the API Level. That includes normal production usage as well as Tests (more on that later).
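As an illustration, a Module API Interface for the Posts Module might look roughly like this (the method names and the Result/Event classes are assumptions, building on the sketch above):

```php
<?php

// Hypothetical Module API Interface for the Posts Module.
// The Result and Event classes are assumed to exist alongside it.

interface PostsApi
{
    public function createPost(CreatePostCommand $command): CreatePostResult;

    public function findPostsByTag(FindPostsByTagQuery $query): PostsFoundResult;

    /** Inbound Event handler - reacts to another Module's Outbound Event. */
    public function onCommentAdded(CommentAddedEvent $event): void;
}
```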
Domain
The middle Layer - contains the actual Business Logic that implements the Module API Interface. Generally, it consists of:
- Dto
- Domain Objects that act as intermediary Data holders between the API and Persistence Layers. Depending on the case, they can be simple copies of the API Commands/Queries/Events, but they can also contain additional, enriched Data. They are constructed after the API Objects have been validated for correctness.
- Event/Outbound
- Outbound Events that the Module produces as a result of its processing.
- Logic
- Traits and Classes that contain the actual Business Logic. Should follow SOLID/DRY/other programming principles.
- Repository
- Interfaces of all Persistence Repositories that are used by the Business Logic.
- Transactions
- The Interface of the Transaction Factory that should be used in the given Module.
The Domain can optionally contain other things, like Providers, external Services, etc.
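For example, a Domain-level Repository interface can be as small as this (names assumed; note that it speaks in Dtos, never in Doctrine Entities):

```php
<?php

// Illustrative Domain Repository interface (assumed names).
// The Domain depends only on this abstraction; the Persistence
// Layer provides the actual implementation.

interface PostRepository
{
    public function save(PostDto $post): void;

    public function findById(string $id): ?PostDto;
}
```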
Persistence
The lowest Layer - responsible for the implementation of all Domain Repositories. Generally, different technologies should reside in separate sub-directories. In the concrete case of the Demo, all persistence logic is implemented using Doctrine.
Technology-specific implementation details should never leak into the higher Layers. In Doctrine's case, usage of Entities is permitted only in the Persistence Layer.
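In practice that means the Doctrine Entity gets mapped to and from a Dto right at the Repository boundary - roughly like this (all names, including the fromDto/toDto mappers, are my assumptions):

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

// Illustrative Doctrine implementation living in the Persistence Layer.
// PostEntity (the Doctrine Entity) never leaves this class.

final class DoctrinePostRepository implements PostRepository
{
    public function __construct(private EntityManagerInterface $em) {}

    public function save(PostDto $post): void
    {
        $this->em->persist(PostEntity::fromDto($post));
    }

    public function findById(string $id): ?PostDto
    {
        return $this->em->find(PostEntity::class, $id)?->toDto();
    }
}
```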
The Tests
There are three types of automated Tests in the Application. Each type serves its purpose, and their ratio should follow the Test Pyramid.
Unit Tests
This will be (for the majority of readers) the most controversial part, so it'll take some explaining. Typically, a Unit Test should verify the logic of a very small piece of code - usually a Class or a Method. This is a well-established pattern for most developers. Unfortunately, this approach doesn't take into consideration the root causes for writing automated tests:
1. Proving that the application has been implemented according to business requirements.
The application is never written in a vacuum. There is always some business context and there is (or at least should be) a person defining the requirements. The tests confirm that the requirements have been met and the application behaves correctly in different scenarios, especially the edge ones.
2. Protection against regressions due to continuous development or bug fixes.
The tests give us a sense of security and courage in making code changes. We are sure that if we make any accidental, unintended changes, our tests will capture this and will not allow the introduction of a defective product into production.
Does the mere possession of automated tests and a high level of code coverage guarantee us that we can feel safe? Absolutely not! What’s more, if most of our tests are class-level unit tests, they are unlikely to sufficiently protect us against regression. Why?
The code should reflect the business requirements. Is there any concept of a 'class' (in a purely programming sense) in any of the businesses you know of? Probably not. A class is something foreign to business. It is our internal organizational unit that allows us to divide a problem into smaller parts. It gives us encapsulation and abstraction, and improves the re-usability of the code. However, it has nothing to do with business requirements. It is an implementation detail. And implementation, as we know, can change. Business requirements therefore naturally have a scope greater than individual classes.
In addition, we can define the following dynamic between implementation and business requirements:
Changing business requirements entails a change in implementation.
But
A change in implementation does not necessarily have to be caused by a change in business requirements.
It can be the result of refactoring, changing approaches, fixing bugs, improving performance, or updating dependencies (e.g. external libraries). So, if our automated tests are at the class level, any implementation changes will require us to change, or even re-implement, the test suite. Tests that are somewhat "glued" to specific classes do not focus on testing business requirements, but on their implementation. We may have a bizarre situation when, despite the lack of a change in business requirements, the desire to re-implement a piece of code will entail the need to re-implement hundreds or even thousands of tests. In such a situation, there is no protection against regression. The code is "immobilized" with tests, and the only thing that tests verify is the current implementation. In order to counteract such situations and to genuinely protect against regression, tests (and in particular asserts – places where we verify the validity of tests) should be changed only if the underlying business requirements are altered. However, in order to achieve this, such tests (yes, even unit tests!) should have adequate scope. Scope greater than a single class. I would argue that the appropriate scope for any unit test is a Module scope.
But wait a minute! There is something called integration testing! Shouldn't integration tests be responsible for testing the integration between classes? In theory, yes. In reality, though, integration tests have one very serious drawback: they are slow. A typical integration test requires us to bring up the framework context (here: the Symfony Kernel), create an in-memory database, and interact with many I/O-heavy things like Databases and Message Queues. In an ideal world, if we were not limited by the slowness of integration tests, our test code would consist only of them. Unfortunately, going down this path, although tempting, becomes more and more cumbersome as the application (and the test suite) grows over time. In extreme cases, the application build may take 40 minutes (or even more), with the full test suite executed only on the CI server.
Module-level unit tests are the best possible compromise. They behave almost like integration tests, yet run at incomparably higher speeds. Why only almost like integration tests? The compromise is to give up I/O wherever possible. In this particular case, the actual (even in-memory) database is replaced by Arrays and/or Collections.
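In this Architecture that's trivial: the Domain's Repository interface simply gets an Array-backed implementation for the Tests. A sketch, matching the assumed names from the earlier examples:

```php
<?php

// Illustrative in-memory Repository used by Module-level unit tests
// in place of the Doctrine-backed one.

final class InMemoryPostRepository implements PostRepository
{
    /** @var array<string, PostDto> */
    private array $posts = [];

    public function save(PostDto $post): void
    {
        $this->posts[$post->id] = $post;
    }

    public function findById(string $id): ?PostDto
    {
        return $this->posts[$id] ?? null;
    }
}
```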
Behavior-Driven Extravaganza
Getting into BDD is no easy accomplishment. Contrary to some beliefs, you don't need a dedicated BDD Testing Framework to do proper BDD. Behat is great and all, but I personally don't like writing Scenarios/Features in Gherkin, and I find the Test-per-Class approach too bloated. In the JVM world I've successfully applied BDD with the Spock Framework (which has given/when/then/expect labels as a core part of its syntax), but proper g/w/t/e comments in plain old PHPUnit also get the job done.
The crucial part of BDD is whether you are able to start your development at a level/Layer high enough to define business behaviors without getting into implementation details. My Architecture allows for that and encourages BDD. You can easily start the development of any given Module from the API Layer - defining the Module API Interface and all the Commands, Queries and Events with their respective Responses. You don't even need any Domain Logic yet, apart from a couple of empty Classes/Traits.
The Red/Green/Refactor loop is a natural part of the proposed Architecture (a sample Spec follows this list):
- Create a failing Spec for a no-op Module API Interface method
- Implement the Method in the Domain Layer (for Unit tests you don't even have to bother with the Persistence for the Spec to pass - you just have to create a simple In-Memory Repository).
- Refactor the code, so that it's pretty and shiny.
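A Spec in plain PHPUnit with g/w/t comments could then look like this (the factory and the API Objects are assumed names carried over from the earlier sketches):

```php
<?php

use PHPUnit\Framework\TestCase;

// Illustrative Module-level Spec with given/when/then comments.

final class CreatePostSpec extends TestCase
{
    public function testCreatedPostCanBeFoundByTag(): void
    {
        // given: the Posts Module wired with In-Memory Repositories
        $posts = PostsModuleFactory::createForTests();

        // when: a Post is created through the Module API
        $posts->createPost(new CreatePostCommand('Title', 'Content', ['symfony']));

        // then: the Post is visible through a Query on the same API
        $result = $posts->findPostsByTag(new FindPostsByTagQuery('symfony'));
        self::assertCount(1, $result->posts);
    }
}
```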
Integration Tests
Standard Symfony + PHPUnit-based Tests that use the WebTestCase / KernelTestCase base classes to test entire HTTP round-trips and the integration with a real Database. They follow only happy paths and often verify multiple subsequent interactions at once.
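For instance, a happy-path round-trip over HTTP might look like this (the route and payload are assumptions, not the Demo's exact endpoint):

```php
<?php

use Symfony\Bundle\FrameworkBundle\Test\WebTestCase;

// Illustrative happy-path Integration Test against the real stack.

final class CreatePostIntegrationTest extends WebTestCase
{
    public function testPostCreationRoundTrip(): void
    {
        $client = self::createClient();

        $client->request('POST', '/api/posts', content: json_encode([
            'title'   => 'Title',
            'content' => 'Content',
        ]));

        self::assertResponseIsSuccessful();
    }
}
```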
Consumer Driven Contracts
As stated in the first part of the Article, there is no possibility of end-to-end testing of our Application. Every Module acts as a separate, self-contained Service that interacts with other Modules only via Events.
What we can do is ensure that the Inbound and Outbound Events are compatible with each other, and that the communication between Modules is correct. For that we need a single source of truth that defines the Contract between the Modules.
Because a single Outbound Event can be consumed by multiple subscribing Modules via corresponding Inbound Events (each with its own set of expected incoming Data), it makes the most sense to use the Consumer-Driven Contracts approach (see the sketch after this list):
- Each Inbound Event has a corresponding JSON Contract that is used to build that Event in a Test.
- If the Event is built from the Contract Data without errors, the Test is considered passed.
- Each Outbound Event is tested against all JSON Contracts of its corresponding Inbound Events.
- If the Event built by the Test matches every JSON Contract (matching is checked on data keys, not values), the Test is considered passed.
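A key-matching check like that fits in a few lines of PHPUnit (the Contract file path and the Event's payload shape are assumptions; here the Event's public properties serve as its payload):

```php
<?php

use PHPUnit\Framework\TestCase;

// Illustrative Consumer-Driven Contract check: the Outbound Event's
// payload must contain every key the consumer's JSON Contract expects.

final class PostCreatedContractTest extends TestCase
{
    public function testOutboundEventSatisfiesConsumerContract(): void
    {
        $contract = json_decode(
            file_get_contents(__DIR__ . '/Contract/comments.post-created.json'),
            true
        );

        $payload = get_object_vars(new PostCreatedEvent('post-1'));

        // Matching is done on keys, not values.
        foreach (array_keys($contract) as $key) {
            self::assertArrayHasKey($key, $payload);
        }
    }
}
```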
The UI
The Demo Application contains a simple User Interface built with Angular 13 and the Angular Material Components Framework. It's there only so you can see the PHP Code in action without having to play with Postman and/or curl.
Conclusion
I hope that you see some value in the Architecture I've proposed. I personally think that it's a great fit for large, scalable, enterprise-grade Applications that are planned to be used, extended and maintained for years to come. It ensures that:
- Both Performance and Development Team size can be appropriately and accurately scaled, based on the current needs.
- Splitting the Modules into Micro-services (with a properly configured Transport layer for the Events) won't introduce the traditional distributed-system issues, like network traffic congestion or the necessity of configuring timeouts and fall-backs.
- New Developers can be quickly on-boarded to the project and reach maximum productivity very fast.
- Developers can switch between Modules without having to re-learn their structure every time.
- Module-level Architecture ensures correct separation of Layers and a level of Testing that, combined, allow for great flexibility and regression resilience.
Where's the place for personal creativity, invention and preferences in all this?
The naming conventions I've proposed are the ones that work for me. They can, of course, be changed to match your own needs, as long as they fulfill their original purpose. The role of an Architect is to standardize and create patterns. That's why Applications that don't have Architects usually look like a patchwork of different ideas and preferences. For me, an Application (no matter the size) should read as easily as if it was created by a single person.
As for the personal creativity and preferences of the Developers - these can be channeled into creating smart, performant Business Logic. There's great freedom in deciding what the API of a Module will be and how the Domain Logic will work. I view this Architecture as a boulder that the Developers can take off their shoulders. They don't have to think about things like:
- 'Where should I put this Class?'
- 'What are the appropriate Layers I should create?'
- 'What is the proper naming convention for that Class?'
I sincerely hope that, even if you won't use my Architecture for your own development (and - let's be realistic - you probably won't :) ), you've found some knowledge that'll help you in your day-to-day work.
Thanks!