Skip to content

[9.x] Database Engine independent Query Expressions #40115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

tpetry
Copy link
Contributor

@tpetry tpetry commented Dec 20, 2021

I was impressed by @olliecodes on twitter about his idea on using expression classes to replace DB::raw() statements. Basically, the idea is to use custom expression classes instead of raw expressions:

use Illuminate\Database\Query\Expression;
use Illuminate\Support\Facades\DB;

DB::table('table')
    ->select(DB::raw('COALESCE(column1, column2) AS column'))
    ->dump(); // select coalesce(column1, column2) as column from "table"

// will be replaced with

class Alias extends Expression
{
    public function __construct(mixed $column, string $alias)
    {
        parent::__construct("{$column} as {$alias}");
    }
}

class Coalesce extends Expression
{
    public function __construct(mixed... $columns)
    {
        $columns = implode(', ', $columns);

        parent::__construct("coalesce({$columns})");
    }
}

DB::table('table')
    ->select(new Alias(new Column(['column1', 'column2']), 'column')))
    ->dump() // select coalesce(column1, column2) as column from "table";

The idea is really great because this concept will allow:

  • Replacing of database-specifc raw code with objects which will work for all databases
  • Higher-level database operations like fulltext relevance sorting hiding their details

But that's only the idea, in reality it does not work because database expressions are grammar-independent:

  • You can't make a single class like Uuid which will use different functions for every database engine
  • Column name escaping rules are dependent on the database engine (backticks for MySQL, double quotes for everyone else)
  • Using raw string literals is impossible because there is no way to escape them

So i've been going through the Laravel source code to fix all these problems by making expression classes more intelligent. In the end the best solution i found was giving expressions access to the grammar they will be rendered for and add some more information to the grammar the expressions will need.

The pull request is basically separated into 3 steps to make it more understandable. I will explain the changes to Laravel and how they affect query expressions:

Commits

1. Expression classes have access to the grammar

The biggest obstacle of the actual implementation is the expression class. It's sole purpose is to return a raw string which is directly embed into query. But as explained before, a simple construct like a sql alias may need to properly quote column or alias names because of reserved words in databases. To solve that problem the expression's __toString() function has been removed and the only way to get the string representation is now the getValue() method which will be provided a grammar instance. With access to the grammar class the name wrapping functions can be used.

use Illuminate\Database\Grammar;
use Illuminate\Contracts\Database\Query\Expression;
use Illuminate\Support\Facades\DB;

class Alias implements Expression
{
    public function __construct(
        private Expression|string $expression,
        private string $alias,
    ) {}

    public function getValue(Grammar $grammar)
    {
        return $grammar->wrap($this->expression).' as '.$grammar->wrap($this->alias);
    }
}

DB::table('table')
    ->select([new Alias('column', 'aliased_column')])
    ->dump(); // select "column" as "aliased_column" from "table"

2. String quoting support in the grammar instance

The former step solves a big problem on how to build up correctly formatted sql constructs by expressions. But there will be cases you need to use raw literals and in the worst case they may be user-input. And user-input must never be used without precautionary measures: they need to be escaped. Adding placeholder support and returning bindings to expressions was a very hard task i didn't finish as i ran in to many edge cases. The most simple solution, and needing the least changes, was to add string quoting support to the grammar (quoteString()). For this, all grammars have been extended by getting the connection object. By using the connection, the expression can work on the real encoding settings used in the pdo connection. In the future the grammars could be more intelligent because they have real access to the database.

use Illuminate\Database\Grammar;
use Illuminate\Contracts\Database\Query\Expression;
use Illuminate\Support\Facades\DB;

class Coalesce implements Expression
{
    private array $expresssions;

    public function __construct(mixed... $expresssions)
    {
        $this->expresssions = $expresssions;
    }

    public function getValue(Grammar $grammar)
    {
        return 'coalesce('.implode(', ', $grammar->wrapArray($grammar->getValue($this->expresssions))).')';
    }
}

class Value implements Expression
{
    public function __construct(
        private bool|float|int|string $value,
    ) { }

    public function getValue(Grammar $grammar)
    {
        return match (true) {
            is_bool($this->value) => $this->value ? 'true' : 'false',
            is_numeric($this->value) => $this->value,
            default => $grammar->quoteString($this->value),
        };
    }
}

DB::table('table')
    ->select([new Coalesce('column', new Value("Robert '); DROP TABLE Students;--"))])
    ->dump(); // select coalesce("column", 'Robert ''); DROP TABLE Students;--') from "table"

By the way, I also fixed with this change the database migration support for enums. If you specify an enum value with a single quote you get an invalid sql statement because you made a sql injection. With access to proper quoting these enum values can be correctly escaped!

3. Database driver name and version support for grammars

The functionality until this point is very nice, you can achieve a lot with it! But my goal was to get rid of database-specific implementations. The speciality should be handled by the expressions, not the programmer using different expressions. So again I extended the grammar and added methods to get the database driver name (getDatabaseDriver()) and database version (getDatabaseVersion ()) to the grammar which are invocable by the expression. Now we really gain support for expressions to be used in multiple databases:

use Illuminate\Database\Grammar;
use Illuminate\Contracts\Database\Query\Expression;
use Illuminate\Support\Facades\DB;

class Uuid implements Expression
{
    public function getValue(Grammar $grammar)
    {
        return match (true) {
            $grammar->getDatabaseDriver() === 'mysql' => 'uuid()',
            $grammar->getDatabaseDriver() === 'pgsql'
                && version_compare($grammar->getDatabaseVersion(), '13', '>=') => 'gen_random_uuid()',
            $grammar->getDatabaseDriver() === 'pgsql' => 'uuid_generate_v4()',
        };
    }
}

DB::table('table')->update(['uuid' => new Uuid()]);
// MySQL: update `table` set `uuid` = uuid()
// PostgreSQL <13: update "table" set "uuid" = uuid_generate_v4()
// PostgreSQL >= 13: update "table" set "uuid" = gen_random_uuid()

Notes

Chaining

The expressions are "chainable" and support column names, expressions and literal values, so you can build very complex ones. Possible candidates for highly specific ones needing chainability are window-functions or many other cases:

 DB::table('table')
    ->select(new Coalesce(new Coalesce('column1', 'column2'), 'column3', new Value('final value')))
    ->dump(); // select coalesce(coalesce("column1", "column2"), "column3", 'final value') from "table"

Provided Expressions

Providing final ready-to-use expressions is out of the scope of this pull request. Also i believe Laravel should only provide the foundation for custom expressions but not their implementation to keep the core maintainable. Expressions should be provided by the community. When this will be merged, I will work with the community to provide a set of expressions ready for the Laravel 9 release date.


This was multiple days of work, and I tried to find ways for changing the least amount of code. If there is something I should change, optimize or explain why I changed it, don't hesitate to comment. Any other feedback? Do you believe this is useful?

Final Note: The changes to the Laravel core are too deep for a package. So, it's not possible to repackage it for a package.

@mattkingshott
Copy link
Contributor

This is a great addition! I hope it gets merged, as it will encourage developers to avoid raw strings / risk sql injection.

@notmd
Copy link

notmd commented Dec 22, 2021

TypeORM community also propose a similar PR but got rejected. They have hepler functions to help construct Expression. I think we should too, something like

ExpressionFactory::coalesce('column1', 'column2');

@lupinitylabs
Copy link
Contributor

TypeORM community also propose a similar PR but got rejected. They have hepler functions to help construct Expression. I think we should too, something like

ExpressionFactory::coalesce('column1', 'column2');

Interesting. Having an Expression facade and something like

->select(Expression::coalesce(Expression::coalesce('column1', 'column2'), 'column3', Expression::value('final value')))

instead of

->select(new Coalesce(new Coalesce('column1', 'column2'), 'column3', new Value('final value')))

might probably feel more laravelian, too.

@tpetry
Copy link
Contributor Author

tpetry commented Dec 22, 2021

Yes, it‘s an interesting idea but not very extensible. If Expression is a Marco class there can only be one implementation for coalesce(). In the best case i would like to see many packages providing different implementations with different ideas.

@notmd
Copy link

notmd commented Dec 22, 2021

Yes, it‘s an interesting but not very extensible. If Expression is a Marco class there can only be one implementation for coalesce(). In the best case i would like to see many packages providing different implementations with different ideas.

IMO the factory should be something like this. Just a shorthand to construct Expression.

class Expression
{
    public static function coalesce(mixed... $expresssions)
    {
        return new Coalesce($expressions);
    }
}

Although it may be duplicated in the core.

@lupinitylabs
Copy link
Contributor

Yes, it‘s an interesting but not very extensible. If Expression is a Marco class there can only be one implementation for coalesce(). In the best case i would like to see many packages providing different implementations with different ideas.

I am not sure I understand what you mean. How would be provide different implementations with your solution?
You may always extend the Expression class and replace the original implementation in the IOC container to extend or replace individual implementations...

@tpetry
Copy link
Contributor Author

tpetry commented Dec 23, 2021

Yes, it‘s an interesting but not very extensible. If Expression is a Marco class there can only be one implementation for coalesce(). In the best case i would like to see many packages providing different implementations with different ideas.

I am not sure I understand what you mean. How would be provide different implementations with your solution? You may always extend the Expression class and replace the original implementation in the IOC container to extend or replace individual implementations...

By using classes you can require multiple packages. I can provide an implementation for coalesce and you can provide an implementation for coalesce. Mine is a basic mapping of the class to the SQL function and yours is a very genius solution, simplifying application code a lot. By using classes both approaches can co-exist within an application. By using Illuminate\Support\Traits\Macroable or dependency injection only one can be used!

This may sound like nitpicking but it's already a problem for packages providing database extensions. Only one package can provide a custom database connection, query builder or grammar class. So you won't be able to use tpetry/laravel-postgresql-enhanced and staudenmeir/laravel-cte within the same laravel application because one will overwrite the other's custom database connection to implement their features.


Nothing prevents you from creating an expression facade for your own packages. That's all I want to say! It's more extensible that way. The core will support many solutions at once by just using class, and the nice API with an expression facade can be done by the package providing you all the expressions. There can even be multiple different easy to use API approaches.

@taylorotwell
Copy link
Member

taylorotwell commented Jan 3, 2022

@tpetry I think this is a generally idea but it's a very large PR with a lot of changes so it may take a while to get right.

One of the first things that strikes me as "off" is needing to pass the entire database connection to the grammar just to instantiate it for, from what I understand, the sole purpose of being able to tell the expressions the database driver and version. Could those strings just be optional parameters of the grammar constructors? It feels pretty heavy to need an actual, valid database connection just to create a grammar instance.

@@ -9,7 +9,7 @@
use Illuminate\Support\Str;
use RuntimeException;

class Grammar extends BaseGrammar
abstract class Grammar extends BaseGrammar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary for this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BaseGrammar (which is abstract) is now requiring extending classes to implement the abstract public function getDriverName(); method. As the query grammar can't implement this for the dbms specific grammars I changed it to be abstract too. Except from unit tests this class is also never initiated once, so it's basically abstract by intention?

use Illuminate\Database\Query\Processors\Processor;
use Illuminate\Database\Schema\Builder as SchemaBuilder;
use Illuminate\Support\Arr;
use LogicException;
use PDO;
use PDOStatement;

class Connection implements ConnectionInterface
abstract class Connection implements ConnectionInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was converting this to an abstract class and the related changes necessary to this PR? If not, it just adds a bit of noise to the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Connection class has a weird (?) implementation about the schema and query grammar: The connection class was returning a default query grammar in getDefaultQueryGrammar which is overridden in all database connection classes with specific grammar (making it useless?). The getDefaultSchemaGrammar is not returning anything but is overridden again by everyone.

The Connection class has been declared abstract to no longer provide implementations for getDefaultQueryGrammar and getDefaultSchemaGrammar. This may sound unnecessary at first, but as the Grammar class is now abstract because the getDriverName method needs to be implemented, a default Grammar object can no longer be created, so the change is required.

@tpetry
Copy link
Contributor Author

tpetry commented Jan 3, 2022

One of the first things that strikes me as "off" is needing to pass the entire database connection to the grammar just to instantiate it for, from what I understand, the sole purpose of being able to tell the expressions the database driver and version. Could those strings just be optional parameters of the grammar constructors? It feels pretty heavy to need an actual, valid database connection just to create a grammar instance.

That would be possible. But the grammar also receives the database connection to do real quoting of values, which is only safe with the connection. quoteString can‘t be used because it‘s not sql-injection safe.

* @param string $value
* @return string
*/
abstract protected function quoteValue($value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these 3 methods need to be abstract methods? They are never actually implemented except by the traits from what I can tell? Why can the base grammar (this file) not simply use the traits? Why does each grammar import the traits individually?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation is built with extensibility in mind. If the base grammar has to implement quoteValue it needs to be tied to a specific implementation. In this case, any grammar will have a hard dependency to PDO, which is used by every Laravel database driver at the moment. But 3rd party drivers like jenssegers/laravel-mongodb are not using PDO. Moving the logic of quoteValue from the trait to the base Grammar class prevents anyone from implementing 3rd party db drivers which are not built on top of PDO. So in this change I did the effort to ensure backward compatibility for 3rd party packages.

@tpetry
Copy link
Contributor Author

tpetry commented Jan 4, 2022

I just wanted to add that I am aware that this is a big change. But I did my best and invested a lot of time to find an implementation to change the least amount of Laravel code for this feature. All in all the implementation is even in some kind backwards-compatible with 8.x. Any 3rd party driver built for 9.x will also work with 8.x as no semantics changed only new methods have been added.

@DarkGhostHunter
Copy link
Contributor

I just wanted to add that I am aware that this is a big change. But I did my best and invested a lot of time to find an implementation to change the least amount of Laravel code for this feature. All in all the implementation is even in some kind backwards-compatible with 8.x. Any 3rd party driver built for 9.x will also work with 8.x as no semantics changed only new methods have been added.

If that's the case I don't see why it can't be added to 8.x.

Hopefully I can do something like this:

DB::table('foo')->select(Raw::coalescence('bar', 'baz'))->get();

@tpetry
Copy link
Contributor Author

tpetry commented Jan 5, 2022

I just wanted to add that I am aware that this is a big change. But I did my best and invested a lot of time to find an implementation to change the least amount of Laravel code for this feature. All in all the implementation is even in some kind backwards-compatible with 8.x. Any 3rd party driver built for 9.x will also work with 8.x as no semantics changed only new methods have been added.

If that's the case I don't see why it can't be added to 8.x.

It‘s still a breaking change because new methods need to implemented. But designed that any 9.x implementation is fully compatible with 9.x

Hopefully I can do something like this:

DB::table('foo')->select(Raw::coalescence('bar', 'baz'))->get();

Yes, that‘s possible! As explained before the pull request will only add the functionality for the feature. The nice api needs to be done by you or some package. As long as the Raw::coalesce returns an ExpressionInterface like the one in my examples it will work perfectly :)

@tpetry tpetry force-pushed the query-expressions branch from 3c0e493 to 4429a10 Compare January 6, 2022 18:47
@tpetry
Copy link
Contributor Author

tpetry commented Jan 6, 2022

I've rebased the code on the newest master.

@driesvints driesvints changed the base branch from master to 9.x January 12, 2022 19:47
@taylorotwell
Copy link
Member

I'm sorry I just don't think I have the time to take on maintaining this large of a change for 9.x. I will think about the problem and try to come up with some non-breaking solution for it in the future though.

@tpetry
Copy link
Contributor Author

tpetry commented Jan 26, 2022

My plan would be doing this again for 10.x. The feature is based on three different commits. @taylorotwell Would you be open for 3 different PRs for 10.x with weeks between each of them? Each one adding a requirement for it and integrating those features into different parts on the way? As said the quoteString could already be used to make some things more safe.

@inxilpro
Copy link
Contributor

I feel like if we were able to get the database connection as a dependency of grammars live in Laravel 9, there's probably a backwards-compatible way to get the rest of this done later, as that's the major breaking interface change.

@taylorotwell is that something you'd be open to, or do you think it's too late?

I've taken a peek at the code, and making the connection a dependency of grammars actually simplifies things in some places, since functions like compileCreateDatabase(), compileCreate(), and compileCreateTable() need to accept the connection as a parameter to work.

Another option would be something like a ConnectionAwareGrammar interface that the existing grammars could implement, but wasn't necessary for 3rd-party connections to implement. In the end, that may be more difficult to maintain, but is potentially more backwards-compatible.

@driesvints
Copy link
Member

Hey guys. Taylor rarely replies to closed PR's. I think it's best to leave this one for now and let him come up with a solution at a moment when he finds the time for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants