MySQL 9 VECTOR Type: Building Semantic Search in PHP Without a Dedicated Vector Database

MySQL 9's native VECTOR data type lets PHP developers build semantic search and AI similarity features without spinning up a separate vector database.

If you have been following the AI tooling wave, you have probably heard the pitch for pgvector, Pinecone, Weaviate, or any number of purpose-built vector databases. The argument is compelling on paper: store embeddings alongside your data and unlock semantic search, recommendation engines, and retrieval-augmented generation (RAG) pipelines. The catch is that these solutions add infrastructure — another service to deploy, another connection to manage, another billing line to justify.

MySQL 9.0, released in July 2024, introduced a native VECTOR data type that lets you store and query vector embeddings directly in the database you are probably already running. For PHP developers at small-to-medium scale, this changes the calculus. You can prototype semantic search against existing tables without leaving MySQL, using the same PDO connections and query patterns you already know.

What the VECTOR Type Actually Is

The MySQL VECTOR type stores a fixed-length array of single-precision floating-point numbers as an opaque binary blob. You define the dimensionality at column definition time:

CREATE TABLE articles (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    body TEXT NOT NULL,
    embedding VECTOR(1536) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

The 1536 here matches the dimensionality of OpenAI’s text-embedding-3-small model. If you use a different model, adjust the number accordingly — OpenAI’s text-embedding-3-large produces 3072-dimensional vectors, for example.

MySQL ships four helper functions for working with the type:

  • STRING_TO_VECTOR('[0.1, 0.2, ...]') — converts a JSON-array string into the binary vector format
  • VECTOR_TO_STRING(embedding) — the reverse, useful for debugging
  • VECTOR_DIM(embedding) — returns the number of dimensions
  • VECTOR_DISTANCE(v1, v2, 'metric') — computes distance between two vectors

The available metrics for VECTOR_DISTANCE are EUCLIDEAN, COSINE, and DOT. For semantic similarity on text embeddings, cosine distance is your default starting point.
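These functions compose naturally, so you can sanity-check the type without any application code. A quick session in the mysql client with toy 3-dimensional vectors might look like this (a sketch; exact output formatting varies by version):

```sql
-- Round-trip a vector and compute a distance between two toy vectors.
SELECT
    VECTOR_DIM(STRING_TO_VECTOR('[1, 0, 0]'))        AS dims,
    VECTOR_TO_STRING(STRING_TO_VECTOR('[1, 0, 0]'))  AS round_trip,
    VECTOR_DISTANCE(
        STRING_TO_VECTOR('[1, 0, 0]'),
        STRING_TO_VECTOR('[0, 1, 0]'),
        'COSINE'
    )                                                AS cosine_dist;
```

Since the two toy vectors are orthogonal, their cosine similarity is 0, which corresponds to the maximum-dissimilarity end of the distance scale for vectors in the same half-space.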

Generating Embeddings in PHP

You need embeddings before you can search. Here is a simple wrapper around the OpenAI embeddings API using file_get_contents with a stream context:

function generateEmbedding(string $text, string $apiKey): array
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'POST',
            'header'  => [
                'Content-Type: application/json',
                'Authorization: Bearer ' . $apiKey,
            ],
            'content' => json_encode([
                'input' => $text,
                'model' => 'text-embedding-3-small',
            ]),
        ],
    ]);

    $body = file_get_contents('https://api.openai.com/v1/embeddings', false, $context);

    if ($body === false) {
        throw new RuntimeException('Embeddings API request failed');
    }

    $response = json_decode($body, true);

    return $response['data'][0]['embedding'];
}

In a real project, swap the raw file_get_contents for a proper HTTP client. If you are in Laravel, the openai-php/laravel package gives you OpenAI::embeddings()->create(). The result in both cases is a PHP float array.
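With that package installed and configured, the same call looks roughly like this (a sketch based on the openai-php client's response shape):

```php
use OpenAI\Laravel\Facades\OpenAI;

$result = OpenAI::embeddings()->create([
    'model' => 'text-embedding-3-small',
    'input' => $text,
]);

// The client exposes the response as typed objects rather than raw arrays.
$embedding = $result->embeddings[0]->embedding; // plain PHP array of floats
```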

Inserting Vectors

MySQL expects the embedding in its binary format, which means you pass through STRING_TO_VECTOR():

function insertArticle(PDO $pdo, string $title, string $body, array $embedding): void
{
    $vectorString = '[' . implode(',', $embedding) . ']';

    $stmt = $pdo->prepare(
        'INSERT INTO articles (title, body, embedding)
         VALUES (:title, :body, STRING_TO_VECTOR(:embedding))'
    );

    $stmt->execute([
        ':title'     => $title,
        ':body'      => $body,
        ':embedding' => $vectorString,
    ]);
}

The pattern is straightforward: serialize your float array to a JSON-array string and let STRING_TO_VECTOR() handle the binary conversion inside MySQL.
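One failure mode worth guarding against: STRING_TO_VECTOR() will reject a vector whose length does not match the column definition, and the MySQL error is less obvious than a check in PHP. A hypothetical guard (the helper name and default are assumptions, not part of any library):

```php
function assertDimensions(array $embedding, int $expected = 1536): void
{
    // Fail fast in PHP rather than letting MySQL reject the INSERT
    // with a dimension-mismatch error at execution time.
    if (count($embedding) !== $expected) {
        throw new InvalidArgumentException(sprintf(
            'Expected %d-dimensional embedding, got %d',
            $expected,
            count($embedding)
        ));
    }
}
```

Calling this at the top of insertArticle() catches the common mistake of switching embedding models without altering the column.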

Querying for Semantic Similarity

This is where it gets interesting. To find the five articles most semantically similar to a search query, you generate an embedding for the query and then sort by distance:

function semanticSearch(PDO $pdo, array $queryEmbedding, int $limit = 5): array
{
    $vectorString = '[' . implode(',', $queryEmbedding) . ']';

    $stmt = $pdo->prepare(
        'SELECT id, title,
                VECTOR_DISTANCE(embedding, STRING_TO_VECTOR(:query), \'COSINE\') AS distance
         FROM articles
         ORDER BY distance ASC
         LIMIT :limit'
    );

    $stmt->bindValue(':query', $vectorString, PDO::PARAM_STR);
    $stmt->bindValue(':limit', $limit, PDO::PARAM_INT);
    $stmt->execute();

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

Lower cosine distance means higher semantic similarity. A distance of 0 means the vectors point in exactly the same direction (cosine distance ignores magnitude); typical good matches land below 0.2.
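Tying the pieces together, an end-to-end call might look like this (connection details and the query string are placeholders):

```php
$pdo = new PDO(
    'mysql:host=127.0.0.1;dbname=app;charset=utf8mb4',
    'user',
    'secret',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
);

// Embed the user's query with the same model used at insert time,
// then rank stored articles by cosine distance.
$queryEmbedding = generateEmbedding('how do I reset my password', getenv('OPENAI_API_KEY'));

foreach (semanticSearch($pdo, $queryEmbedding) as $row) {
    printf("%.4f  %s\n", $row['distance'], $row['title']);
}
```

The only invariant to watch is that the query and the stored rows are embedded with the same model; mixing models makes the distances meaningless.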

Laravel Integration

If you are using Laravel with Eloquent, the pattern translates cleanly. Start with a standard migration for the non-vector columns:

Schema::create('articles', function (Blueprint $table) {
    $table->id();
    $table->string('title');
    $table->text('body');
    $table->timestamps();
});

Because Laravel’s schema builder does not yet have a first-class vector() column type for MySQL, the most practical approach is to run the CREATE TABLE with raw SQL in your migration, or use a DB::statement() call to add the column after the fact:

DB::statement('ALTER TABLE articles ADD COLUMN embedding VECTOR(1536) NOT NULL');
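Wrapped in a migration, that statement might look like the following sketch (an anonymous migration class, Laravel 9+ style):

```php
use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\DB;

return new class extends Migration {
    public function up(): void
    {
        // The schema builder has no vector() type for MySQL yet,
        // so the column is added with raw SQL.
        DB::statement('ALTER TABLE articles ADD COLUMN embedding VECTOR(1536) NOT NULL');
    }

    public function down(): void
    {
        DB::statement('ALTER TABLE articles DROP COLUMN embedding');
    }
};
```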

For queries, use raw expressions inside Eloquent:

$results = Article::selectRaw(
    'id, title, VECTOR_DISTANCE(embedding, STRING_TO_VECTOR(?), \'COSINE\') AS distance',
    ['[' . implode(',', $queryEmbedding) . ']']
)
->orderBy('distance', 'asc')
->limit(5)
->get();

It is not as elegant as a dedicated Scout driver, but it is self-contained and requires no additional packages.

The Limitations You Need to Know

MySQL 9’s VECTOR support is genuine and useful, but it is not a drop-in replacement for pgvector or Pinecone at scale. There are two constraints worth understanding before you commit to it.

First, the VECTOR_DISTANCE() function is available in the Community and Commercial editions, but MySQL’s approximate nearest neighbor (ANN) indexing — the structure that makes vector search fast at millions of rows — is primarily a MySQL HeatWave feature. Without ANN indexing, your similarity queries perform a full-table scan, computing distance against every row. On tables with tens of thousands of rows this is fine. On tables with millions, query time grows linearly with row count and you will notice it.

Second, the DISTANCE() function mentioned in some MySQL documentation and the HeatWave MySQL AI SDK documentation is not the same as the community VECTOR_DISTANCE() — if you see DISTANCE() referenced without qualification, it is a HeatWave-specific function. Stick with VECTOR_DISTANCE() for portability across standard MySQL installations.

For small catalogs (articles, products, knowledge base entries, FAQ items), the full-scan approach works well and the operational simplicity is a real win. For large-scale embeddings with strict latency requirements, evaluate pgvector on PostgreSQL or a purpose-built service.

When This Makes Sense

The MySQL 9 VECTOR type is a strong fit when you are adding semantic search to an existing application and want to avoid infrastructure sprawl. Typical use cases at the right scale include site-wide content search that understands intent rather than just keywords, “related articles” recommendations on a blog or documentation site, FAQ deflection in a support portal, and product discovery in small-to-medium e-commerce catalogs.

If you are already on MySQL 9 and your dataset fits in memory, there is a real argument for starting here. You get to prove out the feature and the embedding pipeline with zero new services, and you can migrate to a more capable vector store later if the scale demands it.
