# From Zero to MLflow: Tracking, Tuning, and Deploying a Keras Model (Hands-on)
A hands-on, copy-paste-ready walkthrough: track Keras/TensorFlow experiments in MLflow, run Hyperopt tuning with nested runs, register the best model, and serve it as a REST API.
I practiced MLflow end to end across two notebook folders (`1-MLproject` and `2-DLproject`) to learn how to track experiments, compare runs, pick the best model, and serve it. This post distills the workflow with runnable snippets.
## What you’ll build
- Run and track training experiments with MLflow
- Log parameters, metrics, and models from Keras/TensorFlow
- Tune hyperparameters with Hyperopt and compare runs in the MLflow UI
- Pick the best run, register a model, and serve it via a local REST API
Repo layout used:
- `1-MLproject/` — classic ML experimentation (e.g., house price notebook)
- `2-DLproject/` — deep learning quickstart with MLflow tracking + Hyperopt
- `requirements.txt` — dependencies to reproduce runs
## Environment and data
Install dependencies and (optionally) start the UI:

```bash
pip install -r requirements.txt
mlflow ui --port 5000
```

Dataset: white wine quality from the MLflow repo.
```python
import pandas as pd

data = pd.read_csv(
    "https://raw.githubusercontent.com/mlflow/mlflow/master/tests/datasets/winequality-white.csv",
    sep=";",
)
```

Train/valid/test split:
```python
from sklearn.model_selection import train_test_split

train, test = train_test_split(data, test_size=0.25, random_state=42)

train_x = train.drop(["quality"], axis=1).values
train_y = train[["quality"]].values.ravel()
test_x = test.drop(["quality"], axis=1).values
test_y = test["quality"].values.ravel()

train_x, valid_x, train_y, valid_y = train_test_split(
    train_x, train_y, test_size=0.20, random_state=42
)
```

Infer a model signature for safer serving later:
```python
import mlflow
from mlflow.models import infer_signature

signature = infer_signature(train_x, train_y)
```

## Set up MLflow experiment
```python
import mlflow

mlflow.set_experiment("/wine-quality")
```

This creates (or reuses) the `/wine-quality` experiment so your runs are grouped together in the UI.
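If you started the UI as a local tracking server and want runs logged there explicitly, you can point the client at it; by default MLflow writes runs to a local `./mlruns` directory, which `mlflow ui` also reads. A minimal sketch (the URI matches the port used above):

```python
import mlflow

# Optional: log to the local server started with `mlflow ui --port 5000`
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("/wine-quality")
```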
## A simple Keras model with MLflow logging
The training function below trains a small network and logs params, metrics, and the model to MLflow.
Each hyperparameter trial is recorded as a nested MLflow run.
```python
import numpy as np
import keras
import mlflow
from hyperopt import STATUS_OK


def train_model(params, epochs, train_x, train_y, valid_x, valid_y):
    # Normalize inputs with training-set statistics
    mean = np.mean(train_x, axis=0)
    var = np.var(train_x, axis=0)

    model = keras.Sequential([
        keras.Input([train_x.shape[1]]),
        keras.layers.Normalization(mean=mean, variance=var),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(
        optimizer=keras.optimizers.SGD(
            learning_rate=params["lr"],
            momentum=params["momentum"],
        ),
        loss="mean_squared_error",
        metrics=[keras.metrics.RootMeanSquaredError()],
    )

    # Each trial is logged as a nested run under the parent sweep run
    with mlflow.start_run(nested=True):
        model.fit(
            train_x,
            train_y,
            validation_data=(valid_x, valid_y),
            epochs=epochs,
            batch_size=64,
            verbose=0,
        )
        _, eval_rmse = model.evaluate(valid_x, valid_y, batch_size=64, verbose=0)

        mlflow.log_params(params)
        mlflow.log_metric("eval_rmse", float(eval_rmse))
        mlflow.tensorflow.log_model(model, "model", signature=signature)

    # Hyperopt expects a dict with "loss" and "status"
    return {"loss": float(eval_rmse), "status": STATUS_OK, "model": model}
```

## Hyperparameter tuning with Hyperopt (and MLflow tracking)
Define the search space and objective:
```python
from hyperopt import fmin, tpe, hp, Trials

space = {
    "lr": hp.loguniform("lr", np.log(1e-5), np.log(1e-1)),
    # Momentum is in [0, 1], so use uniform (not loguniform)
    "momentum": hp.uniform("momentum", 0.0, 1.0),
}


def objective(params):
    return train_model(
        params=params,
        epochs=3,
        train_x=train_x,
        train_y=train_y,
        valid_x=valid_x,
        valid_y=valid_y,
    )
```

Run the sweep under a parent run so it’s easy to register later:
```python
import mlflow

mlflow.set_experiment("/wine-quality")

with mlflow.start_run() as parent_run:
    trials = Trials()
    best_params = fmin(
        fn=objective,
        space=space,
        algo=tpe.suggest,
        max_evals=4,
        trials=trials,
        rstate=np.random.default_rng(42),
    )

    # Pick the trial with the lowest validation RMSE
    best_trial = sorted(trials.results, key=lambda x: x["loss"])[0]

    # Log the best trial's params, metric, and model on the parent run
    mlflow.log_params(best_params)
    mlflow.log_metric("eval_rmse", best_trial["loss"])
    mlflow.tensorflow.log_model(best_trial["model"], "model", signature=signature)

    parent_run_id = parent_run.info.run_id

print("Best parameters:", best_params)
print("Best eval RMSE:", best_trial["loss"])
print("Parent run:", parent_run_id)
```

Open the MLflow UI at `http://127.0.0.1:5000` to compare runs and metrics under the `/wine-quality` experiment.
## Register the winner in the Model Registry
Register the parent run’s model artifact:
```python
from mlflow import register_model

result = register_model(
    model_uri=f"runs:/{parent_run_id}/model",
    name="finalmodel",
)
print("Registered model:", result.name, "version:", result.version)
```

This produces a registry entry like `models:/finalmodel/1`.
## Serve the model as a REST API
Serve a specific version from the registry:
```bash
mlflow models serve -m "models:/finalmodel/1" -p 5001 --no-conda
```

Send a sample request (the wine dataset has 11 numeric features):
```bash
curl -s http://127.0.0.1:5001/invocations \
  -H 'Content-Type: application/json' \
  -d '{"instances": [[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.0010, 3.00, 0.45, 8.8]]}'
```
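The same request from Python, if you'd rather not shell out to curl; a small sketch using the `requests` library with the payload and port from the example above:

```python
import requests

# Single-row payload matching the curl example (11 numeric wine features)
payload = {
    "instances": [
        [7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.0010, 3.00, 0.45, 8.8]
    ]
}

resp = requests.post(
    "http://127.0.0.1:5001/invocations",
    json=payload,
    timeout=10,
)
print(resp.json())
```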
You can also serve directly from a run (replace `$RUN_ID` with a run ID from the UI):

```bash
mlflow models serve -m "runs:/$RUN_ID/model" -p 5001 --no-conda
```

## Comparing with a classic ML notebook (house prices)
The same tracking pattern works for classic ML:
```python
import mlflow

mlflow.set_experiment("/house-prices")

with mlflow.start_run():
    mlflow.log_params({"model": "RandomForest", "n_estimators": 200})
    mlflow.log_metric("rmse", 0.123)  # placeholder; log your real validation RMSE
    # `model` is an already-fitted scikit-learn estimator
    mlflow.sklearn.log_model(model, "model")
```

## Troubleshooting and gotchas
- Hyperopt distributions: `hp.loguniform` samples `exp(U(low, high))` so `low`/`high` must be logs of positive numbers. For [0, 1] ranges (like momentum), use `hp.uniform`.
- Nested runs: use `mlflow.start_run(nested=True)` for each trial so the UI keeps the tree under a single parent sweep.
- Signatures: `infer_signature` enables input validation at serving time and documents the model's expected inputs and outputs.
- Reproducibility: fix random seeds and log data/code versions (e.g., a git commit hash tag; see the sketch after this list).
- UI hiccups: refresh and confirm you’re in the right experiment.
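For the reproducibility point above, a rough sketch of tagging each run with the current git commit (assumes the project is a git checkout; the `git_commit` tag name and `random_seed` param are just illustrations):

```python
import subprocess

import mlflow

# Capture the current commit so every run records the exact code version
commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

with mlflow.start_run():
    mlflow.set_tag("git_commit", commit)
    mlflow.log_param("random_seed", 42)
```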
## What I’d improve next
- Use `mlflow.autolog()` to capture more metrics automatically (see the sketch after this list)
- Add explicit data lineage (feature store / data versioning)
- Wire CI to run selected experiments and auto-register on metric thresholds
- Use Docker for consistent serving environments
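For the autolog idea, a minimal sketch of what that could look like with the wine data from earlier; depending on your MLflow and Keras versions you may need `mlflow.tensorflow.autolog()` instead of the generic `mlflow.autolog()`:

```python
import keras
import mlflow

# Enable autologging before training; MLflow then records params,
# per-epoch metrics, and the final model without explicit log_* calls
mlflow.autolog()

model = keras.Sequential([
    keras.Input([train_x.shape[1]]),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")

with mlflow.start_run():
    model.fit(train_x, train_y, validation_data=(valid_x, valid_y), epochs=3)
```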
## Try it yourself
1) Install and run the sweep (end-to-end script)
2) Start the UI: `mlflow ui --port 5000`
3) Serve your registered model: `mlflow models serve -m "models:/finalmodel/1" -p 5001 --no-conda`
That’s it — you now have a repeatable workflow for tracking, selecting, and serving models with MLflow.