
    From Zero to MLflow: Tracking, Tuning, and Deploying a Keras Model (Hands-on)

    A hands-on, copy-paste-ready walkthrough to track Keras/TensorFlow experiments in MLflow, run Hyperopt tuning with nested runs, register the best model, and serve it as a REST API.

    Faizan Aziz, Data Scientist
    November 13, 2025

    I worked through MLflow end to end across two notebook folders (`1-MLproject` and `2-DLproject`) to learn how to track experiments, compare runs, pick the best model, and serve it. This post distills that workflow into runnable snippets.

    What you’ll build

    • Run and track training experiments with MLflow
    • Log parameters, metrics, and models from Keras/TensorFlow
    • Tune hyperparameters with Hyperopt and compare runs in the MLflow UI
    • Pick the best run, register a model, and serve it via a local REST API

    Repo layout used:

    • `1-MLproject/` — classic ML experimentation (e.g., house price notebook)
    • `2-DLproject/` — deep learning quickstart with MLflow tracking + Hyperopt
    • `requirements.txt` — dependencies to reproduce runs

    Environment and data

    Install dependencies and (optionally) start the UI in a second terminal, since `mlflow ui` keeps running in the foreground:

    pip install -r requirements.txt
    mlflow ui --port 5000

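    Dataset: the white wine quality CSV from the MLflow repo (semicolon-separated, 11 numeric features plus a `quality` score used as the regression target).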

    import pandas as pd
    
    data = pd.read_csv(
        "https://raw.githubusercontent.com/mlflow/mlflow/master/tests/datasets/winequality-white.csv",
        sep=";",
    )

    Train/validation/test split (25% held out for test, then 20% of the remaining rows for validation):

    from sklearn.model_selection import train_test_split
    
    train, test = train_test_split(data, test_size=0.25, random_state=42)
    
    train_x = train.drop(["quality"], axis=1).values
    train_y = train["quality"].values
    
    test_x = test.drop(["quality"], axis=1).values
    test_y = test["quality"].values
    
    train_x, valid_x, train_y, valid_y = train_test_split(
        train_x, train_y, test_size=0.20, random_state=42
    )
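Because the second split carves validation rows out of the 75% training portion, the final proportions work out to roughly 60% train, 15% validation, and 25% test. The short sketch below reproduces the row counts, assuming scikit-learn's rounding for a float `test_size` (the held-out side is rounded up, the training side gets the remainder) and assuming the file has 4,898 rows, which is the size of the UCI white wine set; check `len(data)` on your copy. `split_sizes` is a hypothetical helper, not part of the project.

```python
import math


def split_sizes(n_rows, test_frac=0.25, valid_frac=0.20):
    """Return (train, valid, test) row counts for the two-stage split above.

    Assumes scikit-learn's rule for float test_size: the held-out part is
    rounded up and the training part receives whatever remains.
    """
    n_test = math.ceil(test_frac * n_rows)
    n_rest = n_rows - n_test
    n_valid = math.ceil(valid_frac * n_rest)
    n_train = n_rest - n_valid
    return n_train, n_valid, n_test


# Fractions of the full data: 0.75 * 0.80 = 0.60 train, 0.75 * 0.20 = 0.15 valid
print(split_sizes(4898))  # -> (2938, 735, 1225), assuming 4,898 rows
```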

    Infer a model signature for safer serving later:

    import mlflow
    from mlflow.models import infer_signature
    
    signature = infer_signature(train_x, train_y)

    Set up MLflow experiment

    import mlflow
    
    mlflow.set_experiment("/wine-quality")

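    This creates (or reuses) an experiment named `/wine-quality` so your runs are grouped together in the UI. The leading slash follows the Databricks workspace-path convention; on a local tracking store it is simply part of the experiment name.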

    A simple Keras model with MLflow logging

    The training function below trains a small network and logs params, metrics, and the model to MLflow.

    Each hyperparameter trial is recorded as a nested MLflow run.

    import numpy as np
    import keras
    import mlflow
    from hyperopt import STATUS_OK
    
    
    def train_model(params, epochs, train_x, train_y, valid_x, valid_y):
        mean = np.mean(train_x, axis=0)
        var = np.var(train_x, axis=0)
    
        model = keras.Sequential([
            keras.Input([train_x.shape[1]]),
            keras.layers.Normalization(mean=mean, variance=var),
            keras.layers.Dense(64, activation="relu"),
            keras.layers.Dense(1),
        ])
    
        model.compile(
            optimizer=keras.optimizers.SGD(
                learning_rate=params["lr"],
                momentum=params["momentum"],
            ),
            loss="mean_squared_error",
            metrics=[keras.metrics.RootMeanSquaredError()],
        )
    
        # One child run per Hyperopt trial; nested=True attaches it to the active parent run
        with mlflow.start_run(nested=True):
            model.fit(
                train_x,
                train_y,
                validation_data=(valid_x, valid_y),
                epochs=epochs,
                batch_size=64,
                verbose=0,
            )
    
            _, eval_rmse = model.evaluate(valid_x, valid_y, batch_size=64, verbose=0)
    
            mlflow.log_params(params)
            mlflow.log_metric("eval_rmse", float(eval_rmse))
    
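            # `signature` is the module-level object inferred from train_x/train_y earlier
            mlflow.tensorflow.log_model(model, "model", signature=signature)
    
            # Hyperopt minimizes "loss"; returning the model lets the sweep log the best one
            return {"loss": float(eval_rmse), "status": STATUS_OK, "model": model}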
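The `Normalization` layer in `train_model` stores the training-set mean and variance inside the model, so validation data, test data, and later REST requests can all be sent as raw features. Here is a rough NumPy sketch of that transform, assuming it behaves like `(x - mean) / max(sqrt(variance), epsilon)` with a small epsilon. The `standardize` helper and the random placeholder arrays are illustrative only.

```python
import numpy as np


def standardize(x, mean, var, eps=1e-7):
    """Approximate keras.layers.Normalization(mean=mean, variance=var)(x)."""
    return (x - mean) / np.maximum(np.sqrt(var), eps)


rng = np.random.default_rng(0)
# Placeholder data with 11 columns, standing in for the wine features
demo_train = rng.normal(loc=5.0, scale=2.0, size=(1000, 11))
demo_valid = rng.normal(loc=5.0, scale=2.0, size=(200, 11))

# Statistics come from the training split only, as in train_model()
mean = np.mean(demo_train, axis=0)
var = np.var(demo_train, axis=0)

train_z = standardize(demo_train, mean, var)
valid_z = standardize(demo_valid, mean, var)  # reuses train stats, so no leakage

print(train_z.mean(axis=0).round(6))  # close to 0 in every column
print(train_z.std(axis=0).round(6))   # close to 1 in every column
```

Computing the statistics once from `train_x` and freezing them in the layer is what lets the served model accept unscaled inputs like the curl example further down.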

    Hyperparameter tuning with Hyperopt (and MLflow tracking)

    Define the search space and objective:

    from hyperopt import fmin, tpe, hp, Trials
    
    space = {
        "lr": hp.loguniform("lr", np.log(1e-5), np.log(1e-1)),
        # Momentum is in [0, 1] so use uniform (not loguniform)
        "momentum": hp.uniform("momentum", 0.0, 1.0),
    }
    
    
    def objective(params):
        return train_model(
            params=params,
            epochs=3,
            train_x=train_x,
            train_y=train_y,
            valid_x=valid_x,
            valid_y=valid_y,
        )
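Why `loguniform` for the learning rate but `uniform` for momentum? Sampling uniformly in log space spreads trials evenly across orders of magnitude. The quick NumPy simulation below covers the `lr` range above, assuming `hp.loguniform` draws `exp(U(low, high))` as Hyperopt documents, and compares it with a plain uniform draw over the same interval. It only simulates the distributions and does not call Hyperopt.

```python
import numpy as np

rng = np.random.default_rng(0)
low, high = np.log(1e-5), np.log(1e-1)

# Same distribution as hp.loguniform("lr", low, high): exp of a uniform draw in log space
lr_samples = np.exp(rng.uniform(low, high, size=100_000))

# Share of draws landing in each decade: [1e-5, 1e-4), ..., [1e-2, 1e-1)
decade = np.floor(np.log10(lr_samples)).astype(int)
decade_share = {d: float(np.mean(decade == d)) for d in range(-5, -1)}
print(decade_share)  # roughly 0.25 per decade

# A plain uniform over the same interval almost never tries small learning rates
uniform_samples = rng.uniform(1e-5, 1e-1, size=100_000)
uniform_below_1e2 = float(np.mean(uniform_samples < 1e-2))
print(uniform_below_1e2)  # roughly 0.10
```

With a plain uniform, about 90% of trials would land between 1e-2 and 1e-1, which wastes most of a small `max_evals` budget. Momentum lives on a bounded linear scale and must be allowed to reach 0, so `hp.uniform` fits it better.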

    Run the sweep under a parent run so the trials are grouped together and the best model is logged in one place, ready to register:

    import mlflow
    
    mlflow.set_experiment("/wine-quality")
    
    with mlflow.start_run() as parent_run:
        trials = Trials()
    
        best_params = fmin(
            fn=objective,
            space=space,
            algo=tpe.suggest,
            max_evals=4,
            trials=trials,
            rstate=np.random.default_rng(42),
        )
    
        best_trial = min(trials.results, key=lambda r: r["loss"])
    
        mlflow.log_params(best_params)
        mlflow.log_metric("eval_rmse", best_trial["loss"])
        mlflow.tensorflow.log_model(best_trial["model"], "model", signature=signature)
    
        parent_run_id = parent_run.info.run_id
    
    print("Best parameters:", best_params)
    print("Best eval RMSE:", best_trial["loss"])
    print("Parent run:", parent_run_id)

    Open the MLflow UI at `http://127.0.0.1:5000` and select the `/wine-quality` experiment. Expand the parent run to compare the nested trial runs and their `eval_rmse` values.

    Register the winner in the Model Registry

    Register the parent run’s model artifact:

    from mlflow import register_model
    
    result = register_model(
        model_uri=f"runs:/{parent_run_id}/model",
        name="finalmodel",
    )
    
    print("Registered model:", result.name, "version:", result.version)

    The first registration creates version 1, addressable as `models:/finalmodel/1`; registering again under the same name creates version 2, and so on.

    Serve the model as a REST API

    Serve a specific version from the registry (port 5001 avoids clashing with the UI on 5000):

    mlflow models serve -m "models:/finalmodel/1" -p 5001 --env-manager local

    Send a sample request (the wine dataset has 11 numeric features, so each instance is a list of 11 numbers in the training column order):

    curl -s http://127.0.0.1:5001/invocations \
      -H 'Content-Type: application/json' \
      -d '{"instances": [[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.0010, 3.00, 0.45, 8.8]]}'
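If you would rather call the endpoint from Python, a stdlib-only client can send the same `{"instances": ...}` body as the curl command. This is a sketch. `build_payload` and `score` are hypothetical helper names, and the default URL assumes the server started above on port 5001.

```python
import json
import urllib.request

N_FEATURES = 11  # columns in the wine dataset, excluding the `quality` target


def build_payload(rows):
    """Encode feature rows as the {"instances": [...]} JSON body used in the curl example."""
    for row in rows:
        if len(row) != N_FEATURES:
            raise ValueError(f"expected {N_FEATURES} features per row, got {len(row)}")
    return json.dumps({"instances": [[float(v) for v in row] for row in rows]}).encode("utf-8")


def score(rows, url="http://127.0.0.1:5001/invocations", timeout=10.0):
    """POST rows to a running `mlflow models serve` endpoint and return the decoded JSON."""
    request = urllib.request.Request(
        url,
        data=build_payload(rows),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `score([[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.0010, 3.00, 0.45, 8.8]])` should return a JSON object. MLflow 2.x scoring servers wrap results as `{"predictions": [...]}`. Checking the feature count client-side gives a clearer error than a schema-validation failure from the server.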

    You can also serve directly from a run; set `RUN_ID` to the parent run ID printed by the sweep:
    
    mlflow models serve -m "runs:/$RUN_ID/model" -p 5001 --env-manager local

    Comparing with a classic ML notebook (house prices)

    The same tracking pattern works for classic ML:

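    import mlflow
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    
    mlflow.set_experiment("/house-prices")
    
    # X_train, y_train, X_test, y_test come from the house-price notebook (not shown here)
    model = RandomForestRegressor(n_estimators=200, random_state=42).fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    
    with mlflow.start_run():
        mlflow.log_params({"model": "RandomForest", "n_estimators": 200})
        mlflow.log_metric("rmse", rmse)
        mlflow.sklearn.log_model(model, "model")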

    Troubleshooting and gotchas

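    • Hyperopt distributions: `hp.loguniform(label, low, high)` returns `exp(uniform(low, high))`, so `low` and `high` are bounds in log space (e.g., `np.log(1e-5)`) and every sample is strictly positive. For a bounded range like momentum's [0, 1], use `hp.uniform`.
    • Nested runs: use `mlflow.start_run(nested=True)` for each trial so the UI groups all trials under a single parent sweep run.
    • Signatures: `infer_signature` documents the expected inputs and outputs, and the serving layer uses the input schema to validate requests.
    • Reproducibility: fix random seeds and record data and code versions (e.g., tag runs with the git commit hash).
    • UI shows no runs: refresh, confirm you selected the right experiment, and check that `mlflow ui` reads the same tracking store your code writes to (same working directory or the same `MLFLOW_TRACKING_URI`).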
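Here is one way to act on the reproducibility tip, sketched with the standard library and NumPy. It seeds the global RNGs and collects version tags you could attach to the parent run with `mlflow.set_tags(...)`. The helper names and tag keys (`seed`, `data_url`, `git_commit`) are my own choices, not an MLflow convention. This sketch does not seed the backend; with Keras 3 or TensorFlow 2.7+, `keras.utils.set_random_seed(seed)` seeds Python, NumPy, and the backend in one call.

```python
import random
import subprocess

import numpy as np


def seed_everything(seed=42):
    """Seed Python's and NumPy's global RNGs (not the Keras/TensorFlow backend)."""
    random.seed(seed)
    np.random.seed(seed)


def git_commit_hash():
    """Return the current git commit hash, or None outside a git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def reproducibility_tags(seed, data_url):
    """Build a dict of tags to pass to mlflow.set_tags() on the parent run."""
    tags = {"seed": str(seed), "data_url": data_url}
    commit = git_commit_hash()
    if commit:
        tags["git_commit"] = commit
    return tags
```

Inside the sweep's parent run, call `seed_everything(42)` before training and then `mlflow.set_tags(reproducibility_tags(42, data_url))`, passing the wine CSV URL used above as `data_url`.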

    What I’d improve next

    • Use `mlflow.autolog()` to capture more metrics automatically
    • Add explicit data lineage (feature store / data versioning)
    • Wire CI to run selected experiments and auto-register on metric thresholds
    • Use Docker for consistent serving environments

    Try it yourself

    1) Install the dependencies and run the snippets above in order (as one script or notebook) to execute the sweep

    2) Start the UI: `mlflow ui --port 5000`

    3) Serve your registered model: `mlflow models serve -m "models:/finalmodel/1" -p 5001 --env-manager local`

    That’s it — you now have a repeatable workflow for tracking, selecting, and serving models with MLflow.

    Faizan Aziz

    Data Scientist

    Faizan Aziz is a Data Scientist at Originsoft Consultancy with a focus on MLOps, experiment tracking, and reproducible machine learning pipelines. He specializes in bridging the gap between research and production, building systems with DVC, MLflow, and modern data tooling to make ML experiments reliable and deployable.