
    A hands-on, copy-paste-ready walkthrough to track Keras/TensorFlow experiments in MLflow, run Hyperopt tuning with nested runs, register the best model, and serve it as a REST API.

    Originsoft Engineering Team
    November 13, 2025
    5 min read

    # From Zero to MLflow: Tracking, Tuning, and Deploying a Keras Model (Hands-on)

    I practiced MLflow end to end across two notebook folders (`1-MLproject` and `2-DLproject`) to learn how to track experiments, compare runs, pick the best model, and serve it. This post distills the workflow with runnable snippets.

    What you’ll build

    • Run and track training experiments with MLflow
    • Log parameters, metrics, and models from Keras/TensorFlow
    • Tune hyperparameters with Hyperopt and compare runs in the MLflow UI
    • Pick the best run, register a model, and serve it via a local REST API

    Repo layout used:

    • `1-MLproject/` — classic ML experimentation (e.g., house price notebook)
    • `2-DLproject/` — deep learning quickstart with MLflow tracking + Hyperopt
    • `requirements.txt` — dependencies to reproduce runs

    Environment and data

    Install dependencies and (optionally) start the tracking UI; by default, MLflow writes runs to a local `./mlruns` directory, which the UI reads:

    pip install -r requirements.txt
    mlflow ui --port 5000

    Dataset: White wine quality from the MLflow repo.

    import pandas as pd
    
    data = pd.read_csv(
        "https://raw.githubusercontent.com/mlflow/mlflow/master/tests/datasets/winequality-white.csv",
        sep=";",
    )

    Train/valid/test split:

    from sklearn.model_selection import train_test_split
    
    train, test = train_test_split(data, test_size=0.25, random_state=42)
    
    train_x = train.drop(["quality"], axis=1).values
    train_y = train["quality"].values.ravel()
    
    test_x = test.drop(["quality"], axis=1).values
    test_y = test["quality"].values.ravel()
    
    train_x, valid_x, train_y, valid_y = train_test_split(
        train_x, train_y, test_size=0.20, random_state=42
    )

    Infer a model signature for safer serving later:

    import mlflow
    from mlflow.models import infer_signature
    
    signature = infer_signature(train_x, train_y)
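
    Printing the signature is a quick way to confirm what was inferred (for this dataset, the 11 numeric input features and the numeric quality target):

    print(signature)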

    Set up the MLflow experiment

    import mlflow
    
    mlflow.set_experiment("/wine-quality")

    This creates (or reuses) the experiment named “/wine-quality”, so your runs are grouped together in the UI; locally the leading slash is just part of the name (Databricks requires path-style names).
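
    If you started the UI as a separate server, you can point the client at it before logging; otherwise MLflow writes runs to the local `./mlruns` directory. A minimal sketch:

    import mlflow
    
    # Optional: send runs to the server started with `mlflow ui --port 5000`
    mlflow.set_tracking_uri("http://127.0.0.1:5000")
    mlflow.set_experiment("/wine-quality")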

    A simple Keras model with MLflow logging

    The training function below trains a small network and logs params, metrics, and the model to MLflow. Each call opens a nested run, so every hyperparameter trial is recorded under the sweep’s parent run.

    import numpy as np
    import keras
    import mlflow
    from hyperopt import STATUS_OK
    
    
    def train_model(params, epochs, train_x, train_y, valid_x, valid_y):
        # Normalization statistics come from the training split only
        mean = np.mean(train_x, axis=0)
        var = np.var(train_x, axis=0)
    
        model = keras.Sequential([
            keras.Input([train_x.shape[1]]),
            keras.layers.Normalization(mean=mean, variance=var),
            keras.layers.Dense(64, activation="relu"),
            keras.layers.Dense(1),
        ])
    
        model.compile(
            optimizer=keras.optimizers.SGD(
                learning_rate=params["lr"],
                momentum=params["momentum"],
            ),
            loss="mean_squared_error",
            metrics=[keras.metrics.RootMeanSquaredError()],
        )
    
        # nested=True keeps each trial grouped under the sweep's parent run
        with mlflow.start_run(nested=True):
            model.fit(
                train_x,
                train_y,
                validation_data=(valid_x, valid_y),
                epochs=epochs,
                batch_size=64,
                verbose=0,
            )
    
            # evaluate() returns [loss, root_mean_squared_error]
            _, eval_rmse = model.evaluate(valid_x, valid_y, batch_size=64, verbose=0)
    
            mlflow.log_params(params)
            mlflow.log_metric("eval_rmse", float(eval_rmse))
    
            # `signature` is the module-level signature inferred earlier
            mlflow.tensorflow.log_model(model, "model", signature=signature)
    
            return {"loss": float(eval_rmse), "status": STATUS_OK, "model": model}

    Hyperparameter tuning with Hyperopt (and MLflow tracking)

    Define the search space and objective:

    from hyperopt import fmin, tpe, hp, Trials
    
    space = {
        "lr": hp.loguniform("lr", np.log(1e-5), np.log(1e-1)),
        # Momentum is in [0, 1] so use uniform (not loguniform)
        "momentum": hp.uniform("momentum", 0.0, 1.0),
    }
    
    
    def objective(params):
        return train_model(
            params=params,
            epochs=3,
            train_x=train_x,
            train_y=train_y,
            valid_x=valid_x,
            valid_y=valid_y,
        )

    Run the sweep under a parent run so it’s easy to register later:

    import mlflow
    
    mlflow.set_experiment("/wine-quality")
    
    with mlflow.start_run() as parent_run:
        trials = Trials()
    
        best_params = fmin(
            fn=objective,
            space=space,
            algo=tpe.suggest,
            max_evals=4,
            trials=trials,
            rstate=np.random.default_rng(42),
        )
    
        best_trial = min(trials.results, key=lambda x: x["loss"])
    
        # Re-log the winning params, metric, and model on the parent run so
        # there is a single canonical artifact to register later
        mlflow.log_params(best_params)
        mlflow.log_metric("eval_rmse", best_trial["loss"])
        mlflow.tensorflow.log_model(best_trial["model"], "model", signature=signature)
    
        parent_run_id = parent_run.info.run_id
    
    print("Best parameters:", best_params)
    print("Best eval RMSE:", best_trial["loss"])
    print("Parent run:", parent_run_id)

    Open the MLflow UI at `http://127.0.0.1:5000` to compare runs and metrics under the “wine-quality” experiment.
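
    The comparison also works programmatically; `mlflow.search_runs` returns a pandas DataFrame you can sort and filter:

    import mlflow
    
    # All runs in the experiment, best eval_rmse first
    runs = mlflow.search_runs(
        experiment_names=["/wine-quality"],
        order_by=["metrics.eval_rmse ASC"],
    )
    print(runs[["run_id", "metrics.eval_rmse", "params.lr", "params.momentum"]].head())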

    Register the winner in the Model Registry

    Register the parent run’s model artifact:

    from mlflow import register_model
    
    result = register_model(
        model_uri=f"runs:/{parent_run_id}/model",
        name="finalmodel",
    )
    
    print("Registered model:", result.name, "version:", result.version)

    This produces a registry entry like `models:/finalmodel/1`.
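
    Before serving over HTTP, it's worth loading the registered version back as a generic pyfunc and sanity-checking a prediction:

    import mlflow
    
    # Load version 1 from the registry and predict on a few validation rows
    loaded = mlflow.pyfunc.load_model("models:/finalmodel/1")
    print(loaded.predict(valid_x[:5]))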

    Serve the model as a REST API

    Serve a specific version from the registry (on recent MLflow releases, `--env-manager local` replaces the deprecated `--no-conda` flag):

    mlflow models serve -m "models:/finalmodel/1" -p 5001 --env-manager local

    Send a sample request (the wine dataset has 11 numeric features, so each instance is an 11-element row):

    curl -s http://127.0.0.1:5001/invocations \
      -H 'Content-Type: application/json' \
      -d '{"instances": [[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.0010, 3.00, 0.45, 8.8]]}'

    You can also serve directly from a run:

    mlflow models serve -m "runs:/$RUN_ID/model" -p 5001 --no-conda

    Comparing with a classic ML notebook (house prices)

    The same tracking pattern works for classic ML:

    import mlflow
    
    mlflow.set_experiment("/house-prices")
    
    with mlflow.start_run():
        # `model` is assumed to be an already-fitted sklearn estimator
        # (e.g., RandomForestRegressor); the metric value is illustrative
        mlflow.log_params({"model": "RandomForest", "n_estimators": 200})
        mlflow.log_metric("rmse", 0.123)
        mlflow.sklearn.log_model(model, "model")

    Troubleshooting and gotchas

    • Hyperopt distributions: `hp.loguniform` samples `exp(U(low, high))`, so `low`/`high` must be the logs of positive numbers. For [0, 1] ranges (like momentum), use `hp.uniform`. You can sanity-check a space by sampling from it, as sketched after this list.
    • Nested runs: use `mlflow.start_run(nested=True)` for each trial so the UI keeps the trials grouped under a single parent sweep.
    • Signatures: `infer_signature` enables input validation at serving time and documents the expected inputs and outputs.
    • Reproducibility: fix random seeds and log data/code versions (e.g., tag runs with the git commit hash).
    • UI hiccups: if runs don't appear, refresh and confirm you're in the right experiment and pointed at the right tracking URI.
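
    To sanity-check a search space before committing to a sweep, draw a few samples from it; `hyperopt.pyll.stochastic.sample` evaluates the space once per call:

    from hyperopt.pyll.stochastic import sample
    
    # Draw a few samples to eyeball the ranges
    for _ in range(3):
        print(sample(space))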

    What I’d improve next

    • Use `mlflow.autolog()` to capture more metrics automatically (see the sketch after this list)
    • Add explicit data lineage (feature store / data versioning)
    • Wire CI to run selected experiments and auto-register on metric thresholds
    • Use Docker for consistent serving environments
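
    For the first item, autologging is a one-line change; once enabled, Keras `fit()` calls log parameters, per-epoch metrics, and the model without explicit `log_*` calls (a sketch; exact coverage depends on the MLflow version):

    import mlflow
    
    # One call enables autologging for supported frameworks
    # (TensorFlow/Keras, scikit-learn, and others)
    mlflow.autolog()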

    Try it yourself

    1) Install the dependencies and run the sweep; the snippets above form an end-to-end script

    2) Start the UI: `mlflow ui --port 5000`

    3) Serve your registered model: `mlflow models serve -m "models:/finalmodel/1" -p 5001 --no-conda`

    That’s it — you now have a repeatable workflow for tracking, selecting, and serving models with MLflow.

    #MLflow #MLOps #Keras #TensorFlow #Hyperopt #ExperimentTracking #ModelServing
    Originsoft Engineering Team

    The engineering team at Originsoft Consultancy brings together decades of combined experience in software architecture, AI/ML, and cloud-native development. We are passionate about sharing knowledge and helping developers build better software.