# From Zero to MLflow: Tracking, Tuning, and Deploying a Keras Model (Hands-on)
A hands-on, copy-paste-ready walkthrough: track Keras/TensorFlow experiments in MLflow, run Hyperopt tuning with nested runs, register the best model, and serve it as a REST API.
I practiced MLflow end to end across two notebook folders (`1-MLproject` and `2-DLproject`) to learn how to track experiments, compare runs, pick the best model, and serve it. This post distills the workflow with runnable snippets.
## What you’ll build
- Run and track training experiments with MLflow
- Log parameters, metrics, and models from Keras/TensorFlow
- Tune hyperparameters with Hyperopt and compare runs in the MLflow UI
- Pick the best run, register a model, and serve it via a local REST API
Repo layout used:
- `1-MLproject/` — classic ML experimentation (e.g., house price notebook)
- `2-DLproject/` — deep learning quickstart with MLflow tracking + Hyperopt
- `requirements.txt` — dependencies to reproduce runs
## Environment and data
Install dependencies and (optionally) start the UI:

```bash
pip install -r requirements.txt
mlflow ui --port 5000
```

Dataset: white wine quality from the MLflow repo.
```python
import pandas as pd

data = pd.read_csv(
    "https://raw.githubusercontent.com/mlflow/mlflow/master/tests/datasets/winequality-white.csv",
    sep=";",
)
```

Train/valid/test split:
```python
from sklearn.model_selection import train_test_split

train, test = train_test_split(data, test_size=0.25, random_state=42)

train_x = train.drop(["quality"], axis=1).values
train_y = train[["quality"]].values.ravel()
test_x = test.drop(["quality"], axis=1).values
test_y = test["quality"].values.ravel()

train_x, valid_x, train_y, valid_y = train_test_split(
    train_x, train_y, test_size=0.20, random_state=42
)
```

Infer a model signature for safer serving later:
```python
import mlflow
from mlflow.models import infer_signature

signature = infer_signature(train_x, train_y)
```

## Set up MLflow experiment
```python
import mlflow

mlflow.set_experiment("/wine-quality")
```

This creates (or reuses) the `/wine-quality` experiment so your runs are grouped together in the UI.
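If you started the UI as a local tracking server and want runs logged there explicitly, you can point the client at it; by default MLflow writes runs to a local `./mlruns` directory, which `mlflow ui` also reads. A minimal sketch (the URI matches the port used above):

```python
import mlflow

# Optional: log to the local server started with `mlflow ui --port 5000`
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("/wine-quality")
```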
## A simple Keras model with MLflow logging
The training function below trains a small network and logs params, metrics, and the model to MLflow.
Each hyperparameter trial is recorded as a nested MLflow run.
```python
import numpy as np
import keras
import mlflow
from hyperopt import STATUS_OK


def train_model(params, epochs, train_x, train_y, valid_x, valid_y):
    # Normalize inputs with training-set statistics
    mean = np.mean(train_x, axis=0)
    var = np.var(train_x, axis=0)

    model = keras.Sequential([
        keras.Input([train_x.shape[1]]),
        keras.layers.Normalization(mean=mean, variance=var),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(
        optimizer=keras.optimizers.SGD(
            learning_rate=params["lr"],
            momentum=params["momentum"],
        ),
        loss="mean_squared_error",
        metrics=[keras.metrics.RootMeanSquaredError()],
    )

    # Each trial is logged as a nested run under the parent sweep run
    with mlflow.start_run(nested=True):
        model.fit(
            train_x,
            train_y,
            validation_data=(valid_x, valid_y),
            epochs=epochs,
            batch_size=64,
            verbose=0,
        )
        _, eval_rmse = model.evaluate(valid_x, valid_y, batch_size=64, verbose=0)

        mlflow.log_params(params)
        mlflow.log_metric("eval_rmse", float(eval_rmse))
        mlflow.tensorflow.log_model(model, "model", signature=signature)

    # Hyperopt expects a dict with "loss" and "status"
    return {"loss": float(eval_rmse), "status": STATUS_OK, "model": model}
```

## Hyperparameter tuning with Hyperopt (and MLflow tracking)
Define the search space and objective:
```python
from hyperopt import fmin, tpe, hp, Trials

space = {
    "lr": hp.loguniform("lr", np.log(1e-5), np.log(1e-1)),
    # Momentum is in [0, 1], so use uniform (not loguniform)
    "momentum": hp.uniform("momentum", 0.0, 1.0),
}


def objective(params):
    return train_model(
        params=params,
        epochs=3,
        train_x=train_x,
        train_y=train_y,
        valid_x=valid_x,
        valid_y=valid_y,
    )
```

Run the sweep under a parent run so it’s easy to register later:
```python
import mlflow

mlflow.set_experiment("/wine-quality")

with mlflow.start_run() as parent_run:
    trials = Trials()
    best_params = fmin(
        fn=objective,
        space=space,
        algo=tpe.suggest,
        max_evals=4,
        trials=trials,
        rstate=np.random.default_rng(42),
    )

    # Pick the trial with the lowest validation RMSE
    best_trial = sorted(trials.results, key=lambda x: x["loss"])[0]

    # Log the best trial's params, metric, and model on the parent run
    mlflow.log_params(best_params)
    mlflow.log_metric("eval_rmse", best_trial["loss"])
    mlflow.tensorflow.log_model(best_trial["model"], "model", signature=signature)

    parent_run_id = parent_run.info.run_id

print("Best parameters:", best_params)
print("Best eval RMSE:", best_trial["loss"])
print("Parent run:", parent_run_id)
```

Open the MLflow UI at `http://127.0.0.1:5000` to compare runs and metrics under the `/wine-quality` experiment.
## Register the winner in the Model Registry
Register the parent run’s model artifact:
```python
from mlflow import register_model

result = register_model(
    model_uri=f"runs:/{parent_run_id}/model",
    name="finalmodel",
)
print("Registered model:", result.name, "version:", result.version)
```

This produces a registry entry like `models:/finalmodel/1`.
## Serve the model as a REST API
Serve a specific version from the registry:
```bash
mlflow models serve -m "models:/finalmodel/1" -p 5001 --no-conda
```

Send a sample request (the wine dataset has 11 numeric features):
```bash
curl -s http://127.0.0.1:5001/invocations \
  -H 'Content-Type: application/json' \
  -d '{"instances": [[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.0010, 3.00, 0.45, 8.8]]}'
```
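The same request from Python, if you'd rather not shell out to curl; a small sketch using the `requests` library with the payload and port from the example above:

```python
import requests

# Single-row payload matching the curl example (11 numeric wine features)
payload = {
    "instances": [
        [7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.0010, 3.00, 0.45, 8.8]
    ]
}

resp = requests.post(
    "http://127.0.0.1:5001/invocations",
    json=payload,
    timeout=10,
)
print(resp.json())
```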
You can also serve directly from a run (replace `$RUN_ID` with a run ID from the UI):

```bash
mlflow models serve -m "runs:/$RUN_ID/model" -p 5001 --no-conda
```

## Comparing with a classic ML notebook (house prices)
The same tracking pattern works for classic ML:
```python
import mlflow

mlflow.set_experiment("/house-prices")

with mlflow.start_run():
    mlflow.log_params({"model": "RandomForest", "n_estimators": 200})
    mlflow.log_metric("rmse", 0.123)  # placeholder; log your real validation RMSE
    # `model` is an already-fitted scikit-learn estimator
    mlflow.sklearn.log_model(model, "model")
```

## Troubleshooting and gotchas
- Hyperopt distributions: `hp.loguniform` samples `exp(U(low, high))` so `low`/`high` must be logs of positive numbers. For [0, 1] ranges (like momentum), use `hp.uniform`.
- Nested runs: use `mlflow.start_run(nested=True)` for each trial so the UI keeps the tree under a single parent sweep.
- Signatures: `infer_signature` enables input validation at serving time and documents the model's expected inputs and outputs.
- Reproducibility: fix random seeds and log data/code versions (e.g., a git commit hash tag; see the sketch after this list).
- UI hiccups: refresh and confirm you’re in the right experiment.
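For the reproducibility point above, a rough sketch of tagging each run with the current git commit (assumes the project is a git checkout; the `git_commit` tag name and `random_seed` param are just illustrations):

```python
import subprocess

import mlflow

# Capture the current commit so every run records the exact code version
commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

with mlflow.start_run():
    mlflow.set_tag("git_commit", commit)
    mlflow.log_param("random_seed", 42)
```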
## What I’d improve next
- Use `mlflow.autolog()` to capture more metrics automatically (see the sketch after this list)
- Add explicit data lineage (feature store / data versioning)
- Wire CI to run selected experiments and auto-register on metric thresholds
- Use Docker for consistent serving environments
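For the autolog idea, a minimal sketch of what that could look like with the wine data from earlier; depending on your MLflow and Keras versions you may need `mlflow.tensorflow.autolog()` instead of the generic `mlflow.autolog()`:

```python
import keras
import mlflow

# Enable autologging before training; MLflow then records params,
# per-epoch metrics, and the final model without explicit log_* calls
mlflow.autolog()

model = keras.Sequential([
    keras.Input([train_x.shape[1]]),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")

with mlflow.start_run():
    model.fit(train_x, train_y, validation_data=(valid_x, valid_y), epochs=3)
```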
## Try it yourself
1) Install and run the sweep (end-to-end script)
2) Start the UI: `mlflow ui --port 5000`
3) Serve your registered model: `mlflow models serve -m "models:/finalmodel/1" -p 5001 --no-conda`
That’s it — you now have a repeatable workflow for tracking, selecting, and serving models with MLflow.