Technical Career Building
Why Your ML Project Looks Like a Mess (and How to Fix It)
The folder structure that separates professional data scientists from tutorial followers — explained for absolute beginners.
You followed a tutorial. You built a model. It works. And then you look at your project folder: 47 files, no structure, three copies of final_model_v3_new.py, and a Jupyter notebook that somehow has 200 cells. You know it works, but you also know that no hiring manager would take it seriously.
This is the gap between doing ML and doing ML professionally. And it is exactly what separates candidates who get interviews from those who do not.
Why You Need Structure (Even If You Are Working Alone)
Most beginners think: “I am the only one working on this. Why do I need folders and structure?”
Here is why:
You are not the only person working on this. Future you is a different person. In 3 months, you will not remember why you did something. In 6 months, you will not remember where the data came from. In 12 months, you will be starting from scratch because nothing is documented. Structure is how you communicate with future you.
No one will take your project seriously without it. You are applying for ML jobs. The interviewer asks: “Show me a project.” You send them a GitHub repo with 47 files in one folder. They close the tab. Why? Because they know that project cannot be deployed, cannot be maintained, cannot be trusted. Professional ML projects have structure. Always.
You cannot deploy chaos. The moment you want to move from Jupyter notebook to production, you need a way to track which data was used. You need a way to reproduce the exact model. You need a way to serve predictions to users. You need a way to monitor when things break. None of this works if your files are named final_v3_new.py.
Structure is not optional. It is the difference between a tutorial and a real project.
The Folder Structure That Actually Works
Here is the standard layout used in production ML projects. Every folder has a specific purpose. You do not need to memorize this — you need to understand why each piece exists.
ml-project/
├── .github/
│ └── workflows/
│ ├── ci.yml
│ ├── train.yml
│ └── deploy.yml
├── configs/
│ ├── params.yaml
│ ├── logging.yaml
│ └── infrastructure/
├── data/
│ ├── raw/
│ ├── processed/
│ └── external/
├── docs/
├── notebooks/
├── src/
│ ├── data/
│ ├── features/
│ ├── models/
│ └── evaluation/
├── tests/
├── artifacts/
├── README.md
├── Dockerfile
├── requirements.txt
└── pyproject.toml
Let us walk through each one.
The .github/ Folder: Automation That Saves Your Life
This is where you put automated workflows. The reason you need this is so you do not have to manually test, train, or deploy every time you make a change.
Inside the .github/ folder, you will have a workflows/ subfolder with files like ci.yml, train.yml, and deploy.yml.
What Is a .yml File?
A .yml file (or .yaml) is written in YAML, which stands for “YAML Ain’t Markup Language” (the acronym originally meant “Yet Another Markup Language” before it was redefined). It is a configuration format written in a human-readable way. Think of it like a recipe: step 1, do this. Step 2, then do this. Step 3, finally do this.
In ML projects, YAML files tell computers what to do automatically.
Here is what each workflow file does:
- ci.yml — Continuous Integration. Runs every time you push code. Checks whether your code has errors, whether tests pass, and whether it is formatted correctly. It catches bugs before you deploy.
- train.yml — Training Pipeline. Automatically retrains your model when data changes, logs results to your tracking system (such as MLflow), and saves you from the disaster of forgetting to retrain after updating the data.
- deploy.yml — Deployment. Builds your model into a container, pushes it to a server, and makes it available for predictions.
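To make the CI workflow concrete, here is a minimal sketch of what a ci.yml could look like as a GitHub Actions workflow. The job name, Python version, and exact steps are illustrative assumptions, not prescribed by any particular project:

```yaml
# .github/workflows/ci.yml -- minimal sketch; versions and steps are illustrative
name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4          # fetch the repository
      - uses: actions/setup-python@v5      # install Python
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest tests/                 # fail the build if any test fails
```

Every push now runs your test suite automatically, which is exactly the “catches bugs before you deploy” behavior described above.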
Without automation, you manually test everything. You manually retrain. You manually deploy. That is how bugs slip through. That is how models get deployed with old data. With .github/workflows/, the computer does it for you every single time.
The configs/ Folder: Settings You Will Change Often
This is where you store all your project settings. The reason you need this is so you do not hardcode values in 50 different files.
Inside configs/, you will typically have:
- params.yaml — Model hyperparameters, data paths, feature lists, training parameters. When you want to change the learning rate or batch size, you change it in one place, not across ten scripts.
- logging.yaml — Logging configuration. Where logs go, what level of detail to capture, how to format them.
- infrastructure/ — Docker configs, cloud deployment settings, anything related to where and how the project runs.
The rule is simple: if a value might change between experiments, environments, or team members, it goes in configs/, not in your code.
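As a sketch of what this looks like in practice, here is a hypothetical params.yaml. Every key and value below is an illustrative example, not a recommendation:

```yaml
# configs/params.yaml -- illustrative values only
data:
  raw_path: data/raw/train.csv
  processed_path: data/processed/train.parquet
features:
  - age
  - income
  - tenure_months
train:
  learning_rate: 0.01
  batch_size: 64
  epochs: 20
  random_seed: 42
```

When an experiment needs a different batch size, you edit one line here instead of hunting through your scripts.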
The data/ Folder: Where Your Data Lives
Never scatter data files across your project. All data goes here, organized into three subfolders:
- raw/ — Original, untouched data as you received it. Never modify files in this folder. This is your source of truth.
- processed/ — Cleaned, transformed, feature-engineered data ready for modeling. This is what your training script reads.
- external/ — Third-party data, reference datasets, lookup tables.
Important: Never commit large data files to Git. Use .gitignore to exclude the data/ folder, and document where the data comes from in your README. For version control of data, tools like DVC (Data Version Control) are the standard.
The src/ Folder: Your Actual Code
This is the core of your project. All Python source code lives here, organized by function:
- src/data/ — Scripts for loading, cleaning, and preprocessing data.
- src/features/ — Feature engineering and transformation logic.
- src/models/ — Model definition, training, prediction, and serialization.
- src/evaluation/ — Metrics calculation, model comparison, validation logic.
The key principle: each file does one thing. train.py trains the model. predict.py serves predictions. evaluate.py computes metrics. If a file is doing three different things, split it.
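To illustrate the “each file does one thing” principle, here is a toy sketch. In a real project the two functions would live in separate modules (for example src/models/train.py and src/models/predict.py); the model itself is deliberately trivial — it just memorizes the mean target per feature value — because the point is the separation of responsibilities, not the algorithm:

```python
# Toy illustration of single-responsibility modules.
# train() would live in src/models/train.py, predict() in src/models/predict.py.

def train(features, targets):
    """Fit a toy model: the mean target observed for each feature value."""
    grouped = {}
    for x, y in zip(features, targets):
        grouped.setdefault(x, []).append(y)
    return {x: sum(ys) / len(ys) for x, ys in grouped.items()}

def predict(model, x, default=0.0):
    """Look up the learned mean; fall back to a default for unseen values."""
    return model.get(x, default)

if __name__ == "__main__":
    model = train(["a", "a", "b"], [1.0, 3.0, 5.0])
    print(predict(model, "a"))  # 2.0
```

Because each function has one job and no hidden state, each can be imported, tested, and swapped out independently.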
The notebooks/ Folder: Exploration Only
Jupyter notebooks are for exploration and prototyping. They are not production code. Use them for EDA (exploratory data analysis), quick experiments, and visualization. But once something works, move it to src/ as a proper Python module.
A common mistake: building an entire ML pipeline inside a notebook. It works in development, but it cannot be tested, cannot be versioned properly, and cannot be deployed. Notebooks are the sketchpad. src/ is the finished product.
The tests/ Folder: Proof That Your Code Works
Most beginners skip testing entirely. In industry, untested ML code is a liability. Your tests/ folder should include:
- Unit tests for your data processing functions
- Tests that verify your model can train and predict without errors
- Tests that check data schema and types
- Integration tests that run the full pipeline end-to-end
You do not need 100% test coverage. But a hiring manager who sees a tests/ folder in your GitHub repo immediately knows you understand how production code works.
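As a concrete starting point, a first unit test might look like this. The clean_rows function here is a hypothetical stand-in for whatever preprocessing helper lives in your src/data/ modules; with pytest installed, running pytest tests/ picks this file up automatically:

```python
# tests/test_data.py -- a minimal first test.
# clean_rows is a hypothetical stand-in for a real helper in src/data/.

def clean_rows(rows):
    """Hypothetical helper: drop any row containing a missing value."""
    return [row for row in rows if all(value is not None for value in row)]

def test_clean_rows_drops_missing():
    raw = [(1, 2.0), (2, None), (3, 4.0)]
    cleaned = clean_rows(raw)
    assert len(cleaned) == 2                      # the row with None is gone
    assert all(len(row) == 2 for row in cleaned)  # column count is unchanged
```

One small test like this already documents what your data pipeline guarantees.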
The artifacts/ Folder: Model Outputs
Trained models, serialized objects, and experiment logs go here. Like data/, this folder should be in your .gitignore — you do not commit large binary files to Git. But the folder should exist in your structure to show where outputs live.
The README.md: Your Project's First Impression
This is the first thing anyone sees when they visit your GitHub repo. A good README for an ML project should include:
- What the project does (one paragraph)
- How to set it up (installation steps)
- How to run it (training, prediction, evaluation)
- Project structure overview
- Results and metrics
- Technologies used
A well-written README is the difference between a recruiter spending 10 seconds on your repo and spending 2 minutes. Those 2 minutes are what get you an interview.
How to Fix Your Project This Weekend
You do not need to rebuild from scratch. Here is the minimal action plan:
1. Create the folder structure above in your existing project.
2. Move your scripts into src/, organized by function.
3. Extract config values from your scripts into configs/params.yaml.
4. Move notebooks into notebooks/ and label them clearly (e.g., 01-eda.ipynb, 02-feature-exploration.ipynb).
5. Write a README that explains what the project does and how to run it.
6. Add a .gitignore that excludes data/, artifacts/, .env, and __pycache__/.
7. Write one test — just one. Test that your data loading function returns the expected shape.
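For the .gitignore step, a minimal file covering exactly those entries could look like this (the *.pyc line is an extra common convention, not required by the list above):

```
# .gitignore -- minimal example for the action plan
data/
artifacts/
.env
__pycache__/
*.pyc
```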
That is it. Seven steps. You can do all of this in one weekend. And when a recruiter opens your GitHub, they will see a project that looks like it was built by someone who knows what they are doing.
This Is What the ML4 Sprint Builds
If you want to go beyond structure and build a full production-ready ML pipeline — from raw data to a deployed model with experiment tracking, containerization, and a live API endpoint — that is exactly what the ML4 Sprint delivers in 4 days.
You do not just learn the theory. You build the project, deploy it, and walk away with a portfolio piece that demonstrates production ML skills to any hiring manager in Germany or Europe.
Build a Production ML Project in 4 Days
The ML4 Sprint gives you a deployed, portfolio-ready ML pipeline with FastAPI, Docker, and MLflow. One project. Four days. Real deployment.
Learn About the ML4 Sprint