Predicting Properties Prices in Mumbai with ML: A Hands On Guide

Can machine learning predict Mumbai’s skyrocketing property prices? Let’s find out with a step-by-step hands-on project inspired by Hands-On Machine Learning (Chapter 2).

You can download the dataset from Kaggle.

Dataset Introduction –

This dataset gives a detailed look into Mumbai’s real estate market with 12,685 property listings collected over 10 months. It covers:

  • 🏢 Property details — possession status, floor information, pricing.
  • 🏗️ Project specifics — ongoing and completed developments across Mumbai.
  • 👷 Developer insights — names, approved authorities, and unique property features.
  • 🛠️ Maintenance and booking — charges, booking amounts, covered and carpet areas.
  • 💡 Utilities and amenities — electricity, water status, lift availability, and more.

Data Cleaning –

We start by importing the dataset (properties.csv) using Pandas. This dataset contains 145 columns, but not all of them are useful for building a prediction model. To avoid overloading the model and to reduce noise in the data, we’ll focus on a smaller set of relevant features.

For this initial selection, we keep columns like:

  • 💰 Price: the target variable we want to predict.
  • 📏 Covered Area and Carpet Area: for property size.
  • 🏢 Floor details: Floor No and floors (the total number of floors).
  • 🛏️ bedroom, 🛁 Bathroom, 🪟 balconies, and 🚪 Lift: basic features of the property.
  • 🌆 City, ⏳ Possession Status, 📜 Ownership Type, 🛋️ furnished Type, and 🏪 Commercial: for categorical attributes that may influence price.

import os

import pandas as pd


def load_housing_mumbai(housing_path):
    """Read properties.csv from the given folder into a DataFrame."""
    csv_path = os.path.join(housing_path, "properties.csv")
    return pd.read_csv(csv_path)


# Folder that holds the downloaded properties.csv
HOUSING_PATH = "datasets"
properties = load_housing_mumbai(HOUSING_PATH)
properties.head()
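
Before trimming columns, it can help to confirm that the file loaded with the expected shape. The sketch below is only an illustration: it writes a tiny stand-in CSV to a temporary folder (the two rows and three columns are made up, not taken from the Kaggle data), loads it with the same loader, and reports the row and column counts. The helper describe_shape is a hypothetical name introduced here. With the real properties.csv, the counts should be close to the 12,685 listings and 145 columns mentioned above.

```python
import os
import tempfile

import pandas as pd


def load_housing_mumbai(housing_path):
    """Same loader as above: read properties.csv from housing_path."""
    csv_path = os.path.join(housing_path, "properties.csv")
    return pd.read_csv(csv_path)


def describe_shape(df):
    """Return the row count, column count, and column names of df."""
    n_rows, n_cols = df.shape
    return n_rows, n_cols, list(df.columns)


# Toy stand-in data (hypothetical values, for illustration only).
toy = pd.DataFrame({
    "Price": [9500000, 12000000],
    "Covered Area": [650, 900],
    "City": ["Mumbai", "Mumbai"],
})

# Write the toy data where the loader expects it, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    toy.to_csv(os.path.join(tmp_dir, "properties.csv"), index=False)
    loaded = load_housing_mumbai(tmp_dir)

n_rows, n_cols, columns = describe_shape(loaded)
print(f"{n_rows} rows x {n_cols} columns")
print(columns)
```

On the real dataset, calling describe_shape(properties) right after loading gives the same kind of summary.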

We copy these columns into a new DataFrame called properties_cleaned and overwrite properties with it for easier handling in later steps.

important_columns = [
    "Price", "Covered Area", "Carpet Area", "Floor No", "floors",
    "bedroom", "Bathroom", "balconies", "Lift",
    "City", "Possession Status", "Ownership Type", "furnished Type", "Commercial",
]

properties_cleaned = properties[important_columns].copy()
properties = properties_cleaned
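
Selecting columns by name raises a KeyError if any name is misspelled or absent from the file, and the column names here mix upper and lower case. As a hedged alternative, the sketch below keeps only the requested columns that actually exist and reports the rest. The helper select_columns and the toy DataFrame are assumptions made for illustration; they are not part of the original project.

```python
import pandas as pd


def select_columns(df, wanted):
    """Keep the wanted columns that exist in df; report the rest as missing."""
    present = [col for col in wanted if col in df.columns]
    missing = [col for col in wanted if col not in df.columns]
    return df[present].copy(), missing


# Toy frame with hypothetical values; "Extra" stands in for an unused column.
toy = pd.DataFrame({
    "Price": [9500000, 12000000],
    "Covered Area": [650, 900],
    "Extra": ["a", "b"],
})

subset, missing = select_columns(toy, ["Price", "Covered Area", "Carpet Area"])
print(list(subset.columns))  # columns that were found and kept
print(missing)               # requested columns that were not in the frame
```

With the real data, select_columns(properties, important_columns) would return the trimmed DataFrame along with any names that need a second look.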

Let’s look at a random sample of rows to see what the data looks like.

properties.sample(10)
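
A random sample shows individual rows, but it does not say how many values are missing in each column, which usually matters for the next cleaning step. The sketch below is one possible way to check that; missing_report is a hypothetical helper and the toy DataFrame uses made-up values rather than rows from the dataset.

```python
import numpy as np
import pandas as pd


def missing_report(df):
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


# Toy frame with hypothetical values and a few deliberate gaps.
toy = pd.DataFrame({
    "Price": [9500000.0, 12000000.0, np.nan, 7000000.0],
    "Carpet Area": [500.0, np.nan, np.nan, 420.0],
    "City": ["Mumbai", "Mumbai", "Mumbai", "Mumbai"],
})

report = missing_report(toy)
print(report)      # share of missing values in each column
print(toy.dtypes)  # column types, useful for spotting numbers stored as text
```

Running missing_report(properties) on the real data would show which of the selected columns need imputation or further cleaning.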

Predicting Properties Prices in Mumbai with ML: A Hands On Guide was originally published in Artificial Intelligence in Plain English on Medium.
