
Using Graphify and NetworkX to Map Python Codebase Structure with God Nodes, Communities, and Architecture Visualizations

In this tutorial, we build a fully offline Graphify workflow that turns a realistic multi-module Python application into a knowledge graph. We begin by installing Graphify and supporting graph libraries, then generate a small but connected sample application with configuration, database, authentication, service, API, cache, model, and SQL layers. We extract the graph locally using Graphify’s tree-sitter-based analysis, so we don’t need an API key or any LLM backend. After loading the generated graph.json into NetworkX, we analyze the codebase’s structure using file types, relationship types, centrality scores, community detection, and shortest paths among important symbols. We also create both static and interactive visualizations, making it easier to understand how modules, classes, functions, and database objects connect across the project.

Installing Graphify and NetworkX

import subprocess, sys
def pip(*pkgs):
   subprocess.run([sys.executable, "-m", "pip", "install", "-q", *pkgs], check=False)
pip("graphifyy[sql]", "pyvis", "networkx", "matplotlib")
import os, json, glob, textwrap, warnings
import networkx as nx
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")

We install Graphify along with the graph analysis and visualization libraries needed for the tutorial. We import the required Python modules, including NetworkX for graph processing and Matplotlib for static plotting. We also suppress unnecessary warnings so the notebook output stays clean and focused.
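Because pip runs quietly in the install step, a failed install can go unnoticed until a later cell breaks. The following is a minimal sketch of a sanity check using only the standard library. The helper name `missing_modules` is our own, and the import names are assumed from how the tutorial later imports or invokes each package.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names assumed from the rest of the tutorial
# (Graphify is invoked later as `python -m graphify`).
required = ["graphify", "networkx", "matplotlib", "pyvis"]
missing = missing_modules(required)
print("Missing modules:", missing or "none")
```

If the list is not empty, rerunning the install cell without `-q` usually shows why pip failed.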

Building the Sample Codebase

ROOT = "sample_app"
os.makedirs(ROOT, exist_ok=True)
FILES = {
"config.py": '''
# Central settings object, used everywhere (expect this to be a "god node").
class Settings:
   def __init__(self):
       self.db_dsn = "postgresql://localhost/app"
       self.jwt_secret = "change-me"
       self.rate_limit = 100
settings = Settings()
''',
"database.py": '''
from config import settings
class DatabasePool:
   """Connection pool. WHY: reuse sockets instead of reconnecting per query."""
   def __init__(self, dsn):
       self.dsn = dsn
       self._conns = []
   def acquire(self):
       return {"dsn": self.dsn}
pool = DatabasePool(settings.db_dsn)
def get_connection():
   return pool.acquire()
''',
"models.py": '''
class User:
   def __init__(self, user_id, email):
       self.user_id = user_id
       self.email = email
class Session:
   def __init__(self, user, token):
       self.user = user
       self.token = token
''',
"cache.py": '''
from config import settings
class RateLimiter:
   # NOTE: naive in-memory limiter; swap for Redis in prod.
   def __init__(self, limit):
       self.limit = limit
       self.hits = {}
   def allow(self, key):
       self.hits[key] = self.hits.get(key, 0) + 1
       return self.hits[key] <= self.limit
limiter = RateLimiter(settings.rate_limit)
''',
"auth.py": '''
from config import settings
from database import get_connection
from models import User, Session
def hash_password(raw):
   return f"hashed::{raw}"
def verify_password(raw, hashed):
   return hash_password(raw) == hashed
class AuthService:
   def __init__(self):
       self.secret = settings.jwt_secret
   def login(self, email, password):
       conn = get_connection()
       user = User(user_id=1, email=email)
       return Session(user=user, token=self.secret + email)
''',
"services.py": '''
from database import get_connection
from models import User
from auth import AuthService
class UserService:
   def __init__(self):
       self.auth = AuthService()
   def register(self, email, password):
       conn = get_connection()
       return User(user_id=2, email=email)
   def authenticate(self, email, password):
       return self.auth.login(email, password)
''',
"api.py": '''
from cache import limiter
from services import UserService
from auth import verify_password
svc = UserService()
def signup_route(email, password):
   if not limiter.allow(email):
       return {"error": "rate limited"}
   return svc.register(email, password)
def login_route(email, password):
   if not limiter.allow(email):
       return {"error": "rate limited"}
   return svc.authenticate(email, password)
''',
"main.py": '''
from api import signup_route, login_route
from database import pool
def run():
   signup_route("[email protected]", "pw")
   return login_route("[email protected]", "pw")
if __name__ == "__main__":
   run()
''',
"schema.sql": '''
CREATE TABLE users (
   user_id  SERIAL PRIMARY KEY,
   email    TEXT UNIQUE NOT NULL
);
CREATE TABLE sessions (
   token    TEXT PRIMARY KEY,
   user_id  INTEGER NOT NULL REFERENCES users(user_id)
);
CREATE VIEW active_sessions AS
SELECT s.token, u.email
FROM sessions s JOIN users u ON s.user_id = u.user_id;
''',
}
for name, body in FILES.items():
   with open(os.path.join(ROOT, name), "w") as f:
       f.write(textwrap.dedent(body).lstrip())
print(f"Wrote {len(FILES)} files to ./{ROOT}/")

We create a realistic sample application with several Python modules and one SQL schema file. We design the files to include meaningful cross-module relationships, such as imports, function calls, service dependencies, authentication logic, database access, and rate limiting. We then write all these files to a local sample_app directory, giving Graphify a complete mini-codebase to analyze.
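To get a feel for the cross-module relationships before any graph tool runs, one option is to list local imports with Python's built-in `ast` module. This is only a rough sketch and not how Graphify itself works. The function name `import_edges` and the two-file demo dict are hypothetical; the same function accepts any dict that maps filenames to source strings.

```python
import ast

def import_edges(files):
    """Map each Python module to the set of local modules it imports."""
    local = {name[:-3] for name in files if name.endswith(".py")}
    edges = {}
    for name, source in files.items():
        if not name.endswith(".py"):
            continue  # skip non-Python files such as a SQL schema
        imported = set()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.ImportFrom) and node.module in local:
                imported.add(node.module)
            elif isinstance(node, ast.Import):
                imported.update(a.name for a in node.names if a.name in local)
        edges[name[:-3]] = imported
    return edges

# Hypothetical two-file demo, not the full sample application.
demo = {
    "config.py": "class Settings:\n    pass\nsettings = Settings()\n",
    "database.py": "from config import settings\n",
}
for module, deps in sorted(import_edges(demo).items()):
    print(f"{module} -> {sorted(deps)}")
```

A module that many others import, such as a shared settings file, is a natural candidate for a high-degree node in the extracted graph.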

Extracting the Knowledge Graph

res = subprocess.run(
   [sys.executable, "-m", "graphify", "extract", ROOT, "--no-cluster"],
   capture_output=True, text=True
)
print(res.stdout[-1500:] or res.stderr[-1500:])
graph_paths = glob.glob("**/graph.json", recursive=True)
assert graph_paths, "graph.json not found; check the extract output above."
GRAPH_JSON = sorted(graph_paths, key=os.path.getmtime)[-1]
print("Graph file:", GRAPH_JSON)
def load_graphify(path):
   # Read node-link JSON; the edge list may be stored under "links" or "edges".
   with open(path) as f:
       data = json.load(f)
   ekey = "links" if "links" in data else ("edges" if "edges" in data else None)
   G = nx.DiGraph() if data.get("directed") else nx.Graph()
   for n in data.get("nodes", []):
       nid = n.get("id")
       G.add_node(nid, **{k: v for k, v in n.items() if k != "id"})
   for e in data.get(ekey or "links", []):
       G.add_edge(e.get("source"), e.get("target"),
                  **{k: v for k, v in e.items() if k not in ("source", "target")})
   G.graph.update(data.get("graph", {}))
   return G
G = load_graphify(GRAPH_JSON)
UG = G.to_undirected()
print(f"\nGraph: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")
def label(n):
   return G.nodes[n].get("label", str(n))

We run Graphify locally on the generated application and extract the project knowledge graph without using any API key or LLM backend. We locate the generated graph.json file and load it into NetworkX using a version-tolerant node-link loader. We then convert the graph into an undirected form for easier structural analysis and define a helper function to show readable node labels.
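The node-link layout that the loader expects can be illustrated without any graph library. The sketch below builds a plain adjacency map from a hand-written payload. The ids, labels, and relation names in `sample` are illustrative assumptions, not actual Graphify output, and `adjacency_from_node_link` is a hypothetical helper.

```python
def adjacency_from_node_link(data):
    """Build an undirected adjacency map from a node-link style dict."""
    # Node-link JSON commonly stores its edge list under "links" or "edges".
    edge_key = "links" if "links" in data else "edges"
    adjacency = {node["id"]: set() for node in data.get("nodes", [])}
    for edge in data.get(edge_key, []):
        src, dst = edge["source"], edge["target"]
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)
    return adjacency

# Hypothetical payload: ids, labels, and relations are made up for illustration.
sample = {
    "directed": False,
    "nodes": [
        {"id": "config", "label": "config.py"},
        {"id": "database", "label": "database.py"},
        {"id": "auth", "label": "auth.py"},
    ],
    "links": [
        {"source": "database", "target": "config", "relation": "imports"},
        {"source": "auth", "target": "config", "relation": "imports"},
        {"source": "auth", "target": "database", "relation": "imports"},
    ],
}
adjacency = adjacency_from_node_link(sample)
print({node: sorted(neigh) for node, neigh in adjacency.items()})
```

Accepting both key names keeps the loader usable if the serialization changes between library versions.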

Analyzing Centrality and Communities

from collections import Counter
ftypes  = Counter(d.get("file_type", "?") for _, d in G.nodes(data=True))
rels    = Counter(d.get("relation", "?")  for *_, d in G.edges(data=True))
conf    = Counter(d.get("confidence", "?") for *_, d in G.edges(data=True))
print("\nNodes by file_type :", dict(ftypes))
print("Edges by relation  :", dict(rels))
print("Edges by confidence:", dict(conf))
deg = nx.degree_centrality(UG)
btw = nx.betweenness_centrality(UG)
print("\nTop 'god nodes' by degree centrality:")
for n, c in sorted(deg.items(), key=lambda x: -x[1])[:8]:
   print(f"  {label(n):<22} deg={c:.3f}  betweenness={btw.get(n,0):.3f}")
# Prefer Louvain; fall back to greedy modularity if it is unavailable.
try:
   communities = nx.community.louvain_communities(UG, seed=42)
except Exception:
   communities = list(nx.community.greedy_modularity_communities(UG))
node_comm = {n: i for i, com in enumerate(communities) for n in com}
print(f"\nDetected {len(communities)} communities:")
for i, com in enumerate(communities):
   members = ", ".join(sorted(label(n) for n in com))[:90]
   print(f"  Community {i}: {members}")
def find(substr):
   # Return the first node whose label contains substr (case-insensitive).
   for n in G.nodes:
       if substr.lower() in label(n).lower():
           return n
   return None
a, b = find("api"), find("DatabasePool")
if a is not None and b is not None and nx.has_path(UG, a, b):
   path = nx.shortest_path(UG, a, b)
   print(f"\nPath {label(a)} -> {label(b)}:")
   print("   " + "  →  ".join(label(p) for p in path))

We analyze the extracted graph by summarizing node types, edge relationships, and confidence levels. We compute degree centrality and betweenness centrality to identify important “god nodes” that connect many parts of the application. We also detect communities in the graph and trace a shortest path between key components to understand how parts of the codebase are connected.
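To make the two core measurements concrete, here is a small pure-Python sketch of degree centrality (a node's degree divided by n - 1) and a breadth-first shortest path. The edge list is a hypothetical simplification of the sample application's module dependencies, not Graphify's extracted graph, and all function names are our own.

```python
from collections import deque

def build_adjacency(edges):
    """Turn (a, b) pairs into an undirected adjacency map."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def degree_centrality(adjacency):
    """Degree of each node divided by (n - 1)."""
    n = len(adjacency)
    if n <= 1:
        return {node: 0.0 for node in adjacency}  # degenerate case, chosen for simplicity
    return {node: len(neigh) / (n - 1) for node, neigh in adjacency.items()}

def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neigh in sorted(adjacency.get(node, ())):
            if neigh not in parents:
                parents[neigh] = node
                queue.append(neigh)
    return None

# Hypothetical module-level edges, for illustration only.
edges = [("config", "database"), ("config", "cache"), ("config", "auth"),
         ("database", "auth"), ("auth", "services"), ("services", "api"),
         ("cache", "api"), ("api", "main")]
adjacency = build_adjacency(edges)
centrality = degree_centrality(adjacency)
for node, score in sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0]))[:3]:
    print(f"{node:<10} {score:.3f}")
path = shortest_path(adjacency, "api", "database")
print(" -> ".join(path))
```

Nodes with the highest scores are the "god node" candidates; the path shows one of possibly several equally short routes between two components.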

Visualizing the Code Graph

plt.figure(figsize=(13, 9))
pos = nx.spring_layout(UG, k=0.7, seed=42)
nx.draw_networkx_edges(UG, pos, alpha=0.25)
nx.draw_networkx_nodes(
   UG, pos,
   node_color=[node_comm.get(n, 0) for n in UG.nodes],
   node_size=[300 + 4000 * deg.get(n, 0) for n in UG.nodes],
   cmap=plt.cm.tab20, alpha=0.9,
)
# Label only the most central nodes to keep the plot readable.
top = {n for n, _ in sorted(deg.items(), key=lambda x: -x[1])[:14]}
nx.draw_networkx_labels(UG, pos, {n: label(n) for n in top}, font_size=8)
plt.title("Graphify knowledge graph: size=centrality, color=community")
plt.axis("off"); plt.tight_layout()
plt.savefig("graph_static.png", dpi=130); plt.show()
try:
   from pyvis.network import Network
   net = Network(height="650px", width="100%", bgcolor="#111", font_color="white",
                 notebook=True, cdn_resources="in_line", directed=G.is_directed())
   palette = ["#e6194B","#3cb44b","#4363d8","#f58231","#911eb4",
              "#42d4f4","#f032e6","#bfef45","#fabed4","#469990"]
   for n, d in G.nodes(data=True):
       c = node_comm.get(n, 0)
       net.add_node(n, label=label(n), title=f"{d.get('file_type','?')} · {d.get('source_file','')}",
                    color=palette[c % len(palette)], size=12 + 60 * deg.get(n, 0))
   for s, t, d in G.edges(data=True):
       net.add_edge(s, t, title=d.get("relation", ""))
   net.save_graph("graph_interactive.html")
   print("\nSaved interactive graph -> graph_interactive.html")
   from IPython.display import HTML, display
   display(HTML(open("graph_interactive.html").read()))
except Exception as e:
   print("Interactive viz skipped:", e)
for cmd in (
   ["query", "what connects auth to the database?", "--graph", GRAPH_JSON],
   ["path",  "AuthService", "DatabasePool", "--graph", GRAPH_JSON],
   ["explain", "RateLimiter", "--graph", GRAPH_JSON],
):
   print("\n$ graphify " + " ".join(cmd))
   r = subprocess.run([sys.executable, "-m", "graphify", *cmd],
                      capture_output=True, text=True)
   print((r.stdout or r.stderr)[:1200])
print("\nDone. Artifacts: graph_static.png, graph_interactive.html,",
     "and graphify-out/ (graph.json, GRAPH_REPORT.md).")

We visualize the knowledge graph using both static and interactive methods. We first create a Matplotlib graph where node size represents centrality and node color represents community membership. We then build an interactive Pyvis visualization and run Graphify’s CLI commands to query the graph, find paths, and explain selected symbols.
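The styling rule behind both plots is a simple mapping: size grows linearly with centrality, and color is picked from a palette by community index, wrapping around when there are more communities than colors. A minimal sketch of that mapping follows. The helper `node_styles` and the sample scores are hypothetical; the default constants mirror the static plot's 300 + 4000 × centrality formula.

```python
def node_styles(centrality, community_of, palette, base=300, scale=4000):
    """Return {node: (size, color)} from centrality scores and community ids."""
    return {
        node: (base + scale * score,
               palette[community_of.get(node, 0) % len(palette)])
        for node, score in centrality.items()
    }

# Hypothetical scores and community ids, for illustration only.
palette = ["#e6194B", "#3cb44b", "#4363d8"]
centrality = {"config": 0.5, "api": 0.25, "main": 0.0}
community_of = {"config": 0, "api": 1, "main": 4}  # 4 wraps around to index 1

styles = node_styles(centrality, community_of, palette)
for node, (size, color) in styles.items():
    print(f"{node:<8} size={size:<8} color={color}")
```

Keeping a nonzero base size ensures that peripheral nodes with zero centrality still remain visible.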

Conclusion

In conclusion, we now have a complete local pipeline for converting source code into a useful knowledge graph and studying it with graph analytics. We saw how Graphify extracts meaningful relationships from a Python and SQL codebase, and we used NetworkX to identify central “god nodes,” detect communities, and trace paths between components such as authentication and database logic. We also generated visual outputs that help us inspect the architecture from both a high-level and an interactive perspective. This workflow gives us a way to reason about code structure, dependency flow, architectural hotspots, and cross-file connections without relying on external APIs, making it useful for codebase exploration, documentation, refactoring, and software architecture review.



The post Using Graphify and NetworkX to Map Python Codebase Structure with God Nodes, Communities, and Architecture Visualizations appeared first on MarkTechPost.
