
A Comprehensive Implementation Guide to ModelScope for Model Search, Inference, Fine-Tuning, Evaluation, and Export


In this tutorial, we explore ModelScope through a practical, end-to-end workflow that runs easily on Colab. We start by setting up the environment, verifying dependencies, and confirming GPU availability so we can work with the framework reliably from the beginning. From there, we interact with the ModelScope Hub to search for models, download snapshots, load datasets, and understand how its ecosystem connects with familiar tools such as Hugging Face Transformers. As we move forward, we apply pretrained pipelines across NLP and computer vision tasks, then fine-tune a sentiment classifier on IMDB, evaluate its performance, and export it for deployment. Through this process, we build not only a working implementation but also a clear understanding of how ModelScope can support research, experimentation, and production-oriented AI workflows.

!pip install -q addict simplejson yapf gast oss2 sortedcontainers requests
!pip install -q modelscope "transformers>=4.37.0" datasets torch torchvision \
    accelerate scikit-learn sentencepiece Pillow matplotlib evaluate optimum[exporters]


import torch, os, sys, json, warnings, numpy as np
warnings.filterwarnings("ignore")


import addict; print("✅ addict OK")


print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
   print(f"GPU: {torch.cuda.get_device_name(0)}")


import modelscope
print(f"ModelScope: {modelscope.__version__}")


DEVICE = 0 if torch.cuda.is_available() else -1




from modelscope import snapshot_download
from modelscope.hub.api import HubApi


api = HubApi()
print("\n🔍 Searching ModelScope Hub for 'bert' models...\n")
try:
    models = api.list_models(filter_dict={"Search": "bert"}, sort="StarCount")
    for i, m in enumerate(models):
        if i >= 5:
            break
        print(f"  • {m.get('Name', m.get('id', 'N/A'))}")
except Exception as e:
    print(f"  (Hub search may be unavailable outside China: {e})")


model_dir = snapshot_download(
   "AI-ModelScope/bert-base-uncased",
   cache_dir="./ms_cache",
)
print(f"\n✅ Model downloaded to: {model_dir}")
print("   Files:", os.listdir(model_dir)[:8])




from modelscope.msdatasets import MsDataset


print("\n📦 Loading 'imdb' dataset...\n")
try:
    ds = MsDataset.load("imdb", split="train")
    print(f"  Dataset size: {len(ds)} samples")
    sample = next(iter(ds))
    print(f"  Keys: {list(sample.keys())}")
    print(f"  Text preview: {sample['text'][:120]}...")
    print(f"  Label: {sample['label']} (0=neg, 1=pos)")
except Exception as e:
    print(f"  Falling back to HuggingFace datasets: {e}")
    from datasets import load_dataset
    ds = load_dataset("imdb", split="train")
    print(f"  Dataset size: {len(ds)} samples")


labels = [row["label"] for row in ds]
print("\n  Label distribution:")
for label in sorted(set(labels)):
    count = labels.count(label)
    print(f"    Label {label}: {count} ({count/len(labels)*100:.1f}%)")

We set up the entire Colab environment and install all the libraries required for the tutorial. We verify critical dependencies such as addict, check the PyTorch and CUDA setup, and confirm that ModelScope is installed correctly before moving forward. We then begin working with the ModelScope ecosystem by searching the hub for BERT models, downloading a model snapshot locally, loading the IMDB dataset, and inspecting its label distribution to understand the data we will use later.
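As a side note, the distribution loop above calls `labels.count()` once per class, which rescans the list each time; `collections.Counter` does the same tally in a single pass. A minimal sketch on stand-in labels (the list here is hypothetical, not the real IMDB labels):

```python
from collections import Counter

# Stand-in labels; in the notebook this would be [row["label"] for row in ds]
labels = [0, 1, 1, 0, 1, 1]
dist = Counter(labels)  # one pass over the list
for label in sorted(dist):
    share = dist[label] / len(labels) * 100
    print(f"Label {label}: {dist[label]} ({share:.1f}%)")
```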

from transformers import pipeline as hf_pipeline


print("\n🧠 NLP PIPELINES\n")


print("── 4a. Sentiment Analysis ──")
sentiment = hf_pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=DEVICE,
)


test_texts = [
   "ModelScope makes AI model access incredibly easy and intuitive!",
   "The documentation was confusing and the API kept returning errors.",
   "The weather today is partly cloudy with a slight breeze.",
]


for text in test_texts:
    result = sentiment(text)[0]
    emoji = "🟢" if result["label"] == "POSITIVE" else "🔴"
    print(f'  {emoji} {result["label"]} ({result["score"]:.4f}): "{text[:60]}..."')




print("\n── 4b. Named Entity Recognition ──")
ner = hf_pipeline(
    "ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    aggregation_strategy="simple",
    device=DEVICE,
)


ner_text = "Alibaba's ModelScope platform was developed in Hangzhou, China and competes with Hugging Face."
entities = ner(ner_text)
for ent in entities:
    print(f'  🏷  {ent["word"]} → {ent["entity_group"]} (score: {ent["score"]:.3f})')




print("\n── 4c. Zero-Shot Classification ──")
zsc = hf_pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    device=DEVICE,
)


zsc_result = zsc(
    "ModelScope offers pretrained models for NLP, CV, and audio tasks.",
    candidate_labels=["technology", "sports", "politics", "science"],
)
for label, score in zip(zsc_result["labels"], zsc_result["scores"]):
    bar = "█" * int(score * 30)
    print(f"  {label:<12} {score:.3f} {bar}")




print("\n── 4d. Text Generation (GPT-2) ──")
generator = hf_pipeline(
    "text-generation",
    model="gpt2",
    device=DEVICE,
)


gen_output = generator(
    "The future of open-source AI is",
    max_new_tokens=60,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    num_return_sequences=1,
)
print(f"  📝 {gen_output[0]['generated_text']}")




print("\n── 4e. Fill-Mask (BERT) ──")
fill_mask = hf_pipeline(
    "fill-mask",
    model=model_dir,
    device=DEVICE,
)


mask_results = fill_mask("ModelScope is an open-source [MASK] for AI models.")
for r in mask_results[:5]:
    print(f"  ✏  [MASK] → '{r['token_str']}' (score: {r['score']:.4f})")

We focus on natural language processing pipelines and explore how easily we can run multiple tasks with pretrained models. We perform sentiment analysis, named entity recognition, zero-shot classification, text generation, and fill-mask prediction, giving a broad view of ModelScope-compatible inference workflows. As we test these tasks on sample inputs, we see how quickly we can move from raw text to meaningful model outputs in a unified pipeline.

print("\n👁  COMPUTER VISION PIPELINES\n")


print("── 5a. Image Classification (ViT) ──")
img_classifier = hf_pipeline(
    "image-classification",
    model="google/vit-base-patch16-224",
    device=DEVICE,
)


img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
img_results = img_classifier(img_url)


for r in img_results[:5]:
   print(f"  🖼  {r['label']:<30} ({r['score']:.4f})")




print("\n── 5b. Object Detection (DETR) ──")
detector = hf_pipeline(
    "object-detection",
    model="facebook/detr-resnet-50",
    device=DEVICE,
)


detections = detector(img_url)
for d in detections[:5]:
    box = d["box"]
    print(f"  📦 {d['label']:<15} score={d['score']:.3f}  box=({box['xmin']:.0f},{box['ymin']:.0f},{box['xmax']:.0f},{box['ymax']:.0f})")




print("\n── 5c. Visualising Detections ──")
from PIL import Image, ImageDraw
import requests, matplotlib.pyplot as plt
from io import BytesIO


img = Image.open(BytesIO(requests.get(img_url).content))
draw = ImageDraw.Draw(img)
colors = ["#58a6ff", "#3fb950", "#d2a8ff", "#f78166", "#ff7b72"]


for i, d in enumerate(detections[:5]):
    box = d["box"]
    color = colors[i % len(colors)]
    draw.rectangle([box["xmin"], box["ymin"], box["xmax"], box["ymax"]], outline=color, width=3)
    draw.text((box["xmin"]+4, box["ymin"]+2), f"{d['label']} {d['score']:.2f}", fill=color)


plt.figure(figsize=(10, 7))
plt.imshow(img)
plt.axis("off")
plt.title("DETR Object Detection")
plt.tight_layout()
plt.savefig("detection_result.png", dpi=150, bbox_inches="tight")
plt.show()
print("  ✅ Saved detection_result.png")




print("\n🔄 HUGGINGFACE INTEROP\n")


from transformers import AutoTokenizer, AutoModelForSequenceClassification


print("── Approach A: snapshot_download (works for models on the ModelScope Hub) ──")
print(f"  We already downloaded bert-base-uncased in Section 2: {model_dir}")


print("\n── Approach B: Direct HF loading (works globally for any HF model) ──")


hf_model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(hf_model_name)
model = AutoModelForSequenceClassification.from_pretrained(hf_model_name)
model.eval()
print(f"  ✅ Loaded '{hf_model_name}' directly from HuggingFace")


print("\n── Manual inference without a pipeline ──")
texts = [
   "This open-source framework is a game changer for researchers!",
   "I encountered multiple bugs during installation.",
]


inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")


with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)


id2label = model.config.id2label
for text, prob in zip(texts, probs):
    pred_id = prob.argmax().item()
    print(f"  ✦ {id2label[pred_id]} ({prob[pred_id]:.4f}): '{text[:55]}...'")


print("\n── Loading Section 2's ModelScope-downloaded BERT with Transformers ──")
ms_tokenizer = AutoTokenizer.from_pretrained(model_dir)
ms_model = AutoModelForSequenceClassification.from_pretrained(
    model_dir, num_labels=2, ignore_mismatched_sizes=True
)
print("  ✅ bert-base-uncased from ModelScope loaded into a Transformers AutoModel")
print(f"     Vocab size: {ms_tokenizer.vocab_size}, Hidden: {ms_model.config.hidden_size}")
del ms_model

We shift from text to computer vision and run image classification and object detection on a sample image. We also visualize the detection results by drawing bounding boxes and labels, which helps us inspect the model's predictions more intuitively and practically. After that, we explore Hugging Face interoperability by loading models and tokenizers directly, performing manual inference, and demonstrating that a model downloaded from ModelScope can also be used seamlessly with Transformers.

print("\n🎯 FINE-TUNING (DistilBERT on IMDB subset)\n")


from datasets import load_dataset
from transformers import (
   AutoTokenizer,
   AutoModelForSequenceClassification,
   TrainingArguments,
   Trainer,
   DataCollatorWithPadding,
)
import evaluate


print("  Loading IMDB subset...")
full_train = load_dataset("imdb", split="train").shuffle(seed=42)
full_test  = load_dataset("imdb", split="test").shuffle(seed=42)
train_ds = full_train.select(range(1000))
eval_ds  = full_test.select(range(500))
print(f"  Train: {len(train_ds)}, Eval: {len(eval_ds)}")


ckpt = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(ckpt)


def tokenize_fn(batch):
   return tokenizer(batch["text"], truncation=True, max_length=256)


train_ds = train_ds.map(tokenize_fn, batched=True)
eval_ds  = eval_ds.map(tokenize_fn, batched=True)


model = AutoModelForSequenceClassification.from_pretrained(
    ckpt,
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)


accuracy_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    acc = accuracy_metric.compute(predictions=preds, references=labels)
    f1 = f1_metric.compute(predictions=preds, references=labels, average="weighted")
    return {**acc, **f1}


training_args = TrainingArguments(
   output_dir="./ms_finetuned_model",
   num_train_epochs=2,
   per_device_train_batch_size=16,
   per_device_eval_batch_size=32,
   learning_rate=2e-5,
   weight_decay=0.01,
   eval_strategy="epoch",
   save_strategy="epoch",
   load_best_model_at_end=True,
   metric_for_best_model="accuracy",
   logging_steps=50,
   report_to="none",
   fp16=torch.cuda.is_available(),
   dataloader_num_workers=2,
)


trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    processing_class=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)


print("  🚀 Starting training...\n")
train_result = trainer.train()
print("\n  ✅ Training complete!")
print(f"     Train loss: {train_result.training_loss:.4f}")
print(f"     Train time: {train_result.metrics['train_runtime']:.1f}s")

We move into fine-tuning by preparing a smaller IMDB subset so that training stays practical within Google Colab. We tokenize the text, load a pretrained DistilBERT classification model, define evaluation metrics, and configure the training process with suitable arguments for a lightweight but realistic demonstration. We then launch training and observe how a pretrained checkpoint is adapted into a task-specific sentiment classifier through the Trainer workflow.

print("\n📊 MODEL EVALUATION\n")


eval_results = trainer.evaluate()
print("  Evaluation Results:")
for key, value in eval_results.items():
    if isinstance(value, float):
        print(f"    {key:<25}: {value:.4f}")


from sklearn.metrics import classification_report, confusion_matrix


preds_output = trainer.predict(eval_ds)
preds = np.argmax(preds_output.predictions, axis=-1)
labels = preds_output.label_ids


print("\n  Classification Report:")
print(classification_report(labels, preds, target_names=["NEGATIVE", "POSITIVE"]))


cm = confusion_matrix(labels, preds)
fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(cm, cmap="Blues")
ax.set_xticks([0, 1]); ax.set_yticks([0, 1])
ax.set_xticklabels(["NEGATIVE", "POSITIVE"])
ax.set_yticklabels(["NEGATIVE", "POSITIVE"])
ax.set_xlabel("Predicted"); ax.set_ylabel("Actual")
ax.set_title("Confusion Matrix — Fine-Tuned DistilBERT")
for i in range(2):
    for j in range(2):
        ax.text(j, i, str(cm[i, j]), ha="center", va="center",
                color="white" if cm[i, j] > cm.max()/2 else "black", fontsize=18)
plt.colorbar(im)
plt.tight_layout()
plt.savefig("confusion_matrix.png", dpi=150)
plt.show()
print("  ✅ Saved confusion_matrix.png")


print("\n── Testing the Fine-Tuned Model on New Inputs ──")
ft_pipeline = hf_pipeline(
    "sentiment-analysis",
    model=trainer.model,
    tokenizer=tokenizer,
    device=DEVICE,
)


new_reviews = [
   "An absolutely breathtaking masterpiece with brilliant performances!",
   "Waste of two hours. Terrible script and wooden acting.",
   "Decent popcorn movie but nothing special. Had some fun moments.",
]


for review in new_reviews:
    res = ft_pipeline(review)[0]
    emoji = "🟢" if res["label"] == "POSITIVE" else "🔴"
    print(f'  {emoji} {res["label"]} ({res["score"]:.4f}): "{review}"')




print("\n💾 EXPORTING THE FINE-TUNED MODEL\n")


save_path = "./ms_finetuned_model/final"
trainer.save_model(save_path)
tokenizer.save_pretrained(save_path)
print(f"  ✅ Model saved to: {save_path}")
print(f"     Files: {os.listdir(save_path)}")


print("\n── ONNX Export ──")
try:
    from optimum.exporters.onnx import main_export
    onnx_path = "./ms_finetuned_model/onnx"
    main_export(save_path, output=onnx_path, task="text-classification")
    print(f"  ✅ ONNX model exported to: {onnx_path}")
    print(f"     Files: {os.listdir(onnx_path)}")
except Exception as e:
    print(f"  ⚠  ONNX export skipped: {e}")


print("""
── Upload to ModelScope Hub (manual step) ──


 1. Get a token from https://modelscope.cn/my/myaccesstoken
 2. Run:


    from modelscope.hub.api import HubApi
    api = HubApi()
    api.login('YOUR_TOKEN')
    api.push_model(
        model_id='your-username/my-finetuned-distilbert',
        model_dir='./ms_finetuned_model/final',
    )
""")


print("""
╔══════════════════════════════════════════════════════════════════╗
║                   🎉  TUTORIAL COMPLETE!  🎉                      ║
╠══════════════════════════════════════════════════════════════════╣
║  ✓ ModelScope Hub — search, browse & download models             ║
║  ✓ MsDataset — load datasets from the ModelScope ecosystem       ║
║  ✓ NLP pipelines — sentiment, NER, zero-shot, generation, mask   ║
║  ✓ CV pipelines — image classification, object detection, viz    ║
║  ✓ HuggingFace interop — snapshot_download + Transformers        ║
║  ✓ Fine-tuning — DistilBERT on IMDB with Trainer API             ║
║  ✓ Evaluation — accuracy, F1, confusion matrix                   ║
║  ✓ Export — local save, ONNX, Hub upload                         ║
╚══════════════════════════════════════════════════════════════════╝
""")

We evaluate the fine-tuned model in detail and inspect its performance using standard metrics, a classification report, and a confusion matrix. We also test the trained model on fresh review examples to see how it behaves on unseen inputs in a practical inference setting. Finally, we save the model locally, export it to ONNX when possible, and review how we can upload the final checkpoint to the ModelScope Hub for sharing and deployment.
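For completeness, here is a minimal sketch of serving the exported ONNX graph without PyTorch, using `onnxruntime`. This assumes the package is installed and that the export above wrote `model.onnx` into the `onnx` directory; the example sentence and the NumPy `softmax` helper are our additions, not part of the original notebook, and the whole demo is guarded so it degrades gracefully when the files are absent.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over logits
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

try:
    import onnxruntime as ort
    from transformers import AutoTokenizer

    onnx_dir = "./ms_finetuned_model/onnx"  # path used by the export step above
    sess = ort.InferenceSession(f"{onnx_dir}/model.onnx")
    tok = AutoTokenizer.from_pretrained(onnx_dir)

    enc = tok(["A wonderful, moving film."], return_tensors="np")
    logits = sess.run(None, {k: v for k, v in enc.items()})[0]
    probs = softmax(logits)
    print("POSITIVE prob:", probs[0, 1])
except Exception as e:
    print(f"ONNX runtime demo skipped: {e}")
```

The point of the sketch is that the exported graph only needs NumPy inputs and a session, so a deployment target does not have to carry the full training stack.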

In conclusion, we built a complete, hands-on pipeline that demonstrates how ModelScope fits into a real machine learning workflow rather than serving only as a model repository. We searched for and downloaded models, loaded datasets, ran inference across NLP and vision tasks, connected ModelScope assets with Transformers, fine-tuned a text classifier, evaluated it with meaningful metrics, and exported it for later use. By going through each stage of the code, we saw how the framework supports both experimentation and practical deployment, while also providing flexibility through interoperability with the broader Hugging Face ecosystem. In the end, we came away with a reusable Colab-ready workflow and a much stronger understanding of how to use ModelScope as a serious toolkit for building, testing, and sharing AI systems.




The post A Comprehensive Implementation Guide to ModelScope for Model Search, Inference, Fine-Tuning, Evaluation, and Export appeared first on MarkTechPost.
