Project Overview
Conversational AI systems are transforming how users interact with applications. In this project, you will build a complete chatbot pipeline that understands user intents, extracts relevant entities (slots), maintains conversation context across multiple turns, and responds appropriately. Target: Achieve over 90% intent classification accuracy and over 85% slot filling F1-score.
- Intent Recognition: classify user messages into predefined intent categories
- Slot Filling: extract entities such as dates, names, and locations from text
- Context Management: track conversation state across multiple turns
- REST API: deploy the chatbot as a scalable API service
Learning Objectives
Technical Skills
- Build intent classification with transformers
- Implement NER for slot extraction
- Design dialogue state tracking system
- Create response templates with slot substitution
- Deploy REST API with Flask or FastAPI
NLU Concepts
- Understand intent-slot architecture
- Master BIO tagging for entity extraction
- Handle multi-turn conversation flows
- Manage fallback and error handling
- Evaluate NLU performance metrics
Problem Scenario
TechAssist Solutions
You have been hired as an NLU Engineer at TechAssist Solutions, a company building AI-powered customer support systems. The company needs a chatbot that can handle common customer queries for an e-commerce platform - including order tracking, product inquiries, returns, and FAQs. The bot must understand context and maintain coherent conversations.
"Our support team is overwhelmed with repetitive queries. We need a chatbot that can understand what customers want, extract order numbers and product names, remember context during conversations, and provide helpful responses. It should also know when to escalate to a human agent."
Technical Challenges to Solve
Intent Classification
- Classifying diverse user queries
- Handling ambiguous or multi-intent messages
- Confidence thresholds for fallback
- Out-of-scope detection
Slot Filling
- Extracting order IDs, dates, and product names
- Handling different entity formats
- BIO tagging for sequence labeling
- Dealing with entity variations
Context Management
- Tracking conversation state
- Handling slot confirmation and correction
- Managing multi-turn flows
- Context carryover between turns
API Deployment
- RESTful endpoint design
- Session management
- Request/response formats
- Error handling and logging
Dataset Resources
You can use these datasets for training your intent classification and slot filling models, or create your own custom dataset for the e-commerce domain.
Recommended Datasets
Publicly available intent recognition and NLU benchmarks such as ATIS and SNIPS (intent labels with slot annotations) or CLINC150 (150 intents including out-of-scope) are good starting points.
Your chatbot should recognize 10-15 intents, for example:
- order_status - Check order tracking
- order_cancel - Cancel an order
- return_request - Request a return
- product_inquiry - Ask about products
- payment_issue - Payment problems
- shipping_info - Delivery questions
- greeting - Hello, hi, etc.
- goodbye - Bye, thanks, etc.
- human_handoff - Talk to agent
- out_of_scope - Unrelated queries
Extract these entities from user messages:
- order_id - Order numbers (e.g., #12345)
- product_name - Product references
- date - Dates and time references
- email - Email addresses
- phone - Phone numbers
- amount - Monetary values
- category - Product categories
- quantity - Number of items
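Several of these entities follow predictable surface patterns, so a rule-based extractor makes a useful baseline and a sanity check for the NER model. A minimal sketch using regular expressions; the patterns and the `extract_entities` helper below are illustrative assumptions, not a prescribed format:

```python
import re

# Illustrative patterns -- adapt to your platform's actual formats.
PATTERNS = {
    "order_id": re.compile(r"#\d{4,}"),                      # e.g. #12345
    "email":    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "amount":   re.compile(r"\$\d+(?:\.\d{2})?"),            # e.g. $19.99
}

def extract_entities(text: str) -> dict:
    """Return every pattern match in the text, keyed by entity type."""
    return {name: pat.findall(text) for name, pat in PATTERNS.items()}
```

Rules like these also complement the learned model at inference time, since formats such as `#12345` rarely vary.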
Project Requirements
Your project must include all of the following components. This is a comprehensive NLU project covering the full chatbot development pipeline.
Data Preparation
Prepare training data:
- Collect or create intent-labeled utterances
- Annotate entities with BIO tagging
- Split into train/validation/test sets
- Handle class imbalance if present
- Document data statistics and distribution
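A per-intent (stratified) split keeps rare intents represented in every set, which matters when handling class imbalance. A pure-Python sketch of the idea; in practice scikit-learn's `train_test_split` with `stratify` does the same job:

```python
import random
from collections import defaultdict

def stratified_split(examples, train=0.8, val=0.1, seed=42):
    """Split (text, intent) pairs per intent so each split
    roughly preserves the overall intent distribution."""
    by_intent = defaultdict(list)
    for ex in examples:
        by_intent[ex[1]].append(ex)
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for intent, group in by_intent.items():
        rng.shuffle(group)
        n_train = int(len(group) * train)
        n_val = int(len(group) * val)
        splits["train"] += group[:n_train]
        splits["val"] += group[n_train:n_train + n_val]
        splits["test"] += group[n_train + n_val:]
    return splits
```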
Intent Classification
Build intent recognition model:
- Implement text classification model (BERT/DistilBERT)
- Train and evaluate on your dataset
- Add confidence threshold for fallback
- Handle out-of-scope detection
- Achieve over 90% accuracy on test set
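The fallback requirement sits on top of the classifier's raw output: convert logits to probabilities with a softmax and route low-confidence predictions to out-of-scope handling. A pure-Python sketch; in practice the logits would come from a fine-tuned DistilBERT (`AutoModelForSequenceClassification`), and the 0.7 threshold is an assumption to tune on validation data:

```python
import math

def predict_intent(logits, id2label, threshold=0.7):
    """Softmax over raw logits; return (intent, confidence),
    falling back when the top probability is below the threshold."""
    exps = [math.exp(x - max(logits)) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "out_of_scope", probs[best]
    return id2label[best], probs[best]
```

Subtracting `max(logits)` before exponentiating is the standard trick to keep the softmax numerically stable.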
Slot Filling (NER)
Build entity extraction model:
- Implement sequence labeling with BIO scheme
- Use token classification with transformers
- Handle entity normalization
- Evaluate with precision, recall, F1
- Achieve over 85% F1-score
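After the token classifier predicts a BIO tag per word, the tags must be decoded back into entity spans before normalization or evaluation. A minimal decoder, assuming word-level tags:

```python
def bio_to_spans(words, tags):
    """Collapse word-level BIO tags into (entity_type, text) spans."""
    spans, current = [], None  # current = [entity_type, [words...]]
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((current[0], " ".join(current[1])))
            current = [tag[2:], [word]]
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[1].append(word)
        else:  # "O", or an I- tag that doesn't continue the open span
            if current:
                spans.append((current[0], " ".join(current[1])))
            current = None
    if current:
        spans.append((current[0], " ".join(current[1])))
    return spans
```

Evaluating at the span level (as `seqeval` does) is stricter than per-token accuracy: a span only counts if both its type and its boundaries match.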
Context Management
Implement dialogue state tracking:
- Design conversation state schema
- Track slots across multiple turns
- Handle slot confirmation and updates
- Implement conversation history
- Support context reset
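The state schema can be as simple as a dataclass holding the active intent, filled slots, and history, with update and reset operations. A minimal sketch; the field names are illustrative, not required:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DialogueState:
    intent: Optional[str] = None
    slots: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # (speaker, text) turns

    def update(self, intent=None, slots=None):
        """Merge one turn's NLU output; no new intent keeps the old one."""
        if intent is not None:
            self.intent = intent
        if slots:
            self.slots.update(slots)  # later turns can correct earlier values

    def reset(self):
        self.intent, self.slots, self.history = None, {}, []
```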
Response Generation
Create response system:
- Design response templates for each intent
- Implement slot substitution in templates
- Add response variations for naturalness
- Handle missing slot prompts
- Implement fallback responses
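Templates map each intent to one or more response strings with named slot placeholders; when a required slot is missing, the bot prompts for it instead of responding. A sketch with illustrative templates and prompts:

```python
import random

TEMPLATES = {
    "order_status": ["Order {order_id} is on its way!",
                     "Good news: order {order_id} has shipped."],
}
SLOT_PROMPTS = {"order_id": "What's your order number?"}
FALLBACK = "Sorry, I didn't catch that. Could you rephrase?"

def respond(intent, slots, rng=random):
    templates = TEMPLATES.get(intent)
    if not templates:
        return FALLBACK
    template = rng.choice(templates)  # variation for naturalness
    try:
        return template.format(**slots)
    except KeyError as missing:       # a required slot is not filled yet
        return SLOT_PROMPTS.get(missing.args[0], FALLBACK)
```

Catching `KeyError` from `str.format` is one cheap way to discover which slot is still missing and turn it into a prompt.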
REST API Deployment
Deploy as API service:
- Create Flask or FastAPI application
- Implement /chat endpoint
- Add session management
- Include API documentation (Swagger)
- Add logging and error handling
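The `/chat` endpoint logic can stay framework-agnostic: a handler that looks up the session, runs NLU, updates state, and returns the response body, which Flask or FastAPI then merely wraps in a route. A sketch with a stubbed NLU step; `understand` is a hypothetical stand-in for your intent classifier and slot filler:

```python
import uuid

SESSIONS = {}  # session_id -> {"slots": {...}, "history": [...]}

def understand(message):
    """Stub NLU -- replace with your intent classifier + slot filler."""
    if "#" in message:
        order_id = "#" + message.split("#", 1)[1].split()[0]
        return "order_status", 0.95, {"order_id": order_id}
    return "out_of_scope", 0.3, {}

def handle_chat(payload):
    """Core /chat logic, independent of the web framework."""
    sid = payload.get("session_id") or str(uuid.uuid4())
    state = SESSIONS.setdefault(sid, {"slots": {}, "history": []})
    intent, confidence, slots = understand(payload["message"])
    state["slots"].update(slots)
    state["history"].append(payload["message"])
    return {"session_id": sid, "intent": intent,
            "confidence": confidence, "slots": state["slots"]}
```

Keeping the handler separate from the framework also makes it easy to unit-test without starting a server.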
Intent Recognition
Intent recognition is the core NLU component that classifies user messages into predefined categories. Modern approaches use transformer-based models fine-tuned on domain-specific data.
Model Architecture Options
| Model | Parameters | Pros | Cons |
|---|---|---|---|
| DistilBERT | 66M | Fast, good accuracy, lightweight | Slightly lower accuracy than BERT |
| BERT-base | 110M | Strong baseline, well-documented | Slower inference |
| RoBERTa | 125M | Often better than BERT | More compute needed |
| ALBERT | 12M | Very lightweight | May need more tuning |
Evaluation Metrics
| Metric | Description |
|---|---|
| Accuracy | Overall fraction of correct intent predictions |
| Precision | Per-class precision |
| Recall | Per-class recall |
| F1-Score | Macro-averaged F1 across intents |
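Macro F1 averages the per-class F1 scores, so rare intents count as much as frequent ones. A pure-Python sketch of the computation (libraries such as scikit-learn report the same value):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes in y_true."""
    scores = []
    for cls in set(y_true):
        tp = sum(t == p == cls for t, p in zip(y_true, y_pred))
        fp = sum(p == cls != t for t, p in zip(y_true, y_pred))
        fn = sum(t == cls != p for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2*TP / (2*TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```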
Slot Filling (NER)
Slot filling extracts relevant entities from user messages using sequence labeling. The BIO (Beginning-Inside-Outside) tagging scheme is the standard approach.
BIO Tagging Scheme
| Tag | Meaning | Example |
|---|---|---|
| B-entity | Beginning of an entity | B-order_id for the first token of an order ID |
| I-entity | Inside (continuation) of an entity | I-product for middle tokens |
| O | Outside any entity | Regular words not part of any entity |
Example Annotation
User Input: "I want to cancel order #12345 placed on Monday"
| I | want | to | cancel | order | #12345 | placed | on | Monday |
|---|---|---|---|---|---|---|---|---|
| O | O | O | O | O | B-order_id | O | O | B-date |
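One practical wrinkle: transformer tokenizers split words into subword pieces, so word-level BIO tags must be expanded to subword level before training, with continuations of a B- word tagged I-. A sketch that assumes you already know each word's subtoken count; real code would get this from the tokenizer's `word_ids()`:

```python
def expand_tags(tags, subtoken_counts):
    """Expand word-level BIO tags to subword level: a B- tag applies
    to the first piece, remaining pieces of the word get I-."""
    expanded = []
    for tag, n in zip(tags, subtoken_counts):
        expanded.append(tag)
        cont = "I-" + tag[2:] if tag != "O" else "O"
        expanded.extend([cont] * (n - 1))
    return expanded
```

A common alternative is to label only each word's first piece and mask the rest with -100 so they are ignored by the loss.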
Context Management
Multi-turn conversations require tracking dialogue state across user turns. The dialogue manager maintains context, filled slots, and conversation history.
Dialogue State Components
- Intent: active intent being processed, confidence score, intent history
- Slots: extracted entity values, confirmation status, required vs. optional slots
- History: previous user messages, bot responses, turn counter
Conversation Flow Example
| Turn | User | Bot | State Update |
|---|---|---|---|
| 1 | "I want to check my order" | "Sure! What's your order number?" | intent: order_status, slots: {} |
| 2 | "It's #12345" | "Order #12345 is out for delivery!" | slots: {order_id: "#12345"} |
| 3 | "When will it arrive?" | "Expected delivery: Today by 5 PM" | intent: shipping_info (context carried) |
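The turn-3 carryover works because the tracker reuses already-filled slots when a new intent arrives, rather than re-asking. A minimal illustration of that rule; the slot-requirements mapping is an assumption you would define per intent:

```python
# Hypothetical mapping of each intent to the slots it requires.
REQUIRED_SLOTS = {"order_status": ["order_id"], "shipping_info": ["order_id"]}

def missing_slots(intent, filled):
    """Slots still needed after carrying over values from earlier turns."""
    return [s for s in REQUIRED_SLOTS.get(intent, []) if s not in filled]
```

If the list is empty the bot can answer immediately; otherwise it prompts for the first missing slot.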
REST API Integration
Deploy your chatbot as a REST API using Flask or FastAPI. The API should handle stateful conversations with session management.
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /chat | POST | Send a message and get a response |
| /session/new | POST | Create a new conversation session |
| /session/{id} | GET | Get session state and history |
| /session/{id}/reset | POST | Reset conversation context |
| /health | GET | API health check |
Request/Response Format
Request:
```json
{
  "session_id": "abc123",
  "message": "Check order #12345",
  "metadata": {
    "user_id": "user_001",
    "channel": "web"
  }
}
```
Response:
```json
{
  "session_id": "abc123",
  "response": "Order #12345 is shipped!",
  "intent": "order_status",
  "confidence": 0.95,
  "slots": {"order_id": "#12345"}
}
```
Submission Requirements
Create a public GitHub repository with the exact name shown below:
Required Repository Name
ai-chatbot
Required Project Structure
ai-chatbot/
├── notebooks/
│ ├── 01_data_preparation.ipynb # Data loading and preprocessing
│ ├── 02_intent_classification.ipynb # Intent model training
│ ├── 03_slot_filling.ipynb # NER model training
│ └── 04_evaluation.ipynb # Model evaluation
├── src/
│ ├── nlu/
│ │ ├── intent_classifier.py # Intent classification module
│ │ └── slot_filler.py # Entity extraction module
│ ├── dialogue/
│ │ ├── state_tracker.py # Dialogue state management
│ │ └── response_generator.py # Response templates
│ ├── api/
│ │ ├── app.py # Flask/FastAPI application
│ │ └── routes.py # API endpoints
│ └── utils.py # Helper functions
├── data/
│ ├── intents.json # Intent training data
│ └── entities.json # Entity annotations
├── models/
│ ├── intent_model/ # Saved intent classifier
│ └── ner_model/ # Saved NER model
├── tests/
│ └── test_chatbot.py # Unit tests
├── requirements.txt # Python dependencies
├── Dockerfile # Optional: containerization
└── README.md # Project documentation
README.md Required Sections
1. Project Overview
- Your full name and submission date
- Project description
- Supported intents and entities
2. Model Performance
- Intent classification accuracy
- Slot filling F1-score
- Confusion matrix
3. Architecture
- NLU pipeline diagram
- Model choices and reasoning
- Dialogue flow design
4. API Documentation
- Endpoint descriptions
- Request/response examples
- Error codes
5. Demo
- Sample conversations
- Screenshots or GIFs
- Edge case handling
6. How to Run
- Installation instructions
- API startup commands
- Testing instructions
Submit your GitHub username; your repository will be verified automatically.
Grading Rubric
Your project will be graded on the following criteria. Total: 800 points.
| Criteria | Points | Description |
|---|---|---|
| Data Preparation | 75 | Quality dataset with proper annotations |
| Intent Classification | 175 | Over 90% accuracy, proper evaluation |
| Slot Filling | 150 | Over 85% F1, BIO tagging implementation |
| Context Management | 125 | Multi-turn tracking, state management |
| Response Generation | 75 | Templates, slot filling, fallbacks |
| REST API | 100 | Working endpoints, documentation |
| Documentation | 100 | README, code quality, reproducibility |
| Total | 800 | |
Grading Levels
- Excellent: all components complete with strong performance
- Good: meets requirements, good documentation
- Satisfactory: basic implementation
- Needs Work: missing components