Tech It Yourself


Monday, 10 May 2021

UML Diagrams


1. Main diagrams:

Behavioral Diagram

  •     Activity Diagram
  •     Use Case Diagram
  •     Timing Diagram
  •     State Machine Diagram
  •     Communication Diagram
  •     Sequence Diagram

Structural Diagram

  •     Class Diagram
  •     Object Diagram
  •     Component Diagram
  •     Composite Structure Diagram
  •     Deployment Diagram
  •     Package Diagram
  •     Profile Diagram

2. Some important diagrams:

Activity Diagram

It is generally used to describe the flow of different activities and actions, which can be sequential or parallel. It is used to:

  • Model workflows between/within use cases
  • Model complex workflows in operations on objects
  • Model complex activities in detail within a high-level activity diagram
Activity Diagram - Modeling a Word Processor

  • Open the word processing package.
  • Create a file.
  • Save the file under a unique name within its directory.
  • Type the document.
  • If graphics are necessary, open the graphics package, create the graphics, and paste the graphics into the document.
  • If a spreadsheet is necessary, open the spreadsheet package, create the spreadsheet, and paste the spreadsheet into the document.
  • Save the file.
  • Print a hard copy of the document.
  • Exit the word processing package.
Activity Diagram Example - Student Enrollment
  • An applicant wants to enroll in the university.
  • The applicant hands in a filled-out copy of the enrollment form.
  • The registrar inspects the forms.
  • The registrar determines that the forms have been filled out properly.
  • The registrar informs the student to attend the university overview presentation.
  • The registrar helps the student to enroll in seminars.
  • The registrar asks the student to pay for the initial tuition.

Use Case Diagram

The functional requirements that a system fulfills are a cornerstone of its design. Use case diagrams are used to analyze the system’s high-level requirements, which are expressed through different use cases. This UML diagram has three main components:

Functional requirements – represented as use cases; a verb describing an action

Actors – they interact with the system; an actor can be a human being, an organization or an internal or external application

Relationships between actors and use cases – represented using straight arrows


A number of dependency types between use cases are defined in UML. In particular, <<extend>> and <<include>>.

  • <<extend>> is used to include optional behavior from an extending use case in an extended use case.
  • <<include>> is used to include common behavior from an included use case into a base use case in order to support re-use of common behavior. 

The example below depicts the use case UML diagram for an inventory management system. In this case, we have the owner, the supplier, the manager, the inventory clerk and the inventory inspector.

Within the oval containers, we express the actions that the actors perform, such as purchasing and paying for the stock, checking stock quality, returning the stock or distributing it. Use case diagrams are good for showing the dynamic behavior between actors and a system, because they simplify the view of the system and do not reflect the details of implementation.

Sequence Diagram

Sequence diagrams describe the sequence of messages and interactions that happen between actors and objects. Actors or objects can be active only when needed or when another object wants to communicate with them. All communication is represented in a chronological manner.
Sequence diagrams are used in software development to represent the architecture of the system and how its components are interconnected.

Timing Diagram

In a timing diagram, we are not interested in how the objects interact or change each other; rather, we want to represent how objects and actors act along a linear time axis.

The main components of a timing diagram are:

  • Lifeline – a line that forms steps as the individual participant transitions from one state to another
  • State timeline – a single lifeline can go through different states over time
  • Duration constraint – a time interval constraint that represents the duration necessary for a constraint to be fulfilled
  • Time constraint – a time interval constraint during which something needs to be fulfilled by the participant
  • Destruction occurrence – a message occurrence that destroys the individual participant and depicts the end of that participant’s lifeline

The stages of human growth:

State Machine (Statechart) Diagram

It describes the different states of a component and how it changes state based on internal and external events within a system.
A chess game state machine:

Statecharts are used mainly in forward and reverse engineering of systems.
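
A state machine can be sketched in code as a transition table. The example below is a minimal illustration for a chess game; the states and events (WHITE_TO_MOVE, BLACK_TO_MOVE, GAME_OVER, "move", "checkmate", "resign") are assumptions chosen for illustration and are not taken from the original figure.

```python
from enum import Enum, auto


class State(Enum):
    WHITE_TO_MOVE = auto()
    BLACK_TO_MOVE = auto()
    GAME_OVER = auto()


# (current state, event) -> next state. All names are illustrative assumptions.
TRANSITIONS = {
    (State.WHITE_TO_MOVE, "move"): State.BLACK_TO_MOVE,
    (State.BLACK_TO_MOVE, "move"): State.WHITE_TO_MOVE,
    (State.WHITE_TO_MOVE, "checkmate"): State.GAME_OVER,
    (State.BLACK_TO_MOVE, "checkmate"): State.GAME_OVER,
    (State.WHITE_TO_MOVE, "resign"): State.GAME_OVER,
    (State.BLACK_TO_MOVE, "resign"): State.GAME_OVER,
}


def next_state(state, event):
    """Return the state reached from `state` on `event`; reject undefined transitions."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event!r}") from None


def run(events, start=State.WHITE_TO_MOVE):
    """Feed a sequence of events to the machine and return the final state."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state
```

Because GAME_OVER has no outgoing transitions in the table, any event after the game ends raises an error, which mirrors a final state in a statechart.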

Class Diagram

The class diagram is the most common UML diagram type for software documentation. Since most software being created nowadays is still based on the Object-Oriented Programming paradigm, using class diagrams to document the software is a common-sense solution, because OOP is based on classes and the relations between them.

Class diagrams contain classes, along with their attributes (also referred to as data fields) and their behaviors (also referred to as member functions).

The ‘Checkings Account' class and the ‘Savings Account' class both inherit from the more general class, ‘Account'. 
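
The inheritance relation described above could look roughly like this in code. This is only a sketch: the attributes and methods (owner, balance, deposit, withdraw, interest_rate, add_interest) are assumptions, since the original figure is not reproduced here.

```python
class Account:
    """General account: state and behavior shared by all account types."""

    def __init__(self, owner, balance=0.0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class CheckingsAccount(Account):
    """Specialization that inherits deposit() and withdraw() unchanged."""


class SavingsAccount(Account):
    """Specialization that adds an (assumed) interest rate."""

    def __init__(self, owner, balance=0.0, interest_rate=0.02):
        super().__init__(owner, balance)
        self.interest_rate = interest_rate

    def add_interest(self):
        self.balance += self.balance * self.interest_rate
```

Both subclasses can be used wherever an Account is expected, which is what the generalization arrow in the class diagram expresses.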

Object Diagram

Object diagrams help software developers check whether the generic abstract structure that they have created (the class diagram) represents a viable structure when put into practice, i.e., when the objects of a class are instantiated. Some developers see it as a secondary level of accuracy checking.

The class 'Client' has an instance "James".
The classes 'Checkings' and 'Savings' have instances: a Checkings account and a Savings account.
- 'account_number' and 'routing_number' are different for the Checkings and Savings account. It makes more sense to put those attributes in their respective classes, rather than in the more generic class 'Account'.
- 'wire_routing_number' and 'bic' are not used.
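
An object diagram corresponds to concrete instances. The sketch below instantiates the classes named above with 'account_number' and 'routing_number' placed in the specific classes, as the first observation suggests. The field values and the 'balance' and 'accounts' fields are made-up assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    balance: float = 0.0


@dataclass
class Checkings(Account):
    account_number: str = ""
    routing_number: str = ""


@dataclass
class Savings(Account):
    account_number: str = ""
    routing_number: str = ""


@dataclass
class Client:
    name: str
    accounts: list = field(default_factory=list)


# Instances, as an object diagram would show them (values are hypothetical).
james = Client(name="James")
james.accounts.append(
    Checkings(balance=250.0, account_number="CHK-001", routing_number="RT-111")
)
james.accounts.append(
    Savings(balance=1000.0, account_number="SAV-001", routing_number="RT-222")
)
```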

Component Diagram

A component diagram describes the organization and wiring of the components in a system.

Component diagrams help model implementation details and double-check that every aspect of the system's required functions is developed.

The components are less physical and more conceptual stand-alone design elements.

The components provide or require interfaces to interact with other components in the system.

A component is a logical unit block of the system, a slightly higher abstraction than classes.

The Difference Between a Package Diagram and a Component Diagram:

- Package diagram elements are always public, while component diagram elements are private. 

- Components are groups of classes that are deployed together and packages are a general grouping device for model elements. Packages can group any model elements, even things like use cases, but in practice they usually group classes, so components and packages tend to be synonymous.

Parts of a component diagram:



A full circle represents an interface created or provided by the component. A semi-circle represents a required interface (input).
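
In code, a provided interface and a required interface can be approximated with a structural type. The sketch below is an assumption-laden illustration: PaymentService, PaymentComponent and OrderComponent are hypothetical names, not elements of the original figures.

```python
from typing import Protocol


class PaymentService(Protocol):
    """The interface itself (the full circle on the providing side)."""

    def charge(self, amount: float) -> bool: ...


class PaymentComponent:
    """Provides PaymentService by implementing charge()."""

    def charge(self, amount: float) -> bool:
        # Hypothetical rule: only positive amounts can be charged.
        return amount > 0


class OrderComponent:
    """Requires PaymentService (the semi-circle); the provider is wired in."""

    def __init__(self, payment: PaymentService) -> None:
        self.payment = payment

    def place_order(self, total: float) -> str:
        return "confirmed" if self.payment.charge(total) else "rejected"


# Wiring the two components together through the interface.
order = OrderComponent(PaymentComponent())
```

OrderComponent depends only on the interface, so any other component that provides charge() could be plugged in without changing it.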

Dependencies among components
A port helps expose the required and provided interfaces of a component.
Component Diagram - Online Shopping

Component Diagram for ATM

Package Diagram

A package diagram shows the organization and relationships of various model elements in the form of packages.

A package is a grouping of related UML elements, such as diagrams, documents, classes, or even other packages.

Parts of Package Diagram:

A dependency has two sub-types: <<import>> and <<access>>.
Importing a package is equivalent to importing all of its public elements. The visibility of the import therefore determines the visibility of the imported elements:
  • <<import>> is a public import
  • <<access>> is a private import

Package Diagram Example - Order Processing System
The Problem Description
Design a package diagram for the "Track Order" scenario of an online shopping store. The Track Order module is responsible for providing tracking information for the products ordered by customers. The customer types in the tracking serial number; the Track Order module queries the system and reports the current shipping status to the customer.
Identify the packages of the system
The Track Order module has to talk to another module to get the order details; let us call it "Order Details".
After fetching the order details, it needs the shipping details; let us call that module "Shipping".
Identify the dependencies in the System
Track Order gets order details from "Order Details", and "Order Details" needs the tracking information given by the customer. The two modules access each other, so a two-way <<access>> dependency suffices.

To get shipping information, "Shipping" can import "Track Order" to make navigation easier.
Track Order also depends on the UI Framework.


Thursday, 6 May 2021

mAP (mean Average Precision)


1. Introduction

mAP is a popular evaluation metric used for object detection (localisation and classification).

Object detection models such as SSD and YOLO predict a bounding box and a class label.

2. True/False Positive for bounding box

For a bounding box, we measure the overlap between the predicted bounding box and the ground-truth bounding box using IoU (intersection over union).

If the IoU of a prediction is greater than the IoU threshold, we classify the prediction as a True Positive (TP). If the IoU is less than the IoU threshold, we classify it as a False Positive (FP).

Whether a prediction is a True or False Positive depends on the IoU threshold. In the example above, with an IoU threshold of 0.2 the FP would become a TP.
In object detection:
  • True Positive: the prediction is a True Positive for both classification and bounding box
  • False Positive: otherwise
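
The rule above can be sketched as follows, assuming boxes are given as (x_min, y_min, x_max, y_max) tuples. The function names are hypothetical, and treating an IoU exactly equal to the threshold as a TP is an assumption, since the text leaves that case open.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height are clamped at 0 so disjoint boxes give no overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def classify_detection(pred_box, pred_label, gt_box, gt_label, iou_threshold=0.5):
    """TP only if both the class and the box (IoU >= threshold, assumed) match."""
    is_tp = pred_label == gt_label and iou(pred_box, gt_box) >= iou_threshold
    return "TP" if is_tp else "FP"


# Made-up boxes with IoU = 1/3: FP at threshold 0.5, TP at threshold 0.2.
pred, gt = (0, 0, 2, 2), (1, 0, 3, 2)
at_05 = classify_detection(pred, "apple", gt, "apple", iou_threshold=0.5)
at_02 = classify_detection(pred, "apple", gt, "apple", iou_threshold=0.2)
```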

3. Calculate mAP
3.1 mAP
For the picture above, the IoU threshold is 0.5.
- Average Precision (AP) is the area under the precision-recall curve.
- mAP (mean average precision) is the average of AP. 
- AP is calculated for each class and averaged to get the mAP.
The mean Average Precision or mAP score is calculated by taking the mean AP over all classes and/or overall IoU thresholds, depending on different detection challenges.

3.2 Interpolated precision

The interpolated precision, p_interp, is calculated at each recall level, r, by taking the maximum precision measured at any recall greater than or equal to r.
Consider a dataset that contains 5 apples. We collect all the predictions the model made for apples in all the images and rank them in descending order of predicted confidence.

The precision at rank 4 = (1 + 1) / (1 + 1 + 1 + 1) = 0.5
The recall at rank 4 = (1 + 1) / 5 = 0.4
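
The per-rank computation can be sketched as below. The ranked list of correct/incorrect predictions is an assumption: the original table is not reproduced here, so the list was chosen only to be consistent with the numbers quoted at rank 4 (2 correct out of 4 predictions, 5 apples in total).

```python
def precision_recall_at_ranks(is_correct, num_ground_truth):
    """Cumulative (precision, recall) after each prediction in ranked order."""
    points = []
    true_positives = 0
    for rank, correct in enumerate(is_correct, start=1):
        true_positives += int(correct)
        precision = true_positives / rank
        recall = true_positives / num_ground_truth
        points.append((precision, recall))
    return points


# Assumed ranking, most confident first. True means a correct apple detection.
RANKED = [True, True, False, False, False, True, True, False, False, True]

points = precision_recall_at_ranks(RANKED, num_ground_truth=5)
precision_4, recall_4 = points[3]  # rank 4 is index 3
```

Recall never decreases as we move down the ranking, while precision goes up and down, which produces the zigzag pattern in the precision-recall curve.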

We smooth out the zigzag pattern:

The orange line is transformed into the green lines. We replace the precision value at recall ȓ with the maximum precision for any recall ≥ ȓ.
For example, all precision values for recall in [0.4:0.8] are replaced by the maximum precision at recall 0.8.
Pascal VOC2008 used the 11-point interpolated AP: we divide the recall range from 0 to 1.0 into 11 points (0, 0.1, 0.2, …, 0.9 and 1.0) and average the interpolated precision at those points.

AP = (1/11) * (1+1+1+1+1 +0.57+0.57+0.57+0.57+0.5+0.5)
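
A minimal sketch of the 11-point interpolated AP follows. The ranked list is an assumption chosen to reproduce the precision values in the sum above, and the function name is hypothetical. The small tolerance in the recall comparison guards against floating-point rounding.

```python
def eleven_point_ap(is_correct, num_ground_truth):
    """11-point interpolated average precision for one class."""
    # Precision and recall after each prediction in ranked order.
    precisions, recalls = [], []
    true_positives = 0
    for rank, correct in enumerate(is_correct, start=1):
        true_positives += int(correct)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    total = 0.0
    for i in range(11):
        level = i / 10  # recall levels 0, 0.1, ..., 1.0
        # Interpolated precision: the maximum precision at any recall >= level.
        candidates = [p for p, r in zip(precisions, recalls) if r >= level - 1e-9]
        total += max(candidates, default=0.0)
    return total / 11


# Assumed ranking, most confident first. True means a correct detection.
RANKED = [True, True, False, False, False, True, True, False, False, True]
ap = eleven_point_ap(RANKED, num_ground_truth=5)
```

With this ranking the result is about 0.75, in line with the hand computation above (which rounds 4/7 to 0.57).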

COCO used a 101-point interpolated AP.
AP50 (AP@.50) is the AP at IoU threshold 0.5.
AP75 (AP@.75) is the AP at IoU threshold 0.75.


Wednesday, 5 May 2021

Metrics to evaluate a classification model's predictions


1. Concepts

1.1 Confusion matrix

An NxN table that summarizes how successful a classification model's predictions were.

It shows the correlation between the label and the model's classification. 

N represents the number of classes.

In a confusion matrix, one axis is the predicted label, and one axis is the actual label.

An example when N=2:

actual tumors: 19
tumors correctly classified (true positives): 18
tumors incorrectly classified (false negative): 1
non-tumors: 458
non-tumors correctly classified (true negatives): 452 
non-tumors incorrectly classified (false positives): 6
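
As a sketch, the same matrix can be rebuilt by counting (actual, predicted) pairs. The helper name and the row/column orientation (rows are actual labels, columns are predicted labels) are assumptions. The label lists are synthesized from the counts above.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Return an N x N list: rows are actual labels, columns are predicted."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


# Label lists rebuilt from the counts in the tumor example.
actual = ["tumor"] * 19 + ["non-tumor"] * 458
predicted = (
    ["tumor"] * 18 + ["non-tumor"] * 1       # 18 TP, 1 FN
    + ["non-tumor"] * 452 + ["tumor"] * 6    # 452 TN, 6 FP
)

matrix = confusion_matrix(actual, predicted, classes=["tumor", "non-tumor"])
```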

1.2. True vs. False and Positive vs. Negative

Consider an example based on the story The Boy Who Cried Wolf.

Let's make the following definitions:

  • "Wolf" is a positive class.
  • "No wolf" is a negative class.

A true positive (TP) is an outcome where the model correctly predicts the positive class. 
A true negative (TN) is an outcome where the model correctly predicts the negative class.

A false positive (FP) is an outcome where the model incorrectly predicts the positive class. 
A false negative (FN) is an outcome where the model incorrectly predicts the negative class.

True/False indicates whether the prediction matches the ground truth.
Positive/Negative is the class that the model predicted.

1.3 Accuracy

For binary classification:

The dataset has 100 tumor examples: 91 are benign (90 TNs and 1 FP) and 9 are malignant (1 TP and 8 FNs).
Consider the first model:
- Of the 9 malignant tumors, the model correctly identifies only 1 as malignant.
- Of the 91 benign tumors, the model correctly identifies 90 as benign.
The first model therefore makes 91 correct predictions out of 100. Now consider a second model that always predicts benign. It has the same accuracy (91/100 correct predictions).
Judged by accuracy, the first model is no better than the second model, which has no ability to distinguish malignant tumors from benign tumors.
=> Accuracy doesn't tell the full story when you're working with a class-imbalanced data set, where there is a significant disparity between the number of positive and negative labels.
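
A quick check of the two models, using the standard definition accuracy = (TP + TN) / (TP + TN + FP + FN). The counts for the always-benign model follow from the dataset: all 91 benign tumors become TNs and all 9 malignant tumors become FNs.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


first_model = accuracy(tp=1, tn=90, fp=1, fn=8)
always_benign = accuracy(tp=0, tn=91, fp=0, fn=9)
```

Both values come out equal, even though the second model never detects a malignant tumor.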

1.3 Precision and Recall
Precision: What proportion of positive identifications was actually correct?
Of all cases predicted positive, how many are actually positive?

For the first tumor model, precision = TP / (TP + FP) = 1 / (1 + 1) = 0.5: when it predicts a tumor is malignant, it is correct 50% of the time.

Recall: What proportion of actual positives was identified correctly?
Of all actual positive cases, how many are predicted positive?

For the first tumor model, recall = TP / (TP + FN) = 1 / (1 + 8) = 0.11: it correctly identifies 11% of all malignant tumors.

Improving precision typically reduces recall and vice versa.
Consider classifying email messages as spam or not spam while changing the classification threshold.

When the classification threshold is increased, the model is more certain about the cases it labels positive, so FP decreases and precision typically increases. On the other hand, more cases are predicted as negative, so FN increases and recall decreases or stays the same.
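
The effect of the threshold can be sketched with a handful of made-up spam scores. The scores, labels and thresholds below are hypothetical and only illustrate the trade-off. A message is predicted as spam when its score is at or above the threshold.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif not predicted_positive and label:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical spam scores and ground truth (1 = spam, 0 = not spam).
SCORES = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
LABELS = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

low_threshold = precision_recall(SCORES, LABELS, threshold=0.35)
high_threshold = precision_recall(SCORES, LABELS, threshold=0.85)
```

On this data the low threshold catches every spam message but lets two false positives through, while the high threshold makes no false positives but misses three of the five spam messages.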
Q: If model A has better precision and better recall than model B, then model A is probably better.
A: In general, a model that outperforms another model on both precision and recall is likely the better model. Obviously, we'll need to make sure that comparison is being done at a precision / recall point that is useful in practice for this to be meaningful. For example, suppose our spam detection model needs to have at least 90% precision to be useful and avoid unnecessary false alarms. In this case, comparing one model at {20% precision, 99% recall} to another at {15% precision, 98% recall} is not particularly instructive, as neither model meets the 90% precision requirement. But with that caveat in mind, this is a good way to think about comparing models when using precision and recall.
1.4 ROC curve
An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds.
The 2 axes:
  • True Positive Rate: TPR = TP / (TP + FN)
  • False Positive Rate: FPR = FP / (FP + TN)

Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives.
To compute the points in an ROC curve, we could evaluate a classification model many times with different classification thresholds, but this would be inefficient. Fortunately, there's an efficient, sorting-based algorithm that can provide this information for us, called AUC.
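
One way to obtain all the ROC points in a single pass is to sort the examples by score and lower the threshold one example at a time. The sketch below assumes there are no tied scores and that both classes are present. The function name and the data are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs from one pass over examples sorted by descending score."""
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]  # threshold above every score: nothing is positive
    tp = fp = 0
    for _, label in ranked:
        # Lowering the threshold past this example makes it a positive prediction.
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / num_neg, tp / num_pos))
    return points


# Made-up scores and labels (1 = positive, 0 = negative).
points = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Each step moves the curve up (a true positive was added) or to the right (a false positive was added), which matches the statement that lowering the threshold increases both counts.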

How to use the ROC Curve?

We can generally use ROC curves to decide on a threshold value. The choice of threshold value will also depend on how the classifier is intended to be used. So, if the above curve was for a cancer prediction application, you want to capture the maximum number of positives (i.e., have a high TPR) and you might choose a low value of threshold like 0.16 even when the FPR is pretty high here.

This is because you really don’t want to predict “no cancer” for a person who actually has cancer. In this example, the cost of a false negative is pretty high. You are OK even if a person who doesn’t have cancer tests positive because the cost of false positive is lower than that of a false negative. This is actually what a lot of clinicians and hospitals do for such vital tests and also why a lot of clinicians do the same test for a second time if a person tests positive. (Can you think why doing so helps? Hint: Bayes Rule).

Otherwise, in a case like the criminal classifier from the previous example, we don’t want a high FPR as one of the tenets of the justice system is that we don’t want to capture any innocent people. So, in this case, we might choose our threshold as 0.82, which gives us a good recall or TPR of 0.6. That is, we can capture 60 per cent of criminals.

1.5 AUC (Area under the ROC Curve) 

An excellent model has an AUC near 1, which means it has a good measure of separability. A poor model has an AUC near 0, which means it has the worst measure of separability. Some important features:
  • It is threshold-invariant, i.e. the value of the metric doesn’t depend on a chosen threshold.
  • It is scale-invariant, i.e. it measures how well predictions are ranked, rather than their absolute values.
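
Both properties can be seen from the ranking view of AUC: it equals the fraction of (positive, negative) pairs in which the positive example receives the higher score. The sketch below uses that pairwise definition directly, counting ties as half. It is quadratic in the number of examples and is meant only as an illustration, with made-up data.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up scores and labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]

original = auc(scores, labels)
rescaled = auc([s / 2 for s in scores], labels)   # same ranking, other scale
inverted = auc([1 - s for s in scores], labels)   # ranking reversed
```

No threshold appears anywhere in the computation, rescaling the scores leaves the value unchanged, and reversing the ranking turns an AUC of a into 1 - a.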

Consider some cases when plotting the distributions of the classification probabilities.
Classification is perfect when the two curves don’t overlap: the model can fully distinguish between the positive class and the negative class.

The ROC curve:
In this case the AUC = 1
When two distributions overlap
The ROC curve:

In this case the AUC is 0.7, which means there is a 70% chance that the model will be able to distinguish between the positive class and the negative class.
When two distributions completely overlap. This is the worst case.
The ROC curve:

In this case the AUC is approximately 0.5; the model has no capacity to distinguish between the positive class and the negative class.
When the AUC is 0, the model is predicting the negative class as the positive class and vice versa.
Finally, we have:

In a multi-class model, we can plot N ROC curves for N classes using the One-vs-All methodology. For example, if you have three classes named X, Y, and Z, you will have one ROC curve for X classified against Y and Z, another for Y classified against X and Z, and a third for Z classified against Y and X.
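
The One-vs-All step amounts to turning the multi-class labels into one binary labeling per class, each of which can then be used to draw its own ROC curve. A minimal sketch with a hypothetical helper and made-up labels:

```python
def one_vs_rest_labels(labels, positive_class):
    """1 for examples of `positive_class`, 0 for every other class."""
    return [1 if label == positive_class else 0 for label in labels]


# Made-up ground-truth labels for a three-class problem.
labels = ["X", "Y", "Z", "X", "Z"]
binary = {c: one_vs_rest_labels(labels, c) for c in ["X", "Y", "Z"]}
```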
