Jun 6, 2021

CYBER Attack Detection Based On CONFUSION MATRIX

HELLO FOLKS…

Cyber attacks are becoming a critical issue for organizational information systems. A number of cyber attack detection and classification methods have been introduced, with different levels of success, as countermeasures to preserve data integrity and system availability. Classifying attacks against computer networks is becoming a harder problem to solve in the field of network security.

What is cybercrime?

Cybercrime, also called computer crime, is the use of a computer as an instrument to further illegal ends, such as committing fraud, trafficking in child pornography and intellectual property, stealing identities, or violating privacy. Cybercrime, especially through the Internet, has grown in importance as the computer has become central to commerce, entertainment, and government.

Types of cybercrime

Here are some specific examples of the different types of cybercrime:

Email and internet fraud.
Identity fraud (where personal information is stolen and used).
Theft of financial or card payment data.
Theft and sale of corporate data.
Cyberextortion (demanding money to prevent a threatened attack).
Ransomware attacks (a type of cyberextortion).
Cryptojacking (where hackers mine cryptocurrency using resources they do not own).
Cyberespionage (where hackers access government or company data).

What is a Confusion Matrix?

A Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.

For a binary classification problem, we would have a 2 x 2 matrix as shown below with 4 values:

Let’s decipher the matrix:

The target variable has two values: Positive or Negative
The columns represent the actual values of the target variable
The rows represent the predicted values of the target variable
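The layout described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `confusion_matrix_2x2`, the label strings, and the six sample connections are made up for the example and are not taken from any dataset.

```python
def confusion_matrix_2x2(actual, predicted, positive="attack"):
    """Return [[TP, FP], [FN, TN]].

    Rows are the predicted values (positive, negative) and
    columns are the actual values (positive, negative).
    """
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return [[tp, fp], [fn, tn]]


# Hypothetical labels for six network connections.
actual = ["attack", "attack", "normal", "normal", "attack", "normal"]
predicted = ["attack", "normal", "normal", "attack", "attack", "normal"]

matrix = confusion_matrix_2x2(actual, predicted)
for row_name, row in zip(["predicted attack", "predicted normal"], matrix):
    print(row_name, row)
```

Note that some libraries use the transposed layout. For instance, scikit-learn's `confusion_matrix` puts the actual classes on the rows and the predicted classes on the columns, so it is worth checking the orientation before reading off the four values.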

Understanding True Positive, True Negative, False Positive and False Negative in a Confusion Matrix

True Positive (TP)

The predicted value matches the actual value
The actual value was positive and the model predicted a positive value

True Negative (TN)

The predicted value matches the actual value
The actual value was negative and the model predicted a negative value

False Positive (FP) — Type 1 error

The predicted value does not match the actual value
The actual value was negative but the model predicted a positive value
Also known as the Type 1 error

False Negative (FN) — Type 2 error

The predicted value does not match the actual value
The actual value was positive but the model predicted a negative value
Also known as the Type 2 error
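Once the four counts are known, the usual summary numbers follow directly from them. The sketch below uses hypothetical counts (they are not results from any experiment), and the function name `detection_metrics` is an assumption made for illustration. In intrusion detection, the detection rate is the share of real attacks that were caught, and the false alarm rate is the share of normal traffic that was wrongly flagged.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(tp, tn, fp, fn):
    """Derive common metrics from the four confusion matrix counts."""
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "detection_rate": safe_div(tp, tp + fn),    # also called recall
        "false_alarm_rate": safe_div(fp, fp + tn),  # Type 1 errors among actual negatives
        "precision": safe_div(tp, tp + fp),
    }


# Hypothetical counts for 1000 connections, 100 of which are attacks.
metrics = detection_metrics(tp=90, tn=880, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a security setting the two error types have different costs: a false negative is an attack that went unnoticed, while a false positive is a false alarm that costs analyst time. This is why accuracy alone is rarely enough to judge a detector.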

Cyber Attack Detection and Classification using Parallel Support Vector Machine

Support Vector Machines (SVM) are classifiers that were originally designed for binary classification. They can also be applied to multi-class problems. The results show that pSVM achieves higher detection accuracy across the classes with a comparable false alarm rate.

Cyberattack detection is a classification problem, in which we classify the normal pattern from the abnormal pattern (attack) of the system.

The SDF is a very powerful and popular data mining algorithm for decision-making and classification problems. It has been used in many real-life applications such as medical diagnosis, radar signal classification, weather prediction, credit approval, and fraud detection.

A parallel Support Vector Machine (pSVM) algorithm was proposed for the detection and classification of cyber attack datasets.

The performance of the support vector machine depends greatly on the kernel function it uses. Therefore, we modified the Gaussian kernel function in a data-dependent way to improve the efficiency of the classifiers. The relative results of both classifiers are also obtained to ascertain the theoretical aspects, and the analysis shows that pSVM performs better than SDF.
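For reference, here is a minimal sketch of the standard Gaussian (RBF) kernel that such a modification starts from. It is not the data-dependent variant mentioned above, whose exact form is not given here. The function name `gaussian_kernel` and the parameter `sigma` are chosen for illustration.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Standard Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Hypothetical two-dimensional feature vectors.
same = gaussian_kernel([1.0, 2.0], [1.0, 2.0])  # identical points
near = gaussian_kernel([1.0, 2.0], [1.5, 2.0])  # close points
far = gaussian_kernel([1.0, 2.0], [4.0, 6.0])   # distant points
print(same, near, far)
```

The kernel value is 1 for identical points and decays toward 0 as the points move apart. `sigma` controls how quickly that happens, which is one reason the choice of kernel has such a strong effect on what the SVM learns.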

The classification accuracy of pSVM improves remarkably (accuracy for the Normal class as well as the DOS class is almost 100%), with a comparable false alarm rate and comparable training and testing times.

KDD CUP '99 Data Set Description

In the 1998 DARPA intrusion detection evaluation program [6], an environment was set up to acquire raw TCP/IP dump data for a network by simulating a typical U.S. Air Force LAN. The LAN was operated like a true environment, but was blasted with multiple attacks. For each TCP/IP connection, 41 quantitative (continuous data type) and qualitative (discrete data type) features were extracted. Of these 41 features, 34 are numeric and 7 are symbolic. The data contains 24 attack types that can be classified into four main categories:

DOS: Denial of Service attack.
R2L: Remote to Local (User) attack.
U2R: User to Root attack.
Probing: Surveillance and other probing.

A. Denial of service Attack (DOS)

Denial of service (DOS) is a class of attacks where an attacker makes a computing or memory resource too busy or too full to handle legitimate requests, thus denying legitimate users access to a machine.

B. Remote to Local (User) Attacks

A remote to local (R2L) attack is a class of attacks where an attacker sends packets to a machine over a network, then exploits the machine's vulnerability to illegally gain local access to that machine.

C. User to Root Attacks

User to root (U2R) attacks are a class of attacks where an attacker starts with access to a normal user account on the system and is able to exploit a vulnerability to gain root access to the system.

D. Probing

Probing is a class of attacks where an attacker scans a network to gather information or find known vulnerabilities. An attacker with a map of the machines and services available on a network can use that information to look for exploits.
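Before building a per-category confusion matrix, each raw attack label has to be mapped to one of the four categories. The sketch below shows the idea with a partial mapping only: it lists a handful of well-known KDD CUP '99 attack names rather than the full set. The helper name `to_category` and the "Unknown" fallback are assumptions made for illustration, as is the handling of a trailing dot on the label strings.

```python
# Partial, illustrative mapping from attack name to category.
ATTACK_CATEGORY = {
    "smurf": "DOS",
    "neptune": "DOS",
    "teardrop": "DOS",
    "guess_passwd": "R2L",
    "ftp_write": "R2L",
    "buffer_overflow": "U2R",
    "rootkit": "U2R",
    "portsweep": "Probing",
    "ipsweep": "Probing",
    "nmap": "Probing",
}


def to_category(label):
    """Map a raw connection label to Normal, one of the four categories, or Unknown."""
    name = label.strip().rstrip(".").lower()  # tolerate labels such as "smurf."
    if name == "normal":
        return "Normal"
    return ATTACK_CATEGORY.get(name, "Unknown")


# Hypothetical raw labels; "mystery." stands in for a name missing from the mapping.
labels = ["normal.", "smurf.", "guess_passwd.", "rootkit.", "nmap.", "mystery."]
categories = [to_category(label) for label in labels]
print(categories)
```

With the labels collapsed this way, the classification task has five classes (Normal plus the four attack categories), so the confusion matrix becomes 5 x 5 instead of 2 x 2.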

In the parallel SVM, we first reduce the non-classified feature data using a distance matrix of binary patterns. From this concept, the cascade structure is developed by initializing the problem with a number of independent smaller optimizations, and the partial results are combined in later stages in a hierarchical way, as shown in figure 1, supposing the training data subsets are independent of each other.