by Akanksha Singh

K-means clustering and its use case in the security domain

K-means clustering is a clustering algorithm for determining the structure of a dataset while solving clustering problems…

Sep 5, 2021

The purpose of K-means is to divide data into discrete, non-overlapping segments. It is a simple algorithm that makes clustering easy, and it scales well, so it can be applied to both small and large datasets.

It is an unsupervised machine learning algorithm that divides the given data into a given number of clusters. Here, “K” is the predefined number of clusters to be created.

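The algorithm takes raw, unlabeled data as input and divides it into clusters. It then repeats two steps, assigning each point to its nearest cluster center and recomputing each center as the mean of its assigned points, until the assignments stop changing.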
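The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, the `max_iter` and `seed` parameters, and the sample points are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Start from k data points chosen at random.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Hypothetical sample data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

The loop stops when no point changes cluster, which means it has reached a stable grouping, not necessarily the best possible one.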

The main idea is to define k centers, one for each cluster. These centers should be placed carefully, because different starting locations lead to different results.

So, a better choice is to place them as far away from each other as possible.
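One simple way to act on this advice is a greedy "farthest-first" heuristic: pick one point, then keep adding the point that is farthest from every center chosen so far. The sketch below is an assumption about how one might do it, not a method the source prescribes, and the name `farthest_first_centers` is made up for this example. The widely used k-means++ initialization is a randomized relative of the same idea.

```python
import math


def farthest_first_centers(points, k):
    """Pick k starting centers that are spread far apart (greedy heuristic)."""
    centers = [points[0]]  # assumption: start from the first point
    while len(centers) < k:
        # Choose the point whose nearest chosen center is farthest away.
        next_center = max(
            points,
            key=lambda p: min(math.dist(p, c) for c in centers),
        )
        centers.append(next_center)
    return centers


# Hypothetical sample points on a line.
points = [(0, 0), (1, 0), (10, 0), (5, 0)]
print(farthest_first_centers(points, 3))
# → [(0, 0), (10, 0), (5, 0)]
```

A purely greedy choice like this is sensitive to outliers, since an extreme point is always "far away", which is one reason randomized variants are often preferred.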

In the “cyber-profiling criminals” use case, the result obtained from the K-means algorithm is the level of visits to each website. The visits are divided into three groups: low, medium, and high.
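To make the three visit levels concrete, here is a small sketch that runs K-means with k = 3 on one-dimensional visit counts. The site names, the counts, the function name `visit_levels`, and the deterministic starting centers are all assumptions for illustration. They are not data or choices from the original study.

```python
def visit_levels(visits, max_iter=100):
    """Map each site to 'low', 'medium', or 'high' using 1-D K-means (k=3)."""
    counts = sorted(visits.values())
    # Assumption: start at the smallest, middle, and largest count.
    centers = [counts[0], counts[len(counts) // 2], counts[-1]]
    for _ in range(max_iter):
        groups = [[], [], []]
        for c in counts:
            nearest = min(range(3), key=lambda j: abs(c - centers[j]))
            groups[nearest].append(c)
        # Recompute each center; keep the old one if its group is empty.
        new_centers = [
            sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    names = ["low", "medium", "high"]  # centers stay in ascending order
    return {
        site: names[min(range(3), key=lambda j: abs(n - centers[j]))]
        for site, n in visits.items()
    }


# Hypothetical visit counts per site.
visits = {
    "a.example": 3, "b.example": 5,
    "c.example": 48, "d.example": 52,
    "e.example": 400, "f.example": 390,
}
print(visit_levels(visits))
# → a and b are 'low', c and d are 'medium', e and f are 'high'
```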

K-means is used in “cyber-profiling criminals”

Cyber-profiling is the process of collecting data from individuals and groups to identify significant correlations.

The idea of cyber profiling is derived from criminal profiles, which give the investigation division information to classify the types of criminals who were at the crime scene.

The internet access data of users at an institution can be categorized as a large dataset, so it can be analyzed with data mining.

In this case, a clustering algorithm, as one of the data mining techniques, can be used to find groups (clusters) of useful objects, where what counts as useful depends on the purpose of the data analysis.

Clustering analysis is one of the most useful methods for acquiring knowledge. It is used to find clusters that reflect fundamental and important patterns in the distribution of the data itself.

Log:
A log is a file that records events in a computer program. More generally, a log is a record of daily activities.

Activities that are recorded directly are called the transaction log. The log file can support the cyber forensics process by providing digital evidence during the investigation stage.
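Before any clustering can happen, the log has to be turned into numbers. As a rough sketch, the snippet below counts visits per host from log lines. The line format (`<timestamp> <user> <url>`), the sample lines, and the function name `count_visits` are assumptions for illustration. Real proxy or web server logs have their own formats and need their own parsing.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_visits(log_lines):
    """Count visits per host from lines shaped like '<timestamp> <user> <url>'."""
    visits = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        host = urlsplit(parts[2]).hostname
        if host:
            visits[host] += 1
    return visits


# Hypothetical log lines.
log_lines = [
    "2021-09-05T10:00:00 alice http://example.edu/library",
    "2021-09-05T10:01:00 bob http://example.edu/",
    "2021-09-05T10:02:00 alice https://news.example.com/a",
    "malformed line",
]
print(count_visits(log_lines))
# → Counter({'example.edu': 2, 'news.example.com': 1})
```

Per-host counts like these are the kind of input that can then be grouped into visit levels.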

Profiling is more specifically based on what is known and not known about the criminal.

Profiling is information about an individual or group of individuals that is accumulated, stored, and used for various purposes, for example by monitoring their behavior through their internet activity.

The cyber profiling process carried out in the study shows that searches for information were accessed more frequently by users from educational institutions.

Cyber profiling indicates that environmental factors and daily activities affect what the user accesses.

The cyber profiling process can be directed toward:

➡ Identifying users of computers that have been used previously.

➡ Mapping the subject's family, social life, work, or network-based organizations, including those for whom he/she worked.

➡ Providing information about the user's ability, level of threat, and vulnerability to threats.

➡ Identifying the suspected abuser.

The K-means algorithm is used for the cyber profiling process. It is in line with the expectations of the study because it is a simple algorithm with a good degree of accuracy.

But the K-means algorithm has a disadvantage: the initial cluster centers are chosen at random. This can lead to different clustering results from one run to the next.
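The effect of the starting centers is easy to reproduce. The sketch below runs the same K-means loop on the same one-dimensional data from two different starting points, fixed by hand here so the result is repeatable. The function name `lloyd_1d` and the data are assumptions made for this example.

```python
def lloyd_1d(values, centers, max_iter=100):
    """Run K-means on 1-D values from the given starting centers."""
    centers = list(centers)
    groups = []
    for _ in range(max_iter):
        # Assign each value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Recompute each center; keep the old one if its group is empty.
        new_centers = [
            sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return groups


# Hypothetical data with three obvious groups.
values = [0, 1, 2, 10, 11, 12, 20, 21, 22]

print(lloyd_1d(values, [0, 10, 20]))
# → [[0, 1, 2], [10, 11, 12], [20, 21, 22]]

print(lloyd_1d(values, [0, 1, 2]))
# → [[0], [1, 2], [10, 11, 12, 20, 21, 22]]
```

Both runs converge, but the second one settles on a poorer grouping. A common remedy is to run the algorithm several times from different starting centers and keep the result with the smallest total distance from points to their centers.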

This article is a summary of what I researched on the Internet about K-means clustering and its use case in cyber profiling.

Thanks for reading, hope it will help you!!
