Photo by Andrea Piacquadio from Pexels
As the leading global platform for employee cyber risk management, Kymatio has to meet many requirements. One of its lines of work, and the one this article focuses on, is determining the strengthening and cyber awareness needs of the people who work in our clients' companies. In these organizations we strengthen the cybersecurity posture by adding the last variable to the equation: the human factor.
Like other powerful platforms that incorporate Artificial Intelligence (AI), we have to process entire data lakes to learn from the correlations they hide. In our case, the challenge is to determine the probability that a user belongs to a certain “family of needs” or internal risk group (IRG). Each IRG faces its own potential information security problems; consequently, each has specific, personalized support, training and awareness measures associated with it.
Like any other system, we need personal data; in this case, some knowledge of the employee's habits and characteristics, information about their position, and so on. The more information the platform has, the richer the profile analysis will be. The results become more accurate as a consequence, and the actions the platform recommends to each person can be highly customized.
The key here is machine learning (ML). Machine learning is a way of teaching computers to recognize patterns and make decisions based on those patterns, often faster and more accurately than humans. Our case is no exception. When outlining a person's strengthening needs, obtaining information is always a multifaceted task: the user has to provide a large pool of elements, some of which can be complex.
Kymatio applies ML techniques to help narrow down that set of unknowns. For example, when one of our chatbots interacts with a person in a virtual interview, the ML database allows us to guide them to the next question of interest. This is done through an autocomplete feature that can dramatically reduce the length of the interview. For some complex issues, certain questions do not even need to be asked, because they correlate directly with others that have already been answered.
Association Rule Mining
The arsenal of ML tools includes very different techniques for finding correlations between items: Correlation Matrix, WOE/IV, Feature Importance, PCA, etc. In Python, for instance, a Decision Tree model can report by default which features are most important, along with an importance score for each. Each question would be fed to the model as a variable and it would do its magic.
However, to solve this problem, it seems appropriate to use techniques related to the Association Rule Mining field. Models such as APRIORI or FP-growth allow finding regularities between the different items in the “shopping basket” of different customer transactions.
A typical example of a rule:
IF onions and potatoes
THEN burger
“{onions, potatoes} -> {burger}”
The rule states that customers who bought onions and potatoes also tended to buy minced meat for burgers. An immediate use of association rules is to design a recommendation engine, whether for movies, telephone rates or cybersecurity recommendations. Continuing with the initial example, a customer who has bought onions and potatoes might be advised to buy minced meat. In addition, the supermarket chain could rearrange its shelves to place certain products closer together, combine certain products into packs, and so on; in short, the possibilities are many.
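To make the idea of a rule concrete, here is a minimal sketch of the two measures usually attached to one: support (how often the items appear together) and confidence (how often the consequent appears when the antecedent does). The baskets and the helper names `support` and `confidence` are made up for illustration; they are not Kymatio data or code.

```python
# Toy transactions; the products are illustrative, not real data.
baskets = [
    {"onions", "potatoes", "burger"},
    {"onions", "potatoes", "burger", "milk"},
    {"onions", "potatoes"},
    {"potatoes", "milk"},
    {"onions", "burger"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Estimate of P(rhs | lhs): support(lhs and rhs) / support(lhs)."""
    both = support(set(lhs) | set(rhs), transactions)
    return both / support(lhs, transactions)

rule_support = support({"onions", "potatoes", "burger"}, baskets)
rule_confidence = confidence({"onions", "potatoes"}, {"burger"}, baskets)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

In this toy data, two of the five baskets contain all three items and three contain onions and potatoes, so the rule holds in two out of three of the cases where it applies.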
But what do potatoes and onions really have to do with neuroscience and cybersecurity? We will follow the same principle to apply APRIORI, and thus be able to recommend the value of the answer to the user before they read the question.
Suppose we have a DataSet that contains the information on questions answered by users at the request of one of our chatbots:
import pandas as pd

# Load the answers collected by the chatbot (semicolon-separated file)
df = pd.read_csv('questions.csv', sep=';')
df.head(5)
InterviewId | questionKey | questionValue |
1 | ky01 | B |
2 | ky01 | C |
3 | ky01 | C |
4 | ky01 | C |
5 | ky01 | A |
The first thing we have to do is “create” the item for the shopping cart. To do this, we concatenate the identifier of the question with the answered value. In Python it could be something like this:
df['item'] = df['questionKey'] + '_' + df['questionValue']
Several Python libraries implement the APRIORI model; we recommend the Efficient-Apriori package (imported as efficient_apriori) because of its good performance.
All that remains is to adapt the data to the structure APRIORI expects: a list of transactions, with one list of items per interview.
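from efficient_apriori import apriori

# Build one transaction (list of items) per interview
records = []
for interview_id in df['InterviewId'].unique():
    records.append(df[df['InterviewId'] == interview_id]['item'].tolist())
# Mine frequent itemsets and association rules with the default thresholds
itemsets, rules = apriori(records)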
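To show the shape of the transaction list more explicitly, the following sketch builds it with the standard library only, with no pandas. The inline sample and the question key `ky02` are hypothetical; only the column layout follows the table above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the same semicolon-separated layout as questions.csv.
sample = """InterviewId;questionKey;questionValue
1;ky01;B
1;ky02;A
2;ky01;C
2;ky02;A
"""

# Group the "question_answer" items by interview.
baskets = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample), delimiter=";"):
    item = f'{row["questionKey"]}_{row["questionValue"]}'
    baskets[row["InterviewId"]].append(item)

# One tuple of items per interview: the "shopping baskets" to be mined.
records = [tuple(items) for items in baskets.values()]
print(records)
```

Each interview becomes one basket, and each answered question becomes one product in it.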
Voilà! We now have the association rules between the different items.
You can find such curious results as:
{s09_A} -> {s56_A} (conf: 0.949)
If a user answers (A) to question s09, it is highly probable that they will also answer (A) to question s56.
From here, either based on previous data or thanks to the answers to previous questions, it is feasible to design other components, from “autocompletion” engines to selection probability evaluators.
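As one possible shape for such an "autocompletion" component, the sketch below looks up the best applicable rule for the next question. Everything in it is an assumption made for illustration: the `suggest` function, the tuple format of the rules, the 0.8 threshold, and every rule other than the s09/s56 one quoted above.

```python
# Hypothetical rules as (antecedent items, predicted item, confidence).
rules = [
    (frozenset({"s09_A"}), "s56_A", 0.949),
    (frozenset({"s09_B", "s12_C"}), "s56_B", 0.81),
    (frozenset({"s12_C"}), "s56_C", 0.60),
]

def suggest(question_key, answered_items, rules, min_confidence=0.8):
    """Return (value, confidence) for question_key, or None if no rule applies."""
    best = None
    for lhs, rhs, conf in rules:
        key, _, value = rhs.rpartition("_")
        if key != question_key or conf < min_confidence:
            continue
        # The rule applies only if all of its antecedent items were answered.
        if lhs <= answered_items and (best is None or conf > best[1]):
            best = (value, conf)
    return best

answers = {"s01_B", "s09_A"}
print(suggest("s56", answers, rules))    # ('A', 0.949)
print(suggest("s56", {"s12_C"}, rules))  # None: the only match is below the threshold
```

The confidence threshold is the design choice to tune: a higher value pre-fills fewer answers but is wrong less often.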
For those of you who dare to do the exercise, we want to highlight the four macro tasks that should be incorporated to complete it successfully:
- Persist association rules in a cache system (Redis) in an understandable format for the BackEnd
- Return to the FrontEnd the recommended value
- Persist in the DB the recommended value and the one used by the user
- Automate the periodic rule generation process
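For the first of these tasks, one possible "understandable format" is a JSON value stored under a key derived from the rule's antecedent. The sketch below is only an assumption about how that could look: the key scheme, the field names and the helper functions are invented here, and a plain dict stands in for the Redis cache.

```python
import json

# A plain dict stands in for the cache; in Redis each entry would be one key/value pair.
cache = {}

def rule_key(lhs):
    """Deterministic cache key: antecedent items sorted and joined."""
    return "rule:" + "|".join(sorted(lhs))

def store_rule(cache, lhs, rhs, confidence):
    """Serialize one rule as JSON so any backend language can read it."""
    payload = {"recommend": sorted(rhs), "confidence": round(confidence, 3)}
    cache[rule_key(lhs)] = json.dumps(payload)

def load_rule(cache, lhs):
    """Return the stored rule for this antecedent, or None if there is none."""
    raw = cache.get(rule_key(lhs))
    return json.loads(raw) if raw is not None else None

store_rule(cache, {"s09_A"}, {"s56_A"}, 0.949)
print(load_rule(cache, {"s09_A"}))
```

Sorting the antecedent items before building the key means the same set of answers always maps to the same entry, whatever order the questions were answered in.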
If you are interested in technology, artificial intelligence and in particular machine learning, contact Kymatio here https://blog.kymatio.com/en/contact/
Article by David Caballero (CIO Kymatio) and Fernando Mateus (CEO Kymatio)
Related information:
New Kymatio module prepares employees for social engineering techniques