Generative AI (GenAI): Security

Generative artificial intelligence (generative AI) is a buzzword across industries. It is an artificial intelligence technology that can produce various types of content, including text, imagery, audio, and synthetic data.

Organizations are investing large portions of their budgets in GenAI technology. Amazon recently completed a $4 billion investment in generative AI development. Even so, recent studies suggest the industry is barely scratching the surface of generative AI use cases and opportunities.

Before implementing any generative AI solution, make sure you completely understand the business problem it is meant to solve, because any generative AI solution takes significant money, time, and brain power.

Evolution of LLMs

Generative AI has exploded in popularity within the last year or two, but the underlying technology has been around for decades. Generative AI is based on large language models (LLMs), which have been evolving for roughly five to ten years. Companies such as AWS, Microsoft, and OpenAI are each shaping their offerings around their own business requirements. Here is the evolution story of LLMs and GenAI.

AI Attacks

There are four types of AI attacks.

  1. Poisoning – This attack can damage reputation and cost capital. A classic example is thrill-seekers or hacktivists injecting malicious content into training data, which subsequently disrupts the retraining process.
  2. Inference – This attack can result in the leakage of sensitive information. The attacker probes the machine learning model with different input data and weighs the outputs to reconstruct or reveal information about the training data.
  3. Evasion – This attack can harm physical safety. It is usually carried out by hacktivists aiming to take down a competitor's product, and it has the potential to seriously harm people's physical safety.
  4. Extraction – This attack can lead to model theft by insiders or cybercriminals. The attacker extracts the original model, creating a stolen copy that can be used to find evasion cases and fool the original model.
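The poisoning attack above can be illustrated with a deliberately tiny sketch. The classifier here is a hypothetical one-dimensional threshold model, chosen purely for illustration (not a real ML pipeline): the attacker injects a few mislabeled points into the training set and drags the learned decision boundary, degrading accuracy on clean data.

```python
# Toy illustration of data poisoning: mislabeled points injected into
# the training set shift a simple threshold classifier's boundary.

def train_threshold(samples):
    """Learn a threshold as the midpoint between the two class means."""
    xs0 = [x for x, y in samples if y == 0]
    xs1 = [x for x, y in samples if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

def accuracy(threshold, samples):
    return sum((x > threshold) == (y == 1) for x, y in samples) / len(samples)

# Clean training data: class 0 clusters near 1.0, class 1 near 5.0.
clean = [(1.0, 0), (1.2, 0), (0.8, 0), (5.0, 1), (5.2, 1), (4.8, 1)]
t_clean = train_threshold(clean)        # boundary = 3.0

# Poisoned data: the attacker injects points far to the right labeled
# as class 0, dragging the class-0 mean (and the boundary) upward.
poisoned = clean + [(9.0, 0), (9.5, 0), (10.0, 0)]
t_poisoned = train_threshold(poisoned)  # boundary = 5.125

print(f"accuracy before poisoning: {accuracy(t_clean, clean):.2f}")
print(f"accuracy after poisoning:  {accuracy(t_poisoned, clean):.2f}")
```

Even three poisoned points are enough to misclassify part of the clean data, which is why retraining pipelines need input validation.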

Type of AI Malware

  • Black Mamba – Black Mamba uses a benign executable that reaches out to a high-reputation API (OpenAI) at runtime, so it can return the synthesized malicious code needed to steal an infected user’s keystrokes. It has the following properties.
    • ChatGPT Polymorphic Malware
    • Dynamically Generates Code
    • Unique Malware code
  • Deep Locker – The Deep Locker class of malware stands in stark contrast to existing evasion techniques used by malware seen in the wild. It hides its malicious payload in benign carrier applications, such as video-conferencing software, to avoid detection by most antivirus and malware scanners. It has the following properties.
    • Targeted identification
    • Logic detonation Mechanism
    • Facial and voice recognition
  • MalGAN – Generative Adversarial Networks serve as the foundation of MalGAN and are used to create synthetic malware samples. MalGAN’s design comprises three essential parts: a generator, a substitute detector, and a machine-learning-based malware detection system. It has the following properties.
    • Generative Adversarial Malware
    • Bypass ML-based Detections
    • Feed-forward Neural Networks

AI Security Threats

  • Deepfake Attacks
  • Mapping and Stealing AI Models
  • Spear Phishing (Deep Phishing)
  • Advanced Persistent Threats (APTs)
  • DDoS and Scanning of the Internet
  • Data poisoning AI Models
  • PassGAN and MalGAN
  • Auto Generation of Exploit code
  • Ransom Negotiation Automation
  • Social Engineering

AI Security Defense Strategy

As we have seen, several kinds of AI malware and threats impact different parts of the AI ecosystem. Our AI must be smart enough to detect these threats and mitigate risk. ML-based malware detectors identify risks and generate insights into their severity. Here are a few approaches you should implement to protect your AI systems.

  • Intelligent Automation
    • Automated response and Mitigation
    • Indicators of Compromise (IOCs) extraction and correlation
    • Behavioral and anomaly detection
  • Precision Approach
    • High Accuracy and Precision
    • Identify, Understand, and Neutralize
    • Prioritize Risk
  • Define the area for defense
    • Identify the most vulnerable area.
    • Apply a broad spectrum of defense.
    • System resiliency
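As a minimal sketch of the behavioral and anomaly detection item above, a simple z-score over hourly request counts can stand in for a real detector (the counts, threshold, and method below are illustrative assumptions, not a production system):

```python
# Flag hours whose request count deviates strongly from the mean,
# a stand-in for behavioral/anomaly detection on event telemetry.
import statistics

def detect_anomalies(counts, z_threshold=2.0):
    """Return indices whose z-score exceeds the threshold."""
    mean = statistics.mean(counts)
    stdev = statistics.pstdev(counts)
    if stdev == 0:
        return []
    return [i for i, c in enumerate(counts)
            if abs(c - mean) / stdev > z_threshold]

# Hourly API request counts; hour 5 shows a burst typical of scanning.
hourly_requests = [100, 104, 98, 101, 99, 950, 102, 97]
print(detect_anomalies(hourly_requests))  # → [5]
```

A flagged hour would then feed the automated response and IOC-correlation steps listed above.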

AI involvement in security

  • Malware detection – AI systems help prevent phishing, malware, and other malicious activities, analyzing unusual behavior to ensure a strong security posture.
  • Breach risk prediction – Identify the most vulnerable systems and protect against data leaks.
  • Prioritize critical defense – AI-powered risk analysis can produce incident summaries for high-fidelity alerts and automate incident responses, accelerating alert investigations.
  • Correlating attack patterns – AI models can help balance security with user experience by analyzing the risk of each login attempt and verifying users through behavioral data, simplifying access for verified users.
  • Adaptive response – The AI model automates responses and generates an alert when it identifies a threat. This creates the first layer of security defense.
  • Applied machine learning – AI models are self-training. If a model identifies a new risk pattern, the updated security model is applied to all protected systems.

Security for Critical Data

When an organization undergoes digital transformation, data security is a big concern. Digital transformation impacts every aspect of business operation and execution. The volume of data that any organization creates, manipulates, and stores digitally is growing, and that drives a greater need for data governance. Securing that large volume of data across its entire lifecycle is one of the biggest challenges any organization faces.

Data security is the process of protecting sensitive data from unauthorized access, corruption, or theft throughout the entire data lifecycle.

Here are a few steps to mitigate data risk and implement data security.

  • Event Monitoring
  • Data Detection
  • Data Encryption
  • Data Audit Trail

Event Monitoring – This activity includes prevention, mitigation, and monitoring of threats to sensitive data.

  • Monitor user activity – Know who is accessing data, and from where, with real-time event streaming and at least 3-6 months of event history.
  • Prevent and mitigate threats – Define and build transaction security policies using declarative conditions or code to prevent and mitigate threats.
  • Drive adoption and performance – Analyze user behavior to enable security training for the organization and find security bottlenecks to improve the user experience.
  • Event log files – Create event log files for rich visibility into your org for security, adoption, and performance purposes.
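The user-activity monitoring idea above can be sketched in a few lines. The log format, field order, and failure threshold here are assumptions for illustration only:

```python
# Count failed logins per user from event-log lines and flag
# accounts that exceed a threshold (a simple transaction policy).
from collections import Counter

def flag_suspicious_logins(log_lines, max_failures=3):
    failures = Counter()
    for line in log_lines:
        timestamp, user, event = line.split(",")
        if event == "LOGIN_FAILED":
            failures[user] += 1
    return sorted(u for u, n in failures.items() if n > max_failures)

events = [
    "2024-05-01T10:00:01,alice,LOGIN_OK",
    "2024-05-01T10:00:05,bob,LOGIN_FAILED",
    "2024-05-01T10:00:07,bob,LOGIN_FAILED",
    "2024-05-01T10:00:09,bob,LOGIN_FAILED",
    "2024-05-01T10:00:11,bob,LOGIN_FAILED",
    "2024-05-01T10:00:20,alice,LOGIN_OK",
]
print(flag_suspicious_logins(events))  # → ['bob']
```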

Data Detection – Find and classify sensitive data quickly and mitigate data risk.

  • Monitor data classification coverage – Determine which data in your organization has been categorized versus uncategorized. Highly sensitive data needs to be secured properly; label data appropriately to manage data security.
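The classification step can be sketched with simple pattern matching. The patterns and labels below are simplified assumptions, not a production classifier:

```python
# Scan field values with regular expressions and label
# likely-sensitive data so it can be secured and tracked.
import re

PATTERNS = {
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def classify(value):
    for label, pattern in PATTERNS.items():
        if pattern.match(value):
            return label
    return "UNCLASSIFIED"

print(classify("jane.doe@example.com"))  # → EMAIL
print(classify("123-45-6789"))           # → SSN
print(classify("hello world"))           # → UNCLASSIFIED
```

A real deployment would add many more patterns plus context-aware checks, then record coverage metrics so uncategorized data is visible.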

Data Encryption – Encrypt sensitive data at rest while preserving business functionality.

  • Encrypt data and maintain functionality – Protect data and attachments during search, lookups, transport, and storage.
  • Key management – Managing data encryption keys is essential to securing organization data. It includes control and authorization of data encryption keys.
  • Policy management – Data policy management defines and manages rules and procedures for accessing data, requiring individuals to follow defined processes when accessing data in storage or in transit.
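The encryption and key-management flow above can be illustrated with a toy envelope-encryption sketch. To stay dependency-free, the "cipher" below is a SHA-256 counter keystream XOR, which is NOT real cryptography; production systems should use an authenticated cipher such as AES-GCM from a vetted library, with the master key held in a KMS or HSM:

```python
# Toy envelope-encryption flow (demo only, not real cryptography):
# a per-record data key encrypts the record, and a master key
# "wraps" the data key so only the key service can recover it.
import hashlib
import secrets

def keystream_xor(key, data):
    """XOR data with a SHA-256 counter keystream (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

master_key = secrets.token_bytes(32)   # would live in a KMS/HSM
data_key = secrets.token_bytes(32)     # generated per record

record = b"customer card ending 4242"
ciphertext = keystream_xor(data_key, record)        # encrypt record
wrapped_key = keystream_xor(master_key, data_key)   # wrap data key

# Decrypt: unwrap the data key with the master key, then the record.
recovered_key = keystream_xor(master_key, wrapped_key)
plaintext = keystream_xor(recovered_key, ciphertext)
print(plaintext == record)  # → True
```

The design point is that rotating or revoking the master key controls access to every wrapped data key without re-encrypting the records themselves.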

Data Audit Trail – It allows you to strengthen data integrity over an extended period, which enables compliance and yields insights.

  • Data history – Store historical data for as long as it is useful for the audit trail; delete it when it is no longer needed.
  • Data retention policy – Defines which data must be stored for audit and for how long. Based on the sensitivity of the data, you might archive it for 3-6 months or more.
  • Data insights – Create insights and dashboards for data-audit transparency. They allow the organization to track compliance and data security issues.
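One common way to make an audit trail tamper-evident is to hash-chain its entries, so editing any past event breaks every later hash. A minimal sketch (the entry format is an assumption for illustration):

```python
# Tamper-evident audit trail: each entry's hash covers the previous
# entry's hash, so modifying history invalidates the chain.
import hashlib

def append_entry(trail, event):
    prev_hash = trail[-1][1] if trail else "0" * 64
    entry_hash = hashlib.sha256((prev_hash + event).encode()).hexdigest()
    trail.append((event, entry_hash))

def verify(trail):
    prev_hash = "0" * 64
    for event, entry_hash in trail:
        expected = hashlib.sha256((prev_hash + event).encode()).hexdigest()
        if expected != entry_hash:
            return False
        prev_hash = entry_hash
    return True

trail = []
append_entry(trail, "user=alice action=read record=42")
append_entry(trail, "user=bob action=update record=42")
print(verify(trail))  # → True

# Tampering with an earlier event breaks verification.
trail[0] = ("user=alice action=delete record=42", trail[0][1])
print(verify(trail))  # → False
```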

ChatGPT: An Intro & Company Use-Case

The internet is full of buzz about the new AI-based chatbot, ChatGPT. ChatGPT reminds me of the early days of Google, and how Google changed internet search forever. We were using the Lycos search engine, but Google gave a new definition of search. Similarly, I see ChatGPT trying to redefine search based on AI and AI models. It is arriving as a disruptive technology, and suddenly Google looks old school.

Generative Pretrained Transformer 3 (GPT-3), from OpenAI, is the main component behind Jasper.ai and other cloud-based content writing, chatbot, and machine learning applications. GPT-3 was first publicly released by OpenAI on June 11, 2020. GPT-3 is built for natural language processing (NLP) tasks using “generative pretraining”, which involves predicting the next token in a context of up to 2,048 tokens.

GPT-3 is based on large language models (LLMs). Large language models are AI tools that can read, summarize, and translate text. They can predict words and craft sentences that reflect how humans write and speak. Three popular and powerful large language models are Microsoft’s Turing NLG, DeepMind’s Gopher, and OpenAI’s GPT-3.

ChatGPT was first publicly released by OpenAI on November 30, 2022. Initially developed as part of the GPT-3 research program, ChatGPT was built on top of the more powerful GPT-3.5 language model to specifically address natural language processing tasks involving customer service chat interactions.

OpenAI’s GPT-3 has demonstrated the capability of performing professional tasks such as writing software code and preparing legal documents. It has also shown a remarkable ability to automate some of the skills of highly compensated knowledge workers. ChatGPT has immense potential for ecommerce customer-experience automation, offering customers personalized shopping and fully automated 24x7 on-demand customer service.

In spite of the ChatGPT buzz and its ability to write content and provide on-demand customer service, I am a little cautious about using this technology for my business. I tested a few use cases in ChatGPT. It works fine with simple use cases and problem solving, but as soon as I added a few more variables to my problem, ChatGPT’s response was not correct.

Here is a screenshot from ChatGPT showing my problem and ChatGPT’s solution.

For the problem shown above, ChatGPT calculated directly from the equation, and its response came out as 5 minutes.

In its response, ChatGPT is not calculating the person’s waiting time in the queue.

So, for the question above, the right answer would be

Average Waiting Time = Average Processing Time x Utilization / (1-Utilization).

Average Waiting Time = 5 x (5/6) / (1 – 5/6) = 25 minutes

So, the correct answer is 25 minutes waiting in line. If we add the 5 minutes at the kiosk, we obtain a total of 30 minutes.
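The arithmetic above can be checked with a few lines of Python, assuming the single-server waiting-time approximation used in the formula (processing time p = 5 minutes, utilization u = 5/6):

```python
# Verify the queueing arithmetic: W = p * u / (1 - u).
processing_time = 5      # minutes of service at the kiosk
utilization = 5 / 6      # how busy the kiosk is

waiting_time = processing_time * utilization / (1 - utilization)
total_time = waiting_time + processing_time

print(round(waiting_time))  # → 25 (minutes waiting in line)
print(round(total_time))    # → 30 (minutes overall)
```

This is exactly the step ChatGPT skipped: it reported the 5-minute service time but never applied the waiting-time formula.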

Given the issue above, I would like to highlight a few points to consider if your company is trying to implement a ChatGPT solution.

  1. Is the ChatGPT AI model configured for your company’s use case?
  2. Do you have enough historical data to run and test AI-based ChatGPT LLM models?
  3. ChatGPT runs on a big model (an LLM). Big models incur big costs; LLMs are expensive.
  4. Because ChatGPT runs on a big model (LLM), it needs to overcome performance constraints.

Keep an eye out for GPT-4, which may be released as early as the first half of 2023. The next generation of GPT may deliver better, more realistic results.

MuleSoft integration with MongoDB (NoSQL)

MuleSoft with MongoDB is one of the best combinations for big data processing. MongoDB is a leading open-source NoSQL database, and MuleSoft is a leading open-source ESB. This blog is dedicated to the integration of MuleSoft with MongoDB.

Installation and configuration of MongoDB

Install MongoDB on your system. I am using the Windows installation of MongoDB; I installed MongoDB 3.0 in my C:\MongoDB folder and created a data folder, C:\data, to store MongoDB data.
Start the MongoDB server with the command:

> C:\MongoDB\Server\3.0\bin\mongod.exe --dbpath C:\data

The MongoDB server starts on port 27017 with user ID admin.

Now download the test data from the MongoDB website:
 https://raw.githubusercontent.com/mongodb/docs-assets/primer-dataset/dataset.json
Save it to a file named primer-dataset.json.

Import this data into the test database:
> mongoimport --db test --collection restaurants --drop --file \temp\primer-dataset.json
Start the mongo shell to inspect the loaded data:
> C:\MongoDB\Server\3.0\bin\mongo.exe
Run this command to verify your database is configured:
> db.restaurants.find().count()
This will return the number of imported documents.

Configuration of MongoDB connector in Mulesoft

Now configure the MongoDB connection in Mule. I am using Mule 3.7.
I created a small Mule flow to demonstrate MongoDB integration with Mule. The flow receives an HTTP request, fetches data from MongoDB, and returns the data as a JSON object.

(Screenshot: Mule flow for the MongoDB integration)

(Screenshot: MongoDB connection configuration)

We now have a restaurants collection in MongoDB, so I use restaurants as the collection. To execute a conditional query, I am using the operation Find object using query map.

In Query Attributes I am using Create Object manually.

Here is the full code of this implementation:

<?xml version="1.0" encoding="UTF-8"?>

<mule xmlns:json="http://www.mulesoft.org/schema/mule/json" xmlns:mongo="http://www.mulesoft.org/schema/mule/mongo" xmlns:http="http://www.mulesoft.org/schema/mule/http" xmlns="http://www.mulesoft.org/schema/mule/core" xmlns:doc="http://www.mulesoft.org/schema/mule/documentation"
	xmlns:spring="http://www.springframework.org/schema/beans" version="EE-3.7.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-current.xsd
http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd
http://www.mulesoft.org/schema/mule/mongo http://www.mulesoft.org/schema/mule/mongo/current/mule-mongo.xsd
http://www.mulesoft.org/schema/mule/json http://www.mulesoft.org/schema/mule/json/current/mule-json.xsd">
    <http:listener-config name="HTTP_Listener_Configuration" host="0.0.0.0" port="8081" basePath="/mongo" doc:name="HTTP Listener Configuration"/>
    <mongo:config name="Mongo_DB" database="test" doc:name="Mongo DB" username="admin"/>
    <flow name="mongodbprojectFlow">
        <http:listener config-ref="HTTP_Listener_Configuration" path="/mongoproject" doc:name="HTTP"/>
        <logger message="This is Http Request and Response" level="INFO" doc:name="Logger"/>
        <mongo:find-objects-using-query-map config-ref="Mongo_DB" doc:name="Mongo DB" collection="restaurants" >
        	<mongo:query-attributes>                
            	<mongo:query-attribute key="restaurant_id">40360045</mongo:query-attribute>
        	</mongo:query-attributes>
        	<mongo:fields>
            	<mongo:field>name</mongo:field>
            	<mongo:field>cuisine</mongo:field>
            	<mongo:field>borough</mongo:field>
        </mongo:fields>
        </mongo:find-objects-using-query-map>
        <mongo:mongo-collection-to-json doc:name="Mongo DB"/>
        <json:object-to-json-transformer doc:name="Object to JSON"/>
    </flow>
</mule>