Type to learn web

5/1/2023

Detection Methods Based on Single-Modal Features

Atrees used AdaBoost to combine an SVM, a naive Bayes classifier, and a decision tree, taking lexical features filtered with CfsSubsetEval as input. The resulting model can effectively detect four types of malicious URLs: spam, phishing, malware, and defacement. Wang proposed a malicious URL detection method that fuses word-level and character-level features. It uses dynamic convolutional layers for feature extraction, which mine deeper features over a larger receptive field than standard convolutional layers. Also building on word-level and character-level features, Yuan et al. used Bi-IndRNN and CapsNet as backbone networks to mine richer features from different perspectives. Another study proposed a malicious URL detection method based on a composite neural network: an auto-encoder generates feature vectors, and a convolutional neural network with skip connections classifies them. Experimental results on the HTTP CSIC2010 dataset show that this approach performs well. To construct task-relevant word vectors, Yan et al. proposed an unsupervised URL embedding model. Unlike traditional feature engineering, these word vectors pre-trained on URL data can better help the downstream model complete the detection task. These techniques have the potential to improve flexibility and accuracy, making them promising for real-world applications.

While JavaScript provides good scalability for web pages, it also introduces security risks.
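To make the word-level and character-level views of a URL more concrete, the following is a minimal sketch of both tokenizations together with a handful of lexical features. The function names, the delimiter set, and the chosen features are illustrative assumptions; they are not taken from the works mentioned above, which may tokenize and select features differently.

```python
import re
from typing import Dict, List
from urllib.parse import urlsplit

# Hypothetical delimiter set for word-level splitting (an assumption).
_DELIMS = re.compile(r"[/.?=&:_\-@%#+~]+")


def char_tokens(url: str) -> List[str]:
    """Character-level view: every character is one token."""
    return list(url)


def word_tokens(url: str) -> List[str]:
    """Word-level view: split on common URL delimiters, drop empty pieces."""
    return [tok for tok in _DELIMS.split(url) if tok]


def lexical_features(url: str) -> Dict[str, float]:
    """A few simple lexical features; the selection is illustrative only."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_depth": parts.path.count("/"),
        "num_digits": sum(c.isdigit() for c in url),
        "num_query_params": len([p for p in parts.query.split("&") if p]),
        # 1.0 if the host looks like a dotted IPv4 address, else 0.0
        "has_ip_host": float(bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host))),
    }
```

A classical pipeline would feed the output of `lexical_features` to an ensemble classifier, whereas the neural approaches consume the token sequences from `word_tokens` and `char_tokens` directly.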

Malicious web pages pose a great threat to users' privacy and property, as they can steal private information without users' knowledge, often by disguising themselves as legitimate pages or by embedding malicious scripts. Various approaches have been proposed to identify such pages, including blacklist-based, heuristic rule-based, and interactive host behavior approaches. Although these techniques are widely used in anti-virus tools and browser security plug-ins, they can only detect known types of malicious web pages and rely on large-scale feature or rule bases. As a result, they are costly to maintain and update and are easily bypassed by encryption and obfuscation, leading to high rates of false positives and false negatives. To solve these problems, researchers have proposed self-learning techniques for malicious web page detection based on various machine learning and deep learning methods.

In recent years, the number of malicious web pages has increased dramatically, posing a great challenge to network security. Machine learning-based detection methods have emerged as a promising alternative to traditional detection techniques. However, they are commonly based on single-modal features or on simple stacking of classifiers built on different features. As a result, they cannot effectively fuse features from different modalities, which limits detection effectiveness. To address this limitation, we propose a malicious web page detection method based on multi-modal learning and pre-trained models. First, in the input stage, the raw URL and the HTML tag sequence of a web page are used as input features. To help the subsequent model learn the relationship between the two modalities and avoid information confusion, modal-type encoding and positional encoding are introduced. Next, a single-stream neural network based on the pre-trained ConvBERT model serves as the backbone classifier and learns the representation of the multi-modal features through fine-tuning. In the output part of the model, a linear layer based on large margin softmax makes the final decision; it widens the classification margin and improves robustness. In addition, a coarse-grained modal matching loss is added to the optimization objective to help the model learn cross-modal association features. Experimental results on synthetic datasets show that the proposed method generally outperforms traditional single-modal detection methods and has advantages over baseline models in accuracy and reliability.
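The input stage, with its modal-type and positional encoding, can be sketched roughly as follows. This is only an illustration of how a single-stream model might receive both modalities in one sequence: the special tokens, the modality ids, the `max_len` default, and the function name `build_single_stream_input` are assumptions, since the text does not specify them. In BERT-style models, ids like these index embedding tables whose vectors are summed with the token embeddings.

```python
from typing import Dict, List

# Hypothetical special tokens and modality ids (not named in the text).
CLS, SEP, PAD = "[CLS]", "[SEP]", "[PAD]"
URL_TYPE, HTML_TYPE = 0, 1


def build_single_stream_input(
    url_tokens: List[str], html_tags: List[str], max_len: int = 16
) -> Dict[str, list]:
    """Join URL tokens and HTML tag tokens into one sequence and attach
    a modal-type id and a position id to every slot."""
    # URL segment: [CLS] url tokens [SEP], all marked as the URL modality.
    tokens = [CLS] + url_tokens + [SEP]
    type_ids = [URL_TYPE] * len(tokens)

    # HTML segment: tag tokens [SEP], all marked as the HTML modality.
    html_part = html_tags + [SEP]
    tokens = tokens + html_part
    type_ids = type_ids + [HTML_TYPE] * len(html_part)

    # Naive right-truncation; a real pipeline would budget each modality.
    tokens, type_ids = tokens[:max_len], type_ids[:max_len]

    # Pad to a fixed length; the mask tells the model which slots are real.
    attention_mask = [1] * len(tokens)
    pad = max_len - len(tokens)
    tokens = tokens + [PAD] * pad
    type_ids = type_ids + [URL_TYPE] * pad  # value is ignored under the mask
    attention_mask = attention_mask + [0] * pad

    return {
        "tokens": tokens,
        "modal_type_ids": type_ids,
        "position_ids": list(range(max_len)),
        "attention_mask": attention_mask,
    }
```

Because both modalities share one sequence, the modal-type ids are what let the backbone tell a URL token from an HTML tag that happens to sit at a nearby position.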
