Analyzing Website Characteristics and Their Impact on Web Traffic and Legitimacy Classification for Phishing Detection: A Structural Equation Modeling Approach
Phishing attacks remain a significant threat because they rely on deceptive websites that imitate legitimate platforms to gain users’ trust and steal sensitive information. This study examines how various website characteristics affect web traffic and legitimacy classification, using a dataset of more than 11,000 websites from the Kaggle “Phishing Website Detector.” Employing structural equation modeling (SEM), the analysis focuses on 11 specific features, including pop-up windows, iframe redirection, domain age, DNS record presence, and status bar customization. The results show that DNS record presence, iframe redirection, and pop-up usage are positively associated with web traffic, which in turn is associated with legitimate websites. Conversely, features such as status bar customization, older domain age, and a low number of backlinks are more common among phishing sites. Notably, traditionally cited indicators such as PageRank and disabled right-click functionality showed no significant effect on traffic or legitimacy. The model’s R² of 0.130 means that these features explain only about 13% of the variance, suggesting that although they are relevant, additional behavioral and dynamic data may be required to improve predictive power. By identifying meaningful indicators, this research can inform phishing detection strategies, and it underscores the need for real-time analytics and machine learning in future cybersecurity defense systems.
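To make the mediation structure described in the abstract concrete (website features affecting web traffic, and traffic in turn associated with legitimacy), the following is a minimal sketch of an observed-variable path analysis. It is not the authors’ model or estimator: it fits each structural equation separately with ordinary least squares, which simplifies full SEM estimation (typically a joint fit, e.g. by maximum likelihood), and it treats legitimacy as a continuous outcome. The helper names `ols` and `path_model`, the three features, their -1/1 coding, and all effect sizes are hypothetical and chosen purely for illustration; the data are synthetic.

```python
import numpy as np


def ols(X, y):
    """Least-squares fit of y on X (X already contains an intercept column).

    Returns the coefficient vector and the coefficient of determination R^2.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot


def path_model(features, traffic, legit):
    """Estimate a two-equation path model: features -> traffic -> legitimacy.

    Simplification (assumption): each equation is fitted separately by OLS
    rather than jointly as a full SEM estimator would do.
    """
    ones = np.ones((features.shape[0], 1))
    # Equation 1: traffic regressed on the website features (paths a_j).
    a, r2_traffic = ols(np.hstack([ones, features]), traffic)
    # Equation 2: legitimacy regressed on traffic (path b).
    b, r2_legit = ols(np.hstack([ones, traffic[:, None]]), legit)
    return {
        "a": a[1:],                # feature -> traffic coefficients
        "b": b[1],                 # traffic -> legitimacy coefficient
        "indirect": a[1:] * b[1],  # mediated effect of each feature on legitimacy
        "r2_traffic": r2_traffic,
        "r2_legit": r2_legit,
    }


# Synthetic data only (hypothetical effect sizes): three binary features coded -1/1.
rng = np.random.default_rng(0)
n = 11_000
features = rng.choice([-1.0, 1.0], size=(n, 3))
traffic = 0.5 * features[:, 0] + 0.3 * features[:, 1] + rng.normal(size=n)
legit = 0.4 * traffic + rng.normal(size=n)

result = path_model(features, traffic, legit)
print(np.round(result["a"], 2), round(result["b"], 2), round(result["r2_traffic"], 3))
```

Each equation reports its own R², which is the quantity behind a figure such as 0.130: a value that low means most of the variance in the outcome is left unexplained by the chosen features, even when individual paths are statistically significant.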