**Artificial Intelligence**

| S.No | Project Code | Project Title | Abstract |
|---|---|---|---|
| 1 | VTPAI01 | Eye-LRCN: A Long-Term Recurrent Convolutional Network for Eye Blink Completeness Detection | |
| 2 | VTPAI02 | Explainable AI Based Neck Direction Prediction and Analysis During Head Impacts | |
| 3 | VTPAI03 | A Novel Digital Audio Encryption and Watermarking Scheme | |
| 4 | VTPAI04 | Advanced Encryption for Quantum-Safe Video Transmission | |
| 5 | VTPAI05 | Secret Image Sharing using Shamir Secret Rule | |
| 6 | VTPAI06 | AI Model for Identification of Micro-Nutrient Deficiency in Banana Crop | |
| 7 | VTPAI07 | A Novel Medical Image Encryption Scheme Based on Deep Learning Feature Encoding and Decoding | |
| 8 | VTPAI08 | Efficient Anomaly Detection Algorithm for Heart Sound Signal | |
| 9 | VTPAI09 | Stroke Prediction Using XGBoost and a fusion of XGBoost with Random Forest | |
| 10 | VTPAI10 | Road Object Detection in Foggy Complex Scenes Based on Improved YOLOv10 | |
| 11 | VTPAI11 | Jellyfish Detection using Improved YOLO Algorithm | |
| 12 | VTPAI12 | Novel Animal Detection System using YOLO With Adaptive Preprocessing and Feature Extraction | |
| 13 | VTPAI13 | Image Translation and Reconstruction with advanced Neural Networks | |
| 14 | VTPAI14 | Improved YOLO Algorithm to detect Marine Debris in Surveillance | |
| 15 | VTPAI15 | Railway Object Detection Using an Improved YOLO Algorithm | |
| 16 | VTPAI16 | Automatic Detection of Foreign Object Debris on Airport Runway by Using YOLO | |
| 17 | VTPAI17 | Personalized Book Intelligent Recommendation System | |
| 18 | VTPAI18 | Safety Helmet Detection Based on Improved YOLO | |
| 19 | VTPAI19 | Efficient Pomegranate Growth Stage Detection Using YOLOv10: A Novel Object Detection Approach | |
| 20 | VTPAI20 | EC-YOLO: Advanced Steel Strip Surface Defect Detection Model Based on YOLOv10 | |
| 21 | VTPAI21 | Palm Oil Counter: State-of-the-Art Deep Learning Models for Detection and Counting in Plantations | |
| 22 | VTPAI22 | Detection of Hand Bone Fractures in X-Ray Images Using Hybrid YOLO NAS | |
| 23 | VTPAI23 | SUNet: Coffee Leaf Disease Detection Using YOLO | |
**Natural Language Processing**

| S.No | Project Code | Project Title | Abstract |
|---|---|---|---|
| 1 | VTPNLP01 | Gun Sound Recognition Using NLP and YAMNET model | |
| 2 | VTPNLP02 | Classification and Recognition of Lung Sounds Based on Improved Bi-ResNet Model | |
| 3 | VTPNLP03 | Deep Learning Algorithms for Cyberbullying Detection in Social Media Platforms | |
| 4 | VTPNLP04 | Novel Meta Learning Approach for Detecting Postpartum Depression Disorder Using Questionnaire Data | |
| 5 | VTPNLP05 | Online Recruitment Fraud (ORF) Detection Using Deep Learning Approaches | |
| 6 | VTPNLP06 | Fake News Detection Using Deep Learning | |
| 7 | VTPNLP07 | Live Event Detection for People's Safety Using NLP and Deep Learning | |
| 8 | VTPNLP08 | Socially Aware Synthetic Data Generation for Suicidal Ideation Detection Using Natural Language Processing | |
| 9 | VTPNLP09 | Climate Change Sentiment Analysis Using Natural Language Processing and LSTM Model | |
| 10 | VTPNLP10 | Explainable Detection of Depression in Social Media Contents Using Natural Language Processing | |
| 11 | VTPNLP11 | Natural Language Processing and CNN Model for Indonesian Sarcasm Detection | |
| 12 | VTPNLP12 | A Novel Customer Review Analysis System Based on Balanced Deep Review and Rating Differences in User Preference | |
| 13 | VTPNLP13 | How Do Crowd-Users Express Their Opinions Against Software Applications in Social Media? A Fine-Grained Classification Approach | |
| 14 | VTPNLP14 | Advanced NLP Models for Technical University Information Chatbots: Development and Comparative Analysis | |
**Image Processing**

| S.No | Project Code | Project Title | Abstract |
|---|---|---|---|
| 1 | VTPIP01 | On Enhancing Crack Semantic Segmentation Using StyleGAN DeepLabV3 & ResNet50 | |
| 2 | VTPIP02 | Underwater Image Enhancement Based on Conditional Denoising Diffusion Probabilistic Model | |
| 3 | VTPIP03 | A Universal Field-of-View Mask Segmentation Method on Retinal Images From Fundus Cameras | |
| 4 | VTPIP04 | Deep learning algorithms for Optimized Thyroid Nodule Classification | |
| 5 | VTPIP05 | Multi-Class Medical Image Classification Using Deep learning with Xception Model | |
| 6 | VTPIP06 | Advancing Malaria Identification From Microscopic Blood Smears Using Hybrid Deep Learning Frameworks | |
| 7 | VTPIP07 | Segmentation of Aerial Images using U-Net model | |
| 8 | VTPIP08 | Development of Convolutional Neural Network to Segment Ultrasound Images of Histotripsy Ablation | |
| 9 | VTPIP09 | Automated Detection of Spinal Lesions from CT Scans via Deep Transfer Learning | |
| 10 | VTPIP10 | Image Processing Techniques for Emotion Recognition | |
| 11 | VTPIP11 | A Lightweight and Multi-Branch Module in Facial Semantic Segmentation Feature Extraction | |
| 12 | VTPIP12 | Plant Leaf Disease Image Detection and Classification with Convolutional Neural Networks (CNN) and OpenCV | |
| 13 | VTPIP13 | Dual-Branch Fully Convolutional Segment Anything Model for Lesion Segmentation in Endoscopic Images | |
| 14 | VTPIP14 | SegNet Algorithm Guided Image Channel Selection for Skin Lesion Segmentation | |
| 15 | VTPIP15 | Image Encryption and Decryption Using AES in CBC Mode with Flask | |
| 16 | VTPIP16 | Efficient Single Infrared Image Super-Resolution | |
Eye blink detection using OpenCV, Python, and dlib is an advanced technique that plays a crucial role in applications like driver drowsiness detection, human-computer interaction, and fatigue monitoring. The process begins with face detection, where OpenCV's Haar cascades or deep learning-based detectors isolate the face region from an image or video stream. Subsequently, facial landmark detection is performed using dlib's pre-trained shape predictor, which identifies 68 key points on the face, including those around the eyes. From these landmarks, specific points outlining the eye regions are extracted, typically involving six points per eye. The Eye Aspect Ratio (EAR) is then calculated by measuring the vertical eye-opening relative to its horizontal width. The EAR significantly decreases when the eyes close during a blink. Continuous monitoring of the EAR allows for blink detection when the ratio falls below a set threshold for a brief period before returning to normal. This threshold and duration are adjustable based on the application and user requirements. The system is designed for real-time operation, ensuring immediate feedback on blink events through efficient processing of video frames and quick EAR calculations. By combining OpenCV for image processing, Python for programming, and dlib for facial landmark detection, this approach provides a robust and efficient solution for real-time eye blink detection, leveraging computer vision and machine learning to enhance safety and user interaction in various fields.
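The EAR calculation described above can be sketched directly from the six per-eye landmarks. A minimal illustration follows; the landmark coordinates are hypothetical (in a real pipeline they would come from dlib's 68-point predictor, where the eyes occupy indices 36–41 and 42–47), and the threshold is a common starting value rather than a fixed constant:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|) for six eye landmarks
    ordered: corner, top-left, top-right, corner, bottom-right, bottom-left."""
    eye = np.asarray(eye, dtype=float)
    vert1 = np.linalg.norm(eye[1] - eye[5])   # vertical opening, p2-p6
    vert2 = np.linalg.norm(eye[2] - eye[4])   # vertical opening, p3-p5
    horiz = np.linalg.norm(eye[0] - eye[3])   # horizontal width, p1-p4
    return (vert1 + vert2) / (2.0 * horiz)

# Hypothetical landmark coordinates for an open and a nearly closed eye.
open_eye = [(0, 0), (10, -5), (20, -5), (30, 0), (20, 5), (10, 5)]
closed_eye = [(0, 0), (10, -1), (20, -1), (30, 0), (20, 1), (10, 1)]
BLINK_THRESHOLD = 0.2   # a common choice; tune per camera and subject
```

A blink is registered when the EAR stays below the threshold for a few consecutive frames and then recovers.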
The research focuses on detecting and analyzing the neck rotation during head impacts using the YOLOv8 model, with the aim of providing preventive healthcare measures. The neck's position and orientation need to be monitored and measured to predict the direction of the neck during such impacts. The experiment involves simulating mild head impacts, replicating movements such as flexion and lateral rotation, based on American football scenarios, with data collected from ten subjects (five male and five female). The YOLOv8 model is employed to detect and track the neck's rotation in real-time from video footage, providing high accuracy in determining the direction of the neck during head impacts. By utilizing YOLOv8, this research aims to achieve a more efficient and precise system for detecting neck rotation compared to traditional methods. This approach, integrated with explainable AI, helps in offering meaningful interpretations of the results, which can support clinical systems in making informed decisions regarding head and neck injury prevention.
To enhance the privacy and security of audio signals stored in third-party storage centers, a robust digital audio encryption and forensics watermarking scheme is proposed. The scheme incorporates the AES-GCM (Advanced Encryption Standard in Galois/Counter Mode) algorithm for authenticated encryption, ensuring both confidentiality and integrity of the audio data. In addition, we utilize Fernet symmetric encryption and PBKDF2HMAC (Password-Based Key Derivation Function 2 with HMAC) for key generation, supported by generated password hashes to further strengthen security. A signal energy ratio feature of the audio is defined and used in the watermark embedding method through feature quantification, improving the resilience of the watermarking system. First, the original audio is encrypted using scrambling, multiplication, and AES-GCM to generate the encrypted data. The encrypted data is then divided into frames, each compressed through sampling. The compressed data, along with frame numbers, is embedded into the encrypted audio, forming the watermarked signal, which is uploaded to third-party storage. Authorized users retrieve the encrypted data and verify its authenticity. If intact, the data is decrypted directly using Fernet to recover the original audio. In the case of an attack, the compromised frames are identified and the embedded compressed data is used to reconstruct the audio approximately; the reconstructed signal is then decrypted so that the meaning of the original audio is preserved. Experimental results demonstrate the effectiveness of the proposed scheme in providing quantum-safe encryption, secure watermarking, and forensics capabilities.
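The key-generation step, PBKDF2HMAC deriving a Fernet key from a password, can be sketched with the `cryptography` library that provides both primitives. The password, salt handling, iteration count, and payload below are illustrative, not the scheme's actual parameters:

```python
import base64
import os

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def derive_fernet_key(password: bytes, salt: bytes) -> bytes:
    """Derive a urlsafe-base64 32-byte Fernet key from a password via PBKDF2HMAC."""
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt,
                     iterations=480_000)
    return base64.urlsafe_b64encode(kdf.derive(password))

salt = os.urandom(16)                        # store alongside the ciphertext
key = derive_fernet_key(b"user passphrase", salt)
fernet = Fernet(key)
token = fernet.encrypt(b"compressed audio frame")   # authenticated ciphertext
recovered = fernet.decrypt(token)
```

Because PBKDF2 is deterministic, the same password and salt always regenerate the same key, which is what lets an authorized user decrypt later without the key ever being stored.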
This project enables secure video processing, encryption, and watermark embedding, focusing on user authentication, video encryption, and decryption capabilities. Users can register, log in, and upload videos along with watermarks for processing. Using the cryptography library, each uploaded video is encrypted, and its encryption key is split using Shamir's Secret Sharing, ensuring secure key distribution and storage. The encrypted frames are stored separately for later retrieval and decryption. Decryption occurs through reassembling key shares, allowing the original video to be reconstructed, with the watermark extracted from the first frame. The application further provides options to download the decrypted video, view split frames, and explore contact and performance information pages. Employing OpenCV for video processing and secure file handling techniques, this system ensures data confidentiality and integrity through a user-friendly interface and robust back-end encryption mechanisms. The application uses secure upload and storage mechanisms for sensitive data, like key shares and encrypted frames, storing them in predefined folders. Key shares are stored separately, further protecting the decryption process from unauthorized access.
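Shamir's Secret Sharing, used here to split the encryption key, can be illustrated with a minimal pure-Python implementation over a prime field. The prime, threshold, and key bytes are illustrative; a production system would rely on a vetted implementation:

```python
import random

PRIME = 2**127 - 1  # Mersenne prime; the secret must be smaller than this

def split_secret(secret: int, n: int, k: int):
    """Split `secret` into n shares; any k of them reconstruct it (Shamir)."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def poly(x):
        acc = 0
        for c in reversed(coeffs):          # Horner evaluation of the polynomial
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(PRIME) recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

# Hypothetical 5-share split of a 16-byte key with a threshold of 3.
key_as_int = int.from_bytes(b"0123456789abcdef", "big")
shares = split_secret(key_as_int, n=5, k=3)
```

Fewer than k shares reveal nothing about the key, which is why storing the shares in separate locations protects the decryption process from a single point of compromise.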
The safeguarding of digitized data against unwanted access and modification has become an issue of utmost importance as a direct result of the rapid development of network technology and internet applications. In response to this challenge, numerous secret image sharing (SIS) schemes have been developed. SIS is a method for protecting sensitive digital images from unauthorized access and alteration. The secret image is fragmented into a large number of arbitrary shares, each of which is designed to prevent the disclosure of any information to the trespassers. In this paper, we present a comprehensive survey of SIS schemes along with their pros and cons. We review various existing verifiable secret image sharing (VSIS) schemes that are immune to different types of cheating. We have identified various aspects of developing secure and efficient SIS schemes. In addition to that, a comparison and contrast of several SIS methodologies based on various properties is included in this survey work. We also highlight some of the applications based on SIS. Finally, we present open challenges and future directions in the field of SIS.
This research presents an advanced convolutional neural network (CNN) model for diagnosing micro-nutrient deficiencies in banana crops through the analysis of leaf images. Proper nutrition is essential for optimal crop growth and yield, and deficiencies in vital nutrients can severely impact plant health and productivity. To address this, we have developed a specialized CNN model designed to detect and classify various nutrient deficiencies based on detailed leaf images. The study involves a comprehensive dataset of banana leaves exhibiting different deficiency symptoms, which was used to train and evaluate the model. The CNN architecture was carefully optimized to enhance feature extraction and classification capabilities, enabling precise identification of nutrient-related issues. The findings demonstrate the model’s effectiveness in distinguishing between different types of nutrient deficiencies, providing a valuable tool for precision agriculture. This approach aims to improve nutrient management practices and contribute to better crop health monitoring, highlighting the significant role of machine learning technologies in advancing agricultural research and practices.
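The convolutional feature extraction at the heart of such a CNN can be sketched in a few lines. This is a generic illustration of convolution, ReLU, and max-pooling on a toy image, not the authors' optimized architecture:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2(x):
    """2x2 max pooling (truncating odd edges)."""
    h2, w2 = x.shape[0] // 2, x.shape[1] // 2
    return x[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2).max(axis=(1, 3))

# A toy 'leaf patch' whose left half is bright: an edge-detecting kernel
# responds at the boundary, the kind of low-level feature a trained CNN learns.
patch = np.array([[1.0, 1.0, 0.0, 0.0]] * 4)
edge_kernel = np.array([[1.0, -1.0]])
feature = maxpool2(relu(conv2d(patch, edge_kernel)))
```

Stacking many such learned kernels, rather than one hand-written edge detector, is what lets the trained model separate the visual symptoms of different nutrient deficiencies.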
Medical image encryption is critical to safeguarding patient privacy and maintaining the confidentiality of sensitive medical records. Leveraging advancements in artificial intelligence, we propose an innovative medical image encryption and decryption system that integrates deep learning-based encryption with QR code technology. This system enables users to upload a medical image, which is encrypted into a QR code format and paired with a uniquely generated key. Both the QR code and key are securely stored for subsequent retrieval decryption, users can upload the QR code and the corresponding key to reconstruct the original image with high fidelity. The encryption process employs advanced neural network-based feature encoding, ensuring robustness against attacks such as noise, cropping, and brute force. Additionally, the system incorporates a reversible neural network to optimize decryption accuracy and reconstruction quality. Experimental results highlight the system's efficiency in preserving image integrity, resisting various attacks, and maintaining end-to-end security in medical image encryption. This approach not only strengthens the privacy and security of medical data but also provides a user-friendly framework for securely transmitting and storing sensitive medical images.
Cardiovascular disease (CVD) continues to be a leading cause of death globally, claiming approximately 17.9 million lives each year, as reported by the World Health Organization (WHO). This high mortality rate underscores the need for effective early detection and intervention strategies. Heart sound signals, also known as phonocardiograms (PCGs), hold essential information about cardiac health, providing a non-invasive means of assessing heart function. Recent advances in deep learning have enabled models capable of analyzing heart sounds to detect abnormal features, assisting in early diagnosis and disease prevention. However, challenges in heart sound data, including imbalanced class distributions, complex feature characteristics, and limited differentiation between sounds such as systolic and diastolic murmurs, have restricted the effectiveness of traditional deep learning models. This project presents a novel heart sound anomaly detection algorithm based on a deep neural network (DNN) model. The DNN's ability to capture both local and global features within a signal makes it particularly well suited to analyzing heart sound data. The proposed algorithm was tested on the PhysioNet/CinC 2016 public dataset, a widely used dataset for heart sound classification. Experimental results demonstrated a classification accuracy of 99%, with a specificity of 98.5% and a sensitivity of 98.9%. These metrics represent a substantial improvement over existing methods, highlighting the model's effectiveness in detecting anomalies in heart sounds and its potential as a reliable tool for early screening and diagnosis of cardiovascular disease.
Stroke is a life-threatening medical condition caused by disrupted blood flow to the brain, representing a major global health concern with significant health and economic consequences. Researchers are working to tackle this challenge by developing automated stroke prediction algorithms, which can enable timely interventions and potentially save lives. As the global population ages, the risk of stroke increases, making the need for accurate and reliable prediction systems more critical. In this study, we evaluate the performance of an advanced machine learning (ML) approach, focusing on XGBoost and a hybrid model combining XGBoost with Random Forest, by comparing it against six established classifiers. We assess the models based on their generalization ability and prediction accuracy. The results show that more complex models outperform simpler ones, with the best-performing model achieving an accuracy of 96%, while other models range from 84% to 96%. Additionally, the proposed framework integrates both global and local explainability techniques, providing a standardized method for interpreting complex models. This approach enhances the understanding of decision-making processes, which is essential for improving stroke care and treatment. Finally, we suggest expanding the model to a web-based platform for stroke detection, extending its potential impact on public health.
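A soft-voting fusion of a boosted model with a Random Forest can be sketched as follows. Scikit-learn's `GradientBoostingClassifier` stands in for XGBoost here, and the synthetic data is purely illustrative of tabular stroke-risk records:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.model_selection import train_test_split

# Synthetic tabular data standing in for patient records.
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Soft voting averages the two models' predicted class probabilities,
# a simple form of the boosted-tree / random-forest fusion described above.
fusion = VotingClassifier(
    estimators=[("boost", GradientBoostingClassifier(random_state=0)),
                ("rf", RandomForestClassifier(n_estimators=200, random_state=0))],
    voting="soft",
)
fusion.fit(X_tr, y_tr)
accuracy = fusion.score(X_te, y_te)
```

The fusion tends to smooth out the individual models' errors: the boosted model captures sharp decision boundaries while the forest contributes variance reduction.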
Foggy weather presents substantial challenges for vehicle detection systems due to reduced visibility and the obscured appearance of objects. To overcome these challenges, a novel vehicle and human detection algorithm based on an improved lightweight YOLOv10 model is introduced. The proposed algorithm leverages advanced preprocessing techniques, including data transformations, DehazeFormer-based dehazing, and dark channel methods, to improve image quality and visibility. These preprocessing steps effectively reduce the impact of haze and low contrast, enabling the model to focus on meaningful features. An enhanced attention module is incorporated into the architecture to improve feature prioritization by capturing long-range dependencies and contextual information. This ensures that the model emphasizes relevant spatial and channel features, which is crucial for detecting small or partially visible vehicles in foggy scenes. Furthermore, the feature extraction process has been optimized by integrating an advanced lightweight module that improves the balance between computational efficiency and detection performance. This research addresses critical issues in adverse weather conditions, providing a robust framework for vehicle and human detection in fog.
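The dark channel method named among the preprocessing steps can be sketched as follows; the patch size and test images are illustrative:

```python
import numpy as np

def dark_channel(img, patch=3):
    """Dark channel prior: per-pixel minimum over the color channels followed
    by a local minimum filter. Haze-free regions yield values near zero, so a
    raised dark channel is a cue for estimating and removing haze."""
    min_rgb = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode="edge")
    h, w = min_rgb.shape
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

clear = np.zeros((8, 8, 3)); clear[:, :, 0] = 1.0   # saturated red: haze-free cue
hazy = np.full((8, 8, 3), 0.8)                      # uniform grey: haze-like
```

In a full dehazing pipeline the dark channel is used to estimate atmospheric light and the transmission map, which are then inverted to recover a haze-free image before detection.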
Massive jellyfish outbreaks pose serious threats to human safety and marine ecosystems, prompting the need for effective detection methods. This work focuses on utilizing optical imagery and CNN-based deep-learning object detection models for jellyfish identification. Due to the limited availability of labelled jellyfish datasets, we developed a novel dataset using a model-assisted labelling approach, significantly reducing the reliance on manual annotation. Building on this dataset, we propose an improved YOLOv11-CoordConv model, integrating advanced mechanisms such as a Global Attention Mechanism (GAM) to enhance its detection capabilities. Experimental evaluations demonstrate that the proposed model outperforms several state-of-the-art object detection frameworks, highlighting its potential as a reliable solution for jellyfish detection in underwater environments.
In this work, we propose a novel Dual-Mode Web-Based Image Processor designed to address the challenges of image translation across different modalities. Traditional computer vision models often rely on a single sensor modality, such as RGB or thermal images, but fail to fully exploit the complementary strengths of both. Our architecture leverages a single lightweight encoder that efficiently encodes both grayscale and thermal images into compact latent vectors. This encoding enables cross-modal image translation, including grayscale image colorization and thermal image reconstruction, facilitating flexibility in handling multiple downstream tasks. Our approach reduces the computational burden by utilizing a compact encoder and optimizing for both data compression and robust image translation across varied lighting conditions. The model employs four distinct generators and two discriminators in an adversarial framework, incorporating reconstruction error terms to ensure consistency and contrast preservation. Experimental results demonstrate competitive quality in translation and reconstruction across various lighting scenarios, with comprehensive evaluations across multiple metrics. Additionally, ablation studies validate the effectiveness of the proposed loss terms, confirming their role in improving model performance.
Marine debris poses a critical threat to environmental ecosystems, necessitating effective methods for its detection and localization. This study addresses the existing limitations in the literature by proposing an innovative approach that combines the instance detection capabilities of YOLOv11 with various attention mechanisms to enhance efficiency and broaden applicability. Coordinate attention and YOLOv11 demonstrate solid performance across various scenarios, while the bottleneck transformer, though slightly less consistent overall, proves valuable in identifying debris areas overlooked during manual annotation. Additionally, the bottleneck transformer shows enhanced precision in detecting larger debris pieces, indicating its potential utility in specific applications. This study underscores the versatility and efficiency of attention-enhanced YOLOv11 models for marine debris detection, demonstrating their ability to address diverse challenges in environmental monitoring. The results also emphasize the importance of aligning detection models with specific operational requirements, as unique characteristics of each model can offer advantages in targeted scenarios.
The efficient and accurate detection of foreign objects on railway tracks is critical to ensuring the safety and smooth operation of train systems. This work addresses the limitations of existing foreign object detection methods, including low efficiency and suboptimal accuracy, by proposing an enhanced railway foreign object intrusion detection framework leveraging YOLOv8 and Overhaul Knowledge Distillation. The proposed method consists of a two-stage architecture. In the first stage, a lightweight image classification network quickly determines whether a railway image contains foreign objects. This stage minimizes reliance on computationally intensive object detection models, thereby enhancing detection speed. In the second stage, YOLOv8 is employed to precisely detect and localize foreign objects in images flagged by the classification network. The choice of YOLOv8 provides notable improvements in accuracy and inference speed over previous versions such as YOLOv3. Additionally, the Overhaul Knowledge Distillation algorithm is applied to train the lightweight classification network under the supervision of a larger, more robust network, ensuring competitive classification performance while maintaining efficiency. Experimental results demonstrate that the proposed method achieves state-of-the-art performance in both detection accuracy and speed, with significant improvements in FPS and detection robustness compared to earlier approaches.
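The teacher–student idea behind the distillation stage can be illustrated with a logit-based distillation loss. Note that the Overhaul method cited above distills intermediate feature maps rather than logits, so this is a simplified stand-in for the same training principle; the logits below are hypothetical:

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2,
    so the lightweight student learns to mimic the teacher's distribution."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T ** 2)

teacher = np.array([[4.0, 1.0, -2.0]])         # confident teacher logits
student = np.array([[0.0, 0.0, 0.0]])          # untrained, uniform student
```

Minimizing this loss alongside the ordinary classification loss is what lets the lightweight first-stage classifier approach the larger network's accuracy while staying fast.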
The detection and removal of Foreign Object Debris (FOD) on airport runways pose significant challenges due to small object sizes and the impact of complex weather conditions on visibility and equipment efficiency. To address these challenges, this paper proposes an advanced FOD detection model leveraging YOLOv8, alongside a geolocation prediction model based on machine learning regression algorithms. By integrating a self-attention mechanism with Convolutional Neural Networks (CNNs), the proposed FOD detection system delivers impressive results. Ablation studies demonstrate notable improvements in Mean Average Precision (mAP) across different model components. Comparative evaluations against existing models such as YOLOv5, YOLOX, and YOLOv7 highlight the superior performance of YOLOv8, particularly in detecting small objects and maintaining accuracy on diverse input data. Furthermore, the geolocation prediction model, built using machine learning regression techniques, shows significant potential for practical FOD detection and removal in real-world scenarios.
With the digital transformation and improvement of university library information technology, readers’ demands for library services are increasingly diversified and personalized. They are no longer satisfied with traditional borrowing services but hope that the library can provide more accurate and personalized recommendation services. To address these challenges, this study first proposes an improved item-based collaborative filtering recommendation algorithm based on the mean model representation. In addition, the system incorporates Neural Collaborative Filtering (NCF), which uses neural networks to model user-item interactions, providing more expressive and nonlinear representations compared to traditional methods, thereby enhancing recommendation quality. This algorithm is implemented using Django (4.1.2), a high-level web framework that promotes rapid development and clean design, ensuring a structured backend for the web application. The system leverages Pandas (1.5.0) for extensive data manipulation and analysis, allowing for effective handling of user data and preferences, while NumPy (1.23.3) facilitates numerical computations essential for the recommendation algorithms. For API integration, the system employs requests (2.28.1) and requests-oauthlib (1.3.1). The deployment is managed using gunicorn (20.1.0), a WSGI server that prepares the application for production environments, while virtualenv (20.16.5) and nodeenv (1.7.0) assist in managing Python and Node.js environments effectively.
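The mean-model item-based collaborative filtering step can be sketched in NumPy: each user's mean rating is subtracted before computing item–item cosine similarity, and predictions are similarity-weighted averages. The rating matrix is illustrative, with 0 marking an unrated book:

```python
import numpy as np

def item_similarity(R):
    """Cosine similarity between item columns of a user-item rating matrix,
    computed on mean-centred ratings (0 marks an unrated item)."""
    mask = R > 0
    counts = np.maximum(mask.sum(axis=1), 1)
    user_mean = (R * mask).sum(axis=1) / counts
    C = np.where(mask, R - user_mean[:, None], 0.0)   # the 'mean model' centring
    norms = np.linalg.norm(C, axis=0)
    norms[norms == 0] = 1.0
    return (C.T @ C) / np.outer(norms, norms)

def predict_rating(R, S, user, item):
    """Similarity-weighted average of the user's existing ratings."""
    rated = np.where(R[user] > 0)[0]
    w = S[item, rated]
    if np.abs(w).sum() == 0:
        return 0.0
    return float((w * R[user, rated]).sum() / np.abs(w).sum())

R = np.array([[5.0, 5.0, 1.0],
              [4.0, 4.0, 1.0],
              [1.0, 1.0, 5.0],
              [0.0, 4.0, 1.0]])   # user 3 has not rated book 0
S = item_similarity(R)
```

The NCF component mentioned above replaces this fixed cosine similarity with a neural network learned from the same interaction matrix, which is where the nonlinear gains come from.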
Wearing safety helmets can effectively reduce the risk of head injuries for construction workers during falls from height. To address the low detection accuracy of existing safety helmet detection algorithms for small targets and complex environments, this study proposes an improved detection algorithm based on YOLOv8n, named YOLOv8n-SLIM-CA. For data augmentation, the mosaic method is employed, which generates many tiny targets. In the backbone network, a coordinate attention (CA) mechanism is added to enhance the focus on safety helmet regions in complex backgrounds, suppress irrelevant feature interference, and improve detection accuracy. In the neck network, a slim-neck structure fuses features of different sizes extracted by the backbone network, reducing model complexity while maintaining accuracy. In the detection layer, a small target detection layer is added to enhance the algorithm's ability to learn crowded small targets. Experimental results indicate that these improvements enhance detection performance not only in general real-world scenarios but also in complex backgrounds and for small targets at long distances. Compared to the baseline YOLOv8n algorithm, YOLOv8n-SLIM-CA improves precision, recall, mAP50, and mAP50-95 while reducing model parameters by 6.98% and computational load by 9.76%. It is capable of real-time, accurate detection of safety helmet wear, and comparison with other mainstream object detection algorithms validates the effectiveness and superiority of this method.
Pomegranates, revered for their nutritional richness and medicinal properties, are integral to global agriculture. Accurate identification of their growth stages is a pivotal step in modern farming, enabling optimized resource management, timely interventions, and the prevention of losses caused by pests, diseases, or environmental challenges. Traditionally, manual monitoring of crop growth stages is labor-intensive, prone to errors, and inefficient on a large scale. With advancements in deep learning and computer vision, automated solutions offer the potential to revolutionize crop management systems. This study presents a novel application of YOLOv10, an advanced object detection algorithm, for the precise identification of pomegranate growth stages. Leveraging YOLOv10’s ability to perform real-time detection and capture intricate spatial features, the proposed approach focuses on classifying five critical growth stages: Bud, Early-Fruit, Flower, Mid-growth, and Ripe. The model's architecture is tailored to ensure robust detection under diverse conditions, including variations in lighting, angles, and environmental settings. Data augmentation and hyperparameter tuning techniques are integrated into the training pipeline to further improve model generalization and reliability. By automating the growth stage detection process, this approach significantly reduces the reliance on manual labor and enhances the efficiency of agricultural operations. Farmers and agricultural stakeholders can use the insights generated by this system to implement precise interventions, maximize yield, and maintain consistent quality standards. The application of YOLOv10 in this domain represents a significant step forward in the adoption of artificial intelligence for sustainable agriculture, providing an efficient, scalable, and reliable solution for crop monitoring and management. 
This work highlights the transformative potential of integrating deep learning into agricultural practices, paving the way for improved productivity and reduced resource wastage in pomegranate farming.
The objective of this project is to enhance defect detection capabilities in the metal industry, particularly for identifying small and elongated defects on steel strips, which pose significant challenges for traditional detection methods. These defects are difficult to detect due to the small pixel percentage they occupy, as well as the repeated downsampling in convolutional networks, which can lead to the loss of minute features. To address these issues, we propose an advanced real-time defect detection network based on YOLOv10, specifically designed to overcome these challenges. The proposed system utilizes an efficient channel attention bottleneck (EB) module with 1D convolution to enhance feature extraction, focusing on small and elongated defects. Furthermore, the model incorporates Context Transformation Networks with cross-stage localized blocks (CC modules), which improve the understanding of semantic contextual information and the relationships between features. This methodology helps preserve critical defect details that might otherwise be missed. In addition, the model is trained on a self-constructed dataset tailored to include small and elongated defects. This dataset refinement allows for better feature fusion and extraction, ultimately improving the model's ability to detect and classify various defect types. The YOLOv10-based system is evaluated on several defect detection datasets, including GC10-DET, NEU-DET, and the self-constructed SLD-DET dataset, demonstrating enhanced performance and robustness for defect detection, particularly in industrial applications.
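The efficient channel attention idea behind the EB module, global average pooling per channel, a 1-D convolution across neighbouring channels, and a sigmoid gate, can be sketched in NumPy. The fixed averaging kernel below stands in for the module's learned 1-D convolution weights, and the feature map is synthetic:

```python
import numpy as np

def eca_block(feature_map, k=3):
    """Efficient channel attention sketch: squeeze each channel to a scalar,
    mix neighbouring channels with a 1-D convolution, then gate the input
    with a sigmoid so informative channels are emphasized."""
    C = feature_map.shape[0]
    y = feature_map.mean(axis=(1, 2))                 # squeeze: (C, H, W) -> (C,)
    kernel = np.full(k, 1.0 / k)                      # stand-in for learned weights
    pad = k // 2
    y_pad = np.pad(y, pad, mode="edge")
    conv = np.array([np.dot(y_pad[c:c + k], kernel) for c in range(C)])
    gate = 1.0 / (1.0 + np.exp(-conv))                # sigmoid attention weights
    return feature_map * gate[:, None, None]          # excite: rescale channels

fmap = np.stack([np.full((2, 2), -2.0),
                 np.zeros((2, 2)),
                 np.full((2, 2), 4.0)])
out = eca_block(fmap)
```

Because the gating operates per channel with only a k-wide 1-D convolution, it adds almost no parameters, which is why it suits a real-time defect detector.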
Traditional methods for evaluating fresh fruit bunches (FFBs) in palm oil production are inefficient, costly, and lack scalability. This study evaluates the performance of YOLOv10 and other state-of-the-art object detection models using a novel dataset of oil palm FFBs captured in plantations of Central Kalimantan Province, Indonesia. The dataset consists of five ripeness classes: abnormal, ripe, underripe, unripe, and flower, with challenges such as partially visible objects, low contrast, occlusions, small object sizes, and blurry images. YOLOv10 was compared with other models, including YOLOv6s, YOLOv7 Tiny, YOLOv8s, and Faster R-CNN. YOLOv10 demonstrated superior performance, with a compact size of 10.5 MB, fast inference time (0.022 seconds), and high detection accuracy, achieving mAP50 at 0.82 and mAP50-95 at 0.58. The model completed training in just 1 hour, 35 minutes, with low training loss, indicating efficient convergence. Additionally, YOLOv10 achieved low Mean Absolute Error (MAE) of 0.10 and Root Mean Square Error (RMSE) of 0.32, suggesting high precision in FFB counting. Hyperparameter tuning revealed that using the SGD optimizer, a batch size of 16, and a learning rate of 0.001 achieved optimal performance, balancing both accuracy and efficiency. Data augmentation techniques significantly enhanced model performance, improving accuracy across different ripeness classes. When evaluated against state-of-the-art models, including Faster R-CNN, SSD MobileNetV2, YOLOv4, and EfficientDet-D0, YOLOv10 outperformed these models in speed, accuracy, and efficiency, making it highly suitable for real-time applications in palm oil harvesting. This study demonstrates the potential of YOLOv10 for automating the evaluation of FFBs, improving both the efficiency and sustainability of palm oil production in large-scale plantations.
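The MAE and RMSE figures quoted for FFB counting follow the standard definitions over per-image count errors; a minimal sketch (function and variable names are ours, not from the study):

```python
import math

def count_errors(predicted, actual):
    """Mean absolute error and root mean square error between per-image counts."""
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse
```

Because RMSE squares each error before averaging, it penalizes occasional large miscounts more heavily than MAE, which is why both are reported together.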
In the detection of bone fractures from X-ray images, accurate and timely diagnosis is crucial to prevent further complications and ensure proper healing. Existing models, such as YOLO-NAS (You Only Look Once - Neural Architecture Search), have shown potential in object detection but have limitations in detecting subtle bone fractures, particularly small or hairline fractures. To address these shortcomings, the proposed model utilizes YOLOv8, an advanced version of the YOLO framework, which builds on previous models by offering improved accuracy, speed, and efficiency in real-time object detection tasks. YOLOv8 enhances detection capabilities by refining the architecture and optimizing performance, making it better suited for medical image analysis. The model is trained on a comprehensive dataset of 1200 hand-bone X-ray images, classified into six distinct fracture categories. A comparison of the YOLOv8 model with YOLO-NAS highlights the improved ability of YOLOv8 to detect complex and subtle fractures, ensuring faster and more reliable diagnoses. This advancement is essential for clinical settings, where delays or misdiagnoses could lead to severe outcomes for patients.
Coffee plants are susceptible to several diseases, including Brown Eye, Leaf Rust, Leaf Miner, and Red Spider Mite, which significantly impact both yield and quality. Early detection and timely intervention are vital for minimizing crop losses and improving coffee production. This paper introduces a novel approach using the YOLOv8 (You Only Look Once version 8) deep learning model for real-time detection and classification of these diseases on coffee leaves. YOLOv8 is an advanced, efficient object detection model known for its speed and accuracy, making it suitable for on-field deployment. It processes images of coffee leaves and provides fast, accurate localization of disease symptoms, classifying them into one of the four disease categories: Brown Eye, Leaf Rust, Leaf Miner, and Red Spider Mite. The model is trained on a custom dataset consisting of a wide variety of coffee leaf images, ensuring robust performance across different environmental conditions. YOLOv8's lightweight architecture allows for deployment on mobile devices and drones, enabling immediate disease identification. Experimental results show that YOLOv8 achieves high accuracy, precision, and recall, outperforming traditional models in terms of detection speed and robustness. The use of YOLOv8 for disease detection can aid farmers in quickly identifying and treating affected plants, ultimately improving coffee quality, reducing crop losses, and minimizing the reliance on harmful pesticides. The proposed approach offers a practical, scalable solution for disease monitoring in coffee plantations.
Inspection of structural cracks is critical for maintaining the safety and longevity of bridges and other infrastructure. Traditional methods for crack detection are often manual, labor-intensive, and prone to human error. Recent advances in deep learning and semantic segmentation provide a promising alternative, but obtaining high-quality annotated data remains a significant challenge. This paper introduces an enhanced approach to crack detection using deep learning, leveraging synthetic data generation and advanced semantic segmentation techniques. We propose the use of DeepLabV3 with a ResNet50 backbone, an extension of the DeepLabV3 architecture that incorporates a robust ResNet50 feature extractor to improve segmentation quality. Our approach addresses the data-scarcity issue by generating synthetic crack images, using StyleGAN3 for realistic image synthesis. By integrating these synthetic datasets with the DeepLabV3 model, we aim to boost segmentation performance beyond the capabilities of standard models. Hyperparameter tuning is performed to optimize the DeepLabV3-with-ResNet50 configuration, achieving significant improvements in segmentation accuracy. We employ data augmentation techniques such as motion blur, zoom, and defocus to further refine model performance. The proposed method is evaluated against existing state-of-the-art techniques, demonstrating superior accuracy. The results indicate that our approach not only enhances crack detection but also offers a novel application of synthetic data generation in deep learning for semantic segmentation. This research provides new insights into leveraging advanced neural networks and synthetic data for improved structural crack analysis.
Underwater imaging is often affected by light attenuation and scattering in water, leading to degraded visual quality, such as color distortion, reduced contrast, and noise. Existing underwater image enhancement (UIE) methods, such as Contrast Limited Adaptive Histogram Equalization (CLAHE), Dark Channel Prior (DCP), and Maximum Intensity Projection (MIP), have shown some success but often lack generalization capabilities, making them unable to adapt to various underwater images captured in different aquatic environments and lighting conditions. To address these challenges, a UIE method based on the conditional denoising diffusion probabilistic model (DDPM) is proposed, called DiffWater, which leverages the advantages of DDPM and trains a stable and well-converged model capable of generating high-quality and diverse samples. While methods like CLAHE improve contrast and DCP helps recover depth information by reducing haze, they may not handle all distortion issues in underwater imaging. Therefore, DiffWater introduces a color compensation method that performs channel-wise compensation in the RGB color space, tailored to different water conditions and lighting scenarios. This compensation guides the denoising process, ensuring high-quality restoration of degraded underwater images. Additionally, methods such as Rayleigh distribution-based enhancement (RAY), Relative Global Histogram Stretching (RGHS), and the Underwater Light Attenuation Prior (ULAP) have been explored to handle noise reduction, contrast enhancement, and edge sharpening, but these methods often struggle with varying lighting conditions and water environments. In DiffWater, the integration of such principles, combined with the conditional guidance provided by the degraded underwater image with color compensation, offers a more adaptive and robust approach.
The experimental results show that DiffWater, when tested against existing methods including DCP, RGHS, and ULAP, on four real underwater image datasets, outperforms these comparison methods in terms of enhancement quality and effectiveness. DiffWater exhibits stronger generalization capabilities and robustness, addressing the complex visual distortions present in various underwater conditions more effectively than traditional algorithms like CLAHE, DCP, and MIP.
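DiffWater's channel-wise compensation is conceptually related to classical red-channel compensation schemes, in which the strongly attenuated red channel is boosted using the better-preserved green channel. The sketch below is a simplified illustration of that general idea, not DiffWater's actual formulation; the update rule and parameter `alpha` are our assumptions, and pixel values are normalized to [0, 1]:

```python
def compensate_red(red, green, alpha=1.0):
    """Boost an attenuated red channel using the green channel.

    Simplified compensation rule (illustrative, hypothetical):
        r' = r + alpha * (mean(g) - mean(r)) * (1 - r) * g
    The (1 - r) factor limits the boost for already-bright red pixels.
    """
    mean_r = sum(red) / len(red)
    mean_g = sum(green) / len(green)
    return [r + alpha * (mean_g - mean_r) * (1.0 - r) * g
            for r, g in zip(red, green)]
```

In DiffWater the compensated image then serves as the conditioning signal that guides the diffusion model's denoising steps.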
In this approach, we replace the traditional Otsu method for Field of View (FOV) segmentation in retinal images with a deep learning-based model utilizing the U-Net architecture. The preprocessing phase begins by converting the retinal image to grayscale, specifically using the red channel, which offers better contrast for the fine vascular structures. A logarithmic transformation is applied to the grayscale image to further enhance the visibility of small features such as microaneurysms and capillaries. This step prepares the image for more accurate segmentation by emphasizing details essential for the detection of diabetic retinopathy and other retinal abnormalities. The core of the segmentation process relies on U-Net, a convolutional neural network designed for medical image segmentation. U-Net consists of a contracting path that captures high-level contextual features through successive convolutional layers and downsampling operations. This is followed by an expanding path that progressively upsamples the feature maps and concatenates them with corresponding layers from the contracting path, enabling precise localization of the FOV region. The final step in the U-Net architecture involves a 1x1 convolution layer that produces the binary mask of the FOV region, followed by a sigmoid activation function to output the probability map of the segmented area.
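The logarithmic enhancement step described above can be sketched as the classic log transform s = c·log(1 + r), with c chosen so the maximum grey level maps to itself. An illustrative stdlib-only version for 8-bit pixel values (the exact constant used in the project may differ):

```python
import math

def log_transform(pixels, max_level=255):
    """Apply s = c * log(1 + r) with c = max_level / log(1 + max_level).

    Expands contrast in dark regions, which helps small vascular details
    such as microaneurysms stand out before segmentation.
    """
    c = max_level / math.log(1 + max_level)
    return [round(c * math.log(1 + r)) for r in pixels]
```

Because the transform compresses bright values and stretches dark ones, low-intensity vessel pixels gain far more dynamic range than the bright optic disc.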
The increasing prevalence of thyroid cancer underscores the critical need for efficient classification and early detection of thyroid nodules. Automated systems can significantly aid physicians by expediting diagnostic processes. However, achieving this goal remains challenging due to limited medical image datasets and the complexity of feature extraction. This study addresses these challenges by emphasizing the extraction of meaningful features essential for tumor detection. The proposed approach integrates advanced techniques for feature extraction, enhancing the capability to classify thyroid nodules in ultrasound images. The classification framework includes distinguishing between benign and malignant nodules, as well as identifying specific suspicious classifications. The combined classifiers provide a comprehensive characterization of thyroid nodules, demonstrating promising accuracy in preliminary evaluations. These results mark a significant advancement in thyroid nodule classification methodologies. This research represents an innovative approach that could potentially offer valuable support in clinical settings, facilitating more rapid and accurate diagnosis of thyroid cancer.
In the domain of medical image classification, the Xception model stands out for its advanced performance in analyzing intricate image data. This research applies the Xception model to classify chest computed tomography (CT) images into four distinct categories: adenocarcinoma, large cell carcinoma, normal, and squamous cell carcinoma. Xception, renowned for its use of depthwise separable convolutions, enhances feature extraction by effectively capturing complex patterns with reduced computational cost. The study encompasses a rigorous evaluation of the Xception model through extensive training, validation, and testing phases on a specialized multi-class chest CT image dataset. The dataset includes a balanced representation of the four classes to ensure robust model performance across varying conditions. Key aspects of the evaluation include the model's accuracy in distinguishing between different types of carcinoma and normal tissues, as well as its efficiency in handling computational demands. The results demonstrate that the Xception model provides superior classification accuracy and reliable diagnostic performance. By leveraging its advanced architecture, the approach significantly improves the precision of medical image classification, offering valuable insights for enhanced diagnostic support in clinical settings. This work underscores the effectiveness of the Xception model in advancing medical imaging analysis and its potential impact on improving patient care through more accurate disease classification.
Malaria, a life-threatening disease transmitted by mosquitoes, remains a major public health challenge, claiming thousands of lives each year. Limited access to reliable detection tools, combined with challenges such as insufficient laboratory resources and inexperienced personnel, contribute to its high mortality rate. Recently, advancements in image analysis of malaria-infected red blood cells (RBCs) have provided promising alternatives for more accessible detection methods. By leveraging digital microscopy and innovative machine learning approaches, researchers aim to develop practical solutions that can improve diagnostic accuracy and accessibility. This approach not only enables a faster response in clinical settings but also highlights the potential for integration with IoT-enabled devices, facilitating wider deployment in resource-constrained regions. Such advancements underscore the potential of image-based malaria detection methods to enhance early diagnosis and treatment, especially in areas with limited medical resources.
To address the challenges of few-shot aerial image semantic segmentation, where unseen-category objects in query aerial images need to be parsed with only a few annotated support images, we propose a novel approach by integrating a U-Net architecture with EfficientNet. Typically, in few-shot segmentation, category prototypes are extracted from support samples to segment query images in a pixel-wise matching process. However, the arbitrary orientations and distribution of aerial objects in such images often result in significant feature variations, making conventional methods, which do not account for orientation changes, ineffective. The rotation sensitivity of aerial images causes substantial feature distortions, leading to low confidence scores and misclassification of same-category objects with different orientations. To overcome these limitations, we propose an enhanced solution by combining U-Net for robust semantic segmentation with EfficientNet for efficient feature extraction and scale adaptability. This architecture, which we refer to as Efficient U-Net, introduces rotation-invariant feature extraction to handle the varying orientations of aerial objects. By leveraging EfficientNet's scalable convolutional layers for feature extraction, we ensure that the network can capture orientation-varying yet category-consistent information from support images. This approach enhances segmentation accuracy by aligning same-category objects, irrespective of their orientation, thereby minimizing the oscillation of confidence scores and improving the detection of rotated semantic objects. This Efficient U-Net model provides a scalable, rotation-invariant solution to the few-shot segmentation of aerial images.
Histotripsy, a focused ultrasound therapy, effectively ablates tissue by leveraging bubble clouds and has potential for treating conditions such as renal tumors. To enhance monitoring and evaluation, this study combines classification and segmentation techniques using deep learning models. A convolutional neural network (CNN) was employed for classification, distinguishing treated and untreated tissue regions, while the U-Net model was utilized for precise segmentation of ablation zones in ultrasound images. The U-Net architecture was fine-tuned using transfer learning to optimize segmentation accuracy. Ultrasound images of ablated red blood cell phantoms and ex vivo kidney tissues were used for training and testing, with digital photographs serving as ground truth. The performance of these models was compared to manual annotations by expert radiologists. The CNN achieved high accuracy in classifying tissue states, and the U-Net demonstrated robust segmentation, closely matching expert manual annotations. Segmentation performance improved with increased treatment exposure, achieving a Dice similarity coefficient exceeding 85% for 750+ applied pulses. Application of the U-Net to ex vivo kidney tissue revealed morphological shifts consistent with histology findings, confirming targeted tissue ablation. The integration of CNN-based classification and U-Net segmentation demonstrated significant potential for automating and enhancing the monitoring of histotripsy outcomes. This combined approach offers a reliable and efficient means of visualizing treatment progress, supporting real-time decision-making during therapeutic procedures. The study highlights the capability of deep learning models to automate and improve treatment monitoring in histotripsy, paving the way for real-time, data-driven interventions.
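The Dice similarity coefficient used above to score segmentation against expert annotations compares predicted and ground-truth binary masks; a minimal sketch over flattened masks (a hypothetical helper, not the study's code):

```python
def dice_coefficient(pred, truth):
    """Dice = 2|P ∩ T| / (|P| + |T|) for flattened binary (0/1) masks."""
    intersection = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks are conventionally treated as a perfect match.
    return 2.0 * intersection / total if total > 0 else 1.0
```

A Dice score above 0.85, as reported for 750+ applied pulses, means the predicted ablation zone and the ground-truth zone overlap in the large majority of their pixels.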
Automated detection of spinal lesions from Computed Tomography (CT) scans can significantly enhance diagnostic efficiency and accuracy. This project implements a Computer-Aided Detection (CADe) system for spinal lesion analysis using the Xception model. The system is equipped with an intuitive web-based interface developed using the Flask framework, allowing physicians to seamlessly integrate it into their diagnostic workflow. The CADe system processes input CT scans by extracting vertebral regions of interest, converting them into 2D slices, and performing preprocessing to ensure optimal input quality for the Xception model. The model, fine-tuned with transfer learning, classifies vertebrae as either healthy or as showing pathological conditions such as metastases, primary tumors, or sclerotic lesions. The training and testing datasets were created from CT scans in patient records. Data augmentation techniques were applied to expand the dataset and improve model generalization. The Xception model achieved high accuracy and a recall of 92.99%, demonstrating its effectiveness in spinal lesion detection. This system aims to provide a reliable and efficient tool to assist medical professionals in spinal lesion diagnosis, enhancing clinical decision-making and patient outcomes.
Emotion recognition through facial expressions is a critical area of computer vision, enabling systems to understand human emotions for applications such as human-computer interaction, healthcare, and security. Recent advancements have led to the development of various deep learning models for emotion recognition. While transformers have shown promise, they often come with high computational costs, particularly in handling space-time attention mechanisms. To address this, we propose a novel approach for emotion recognition using a Convolutional Neural Network (CNN), which effectively extracts spatial features from facial images while being computationally efficient. Our CNN-based model is designed to focus on learning discriminative facial features that are crucial for recognizing a wide range of emotions. The model leverages a frame-wise deep learning architecture, allowing it to process each frame independently while capturing important facial patterns. We evaluate the performance of the proposed CNN-based model on the FER2013+ (Facial Emotion Recognition) benchmark dataset, with geometric transformations used for data augmentation to address class imbalances. The results demonstrate that our CNN-based approach achieves competitive performance, either outperforming or matching the accuracy of existing techniques in emotion recognition. Furthermore, an ablation study on the challenging FER2013+ dataset highlights the potential and effectiveness of the proposed model for handling complex emotion recognition tasks in real-world applications.
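The geometric augmentations mentioned above, used to rebalance under-represented emotion classes, can be as simple as flips and rotations of the face image. A stdlib-only sketch on an image represented as a list of pixel rows (illustrative only, not the training pipeline):

```python
def horizontal_flip(image):
    """Mirror an image (list of pixel rows) left-to-right."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment(image):
    """Yield simple geometric variants of one training sample."""
    yield image
    yield horizontal_flip(image)
    yield rotate_90(image)
```

Each original sample of a rare class thereby contributes several label-preserving variants, partially offsetting the class imbalance.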
Face Segmentation is a critical research area in computer vision, with facial feature extraction playing a key role in improving accuracy. This paper focuses on the application of semantic segmentation methods for facial feature extraction. The structure and parameter count of the model significantly impact the performance of these tasks. To enhance accuracy and efficiency, we propose the use of the U-Net architecture for semantic segmentation in face recognition. The U-Net model is employed due to its ability to effectively capture spatial information through an encoder-decoder structure, which is crucial for precise segmentation of facial features. In our approach, we incorporate multi-scale feature extraction to balance between accuracy and the number of parameters, using large convolutional kernels for an expanded receptive field. Additionally, we use a channel attention mechanism to optimize feature aggregation from different depths, and depthwise separable convolution to reduce the computational burden. Our experimental results demonstrate that the proposed model, with fewer parameters, achieves high accuracy in semantic segmentation tasks for facial feature extraction.
This paper addresses the challenge of accurately classifying plant leaf diseases by proposing a novel deep learning approach based on Convolutional Neural Networks (CNN). Traditional CNN models often struggle to effectively capture the spatial and posture relationships of plant disease lesions, leading to issues with recognition accuracy and robustness. To overcome this limitation, we introduce an optimized CNN architecture designed specifically for plant leaf disease image classification. The proposed system enhances feature extraction by incorporating advanced convolutional layers that better capture the fine-grained details of leaf lesions. Additionally, a channel attention mechanism is integrated into the network to improve its focus on the most critical features associated with disease detection. To further improve performance, the architecture is designed to handle image transformations such as rotations, scaling, and flipping, ensuring the model's robustness across diverse real-world conditions. In addition to classification, the proposed approach also incorporates disease lesion detection using OpenCV. By utilizing OpenCV for image processing, such as drawing bounding boxes around detected lesions, the model not only classifies the diseases but also accurately locates them within the plant leaf images. This step enhances the interpretability of the model and provides more detailed information about the affected regions, which can be useful for precision agriculture applications. The model is trained and tested on multiple plant disease datasets, demonstrating significant improvements in classification accuracy, robustness, and generalization compared to traditional CNN models. The proposed method provides a reliable and efficient solution for automatic plant disease diagnosis, offering significant potential for agricultural applications and crop management.
Lesion area segmentation in endoscopic images plays a vital role in the early detection and diagnosis of diseases, aiding doctors in locating and identifying abnormal areas, which is crucial for improving patient outcomes. The U-Net model, known for its encoder-decoder architecture with skip connections, has achieved great success in medical image segmentation by capturing fine details and preserving spatial information. However, U-Net may face challenges in capturing global context information, which is essential for identifying larger and more complex lesions. In this paper, we propose an enhanced version of the U-Net model specifically for endoscopic image segmentation. The model is designed to improve the accuracy and precision of lesion boundary detection, ensuring better localization of abnormal areas in the images. To further enhance its performance, we integrate the OpenCV method, utilizing image preprocessing techniques such as noise reduction, contrast enhancement, and image normalization, which improve the model's robustness and efficiency in handling real-world medical data. The proposed method demonstrates promising segmentation performance, offering significant potential for improving clinical analysis, diagnosis, and decision-making in medical applications.
Skin cancer remains the most prevalent form of cancer worldwide, and early detection significantly enhances the effectiveness of treatment. While deep learning techniques have substantially improved lesion segmentation, challenges such as variability in lesion sizes, shapes, colors, and differences in contrast levels persist. This paper introduces a robust approach leveraging the SegNet architecture for precise skin lesion segmentation. SegNet, with its encoder-decoder structure, effectively preserves spatial information while capturing intricate lesion details, making it highly suitable for handling diverse lesion characteristics. To further enhance performance, OpenCV is employed for preprocessing tasks, including resizing, noise reduction, contrast enhancement, and augmentation techniques. These steps improve the model's ability to handle real-world variability in skin lesion imaging. The proposed framework incorporates features from multiple data sources, including distinct color bands, grayscale images immune to illumination variations, and shading-reduced images, in combination with standard RGB channels. This fusion of features enables the model to address challenges related to shading and lighting inconsistencies. The results demonstrate the effectiveness of the SegNet-based approach in accurately delineating lesion boundaries, even in challenging cases with irregular shapes and varying contrast. This methodology highlights the potential of combining the SegNet architecture with advanced preprocessing techniques to improve skin lesion segmentation.
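One common way to obtain the shading-reduced images mentioned above is normalized chromaticity, where each channel is divided by the pixel's total intensity so a multiplicative shading factor cancels out. This sketch is an illustrative assumption about the kind of transform meant, not the paper's exact formulation:

```python
def shading_reduced(pixels):
    """Normalized chromaticity: divide each RGB channel by R + G + B.

    A multiplicative shading factor scales all three channels equally,
    so it cancels in the ratio, making the output shading-invariant.
    """
    out = []
    for r, g, b in pixels:
        s = r + g + b
        out.append((r / s, g / s, b / s) if s else (0.0, 0.0, 0.0))
    return out
```

Note that a pixel and a uniformly dimmed copy of it map to the same chromaticity, which is exactly the invariance the fused feature set exploits.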
In today’s digital age, the rapid and widespread use of images across various platforms raises significant concerns about the security and confidentiality of visual data. Images often contain sensitive information, and their vulnerability to unauthorized access or tampering makes it imperative to adopt robust encryption methods. This study presents an innovative approach to securing digital images through the implementation of the Advanced Encryption Standard (AES) algorithm in Cipher Block Chaining (CBC) mode. The AES encryption algorithm is widely recognized for its efficiency and robustness in securing sensitive data, and by utilizing CBC mode, the security of the image data is further enhanced through block chaining, ensuring that each block of the encrypted data is dependent on the previous one. This provides better protection against common cryptographic attacks, making it more difficult for an unauthorized entity to access or alter the image. The proposed system is built using Flask, a lightweight web framework, offering users a seamless and user-friendly interface for image encryption and decryption. Through this system, users can easily upload an image, encrypt it with a secret key, and securely store or transmit the encrypted image. The decryption process is equally straightforward, allowing users to retrieve the original image using the correct key. Additionally, user authentication is integrated into the system with a registration and login mechanism, ensuring that only authorized individuals can access the encryption and decryption functionalities. The system is designed to handle a variety of image formats, providing flexibility and adaptability in real-world applications. This approach to image encryption combines ease of use with high levels of security, making it an ideal solution for anyone looking to protect their images from unauthorized access.
By applying AES in CBC mode, the proposed method effectively addresses the growing need for secure image handling in a world where the security of digital data is of paramount importance. Furthermore, the solution’s adaptability and simplicity make it accessible to both technical and non-technical users, promoting widespread use in different industries where image security is crucial.
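The Python standard library has no AES implementation (a real deployment would use a library such as `cryptography`), but the chaining property that CBC mode provides can be illustrated with a toy XOR "cipher": each plaintext block is XORed with the previous ciphertext block before encryption, so identical plaintext blocks produce different ciphertext blocks. This is a pedagogical sketch only; XOR-with-key is not a secure cipher.

```python
def xor_block(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(plaintext, key, iv, block=16):
    """Toy CBC mode with XOR standing in for AES (illustration only)."""
    prev, out = iv, b""
    for i in range(0, len(plaintext), block):
        chained = xor_block(plaintext[i:i + block], prev)  # chain with previous ciphertext
        prev = xor_block(chained, key)                     # 'encrypt' the chained block
        out += prev
    return out

def cbc_decrypt(ciphertext, key, iv, block=16):
    prev, out = iv, b""
    for i in range(0, len(ciphertext), block):
        cur = ciphertext[i:i + block]
        out += xor_block(xor_block(cur, key), prev)        # undo cipher, then unchain
        prev = cur
    return out
```

In real AES-CBC the IV must be random per message and the key kept secret; the sketch only demonstrates why repeated image regions, which would leak patterns under ECB, are hidden under CBC.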
Single Infrared Image Super-Resolution (SISR) aims to enhance the spatial resolution of low-quality infrared images. This task is particularly challenging due to the inherent noise and limited information content in infrared images. To address these limitations, we propose a novel approach that leverages advanced deep learning techniques to effectively restore high-resolution details. Our method effectively captures and exploits the underlying structure of infrared images. By employing advanced feature extraction and reconstruction techniques, we are able to generate significantly improved image quality. Extensive experiments on various benchmark datasets demonstrate the superior performance of our proposed method in terms of both quantitative and qualitative metrics. In addition, an edge-point classification method is presented to enhance a preliminary edge map, using the radius of the shortest distance between a whale search agent and the current global optimum at each iteration of the Whale Optimization Algorithm. The experimental results show that the proposed edge detection method has the advantages of strong denoising, fast speed, and good quality.
This work presents a hybrid approach to gunshot sound detection by integrating Mel-Frequency Cepstral Coefficients (MFCC), Support Vector Machines (SVM), and YAMNet, a pre-trained deep learning model. The process begins with the extraction of MFCC features from audio data, which capture the essential characteristics of the sound spectrum. These features are then used to train an SVM model to classify sounds as gunshots or non-gunshots. To enhance detection accuracy, YAMNet is employed to classify the audio into a broader range of categories, providing an additional layer of validation and complementing the SVM's predictions. The combination of the SVM's precision with YAMNet's extensive sound classification capabilities results in a robust system capable of accurately identifying gunshot sounds in real-time audio streams. This hybrid approach leverages both traditional machine learning and state-of-the-art deep learning techniques, offering a reliable solution for gunshot detection in various applications.
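The validation layer that combines the two models can be as simple as a decision rule: accept the SVM's gunshot call only when YAMNet's class scores agree. The sketch below is hypothetical; the class set and threshold are our assumptions, not the system's actual fusion logic (though "Gunshot, gunfire" is a genuine AudioSet class name):

```python
# Hypothetical set of YAMNet/AudioSet classes treated as gun-related.
GUN_RELATED = {"Gunshot, gunfire", "Machine gun", "Cap gun", "Artillery fire"}

def fuse_decision(svm_is_gunshot, yamnet_scores, threshold=0.3):
    """Confirm the SVM's binary call with YAMNet's multi-class scores.

    yamnet_scores: dict mapping class name -> confidence score.
    """
    if not svm_is_gunshot:
        return False
    # Sum the confidence mass YAMNet assigns to gun-related classes.
    support = sum(s for c, s in yamnet_scores.items() if c in GUN_RELATED)
    return support >= threshold
```

Requiring agreement from both models trades a little recall for a substantial reduction in false alarms, which matters for a real-time alerting system.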
This study presents an advanced approach to detecting lung auscultation sounds using Mel-frequency Cepstral Coefficients (MFCC), Chroma features, and neural networks. Lung auscultation, a key diagnostic tool in identifying respiratory conditions, often relies on the expertise of medical professionals to interpret subtle sound patterns. However, automated systems that accurately classify these sounds can greatly assist in early diagnosis and treatment. To achieve this, we employed MFCC, which captures the power spectrum of sounds and effectively models the way humans perceive auditory signals, focusing on the critical frequency ranges for lung sounds. Additionally, Chroma features, which represent the tonal content of audio signals, were used to capture harmonic aspects that could be indicative of specific lung conditions. These features were then fed into a neural network designed to classify lung sounds into various diagnostic categories, such as normal breathing, wheezing, crackles, and other abnormal respiratory sounds. The neural network, trained on a comprehensive dataset of lung sounds, was able to learn complex patterns and correlations within the MFCC and Chroma features, leading to high accuracy in sound classification. This automated approach offers a powerful tool for enhancing the precision of lung sound diagnosis, potentially leading to earlier detection of respiratory conditions and improved patient outcomes.
Information and Communication Technologies have propelled social networking and communication, but cyberbullying poses significant challenges. Existing user-dependent mechanisms for reporting and blocking cyberbullying are manual and inefficient. Conventional Machine Learning and Transfer Learning approaches were explored for automatic cyberbullying detection. The study utilized a comprehensive dataset and a structured annotation process. Textual, sentiment and emotional, static and contextual word embedding, psycholinguistic, term-list, and toxicity features were employed in the Conventional Machine Learning approach; this research introduced the use of toxicity features for cyberbullying detection. Contextual word embeddings fed to a word-level Convolutional Neural Network (Word CNN) demonstrated comparable performance and were chosen for their higher F-measure. Textual features, embeddings, and toxicity features set new benchmarks when fed individually, and the model achieved a boosted F-measure by combining textual, sentiment, embedding, psycholinguistic, and toxicity features in a Logistic Regression model. Logistic Regression outperformed Linear SVC in training time and in handling high-dimensional features. The Transfer Learning approach fine-tuned the Word CNN, achieving faster training than the base models. Additionally, cyberbullying detection was deployed through a Flask web application. The reference to the specific dataset name was omitted for privacy.
Postpartum depression (PPD) is a widespread mental health disorder impacting new mothers worldwide, arising from a complex interplay of emotional, social, and physiological changes following childbirth. Early detection is crucial, as timely intervention can significantly improve maternal and child well-being. In this study, we propose a hyperparameter-optimized XGBoost classifier aimed at accurately predicting PPD risk based on responses to a standardized questionnaire. Our research uses a dataset of 1,503 participants collected from a medical institution through a digital survey platform (Google Forms), capturing key demographic, social, and health-related factors. Our approach applies extensive hyperparameter tuning to optimize the XGBoost classifier's performance, which we then benchmarked against ten alternative machine learning models to determine its efficacy. The XGBoost classifier, when optimized, demonstrated a substantial accuracy increase, making it a strong predictive tool for clinical applications. To validate its robustness, we employed k-fold cross-validation, which confirmed the model's reliability and consistency. This study underscores the importance of specific risk factors in PPD onset, positioning our optimized XGBoost model as an efficient predictive solution in maternal healthcare for early PPD risk assessment and prevention planning.
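The k-fold cross-validation used for validation partitions the data as sketched below; this is a generic illustration (the seed and fold count are arbitrary), not the study's exact split:

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation:
    shuffle once, split into k folds, hold each fold out in turn."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# Five folds over the 1,503 participants mentioned in the abstract.
splits = list(kfold_indices(1503, 5))
```

Each sample appears in exactly one validation fold, so every participant contributes to both training and evaluation across the k rounds.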
Most companies nowadays are using digital platforms for the recruitment of new employees to make the hiring process easier. The rapid increase in the use of online platforms for job posting has resulted in fraudulent advertising. Scammers exploit these platforms to make money through fraudulent job postings, making online recruitment fraud a critical issue in cybercrime. Therefore, detecting fake job postings is essential to mitigate online job scams. Traditional machine learning and deep learning algorithms have been widely used in recent studies to detect fraudulent job postings. This research focuses on employing Long Short-Term Memory (LSTM) networks to address this issue effectively. A novel dataset of fake job postings is proposed, created by combining job postings from three different sources. Existing benchmark datasets are outdated and limited in scope, restricting the effectiveness of existing models. To overcome this limitation, the proposed dataset includes the latest job postings. Exploratory Data Analysis (EDA) highlights the class imbalance problem in detecting fake jobs, which can cause the model to underperform on minority classes. To address this, the study implements ten top-performing Synthetic Minority Oversampling Technique (SMOTE) variants. The performances of the models, balanced by each SMOTE variant, are analyzed and compared. Among the approaches implemented, the LSTM model achieved a remarkable accuracy of 97%, demonstrating its superior performance in detecting fake job postings.
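The SMOTE family of techniques referenced above shares one core idea, interpolating between minority-class neighbours; a minimal sketch of that base step (not any specific variant used in the study) is:

```python
import numpy as np

def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority samples by interpolating each picked
    sample toward a random one of its k nearest neighbours (basic SMOTE)."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_synthetic):
        i = rng.integers(len(minority))
        x = minority[i]
        dists = np.linalg.norm(minority - x, axis=1)
        neighbours = np.argsort(dists)[1:k + 1]    # exclude the point itself
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()                         # interpolation factor
        out.append(x + gap * (nb - x))
    return np.array(out)

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(minority, 10)
```

The SMOTE variants compared in the study differ mainly in how they select which minority samples and neighbours to interpolate between.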
Addressing the intricate challenge of fake news detection, traditionally reliant on the expertise of professional fact-checkers due to the inherent uncertainty in fact-checking processes, this research leverages advancements in language models to propose a novel Long Short-Term Memory (LSTM)-based network. The proposed model is specifically tailored to navigate the uncertainty inherent in the fake news detection task, utilizing LSTM's capability to capture long-range dependencies in textual data. The evaluation is conducted on the well-established LIAR dataset, a prominent benchmark for fake news detection research, yielding an impressive accuracy of 99%. Moreover, recognizing the limitations of the LIAR dataset, we introduce LIAR2 as a new benchmark, incorporating valuable insights from the academic community. Our study presents detailed comparisons and ablation experiments on both LIAR and LIAR2 datasets, establishing our results as the baseline for LIAR2. The proposed approach aims to enhance our understanding of dataset characteristics, contributing to refining and improving fake news detection methodologies by effectively leveraging the strengths of LSTM architecture.
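Before an LSTM can consume statements like those in LIAR, the text is typically integer-encoded and padded to a fixed length; a minimal stdlib sketch of that preprocessing step (the example sentences are hypothetical, not LIAR entries) is:

```python
def encode_and_pad(texts, max_len=8):
    """Map tokens to integer ids (0 reserved for padding) and pad or
    truncate each sequence to max_len, the usual step before an LSTM
    embedding layer."""
    vocab = {}
    sequences = []
    for text in texts:
        seq = []
        for tok in text.lower().split():
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1    # ids start at 1; 0 = pad
            seq.append(vocab[tok])
        seq = seq[:max_len] + [0] * max(0, max_len - len(seq))
        sequences.append(seq)
    return sequences, vocab

seqs, vocab = encode_and_pad(["the claim is false",
                              "the claim is mostly true"])
```

The fixed-length id sequences are what allow the LSTM to batch statements of varying length while still tracking long-range dependencies.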
In today’s world, personal safety in environments such as remote or isolated areas, where individuals may be working alone, has become a critical concern. Threats such as robbery, assault, and other criminal activities are often accompanied by specific sounds, which can serve as early indicators of potential danger. While traditional security systems are available, they often fail to detect or classify these sounds with the necessary accuracy or in real-time. This project aims to address this challenge by developing a system that classifies different types of surrounding audio events, allowing for a deeper understanding of the environment in real-time. The focus of this project is on accurately detecting and classifying various audio signals, which may include common environmental sounds such as footsteps, vehicle noise, or background chatter. By applying a deep learning model, specifically a 1D Convolutional Neural Network (CNN), the system processes audio data from real-world environments to classify these sounds into distinct categories. The 1D-CNN model is well-suited for this task, as it can effectively capture time-dependent features from the audio signals. The model is trained using a dataset of labeled audio events, where each audio clip is associated with a specific sound category. The deep learning model analyzes these signals, extracting key features that help distinguish between different audio events. This approach offers a powerful tool for understanding environmental audio in various settings, such as urban areas, workplaces, or isolated locations. By focusing on real-time sound classification, this project contributes to improving situational awareness and providing a foundation for further advancements in sound-based monitoring and analysis systems.
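A 1D-CNN expects fixed-size windows of the raw waveform; the sketch below shows the standard framing step, with illustrative frame and hop sizes (25 ms and 10 ms at 16 kHz) rather than the project's actual settings:

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames, shaped
    (n_frames, frame_len), ready for a 1D-CNN input layer."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

audio = np.random.default_rng(0).standard_normal(16000)   # 1 s at 16 kHz
frames = frame_signal(audio)
```

Overlapping frames let the convolutional filters see each transient (a footstep, a door slam) in several contexts, which helps with the time-dependent features the abstract mentions.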
Detecting suicidal ideation through social media content is a critical initiative to support mental health intervention strategies. This study presents an explainable framework that leverages advanced Natural Language Processing (NLP) techniques to address the challenges of identifying suicidal intent in user-generated content. A significant innovation in this work is the creation of synthetic datasets informed by psychological and social factors associated with suicidal ideation, designed to supplement limited real-world data while maintaining ethical considerations. The proposed system classifies social media content into two categories: Non-Suicidal or Suicidal. The hybrid approach of combining synthetic and real-world data enhances model performance, achieving superior accuracy and robustness compared to traditional methods. The framework emphasizes explainability by incorporating techniques that identify key linguistic and contextual features driving model predictions, ensuring interpretability for mental health professionals and researchers. This approach underscores the potential of integrating synthetic data and NLP in addressing real-world challenges such as data scarcity, diversity, and ethical concerns. By providing actionable insights and ensuring transparency, the proposed framework contributes to building reliable and scalable solutions for suicide prevention in digital environments.
Climate change remains one of the most pressing global challenges, and understanding public sentiment surrounding this issue is critical for shaping effective policy and response strategies. Social media platforms, particularly Twitter, have become key venues for individuals to express their opinions, concerns, and reactions to climate change-related topics. To capture and analyze this sentiment, this study employs Natural Language Processing (NLP) techniques combined with Long Short-Term Memory (LSTM) networks, a type of recurrent neural network, to analyze tweets related to climate change. The LSTM model, renowned for its ability to capture long-range dependencies in text data, is utilized to classify sentiment and extract meaningful insights from the discourse. By applying advanced NLP techniques and deep learning, this study aims to provide a comprehensive understanding of public sentiment on climate change, enabling stakeholders, including policymakers and environmental organizations, to better grasp public perceptions and inform strategies to tackle the climate crisis.
Depression is a prevalent mental health condition that significantly impacts individuals’ lives, and its timely detection is crucial for effective intervention. Traditional machine learning approaches often struggle due to the limitations in annotated data and the lack of transparency in model predictions. This study aims to address these challenges by employing advanced natural language processing (NLP) techniques and deep learning algorithms, specifically Long Short-Term Memory (LSTM) networks, to develop an explainable model for depression detection in social media content. The primary objective is to classify social media text into two categories: depression and control, based on linguistic patterns indicative of depressive symptoms. The model leverages LSTM to capture the sequential dependencies in text, making it capable of identifying nuanced patterns that distinguish between depression-related and non-depression-related content. Additionally, the study incorporates interpretability methods such as attention mechanisms to provide insights into the features influencing the model's predictions, thus ensuring transparency and trust in the decision-making process. The proposed model is evaluated using a publicly available Mental Health dataset, which contains labeled social media posts. The results demonstrate the effectiveness of LSTM in classifying text into depression and control categories, contributing to the field of mental health by offering a scalable and interpretable approach for early depression detection. This research has the potential to assist mental health professionals by enabling the automated identification of depression in social media content, facilitating timely intervention and improving overall well-being.
Sarcasm detection presents unique challenges due to the complex linguistic and contextual nature of sarcastic expressions. Understanding sarcasm in text requires advanced methods capable of capturing nuanced patterns that are not easily detectable by traditional approaches. In this study, we propose the use of Convolutional Neural Networks (CNN) for sarcasm detection. The CNN model is designed to identify and classify sarcastic content by learning intricate patterns from the text. Our approach demonstrates that CNNs, with their ability to effectively capture spatial hierarchies and context in textual data, offer superior performance in detecting sarcasm compared to simpler methods. The study also emphasizes the importance of sophisticated data augmentation techniques to address issues like data imbalance, further enhancing the model’s effectiveness. This work contributes to advancing sarcasm detection, providing valuable insights for applications in sentiment analysis and natural language understanding.
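The core text-CNN operation the abstract relies on, sliding an n-gram filter over word embeddings and max-pooling the result, can be sketched in a few lines; the dimensions and the all-ones inputs below are purely illustrative:

```python
import numpy as np

def text_conv_maxpool(embeddings, filt):
    """Slide a (width, dim) filter across a (seq_len, dim) embedding
    matrix (valid convolution) and apply global max pooling, as in a
    text CNN feature channel."""
    width = filt.shape[0]
    feature_map = np.array([np.sum(embeddings[i:i + width] * filt)
                            for i in range(embeddings.shape[0] - width + 1)])
    return feature_map.max()

# A 5-token sentence with 2-dim embeddings and one bigram filter of ones:
# every window sums to 4, so the pooled feature is 4.0.
feature = text_conv_maxpool(np.ones((5, 2)), np.ones((2, 2)))
```

A real model stacks many such filters of different widths, so each pooled feature signals whether some learned n-gram pattern (e.g. a sarcastic turn of phrase) occurs anywhere in the sentence.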
The rapid growth of mobile applications and online e-commerce platforms has made it increasingly easy to gather large amounts of data, providing valuable insights into consumer behavior. Analyzing user reviews has become essential in assisting users with purchasing decisions. In the proposed system, we introduce a solution by combining NLP (Natural Language Processing) techniques with a CNN (Convolutional Neural Network) model for review classification. The model incorporates text preprocessing, tokenization, and word embedding techniques to better understand the nuances of review content. The CNN-based architecture enhances the ability to detect meaningful patterns and relationships in the data, significantly improving prediction accuracy and computational efficiency. This approach overcomes the limitations of previous methods by providing a more accurate and scalable model for review analysis. It can be easily adapted to handle large-scale datasets and diverse textual data. Through experimental evaluation, the proposed system demonstrates superior performance, showing better classification results compared to existing approaches. By focusing on key patterns and relationships within the text data, the system offers an efficient and effective solution for predicting helpful reviews and enhancing decision-making confidence in e-commerce platforms.
In our proposed approach, BERT is used to capture the contextual meaning of the user feedback, significantly improving the model's ability to understand subtle details and intricacies in the textual data. The output of BERT is then passed through an LSTM layer, which is capable of capturing sequential dependencies in the data, making it ideal for analyzing user feedback over time and identifying patterns in emotional expressions. This model aims to classify the emotional tone of user feedback into five categories: Angry, Sad, Fear, Surprise, and Happy. By processing the crowd-user feedback from low-ranked software applications, we can identify prevalent issues while also classifying the emotional reactions to those issues. This allows software developers to prioritize bug fixes based on both the frequency of the issues and the intensity of user emotions. Our results demonstrate an accuracy of 92%, outperforming traditional machine learning algorithms such as Multinomial Naive Bayes (MNB), Logistic Regression (LR), Random Forest (RF), and the Multilayer Perceptron (MLP). The improved accuracy is attributed to the combination of BERT's deep contextual understanding and LSTM's ability to model sequential dependencies. Moreover, the approach provides a powerful tool for software vendors to not only identify critical issues in their applications but also gain insights into the emotional impact of these issues on users. This enables software vendors to take more informed actions in improving their products, enhancing user satisfaction, and prioritizing fixes in a timely manner.
To achieve quality education, a key goal of sustainable development, it is essential to provide stakeholders with accurate and relevant information about educational institutions. Prospective students often face challenges in obtaining consistent and reliable information about universities or institutes, especially regarding unique courses and opportunities. These inconsistencies, stemming from sources such as websites, rankings, and brochures, can lead to confusion and influence decision-making. A robust solution to address this challenge is the implementation of a chatbot application on the university's official website. A chatbot, powered by artificial intelligence, can simulate human-like conversations and respond promptly to student inquiries. By leveraging Natural Language Processing (NLP) techniques, a chatbot can provide predefined, accurate, and uniform information 24/7, making it a valuable tool for the counseling process. In this research, a chatbot was developed using NLP concepts, specifically the NLTK library, and trained using neural networks to achieve exceptional performance. The system processed and structured user queries by creating an intents.json file, tokenizing and lemmatizing input text, and converting data into a bag-of-words representation. The neural network, optimized using advanced techniques, achieved an impressive accuracy of 99%. This approach demonstrated the effectiveness of sequential models, which prevent overfitting and excel in handling contextual queries. Additionally, the chatbot incorporated pattern matching and semantic analysis to enhance real-time query resolution. By integrating advanced NLP methods and neural networks, this research provides a robust and scalable chatbot solution, offering precise, consistent, and accessible information to prospective students, ultimately aiding them in making well-informed academic decisions.
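The bag-of-words step described above can be sketched without the NLTK dependency as follows; the vocabulary and query are hypothetical, and a real pipeline would lemmatize tokens rather than just strip punctuation:

```python
def bag_of_words(sentence, vocabulary):
    """Tokenise a query and encode it as a binary bag-of-words vector
    over a fixed vocabulary, the representation fed to the chatbot's
    neural network."""
    tokens = {tok.strip(".,?!").lower() for tok in sentence.split()}
    return [1 if word in tokens else 0 for word in vocabulary]

vocabulary = ["admission", "course", "fee", "hostel", "library"]
vector = bag_of_words("What is the fee for the course?", vocabulary)
```

The network then maps such vectors to intent classes defined in the intents.json file, and a response is drawn from the matched intent's predefined answers.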