Abstract: Lithology identification is a critical component of formation subdivision and reservoir evaluation in oil and gas exploration and development. To address the high-dimensional, non-linear character of well-logging data and the imbalanced distribution of lithological samples under complex geological conditions, a novel identification method based on a data balancing strategy and an improved TabNet network is proposed. First, the Borderline-SMOTE (BSMOTE) algorithm is applied to balance the data, mitigating the training bias caused by the scarcity of minority lithological samples in the raw logging data. Second, an improved TabNet model integrated with a Vision Transformer (ViT) is constructed: the ViT encoder module is embedded into the Attentive Transformer of the original TabNet architecture, forming a composite attention transformation module capable of global feature modeling. Experiments on field data from the Mahu Sag (M block) in Xinjiang show that the improved TabNet model, after BSMOTE-based data balancing, achieves a lithology identification accuracy of 92.79%. Its overall performance exceeds that of traditional machine learning methods such as Random Forest (RF) and of deep learning baselines such as Convolutional Neural Networks (CNN).
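
The Borderline-SMOTE balancing step named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation. It only shows the general idea of the Borderline-SMOTE-1 variant: minority samples whose nearest neighbours are mostly, but not entirely, from other classes are flagged as borderline, and synthetic samples are interpolated between them and nearby minority samples. The function names (`danger_indices`, `borderline_smote`), the parameters `m`, `k`, `n_new`, and `seed`, and the list-of-lists data layout are all assumptions made for illustration.

```python
import math
import random


def _nearest(i, pool, X, count):
    """Indices of the `count` points in `pool` closest to X[i], excluding i itself."""
    return sorted((j for j in pool if j != i),
                  key=lambda j: math.dist(X[i], X[j]))[:count]


def danger_indices(X, y, minority, m=5):
    """Minority samples whose m nearest neighbours are mostly, but not all, other classes."""
    danger = []
    for i, label in enumerate(y):
        if label != minority:
            continue
        neigh = _nearest(i, range(len(X)), X, m)
        n_other = sum(1 for j in neigh if y[j] != minority)
        # All neighbours foreign: treated as noise. Fewer than half foreign: safe.
        if len(neigh) / 2 <= n_other < len(neigh):
            danger.append(i)
    return danger


def borderline_smote(X, y, minority, n_new, m=5, k=5, seed=0):
    """Return n_new synthetic minority samples interpolated from borderline ones."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    danger = danger_indices(X, y, minority, m)
    synthetic = []
    if not danger or len(minority_idx) < 2:
        return synthetic  # nothing to interpolate from
    for _ in range(n_new):
        i = rng.choice(danger)                            # a borderline sample
        j = rng.choice(_nearest(i, minority_idx, X, k))   # one of its minority neighbours
        gap = rng.random()                                # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic
```

In practice, each row of `X` would hold the logging-curve values at one depth point and `y` the lithology labels. A maintained library implementation would normally be preferred over this sketch, and the logging features would typically be normalized first so that the Euclidean distance is meaningful.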