Abstract: Large language models lack visual capability, and existing multimodal models are insufficiently accurate in Traditional Chinese Medicine (TCM) tongue-image analysis. To address these issues, a TCM tongue-image assisted consultation system was constructed that automatically classifies tongue images and generates auxiliary diagnostic schemes. Tongue-body segmentation was performed with UNet, and a multi-label classification model was built on the TransNeXt backbone. The classification system was then integrated with a large language model through a fine-tuned function-calling framework to form the consultation system. In the validation experiments, the UNet model performed well in tongue-image segmentation, with mean Intersection over Union, recall, and precision reaching 97.58%, 98.61%, and 96.25%, respectively. The multi-label classification model built on the TransNeXt backbone exhibited superior performance in subset accuracy, precision, recall, and F1-score, achieving 75.69%, 91.18%, 91.41%, and 91.28%, respectively. In the generation of TCM auxiliary diagnostic schemes, the best-performing large language model outperformed the multimodal model on the BLEU-4, ROUGE-1, ROUGE-2, and ROUGE-L metrics, achieving 79.03%, 82.46%, 76.00%, and 86.46%, respectively. These results show that combining the tongue-image multi-label classification system with a large language model yields a TCM tongue-image assisted consultation system that supports TCM tongue diagnosis and provides technical support for the generation of auxiliary diagnostic schemes.
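The reported subset accuracy (75.69%) is markedly lower than the F1-score (91.28%), which is expected for multi-label classification: subset accuracy counts a sample as correct only when every label matches, whereas precision, recall, and F1 give credit per label. The following minimal sketch illustrates the standard definitions of these metrics and of Intersection over Union for a single binary mask. It is not the authors' evaluation code. The function names are hypothetical, and micro-averaging is an assumption, since the abstract does not state which averaging scheme was used.

```python
def binary_iou(pred_mask, true_mask):
    """IoU of two binary masks given as equal-shaped nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks are treated as a perfect match (a convention, assumed here).
    return intersection / union if union else 1.0


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def micro_precision_recall_f1(y_true, y_pred):
    """Micro-averaged precision, recall, and F1 over all (sample, label) pairs.

    Micro-averaging is an assumption; the source does not specify the scheme.
    """
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy example with three hypothetical tongue labels per image.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1]]  # second sample has one extra label
print(subset_accuracy(y_true, y_pred))
print(micro_precision_recall_f1(y_true, y_pred))
```

In the toy example, a single extra label on one image halves the subset accuracy while leaving recall untouched, which mirrors the gap between the two kinds of metric reported above.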