Abstract: Content extraction is a data mining technique that is widely used on the Internet. Its main purpose is to extract the topic content of a web page and supply that data for Web data mining. This paper proposes an improved method based on the tree structure of the web page. First, the page is divided into blocks, and each block is stored in a tree structure. Then the variance over all blocks and a threshold are calculated, and the topic information is selected accordingly. Compared with traditional methods based on regular expressions, this method is simpler and more practical. Experimental results show that the extraction precision is higher than 96%, which indicates that the method has good practical value.
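The pipeline summarized above (divide the page into blocks, store them in a tree, compute a variance and a threshold, select the topic blocks) can be illustrated with a minimal sketch. The abstract does not say which quantity the variance is computed over or how the threshold is set, so the sketch assumes the text length of each block and a threshold of the mean plus k standard deviations. The names BLOCK_TAGS, Block, BlockTreeBuilder, extract_topic_blocks and the parameter k are hypothetical and are not taken from the paper. It also assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser
from statistics import mean, pstdev

# Assumption: these tags delimit "blocks"; the paper's actual rule is not given.
BLOCK_TAGS = {"div", "td", "p", "li", "section", "article"}


class Block:
    """One node of the block tree, holding the text directly inside it."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []

    def own_text(self):
        # Collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self.text).split())


class BlockTreeBuilder(HTMLParser):
    """Builds a tree of Block nodes from block-level tags."""

    def __init__(self):
        super().__init__()
        self.root = Block("root")
        self.current = self.root
        self.skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in BLOCK_TAGS:
            node = Block(tag, self.current)
            self.current.children.append(node)
            self.current = node

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag == self.current.tag and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.current.text.append(data)


def iter_blocks(node):
    """Yield every block below node, depth first."""
    for child in node.children:
        yield child
        yield from iter_blocks(child)


def extract_topic_blocks(html, k=1.0):
    """Return the text of blocks whose length reaches mean + k * std."""
    builder = BlockTreeBuilder()
    builder.feed(html)
    builder.close()
    blocks = [b for b in iter_blocks(builder.root) if b.own_text()]
    if not blocks:
        return []
    lengths = [len(b.own_text()) for b in blocks]
    # Assumed threshold rule: population standard deviation of text lengths.
    threshold = mean(lengths) + k * pstdev(lengths)
    return [b.own_text() for b, n in zip(blocks, lengths) if n >= threshold]


if __name__ == "__main__":
    page = (
        "<html><body><div>Home</div><div>News</div><div>About</div>"
        "<div><p>" + " ".join(["word"] * 40) + "</p></div>"
        "<div>Copyright</div></body></html>"
    )
    for text in extract_topic_blocks(page):
        print(text)
```

Under these assumptions, short navigation and footer blocks fall below the threshold while a long article block exceeds it. A different block feature (for example link density) or a different threshold rule could be substituted without changing the overall structure.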