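Parsing Very Large XML with a Java SAX ContentHandler
Parsing a very large XML file or string with the usual approaches can take more than an hour for a 100 MB file, and may even fail with an out-of-memory error.
This article shows how to parse very large XML content with a SAX ContentHandler: about 100 MB of content is parsed and saved to the database in roughly 1 to 2 seconds.
1. Build a ContentHandler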
import java.util.ArrayList;
import java.util.List;
import org.springframework.util.CollectionUtils;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class RecipientHandler extends DefaultHandler {
    private List<Recipient> recipients;
    private List<NameValue> fields;
    private Recipient recipient;
    private NameValue field;
    private TestService testService;
    private StringBuilder sb = new StringBuilder();

    public RecipientHandler(TestService testService) {
        this.testService = testService;
    }

    @Override
    public void startDocument() throws SAXException {
        recipients = new ArrayList<>();
        fields = new ArrayList<>();
    }

    @Override
    public void endDocument() throws SAXException {
        saveRecipients(); // flush the final, partially filled batch
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        sb.setLength(0); // reset the text buffer for the new element
        if ("recipients".equals(qName)) {
            recipient = new Recipient();
        } else if ("fields".equals(qName)) {
            field = new NameValue();
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        String text = sb.toString().trim(); // trim once, on the complete text
        if ("recipients".equals(qName)) {
            // Copy the collected fields onto the recipient object
            for (NameValue nv : fields) {
                String name = nv.getName();
                if ("name".equals(name)) {
                    recipient.setName(nv.getValue());
                } else if ("create_date".equals(name)) {
                    recipient.setCreateDate(nv.getValue());
                }
            }
            recipients.add(recipient);
            fields.clear();
            if (recipients.size() >= 1000) {
                saveRecipients(); // save in batches of 1000
            }
        } else if ("fields".equals(qName)) {
            fields.add(field);
        } else if ("id".equals(qName)) {
            recipient.setId(text);
        } else if ("name".equals(qName)) {
            field.setName(text);
        } else if ("value".equals(qName)) {
            field.setValue(text);
        }
    }

    private void saveRecipients() {
        if (CollectionUtils.isEmpty(recipients) || testService == null) {
            return;
        }
        testService.addRecipients(recipients);
        recipients.clear();
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Long text may not arrive in one call; the parser can deliver it in
        // several chunks, so append each raw chunk and trim in endElement().
        sb.append(ch, start, length);
    }
}
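The handler's characters() comment notes that a long text node may be delivered in several calls. The sketch below illustrates one way to buffer such text safely: append each raw chunk and trim only once, when the element ends, so that whitespace falling on a chunk boundary is not lost. The class ChunkedTextDemo and its simulateSplit helper are hypothetical names made up for this illustration; the helper calls the callbacks directly to imitate a parser that splits the text in two.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: buffers the text of leaf elements.
public class ChunkedTextDemo extends DefaultHandler {
    private final StringBuilder sb = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        sb.setLength(0); // start a fresh buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        sb.append(ch, start, length); // keep the raw chunk, no trimming yet
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        values.add(sb.toString().trim()); // trim once, on the complete text
    }

    public List<String> getValues() {
        return values;
    }

    // Imitates a parser that splits one text node across two characters() calls.
    public static String simulateSplit(String first, String second) {
        ChunkedTextDemo handler = new ChunkedTextDemo();
        handler.startElement("", "value", "value", null);
        handler.characters(first.toCharArray(), 0, first.length());
        handler.characters(second.toCharArray(), 0, second.length());
        handler.endElement("", "value", "value");
        return handler.getValues().get(0);
    }
}
```

For example, ChunkedTextDemo.simulateSplit("  John ", "Smith  ") keeps the space between the two words, whereas trimming each chunk separately before appending would join them into one word.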
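2. Use the ContentHandler
import java.io.StringReader;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Note: XMLReaderFactory still works but has been deprecated since Java 9.
XMLReader parser = XMLReaderFactory.createXMLReader();
// RecipientHandler parses the data and saves it to the database
parser.setContentHandler(new RecipientHandler(testService));
StringReader stringReader = new StringReader(xmlString);
InputSource is = new InputSource(stringReader);
is.setEncoding("UTF-8"); // has no effect when the source is a character stream
parser.parse(is);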
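The snippet above parses XML that is already held in a string. When the data lives in a file, one option is to give the parser an InputStream so the whole document never has to be in memory at once. The following is a minimal sketch under that assumption, using the JAXP SAXParserFactory API instead of XMLReaderFactory; StreamingParseDemo and CountingHandler are hypothetical names, and the counting handler only stands in for a real handler such as RecipientHandler.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.io.InputStream;

public class StreamingParseDemo {

    // Counts start tags; a placeholder for a handler that builds business objects.
    static class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            elements++;
        }
    }

    // Parses the stream incrementally and returns the number of elements seen.
    public static int countElements(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountingHandler handler = new CountingHandler();
        parser.parse(in, handler);
        return handler.elements;
    }
}
```

A caller would typically open the file with java.nio.file.Files.newInputStream(path) inside a try-with-resources block and pass the stream to countElements.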
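The code is short overall, and SAX extracts all of the business data in a single pass. The key is to understand two pairs of callbacks: startElement() and endElement(), and startDocument() and endDocument().
startElement() and endElement() are where the business objects are built; every XML format is different, so the mapping has to be worked out for your own document. startDocument() and endDocument() handle initialization and the final cleanup.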
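The division of labour between the two callback pairs can be sketched with a much smaller handler. This is an illustration only, assuming a made-up document of item elements and a batch size of 2 instead of the 1000 used earlier; BatchingDemo, its run method, and the flushedSizes list are hypothetical, and the flush step records batch sizes where a real handler would write to the database.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class BatchingDemo extends DefaultHandler {
    private static final int BATCH_SIZE = 2; // small value for demonstration

    private final StringBuilder text = new StringBuilder();
    private final List<Integer> flushedSizes = new ArrayList<>();
    private List<String> batch;

    @Override
    public void startDocument() {
        batch = new ArrayList<>(); // initialise state once per document
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // a new element begins, so reset the text buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("item".equals(qName)) {
            batch.add(text.toString().trim()); // the object is complete here
            if (batch.size() >= BATCH_SIZE) {
                flush();
            }
        }
    }

    @Override
    public void endDocument() {
        flush(); // write out whatever is left in the last batch
    }

    private void flush() {
        if (batch.isEmpty()) {
            return;
        }
        flushedSizes.add(batch.size()); // a real handler would save the batch here
        batch.clear();
    }

    // Parses the given XML and returns the size of each flushed batch, in order.
    public static List<Integer> run(String xml) throws Exception {
        BatchingDemo handler = new BatchingDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.flushedSizes;
    }
}
```

With five item elements, run returns batch sizes 2, 2 and 1: two full batches flushed from endElement() and the remainder flushed from endDocument().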