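Introduction
I am building a process that merges several large, sorted CSV files. I am currently looking into using univocity-parsers to do this. My approach to the merge is to use beans that implement the Comparable interface.
Details
A simplified file looks like this: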
id,data
1,aa
2,bb
3,cc
The bean looks like this (getters and setters omitted):
import com.univocity.parsers.annotations.Parsed;

public class Address implements Comparable<Address> {

    @Parsed
    private int id;

    @Parsed
    private String data;

    // getters and setters omitted

    @Override
    public int compareTo(Address o) {
        return Integer.compare(this.getId(), o.getId());
    }
}
The comparator looks like this:
import java.util.Comparator;

public class AddressComparator implements Comparator<Address> {

    @Override
    public int compare(Address a, Address b) {
        if (a == null)
            throw new IllegalArgumentException("argument object a cannot be null");
        if (b == null)
            throw new IllegalArgumentException("argument object b cannot be null");
        return Integer.compare(a.getId(), b.getId());
    }
}
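As a side note, the same ordering can be expressed with the JDK's `Comparator.comparingInt`. The sketch below is only an illustration: `AddressStub` and `ComparatorSketch` are hypothetical names, and the stub leaves out the univocity annotations so that it compiles without the library. One behavioural difference is that a null argument fails with a `NullPointerException` rather than the `IllegalArgumentException` thrown above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the Address bean (annotations omitted).
class AddressStub {
    private final int id;
    private final String data;

    AddressStub(int id, String data) {
        this.id = id;
        this.data = data;
    }

    int getId() { return id; }
    String getData() { return data; }
}

class ComparatorSketch {
    // Ascending by id, the same ordering as AddressComparator.
    // A null argument causes a NullPointerException when getId() is called.
    static final Comparator<AddressStub> BY_ID = Comparator.comparingInt(AddressStub::getId);

    public static void main(String[] args) {
        List<AddressStub> rows = new ArrayList<>();
        rows.add(new AddressStub(3, "cc"));
        rows.add(new AddressStub(1, "aa"));
        rows.add(new AddressStub(2, "bb"));
        rows.sort(BY_ID);
        for (AddressStub row : rows) {
            System.out.println(row.getId() + "," + row.getData());
        }
    }
}
```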
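Since I don't want to read all of the data into memory, I want to read the top record of each file and run some comparison logic on those records. Here is my simplified example: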
import com.univocity.parsers.common.processor.BeanListProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class App {

    private static final String INPUT_1 = "src/test/input/address1.csv";
    private static final String INPUT_2 = "src/test/input/address2.csv";
    private static final String INPUT_3 = "src/test/input/address3.csv";

    public static void main(String[] args) throws FileNotFoundException {
        BeanListProcessor<Address> rowProcessor = new BeanListProcessor<Address>(Address.class);
        CsvParserSettings parserSettings = new CsvParserSettings();
        parserSettings.setRowProcessor(rowProcessor);
        parserSettings.setHeaderExtractionEnabled(true);
        CsvParser parser = new CsvParser(parserSettings);

        List<FileReader> readers = new ArrayList<>();
        readers.add(new FileReader(new File(INPUT_1)));
        readers.add(new FileReader(new File(INPUT_2)));
        readers.add(new FileReader(new File(INPUT_3)));

        // This parses all rows, but I am only interested in getting 1 row as a bean.
        for (FileReader fileReader : readers) {
            parser.parse(fileReader);
            List<Address> beans = rowProcessor.getBeans();
            for (Address address : beans) {
                System.out.println(address.toString());
            }
        }

        // want to have a map with the reader and the first bean object
        // Map<FileReader, Address> topRecordofReader = new HashMap<>();
        Map<FileReader, String[]> topRecordofReader = new HashMap<>();
        for (FileReader reader : readers) {
            parser.beginParsing(reader);
            String[] row;
            while ((row = parser.parseNext()) != null) {
                System.out.println(row[0]);
                System.out.println(row[1]);
                topRecordofReader.put(reader, row);
                // all done, only want to get first row
                break;
            }
        }
    }
}
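Question
In the example above, how can I parse so that the parser steps through the file row by row and returns one bean per row, instead of parsing the whole file at once?
I am looking for something like this (this is not working code; it only shows the kind of solution I am after):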
for (FileReader fileReader : readers) {
    parser.beginParsing(fileReader);
    Address bean = null;
    // wished-for behaviour: each call returns the next row as an Address bean
    while ((bean = parser.parseNextRecord()) != null) {
        topRecordofReader.put(fileReader, bean);
    }
}
Best answer: There are two ways to read iteratively instead of loading everything into memory. The first is to use a BeanProcessor instead of a BeanListProcessor:
settings.setRowProcessor(new BeanProcessor<Address>(Address.class) {
    @Override
    public void beanProcessed(Address address, ParsingContext context) {
        // your code to process each parsed object here!
    }
});
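The second is to read beans iteratively without a callback. For this (and for some other common tasks) we created the CsvRoutines class (which extends AbstractRoutines; more examples here):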
File input = new File("/path/to/your.csv");
CsvParserSettings parserSettings = new CsvParserSettings();
//...configure the parser

// You can also use TSV and Fixed-width routines
CsvRoutines routines = new CsvRoutines(parserSettings);
for (Address address : routines.iterate(Address.class, input, "UTF-8")) {
    //process your bean
}
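For the merge described in the question, the pull-style iteration is probably the easier fit, because the top record of several files has to be compared at the same time. The following is a minimal sketch of a k-way merge over already-sorted iterators, using only the JDK. The names `KWayMergeSketch`, `Head` and `merge` are hypothetical, and plain integers stand in for the Address beans so the code runs without univocity. It assumes each input is sorted by the same comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class KWayMergeSketch {

    // Pairs the current top record of one source with the iterator it came from.
    private static final class Head<T> {
        final T value;
        final Iterator<T> source;

        Head(T value, Iterator<T> source) {
            this.value = value;
            this.source = source;
        }
    }

    // Merges already-sorted sources into one sorted list.
    // The queue holds at most one record per source at any time.
    static <T> List<T> merge(List<Iterator<T>> sources, Comparator<? super T> order) {
        PriorityQueue<Head<T>> heads =
                new PriorityQueue<>((a, b) -> order.compare(a.value, b.value));
        for (Iterator<T> source : sources) {
            if (source.hasNext()) {
                heads.add(new Head<>(source.next(), source));
            }
        }
        List<T> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            Head<T> smallest = heads.poll();
            // Collected in a list for the demo; a real merge of large files
            // would write the record to the output here instead.
            merged.add(smallest.value);
            if (smallest.source.hasNext()) {
                heads.add(new Head<>(smallest.source.next(), smallest.source));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Assumption: with univocity, each iterator could instead come from
        // routines.iterate(Address.class, file, "UTF-8").iterator()
        List<Iterator<Integer>> sources = new ArrayList<>();
        sources.add(Arrays.asList(1, 4, 7).iterator());
        sources.add(Arrays.asList(2, 5, 8).iterator());
        sources.add(Arrays.asList(3, 6, 9).iterator());
        List<Integer> merged = merge(sources, Comparator.<Integer>naturalOrder());
        System.out.println(merged); // prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

With the beans from the question, the comparator argument would be an `AddressComparator` instance, and each source would be one iterator per input file.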
Hope this helps!