Lucene学习笔记--关于索引的主要函数

maomao 2005-12-10

展开全文

Lucene学习笔记- -

Lucene学习笔记 /*关于索引的主要函数*/
File file=new File(argv[]);
IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);

Document doc = new Document();
doc.add(Field.Text("path", file.getPath()));
doc.add(Field.Keyword("modified",DateField.timeToString(file.lastModified())));
FileInputStream is = new FileInputStream(f);
Reader reader = new BufferedReader(new InputStreamReader(is));
doc.add(Field.Text("contents", reader));

writer.addDocument(doc);

writer.optimize();
writer.close();

/*关于检索的主要函数*/
Searcher searcher = new IndexSearcher("index");
Analyzer analyzer = new StandardAnalyzer();
Query query = QueryParser.parse(lineforsearch, "contents", analyzer);
Hits hits = searcher.search(query);
for (int i = start; i < hits.length(); i++) {
Document doc = hits.doc(i);
String path = doc.get("path");
System.out.println(i + ". " + path);
}

怎么查看Lucene中分词器的分词效果.
How can I find the effect of an analyzer on a given text ?
Write a simple test code that creates a TokenStream over a string (using java.io.StringReader) and prints its output.

Here is a sample (non tested) code:

final java.io.Reader reader = new StringReader("Testing Test Tester"); final TokenStream in = myAnalyzer.tokenStream(reader); for(;;) { final Token token = in.next(); if (token == null) { break; } System.out.println("[" + token.termText() + "]"); }

在Apache Lucene中如何删除记录
-----------------------------

有两种办法删除,如下:

The delete(int) method is used when the sequential number of the document to be deleted within the index is known. For example, when iterating the document list and deleting documents that match certain criteria, the sequential number of the current document is available for the document number iterator.

The delete(Term) method can be used when a term that matches exactly the document(s) you want to delete can be specified. For example, if you know the location (URL) of the document, you can use it to delete the document of that location. Or, if you want to delete all the documents from a specific site, have in your index a 'site' field that contains the host name of the site so you can delete all the documents of that site in a single and quick operation.

第一种方法说明: 新建一个全文检索库后, 添加的记录默认从0开始计算, 即第一条编号为0,第二条为1,依次递增.delete(int)此中的参数即为此编号, 比如你添加了三条记录后,delete(1)则删除了第二条.

代码如下:

IndexReader indexReader = IndexReader.open("g://indexdb//db1");
indexReader.delete(1);
indexReader.close();

第二种方法说明: 适合于批量次删除, 删除符合条件的一批(条)记录, 该方法较之前者更为常用. term是lucene中的一个基本概念, 采用键值对形式表示.

代码如下:

IndexReader indexReader = IndexReader.open("g://indexdb//db1");
   Term term = new Term("filename","doc1.txt");
   indexReader.delete(term);
   indexReader.close();

用term表示需要删除记录的条件,如上,即删除凡文件叫为doc1.txt全删除(并不见得只有一条记录哦).

入库后的记录无法修改, 若要修改只有先删再加.