程式心得筆記工程屍的日子 (Programming Notes: Life of an Engineer-Zombie) | Mr. 一顆痣 a.k.a. bigd : Lucene

Showing posts with label Lucene.

Monday, January 30, 2012

[Alfresco] Pagination of Lucene

When using the JavaScript API method search.query(searchParameters),
no results come back once skipCount exceeds 1,000,
so custom pagination breaks.

I found the following issue, which describes exactly the problem I ran into:
Improve the skipCount function not to check the permissions.

Noting it down here:

This is an enhancement request for the paging offset in FTS queries, called "skipCount". Alfresco doesn't need to check permissions on the skipped items when we specify skipCount.

(*1: system.acl.maxPermissionChecks defaults to 1000.)

As far as I tested with an out-of-the-box webscript named "children.get.js", paging itself is not affected by the permission check. That case works fine because the webscript uses "group.getChildGroups(maxItems, skipCount)" with ModelUtil.paging internally, so it correctly returns results beyond the permission-check limit. I attached the webscript; please see sample-webscripts.zip for reference.

The problem is that when we use the FTS paging offset "skipCount" together with a Lucene search query in a webscript, as follows, the results are affected by the permission check. I attached the webscript as paging-result.zip for reference.

    // args["skip"] is a string: 0 + "1500" would give "01500", so parse it instead
    var skipCount = parseInt(args["skip"], 10) || 0;
    var searchParams = {};
    // "\:" in a JS string literal is just ":", so this is the FTS query cm:name:document*
    searchParams.query = "cm:name:document*";
    searchParams.language = "fts-alfresco";
    var paging = {};
    paging.maxItems = 100;
    paging.skipCount = skipCount;
    paging.totalItems = 100;
    searchParams.page = paging;
    var results = search.query(searchParams);
    model["length"] = results.length;
    model["results"] = results;

In this case, when skipCount is below system.acl.maxPermissionChecks (1,000 by default) and the paging values are set properly, the correct result is returned. But when skipCount is over 1,000, the webscript returns no results (zero items).
So the workaround is to set "system.acl.maxPermissionChecks" above 1,000; then the results come back correctly. However, increasing this parameter puts more load on the Alfresco server, so it would be nice if skipCount could be improved, since Alfresco doesn't need to check permissions on the skipped items.
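Why does a skipCount of 1,000 return nothing? A minimal model, assuming (as the issue describes) that only the first maxPermissionChecks hits are permission-checked and paging is applied afterwards to the survivors. The class and method names are hypothetical; this is not Alfresco code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model: permission checks are capped, then skipCount/maxItems
// are applied to whatever passed the checks.
public class CappedPagingDemo {

    static List<Integer> page(int totalHits, int maxPermissionChecks,
                              int skipCount, int maxItems, IntPredicate canRead) {
        // Step 1: check hits in order, but never more than the cap.
        List<Integer> readable = new ArrayList<>();
        int checks = Math.min(totalHits, maxPermissionChecks);
        for (int hit = 0; hit < checks; hit++) {
            if (canRead.test(hit)) {
                readable.add(hit);
            }
        }
        // Step 2: page over the checked, readable hits only.
        int from = Math.min(skipCount, readable.size());
        int to = Math.min(from + maxItems, readable.size());
        return new ArrayList<>(readable.subList(from, to));
    }

    public static void main(String[] args) {
        IntPredicate all = hit -> true;
        // 5,000 matching documents with the default cap of 1,000 checks.
        System.out.println(page(5000, 1000, 900, 100, all).size());   // 100
        System.out.println(page(5000, 1000, 1000, 100, all).size());  // 0
        // The workaround: a higher cap makes the same page reachable again.
        System.out.println(page(5000, 10000, 1000, 100, all).size()); // 100
    }
}
```

In this model the cap bounds skipCount + maxItems, not the page size, which matches the symptom that small pages deep in the result set come back empty.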

Adjust the following two properties in repository.properties:

#
# Properties to limit resources spent on individual searches
#
# The maximum time spent pruning results
system.acl.maxPermissionCheckTimeMillis=100000
# The maximum number of results to perform permission checks against
system.acl.maxPermissionChecks=10000
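With the values above, how deep can a page of 100 go? A small sketch, assuming every checked hit is readable so the cap alone limits skipCount + maxItems. It loads the snippet with java.util.Properties; the helper name is hypothetical, and the fallback of 1000 is the default quoted earlier.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PermissionCheckLimit {

    // Largest skipCount that still yields a full page of maxItems,
    // assuming at most maxPermissionChecks hits are ever checked.
    static int maxFullPageSkip(Properties props, int maxItems) {
        int maxChecks = Integer.parseInt(
                props.getProperty("system.acl.maxPermissionChecks", "1000").trim());
        return Math.max(0, maxChecks - maxItems);
    }

    public static void main(String[] args) throws IOException {
        String snippet =
                "# The maximum number of results to perform permission checks against\n"
              + "system.acl.maxPermissionCheckTimeMillis=100000\n"
              + "system.acl.maxPermissionChecks=10000\n";
        Properties props = new Properties();
        props.load(new StringReader(snippet)); // '#' lines are treated as comments
        System.out.println(maxFullPageSkip(props, 100));            // 9900
        System.out.println(maxFullPageSkip(new Properties(), 100)); // 900
    }
}
```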

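However, accessing beyond 10,000 items throws the following exception!! (to be continued)
For now I can only retrieve 10,000 items from a single level; once that level holds more than 10,000 items, no data comes back!!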
Transactional update cache 'org.alfresco.cache.node.aspectsTransactionalCache' is full (10000)

Wednesday, December 07, 2011

[Lucene] Stopwords

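Lucene's default English stop word list (used by StandardAnalyzer and StopAnalyzer) drops the following words, so they cannot be searched for: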

"a", "an", "and", "are", "as", "at", "be", "but", "by",
"for", "if", "in", "into", "is", "it",
"no", "not", "of", "on", "or", "such",
"that", "the", "their", "then", "there", "these",
"they", "this", "to", "was", "will", "with"
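The reason is that the analyzer removes these words both when indexing and when parsing the query, so a query made only of stop words has no terms left to match. A dependency-free sketch of that filtering step (lowercase, split, drop stop words); it is a rough imitation for illustration, not Lucene's API, and real Lucene tokenizers split text differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {

    // The 33 words listed above.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "if", "in", "into", "is", "it",
        "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"));

    // Split on anything that is not a letter or digit, lowercase, drop stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // a leading separator yields one empty token
            }
            String term = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Lord of the Rings")); // [lord, rings]
        System.out.println(analyze("to be or not to be"));    // []
    }
}
```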

[Alfresco] Lucene Search: Escaping special characters

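Today I tested viewing files in the trash (deleted items) with an account whose name contains special characters, and the Alfresco web UI blew up too.

It turns out that when a UID contains special characters, it has to be escaped as well!!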

If you are using Lucene 1.4 or earlier, there is no escape convenience utility; you must write your own. The characters that need to be escaped are: + - && || ! ( ) { } [ ] ^ " ~ * ? : \

Lucene 1.4 Escaping (More Complete)

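import java.util.regex.Pattern;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class LuceneEscaping {
    // Some constants: every special character listed above, including '[', '"', '&' and '|'.
    private static final String LUCENE_ESCAPE_CHARS = "[\\\\+\\-\\!\\(\\)\\:\\^\\[\\]\\\"\\{\\}\\~\\*\\?\\|\\&]";
    private static final Pattern LUCENE_PATTERN = Pattern.compile(LUCENE_ESCAPE_CHARS);
    // Replacement \$0: a literal backslash followed by the matched character.
    private static final String REPLACEMENT_STRING = "\\\\$0";

    // ... Then, in your code somewhere ("contents" is a placeholder field name):
    static Query parseEscaped(String userInput) throws ParseException {
        String escaped = LUCENE_PATTERN.matcher(userInput).replaceAll(REPLACEMENT_STRING);
        return new QueryParser("contents", new StandardAnalyzer()).parse(escaped);
    }
}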
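The same escaping can be done without regular expressions or a Lucene dependency, which makes it easy to unit-test. A minimal sketch, assuming the special-character set listed above (with '&' and '|' escaped individually to cover && and ||); later Lucene releases ship a static QueryParser.escape(String) that serves this purpose, and the class name here is hypothetical.

```java
public class LuceneEscapeSketch {

    // Backslash first is not required here: each character is handled once.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefix every special character with a backslash.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("user(1)")); // user\(1\)
        System.out.println(escape("a+b:c"));   // a\+b\:c
    }
}
```

A character loop avoids the double escaping that makes the regex constants above hard to read (four backslashes in Java source for one in the pattern).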

Reference:
Lucene: Escaping Special Characters

Wednesday, December 29, 2010

[Alfresco] Lucene Language Note

Lucene Language

This is the recommended language as it is supported by the recommended indexer.
The query language is described on the Lucene site http://lucene.apache.org/java/2_4_0/queryparsersyntax.html. The QueryParser has been modified to allow wildcards at the start of wildcard query elements; otherwise, the syntax is the same.
Note that certain characters need to be escaped in the query string; LuceneQueryParser provides a static method for doing this.
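One escape that shows up in almost every Alfresco Lucene query is the colon inside a property QName, as in @cm\:name:"...": without the backslash the parser would read "cm" as the field name. A minimal sketch with a hypothetical helper (it does not call Alfresco's LuceneQueryParser); inside a quoted phrase only '"' and '\' need escaping.

```java
public class AlfrescoAttributeQuerySketch {

    // Hypothetical helper: builds an attribute clause such as @cm\:name:"my file".
    static String attributeClause(String prefix, String localName, String value) {
        // Inside quotes, escape backslashes first, then double quotes.
        String escapedValue = value.replace("\\", "\\\\").replace("\"", "\\\"");
        // Escape the ':' between prefix and local name; the second ':' separates field and value.
        return "@" + prefix + "\\:" + localName + ":\"" + escapedValue + "\"";
    }

    public static void main(String[] args) {
        System.out.println(attributeClause("cm", "name", "report \"final\".doc"));
        // @cm\:name:"report \"final\".doc"
    }
}
```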
The following fields are available:

