XML Search Engine Evaluation Criteria
February 22, 2002
Search Capabilities
- Boolean operators
  - logical (AND, OR, NOT)
  - comparison (<, <=, >, >=, ==, !=)
    - for text
    - for numbers
- Proximity
  - Near
  - Before
  - After
  - Within x-bytes or x-characters
- Stop Words
- Character Sets
  - Unicode support
  - Mappings for special characters (e.g., Ko equivalent to Kö in a search)
- Wildcard support
- Tag Searching
  - Ability to search XML elements and attributes as well as content
- Sorting
  - Multi-level
  - Ascending
  - Descending
- Aggregation
  - Ability to search within a single collection/document
  - Ability to search across multiple collections
  - Ability to search across multiple databases
- Retrieval
  - Ability to control what is returned when a hit is encountered
  - Ability to return hit count with returned results
  - Ability to transform XML results into other formats (e.g., HTML)
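The tag-searching and special-character criteria above can be turned into a small probe when comparing engines. The following is only a sketch using the Python standard library: the names `fold`, `search`, and `SAMPLE` are hypothetical, and a real engine would use its own index rather than a linear scan. It shows a search for "Ko" matching "Kö", and reports whether each hit came from an element name, an attribute, or element content.

```python
import unicodedata
import xml.etree.ElementTree as ET


def fold(text):
    """Map accented characters to base letters and ignore case ('Kö' -> 'ko')."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def search(xml_text, term):
    """Return (kind, element_tag, matched_text) for each place the term occurs.

    Assumption: only element names, attributes, and leading element text are
    scanned; text that follows a child element (tail text) is ignored.
    """
    needle = fold(term)
    hits = []
    for elem in ET.fromstring(xml_text).iter():
        if needle in fold(elem.tag):
            hits.append(("element", elem.tag, elem.tag))
        for name, value in elem.attrib.items():
            if needle in fold(name) or needle in fold(value):
                hits.append(("attribute", elem.tag, f"{name}={value}"))
        if elem.text and needle in fold(elem.text):
            hits.append(("content", elem.tag, elem.text.strip()))
    return hits


# Hypothetical sample document, not taken from any real collection.
SAMPLE = """<library>
  <book lang="de"><author>Köhler</author><title>XML Retrieval</title></book>
  <book lang="en"><author>Smith</author><title>Indexing</title></book>
</library>"""

if __name__ == "__main__":
    print(search(SAMPLE, "Ko"))      # content hit on the accented author name
    print(search(SAMPLE, "lang"))    # attribute hits
    print(search(SAMPLE, "author"))  # element-name hits
```

Running the same three queries against each candidate engine gives a quick yes/no answer for these two criteria.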
Ingestion Requirements
- Well-formed XML
- Valid XML w/ DTD
- Valid XML w/ XML Schema
- Plain text
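To prepare test inputs for the ingestion categories above, it may help to separate well-formed XML from everything else first. This is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; `is_well_formed` is a hypothetical helper name. Note that this parser checks well-formedness only. It does not validate against a DTD or an XML Schema, so those two categories need a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for sample in ("<a><b/></a>", "<a><b></a>", "plain text"):
        print(repr(sample), is_well_formed(sample))
```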
Indexing
- Support for incremental indexing
- Support for multiple indexing
- Ability to search across multiple indexes
- Size limits on index files
- Limits on number of index files
Performance
- Ability to handle large single files (e.g., OED)
- Ability to handle many small files (e.g., Repository, metadata)
- Ability to load large numbers of documents
- Ability to edit documents in a large repository/index
- Time required to index new documents
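The indexing-time criterion above can be measured with a simple timing harness. The sketch below is an assumption-laden illustration: `ToyIndex` is a hypothetical in-memory stand-in for the engine under test, and `time_indexing` is a hypothetical helper. In a real evaluation the `add` call would be replaced by the engine's own load or index operation.

```python
import time
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory inverted index; stands in for the engine under test."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Incremental: adding a document touches only that document's terms.
        for term in text.lower().split():
            self.postings[term].add(doc_id)


def time_indexing(index, docs):
    """Return the seconds taken to add every (doc_id, text) pair to the index."""
    start = time.perf_counter()
    for doc_id, text in docs:
        index.add(doc_id, text)
    return time.perf_counter() - start


if __name__ == "__main__":
    index = ToyIndex()
    docs = [(i, f"document number {i} about xml search") for i in range(10000)]
    print(f"indexed {len(docs)} documents in {time_indexing(index, docs):.3f} s")
```

Timing the same document set as one large file and as many small files covers the first two performance criteria with the same harness.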
Programmer APIs
- HTTP
- Java
- C
- Perl
- C++
- Unix command line interface
Available Platforms
Cost
- Initial cost
- Annual maintenance
Company Stability