A token is identified by the appropriate word breaker, following the linguistic rules of the specified language.
For a given language, a word breaker tokenizes text based on the lexical rules of that language.
With the neutral word breaker, words are broken at neutral characters such as spaces and punctuation marks.
In SQL Server 2000, new word breakers and filters could only be added as global, operating-system-level components.
Indicates whether operating system word breakers and filters are registered and used with this instance of SQL Server.
In some specific cases, changes made to the word breakers can affect how some data is tokenized.
The full-text engine will not find words with the asterisk (*) character because word breakers typically ignore such characters.
If there is a word breaker for the language family but not for the specific sublanguage, the major language is used.
A word breaker is the component that determines where word boundaries exist in the stream of text in the row being full-text indexed.
As part of processing, the gathered text data is passed through a word breaker to separate the text into individual tokens, or keywords.
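
The sentences above describe a neutral word breaker that splits text at spaces and punctuation and drops characters such as the asterisk. A minimal Python sketch of that idea follows; the function name `neutral_word_break` is hypothetical, and the regular expression only approximates the behavior described. It is not SQL Server's actual word breaker.

```python
import re


def neutral_word_break(text):
    """Split text into tokens at neutral characters (hypothetical helper).

    Assumption: any character that is not a letter, digit, or underscore
    (spaces, punctuation, symbols such as '*') is treated as a boundary
    and is discarded rather than kept as part of a token.
    """
    return re.findall(r"\w+", text)


tokens = neutral_word_break("Full-text search: find data*, fast!")
print(tokens)  # ['Full', 'text', 'search', 'find', 'data', 'fast']
```

Because the asterisk never reaches the token list, a search for the literal string "data*" has nothing to match against, which is consistent with the sentence about the asterisk character above.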