Friday, January 23, 2015

Muddiest points for week 2

How does a search engine deal with word segmentation when it does not guarantee a unique tokenization? For example, the week 2 slides gave some examples in Chinese showing that different segmentation methods lead to different results. The n-gram method was introduced with the example of "新西兰花". I want to know: if we deal with a sentence that has more words, can we still use n-grams?
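My guess at the answer, which may be wrong: character n-grams scale to a sentence of any length because they never commit to a single segmentation; they just index every overlapping window of n characters. A minimal Python sketch (the helper name char_ngrams is my own, not from the slides):

```python
def char_ngrams(text, n=2):
    """Return all overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# "新西兰花" is ambiguous: "新西兰 / 花" (New Zealand / flower)
# vs. "新 / 西兰花" (new / broccoli). Bigrams avoid choosing.
print(char_ngrams("新西兰花"))        # ['新西', '西兰', '兰花']
print(char_ngrams("我想买新西兰花"))  # same idea on a longer sentence
```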
The instructor went too fast when introducing phrase recognition, so I failed to keep up and am quite confused about that content.
What's the difference between terms and tokens?
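My current understanding, which I'd like confirmed: a token is each individual occurrence produced by tokenizing the text, while a term is the normalized vocabulary entry that actually goes into the index (after steps like lowercasing). A small sketch of that distinction (the normalization here is just lowercasing and punctuation stripping, my own simplification):

```python
def tokenize(text):
    """Split raw text into token occurrences."""
    return text.split()

def to_terms(tokens):
    """Collapse tokens into the distinct index terms."""
    return sorted({tok.lower().strip(".,") for tok in tokens})

tokens = tokenize("The cat saw the Cat.")
print(tokens)            # ['The', 'cat', 'saw', 'the', 'Cat.'] -> 5 tokens
print(to_terms(tokens))  # ['cat', 'saw', 'the'] -> 3 distinct terms
```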
