javascript / intermediate
Snippet
Locale-Aware Text Segmentation
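The Intl.Segmenter API splits text at linguistically meaningful boundaries. Unlike split(' '), which only cuts at literal space characters, it applies locale-aware rules for word boundaries and reports whitespace, punctuation, and emoji as their own segments instead of leaving them attached to neighboring words.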
snippet.js
const text = "Node.js ist super! 🚀";
const segmenter = new Intl.Segmenter('de', { granularity: 'word' });
const segments = segmenter.segment(text);

for (const { segment, isWordLike } of segments) {
  if (isWordLike) console.log(`Word: ${segment}`);
}
nodejs
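To make the contrast with split(' ') concrete, here is a minimal sketch that runs both approaches on the same string. The sample text and the variable names naive and words are illustrative choices, not part of the original snippet.

```javascript
const text = "Hello, world!";

// Naive approach: cut only at literal spaces, so punctuation stays attached.
const naive = text.split(' ');

// Segmenter approach: keep only the segments flagged as word-like.
const segmenter = new Intl.Segmenter('en', { granularity: 'word' });
const words = Array.from(segmenter.segment(text))
  .filter((s) => s.isWordLike)
  .map((s) => s.segment);

console.log(naive); // [ 'Hello,', 'world!' ]
console.log(words); // [ 'Hello', 'world' ]
```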
Breakdown
1
new Intl.Segmenter('de', { granularity: 'word' });
Initializes a segmenter for German that focuses on word-level boundaries.
2
isWordLike
A boolean on each segment object, present when granularity is 'word'. It is true for word-like segments such as letters and numbers, and false for whitespace or punctuation.
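The granularity option in the breakdown also accepts 'grapheme' and 'sentence'. As a rough sketch of the grapheme mode, the helper below counts user-perceived characters, which can differ from String.prototype.length when emoji or combining marks are involved. The function name countGraphemes and its default locale are hypothetical, chosen only for this illustration.

```javascript
// Hypothetical helper: counts user-perceived characters (grapheme clusters).
function countGraphemes(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  let count = 0;
  // Each iteration yields one grapheme cluster.
  for (const _ of segmenter.segment(text)) count++;
  return count;
}

const sample = "Node.js 🚀";
console.log(sample.length);          // 10 (UTF-16 code units; the emoji takes two)
console.log(countGraphemes(sample)); // 9
```

This can be useful for character limits in input fields, where counting UTF-16 code units would penalize users for typing emoji or accented letters.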