
Understanding Strings, Runes, and Bytes in Go

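In Go, a string is a read-only sequence of bytes. Go source files are UTF-8, so a string literal holds UTF-8 text, in which each Unicode code point (a rune in Go) occupies 1 to 4 bytes. The built-in len() function returns the number of bytes, not characters. To count code points, convert the string to []rune and take its length, or call utf8.RuneCountInString, which avoids allocating a slice. When iterating with range, Go decodes one rune per iteration, so multi-byte characters are never split; the index it yields is the byte offset at which the rune starts, not the rune's position in the string.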
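The difference between byte length and rune count, and between byte indexing and rune indexing, can be sketched as follows. This is a minimal illustration, not part of the original snippet; the helper names lengths and runeAt are hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lengths reports the size of s in bytes and in runes (Unicode code points).
// Hypothetical helper, named for illustration only.
func lengths(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

// runeAt returns the n-th rune of s (zero-based) and true,
// or 0 and false if n is out of range. Hypothetical helper.
func runeAt(s string, n int) (rune, bool) {
	rs := []rune(s) // allocates a slice with one element per code point
	if n < 0 || n >= len(rs) {
		return 0, false
	}
	return rs[n], true
}

func main() {
	s := "Hello, 世界"
	b, r := lengths(s)
	fmt.Printf("bytes=%d runes=%d\n", b, r) // bytes=13 runes=9

	// s[7] is only the first byte of the three-byte encoding of '世'.
	fmt.Printf("s[7] = 0x%X\n", s[7]) // 0xE4

	if c, ok := runeAt(s, 7); ok {
		fmt.Printf("rune 7 = %c\n", c) // 世
	}
}
```

Indexing a string with s[i] always yields a single byte, so reaching the n-th code point requires either a conversion to []rune, as here, or a decoding loop.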

snippet.go
go
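package main

import (
	"fmt"
	"unicode"
)

func main() {
	str := "Hello, 世界"
	fmt.Printf("String: %s\n", str)
	// len counts bytes; converting to []rune counts Unicode code points.
	fmt.Printf("Length in bytes: %d\n", len(str))
	fmt.Printf("Length in runes: %d\n", len([]rune(str)))

	// Indexing a string yields individual bytes, so multi-byte
	// characters appear as several separate values.
	fmt.Println("\nIterating by byte:")
	for i := 0; i < len(str); i++ {
		fmt.Printf(" [%d] = 0x%X\n", i, str[i])
	}

	// range decodes one rune per iteration; i is the byte offset
	// where the rune starts, so it skips ahead for multi-byte runes.
	fmt.Println("\nIterating by rune:")
	for i, r := range str {
		fmt.Printf(" [%d] = '%c' (U+%04X)\n", i, r, r)
	}

	// unicode.Han is the range table for the Han script.
	fmt.Printf("\nIs '世' a Chinese character? %v\n", unicode.Is(unicode.Han, '世'))
}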
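What range does on each iteration can be approximated by decoding explicitly with utf8.DecodeRuneInString from the standard library. The sketch below is illustrative only; runeOffsets is a hypothetical helper that records the byte offset at which each rune begins.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets returns the byte offset at which each rune of s starts,
// decoding one rune at a time in the same order a range loop would.
// Hypothetical helper, named for illustration only.
func runeOffsets(s string) []int {
	var offsets []int
	for i := 0; i < len(s); {
		// size is the number of bytes the decoded rune occupies (at least 1 here).
		_, size := utf8.DecodeRuneInString(s[i:])
		offsets = append(offsets, i)
		i += size
	}
	return offsets
}

func main() {
	s := "Hello, 世界"
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		fmt.Printf("offset %2d: %c (%d byte(s))\n", i, r, size)
		i += size
	}
	fmt.Println(runeOffsets(s)) // [0 1 2 3 4 5 6 7 10]
}
```

The offsets jump from 7 to 10 because '世' occupies three bytes, which is why the index printed by a range loop over this string is not consecutive.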
Breakdown
1
str := "Hello, 世界"
Creates a string containing both ASCII and multi-byte Unicode characters
2
len([]rune(str))
Converts the string to a rune slice so that len counts Unicode code points instead of bytes
3
for i, r := range str
Range decodes one rune per iteration, yielding the rune's starting byte offset and its Unicode code point
4
unicode.Is(unicode.Han, '世')
Uses the unicode package to check whether a rune belongs to a range table, here the Han script