Asterisk - The Open Source Telephony Project  18.5.0
utf8.h File Reference

UTF-8 information and validation functions.

Enumerations

enum  ast_utf8_validation_result { AST_UTF8_VALID, AST_UTF8_INVALID, AST_UTF8_UNKNOWN }
 

Functions

void ast_utf8_copy_string (char *dst, const char *src, size_t size)
 Copy a string safely ensuring valid UTF-8.

int ast_utf8_init (void)
 Register UTF-8 tests.

int ast_utf8_is_valid (const char *str)
 Check if a zero-terminated string is valid UTF-8.

int ast_utf8_is_validn (const char *str, size_t size)
 Check if the first size bytes of a string are valid UTF-8.

void ast_utf8_validator_destroy (struct ast_utf8_validator *validator)
 Destroy a UTF-8 validator.

enum ast_utf8_validation_result ast_utf8_validator_feed (struct ast_utf8_validator *validator, const char *data)
 Feed a zero-terminated string into the UTF-8 validator.

enum ast_utf8_validation_result ast_utf8_validator_feedn (struct ast_utf8_validator *validator, const char *data, size_t size)
 Feed a string into the UTF-8 validator.

int ast_utf8_validator_new (struct ast_utf8_validator **validator)
 Create a new UTF-8 validator.

void ast_utf8_validator_reset (struct ast_utf8_validator *validator)
 Reset the state of a UTF-8 validator.

enum ast_utf8_validation_result ast_utf8_validator_state (struct ast_utf8_validator *validator)
 Get the current UTF-8 validator state.
 

Detailed Description

UTF-8 information and validation functions.

Definition in file utf8.h.

Enumeration Type Documentation

◆ ast_utf8_validation_result

Enumerator
AST_UTF8_VALID 

The consumed sequence is valid UTF-8.

The bytes consumed thus far by the validator represent a valid sequence of UTF-8 bytes. If additional bytes are fed into the validator, it can transition into either AST_UTF8_INVALID or AST_UTF8_UNKNOWN.

AST_UTF8_INVALID 

The consumed sequence is invalid UTF-8.

The bytes consumed thus far by the validator represent an invalid sequence of UTF-8 bytes. Feeding additional bytes into the validator will not change its state.

AST_UTF8_UNKNOWN 

The validator is in an intermediate state.

The validator is in the process of validating a multibyte UTF-8 sequence and requires additional data to be fed into it to determine validity. If additional bytes are fed into the validator, it can transition into either AST_UTF8_VALID or AST_UTF8_INVALID. If you have no additional data to feed into the validator, the sequence is incomplete and must be treated as invalid UTF-8.

Definition at line 71 of file utf8.h.

enum ast_utf8_validation_result {
    /*! \brief The consumed sequence is valid UTF-8
     *
     * The bytes consumed thus far by the validator represent a valid sequence of
     * UTF-8 bytes. If additional bytes are fed into the validator, it can
     * transition into either \a AST_UTF8_INVALID or \a AST_UTF8_UNKNOWN
     */
    AST_UTF8_VALID,

    /*! \brief The consumed sequence is invalid UTF-8
     *
     * The bytes consumed thus far by the validator represent an invalid sequence
     * of UTF-8 bytes. Feeding additional bytes into the validator will not
     * change its state.
     */
    AST_UTF8_INVALID,

    /*! \brief The validator is in an intermediate state
     *
     * The validator is in the process of validating a multibyte UTF-8 sequence
     * and requires additional data to be fed into it to determine validity. If
     * additional bytes are fed into the validator, it can transition into either
     * \a AST_UTF8_VALID or \a AST_UTF8_INVALID. If you have no additional data
     * to feed into the validator the UTF-8 sequence is invalid.
     */
    AST_UTF8_UNKNOWN
};
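
The three results can be easier to follow with a small streaming example. The following is a minimal sketch, not the Asterisk implementation: every demo_* name is hypothetical, and the byte-range checks stand in for the decode() state machine in utf8.c, which is not reproduced on this page. It shows a validator that reports an intermediate state while a multibyte sequence is incomplete and that stays invalid once an illegal byte has been seen.

```c
/* Hypothetical stand-ins for illustration; these are not the Asterisk types. */
enum demo_result { DEMO_VALID, DEMO_INVALID, DEMO_UNKNOWN };

struct demo_validator {
    int need;              /* continuation bytes still expected */
    unsigned char lo, hi;  /* allowed range for the next continuation byte */
    int rejected;          /* sticky flag: set once any byte is illegal */
};

/* Initial state: nothing pending, nothing rejected. */
static const struct demo_validator DEMO_INIT = { 0, 0x80, 0xBF, 0 };

/* Advance the validator by one byte. */
static void demo_step(struct demo_validator *v, unsigned char b)
{
    if (v->rejected) {
        return;                        /* an invalid validator never recovers */
    }
    if (v->need > 0) {                 /* inside a multibyte sequence */
        if (b < v->lo || b > v->hi) {
            v->rejected = 1;
            return;
        }
        v->lo = 0x80;                  /* later continuation bytes use the full range */
        v->hi = 0xBF;
        v->need--;
        return;
    }
    if (b <= 0x7F) {
        return;                        /* ASCII byte, sequence complete */
    } else if (b >= 0xC2 && b <= 0xDF) {
        v->need = 1;
    } else if (b >= 0xE0 && b <= 0xEF) {
        v->need = 2;
        if (b == 0xE0) v->lo = 0xA0;   /* excludes overlong encodings */
        if (b == 0xED) v->hi = 0x9F;   /* excludes UTF-16 surrogates */
    } else if (b >= 0xF0 && b <= 0xF4) {
        v->need = 3;
        if (b == 0xF0) v->lo = 0x90;   /* excludes overlong encodings */
        if (b == 0xF4) v->hi = 0x8F;   /* excludes code points above U+10FFFF */
    } else {
        v->rejected = 1;               /* byte can never start a sequence */
    }
}

/* Map the internal state onto the three documented outcomes. */
static enum demo_result demo_state(const struct demo_validator *v)
{
    if (v->rejected) {
        return DEMO_INVALID;
    }
    return v->need == 0 ? DEMO_VALID : DEMO_UNKNOWN;
}

/* Feed a zero-terminated chunk and report the resulting state. */
static enum demo_result demo_feed(struct demo_validator *v, const char *data)
{
    while (*data) {
        demo_step(v, (unsigned char) *data++);
    }
    return demo_state(v);
}
```

Feeding the first two bytes of a three-byte character ("\xE2\x82") leaves this sketch in the intermediate state; feeding the final byte ("\xAC") moves it to valid. Once a byte such as 0xFF has been fed, every later call reports invalid.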

Function Documentation

◆ ast_utf8_copy_string()

void ast_utf8_copy_string (char *dst, const char *src, size_t size)

Copy a string safely ensuring valid UTF-8.

Since
13.36.0, 16.13.0, 17.7.0, 18.0.0

This is similar to ast_copy_string, but it will only copy valid UTF-8 sequences from the source string into the destination buffer. If an invalid UTF-8 sequence is encountered, or the available space in the destination buffer is exhausted in the middle of an otherwise valid UTF-8 sequence, the destination buffer will be truncated to ensure that it only contains valid UTF-8.

Parameters
    dst     The destination buffer
    src     The source string
    size    The size of the destination buffer
Returns
    Nothing.
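
The truncation rule is easiest to see with a destination buffer that is one byte too small. The sketch below assumes a hypothetical demo_utf8_copy() that follows the same loop structure as the listing for this function, with a simplified byte-range checker (demo_step) standing in for decode(); none of the demo_* names are part of Asterisk.

```c
#include <stddef.h>

/* Hypothetical decoder state: pending continuation bytes and their allowed range. */
struct demo_state { int need; unsigned char lo, hi; };

/* Returns 0 if byte b is acceptable in state s, -1 if it is illegal. */
static int demo_step(struct demo_state *s, unsigned char b)
{
    if (s->need > 0) {
        if (b < s->lo || b > s->hi) return -1;
        s->lo = 0x80; s->hi = 0xBF; s->need--;
        return 0;
    }
    if (b <= 0x7F) return 0;
    if (b >= 0xC2 && b <= 0xDF) { s->need = 1; return 0; }
    if (b >= 0xE0 && b <= 0xEF) {
        s->need = 2;
        if (b == 0xE0) s->lo = 0xA0;   /* excludes overlong encodings */
        if (b == 0xED) s->hi = 0x9F;   /* excludes UTF-16 surrogates */
        return 0;
    }
    if (b >= 0xF0 && b <= 0xF4) {
        s->need = 3;
        if (b == 0xF0) s->lo = 0x90;   /* excludes overlong encodings */
        if (b == 0xF4) s->hi = 0x8F;   /* excludes code points above U+10FFFF */
        return 0;
    }
    return -1;
}

/* Copy src into dst (capacity size), keeping only complete, valid sequences. */
static void demo_utf8_copy(char *dst, const char *src, size_t size)
{
    struct demo_state s = { 0, 0x80, 0xBF };
    char *last_good = dst;   /* where the terminating zero byte will go */

    if (size == 0) {
        return;              /* no room even for the terminator */
    }
    while (size && *src) {
        if (demo_step(&s, (unsigned char) *src) != 0) {
            break;           /* invalid byte: stop, as if out of space */
        }
        *dst++ = *src++;
        size--;
        if (size && s.need == 0) {
            last_good = dst; /* a whole character fits and leaves room for the terminator */
        }
    }
    *last_good = '\0';
}
```

With the five-byte input "ab\xE2\x82\xAC" (the letters a and b followed by a three-byte character), a six-byte buffer receives the whole string, while a five-byte buffer receives only "ab" because the terminator would otherwise have no room. An input such as "ok\xFFzz" is cut at the first invalid byte and yields "ok".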

Definition at line 133 of file utf8.c.

References ast_assert, decode(), UTF8_ACCEPT, and UTF8_REJECT.

Referenced by test_copy_and_compare().

void ast_utf8_copy_string(char *dst, const char *src, size_t size)
{
    uint32_t state = UTF8_ACCEPT;
    char *last_good = dst;

    ast_assert(size > 0);

    while (size && *src) {
        if (decode(&state, (uint8_t) *src) == UTF8_REJECT) {
            /* We _could_ replace with U+FFFD and try to recover, but for now
             * we treat this the same as if we had run out of space */
            break;
        }

        *dst++ = *src++;
        size--;

        if (size && state == UTF8_ACCEPT) {
            /* last_good is where we will ultimately write the 0 byte */
            last_good = dst;
        }
    }

    *last_good = '\0';
}

◆ ast_utf8_init()

int ast_utf8_init (void)

Register UTF-8 tests.

Since
13.36.0, 16.13.0, 17.7.0, 18.0.0

Does nothing unless TEST_FRAMEWORK is defined.

Returns
Always returns 0

Definition at line 362 of file utf8.c.

References ast_register_cleanup(), AST_TEST_REGISTER, and test_utf8_shutdown().

Referenced by asterisk_daemon().

int ast_utf8_init(void)
{
    AST_TEST_REGISTER(test_utf8_is_valid);
    AST_TEST_REGISTER(test_utf8_copy_string);
    AST_TEST_REGISTER(test_utf8_validator);

    ast_register_cleanup(test_utf8_shutdown);

    return 0;
}

◆ ast_utf8_is_valid()

int ast_utf8_is_valid (const char *str)

Check if a zero-terminated string is valid UTF-8.

Since
13.36.0, 16.13.0, 17.7.0, 18.0.0
Parameters
    str         The zero-terminated string to check
Return values
    0           if the string is not valid UTF-8
    Non-zero    if the string is valid UTF-8

Definition at line 110 of file utf8.c.

References decode(), and UTF8_ACCEPT.

Referenced by AST_TEST_DEFINE().

int ast_utf8_is_valid(const char *src)
{
    uint32_t state = UTF8_ACCEPT;

    while (*src) {
        decode(&state, (uint8_t) *src++);
    }

    return state == UTF8_ACCEPT;
}

◆ ast_utf8_is_validn()

int ast_utf8_is_validn (const char *str, size_t size)

Check if the first size bytes of a string are valid UTF-8.

Since
13.36.0, 16.13.0, 17.7.0, 18.0.0

Similar to ast_utf8_is_valid() but checks the first size bytes or until a zero byte is reached, whichever comes first.

Parameters
    str         The string to check
    size        The number of bytes to evaluate
Return values
    0           if the string is not valid UTF-8
    Non-zero    if the string is valid UTF-8
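
Two consequences of the "size bytes or a zero byte, whichever comes first" rule are worth illustrating: a size that cuts a multibyte character in half makes the prefix invalid, and bytes after an embedded zero byte are never examined. The sketch below uses a hypothetical demo_is_validn() with a simplified byte-range checker in place of decode(); it is an illustration under those assumptions, not the Asterisk code.

```c
#include <stddef.h>

/* Hypothetical decoder state: pending continuation bytes and their allowed range. */
struct demo_state { int need; unsigned char lo, hi; };

/* Returns 0 if byte b is acceptable in state s, -1 if it is illegal. */
static int demo_step(struct demo_state *s, unsigned char b)
{
    if (s->need > 0) {
        if (b < s->lo || b > s->hi) return -1;
        s->lo = 0x80; s->hi = 0xBF; s->need--;
        return 0;
    }
    if (b <= 0x7F) return 0;
    if (b >= 0xC2 && b <= 0xDF) { s->need = 1; return 0; }
    if (b >= 0xE0 && b <= 0xEF) {
        s->need = 2;
        if (b == 0xE0) s->lo = 0xA0;   /* excludes overlong encodings */
        if (b == 0xED) s->hi = 0x9F;   /* excludes UTF-16 surrogates */
        return 0;
    }
    if (b >= 0xF0 && b <= 0xF4) {
        s->need = 3;
        if (b == 0xF0) s->lo = 0x90;   /* excludes overlong encodings */
        if (b == 0xF4) s->hi = 0x8F;   /* excludes code points above U+10FFFF */
        return 0;
    }
    return -1;
}

/* Non-zero if the examined prefix of str is complete, valid UTF-8. */
static int demo_is_validn(const char *str, size_t size)
{
    struct demo_state s = { 0, 0x80, 0xBF };

    /* Stop at size bytes or at the first zero byte, whichever comes first. */
    while (size && *str) {
        if (demo_step(&s, (unsigned char) *str++) != 0) {
            return 0;        /* illegal byte */
        }
        size--;
    }
    return s.need == 0;      /* a half-finished character is not valid */
}
```

In this sketch, demo_is_validn("\xE2\x82\xAC", 3) is non-zero, while the same string with a size of 2 yields 0 because the character is incomplete. For "abc\0\xFF" with a size of 5, the result is non-zero: the scan stops at the zero byte and never reaches 0xFF.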

Definition at line 121 of file utf8.c.

References decode(), and UTF8_ACCEPT.

Referenced by AST_TEST_DEFINE().

int ast_utf8_is_validn(const char *src, size_t size)
{
    uint32_t state = UTF8_ACCEPT;

    while (size && *src) {
        decode(&state, (uint8_t) *src++);
        size--;
    }

    return state == UTF8_ACCEPT;
}

◆ ast_utf8_validator_destroy()

void ast_utf8_validator_destroy (struct ast_utf8_validator *validator)

Destroy a UTF-8 validator.

Since
13.36.0, 16.13.0, 17.7.0, 18.0.0
Parameters
    validator    The validator instance to destroy

Definition at line 215 of file utf8.c.

References ast_free.

Referenced by AST_TEST_DEFINE().

void ast_utf8_validator_destroy(struct ast_utf8_validator *validator)
{
    ast_free(validator);
}

◆ ast_utf8_validator_feed()

enum ast_utf8_validation_result ast_utf8_validator_feed (struct ast_utf8_validator *validator, const char *data)

Feed a zero-terminated string into the UTF-8 validator.

Since
13.36.0, 16.13.0, 17.7.0, 18.0.0
Parameters
    validator    The validator instance
    data         The zero-terminated string to feed into the validator
Returns
    The ast_utf8_validation_result indicating the current state of the validator.

Definition at line 189 of file utf8.c.

References ast_utf8_validator_state(), decode(), and ast_utf8_validator::state.

Referenced by AST_TEST_DEFINE().

enum ast_utf8_validation_result ast_utf8_validator_feed(
    struct ast_utf8_validator *validator, const char *data)
{
    while (*data) {
        decode(&validator->state, (uint8_t) *data++);
    }

    return ast_utf8_validator_state(validator);
}

◆ ast_utf8_validator_feedn()

enum ast_utf8_validation_result ast_utf8_validator_feedn (struct ast_utf8_validator *validator, const char *data, size_t size)

Feed a string into the UTF-8 validator.

Since
13.36.0, 16.13.0, 17.7.0, 18.0.0

Similar to ast_utf8_validator_feed but will stop feeding in data if a zero byte is encountered or size bytes have been read.

Parameters
    validator    The validator instance
    data         The string to feed into the validator
    size         The number of bytes to feed into the validator
Returns
    The ast_utf8_validation_result indicating the current state of the validator.

Definition at line 199 of file utf8.c.

References ast_utf8_validator_state(), decode(), and ast_utf8_validator::state.

enum ast_utf8_validation_result ast_utf8_validator_feedn(
    struct ast_utf8_validator *validator, const char *data, size_t size)
{
    while (size && *data) {
        decode(&validator->state, (uint8_t) *data++);
        size--;
    }

    return ast_utf8_validator_state(validator);
}

◆ ast_utf8_validator_new()

int ast_utf8_validator_new (struct ast_utf8_validator **validator)

Create a new UTF-8 validator.

Since
13.36.0, 16.13.0, 17.7.0, 18.0.0
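Parameters
    [out] validator    The validator instance
Return values
    0     on success
    -1    on failure (the listing below returns 1 when allocation fails, so callers should treat any non-zero value as failure)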
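
The out-parameter style used here can be sketched as follows. This is an illustration only: the demo_* names are hypothetical, plain malloc() and free() stand in for ast_malloc and ast_free, and the state field is a placeholder for the decoder state rather than the real layout of struct ast_utf8_validator.

```c
#include <stdlib.h>

#define DEMO_ACCEPT 0u   /* hypothetical initial state value */

struct demo_validator { unsigned int state; };

/* Allocate a validator and hand it back through the out-parameter. */
static int demo_validator_new(struct demo_validator **validator)
{
    struct demo_validator *tmp = malloc(sizeof(*tmp));

    if (!tmp) {
        return -1;            /* *validator is left untouched on failure */
    }
    tmp->state = DEMO_ACCEPT;
    *validator = tmp;
    return 0;
}

/* Return the validator to its initial state so it can be reused. */
static void demo_validator_reset(struct demo_validator *validator)
{
    validator->state = DEMO_ACCEPT;
}

/* Release the validator. */
static void demo_validator_destroy(struct demo_validator *validator)
{
    free(validator);
}

/* Caller-side pattern: create, use, reset for reuse, destroy. */
static int demo_lifecycle(void)
{
    struct demo_validator *v = NULL;
    int ok;

    if (demo_validator_new(&v) != 0) {   /* test for any non-zero result */
        return -1;
    }
    v->state = 42;                       /* pretend some bytes were fed */
    demo_validator_reset(v);             /* ready for the next input */
    ok = (v->state == DEMO_ACCEPT);
    demo_validator_destroy(v);
    return ok ? 0 : -1;
}
```

Checking for any non-zero result, rather than for one specific failure value, keeps the caller correct whichever failure code the allocator path returns.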

Definition at line 163 of file utf8.c.

References ast_malloc, ast_utf8_validator::state, and UTF8_ACCEPT.

Referenced by AST_TEST_DEFINE().

int ast_utf8_validator_new(struct ast_utf8_validator **validator)
{
    struct ast_utf8_validator *tmp = ast_malloc(sizeof(*tmp));

    if (!tmp) {
        return 1;
    }

    tmp->state = UTF8_ACCEPT;
    *validator = tmp;
    return 0;
}

◆ ast_utf8_validator_reset()

void ast_utf8_validator_reset (struct ast_utf8_validator *validator)

Reset the state of a UTF-8 validator.

Since
13.36.0, 16.13.0, 17.7.0, 18.0.0

Resets the provided UTF-8 validator to its initial state so that it can be reused.

Parameters
    validator    The validator instance to reset

Definition at line 210 of file utf8.c.

References ast_utf8_validator::state, and UTF8_ACCEPT.

void ast_utf8_validator_reset(struct ast_utf8_validator *validator)
{
    validator->state = UTF8_ACCEPT;
}

◆ ast_utf8_validator_state()

enum ast_utf8_validation_result ast_utf8_validator_state (struct ast_utf8_validator *validator)

Get the current UTF-8 validator state.

Since
13.36.0, 16.13.0, 17.7.0, 18.0.0
Parameters
    validator    The validator instance
Returns
    The ast_utf8_validation_result indicating the current state of the validator.

Definition at line 176 of file utf8.c.

References AST_UTF8_INVALID, AST_UTF8_UNKNOWN, AST_UTF8_VALID, ast_utf8_validator::state, UTF8_ACCEPT, and UTF8_REJECT.

Referenced by ast_utf8_validator_feed(), and ast_utf8_validator_feedn().

enum ast_utf8_validation_result ast_utf8_validator_state(
    struct ast_utf8_validator *validator)
{
    switch (validator->state) {
    case UTF8_ACCEPT:
        return AST_UTF8_VALID;
    case UTF8_REJECT:
        return AST_UTF8_INVALID;
    default:
        return AST_UTF8_UNKNOWN;
    }
}