More tvbuff API deprecation, comment expansion, and documentation updates.

Do with tvb_get_stringz() what was done with tvb_get_string().

Redo the comments for the string get routines to give more detail in a form that is easier to read.

Warn, in comments, of the problems with using tvb_get_string()/tvb_get_stringz() (i.e., if your strings are non-ASCII, all bytes with the 8th bit set are going to be replaced by the Unicode REPLACEMENT CHARACTER, and displayed as such).

Warn, in a comment, of the problems with tvb_get_const_stringz() (i.e., it gives you raw bytes, rather than guaranteed-to-be-valid UTF-8).

Update documentation and release notes appropriately.

Change-Id: Ibd3efb92a203861f507ce71bc8d04d19d9d38a93
Reviewed-on: https://code.wireshark.org/review/327
Reviewed-by: Guy Harris <guy@alum.mit.edu>
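
As a rough illustration of the warning about tvb_get_string()/tvb_get_stringz(), the standalone sketch below mimics the documented effect of ENC_ASCII: every byte with the 8th bit set becomes the Unicode REPLACEMENT CHARACTER (U+FFFD, which is EF BF BD in UTF-8). The helper name ascii_to_utf8_lossy is hypothetical and this is not Wireshark's implementation; it uses plain malloc() rather than a wmem scope.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, NOT part of Wireshark: copy len bytes from src,
 * replacing every byte that has the 8th bit set with U+FFFD encoded
 * as UTF-8 (EF BF BD), and append a trailing '\0'.
 * The caller frees the result with free().
 */
static char *ascii_to_utf8_lossy(const unsigned char *src, size_t len)
{
    char *out;
    size_t i, o = 0;

    assert(src != NULL || len == 0);

    /* Worst case: every input byte expands to 3 output bytes. */
    out = malloc(len * 3 + 1);
    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            /* 7-bit ASCII passes through unchanged. */
            out[o++] = (char)src[i];
        } else {
            /* Anything else is lost and shown as U+FFFD. */
            out[o++] = (char)0xEF;
            out[o++] = (char)0xBF;
            out[o++] = (char)0xBD;
        }
    }
    out[o] = '\0';
    return out;
}
```

Given the ISO 8859-1 bytes 63 61 66 E9 ("café"), this sketch produces "caf" followed by EF BF BD: the accented character is gone. That loss is the reason to pass the string's real encoding to the _enc routines instead of relying on the ENC_ASCII macros.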
author: Guy Harris <guy@alum.mit.edu> 2014-02-23 14:16:24 -0800
committer: Guy Harris <guy@alum.mit.edu> 2014-02-26 22:04:08 +0000
commit: 8d234a0d8c6c974a374e36a58cd7b3d699866464 (patch)
tree: 9306a7d3b231b5134e1773de238b89fb6dfe9cc8 /epan/tvbuff.h
parent: 1dff4e309d036e23c316f2cf9a6d05d5a4449ff2 (diff)
download: wireshark-8d234a0d8c6c974a374e36a58cd7b3d699866464.tar.gz
1 files changed, 80 insertions, 33 deletions
diff --git a/epan/tvbuff.h b/epan/tvbuff.h
index 9e15612105..d1eb66216e 100644
--- a/epan/tvbuff.h
+++ b/epan/tvbuff.h
@@ -474,17 +474,21 @@ extern gchar *tvb_format_stringzpad(tvbuff_t *tvb, const gint offset,
 extern gchar *tvb_format_stringzpad_wsp(tvbuff_t *tvb, const gint offset,
     const gint size);
 
-
 /**
- * Given a tvbuff, an offset, and a length, allocate a buffer big enough
- * to hold a string of length characters plus a trailing '\0'. Copy length
- * characters, starting at offset, from the tvbuff into the buffer and return
- * a pointer to the buffer.
+ * Given an allocator scope, a tvbuff, a byte offset, a byte length, and
+ * a string encoding, with the specified offset and length referring to
+ * a string in the specified encoding:
  *
- * Throws an exception if the tvbuff ends before the string does.
+ *    allocate a buffer using the specified scope;
+ *
+ *    convert the string from the specified encoding to UTF-8, possibly
+ *    mapping some characters or invalid octet sequences to the Unicode
+ *    REPLACEMENT CHARACTER, and put the resulting UTF-8 string, plus a
+ *    trailing '\0', into that buffer;
  *
- * Takes a string encoding as well, and converts to UTF-8 from the encoding,
- * possibly mapping some characters to the Unicode REPLACEMENT CHARACTER.
+ *    and return a pointer to the buffer.
+ *
+ * Throws an exception if the tvbuff ends before the string does.
  *
  * If scope is set to NULL it is the user's responsibility to wmem_free()
  * the memory allocated. Otherwise memory is automatically freed when the scope
@@ -493,16 +497,31 @@ extern gchar *tvb_format_stringzpad_wsp(tvbuff_t *tvb, const gint offset,
 WS_DLL_PUBLIC guint8 *tvb_get_string_enc(wmem_allocator_t *scope,
     tvbuff_t *tvb, const gint offset, const gint length, const guint encoding);
 
-/* DEPRECATED, do not use in new code, call tvb_get_string_enc directly! */
+/*
+ * DEPRECATED, do not use in new code, call tvb_get_string_enc directly with
+ * the appropriate encoding!  Do not assume that ENC_ASCII will work
+ * with arbitrary string encodings; it will map all bytes with the 8th
+ * bit set to the Unicode REPLACEMENT CHARACTER, so it won't show non-ASCII
+ * characters as anything other than an ugly blob.
+ */
 #define tvb_get_string(SCOPE, TVB, OFFSET, LENGTH) \
     tvb_get_string_enc(SCOPE, TVB, OFFSET, LENGTH, ENC_ASCII)
 
 /**
- * Given a tvbuff, a bit offset, and a number of characters, allocate
- * a buffer big enough to hold a non-null-terminated string of no_of_chars
- * encoded according to 3GPP TS 23.038 7bits encoding at that offset,
- * plus a trailing zero, copy the string into it, and return a pointer
- * to the string.
+ * Given an allocator scope, a tvbuff, a bit offset, and a length in
+ * 7-bit characters (not octets!), with the specified offset and
+ * length referring to a string in the 3GPP TS 23.038 7bits encoding:
+ *
+ *    allocate a buffer using the specified scope;
+ *
+ *    convert the string from the specified encoding to UTF-8, possibly
+ *    mapping some characters or invalid octet sequences to the Unicode
+ *    REPLACEMENT CHARACTER, and put the resulting UTF-8 string, plus a
+ *    trailing '\0', into that buffer;
+ *
+ *    and return a pointer to the buffer.
+ *
+ * Throws an exception if the tvbuff ends before the string does.
  *
  * If scope is set to NULL it is the user's responsibility to g_free()
  * the memory allocated by tvb_memdup(). Otherwise memory is
@@ -512,6 +531,45 @@ WS_DLL_PUBLIC gchar *tvb_get_ts_23_038_7bits_string(wmem_allocator_t *scope,
     tvbuff_t *tvb, const gint bit_offset, gint no_of_chars);
 
 /**
+ * Given an allocator scope, a tvbuff, a byte offset, a pointer to a
+ * gint, and a string encoding, with the specified offset referring to
+ * a null-terminated string in the specified encoding:
+ *
+ *    find the length of that string (and throw an exception if the tvbuff
+ *    ends before we find the null);
+ *
+ *    allocate a buffer using the specified scope;
+ *
+ *    convert the string from the specified encoding to UTF-8, possibly
+ *    mapping some characters or invalid octet sequences to the Unicode
+ *    REPLACEMENT CHARACTER, and put the resulting UTF-8 string, plus a
+ *    trailing '\0', into that buffer;
+ *
+ *    if the pointer to the gint is non-null, set the gint to which it
+ *    points to the length of the string;
+ *
+ *    and return a pointer to the buffer.
+ *
+ * Throws an exception if the tvbuff ends before the string does.
+ *
+ * If scope is set to NULL it is the user's responsibility to wmem_free()
+ * the memory allocated. Otherwise memory is automatically freed when the scope
+ * lifetime is reached.
+ */
+WS_DLL_PUBLIC guint8 *tvb_get_stringz_enc(wmem_allocator_t *scope,
+    tvbuff_t *tvb, const gint offset, gint *lengthp, const guint encoding);
+
+/*
+ * DEPRECATED, do not use in new code, call tvb_get_stringz_enc directly with
+ * the appropriate encoding!  Do not assume that ENC_ASCII will work
+ * with arbitrary string encodings; it will map all bytes with the 8th
+ * bit set to the Unicode REPLACEMENT CHARACTER, so it won't show non-ASCII
+ * characters as anything other than an ugly blob.
+ */
+#define tvb_get_stringz(SCOPE, TVB, OFFSET, LENGTHP) \
+    tvb_get_stringz_enc(SCOPE, TVB, OFFSET, LENGTHP, ENC_ASCII)
+
+/**
  * Given a tvbuff and an offset, with the offset assumed to refer to
  * a null-terminated string, find the length of that string (and throw
  * an exception if the tvbuff ends before we find the null), allocate
@@ -519,27 +577,16 @@ WS_DLL_PUBLIC gchar *tvb_get_ts_23_038_7bits_string(wmem_allocator_t *scope,
  * and return a pointer to the string.  Also return the length of the
  * string (including the terminating null) through a pointer.
  *
- * tvb_get_stringz() handles 7-bit ASCII strings, with characters
- *                   with the 8th bit set are converted to the
- *                   Unicode REPLACEMENT CHARACTER.
- *
- * tvb_get_stringz_enc() takes a string encoding as well, and converts to UTF-8
- *                   from the encoding, possibly mapping some characters
- *                   to the REPLACEMENT CHARACTER.
- *
- * tvb_get_const_stringz() returns a constant (unmodifiable) string that does
- *                   not need to be freed, instead it will automatically be
- *                   freed once the next packet is dissected.  It is slightly
- *                   more efficient than the other routines.
+ * This returns a constant (unmodifiable) string that does not need
+ * to be freed; instead, it will automatically be freed once the next
+ * packet is dissected.
  *
- * If scope is set to NULL it is the user's responsibility to g_free()
- * the memory allocated by tvb_memdup(). Otherwise memory is
- * automatically freed when the scope lifetime is reached.
+ * It is slightly more efficient than the other routines, but does *NOT*
+ * do any translation to UTF-8 - the string consists of the raw octets
+ * of the string, in whatever encoding they happen to be in, and, if
+ * the string is not valid in that encoding, with invalid octet sequences
+ * as they are in the packet.
  */
-WS_DLL_PUBLIC guint8 *tvb_get_stringz(wmem_allocator_t *scope, tvbuff_t *tvb,
-    const gint offset, gint *lengthp);
-WS_DLL_PUBLIC guint8 *tvb_get_stringz_enc(wmem_allocator_t *scope,
-    tvbuff_t *tvb, const gint offset, gint *lengthp, const guint encoding);
 WS_DLL_PUBLIC const guint8 *tvb_get_const_stringz(tvbuff_t *tvb,
     const gint offset, gint *lengthp);
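
The null-terminated lookup described in the tvb_get_const_stringz() comment can be sketched outside Wireshark as a bounds-checked scan. The function name bounded_stringz_len is hypothetical, and returning -1 stands in for the exception the tvbuff routines throw when the buffer ends before the null. As in that comment, the length counts the terminating null, and the bytes are left exactly as they appear in the packet, with no translation to UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a bounds-checked "stringz" lookup, NOT
 * Wireshark code.  Returns the length of the null-terminated string
 * starting at offset, INCLUDING the terminating null, or -1 if the
 * buffer ends before a null is found.
 */
static long bounded_stringz_len(const unsigned char *buf, size_t buf_len,
    size_t offset)
{
    const unsigned char *start, *nul;

    assert(buf != NULL);

    if (offset >= buf_len)
        return -1;

    start = buf + offset;
    /* Search only the bytes that are actually in the buffer. */
    nul = memchr(start, '\0', buf_len - offset);
    if (nul == NULL)
        return -1;

    return (long)(nul - start) + 1;
}
```

A caller that gets a raw pointer this way still has to validate or convert the bytes before displaying them, since nothing here guarantees they are valid UTF-8.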