📝 add PlantUML

Niels Lohmann 2020-05-24 21:05:35 +02:00
parent 3400af21cd
commit ddf92606ab
GPG key ID: 7F3CEA63AE251B69
6 changed files with 323 additions and 9 deletions

@ -2,10 +2,10 @@
Though JSON is a ubiquitous data format, it is not a very compact format suitable for data exchange, for instance over a network. Hence, the library supports
- [BSON](bson) (Binary JSON),
- [CBOR](cbor) (Concise Binary Object Representation),
- [MessagePack](messagepack), and
- [UBJSON](ubjson) (Universal Binary JSON Specification)
- [BSON](bson.md) (Binary JSON),
- [CBOR](cbor.md) (Concise Binary Object Representation),
- [MessagePack](messagepack.md), and
- [UBJSON](ubjson.md) (Universal Binary JSON Specification)
to efficiently encode JSON values to byte vectors and to decode such vectors.

@ -1,11 +1,24 @@
# Binary Values
The library implements several [binary formats](binary_formats/index) that encode JSON in an efficient way. Most of these formats support binary values; that is, values that have semantics define outside the library and only define a sequence of bytes to be stored.
The library implements several [binary formats](binary_formats/index.md) that encode JSON in an efficient way. Most of these formats support binary values, that is, values whose semantics are defined outside the library and which are stored as a plain sequence of bytes.
JSON itself does not have a binary value. As such, binary values are an extension that this library implements to store values received by a binary format. Binary values are never created by the JSON parser, and are only part of a serialized JSON text if they have been created manually or via a binary format.
## API for binary values
```plantuml
class json::binary_t {
-- setters --
+void set_subtype(std::uint8_t subtype)
+void clear_subtype()
-- getters --
+std::uint8_t subtype() const
+bool has_subtype() const
}
"std::vector<uint8_t>" <|-- json::binary_t
```
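Read as C++, the diagram describes a byte vector that additionally carries an optional 8-bit subtype. The sketch below is a hypothetical illustration of that shape: `byte_container_sketch` is an invented name, not the library's `byte_container_with_subtype`, and only the four members shown in the diagram are modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for json::binary_t: a byte vector plus an optional subtype.
class byte_container_sketch : public std::vector<std::uint8_t> {
  public:
    // Reuse all std::vector constructors (size, fill, initializer list, ...).
    using std::vector<std::uint8_t>::vector;

    // Store a subtype and remember that one is present.
    void set_subtype(std::uint8_t value) {
        m_subtype = value;
        m_has_subtype = true;
    }

    // Forget the subtype again.
    void clear_subtype() {
        m_subtype = 0;
        m_has_subtype = false;
    }

    std::uint8_t subtype() const { return m_subtype; }
    bool has_subtype() const { return m_has_subtype; }

  private:
    std::uint8_t m_subtype = 0;
    bool m_has_subtype = false;
};
```

Deriving publicly from `std::vector` here simply mirrors the inheritance arrow in the diagram; a standalone design might prefer composition.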
By default, binary values are stored as `std::vector<std::uint8_t>`. This type can be changed by providing a template parameter to the `basic_json` type. To store binary subtypes, the storage type is extended and exposed as `json::binary_t`:
```cpp
@ -105,7 +118,7 @@ JSON does not have a binary type, and this library does not introduce a new type
### BSON
[BSON](binary_formats/bson) supports binary values and subtypes. If a subtype is given, it is used and added as unsigned 8-bit integer. If no subtype is given, the generic binary subtype 0x00 is used.
[BSON](binary_formats/bson.md) supports binary values and subtypes. If a subtype is given, it is used and added as an unsigned 8-bit integer. If no subtype is given, the generic binary subtype 0x00 is used.
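As an illustration of this rule, the following sketch assembles the value part of a BSON binary element (int32 length, subtype byte, payload) as laid out in the BSON specification. `bson_binary_payload` is a hypothetical helper, not part of the library, and the element type byte and field name that precede the value are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode the value part of a BSON binary element:
// int32 byte count (little-endian), one subtype byte, then the payload bytes.
// Without an explicit subtype, the generic binary subtype 0x00 is written.
std::vector<std::uint8_t> bson_binary_payload(const std::vector<std::uint8_t>& bytes,
                                              bool has_subtype = false,
                                              std::uint8_t subtype = 0x00) {
    const auto n = static_cast<std::uint32_t>(bytes.size());
    std::vector<std::uint8_t> out;
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>((n >> shift) & 0xFFu));
    }
    out.push_back(has_subtype ? subtype : std::uint8_t{0x00});
    out.insert(out.end(), bytes.begin(), bytes.end());
    return out;
}
```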
!!! example
@ -145,7 +158,7 @@ JSON does not have a binary type, and this library does not introduce a new type
### CBOR
[CBOR](binary_formats/cbor) supports binary values, but no subtypes. Any binary value will be serialized as byte strings. The library will choose the smallest representation using the length of the byte array.
[CBOR](binary_formats/cbor.md) supports binary values, but no subtypes. Any binary value will be serialized as a byte string. The library will choose the smallest representation using the length of the byte array.
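The "smallest representation" refers to CBOR's variable-length header for major type 2 (byte strings). A minimal sketch of that header selection, following the CBOR specification (RFC 7049) and using a hypothetical `cbor_byte_string` helper rather than the library's writer:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode a byte string as CBOR major type 2, choosing the shortest length encoding.
std::vector<std::uint8_t> cbor_byte_string(const std::vector<std::uint8_t>& bytes) {
    const std::uint64_t n = bytes.size();
    std::vector<std::uint8_t> out;
    // Append n as a big-endian integer that is `width` bytes wide.
    auto append_be = [&out, n](int width) {
        for (int i = width - 1; i >= 0; --i) {
            out.push_back(static_cast<std::uint8_t>((n >> (8 * i)) & 0xFF));
        }
    };
    if (n < 24) {
        out.push_back(static_cast<std::uint8_t>(0x40 + n));  // length fits in the initial byte
    } else if (n <= 0xFF) {
        out.push_back(0x58);  // 1-byte length follows
        append_be(1);
    } else if (n <= 0xFFFF) {
        out.push_back(0x59);  // 2-byte length follows
        append_be(2);
    } else if (n <= 0xFFFFFFFF) {
        out.push_back(0x5A);  // 4-byte length follows
        append_be(4);
    } else {
        out.push_back(0x5B);  // 8-byte length follows
        append_be(8);
    }
    out.insert(out.end(), bytes.begin(), bytes.end());
    return out;
}
```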
!!! example
@ -183,7 +196,7 @@ JSON does not have a binary type, and this library does not introduce a new type
### MessagePack
[MessagePack](binary_formats/messagepack) supports binary values and subtypes. If a subtype is given, the ext family is used. The library will choose the smallest representation among fixext1, fixext2, fixext4, fixext8, ext8, ext16, and ext32. The subtype is then added as singed 8-bit integer.
[MessagePack](binary_formats/messagepack.md) supports binary values and subtypes. If a subtype is given, the ext family is used. The library will choose the smallest representation among fixext1, fixext2, fixext4, fixext8, ext8, ext16, and ext32. The subtype is then added as a signed 8-bit integer.
If no subtype is given, the bin family (bin8, bin16, bin32) is used.
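The size-based choice can be sketched as a function returning the MessagePack format byte. `msgpack_binary_format` is hypothetical: it assumes sizes fit into 32 bits, covers only the formats named above, and leaves out writing the length field, the subtype byte, and the payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pick the MessagePack format byte for a binary value of n bytes,
// restricted to the formats listed above. Assumes n fits into 32 bits.
std::uint8_t msgpack_binary_format(std::size_t n, bool has_subtype) {
    if (has_subtype) {
        switch (n) {
            case 1: return 0xD4;  // fixext1
            case 2: return 0xD5;  // fixext2
            case 4: return 0xD6;  // fixext4
            case 8: return 0xD7;  // fixext8
            default: break;
        }
        if (n <= 0xFF) return 0xC7;    // ext8
        if (n <= 0xFFFF) return 0xC8;  // ext16
        return 0xC9;                   // ext32
    }
    if (n <= 0xFF) return 0xC4;    // bin8
    if (n <= 0xFFFF) return 0xC5;  // bin16
    return 0xC6;                   // bin32
}
```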
@ -224,7 +237,7 @@ If no subtype is given, the bin family (bin8, bin16, bin32) is used.
### UBJSON
[UBJSON](binary_formats/ubjson) neither supports binary values nor subtypes, and proposes to serialize binary values as array of uint8 values. This translation is implemented by the library.
[UBJSON](binary_formats/ubjson.md) supports neither binary values nor subtypes, and proposes to serialize binary values as an array of uint8 values. This translation is implemented by the library.
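A minimal sketch of this translation in UBJSON's plain container form, where every byte becomes a `U` (uint8) marker followed by the value. `ubjson_uint8_array` is a hypothetical helper; UBJSON also defines an optimized container form with `$` (type) and `#` (count) markers, which this sketch does not use.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Serialize bytes as a plain UBJSON array of uint8 values:
// '[' then 'U' <byte> for each element, then ']'.
std::vector<std::uint8_t> ubjson_uint8_array(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint8_t> out{'['};
    for (std::uint8_t b : bytes) {
        out.push_back('U');  // uint8 type marker
        out.push_back(b);
    }
    out.push_back(']');
    return out;
}
```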
!!! example

@ -2,6 +2,29 @@
The library uses a SAX-like interface with the following functions:
```plantuml
class sax {
+ {abstract} bool null()
+ {abstract} bool boolean(bool val)
+ {abstract} bool number_integer(number_integer_t val)
+ {abstract} bool number_unsigned(number_unsigned_t val)
+ {abstract} bool number_float(number_float_t val, const string_t& s)
+ {abstract} bool string(string_t& val)
+ {abstract} bool start_object(std::size_t elements)
+ {abstract} bool end_object()
+ {abstract} bool start_array(std::size_t elements)
+ {abstract} bool end_array()
+ {abstract} bool key(string_t& val)
+ {abstract} bool parse_error(std::size_t position, const std::string& last_token, const detail::exception& ex)
}
```
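To show how such an interface is consumed, here is a hypothetical, reduced version that uses standard types in place of the library's typedefs (`sax_sketch`, `event_counter`, and `replay_example` are illustrative names). A handler returns `true` to continue; the replayed events are the ones one would expect for the JSON text `[null, -1, "x"]`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, reduced SAX interface modeled on the diagram above. std types replace the
// library's typedefs; booleans, other number kinds, objects, and parse_error are omitted.
struct sax_sketch {
    virtual ~sax_sketch() = default;
    virtual bool null() = 0;
    virtual bool number_integer(std::int64_t val) = 0;
    virtual bool string(std::string& val) = 0;
    virtual bool start_array(std::size_t elements) = 0;
    virtual bool end_array() = 0;
};

// A handler that only counts events; returning false would ask the parser to stop.
struct event_counter : sax_sketch {
    std::size_t events = 0;
    bool null() override { ++events; return true; }
    bool number_integer(std::int64_t) override { ++events; return true; }
    bool string(std::string&) override { ++events; return true; }
    bool start_array(std::size_t) override { ++events; return true; }
    bool end_array() override { ++events; return true; }
};

// Feed the events one would expect for the JSON text [null, -1, "x"].
bool replay_example(sax_sketch& handler) {
    std::string text = "x";
    const auto unknown_size = static_cast<std::size_t>(-1);  // element count not known in advance
    return handler.start_array(unknown_size) && handler.null() &&
           handler.number_integer(-1) && handler.string(text) && handler.end_array();
}
```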
```cpp
// called when null is parsed
bool null();

@ -0,0 +1,264 @@
# Types
This page gives an overview of how JSON values are stored and how this can be configured.
## Overview
By default, JSON values are stored as follows:
| JSON type | C++ type |
| --------- | -------- |
| object | `std::map<std::string, basic_json>` |
| array | `std::vector<basic_json>` |
| null | `std::nullptr_t` |
| string | `std::string` |
| boolean | `bool` |
| number | `std::int64_t`, `std::uint64_t`, and `double` |
Note that there are three different types for numbers; when parsing JSON text, the best-fitting type is chosen.
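The "best fitting" rule can be sketched as follows, assuming (as we understand the parser) that integers without a minus sign are tried as unsigned, negative integers as signed, and that a fraction, an exponent, or a 64-bit overflow falls back to floating point. `classify_number` is a hypothetical helper and expects a syntactically valid JSON number.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>

enum class number_kind { integer, unsigned_integer, floating_point };

// Hypothetical "best fit" classifier for a syntactically valid JSON number token.
// Assumes long long is 64 bits wide, matching std::int64_t / std::uint64_t.
number_kind classify_number(const std::string& token) {
    // A fraction or an exponent always means floating point.
    if (token.find_first_of(".eE") != std::string::npos) {
        return number_kind::floating_point;
    }
    errno = 0;
    if (token[0] == '-') {
        static_cast<void>(std::strtoll(token.c_str(), nullptr, 10));  // only errno matters here
        // Negative integers that fit into 64 signed bits stay integers.
        return errno == ERANGE ? number_kind::floating_point : number_kind::integer;
    }
    static_cast<void>(std::strtoull(token.c_str(), nullptr, 10));
    // Non-negative integers that fit into 64 unsigned bits are stored as unsigned.
    return errno == ERANGE ? number_kind::floating_point : number_kind::unsigned_integer;
}
```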
```plantuml
enum value_t {
null
object
array
string
boolean
number_integer
number_unsigned
number_float
binary
discarded
}
class json_value << (U,orchid) >> {
object_t* object
array_t* array
string_t* string
binary_t* binary
boolean_t boolean
number_integer_t number_integer
number_unsigned_t number_unsigned
number_float_t number_float
}
class basic_json {
value_t m_type
json_value m_value
+ <u>typedef</u> object_t
+ <u>typedef</u> array_t
+ <u>typedef</u> binary_t
+ <u>typedef</u> boolean_t
+ <u>typedef</u> number_integer_t
+ <u>typedef</u> number_unsigned_t
+ <u>typedef</u> number_float_t
}
basic_json .. json_value
basic_json .. value_t
```
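The diagram amounts to a tagged union: `m_type` records which member of `m_value` is active, large values live behind pointers, and scalars are stored inline. A heavily simplified, hypothetical sketch with only null, string, and signed-integer values (the `_sketch` suffix marks the names as illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified type tag; the real value_t has more enumerators (see the diagram).
enum class value_t_sketch : std::uint8_t { null, string, number_integer };

// Large values are held through a pointer, scalars are stored inline.
union json_value_sketch {
    std::string* string;
    std::int64_t number_integer;
};

class value_sketch {
  public:
    value_sketch() = default;  // null value
    explicit value_sketch(std::int64_t i) : m_type(value_t_sketch::number_integer) {
        m_value.number_integer = i;
    }
    explicit value_sketch(const std::string& s) : m_type(value_t_sketch::string) {
        m_value.string = new std::string(s);  // heap allocation owned by this object
    }
    // Copying would need a deep copy of the string; disabled to keep the sketch short.
    value_sketch(const value_sketch&) = delete;
    value_sketch& operator=(const value_sketch&) = delete;
    ~value_sketch() {
        if (m_type == value_t_sketch::string) {
            delete m_value.string;
        }
    }

    value_t_sketch type() const { return m_type; }
    // Reading a string means dereferencing the stored string pointer.
    const std::string& get_string() const { return *m_value.string; }
    std::int64_t get_integer() const { return m_value.number_integer; }

  private:
    value_t_sketch m_type = value_t_sketch::null;
    json_value_sketch m_value{};  // value-initialized: the string pointer starts as nullptr
};
```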
## Template arguments
The data types to store a JSON value are derived from the template arguments passed to class `basic_json`:
```cpp
template<
template<typename U, typename V, typename... Args> class ObjectType = std::map,
template<typename U, typename... Args> class ArrayType = std::vector,
class StringType = std::string,
class BooleanType = bool,
class NumberIntegerType = std::int64_t,
class NumberUnsignedType = std::uint64_t,
class NumberFloatType = double,
template<typename U> class AllocatorType = std::allocator,
template<typename T, typename SFINAE = void> class JSONSerializer = adl_serializer,
class BinaryType = std::vector<std::uint8_t>
>
class basic_json;
```
Type `json` is an alias for `basic_json<>` and uses the default types.
From the template arguments, the following types are derived:
```cpp
using object_comparator_t = std::less<>;
using object_t = ObjectType<StringType, basic_json, object_comparator_t,
AllocatorType<std::pair<const StringType, basic_json>>>;
using array_t = ArrayType<basic_json, AllocatorType<basic_json>>;
using string_t = StringType;
using boolean_t = BooleanType;
using number_integer_t = NumberIntegerType;
using number_unsigned_t = NumberUnsignedType;
using number_float_t = NumberFloatType;
using binary_t = nlohmann::byte_container_with_subtype<BinaryType>;
```
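The derivation can be mirrored in a few lines to see how swapping a template argument changes the resulting types. `derived_types_sketch` is hypothetical, and its nested empty `value` struct merely stands in for `basic_json`.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical mirror of how object_t and array_t are derived from the template arguments.
template<template<typename U, typename V, typename... Args> class ObjectType = std::map,
         template<typename U, typename... Args> class ArrayType = std::vector,
         class StringType = std::string,
         template<typename U> class AllocatorType = std::allocator>
struct derived_types_sketch {
    struct value {};  // placeholder for basic_json itself
    using object_t = ObjectType<StringType, value, std::less<>,
                                AllocatorType<std::pair<const StringType, value>>>;
    using array_t = ArrayType<value, AllocatorType<value>>;
};

using defaults = derived_types_sketch<>;
static_assert(std::is_same<defaults::array_t, std::vector<defaults::value>>::value,
              "the default ArrayType yields std::vector");

// Swapping the ArrayType argument changes array_t accordingly.
using with_deque = derived_types_sketch<std::map, std::deque>;
static_assert(std::is_same<with_deque::array_t, std::deque<with_deque::value>>::value,
              "ArrayType = std::deque yields std::deque");
```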
## Objects
[RFC 7159](http://rfc7159.net/rfc7159) describes JSON objects as follows:
> An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.
### Default type
With the default values for *ObjectType* (`std::map`), *StringType* (`std::string`), and *AllocatorType* (`std::allocator`), the default value for `object_t` is:
```cpp
std::map<
std::string, // key_type
basic_json, // value_type
std::less<>, // key_compare
std::allocator<std::pair<const std::string, basic_json>> // allocator_type
>
```
### Behavior
The choice of `object_t` influences the behavior of the JSON class. With the default type, objects have the following behavior:
- When all names are unique, objects will be interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings.
- When the names within an object are not unique, it is unspecified which one of the values for a given key will be chosen. For instance, `#!json {"key": 2, "key": 1}` could be equal to either `#!json {"key": 1}` or `#!json {"key": 2}`.
- Internally, name/value pairs are stored in lexicographical order of the names. Objects will also be serialized (see `dump`) in this order. For instance, both `#!json {"b": 1, "a": 2}` and `#!json {"a": 2, "b": 1}` will be stored and serialized as `#!json {"a": 2, "b": 1}`.
- When comparing objects, the order of the name/value pairs is irrelevant. This makes objects interoperable in the sense that they will not be affected by these differences. For instance, `#!json {"b": 1, "a": 2}` and `#!json {"a": 2, "b": 1}` will be treated as equal.
### Key order
The order in which name/value pairs are added to the object is *not* preserved by the library. Therefore, iterating an object may return name/value pairs in a different order than they were originally stored. In fact, keys will be traversed in lexicographical order, as `std::map` with `std::less` is used by default. Please note that this behavior conforms to [RFC 7159](http://rfc7159.net/rfc7159), because any order implements the specified "unordered" nature of JSON objects.
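Both the ordering and the comparison rules follow directly from `std::map`. A small sketch, using `int` in place of `basic_json` purely to keep it self-contained (`object_sketch` and `iteration_order` are hypothetical names):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Same shape as the default object_t, with int standing in for basic_json.
using object_sketch = std::map<std::string, int, std::less<>>;

// Collect the keys in the order a range-based for loop visits them.
std::vector<std::string> iteration_order(const object_sketch& obj) {
    std::vector<std::string> keys;
    for (const auto& entry : obj) {
        keys.push_back(entry.first);
    }
    return keys;
}
```

Because the comparison is byte-wise, `"B"` sorts before `"a"`; the order is lexicographical rather than strictly alphabetical.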
### Limits
[RFC 7159](http://rfc7159.net/rfc7159) specifies:
> An implementation may set limits on the maximum depth of nesting.
In this class, the object's limit of nesting is not explicitly constrained. However, a maximum depth of nesting may be introduced by the compiler or runtime environment. Calling the `max_size` function of a JSON object returns a theoretical limit on the number of elements the object can hold; it does not report a nesting limit.
### Storage
Objects are stored as pointers in a `basic_json` type. That is, for any access to object values, a pointer of type `object_t*` must be dereferenced.
## Arrays
[RFC 7159](http://rfc7159.net/rfc7159) describes JSON arrays as follows:
> An array is an ordered sequence of zero or more values.
### Default type
With the default values for *ArrayType* (`std::vector`) and *AllocatorType* (`std::allocator`), the default value for `array_t` is:
```cpp
std::vector<
basic_json, // value_type
std::allocator<basic_json> // allocator_type
>
```
### Limits
[RFC 7159](http://rfc7159.net/rfc7159) specifies:
> An implementation may set limits on the maximum depth of nesting.
In this class, the array's limit of nesting is not explicitly constrained. However, a maximum depth of nesting may be introduced by the compiler or runtime environment. Calling the `max_size` function of a JSON array returns a theoretical limit on the number of elements the array can hold; it does not report a nesting limit.
### Storage
Arrays are stored as pointers in a `basic_json` type. That is, for any access to array values, a pointer of type `array_t*` must be dereferenced.
## Strings
[RFC 7159](http://rfc7159.net/rfc7159) describes JSON strings as follows:
> A string is a sequence of zero or more Unicode characters.
Unicode values are split by the JSON class into byte-sized characters during deserialization.
### Default type
With the default value for *StringType* (`std::string`), the default value for `string_t` is `#!cpp std::string`.
### Encoding
Strings are stored in UTF-8 encoding. Therefore, functions like `std::string::size()` or `std::string::length()` return the number of **bytes** in the string rather than the number of characters or glyphs.
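Counting code points instead of bytes means skipping UTF-8 continuation bytes (those of the form `10xxxxxx`). A minimal sketch with a hypothetical `count_code_points` helper, assuming valid UTF-8 input; code points can still differ from user-perceived glyphs when combining characters are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string by skipping continuation bytes (10xxxxxx).
// Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, the euro sign U+20AC occupies three bytes (`"\xE2\x82\xAC"`), so `size()` reports 3 while `count_code_points` reports 1.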
### String comparison
[RFC 7159](http://rfc7159.net/rfc7159) states:
> Software implementations are typically required to test names of object members for equality. Implementations that transform the textual representation into sequences of Unicode code units and then perform the comparison numerically, code unit by code unit, are interoperable in the sense that implementations will agree in all cases on equality or inequality of two strings. For example, implementations that compare strings with escaped characters unconverted may incorrectly find that `"a\\b"` and `"a\u005Cb"` are not equal.
This implementation is interoperable as it does compare strings code unit by code unit.
### Storage
String values are stored as pointers in a `basic_json` type. That is, for any access to string values, a pointer of type `string_t*` must be dereferenced.
## Booleans
[RFC 7159](http://rfc7159.net/rfc7159) implicitly describes a boolean as a type which differentiates the two literals `true` and `false`.
### Default type
With the default value for *BooleanType* (`#!cpp bool`), the default value for `boolean_t` is `#!cpp bool`.
### Storage
Boolean values are stored directly inside a `basic_json` type.
## Numbers
[RFC 7159](http://rfc7159.net/rfc7159) describes numbers as follows:
> The representation of numbers is similar to that used in most programming languages. A number is represented in base 10 using decimal digits. It contains an integer component that may be prefixed with an optional minus sign, which may be followed by a fraction part and/or an exponent part. Leading zeros are not allowed. (...) Numeric values that cannot be represented in the grammar below (such as Infinity and NaN) are not permitted.
This description includes both integer and floating-point numbers. However, C++ allows more precise storage if it is known whether the number is a signed integer, an unsigned integer or a floating-point number. Therefore, three different types, `number_integer_t`, `number_unsigned_t`, and `number_float_t` are used.
### Default types
With the default value for *NumberIntegerType* (`std::int64_t`), the default value for `number_integer_t` is `std::int64_t`.
With the default value for *NumberUnsignedType* (`std::uint64_t`), the default value for `number_unsigned_t` is `std::uint64_t`.
With the default value for *NumberFloatType* (`#!cpp double`), the default value for `number_float_t` is `#!cpp double`.
### Default behavior
- The restriction on leading zeros is not enforced in C++. Instead, leading zeros in integer literals lead to an interpretation as an octal number. The resulting value is stored as an ordinary integer; for instance, the C++ integer literal `#!c 010` will be serialized to `#!c 8`. During deserialization, leading zeros yield an error.
- Not-a-number (NaN) values will be serialized to `#!json null`.
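Both rules can be illustrated with standard C++ alone. `dump_number` below is a hypothetical stand-in for number serialization, not the library's `dump`; it also maps infinities to `null`, since JSON cannot represent them either, and it prints 17 significant digits, which round-trips but may print more digits than necessary.

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>

// In C++, a leading zero makes an integer literal octal: 010 is the value 8.
static_assert(010 == 8, "010 is an octal literal");

// Hypothetical number serializer: non-finite values (NaN, infinities) have no JSON
// representation, so they are written as null.
std::string dump_number(double d) {
    if (!std::isfinite(d)) {
        return "null";
    }
    std::ostringstream out;
    out.precision(17);  // enough significant digits for a double to round-trip
    out << d;
    return out.str();
}
```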
### Limits
[RFC 7159](http://rfc7159.net/rfc7159) specifies:
> An implementation may set limits on the range and precision of numbers.
When the default type is used, the maximal integer number that can be stored is `#!c 9223372036854775807` (`INT64_MAX`) and the minimal integer number that can be stored is `#!c -9223372036854775808` (`INT64_MIN`). Integer numbers that are out of range will yield over/underflow when used in a constructor. During deserialization, integer numbers that are too large or too small will automatically be stored as `number_unsigned_t` or `number_float_t`.
When the default type is used, the maximal unsigned integer number that can be stored is `#!c 18446744073709551615` (`UINT64_MAX`) and the minimal unsigned integer number that can be stored is `#!c 0`. Unsigned integer numbers that are out of range will yield over/underflow when used in a constructor. During deserialization, integer numbers that are too large or too small will automatically be stored as `number_integer_t` or `number_float_t`.
[RFC 7159](http://rfc7159.net/rfc7159) further states:
> Note that when such software is used, numbers that are integers and are in the range $[-2^{53}+1, 2^{53}-1]$ are interoperable in the sense that implementations will agree exactly on their numeric values.
As this range is a subrange of the exactly supported range [`INT64_MIN`, `INT64_MAX`], this class's integer type is interoperable.
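The limits and the interoperability argument can be checked with standard C++. The `static_assert`s restate the default integer bounds, and the hypothetical `round_trips_through_double` tests whether an integer survives conversion to IEEE 754 binary64 (assuming `double` is binary64, as on common platforms).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// The default integer limits quoted above.
static_assert(std::numeric_limits<std::int64_t>::max() == 9223372036854775807, "INT64_MAX");
static_assert(std::numeric_limits<std::int64_t>::min() == -9223372036854775807 - 1, "INT64_MIN");
static_assert(std::numeric_limits<std::uint64_t>::max() == 18446744073709551615u, "UINT64_MAX");

// True if v survives a round trip through double, i.e. an implementation that stores
// numbers as IEEE 754 binary64 would agree on its exact value.
bool round_trips_through_double(std::int64_t v) {
    const double d = static_cast<double>(v);
    if (d >= 9223372036854775808.0) {
        return false;  // 2^63 does not fit into int64_t; converting back would be undefined
    }
    return static_cast<std::int64_t>(d) == v;
}
```

Every integer in $[-2^{53}+1, 2^{53}-1]$ round-trips, while $2^{53}+1$ is rounded to $2^{53}$ and does not.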
[RFC 7159](http://rfc7159.net/rfc7159) states:
> This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754-2008 binary64 (double precision) numbers is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision.
This implementation follows exactly this approach, as it uses double-precision floating-point numbers. Note that values smaller than `#!c -1.79769313486232e+308` and values greater than `#!c 1.79769313486232e+308` will be stored as NaN internally and be serialized to `#!json null`.
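The quoted bound corresponds to the largest finite `double` (about 1.7976931348623157e308). A minimal sketch of detecting overflow with the C standard library's `strtod`, which returns `HUGE_VAL` (infinity) for out-of-range magnitudes; `fits_double` is hypothetical and not taken from the library's parser.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Return true if the decimal number in `token` is representable as a finite double.
// strtod yields +/-HUGE_VAL (infinity) when the magnitude is too large.
bool fits_double(const char* token) {
    return std::isfinite(std::strtod(token, nullptr));
}
```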
### Storage
Integer number values, unsigned integer number values, and floating-point number values are stored directly inside a `basic_json` type.

@ -49,6 +49,7 @@ nav:
- features/merge_patch.md
- features/enum_conversion.md
- features/sax_interface.md
- features/types.md
- Integration:
- integration/index.md
- integration/cmake.md
@ -98,6 +99,8 @@ markdown_extensions:
- pymdownx.snippets:
base_path: docs
check_paths: true
- plantuml_markdown:
format: svg
plugins:
- search:
@ -105,3 +108,8 @@ plugins:
- mkdocs-simple-hooks:
hooks:
on_post_build: "docs.hooks:copy_doxygen"
- minify:
minify_html: true
extra_javascript:
- https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-MML-AM_CHTML

@ -1,8 +1,11 @@
click>=7.1.2
future>=0.18.2
htmlmin>=0.1.12
httplib2>=0.18.1
importlib-metadata>=1.6.0
Jinja2>=2.11.2
joblib>=0.15.1
jsmin>=2.2.2
livereload>=2.6.1
lunr>=0.5.8
Markdown>=3.2.2
@ -11,8 +14,11 @@ MarkupSafe>=1.1.1
mkdocs>=1.1.2
mkdocs-material>=5.2.1
mkdocs-material-extensions>=1.0
mkdocs-minify-plugin>=0.3.0
mkdocs-simple-hooks>=0.1.1
nltk>=3.5
plantuml>=0.3.0
plantuml-markdown>=3.2.2
Pygments>=2.6.1
pymdown-extensions>=7.1
PyYAML>=5.3.1