This OGC® Standard defines the Augmented Reality Markup Language 2.0 (ARML 2.0). ARML 2.0 allows users to describe virtual objects in an Augmented Reality (AR) scene with their appearances and their anchors (a broader concept of a location) related to the real world. Additionally, ARML 2.0 defines ECMAScript bindings to dynamically modify the AR scene based on user behavior and user input.
The following are keywords to be used by search engines and document catalogues.
ogcdoc ar augmented reality virtual objects arml virtual reality mixed reality 3d graphics model
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.
Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.
The following organizations submitted this Document to the Open Geospatial Consortium (OGC):
- Wikitude GmbH.
- Georgia Tech
- SK Telecom
- Laramie Range Innovations, LLC
All questions regarding this submission should be directed to the editor or the submitters:
|Martin Lechner||Wikitude GmbH.||Yes|
|Blair MacIntyre||Georgia Tech||Yes|
|Sang Chul Kang||SK Telecom||Yes|
|Scott Simmons||Laramie Range Innovations, LLC||Yes|
The scope of the ARML 2.0 standard is to provide an interchange format for Augmented Reality (AR) applications to describe an AR scene, with a focus on vision-based AR (as opposed to AR relying on audio etc.). The format describes the virtual objects that are placed into an AR environment, as well as their registration in the real world. ARML 2.0 is specified as an XML grammar. Both the specification as well as the XSD schema are provided.
Additionally, ARML 2.0 provides ECMAScript bindings to allow dynamic modification of the scene, as well as interaction with the user. The ECMAScript bindings, which operate on a JSON serialization of the objects, use the same core object model as the XML grammar and include event handling and animations.
The goal of ARML 2.0 is to provide an extensible standard and framework for AR applications to serve the AR use cases currently used or developed. With AR, many different standards and computational areas developed in different working groups come together. ARML 2.0 needs to be flexible enough to tie into other standards without actually having to adopt them, thus creating an AR-specific standard with connecting points to other widely used and AR-relevant standards.
As a requirement, a device running an AR implementation using ARML 2.0 must have a component (a screen, a see-through display, etc.) onto which the virtual objects are projected. The device must also have sensors, such as a camera, a GPS receiver, and orientation sensors, to analyze the real world.
Users interact with the virtual scene by moving around in the real world. Based on the movement of the user, the scene on the screen is constantly updated. A user can also interact with the scene by selecting virtual objects, typically by touching them on the screen. However, how a user can select a virtual object is application- and device-specific and out of scope for ARML 2.0.
The plan is to extend ARML in the future to support non-visual virtual objects, such as sound and haptic feedback. The current specification of ARML 2.0, however, focuses on visual objects.
2. Normative References
The following normative documents contain provisions that, through reference in this text, constitute provisions of this document. For dated references, subsequent amendments to, or revisions of, any of these publications do not apply. For undated references, the latest edition of the normative document referred to applies.
- XML Schema Part 1: Structures Second Edition. W3C Recommendation (28 October 2004)
- ECMAScript Language Specification
- Web IDL Specification
- OGC® Geography Markup Language (GML) Encoding Standard version 3.2.1 (2007)
- COLLADA Specification
- XML Path Language (XPath) 2.0
3. Terms and Definitions
Terms and definitions used in this document are reused from the AR Glossary developed by the International AR Standards Community [AR Glossary] where applicable. The glossary is a public document (insert URL); the community’s chairperson gave specific permission for use.
The following definitions are used within the document:
- 3.1 (AR) Implementation or AR Application
- Any service that provides Augmentations to an AR-ready device or system.
- 3.2 Device
- The hardware unit the AR implementation is running on.
- 3.3 Augmentation
- A relationship between the real world and a digital asset. The realization of an augmentation is a composed scene. An augmentation may be formalized through an authoring and publishing process where the relationship between real and virtual is defined and made discoverable.
- 3.4 Digital Asset
- Data that is used to augment users' perception of reality and encompasses various kinds of digital content such as text, image, 3d models, video, audio and haptic surfaces. A digital asset is part of an augmentation and therefore is rendered in a composed scene. A digital asset can be scripted with behaviors. These scripts can be integral to the object (for example, a GIF animation) or separate code artifacts (for example, browser markup). A digital asset can have styling applied that changes its default appearance or presentation. Visual Assets are digital assets that are represented visually. As ARML in its current version focuses on visual representations of augmentations, only Visual Assets are allowed.
- 3.5 Composed Scene
- Produced by a system of sensors, displays and interfaces that creates a perception of reality where augmentations are integrated into the real world. A composed scene in an augmented reality system is a manifestation of a real world environment and one or more rendered digital assets. It does not necessarily involve 3D objects or even visual rendering. The acquisition of the user (or device)'s current pose is required to align the composed scene to the user's perspective. Examples of composed scenes with visual rendering (AR in camera view) include a smartphone application that presents visualization through the handheld video display, or a webcam-based system where the real object and augmentation are displayed on a PC monitor.
- 3.6 Camera View or AR View
- The term used to describe the presentation of information to the user (the augmentation) as an overlay on the camera display.
4.1 Abbreviated terms
The following symbols and abbreviated terms are used in this standard:
- AR: Augmented Reality
- ARML: Augmented Reality Markup Language
- GML: Geography Markup Language
- KML: Keyhole Markup Language
- OGC: Open Geospatial Consortium
- XML: Extensible Markup Language
- XSD: W3C XML Schema Definition Language
4.2 Schema language
The XML implementation specified in this Standard is described using the XML Schema language (XSD) [XML Schema Part 1: Structures].
4.3 Scripting Components
The Scripting components described are based on the ECMAScript language specification [ECMAScript Language Specification] and are defined using Web IDL [Web IDL Specification].
Even though AR has been researched for decades, no formal definition of AR exists. Below are two descriptions/definitions of AR:
[Wikipedia AR Definition (insert date pulled)]: Augmented reality (AR) is a live, direct or indirect, view of a physical, real-world environment whose elements are augmented by computer-generated sensory input such as sound, video, graphics or GPS data. As a result, the technology functions by enhancing one’s current perception of reality. AR is about augmenting the real world environment with virtual information by improving people’s senses and skills. AR mixes virtual characters with the actual world.
[Ronald Azuma AR Definition]: Augmented Reality is a system that has the following three characteristics:
- Combines real and virtual
- Interactive in real time
- Registered in 3-D
5.1 History of ARML - ARML 1.0
ARML 2.0’s predecessor, ARML 1.0 [ARML 1.0 Specification], was developed in 2009 as a proprietary interchange format for the Wikitude World Browser. ARML 2.0 does not extend ARML 1.0. Instead ARML 2.0 is a complete redesign of the format. ARML 1.0 documents are not expected to work with implementations based on ARML 2.0. ARML without a version number implicitly stands for ARML 2.0 in this document.
ARML 1.0 is a descriptive, XML-based data format specifically targeted at mobile AR applications. ARML 1.0 focuses on mapping geo-referenced Points of Interest (POIs) and their metadata, as well as on mapping data for the content providers publishing the POIs to the AR application. The creators of the Wikitude World Browser defined ARML 1.0 in late 2009 to enable developers to create content for AR browsers. ARML 1.0 combines concepts and functionality typically shared by AR browsers, reuses concepts defined in OGC’s KML standard, and is already used by hundreds of AR content developers around the world.
ARML 1.0 is fairly restrictive and focuses on the functionality Wikitude required back in 2009. Thus ARML 2.0, while still drawing on ideas from ARML 1.0, is a complete redesign of the 1.0 format, taking the evolution of the AR industry, as well as other concepts and ideas, into account.
6. ARML 2.0 - Object Model (normative)
|Target Type||Software Implementation|
6.1 General Concepts
6.1.1 Features, Anchors and VisualAssets
In ARML 2.0, a Feature represents a real world object that should be augmented. Using the Ferris Wheel below as an example, the Feature to augment is the Ferris Wheel itself. Technically speaking, a Feature consists of some metadata on the real world object, as well as one or more Augmentations that describe where a Feature is located in the composed scene. In ARML 2.0 terms, an Augmentation is called an Anchor.
Anchors define the link between the digital and the physical world (a broader concept of a location). An Anchor describes where a particular Feature is located in the real world. An Anchor can be either a spatial location that is tracked using location and motion sensors on the device, or a visual pattern (such as markers, QR codes or any sort of reference image) that can be detected and tracked in the camera stream using computer vision technology. In the Ferris Wheel example, the Anchor is the geospatial location of the Ferris Wheel in Vienna.
Finally, VisualAssets describe how a particular Anchor should be represented in the Composed Scene. VisualAssets can be either 2-dimensional (such as text or images) or 3-dimensional. The icon and the text in the example below represent VisualAssets that are attached to the Anchor of the Ferris Wheel, causing the Ferris Wheel to be augmented with the VisualAssets as soon as the Anchor is visible in the scene.
|Feature||The physical object: The Riesenrad (Ferris Wheel) in Vienna, including Metadata|
|Anchor||Its location: 48.216581,16.395847|
|VisualAsset||The digital object that is used to represent the Feature in the scene.|
|Result||As soon as the location of the Ferris Wheel is detected to be in the field of vision (typically using GPS, motion sensors, magnetometers etc.), the VisualAsset is projected onto the corresponding position on the screen.|
|Computer Vision-based AR|
|Feature||The security features of a 10-dollar-note|
|Anchor||A US 10 Dollar-note (along with the location of the security features on the note).|
|VisualAsset||Some buttons that can be pressed to get more information on a particular security feature|
|Result||As soon as the 10 Dollar note is detected in the scene, the VisualAssets are projected onto the note in the correct positions.|
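To illustrate how Features, Anchors and VisualAssets fit together, the geospatial Ferris Wheel example could be encoded roughly as follows. The element and attribute names here are sketched from the object model and are not the normative encoding; see the XML encoding section for the authoritative schema.

```xml
<arml xmlns="http://www.opengis.net/arml/2.0"
      xmlns:gml="http://www.opengis.net/gml/3.2">
  <ARElements>
    <!-- The Feature: the real-world object to augment -->
    <Feature id="ferrisWheel">
      <name>Riesenrad</name>
      <anchors>
        <!-- The Anchor: the geospatial location of the Ferris Wheel -->
        <Geometry>
          <gml:Point gml:id="ferrisWheelPoint">
            <gml:pos>48.216581 16.395847</gml:pos>
          </gml:Point>
          <assets>
            <!-- The VisualAssets: an icon and a text label -->
            <Image><href>https://example.com/icon.png</href></Image>
            <Label><src>Riesenrad</src></Label>
          </assets>
        </Geometry>
      </anchors>
    </Feature>
  </ARElements>
</arml>
```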
6.1.2 Declarative and Scripting Specification
ARML 2.0 comes with a declarative specification, serialized in XML, describing the objects in the AR scene (section 0), as well as a scripting specification that allows dynamically modifying the scene and reacting to user-triggered events (section 0). The scripting specification uses ECMAScript for the scripting parts and a JSON serialization of the objects for accessing the objects’ properties.
The scripting specification declares hooks into the declarative specification, so both specifications, while existing separately from one another, work together for a dynamic experience. An implementation can choose to support only the declarative specification (for instance, in case the scripting parts cannot be implemented on the platform the implementation is running on).
The scripting specification contains sections that are intended for advanced users only. These sections are clearly marked as Advanced ARML in the title and are intended for readers already familiar with the basic concepts of ARML.
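A minimal sketch of how the JSON serialization and an ECMAScript hook might interact is shown below. The property names mirror the XML object model (Feature, anchors, assets) but are assumptions for illustration, not the normative bindings.

```javascript
// Illustrative JSON serialization of a Feature. Property names are
// assumed from the XML object model and are not normative.
const feature = {
  id: "ferrisWheel",
  name: "Riesenrad",
  anchors: [{
    point: { lat: 48.216581, lon: 16.395847 },
    assets: [{ type: "Label", text: "Riesenrad" }]
  }]
};

// A hypothetical event handler: when the user selects the Feature,
// the scene is modified dynamically by changing the label text.
function onFeatureSelected(f) {
  f.anchors[0].assets[0].text = "Wiener Riesenrad (1897)";
  return f;
}
```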
6.1.3 Document Structure
An ARML document is grouped into three parts: the declarative part (AR Elements), the styling part and the scripting part.
- The ARElements element contains a list of ARElement objects, as specified in the ARML specification below.
- The optional style element contains styles (typically CSS) used for styling the virtual objects in the scene.
- The optional script element contains the ECMAScript code used to dynamically modify the scene.
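Under these conventions, the overall skeleton of an ARML document might look as follows. The element names for the styling and scripting parts are illustrative assumptions; the normative structure is defined in the XML encoding section.

```xml
<arml xmlns="http://www.opengis.net/arml/2.0">
  <!-- Declarative part: the AR Elements -->
  <ARElements>
    <!-- Features, Anchors, VisualAssets ... -->
  </ARElements>
  <!-- Optional styling part (typically CSS) -->
  <style type="text/css"><![CDATA[ /* ... */ ]]></style>
  <!-- Optional scripting part (ECMAScript) -->
  <script type="text/ecmascript"><![CDATA[ /* ... */ ]]></script>
</arml>
```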
The following section will describe the ARElements section.
|An implementation shall be able to parse valid ARML 2.0 XML encodings, as defined in section 7.|
6.1.4 ARML 2.0 Object Model Diagram
ARML 2.0 is built on top of a generic object model to allow future serializations in different languages, as well as good extensibility for future needs.
The diagram below shows the generic object model in ARML 2.0.
Units in ARML are given in meters. Whenever any virtual object in ARML has a size of x meters, the size of this object on the screen is equal to a real world object of the same size and the same distance in the camera view.
Remark: The actual size on the screen is dependent on certain camera parameters on the device.
|All units are specified in meters. The specified size of a virtual object corresponds to the size of a real world object of the same size at the same distance.|
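The meters-based sizing rule can be pictured with an idealized pinhole camera model. This is a simplifying assumption; as noted above, the exact on-screen size depends on the device's camera parameters.

```javascript
// Idealized pinhole projection: on-screen size grows with focal length
// and object size, and shrinks with distance. Real devices vary.
function onScreenSizePx(objectSizeM, distanceM, focalLengthPx) {
  return focalLengthPx * (objectSizeM / distanceM);
}

// A 2 m virtual object and a 2 m real object at the same 10 m distance
// project to the same on-screen size, which is the invariant ARML specifies.
const virtualPx = onScreenSizePx(2, 10, 1000); // 200
const realPx = onScreenSizePx(2, 10, 1000);    // 200
```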
6.2 interface ARElement
Most classes specified in ARML 2.0 are derived from ARElement. An ARElement has an optional id property, which uniquely identifies the object. The id value user is reserved by the system and must not be used in the encoding. If user is used as an id, the attribute must be ignored.
|In case an ARElement’s id property is set to user, the property shall be ignored.|
|id||The unique ID of the ARElement||string||0 or 1|
The unique ID of the ARElement which makes it uniquely accessible and referenceable.
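For example (illustrative encoding only, with placeholder content):

```xml
<!-- A valid, unique id that other elements can reference -->
<Feature id="ferrisWheel"> ... </Feature>

<!-- The value "user" is reserved; this id attribute shall be ignored -->
<Feature id="user"> ... </Feature>
```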
6.3 class Feature
Inherits From ARElement.
A Feature is an abstraction of a real world phenomenon [GML Specification]. In ARML, a Feature has one or more Anchors, which describe how the Feature is registered in the real world. Each of these Anchors has one or more VisualAssets attached to it, which visually represent the Feature in the composed scene.