License Agreement

Permission is hereby granted by the Open Geospatial Consortium, ("Licensor"), free of charge and subject to the terms set forth below, to any person obtaining a copy of this Intellectual Property and any associated documentation, to deal in the Intellectual Property without restriction (except as set forth below), including without limitation the rights to implement, use, copy, modify, merge, publish, distribute, and/or sublicense copies of the Intellectual Property, and to permit persons to whom the Intellectual Property is furnished to do so, provided that all copyright notices on the intellectual property are retained intact and that each person to whom the Intellectual Property is furnished agrees to the terms of this Agreement.

If you modify the Intellectual Property, all copies of the modified Intellectual Property must include, in addition to the above copyright notice, a notice that the Intellectual Property includes modifications that have not been approved or adopted by LICENSOR.

THIS LICENSE IS A COPYRIGHT LICENSE ONLY, AND DOES NOT CONVEY ANY RIGHTS UNDER ANY PATENTS THAT MAY BE IN FORCE ANYWHERE IN THE WORLD.

THE INTELLECTUAL PROPERTY IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE DO NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE INTELLECTUAL PROPERTY WILL MEET YOUR REQUIREMENTS OR THAT THE OPERATION OF THE INTELLECTUAL PROPERTY WILL BE UNINTERRUPTED OR ERROR FREE. ANY USE OF THE INTELLECTUAL PROPERTY SHALL BE MADE ENTIRELY AT THE USER’S OWN RISK. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY CONTRIBUTOR OF INTELLECTUAL PROPERTY RIGHTS TO THE INTELLECTUAL PROPERTY BE LIABLE FOR ANY CLAIM, OR ANY DIRECT, SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM ANY ALLEGED INFRINGEMENT OR ANY LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR UNDER ANY OTHER LEGAL THEORY, ARISING OUT OF OR IN CONNECTION WITH THE IMPLEMENTATION, USE, COMMERCIALIZATION OR PERFORMANCE OF THIS INTELLECTUAL PROPERTY.

This license is effective until terminated. You may terminate it at any time by destroying the Intellectual Property together with all copies in any form. The license will also terminate if you fail to comply with any term or condition of this Agreement. Except as provided in the following sentence, no such termination of this license shall require the termination of any third party end-user sublicense to the Intellectual Property which is in force as of the date of notice of such termination. In addition, should the Intellectual Property, or the operation of the Intellectual Property, infringe, or in LICENSOR’s sole opinion be likely to infringe, any patent, copyright, trademark or other right of a third party, you agree that LICENSOR, in its sole discretion, may terminate this license without any compensation or liability to you, your licensees or any other party. You agree upon termination of any kind to destroy or cause to be destroyed the Intellectual Property together with all copies in any form, whether held by you or by any third party.

Except as contained in this notice, the name of LICENSOR or of any other holder of a copyright in all or part of the Intellectual Property shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Intellectual Property without prior written authorization of LICENSOR or such copyright holder. LICENSOR is and shall at all times be the sole entity that may authorize you or any third party to use certification marks, trademarks or other special designations to indicate compliance with any LICENSOR standards or specifications. This Agreement is governed by the laws of the Commonwealth of Massachusetts. The application to this Agreement of the United Nations Convention on Contracts for the International Sale of Goods is hereby expressly excluded. In the event any provision of this Agreement shall be deemed unenforceable, void or invalid, such provision shall be modified so as to make it valid and enforceable, and as so modified the entire Agreement shall remain in full force and effect. No decision, action or inaction by LICENSOR shall be construed to be a waiver of any rights or remedies available to it.


 

i. Abstract

This OGC® Standard defines the Augmented Reality Markup Language 2.0 (ARML 2.0). ARML 2.0 allows users to describe virtual objects in an Augmented Reality (AR) scene with their appearances and their anchors (a broader concept of a location) related to the real world. Additionally, ARML 2.0 defines ECMAScript bindings to dynamically modify the AR scene based on user behavior and user input.

ii. Keywords

The following are keywords to be used by search engines and document catalogues.

ogcdoc, ar, augmented reality, virtual objects, arml, virtual reality, mixed reality, 3d graphics model

iii. Preface

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. The Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.

Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the standard set forth in this document, and to provide supporting documentation.

iv. Submitting organizations

The following organizations submitted this Document to the Open Geospatial Consortium (OGC):

Wikitude GmbH.
Georgia Tech
SK Telecom
Laramie Range Innovations, LLC

v. Submitters

All questions regarding this submission should be directed to the editor or the submitters:

Name | Representing | OGC member
Martin Lechner | Wikitude GmbH. | Yes
Blair MacIntyre | Georgia Tech | Yes
Sang Chul Kang | SK Telecom | Yes
Scott Simmons | Laramie Range Innovations, LLC | Yes

Scope

The scope of the ARML 2.0 standard is to provide an interchange format for Augmented Reality (AR) applications to describe an AR scene, with a focus on vision-based AR (as opposed to AR relying on audio or other modalities). The format describes the virtual objects that are placed into an AR environment, as well as their registration in the real world. ARML 2.0 is specified as an XML grammar. Both the specification and the XSD schema are provided.

Additionally, ARML 2.0 provides ECMAScript bindings to allow dynamic modification of the scene, as well as interaction with the user. The ECMAScript bindings, described in JSON, use the same core object models as the XML grammar and include event handling and animations.

The goal of ARML 2.0 is to provide an extensible standard and framework for AR applications to serve the AR use cases currently in use or under development. AR brings together many different standards and computational areas developed in different working groups. ARML 2.0 needs to be flexible enough to tie into other standards without actually having to adopt them, thus creating an AR-specific standard with connecting points to other widely used and AR-relevant standards.

As a requirement, a device running an AR implementation using ARML 2.0 must have a component (a screen, a see-through display, etc.) onto which the virtual objects are projected. The device must also have sensors, such as a camera, GPS, and orientation sensors, to analyze the real world.

Users interact with the virtual scene by moving around in the real world. Based on the movement of the user, the scene on the screen is constantly updated. A user can also interact with the scene by selecting virtual objects, typically by touching them on the screen. However, how a user can select a virtual object is application- and device-specific and out of scope for ARML 2.0.

The plan is to extend ARML in the future to support non-visual virtual objects, such as sound and haptic feedback. The current specification of ARML 2.0, however, focuses on visual objects.

Normative References

The following normative documents contain provisions that, through reference in this text, constitute provisions of this document. For dated references, subsequent amendments to, or revisions of, any of these publications do not apply. For undated references, the latest edition of the normative document referred to applies.

XML Schema Part 1: Structures Second Edition. W3C Recommendation (28 October 2004)
http://www.w3.org/TR/xmlschema-1/
ECMAScript Language Specification
http://www.ecma-international.org/ecma-262/5.1/ECMA-262.pdf
Web IDL Specification
http://www.w3.org/TR/WebIDL/
OGC® Geography Markup Language (GML) Encoding Standard version 3.2.1 (2007)
http://www.opengeospatial.org/standards/gml
COLLADA Specification
http://www.khronos.org/collada/
XML Path Language (XPath) 2.0
http://www.w3.org/TR/xpath20/

Terms and Definitions

Terms and definitions used in this document are reused from the AR Glossary developed by the International AR Standards Community [AR Glossary] where applicable. The glossary is a public document (insert URL); the community’s chairperson gave specific permission for its use.

The following definitions are used within the document:

3.1 (AR) Implementation or AR Application
Any service that provides Augmentations to an AR-ready device or system.

3.2 Device
The hardware unit on which the AR implementation runs.
3.3 Augmentation
A relationship between the real world and a digital asset. The realization of an augmentation is a composed scene. An augmentation may be formalized through an authoring and publishing process where the relationship between real and virtual is defined and made discoverable.
3.4 Digital Asset
Data that is used to augment users' perception of reality and encompasses various kinds of digital content such as text, image, 3d models, video, audio and haptic surfaces. A digital asset is part of an augmentation and therefore is rendered in a composed scene. A digital asset can be scripted with behaviors. These scripts can be integral to the object (for example, a GIF animation) or separate code artifacts (for example, browser markup). A digital asset can have styling applied that changes its default appearance or presentation. Visual Assets are digital assets that are represented visually. As ARML in its current version focuses on visual representations of augmentations, only Visual Assets are allowed.
3.5 Composed Scene
Produced by a system of sensors, displays and interfaces that creates a perception of reality where augmentations are integrated into the real world. A composed scene in an augmented reality system is a manifestation of a real world environment and one or more rendered digital assets. It does not necessarily involve 3D objects or even visual rendering. The acquisition of the user (or device)'s current pose is required to align the composed scene to the user's perspective. Examples of composed scenes with visual rendering (AR in camera view) include a smartphone application that presents visualization through the handheld video display, or a webcam-based system where the real object and augmentation are displayed on a PC monitor.
3.6 Camera View or AR View
The presentation of information to the user (the augmentation) as an overlay on the camera display.

Conventions

Abbreviated terms

The following symbols and abbreviated terms are used in this standard:

AR
     Augmented Reality
ARML
     Augmented Reality Markup Language
GML
     Geography Markup Language
JSON
     JavaScript Object Notation
KML
     Keyhole Markup Language
OGC
     Open Geospatial Consortium
XML
     Extensible Markup Language
XSD
     W3C XML Schema Definition Language

Schema language

The XML implementation specified in this Standard is described using the XML Schema language (XSD) [XML Schema Part 1: Structures].

Scripting Components

The Scripting components described are based on the ECMAScript language specification [ECMAScript Language Specification] and are defined using Web IDL [Web IDL Specification].

Introduction

Even though AR has been researched for decades, no formal definition of AR exists. Below are two descriptions/definitions of AR:

[Wikipedia AR Definition (insert date pulled)]: Augmented reality (AR) is a live, direct or indirect, view of a physical, real-world environment whose elements are augmented by computer-generated sensory input such as sound, video, graphics or GPS data. As a result, the technology functions by enhancing one’s current perception of reality. AR is about augmenting the real world environment with virtual information by improving people’s senses and skills. AR mixes virtual characters with the actual world.

[Ronald Azuma AR Definition]: Augmented Reality is a system that has the following three characteristics: it combines real and virtual content, it is interactive in real time, and it is registered in three dimensions.

History of ARML - ARML 1.0

ARML 2.0’s predecessor, ARML 1.0 [ARML 1.0 Specification], was developed in 2009 as a proprietary interchange format for the Wikitude World Browser. ARML 2.0 does not extend ARML 1.0. Instead ARML 2.0 is a complete redesign of the format. ARML 1.0 documents are not expected to work with implementations based on ARML 2.0. ARML without a version number implicitly stands for ARML 2.0 in this document.

ARML 1.0 is a descriptive, XML-based data format specifically targeted at mobile AR applications. It focuses on mapping geo-referenced Points of Interest (POIs) and their metadata, as well as mapping data for the POI content providers publishing the POIs to the AR application. The creators of the Wikitude World Browser defined ARML 1.0 in late 2009 to enable developers to create content for AR Browsers. ARML 1.0 combines concepts and functionality typically shared by AR Browsers, reuses concepts defined in OGC’s KML standard, and is already used by hundreds of AR content developers around the world.

ARML 1.0 is fairly restrictive and focuses on the functionality Wikitude required back in 2009. Thus ARML 2.0, while still drawing on ideas from ARML 1.0, is a complete redesign of the 1.0 format, taking the evolution of the AR industry, as well as other concepts and ideas, into account.

ARML 2.0 - Object Model (normative)

Requirements Class
http://www.opengis.net/spec/arml/2.0/req/core
Target Type Software Implementation

General Concepts

Features, Anchors and VisualAssets

In ARML 2.0, a Feature represents a real world object that should be augmented. Using the Ferris Wheel below as an example, the Feature to augment is the Ferris Wheel itself. Technically speaking, a Feature consists of some metadata on the real world object, as well as one or more Augmentations that describe where a Feature is located in the composed scene. In ARML 2.0 terms, an Augmentation is called an Anchor.

Anchors define the link between the digital and the physical world (a broader concept of a location). An Anchor describes where a particular Feature is located in the real world. An Anchor can be either a spatial location that is tracked using location and motion sensors on the device, or a visual pattern (such as markers, QR codes or any sort of reference image) that can be detected and tracked in the camera stream using computer vision technology. In the Ferris Wheel example, the Anchor is the geospatial location of the Ferris Wheel in Vienna.

Finally, VisualAssets describe how a particular Anchor should be represented in the Composed Scene. VisualAssets can either be 2-dimensional (such as text or images) or 3-dimensional. The icon and the text in the example below represent VisualAssets that are attached to the Anchor of the Ferris Wheel, causing the Ferris Wheel to be augmented with the VisualAssets as soon as the Anchor is visible in the scene.

Examples:

 

Geospatial AR

Feature: The physical object: the Riesenrad (Ferris Wheel) in Vienna, including metadata.
Anchor: Its location: 48.216581, 16.395847.
VisualAsset: The digital object that is used to represent the Feature in the scene.
Result: As soon as the location of the Ferris Wheel is detected to be in the field of vision (typically using GPS, motion sensors, magnetometers, etc.), the VisualAsset is projected onto the corresponding position on the screen.
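The geospatial example above can be sketched in ARML's XML serialization roughly as follows. This is an illustrative sketch, not a normative example: the coordinates are taken from the table, while the Image asset and its icon URL are hypothetical.

```xml
<arml xmlns="http://www.opengis.net/arml/2.0"
      xmlns:gml="http://www.opengis.net/gml/3.2"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <ARElements>
    <!-- The Feature: the Riesenrad in Vienna -->
    <Feature id="ferrisWheel">
      <name>Riesenrad</name>
      <anchors>
        <!-- The Anchor: the geospatial location of the Ferris Wheel -->
        <Geometry>
          <gml:Point gml:id="ferrisWheelLocation">
            <gml:pos>48.216581 16.395847</gml:pos>
          </gml:Point>
          <assets>
            <!-- The VisualAsset: a hypothetical icon projected at that location -->
            <Image>
              <href xlink:href="http://www.example.com/ferrisWheelIcon.png" />
            </Image>
          </assets>
        </Geometry>
      </anchors>
    </Feature>
  </ARElements>
</arml>
```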

 

Computer Vision-based AR

Feature: The security features of a US 10-dollar note.
Anchor: A US 10-dollar note (along with the location of the security features on the note).
VisualAsset: Buttons that can be pressed to get more information on a particular security feature.
Result: As soon as the 10-dollar note is detected in the scene, the VisualAssets are projected onto the note in the correct positions.
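The computer-vision example could be encoded along the same lines, using a Trackable anchor instead of a geospatial one. Again an illustrative sketch only: the Tracker URI, the reference image URL, and the asset are hypothetical, and the size value (the physical width of the note in meters, per the Units section below) is approximate.

```xml
<!-- A Tracker provided by the implementation (URI is hypothetical) -->
<Tracker id="defaultTracker">
  <uri xlink:href="http://www.example.com/vision/tracker" />
</Tracker>

<Feature id="tenDollarNote">
  <name>Security features of a US 10-dollar note</name>
  <anchors>
    <Trackable>
      <config>
        <!-- the computer-vision engine and the reference image it should detect -->
        <tracker xlink:href="#defaultTracker" />
        <src>http://www.example.com/tenDollarNote.jpg</src>
      </config>
      <!-- approximate physical width of the tracked note, in meters -->
      <size>0.156</size>
      <assets>
        <!-- hypothetical button image overlaid on the detected note -->
        <Image>
          <href xlink:href="http://www.example.com/infoButton.png" />
        </Image>
      </assets>
    </Trackable>
  </anchors>
</Feature>
```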

 

Declarative and Scripting Specification

ARML 2.0 comes with a declarative specification, serialized in XML, describing the objects in the AR scene (section 0), as well as a scripting specification that allows the scene to be modified dynamically and reacts to user-triggered events (section 0). The scripting specification uses ECMAScript for the scripting parts and a JSON serialization of the objects for accessing the objects’ properties.

The scripting specification declares hooks into the declarative specification, so both specifications, while existing separately from one another, work together for a dynamic experience. An implementation can choose to support only the declarative specification (for instance, if the scripting parts cannot be implemented on the platform the implementation runs on).
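As a purely hypothetical illustration of the shared object model, the Ferris Wheel Feature from the earlier example might be exposed to ECMAScript as a JSON-serialized object along these lines. The property names below mirror the XML elements; the exact binding is defined by the scripting specification, not by this sketch.

```json
{
  "id": "ferrisWheel",
  "name": "Riesenrad",
  "anchors": [
    {
      "type": "Geometry",
      "point": { "pos": [48.216581, 16.395847] },
      "assets": [
        { "type": "Image", "href": "http://www.example.com/ferrisWheelIcon.png" }
      ]
    }
  ]
}
```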

The scripting specification contains sections that are intended for advanced users only. These sections are clearly marked as Advanced ARML in the title and are intended for readers already familiar with the basic concepts of ARML.

Document Structure

An ARML document is grouped into three parts: the declarative part (AR Elements), the styling part, and the scripting part.

 

The following sections describe the ARElements part.

Requirement
http://www.opengis.net/spec/arml/2.0/req/core/parse_encoding
An implementation shall be able to parse valid ARML 2.0 XML encodings, as defined in section 7.

ARML 2.0 Object Model Diagram

ARML 2.0 is built on top of a generic object model to allow future serializations in different languages, as well as good extensibility for future needs.

The diagram below shows the generic object model in ARML 2.0.

Figure : generic object model in ARML 2.0

 

Units

Units in ARML are given in meters. A virtual object with a size of x meters appears on the screen at the same size as a real-world object of size x meters at the same distance in the camera view.
Remark: The actual size on the screen depends on certain camera parameters of the device.

Requirement
http://www.opengis.net/spec/arml/2.0/req/core/units
All units are specified in meters. The specified size of a virtual object corresponds to the size of a real world object of the same size at the same distance.

interface ARElement

Most classes specified in ARML 2.0 are derived from ARElement. An ARElement has an optional id property, which uniquely identifies the object. The id value user is reserved by the system and must not be used in the encoding. If user is used, the attribute shall be ignored.

Requirement
http://www.opengis.net/spec/arml/2.0/req/core/ARElement/id_user
In case an ARElement’s id property is set to user, the property shall be ignored.


Properties:

Name | Description | Type | Multiplicity
id | The unique ID of the ARElement | string | 0 or 1

id
The unique ID of the ARElement which makes it uniquely accessible and referenceable.

class Feature

Inherits From ARElement.

A Feature is an abstraction of a real world phenomenon [GML Specification]. In ARML, a Feature has one or more Anchors, which describe how the Feature is registered in the real world. Each of these Anchors has one or more VisualAssets attached to it, which visually represent the Feature in the composed scene.

Example:
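A minimal sketch of a Feature encoding is given below. It is illustrative only: the names and description are placeholders, and the anchor content is elided, following the structure described above.

```xml
<Feature id="ferrisWheel">
  <name>Riesenrad</name>
  <description>The Ferris Wheel in Vienna</description>
  <anchors>
    <!-- one or more Anchors, e.g. a Geometry anchor containing a gml:Point -->
  </anchors>
</Feature>
```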