Coding for AR – an introduction
AR is magic to some, but magicians know that all magic is science. In this chapter we introduce you to the wands which build a magical layer on top of reality.
AR programming is neither just about mathematics nor just about code. There are hurdles, and we are here to help you take the leap into creating AR apps.
Concepts and Domains Involved
Augmented reality does not derive from a single engineering stream but from many, and it requires you to understand more than one technology at a time. Developing AR apps can demand enormous effort depending on the type of development to be done. As with any other kind of application development, the more critical the application, the more testing and accuracy it requires (a military application has to be much faster, more secure and more precise than a traveller's AR browser application). We have seen the types of augmented reality in previous chapters. In this one, we walk through the domain knowledge required by each of them.
Location based AR
Location based augmented reality is one of the easiest to program, for obvious reasons: all the resources you require come in the form of the Android or iPhone (iOS) SDKs along with the device itself. The requirement here is to understand how to fetch data from the sensors and use it to produce the desired results. In most cases the APIs will provide data in a usable format, but that is still incomplete without an in-depth understanding of what you are fetching.
It is important to understand the roles of Orientation Sensor and Accelerometer for creating a Location based AR app
In addition to this, you need to understand the GPS system and its limitations. If you are unclear about latitude and longitude, brush up your high school geography before you attempt to develop location based AR apps.
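At its core, a location based AR app combines a GPS fix with the orientation sensor: given the device's position and a point of interest, you compute the distance and compass bearing to the POI, then compare that bearing with where the camera is pointing. A minimal sketch in Python, using the standard haversine and forward-azimuth formulas (the coordinates below are approximate example fixes, not authoritative):

```python
import math

def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (metres) and initial bearing (degrees)
    from point 1 to point 2, using the haversine formula."""
    R = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)

    # Haversine distance
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    dist = 2 * R * math.asin(math.sqrt(a))

    # Forward azimuth (initial bearing), normalised to 0-360 degrees
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return dist, bearing

# Example: roughly from India Gate towards Connaught Place, New Delhi
d, b = distance_and_bearing(28.6129, 77.2295, 28.6315, 77.2167)
```

The app would then draw the POI marker only when the device's azimuth (from the orientation sensor) is close to the computed bearing.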
Projection Based AR
This one is more difficult to code for than location based applications. Projection based AR depends on three main factors – the surface on which the projection has to be done, the light sequence to be projected and the position of the projector. One downside of projection based AR is that both developing and deploying an app are more expensive than for a location based AR app. You need a video editor intelligent enough for the job (unless, of course, you plan to project a static image), a computer powerful enough to help you with it, a good projector and a setup which gives you enough space to experiment. A few points worth noting about projection based AR:
1 You will always need a low-light situation for projection based AR to work.
2 You need enough space to develop the light sequence for projection. For example, if you want to project your light sequence onto a cube with a 2 metre edge, kept 8 metres from the projector, then developing the light sequence using a dice as a reference surface is a bad idea. Ideally, the object on which the simulation/testing is done should be the same size as the target; if that cannot be arranged right away, the reference object should still be large enough. It helps remove errors and allows better inspection during development.
3 Projector resolution matters. You cannot use a low resolution projector built for PowerPoint presentations to drive a video sequence on a concert stage.

When we talk about the domain of knowledge and experience required to undertake a projection based AR app, the skillset required is more mathematical than programmatic. Once again, it depends on the specific application.
1 If you are planning to use projection based AR in a concert, the skillset required is almost purely creative. There is hardly any programming involved beyond the tool which helps generate the light sequence.
2 If you wanted to build a dialler on your palm, you would need a basic projection system (probably mounted on your head) and a recognition system clever enough to understand what you are touching with your finger. In this case, you need image recognition skills, which are themselves a blend of various technologies spanning multiple research streams.
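The mathematics starts even before the light sequence does: simple projector geometry decides where your rig can sit and how much detail lands on the surface. A rough sketch, assuming an idealised projector with a fixed throw ratio (distance divided by image width; the real figure comes from your projector's datasheet):

```python
def projected_width(throw_distance_m, throw_ratio):
    """Image width produced at a given distance.
    Throw ratio = distance / image width, so width = distance / ratio."""
    return throw_distance_m / throw_ratio

def pixels_per_metre(resolution_h, image_width_m):
    """Horizontal pixel density landing on the target surface."""
    return resolution_h / image_width_m

# Example: a 1920-pixel-wide projector with a 1.5:1 throw ratio,
# placed 8 metres from the target surface
width = projected_width(8.0, 1.5)        # about 5.33 m wide image
density = pixels_per_metre(1920, width)  # 360 px per metre on the surface
```

Numbers like these tell you immediately whether a presentation-grade projector can resolve the detail your sequence needs at stage distances.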
Recognition based AR
Recognition based AR depends heavily on pattern recognition. The more your app knows about patterns, the more it can do. Pattern recognition technology has advanced in leaps and bounds over the last half decade. Smile shot, face recognition for tagging and Google's 'Search by image' feature are some of the best known applications of image recognition technology. But the field is nowhere close to maturity.
The first place to start off with recognition based AR development is, once again, your smartphone, because it already ships with pattern recognition features. Every time you scan a QR code to install an app or open a webpage, you employ the pattern recognition faculties of your phone. But your phone is only good enough for basic pattern recognition. If you want advanced 3D object recognition, it may not prove to be the best gadget.
QR code scanning depends on pattern recognition
3D object recognition requires much more data, processing power and efficient algorithms to get the job done. While the data can be stored on a phone and algorithms designed for it, phones do not have very powerful processors. For 3D object recognition, you would typically need clever image recognition systems which know how an object appears from different angles.
If you have to create an AR application which can recognise objects and then augment a layer on top of them, you are limited to some very basic shapes. For example, AndAR (which stands for Android Augmented Reality), a popular development library that lets you create recognition based AR applications for the Android platform, supports only very basic pattern recognition. Better computer vision and object recognition libraries are available for more powerful systems (such as desktops and laptops), and with their aid you can create more intelligent applications. It is easy to predict that in time, with more advanced hardware and software, such libraries will be available for mobiles as well.
From the knowledge domain perspective, recognition based AR requires a deep understanding of computer vision and multimedia systems, especially of video data (augmented reality works with a continuous video feed from a camera; simple still-image recognition systems are not of much use here). Once again, all these disciplines depend on multiple mathematical research fields. If the library you are working with does not provide the required features, you are going to need superb mathematical skills to get things to work your way.
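To get a feel for what such libraries do under the hood, here is a toy version of the most basic recognition primitive, template matching, in plain Python with no AR library involved. It slides a small template over a grayscale image (both represented simply as nested lists of pixel values) and picks the position with the smallest sum of absolute differences. Production libraries use far faster and far more robust algorithms; this only illustrates the principle.

```python
def match_template(image, template):
    """Naive template matching by sum of absolute differences (SAD).
    image and template are 2D lists of grayscale values (0-255).
    Returns (row, col) of the best match's top-left corner."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, float("inf")
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = sum(
                abs(image[r + i][c + j] - template[i][j])
                for i in range(th) for j in range(tw)
            )
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

# A 6x6 "image" with a bright 2x2 blob at row 3, column 2
img = [[0] * 6 for _ in range(6)]
for i in (3, 4):
    for j in (2, 3):
        img[i][j] = 255
blob = [[255, 255], [255, 255]]
pos = match_template(img, blob)
```

A real marker tracker must also cope with rotation, scale, perspective and lighting changes for every frame of the video feed, which is exactly where the mathematics gets serious.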
Outlining based AR
Outlining based AR depends heavily on object recognition, and hence a significant portion of the knowledge required to work with recognition based systems also applies to outlining based AR systems. However, the game is not entirely the same. Outlining based AR applications may use data from other sensors too, such as the compass, GPS, accelerometers, ultrasound sensors, infrared sensors and so on. Depending on the requirements of the particular application you want to develop, conceptual and domain knowledge of one or more of the above will be necessary.
Let us say you want to build a location based AR browser. To achieve the feat, you need to plan first, and for that you need answers to a few questions. Here are some which you will almost certainly face:
• What platform should I build it for?
If you are using an Android phone, you would target Android; if you are an iPhone user, it'd be iOS. If you want to build a commercial application, you would want to target them both, and maybe others too, such as BlackBerry and Windows Phone. The target platform depends on who your users are and the need you wish to fulfil.
• What programming languages can I use?
That depends on the platform you are targeting. For Android you need Java; Objective-C if it's the iPhone; while Windows and BlackBerry phones allow you to get creative with multiple languages.
• What OS can I use as a development platform?
Once again, it depends on your target platform. Windows Phone apps need Windows as the development environment. Android apps can be developed on any platform where the Java JDK works well. iOS development demands a Macintosh with Xcode installed.
• What IDE should I use?
Windows Phone asks for Visual Studio and iOS asks for Xcode. Android, being based on Java, can do with any IDE as long as you are able to compile the sources and generate an APK package. Eclipse is the IDE most developers use, but other IDEs with more features (such as IntelliJ IDEA) also support the Android SDK.
• How much help would I get on the web?
iOS and Android cover the majority of the smartphone market, and thus more help is available on the web for these two than for other platforms. Almost all platforms have well documented APIs which will help you as you progress.
• How much time is it going to take?
If you were just to throw the geographical coordinates onto the user's screen, it would not take much time (and it would not count as AR either). But if you were to include a map and an object recognition system in your AR browser app, you are going to have your hands dirty with code for much longer. It also depends on your experience as a developer on your target platform.
Android NDK can be used to run platform specific code
As you can see, there are multiple variables in the mix and the most important factor which is going to affect your decision is the platform you want to target. Depending on the platform you want to build your app for, you have several restrictions. As a side note, one of the biggest reasons Android wins the apps battle is the freedom of options it provides except in terms of programming language.
In all cases, nonetheless, you need the platform SDKs. In the case of Android, there is a separate NDK (Native Development Kit) which allows you to write native code for the target platform (remember that Android can be made to run on an Intel machine as well, while most mobile phones run on the ARM platform). The NDK compiles code to binary for the native platform. If you are going to program games which need OpenGL, you will need the NDK in most cases to leverage the hardware capabilities.
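To make this concrete, an NDK module is described by a small build script, conventionally `jni/Android.mk`, which the NDK's `ndk-build` tool reads. The module and source file names below are placeholders for illustration:

```makefile
# jni/Android.mk - build script for a hypothetical native AR module
LOCAL_PATH := $(call my-dir)

include $(CLEAR_VARS)
LOCAL_MODULE    := ar-native        # produces libar-native.so
LOCAL_SRC_FILES := ar_native.c      # your C/C++ sources
LOCAL_LDLIBS    := -lGLESv2 -llog   # link OpenGL ES 2.0 and Android logging
include $(BUILD_SHARED_LIBRARY)
```

The resulting shared library is then loaded from Java with `System.loadLibrary("ar-native")` and called through JNI.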
What frameworks do I have to aid me?
It goes without saying that frameworks are built around a particular language (the only exception to this would be the .NET framework, which supports multiple languages). You can neither use CodeIgniter for creating a Java based app nor the Qt framework for writing a Python script. We have already mentioned that the choice of language, IDE, development platform and so on depends on the target platform. This translates to frameworks being largely mutually exclusive across mobile platforms. In addition, frameworks which run on a mobile platform have to use platform specific APIs, which shrinks the possibility of cross platform frameworks for AR. Here we list the most popular and useful frameworks available to ease the process of building AR apps.
A list of libraries which could help you do AR would be incomplete without OpenCV. OpenCV stands for Open Source Computer Vision and is one of the most famous libraries in the computer vision field. In case you are unaware, computer vision is the field of study which deals with methods of acquiring, extracting, analysing and understanding information from images to help a computer take decisions in the real world. Object recognition technologies come under the umbrella of computer vision. OpenCV claims to have more than 2,500 optimised computer vision algorithms, a majority of which can be useful in creating recognition based AR apps.
Face detection with OpenCV library
OpenCV can be a great aid to AR programmers because of the language bindings it offers. OpenCV has interfaces in C, C++ and Python as of now, and soon it will be available for Java as well. Since iOS apps need to be written in Objective-C, one can utilise OpenCV's faculties by importing the library into the application. The fact that Windows phones can be programmed in multiple languages is a benefit here: OpenCV can be used with C++ on Windows phones. As far as Android is concerned, you can use the NDK (Native Development Kit) to cross compile OpenCV for Android; an Android specific port is available at http://opencv.org/android. A BlackBerry port of OpenCV is also available. In case you want to build a desktop application, OpenCV is compatible with Windows, Linux and Mac.
The most attractive part of OpenCV is its efficiency. Being written in C++ with multithreading support, it runs fast. The API is well organised and allows you to perform high level as well as low level operations on the data received. The STL based interface is also a blessing. Despite being written in C/C++, OpenCV manages memory for you, which is a huge benefit for those who are afraid of pointers. OpenCV is licensed under a BSD licence, which means it is both open source and free to use, whether for an open source project or a commercial one.
AndAR stands for Android Augmented Reality and – as evident from its name – is meant for the Android platform. The project seems a bit old, since the official project page (https://code.google.com/p/andar/) shows the last release dating back to 2010, targeting Android Eclair and Froyo. Since the majority of devices now run Gingerbread, which is mostly backward compatible with Froyo, it should still be helpful. AndAR uses the JNI (Java Native Interface) and contains code written in C (ah, the efficiency part), which means you would need the NDK, once again, to use it in your application.
Default AndAR Marker
AndAR can only detect the most basic shapes. You can find the AndAR and AndAR Model Viewer apps in the Android market and see them in action. The default shape that AndAR recognises is a small L shape inside a box. Both the box and the shape inside should be drawn in thick, black lines. If you wish, you may create your own markers.
IN2AR (pronounced “Into AR”) is the brainchild of a company called Beyond Reality, located in the Netherlands, and is an SDK to help you build AR applications. The best part is that IN2AR supports Unity. For those who think Unity is just the UI of Ubuntu Linux: Unity is a game engine available for free at http://unity3d.com. The interesting part about Unity is its cross platform nature – it can help you reach 10 platforms: desktop (Windows, Mac and Linux), mobile (iOS, Android), web (packaged HTML pages, Flash) and consoles (Wii, PS3 and Xbox 360). Since IN2AR supports Unity, building your AR app in Unity lets you get cross platform publishing easily enough, so you can focus on the main functionality. IN2AR is written in Adobe ActionScript (AS3) and uses the Adobe Native Extension library for platform specific code.
IN2AR demo page shows the advanced capabilities it comes with
To use IN2AR fully you’ll need to get the Adobe SDKs as well. The Flash SDK and AIR SDK are similar to each other while it is the Unity SDK which you might get interested in. Unity SDK comes with IN2AR plugins for iOS and Android (along with Windows and Mac).
Unity can deliver on almost every platform
IN2AR is available under two licences – commercial and free. The free licence has some restrictions, such as an added logo on web apps and 90 seconds of operation on mobile apps. The commercial licence is costly (around $2,600 or Rs. 1,30,000), which is a huge sum if you only want to try and test the platform. If you have an awesome idea revolving around recognition based AR, you can start off with the free version. That said, the Unity engine too has a commercial licence which you can avail for a fee (but the licensing is pretty flexible here). To understand why this framework in particular carries value, do visit the IN2AR demo page: http://www.in2ar.com/demo.asp
Once you have gone through the demo, it is not difficult to see why IN2AR can be a good tool for creating apps like a virtual online dressing room. The Unity SDK ensures that you can build games around augmented reality and publish them not only to mobile platforms but also to desktop gamers. Since IN2AR can detect multiple markers in a video feed, the possibilities are many. Do note, however, that detecting more than one marker needs more processing, and performance will drop accordingly.
Layar is one of the most famous AR location browsers which allows one to work with layers. You can create your very own layer which marks all the interesting outlets in the locality to help your friends. A food chain store can create one to help users locate their outlets. A company can create one to help users locate their offices and so on. All with minimal effort.
Layar Vision recognizes the image fingerprint
Although Layar has been an AR location browser, a relatively new technology named “Layar Vision” from the company promises to bring a new dimension to advertising and to the print industry. It allows one to upload a picture or a set of pictures to their servers which together form part of a layer. When the user activates the corresponding layer, it readies his/her device for image detection. When the user points the camera towards the image in real world (with the Layar app running and the right layer loaded on the app), Layar Vision recognises the image and shows the content. The content may be an advertisement or promotional offer or a descriptive video or anything that the creator of the layer intended to show for the image. It takes the concept of QR codes and pushes it one step ahead. While QR codes can contain text and web links, Layar Vision can trigger a predefined action. Also, Layar Vision uses an AR layer so it is more interesting than looking at a non-interactive QR code.
Layar's APIs are based on REST, which in turn rests on HTTP (just like the majority of web applications). This translates into a lot of help being available on the web. The APIs are well documented and you should not find much difficulty using them. Should you find yourself in trouble, a long list of posts and forum threads is available online, where others working on the platform can help you get through your niggles faster.
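Since a REST API is just HTTP plus (typically) JSON, talking to one needs nothing beyond the standard library. The sketch below fetches points of interest from a hypothetical endpoint; the URL, parameter names and the "hotspots"/"distance" response fields are illustrative placeholders, not Layar's actual schema, so consult the official API documentation for the real ones.

```python
import json
from urllib.request import urlopen
from urllib.parse import urlencode

def fetch_pois(base_url, layer, lat, lon, radius_m):
    """Query a hypothetical REST endpoint for points of interest
    around the user's location and return them as Python objects."""
    query = urlencode({"layerName": layer, "lat": lat,
                       "lon": lon, "radius": radius_m})
    with urlopen(f"{base_url}?{query}") as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Assumed response shape: {"hotspots": [{"id": ..., "distance": ...}]}
    return payload.get("hotspots", [])

def nearest(pois):
    """Pick the POI with the smallest reported distance (metres)."""
    return min(pois, key=lambda p: p["distance"]) if pois else None
```

The AR browser would call something like `fetch_pois` each time the user's position changes meaningfully, then place the returned hotspots on the camera view.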
FLARToolkit is based on ARToolkit. Development of the original library appears to have stopped, the last release being back in 2007. FLARToolkit, however, is still alive and breathing. The project is available online at http://www.libspark.org/. The website is in Japanese, and the only way to use it well is with Google Translate; unfortunately, the discussion forum is in Japanese too. The library has been used in quite a number of programs. If you are willing to use it, the official site may not be very useful (we assume you do not understand Japanese), but help is available elsewhere from people who have used the library to create their own apps.
Wikitude SDK is not free enough
To sum it up...
While there is no dearth of frameworks for almost any purpose, it is better to start off with open source libraries. If you have not programmed an AR application before, the open source nature of these libraries will help you understand the underlying concepts; and guess what, you cannot skip the concepts even if you are using an AR library.
If you aren't targeting a mobile platform, OpenCV is the best bet in the free world, and IN2AR if you are ready to pay. There are many other frameworks and libraries which we have not mentioned because they are either immature or their development stopped long ago. Also, before we end, we would like to emphasise planning. Most AR applications are targeted at mobile phones. If you too are going to develop for mobile phones, choose your target platform(s) wisely. Even more importantly, choose a framework which is known to work well with that platform. If the planned platform (and we do not just mean the OS; the device on which you are going to test your application matters as well) does not support the technologies, the amount of effort needed will rise. We wish you a very happy coding journey!