Qi Pan, a Cambridge University researcher, has developed ProFORMA, a tool that turns any ordinary webcam into a 3D scanner. I got to interview him and talk about his product, which, if it works, could be one of the holy grails of mass customization: it would enable anyone to inexpensively turn things digital and then reproduce them.

Joris Peels: So how did you get started on this project?

Qi Pan: At the start of my PhD, I was interested in real-time 3D modeling of
outdoor scenes. However, several months in, I realised that current
processing power wasn’t enough to model outdoor scenes well (due to
occlusions, lack of texture, etc.). I therefore turned my attention to smaller objects, which would stand a better chance on current hardware. Smaller objects, though, always sit in an environment that you don't want to model, which led me to the idea of using a fixed camera and separating the object from its surroundings using motion.
All of the design choices made in the system were then tailored
towards making everything as fast as possible, whilst still producing
a reasonable output.

How long did it take?

The project as it stands has taken around a year and a half to
develop, although not all of that time was spent on development (time
was also spent on publications and attending various conferences).

What was hard to do?

The hardest thing to do was to combine all of the system components
into a real-time system. The problem with real-time is that if any one
part of the system is not working well, your system just doesn’t work
full stop. Therefore you need to make sure all parts are well
optimised and producing the right output at the right time for the
other components. When designing each component, the utmost care had
to be taken to ensure that we were doing things as efficiently as
possible, using the best available algorithms (or inventing our own if
none existed).


How does it work exactly?

The system works in two stages.

The first stage is a tracker, which uses the partial 3D model we've constructed so far to work out the position and orientation of the object relative to the camera. This stage also tracks the position of interest points (areas of high contrast change) from frame to frame. Once sufficiently large motion has been detected, a keyframe plus the interest point tracks are passed to the reconstruction stage. Only interest points on the object are tracked, since the motion of points on a rigid object must satisfy a mathematical constraint (based on epipolar geometry).
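
To make the tracking stage concrete, here is a minimal sketch of the idea in Python using OpenCV, not the ProFORMA code itself: interest points are tracked from frame to frame, and a rigid-motion (epipolar) constraint estimated with RANSAC keeps only the points that move consistently with a single rigid object. The function names and parameter values below are illustrative assumptions.

```python
import cv2
import numpy as np

def track_object_points(prev_gray, curr_gray, prev_pts):
    """Track interest points and keep those consistent with one rigid motion."""
    # Track interest points (areas of high contrast change) frame to frame.
    curr_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None)
    ok = status.ravel() == 1
    p0, p1 = prev_pts[ok], curr_pts[ok]

    # Points on a rigid object must satisfy the epipolar constraint
    # x1^T F x0 = 0; RANSAC separates them from background/outlier tracks.
    F, inlier_mask = cv2.findFundamentalMat(p0, p1, cv2.FM_RANSAC, 1.0, 0.99)
    inliers = inlier_mask.ravel().astype(bool)
    return p0[inliers], p1[inliers]

# Example usage with two consecutive greyscale frames:
# prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
#                                    qualityLevel=0.01, minDistance=7)
# obj_prev, obj_curr = track_object_points(prev_gray, curr_gray, prev_pts)
```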

The reconstruction stage takes these feature tracks and triangulates 3D positions to form a point cloud, which is then meshed using a 3D Delaunay tetrahedralisation. However, this merely partitions the convex hull of the points into tetrahedra, so we need a carving algorithm to remove incorrect tetrahedra from concavities in the object. We formulated a very efficient probabilistic carving algorithm to achieve this, which allows us to obtain the surface of the object based on the interest points we've seen in each keyframe.
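
As a rough illustration of this reconstruction step (assuming known camera projection matrices for the two keyframes, and omitting ProFORMA's probabilistic carving), the triangulation and tetrahedralisation could look like this in Python with OpenCV and SciPy:

```python
import cv2
import numpy as np
from scipy.spatial import Delaunay

def triangulate_cloud(P0, P1, pts0, pts1):
    """P0, P1: 3x4 projection matrices; pts0, pts1: Nx2 matched pixel tracks."""
    pts0 = np.asarray(pts0, dtype=np.float64).T   # 2xN
    pts1 = np.asarray(pts1, dtype=np.float64).T   # 2xN
    homog = cv2.triangulatePoints(P0, P1, pts0, pts1)   # 4xN homogeneous
    cloud = (homog[:3] / homog[3]).T                     # Nx3 Euclidean points
    return cloud

def tetrahedralise(cloud):
    # Delaunay partitions the convex hull of the points into tetrahedra
    # (tetra.simplices is an Mx4 array of point indices). A carving pass
    # would then remove tetrahedra lying in concavities of the object.
    return Delaunay(cloud)
```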

This method requires a partial 3D model to track from, which isn't available right at the start of reconstruction (though it is later). Our initialisation step therefore differs slightly from normal operation. We assume that at least part of the object falls within a large circle at the centre of the image. We track interest points inside this circle and use rigid body motion constraints to ascertain the orientation and position of the object relative to the camera. Amazingly, this is possible even if we have no idea about the 3D positions of the interest points we are tracking! Once we have this initial orientation and position, the system works as described above.
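
Recovering a relative pose from 2D tracks alone is a standard two-view geometry result; a hedged sketch of one way to do it (via the essential matrix, assuming known camera intrinsics K, and using OpenCV rather than ProFORMA's own implementation) is:

```python
import cv2
import numpy as np

def initial_pose_from_tracks(pts0, pts1, K):
    """Estimate relative rotation R and unit-scale translation t
    from Nx2 point tracks, with no known 3D point positions."""
    E, mask = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    # Decompose the essential matrix and pick the physically valid pose.
    _, R, t, mask = cv2.recoverPose(E, pts0, pts1, K, mask=mask)
    return R, t
```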


But, can I take a thing and then you will give me a mesh?

Yes, as long as it is textured enough! The system is based on interest
points, so the object must have enough areas of high contrast change.
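
As a rough way to gauge whether an object is "textured enough" in this sense, one could simply count the interest points detectable on a frame; this is an illustrative check, not part of ProFORMA, and the threshold is an arbitrary assumption:

```python
import cv2

def is_textured_enough(gray_frame, min_features=200):
    # Detect high-contrast interest points; too few suggests the surface
    # is too uniform for feature-based reconstruction.
    corners = cv2.goodFeaturesToTrack(gray_frame, maxCorners=1000,
                                      qualityLevel=0.01, minDistance=5)
    return corners is not None and len(corners) >= min_features
```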


What are some of the limitations?

This system is of course only a first step in generic object reconstruction, and as such has a few limitations. One limitation is the inability to model objects, or parts of objects, without enough texture. This is something we are working on – we are seeking to combine other cues to complement our interest point based approach.

This approach can in theory be applied to modeling entire scenes, but then we come up against the problems of the environment not being textured enough in areas, occlusion, and needing more processing power.

The technique as it stands can only be used to model rigid objects, due to the rigid body assumption being used for segmentation.

Will you be working more on it in the future?

Yes – we most certainly will! This project is more of a proof of
concept and just the tip of the iceberg in terms of what we can
achieve.

Will there be a tool that people can download?

Yes – we’re currently working on releasing one soon.

When?

I'm currently porting the software to the newest libraries (which unfortunately means reimplementing lots of stuff from scratch), but in a few months' time we aim to release a Linux-based demo, which will hopefully be followed by a Windows-based demo after that.