Notes about vision

August 6, 2012 permalink


Hmm, @harthvader has written some impressive neural network, machine learning, and image detection stuff, shared on her GitHub — wait, she’s combined these things into a JavaScript cat-detecting routine?! Okay, that wins.

var cats = kittydar.detectCats(canvas);
console.log("there are", cats.length, "cats in this photo");
console.log(cats[0]);
// { x: 30, y: 200, width: 140, height: 140 }

You can try out Kittydar here.
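To make those rectangles visible, a rough sketch like the following could outline each detection on the canvas. The `outlineCats` helper and the recording `fakeCtx` are my own hypothetical names, not part of Kittydar; the only thing taken from the snippet above is the `{ x, y, width, height }` shape of each result.

```javascript
// Hypothetical helper (not part of kittydar): outline each detected rectangle.
// `ctx` is a 2D canvas context; `cats` is an array shaped like detectCats output.
function outlineCats(ctx, cats) {
  ctx.lineWidth = 3;
  ctx.strokeStyle = "red";
  cats.forEach(function (cat) {
    ctx.strokeRect(cat.x, cat.y, cat.width, cat.height);
  });
  return cats.length;
}

// Stand-in context that records calls, so the sketch runs without a browser.
var calls = [];
var fakeCtx = {
  strokeRect: function (x, y, w, h) { calls.push([x, y, w, h]); }
};
var drawn = outlineCats(fakeCtx, [{ x: 30, y: 200, width: 140, height: 140 }]);
```

In a page with Kittydar loaded, the call would presumably look like `outlineCats(canvas.getContext("2d"), kittydar.detectCats(canvas))`.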

(Via O’Reilly Radar)

June 27, 2012 permalink

Google X Cat Image Recognition

The Internet has become self-aware, but thankfully it just wants to spend some time scrolling Tumblr for cat videos. From the NY Times, How Many Computers to Identify a Cat? 16,000:

[At the Google X lab] scientists created one of the largest neural networks for machine learning by connecting 16,000 computer processors, which they turned loose on the Internet to learn on its own.

Presented with 10 million digital images found in YouTube videos, what did Google’s brain do? What millions of humans do with YouTube: looked for cats. The neural network taught itself to recognize cats, which is actually no frivolous activity.

(Photo credit: Jim Wilson/The New York Times)

July 15, 2011 permalink

Simulated Heat Mapping for Computer Vision

A new approach to object recognition in computer vision, simulated heat-mapping:

The heat-mapping method works by first breaking an object into a mesh of triangles, the simplest shape that can characterize surfaces, and then calculating the flow of heat over the meshed object. The method does not involve actually tracking heat; it simulates the flow of heat using well-established mathematical principles, Ramani said. …

The method accurately simulates how heat flows on the object while revealing its structure and distinguishing unique points needed for segmentation by computing the “heat mean signature.” Knowing the heat mean signature allows a computer to determine the center of each segment, assign a “weight” to specific segments and then define the overall shape of the object. …

“A histogram is a two-dimensional mapping of a three-dimensional shape,” Ramani said. “So, no matter how a dog bends or twists, it gives you the same signature.”

In other words, recognizing discrete parts (like fingers or facial features) of an object in front of the camera should be much more accurate with this approach than with older techniques like simple edge detection. Uses for real-time recognition are apparent (more accurate Dance Central!), but it seems like this would also be a boon for character animation rigging?
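For a feel of what "simulating the flow of heat over a mesh" might look like, here is a toy sketch, not the researchers' actual method. It assumes uniform edge weights (a real mesh pipeline would likely weight edges by triangle geometry), a simple explicit time step, and my own reading of "heat mean signature" as the average heat that reaches every other vertex after starting one unit of heat at a given vertex. The function names and the four-vertex mesh are made up for illustration.

```javascript
// Spread one unit of heat from `source` across a mesh given as adjacency lists.
// Assumes dt * (largest vertex degree) stays below 1 so the update is stable.
function diffuseHeat(neighbors, source, steps, dt) {
  let heat = new Array(neighbors.length).fill(0);
  heat[source] = 1;
  for (let s = 0; s < steps; s++) {
    const next = heat.slice();
    for (let i = 0; i < neighbors.length; i++) {
      for (const j of neighbors[i]) {
        // Heat moves along each edge in proportion to the temperature gap.
        next[i] += dt * (heat[j] - heat[i]);
      }
    }
    heat = next;
  }
  return heat;
}

// Assumed definition: mean heat received by all *other* vertices.
function heatMeanSignature(neighbors, steps, dt) {
  const n = neighbors.length;
  return neighbors.map(function (_, x) {
    const heat = diffuseHeat(neighbors, x, steps, dt);
    let total = 0;
    for (let y = 0; y < n; y++) {
      if (y !== x) total += heat[y];
    }
    return total / (n - 1);
  });
}

// Tiny "mesh": two triangles sharing the edge 1-2; vertices 0 and 3 are the tips.
const neighbors = [[1, 2], [0, 2, 3], [0, 1, 3], [1, 2]];
const signature = heatMeanSignature(neighbors, 10, 0.1);
```

Even on this tiny mesh, the two tips end up with matching values and the two shared-edge vertices end up with a different, higher one, which is the kind of structural distinction the article describes using to pick out segments.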

(Via ACM TechNews)

June 24, 2011 permalink

Herman Melville on the Nature of Color

From Moby Dick, chapter 42, “The Whiteness of the Whale”:

Is it that by its indefiniteness it shadows forth the heartless voids and immensities of the universe, and thus stabs us from behind with the thought of annihilation, when beholding the white depths of the milky way? Or is it, that as in essence whiteness is not so much a color as the visible absence of color, and at the same time the concrete of all colors; is it for these reasons that there is such a dumb blankness, full of meaning, in a wide landscape of snows – a colorless, all-color of atheism from which we shrink? And when we consider that other theory of the natural philosophers, that all other earthly hues – every stately or lovely emblazoning – the sweet tinges of sunset skies and woods; yea, and the gilded velvets of butterflies, and the butterfly cheeks of young girls; all these are but subtile deceits, not actually inherent in substances, but only laid on from without; so that all deified Nature absolutely paints like the harlot, whose allurements cover nothing but the charnel-house within; and when we proceed further, and consider that the mystical cosmetic which produces every one of her hues, the great principle of light, for ever remains white or colorless in itself, and if operating without medium upon matter, would touch all objects, even tulips and roses, with its own blank tinge – pondering all this, the palsied universe lies before us a leper; and like wilful travellers in Lapland, who refuse to wear colored and coloring glasses upon their eyes, so the wretched infidel gazes himself blind at the monumental white shroud that wraps all the prospect around him. And of all these things the Albino Whale was the symbol. Wonder ye then at the fiery hunt?

June 23, 2011 permalink


Research continues on whether humans (and other animals) have the ability to perceive magnetic fields:

Many birds have a compass in their eyes. Their retinas are loaded with a protein called cryptochrome, which is sensitive to the Earth’s magnetic fields. It’s possible that the birds can literally see these fields, overlaid on top of their normal vision. This remarkable sense allows them to keep their bearings when no other landmarks are visible.

But cryptochrome isn’t unique to birds – it’s an ancient protein with versions in all branches of life. In most cases, these proteins control daily rhythms. Humans, for example, have two cryptochromes – CRY1 and CRY2 – which help to control our body clocks. But Lauren Foley from the University of Massachusetts Medical School has found that CRY2 can double as a magnetic sensor.

Vision is amazing, even more so when you take into account the myriad other things that animals and insects can detect beyond just our “visible” slice of the electromagnetic spectrum. See also: box jellyfish with their surprisingly complex (and human-like) set of 24 eyes.

September 20, 2010 permalink

iPad Light Paintings

This film explores playful uses for the increasingly ubiquitous ‘glowing rectangles’ that inhabit the world.

From Dentsu London, Making Future Magic:

We use photographic and animation techniques that were developed to draw moving 3-dimensional typography and objects with an iPad. In dark environments, we play movies on the surface of the iPad that extrude 3-d light forms as they move through the exposure. Multiple exposures with slightly different movies make up the stop-frame animation.

Take that, Picasso.
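The "extrude 3-d light forms" trick boils down to showing one cross-section of the shape per moment as the screen sweeps through space. As a rough illustration (my own sketch, not the studio's tooling), here is how the slice sizes for a sphere might be computed; `sphereSliceRadii` and its parameters are hypothetical names.

```javascript
// For a sphere swept front to back, return the radius of the disc to display
// at each frame. Requires frameCount >= 2.
function sphereSliceRadii(radius, frameCount) {
  const radii = [];
  for (let i = 0; i < frameCount; i++) {
    // z is the slice depth, running from -radius to +radius across the sweep.
    const z = -radius + (2 * radius * i) / (frameCount - 1);
    // Pythagoras gives the cross-section radius at depth z.
    radii.push(Math.sqrt(Math.max(0, radius * radius - z * z)));
  }
  return radii;
}

const radii = sphereSliceRadii(100, 5);
```

Playing those discs in order while dragging the screen steadily through a long exposure should trace out a ball of light; more complex shapes would need real slices from a 3D model.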

June 11, 2010 permalink

iPhone Resolution

Phil Plait of Bad Astronomy lucidly explains display resolution, clearing up arguments about the iPhone 4’s Retina display technology:

Imagine you see a vehicle coming toward you on the highway from miles away. Is it a motorcycle with one headlight, or a car with two? As the vehicle approaches, the light splits into two, and you see it’s the headlights from a car. But when it was miles away, your eye couldn’t tell if it was one light or two. That’s because at that distance your eye couldn’t resolve the two headlights into two distinct sources of light.

The ability to see two sources very close together is called resolution.

DPI issues aside, the name “Retina display” is awfully confusing given that there’s similar terminology already in use for virtual retinal displays.
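The headlight analogy can be turned into back-of-the-envelope arithmetic: work out the angle one pixel covers at a given viewing distance and compare it with what the eye can separate. The numbers below are assumptions that are not in the excerpt above: roughly 1 arcminute as the resolving limit of normal vision, 326 pixels per inch for the iPhone 4, and a 12 inch viewing distance. The function names are my own.

```javascript
// Angle (in arcminutes) covered by a single pixel at a given viewing distance.
function pixelAngleArcmin(pixelsPerInch, distanceInches) {
  const pitchInches = 1 / pixelsPerInch;
  const radians = 2 * Math.atan(pitchInches / (2 * distanceInches));
  return radians * (180 / Math.PI) * 60;
}

// True if neighboring pixels are far enough apart for the eye to tell them apart.
// eyeLimitArcmin defaults to 1, an assumed figure for normal vision.
function canResolvePixels(pixelsPerInch, distanceInches, eyeLimitArcmin = 1) {
  return pixelAngleArcmin(pixelsPerInch, distanceInches) > eyeLimitArcmin;
}

const iphone4 = canResolvePixels(326, 12);    // assumed 326 ppi at 12 inches
const halfDensity = canResolvePixels(163, 12); // half that density, same distance
```

Under those assumptions the 326 ppi pixels fall just under the 1 arcminute limit, while a screen with half the density does not; pick a sharper-eyed limit and the answer changes, which is the crux of the argument.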

January 17, 2010 permalink

John Balestrieri’s Generative Painting Algorithms

John Balestrieri is tinkering with generative painting algorithms, trying to produce a better automated “photo -> painting” approach. You can see his works in progress on his tinrocket Flickr stream. (Yes, there are existing Photoshop / Painter filters that do similar things, but this one aims to be closer to making human-like decisions, and no, this isn’t in any way suggestive that machine-generated renderings will replace human artists – didn’t we already get over that in the age of photography?)

Whatever the utility, trying to understand the human hand in art through code is a good way to learn a lot about color theory, construction, and visual perception.

(Via Gurney Journey)

December 22, 2009 permalink

Magician Marco Tempest Demonstrates a Portable AR Screen

Magician Marco Tempest demonstrates a portable “magic” augmented reality screen. The system uses a laptop, small projector, a PlayStation Eye camera (presumably with the IR filter popped out?), some IR markers to make the canvas frame corner detection possible, Arduino (?), and openFrameworks-based software developed by Zachary Lieberman. I really love this kind of demo – people on the street (especially kids) intuitively understand what’s going on. This work reminds me a lot of Zack Simpson’s Mine-Control projects, especially with the use of cheap commodity hardware for creating a fun spectacle.

(Via Make)

September 12, 2009 permalink

Binocular Diplopia and the Book of Kells

How did reclusive monks living in the year 700 or 800 AD draw the intricate lines of the Book of Kells, rendered by hand at sub-millimeter resolution (about the same level of detail as the engraving work found on modern money), hundreds of years before optical instruments became available, hundreds of years before the pioneering visual research of Alhazen? According to Cornell paleontologist John Cisne’s theory, their trick was in the detail and pattern: by keeping their eyes unfocused on the picture plane, the monks could superimpose their linework and judge the accuracy against the template using a form of temporary binocular diplopia (sort of like willing yourself to view a stereograph or one of those Magic Eye posters).

That’s amazing.

August 1, 2009 permalink

The Timing of Blinks

Update 11/3/2009: RadioLab did a short piece in October on this phenomenon, even discussing the Mr. Bean test with the Japanese researchers.


From recent research out of Japan: “The results suggest that humans share a mechanism for controlling the timing of blinks that searches for an implicit timing that is appropriate to minimize the chance of losing critical information while viewing a stream of visual events.” In simpler words, the researchers found that audiences watching movies with action sequences have a strong tendency to synchronize their blinking so that they don’t miss anything good.

I’m not sure that this is interesting in and of itself, but it’s, um, eye-opening to think that we have our eyes closed for nearly 10% of our waking life. That’s roughly 10 full minutes of a 100-minute movie lost to blinking. I imagine that editors already take this phenomenon into account, at least to some extent?

Full text available in the Proceedings of the Royal Society B – Biological Sciences. Thanks, Creative Commons!

(Via NewScientist)

May 27, 2009 permalink

Real Time Object Recognition on a Mobile Device

Real-Time Object Recognition on a Mobile Device. I’ve seen this done for product lookups like books and boxes of cereal at the store, but hadn’t considered the accessibility implications. Not a bad idea, assuming that it produces valid information most of the time. Also seems like it would be limited to objects of a specific scale?