In all my books, one of the key arguments revolves around the routinely spectacular resolution of the human eye-brain system, which then in turn leads to the idea that our displays of evidence should be worthy of the human eye-brain system. This is, for example, the conclusion of sparkline analysis in Beautiful Evidence, where the idea is to make our data graphics at least operate at the resolution of good typography (say 2400 dpi).
Here is a link to a press-release summary account of an article in Current Biology (July 2006) by Judith McLean and Michael A. Freed, from the University of Pennsylvania School of Medicine, and Ronen Segev and Michael J. Berry III, from Princeton University. The research suggests that the human retina transmits data to the brain at the rate of 10 million bits per second, which is close to an Ethernet connection!
Looking around the world is easier than analyzing evidence displays, and there may also be within-brain impediments to handling vast amounts of abstract data, but at least the narrow-band choke point for information resolution should not be the display itself.
The average PP slide contains 40 words, which take less than 10 seconds to read. Call that 1000 bits per second, which comes to 1/10,000 of the routine human retina-brain data capacity.
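As a rough check on that arithmetic, here is a minimal sketch. The characters-per-word and bits-per-character figures are assumptions introduced only for illustration (they are not from the post); under them the raw text rate is well below 1000 bits per second, so the rounded figure is, if anything, generous to the slide.

```python
# Figures from the post
WORDS_PER_SLIDE = 40
SECONDS_TO_READ = 10
RETINA_BITS_PER_SECOND = 10_000_000  # 10 million bits per second

# Assumptions for illustration only
CHARS_PER_WORD = 6   # five letters plus a space
BITS_PER_CHAR = 8    # one uncompressed byte per character

# Raw text rate of one slide under the assumptions above
slide_bits_per_second = (
    WORDS_PER_SLIDE * CHARS_PER_WORD * BITS_PER_CHAR / SECONDS_TO_READ
)

# Share of retina-brain capacity, using the post's rounded 1000 bits per second
rounded_share = 1000 / RETINA_BITS_PER_SECOND

print(f"slide text rate: {slide_bits_per_second:.0f} bits per second")
print(f"rounded share of capacity: 1/{1 / rounded_share:,.0f}")
```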
Also most of our evidence displays are in flatland, which is easier than 3D perceptual tasks. On the other hand, many serious data displays are not in the familiar 4D space/time coordinate system that our eye-brain knows so well.
Memory problems can be partly handled by high-resolution displays, so that key comparisons are made adjacent in space within the common eyespan. Spatial adjacency greatly reduces the memory problems associated with making comparisons of small amounts of information stacked in time (PP slides, for example).
-- Edward Tufte
While PowerPoint is surely a horrid way to transmit information, I'm not sure we can inject very abstract information into people at ethernet rates. 40 words in 10 seconds doesn't translate to 1000 bits per second transmitted over the optic nerve, which connects the retina to the banks of the calcarine sulcus in the occipital lobe, via the optic chiasm and the lateral geniculate nucleus. At a minimum the data being transmitted would require an analysis of the typography's geometry (edge detection being a basic function of the retina), the amount of the visual field taken up by the display, the location of the display's image on the retina relative to the fovea, and the rates of change in the display and surrounding motion (the speaker, other audience members, etc).
Your guesstimate of 40 words in 10 seconds leads to a 240 word-per-minute reading speed. Like normal readers, braille readers can read at 200 to 400 words per minute. Is there any evidence that a person with an acquired partial nerve blindness also acquires an impaired ability to reason spatially? My classmates at Tulane Med found they preferred listening to the lecture audio I recorded (see the (audio) links for Spring 2006) at one-and-a-half speed, which also pushes close to 200 words per minute. Most people found twice-speed to be uncomfortably fast. This 200, 240, 400 word-per-minute rate may be a more accurate definition of the rate at which the human mind can receive and abstract information in word form, and this is likely driven by communication between Broca's area and Wernicke's area via the arcuate tract. Keep in mind, reading is a highly abstract function. Babes learn to speak as they learn to interpret what they hear and form their vocabulary. Indeed, Kuhl et al from the University of Washington published findings last week in NeuroReport that activity in Broca's area and Wernicke's area becomes synchronized during the first year (learnt via NPR). Toddlers then commit their alphabet to memory, learn to form words in their mind by matching what they see with their mental images of the letters, often by saying them out loud and blending the sounds together, and concomitantly start to memorize common word forms, like their names, as a sort of super-alphabet. That's the left side.
Mountain bike racing in the woods is probably a good speed test for the right cerebral hemisphere's ability to interpret incoming visual data. The entire scene is certainly changing much more quickly, and this is likely recruiting as much of the optic nerve's "bandwidth" as possible. I wouldn't be at all surprised if the bit rate exceeded 10^6 bits per second, but the degree of abstraction is much lower. "Go there" and "Don't hit the tree" is about all that's necessary. Indeed, one of the most valuable things a racer can do to improve race day lap times is to pre-ride the course the day before. Pre-riding is not nearly such a big deal in road racing.
Where do visual evidence presentations fall in this? I for one have never been in a race and bothered to compare the catenary of a treacherous vine with the theoretical hyperbolic secant, nor is speech much of an issue after the start. Back in civil society, however, any 2D graphic is instantaneously captured in the mind's eye. A projection of 3D more slowly, but still very quickly. There comes a challenge though in asking people to find out what the axis labels are and interpret what they mean. How much of any science student's time is spent memorizing what variables the Greek and Roman characters represent in their particular field? The pursuit of common and standard placement in your work (horizontal words, hang the Y label off the top of the axis like a flag, etc) certainly goes to this challenge, but the fact remains that there is no one overwhelmingly accepted architecture for visual evidence displays, even for the simple graph. Any given reader faces an analytic graphic as a rather loose jumble of sticks and glyphs and bears an unfortunate burden of comparing them to a rather loose model, then reasoning about them. Prose faces people with a lesser burden, that of literacy, and brings with it an overwhelmingly accepted architecture: word, clause, sentence, paragraph. The analytic graphic carries an inherently higher burden. The graph, with all its degrees of abstraction (words, equations, the data, trend lines, etc) represents an extraordinary challenge that demands recalling linguistic and non-linguistic memories, comparing them to the scene falling on the retina, and reasoning about both the stuff before you and the difference between that stuff and the perfect stuff of memory.
This requires not only comparing abstractions like words and drawings, which requires communication across the hemispheres via the corpus callosum, but also extraordinarily complex relays with the frontal cortex; in some cases the mere interpretation of an analytic graphic reaches as far as free will, which Francis Crick and at least one of his colleagues at Scripps believed to be located in the anterior cingulate gyrus, Brodmann area 24.
None of this is an apology for PowerPoint, only a minor point about which bit rates can be compared.
Is there any statistic or theory or number that illuminates how people absorb information
visually vs. through words? i.e., is information retention via visuals X% more effective than
through words/sentences/bullet points? I know it will depend upon the visual and the words;
I'm just looking for an estimate or hypothesis. Any help is appreciated.
Early research showed that the eye is an 80% receptor to the brain vs other sensors. German researchers created a visual discussion method to allow both visual criteria to be applied together with structured written verbal expression to support the spoken word. The method allows many persons to provide input simultaneously and the thinking structure is managed by a discussion butler. The problem with PP and many other displays is that they work to the adage 'the medium is the message' rather than allowing relationships to steer the matter. Interaction is one way. Participants sit in a darkened room that hinders visual relationships with other viewers (possibly on purpose if autocratic presentations are acceptable). One of the charms of ET's work is his use of 'context' to provide visual variety of interpretation. Who else would put sheep with rocks and metal objects to assist new insights?
The question then is, how well do we employ our visual senses and enmesh them with wisdom of interpretation?
In response to Peter Shier's question, there have been a number of experiments that try to
demonstrate that retention of information provided graphically is greater than that of
information provided through text alone. (One example is Butler and Mautz, "Multimedia
Presentations and Learning," Issues in Accounting, Fall 1996). The problem with this and
other similar research from a visual display perspective is that they tend to lump all non-
textual display under the label of "multimedia," and information density is never a variable
that is measured and reported. Also, the actual visuals used are almost never included in
the journal articles; the few that are, if they are indicative of the rest, are worrisome: they
tend to include the worst kind of PowerPoint "Phfluff", and so the experimental results are
(3) The morphology of an embodied system can have significant effects on its information processing capacity. We tested the hypothesis that sensor morphology (here, the arrangement of photoreceptors in a simulated retina) influences the flow of information in a sensorimotor system.
The last point in the previous paragraph supports the notion of a quantitative link between the morphology of the retina and a computational principle of "optimal flow of information." Given a fixed number of photosensitive elements, their space-variant arrangement maximizes the information gathered, even more so in a system engaged in a sensorimotor interaction, e.g., foveation behavior. If the photoreceptors were uniformly distributed in the retina, those in the periphery would be underutilized; also, fewer photoreceptors would be in the fovea, yielding (on average) lower spatial resolution, and resulting in less accurate estimates of object locations. Such non-uniformity at the receptor level is mirrored by non-uniformity at the cortical level in a topology-preserving fashion, that is, nearby parts of the sensory world are processed in nearby locations in the cortex. There has been some work on deriving such topology-preserving maps through the principles of uniform cortical information density and entropy maximization. We argue here that in a sensorimotor system, the rate of information transfer is maximized at the receptor stage if the probability distribution of target objects on the retina is adapted to the local photoreceptor density (a morphological property), and that this can be achieved through appropriate system-environment interaction, e.g., foveation, saccades, or adequate hand movements. A further implication of our findings relates to the possible role of early visual processing for the learning of causal relationships between stimuli. It has been shown, for instance, that the receptive fields of retinal ganglion cells produce efficient (predictive) coding of the average visual scene [17,46]. We propose that such coding also depends on the local arrangement of the receptors and on the spatial frequencies encountered during the organism's lifetime.
In conclusion, our results highlight the fundamental importance of embodied interactions and body morphology in biological information processing, supporting a conceptual view of cognition that is based on the interplay between physical and information processes.
One implication from studies like this: In normal reading the eye is moving rapidly over text, which creates motion-like pattern-matching activities in the retina, an example of the eye processing information before sending it to the brain. The large text of PowerPoint bullet points in the small space of a slide may inhibit the mind's natural uptake of the information because the few words on the screen are held in a space that the fovea can handle without eye or body movements like "foveation, saccades, or adequate hand movements."
As an aside, Firefox 2.0's spell checker doesn't recognize sensorimotor, photoreceptor, foveation, saccades, or the HTML tag blockquote, but it knows both Ps in PowerPoint are supposed to be capitalized.
Does this question fit here? Do you think about people with eye-brain connections that have been injured or are completely non-functional, but who still have huge cognitive hunger to get at and give out complex information? Have you or others explored the problem of providing beautiful evidence by means other than through the eye? I attended your workshop recently to further my design thinking, but found myself instead thinking over and over again about applications and implications of your principles in terms of my son with physical disability and cortical visual impairment. The difficulty with visual memory is one thing; try relying on purely auditory memory to categorize, retrieve and make sense of complex information. Now that's a task. I pictured scatter plots as jazz-like compositions; sparklines as played on a theremin (the creepy horror movie instrument by which one can vary pitch and volume by moving one's hands) or synthesizer. Sparklines, overlaid for comparison, with varied tonal quality or instrumentation perhaps, or trends revealed via the complexity and beauty of Bobby McFerrin's "Voicestra" vocal orchestra. Surely there are plenty of scientists, students, CEOs with macular degeneration or blindness that need to and desire to grasp rich and complex evidence by ear or by other means just as beautifully. The task of translating visual evidence to a "hi-res" auditory (ear-brain) experience that could be understood as readily and as beautifully as we would wish is a problem that begs for solutions, don't you think?
Adrian Perrig and Dawn Song wrote a paper in 2000, Hash Visualization: a New Technique
to improve Real-World Security. Their thinking was that the human eye-brain system is much more adapted to remembering structured images than strings of random characters, the usual representation of an encryption key, so images generated from keys might be sufficient for rapid visual confirmation that a key is trustworthy. Nothing beats confirming the alphanumeric key character for character, but it is a time-consuming task and many users completely bypass this step, creating substantial opportunity for man-in-the-middle attackers. The images would be an intermediate good-enough check for the typical person using online banking.
There are other ways to quickly assess a public key. A checksum of the key, called the fingerprint, is the most common; and some people will even truncate that down to just examining the first and last characters or the first 8 characters, something like that.
Perrig and Song specifically propose generating images with the Random Art algorithm: the binary public key is used as the seed for a randomly created function, and the image is generated by sampling the function. If you want a 100x100 image, you sample the function 10,000 times.
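To make the seed-then-sample idea concrete, here is a minimal sketch. It is not the Random Art grammar from the paper; the three operators (sine, product, average), the SHA-256 step used to turn key bytes into a seed, and the names `random_function` and `key_image` are all assumptions chosen only to keep the example short. It shows the two steps described above: the key determines a randomly built function of (x, y), and the image is that function sampled on a grid.

```python
import hashlib
import math
import random


def random_function(rng, depth):
    """Build a random expression in x and y; every value stays in [-1, 1]."""
    if depth == 0:
        return rng.choice([lambda x, y: x, lambda x, y: y])
    kind = rng.choice(["sin", "mul", "avg"])
    a = random_function(rng, depth - 1)
    if kind == "sin":
        return lambda x, y: math.sin(math.pi * a(x, y))
    b = random_function(rng, depth - 1)
    if kind == "mul":
        return lambda x, y: a(x, y) * b(x, y)
    return lambda x, y: (a(x, y) + b(x, y)) / 2.0


def key_image(key_bytes, size=100, depth=4):
    """Sample a key-seeded random function on a size x size grid."""
    # Hash the key to get a fixed-length integer seed (an assumption, see above).
    seed = int.from_bytes(hashlib.sha256(key_bytes).digest(), "big")
    rng = random.Random(seed)
    f = random_function(rng, depth)
    # Grid coordinates run from -1 to 1 in both directions.
    coords = [2.0 * i / (size - 1) - 1.0 for i in range(size)]
    return [[f(x, y) for x in coords] for y in coords]


image = key_image(b"example public key bytes")
print(len(image) * len(image[0]), "samples")
```

The same key always yields the same grid of values, which is the property the visual check relies on; mapping the values to colours is left out here.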
So, these images are generated from less than 10 million bits of data, the largest public keys are around 65 thousand bits, but the images could conceivably be inflated to 10 million bits by simply sampling the function 10 million times, which would create a roughly 3000 x 3000 pixel image, smaller than the output of a Nikon D2. 10 million bits can be even smaller if you add color as a dimension.
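The image-size figure can be checked with a few lines. This is a sketch assuming a square image; the function name `square_side` and the 24-bit colour depth are illustrative choices, not from the paper or from OpenSSH.

```python
import math


def square_side(total_bits, bits_per_pixel=1):
    """Side length in pixels of the largest square image within total_bits."""
    pixels = total_bits // bits_per_pixel
    return math.isqrt(pixels)


print(square_side(10_000_000))       # 3162: one bit per pixel
print(square_side(10_000_000, 24))   # 645: 24 bits of colour per pixel
```

At one bit per pixel the side is a little over 3000 pixels, as stated; spending the same bit budget on colour shrinks the image to a few hundred pixels on a side.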
It is the structure of the function that the eye-brain system must evaluate. It seems to me the question is how much structure the eye-brain system can parse, given a reference image and a challenge image and the implicit assumption that this should take about a second.
Some readers may realize their banks are already using images as an independent source of confirmation, but those are typically strongly metaphorical photographs of lions, houses, etc. This is rather different.
I found out about this 8-year-old paper because OpenSSH 5.1 was released today and it includes image generation as a non-default option for system administrators to start using in the wild. Since the code is open-source, if admins find it effective, then people may start to change how they authenticate over the internet in the next few years. It may even lower the comfort barrier to acceptance of the current holy grail of Internet authentication, OpenID.
A visitor to this forum, Angela Morelli, asked me by email why we understand numbers and graphs differently. My response became a more thorough version of what is posted above.
People interpret numbers and graphs differently because they are handled differently in the brain. Numbers are generally handled by the verbal linguistic system and graphs are handled by both the non-verbal linguistic system and the limbic system. The bit rate of the visual system is about 10 million bits/second (see the first post in this thread). The rate of reading, listening, braille, typing, maxes out at around 150-400 words per minute. To understand how this works, and provide a foundation for further reading, a *very* brief review of the relevant neuroscience seems in order.
Visual processing begins in the retina with some very simple edge definition. Further edge definition occurs where the neurons of the optic nerve enter the brain at the lateral geniculate nucleus. The neurons synapse and new neurons run in the optic radiations from the LGN to the banks of the calcarine sulcus, where further edge definition and integration occurs. The calcarine sulcus is at the very posterior part of the cerebrum and represents the first time the visual information enters the cerebral grey matter. From there, the final elements of subconscious analysis and pattern recognition occur in the lingual gyrus and cuneus, which sort of wrap around the banks of the calcarine sulcus like concentric rings (heavily folded, of course). Everything up to and including this point is essentially image processing.
Conscious recognition starts to occur in the inferotemporal region, Brodmann's areas 37 and 7a. Lesions to these areas lead to what are called agnosias. Oliver Sacks' The Man Who Mistook His Wife for a Hat has a good example of an agnosia. In fact, most of the back half of the cerebrum (abaft your ears) that isn't involved in basic visual perception is involved in this kind of unimodal association. The other major exception is the angular gyrus and Wernicke's area.
Once the basic conversion to symbolic information occurs, information is routed based on type. Basic numeracy is handled by Brodmann's area 39 in the angular gyrus of the parietal lobe, just slightly above Brodmann's area 37 where the numbers were recognized as numbers. The syntactic region of the brain, Wernicke's area, exists very close by, and can be considered to involve the angular gyrus. A stroke to Wernicke's area leads to receptive aphasia. The patient has access to their entire vocabulary and will speak words clearly, but can't understand and can't compose syntactically correct thoughts. This is commonly described as a "word salad". Wernicke's area is where mathematical training or computer science training trains new syntactic structures. A physicist, in some ways, can think thoughts a non-physicist can't. If Noam Chomsky's syntactic structures exist, they are mainly constructed in Wernicke's area. Now, Wernicke's area is still in the back half of the brain. It is evolutionarily older than Homo sapiens. And, indeed, apes and even dogs, rats, cats, and insects can construct syntactically different sequences of sounds.
Where we really start to diverge from other species is in Broca's area, which structurally lies in the relatively new prefrontal cortex and consists of the Pars triangularis and Pars opercularis. Functionally, Broca's area contains the dictionary. A person with a stroke in Broca's area can, with great difficulty, construct sentences, but they have profound word-finding problems which get worse with stress. And they're always stressed out because they can't find the word! Between Wernicke's area and Broca's area is the arcuate tract, a superhighway of axons committed to carrying information between the neurons of Wernicke's area and Broca's area. Damage to the arcuate tract results in a person who can understand and can speak, but can't hear what you say and then formulate a reply. In higher math, it is the left-sided verbal linguistic system that is involved in equations. In basic number recognition, it is simply the angular gyrus that is involved. This whole system is essentially verbal and exists on the left side. A non-verbal, musical, spatial, temporal, inflection-oriented corollary system exists on the right side.
Someone with a right-sided stroke can communicate, but may have inappropriate responses because they can't understand, interpret, or compose the "how you say it". They also have difficulties interpreting space and, perhaps, time, as they tend to have attention disorders. Interestingly, mathematicians are commonly interested in music and often find spatial expressions of their mathematics particularly appealing, suggesting a high degree of integration between their verbal and non-verbal language centers. And, similar to the syntactic function of Wernicke's area, a musician can compose syntactic structures a non-musician may have a hard time understanding.
As an aside, I am inclined to wonder if general intelligence is an emergent property of our neurons in a way that is similar to how a Turing complete programming language can emerge from Church numerals and lambda calculus. One could think of a neuron as an atom, and two neurons connected by a synapse as a list.
Regardless of how many bits go between Broca's area and Wernicke's area over the arcuate tract, that word-per-minute limit is still representative of how fast people can cogitate about abstract things like numbers. Obviously 150-400 words per minute is much slower than 10 million bits/second. However, the trained mind can handle numbers faster than the untrained mind, and scanning a table of numbers for the high and low and getting a feel for the median and the nature of the distributions and the relationships between variables can be done quite quickly without cogitating too explicitly about each number. One need not advance every number to the level of free will to appreciate that some are bigger than others.
Outwardly the difference between computer graphics processors and general purpose central processors is very similar to the difference between the visual system and the verbal or non-verbal linguistic systems. At the nVidia08 conference, Mythbusters provided a very nice demonstration of the difference between manipulating a single-threaded general processor (analogous to the linguistic system composed of Broca's area, Wernicke's area, and the arcuate tract) to render an image, and a function-specific system like a graphics processor or the visual system. While faster general processors are desirable, their power is not in their speed. Their power is in their generality, their ability to deal with any abstract issue and, potentially, make value judgments and exert free will. The Mythbusters' CPU illustration, for example, could also be used to pick things up, or whatever else a robotic arm can be made to do.
Judgment, free will, and risk analysis occur on two levels. Judgment and free will occur in the very newest part of the cerebrum, the anterior cingulate gyrus, roughly Brodmann area 24. Recent findings suggest this is the last part of the brain to mature, at about the age of 25. I guess the car insurance companies know what they're talking about :-) Risk analysis can occur in this area, but, as we all know, the average bee can also conduct risk analysis. This reflexive risk analysis occurs in the limbic system, part of the reptilian brain that underlies our newer cerebrum. Marketeers like graphs with a positive slope because society teaches us over and over that a positive slope is good, income-positive, growing, whatever. It's the repetitive association with primal desires that causes the positive slope to be imprinted on the reptilian brain. Me like up and right! Can I has cookie? Valuations that are repeated, over and over, are more likely to be imprinted, branded in the limbic system's primordial type of memory. Commercial television really is all about holding your eyes still while they spray your brain with advertising. Imprinting those brands on the limbic system. Over and over. Until the reptilian brain learns. The great thing about this, for marketeers, is that the reptilian brain keeps us alive in a lot of situations so it tends to get privileged access to information so it can respond very quickly, when necessary, or at least when certain preconditions are met. Marketeers have leveraged those preconditions to train the imprinted, branded, conditioned reptile to pick things up off the shelf before the human brain intervenes.
What we have then is an input system, the visual system, that provides input to multiple analytical systems. We are, at a minimum, a multi-core processor. Understanding numbers and thinking about math and interpreting graphics and making value decisions requires the newest and most complex parts of our brain, but there is a very real possibility of sending the information to the wrong system, the reptilian system. The association (positive slope)==(good), in any human who grew up in modern society, can be safely assumed to be imprinted in the reptilian brain. Unfortunately, even very educated and successful people are usually susceptible to such simple tricks.
So, making graphics to represent numbers can be tricky work, and society has rewarded ET and others for grappling with the problem. There are some books on how to think about these problems :-) If you want to evoke things that the reptile values, like hunger and fear, then activate the reptile. If you want things that are assigned by prefrontal centers, like credibility, reputation, and respect, then you should try to activate the prefrontal centers, and providing numerical information is one way to do that. If you need graphs because there are too many numbers, then you should make sure those graphs activate the non-verbal linguistic system: they should carry a fair bit of information, describe multiple variables, prompt further decision-making by the prefrontal cortex, cite your sources, etc.
1. The neuroanatomy for this post was checked against DE Haines, Fundamental Neuroscience for Basic and Clinical Applications, 3rd Ed, Elsevier 2006, pp 518-522.
This reference made me think of how I could make a Petabyte more understandable.
In digital data terms a petabyte is a lot of data. 1 PB = 1,000,000,000,000,000 B = 10^15 bytes. Assuming a byte is 8 bits then a petabyte is 8 x 10^15 bits.
According to this paper, Google processes more than 20 Petabytes of data per day using its MapReduce program.
According to Kevin Kelly of the New York Times, this reference, "the entire works of humankind, from the beginning of recorded history, in all languages" would amount to 50 petabytes of data.
These are all difficult to understand as they are abstract. So I tried to find a way of understanding what a Petabyte is in terms of an individual human being. From the paper you refer to here we can estimate that the human retina communicates with the brain at a rate of 10 million bits per second or 10^7 bits per second. This sounds pretty impressive.
How long does it take a human eye-brain system to move a petabyte of data (assuming that you could keep your eyes permanently open so that you are getting your full 10 million bits per second)?
By my calculations a year is 3.15 x 10^7 seconds. This means a total amount of data per year from retina to brain of 3.15 x 10^14 bits. Dividing 8 x 10^15 by 3.15 x 10^14 we get 25.4 years. This is a long time to keep your eyes open!
If we take a normal human life to be the biblical standard of Psalms 90: The days of our years are threescore years and ten, then a normal human creates about 2.8 petabytes in their life.
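The petabyte arithmetic is easy to get wrong by a power of ten, so here is a short sketch of it. It takes 10 million bits per second as 10^7; with 10^6 every duration below would be ten times longer and the lifetime total ten times smaller. The 365.25-day year is an assumption for illustration.

```python
RETINA_BITS_PER_SECOND = 10_000_000       # 10 million bits per second = 10^7
SECONDS_PER_YEAR = 365.25 * 24 * 3600     # about 3.16 x 10^7 (assumed year length)
PETABYTE_BITS = 8 * 10**15                # 10^15 bytes at 8 bits per byte
LIFETIME_YEARS = 70                       # threescore years and ten

bits_per_year = RETINA_BITS_PER_SECOND * SECONDS_PER_YEAR
years_per_petabyte = PETABYTE_BITS / bits_per_year
petabytes_per_lifetime = LIFETIME_YEARS / years_per_petabyte

print(f"bits per year: {bits_per_year:.2e}")
print(f"years of open eyes per petabyte: {years_per_petabyte:.1f}")
print(f"petabytes per 70-year life: {petabytes_per_lifetime:.2f}")
```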
We could also define a brand new unit, the PetaBlife, with a symbol ℘ which is the number of standard human lifetimes required for a human retina to make a PetaByte of data.
I have been reading up on the evolution of eyes and vision. I stumbled across the work of Prof Russell Fernald who is at Stanford University (http://www.stanford.edu/group/fernaldlab). From a paper by him published in Current Opinion in Neurobiology 10(4): 444-50 in 2000 the following profound statement made a big impression on me;
"Light has probably been the most profound selective force to act during biological evolution. The 10^15 sunrises and sunsets that have taken place since life began have led to
the evolution of eyes which use light for vision and for other purposes including navigation and timing."
Dear Professor Tufte,
I eagerly look forward to your analysis of Apple's new iPhone 4, particularly the Retina Display they are
branding. I can't help but wonder who at Apple has been following this discussion thread, which you initiated
several years ago?
Some people who have apparently held and used an iPhone 4 are making comments like these:
The resolution of the "retina display" is as impressive as Apple boasts. Text renders like high quality print.
It's mentioned briefly in Apple's promotional video about the design of the iPhone 4, but they're using a new
production process that effectively fuses the LCD and touchscreen -- there is no longer any air between the two.
One result of this is that the iPhone 4 should be impervious to this dust-under-the-glass issue. More importantly,
though, is that it looks better. The effect is that the pixels appear to be painted on the surface of the phone; instead
of looking at pixels under glass, it's like looking at pixels on glass. Combined with the incredibly high pixel density,
the overall effect is like "live print".
What might text and sparklines look like on a "Retina Display"? I can't wait for your own hands-on review of iPhone
4 and also the iPad.
There is a growing debate about the resolution of the new iPhone and how it compares to the eye.
Here are some highlights:
Raymond Soneira : on Wired
1. The resolution of the retina is in angular measure - the accepted value is 50 Cycles Per Degree.
A cycle is a line pair, which is two pixels, so the angular resolution of the eye is 0.6 arc
minutes per pixel.
2. So, if you hold an iPhone at the typical 12 inches from your eyes it would need to be 477 pixels
per inch to be a retina limited display. At 8 inches it would need to be 716 ppi. You have to hold
it out 18 inches before the requirement falls to 318 ppi. The iPhone 4 resolution is 326 ppi.
Phil Plait : on Discover
Let me make this clear: if you have perfect eyesight, then at one foot away the iPhone 4's pixels are
resolved. The picture will look pixellated. If you have average eyesight, the picture will look just fine.
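Soneira's pixel densities can be reproduced from the 50 cycles per degree figure with a little trigonometry. This is a sketch; the function name `retina_limit_ppi` is hypothetical, and the only inputs are the viewing distance and the acuity value quoted above.

```python
import math


def retina_limit_ppi(distance_inches, cycles_per_degree=50):
    """Pixels per inch at which one pixel subtends the eye's resolution limit."""
    pixels_per_degree = 2 * cycles_per_degree          # a cycle is two pixels
    pixel_angle = math.radians(1 / pixels_per_degree)  # 0.01 degree = 0.6 arc minutes
    pixel_size = distance_inches * math.tan(pixel_angle)
    return 1 / pixel_size


for distance in (8, 12, 18):
    print(f"{distance} inches: {retina_limit_ppi(distance):.0f} ppi")
```

This gives about 716, 477 and 318 ppi at 8, 12 and 18 inches, matching the quoted numbers, which puts the iPhone 4's 326 ppi just past the limit only at around 18 inches.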
Alfred Lukyanovich Yarbus (1914 -1986) was a Russian psychologist who made a number of seminal studies of eye movements. Many of his most interesting results were published in a book, translated into English and published in New York in 1967 as Eye Movements and Vision. This book is now out of print but you can find PDF copies to download.
I first saw some of Yarbus' data about 13 years ago as scratchy black and white scans from the book.
One of the most compelling of Yarbus' experiments was an eye-tracking study he performed where he asked subjects to look at a reproduction of a Russian oil painting An Unexpected Visitor painted by Ilya Repin in 1884.
Yarbus asked the subjects to look at the same picture in a number of different ways, including: examine the painting freely; estimate the material circumstances of the family; assess the ages of the characters; determine the activities of the family prior to the visitor's arrival; remember the characters' clothes; and surmise how long the visitor had been away from the family. What is brilliant is that the eye-tracking traces recorded by Yarbus showed that the subjects visually interrogate the picture in a completely different way depending on what they want to get from it.
Cabinet Magazine (Issue 30 The Underground Summer 2008) has a piece by Sasha Archibald called Ways of Seeing that takes the original eye-tracking traces from Yarbus' book and superposes them on a colour reproduction of the painting.
This is the first time I have seen this done. The originals in the book by Yarbus are disembodied eye-tracking traces laid out near to, but not overlaying, the reproduction of the Repin painting. These new overlays by Archibald are worth comparing. Here is (left) the original image (middle) free examination and (right) what the subject did when asked to estimate the material circumstances of the family.
Example of industrial supplier using retinal tracking
A major industrial supplier asked me to participate in a study of their web interface.
After an interview, they sat me before a monitor and handed me about six different objects to find on their website. One object was a small plastic pipe fitting, and I remember a couple of fasteners. A tiny web cam atop the monitor tracked my eye movements as I negotiated the site and found the products. They were testing frames, as I recall.
Based on my compensation, this testing is expensive, but the quality of their website shows.