Marquette University BIEN 167 Module 2 Sensorimotor

Mode of Vision, and Process of Seeing

Gift of Seeing

  • "Vision" as an active process
  • Seeing as an ability

Optics of the Eye

  • Physics of light through the eye
  • Relevant anatomy of the eye
  • Correctable dysfunction and optical accommodations
  • Neuromuscular Control Systems Related to Optics (accommodation, pupil)

Sensing and Vision: From the Retina to the Perception of Vision

  • Cells of the retina: distribution of rods and cones, sensitivity, and spatial connectivity
  • Brain structural connectivity: lateral geniculate body, visual cortex
  • Neurocontrol systems for pupil and acuity accommodation
  • Perception of vision

Gaze Movements

  • Mechanics of eyeball and eye muscles
  • Neuromotor convergence to oculomotor neurons
  • Classic eye movements to sample environment
    • fast tracking - saccadic eye movements
      • optimal control problem
    • slow/moderate target tracking - smooth pursuit eye movements
    • near-far stereo focus: vergence eye movements & Hering's Law
    • coordinated eye-head gaze: vestibulo-ocular reflex (VOR)
  • Integrated use of these movements
    • intrinsic nature of movements: intrinsic behaviors in persons who are blind
    • relative eye and head contributions to stationary gaze
    • tracking fast targets
    • scan paths: sensorimotor integration to actively (optimally?) sample pictures, etc.
    • eye movements as a "window to the brain" and to understanding certain disease

Video Codecs: Technology Decisions Behind Transmitting Video

  • Technical building blocks for digital representation: pixels, intensity, color
  • Motivations behind digital filters: spatial and temporal considerations
  • Types of video codecs for data compression
  • Classic standards for video (e.g., JPEG, MPEG, H.263), and future directions toward universal access and user preferences (e.g., MPEG-4,7)
  • Pragmatic considerations for videoconferencing and multimedia applications

Visual Impairments, Disability, and Access/Accommodation Strategies

  • Changes related to the aging process
  • Common with many disabilities and often not recognized (e.g., cerebral palsy, schizophrenia)
  • Common sensory impairments
  • Common sensorimotor impairments affecting interpersonal interaction
  • Augmentative Technologies for the Partially Sighted
  • Technologies that Replace Lost Function (e.g., for the Blind)
  • Environmental accommodation strategies
  • Technical strategies for enhancing access

Gift of Seeing

"TO SEE" means to have vision. Broadly applied, "vision" is an active process that includes both sensing and interpreting rays of light passing through two eyeballs. This gift of multi-dimensional sight is one that most individuals possess, but often to varying degrees due to sensory and/or motor impairment. In this section we focus on using sight as an ability, i.e., on the act of seeing. As we will see, the full capability of seeing involves a large number of abilities. We need to understand those abilities, and in some cases make environmental accommodations (such as lighting) that maximize abilities.

In this section we develop a scientific and technical foundation on the act of seeing, to a level that can be covered in one lecture period, and is then available as a resource for the remainder of the course.

Optics of Light, Anatomical Apparatus and Optics-Related Neuromuscular Control Systems

As you know, light rays are electromagnetic waves. Light rays refract (bend) when passing between media of different densities, such as at a curved surface (e.g., the cornea) or through a biconvex shape (the lens). Light rays that are close to parallel when arriving at such structures can be, if the shapes are near-perfect, refracted to a point. This is called the principal focus, and the distance between the lens and the principal focus is called the principal focal distance. The typical unit of measure of refractive power, termed the diopter, is the reciprocal of this focal distance in meters. A person with good vision and a typical size eyeball will have a refractive power of about 67 diopters at rest. Rays from closer objects are diverging upon entry, and will come to a focus farther behind the lens than the principal focus. This would seem a problem. But it's been solved: we have access to an effective accommodation control system (see below) that helps us maintain acuity by using muscles to cause subtle changes in the shape of the lens, resulting in up to 12 diopters of change in young individuals. Apparently nature has been fine-tuning eyeball optics for a while, given that the biological design seems optimal in the sense of having the focus, at any moment, be on or near the retina. Interestingly, this is a different strategy than that of a camera, which changes the distance between the lens and the film; biology has also discovered this second strategy, as some fish actually change the shape of their eyes rather than the curvature of their lens.
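
The diopter arithmetic in this paragraph can be checked with a few lines of Python. This is only a minimal sketch: the function names are hypothetical, and it assumes the simplified thin-lens relation in which the extra power needed for an object at distance d (in meters) is 1/d, ignoring the multiple refracting surfaces of a real eye.

```python
def focal_distance_m(power_diopters):
    """Principal focal distance (m) is the reciprocal of refractive power (diopters)."""
    return 1.0 / power_diopters


def accommodation_demand(object_distance_m):
    """Extra power (diopters) needed to focus an object at this distance,
    relative to a very distant object (thin-lens assumption)."""
    return 1.0 / object_distance_m


def nearest_focus_m(accommodation_range_diopters):
    """Closest distance (m) that a given accommodation range can bring into focus."""
    return 1.0 / accommodation_range_diopters


# About 67 diopters at rest corresponds to a focal distance of roughly 15 mm.
print(round(focal_distance_m(67.0) * 1000, 1), "mm")
# An object at 25 cm asks for 4 extra diopters.
print(accommodation_demand(0.25), "diopters")
# 12 diopters of accommodation reaches in to roughly 8 cm.
print(round(nearest_focus_m(12.0) * 100, 1), "cm")
```

Under these assumptions, the 12 diopters available to a young person is enough to focus objects only a few centimeters from the eye.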

But our human biological system is not always perfect, and corrective lenses (eyeglasses or contact lenses) are often needed:

  • If the lens is too round (or the eyeball too long), the refractive power is greater, the principal focal distance is less than the distance between the lens and the retina, and the principal focus falls within the eye, in front of the retina. The person then has myopia (i.e., is near-sighted), and needs a bi-concave lens (which slightly separates the rays) for correction.
  • Alternatively, if the lens is too flat (or eyeball too short), the principal focus will be beyond the eye, and this person has hyperopia (far-sightedness), and needs greater refractive power and thus an additional biconvex lens for correction.
  • If the curvature is not uniform, with curvature different in different meridians, light rays are refracted differently, not converging to a point. The resulting blurred vision from such astigmatism is corrected, as much as possible, by glasses with different curvature along meridians.

There are two important neuromuscular control systems related to optics:

  • Accommodation control system for optimizing acuity. The lens is a transparent, elastic crystalline tissue that is held in tension by the lens ligaments. Also connected to the lens capsule, in a unique structurally parallel arrangement anterior to the lens, are the ciliary muscles (including both circular and longitudinal muscle fibers). When this muscle complex contracts, the lens curvature (especially the anterior surface) becomes more round, thus helping to bring near objects into focus. This is a feedback control system in which the "system" includes sensing of image clarity through sensory processing areas of the brain responsible for the perception of vision. Temporal dynamics of this control system are on the order of seconds. JW side note: one of my graduate student friends at Berkeley did his doctoral dissertation research on this nonlinear control system, and I was a subject for some of his experiments. Of note is that the highly used ciliary muscle can fatigue when an individual focuses on a close object for a while. Also, the lens tissue hardens with age, in part because some cells gradually die, so the amount of possible accommodation decreases by about 2 diopters/decade after about 20 years of age. Because of this, the nearest point to the eye at which an object can be brought into clear focus by accommodation recedes from about 9 cm at age 10 to about 83 cm at age 60. Difficulty with reading and close work, often requiring correction by about age 45, is referred to as presbyopia; it is corrected by wearing convex lenses.
  • Pupil neuromuscular control system for setting aperture. There is another intriguing neuromuscular control system with roots in optics. So far we have focused on the location of a light source, but light also has a frequency spectrum and an intensity. Intensity can vary dramatically, and the key aim of the pupillary "light reflex" control system is to regulate this intensity by changing the entry aperture (pupil diameter). For instance, an intense light will cause dramatic increases in impulses in optic fibers, which initiate pupillary responses in the pretectal region and the superior colliculi within the brain. Signals to the oculomotor nuclei result in constriction of the pupil (i.e., the black hole in the middle). The actuator for this feedback control system is housed in the iris, the colored portion of the eye that contains both circular muscle fibers that constrict the pupil and radial fibers that dilate the pupil. The range of change is from about 1.5 mm to 8.5 mm, thus allowing a fluctuation in area of roughly 30-to-1. But the biological solution here takes an interesting twist, as do the experiments that can be used to help evaluate this system. First, the pupil reaches a threshold in terms of how small it can get, and thus really small beams of varying intensity can be used to drive the system in an "open loop" mode, causing nonlinear behavior such as oscillations. JW side note: A very close friend and roommate at UC Berkeley modeled these nonlinear dynamic effects for his doctoral dissertation. The other quirk is that there is asymmetry in the muscle drives, with the sympathetic nervous system responsible for contracting the radial (dilator) fibers of the iris; this helps explain why the pupil often dilates under emotional situations or with certain chemical balances.
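
Two of the numbers in the list above follow from simple arithmetic, sketched below. The function names are hypothetical. The near-point conversion assumes the relaxed eye is focused at infinity, so the accommodation amplitude in diopters is the reciprocal of the near point in meters. The pupil is treated as a perfect circle.

```python
import math


def implied_accommodation_diopters(near_point_cm):
    """Accommodation amplitude implied by a near point (far point assumed at infinity)."""
    return 100.0 / near_point_cm


def pupil_area_mm2(diameter_mm):
    """Area of a circular pupil of the given diameter."""
    return math.pi * (diameter_mm / 2.0) ** 2


# Near points quoted in the text: about 9 cm at age 10, about 83 cm at age 60.
print(round(implied_accommodation_diopters(9.0), 1), "diopters at age 10")
print(round(implied_accommodation_diopters(83.0), 1), "diopters at age 60")

# Pupil diameter range of 1.5 mm to 8.5 mm: the area scales with diameter squared.
area_ratio = pupil_area_mm2(8.5) / pupil_area_mm2(1.5)
print(round(area_ratio, 1), "to 1 change in aperture area")
```

The implied drop of roughly 10 diopters over five decades is consistent with the quoted rate of about 2 diopters/decade, and the area ratio of about 32 matches the "roughly 30-to-1" figure.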

Thus the above discussion shows tight ties between biological control systems and optics for both acuity and light intensity. Another key quality of light - color - is dealt with via photoreceptor cell sensitivity, as will be seen in the next section.

Sensing/Vision: From the Retina to the Perception of Vision

In the previous web page we studied eye optics, including several neuromuscular control systems that help focus light rays and regulate light intensity. These systems subserve the retina, which is where we will start as we investigate the sensory mechanisms responsible for visual perception. The material on this web page could easily be the subject of a 3-credit course (e.g., one taken by this instructor while a graduate student at UC Berkeley).

Our aim is to briefly summarize the process of vision, from a "systems" perspective. You are encouraged to augment this summary with illustrations of the eye anatomy, found in any good physiology book. If you prefer, good web summaries of the basic anatomy and function include those at The Vision Channel and at www.tedmontgomery.com.

Processing at the Retina

We start by assuming that light arrives at the retina, a sheet of cells on the posterior part of the eyeball that extends nearly to the ciliary body. This sheet is organized into 10 layers, and includes sensory receptors (rods and cones) and four types of interneurons: bipolar, ganglion, horizontal and amacrine. The photosensitive rods and cones synapse with the bipolar cells, which in turn synapse with ganglion cells, which send out long axons that leave the eye via the optic nerve. Horizontal cells connect receptor cells to other receptor cells, while amacrine cells connect between ganglion cells. The details of retinal connectivity are well worked out, but beyond the scope of this class. There are, however, several items of special relevance for this class:

  • Spatial distribution of photoreceptors: most of the (color-sensitive) cones are packed in the middle "fovea" region, while the (more intensity-sensitive) rods are distributed throughout the retina, except in the fovea. In total, there are 6 million cones, 120 million rods, and 1.2 million ganglion nerve fibers leaving the retina via the optic nerve.
  • Each foveal cone cell connects to a single bipolar cell, which in turn connects with its own ganglion cell (and thus fiber in the optic nerve). This dedicated line provides a remarkable degree of acuity. The minimum visual angle threshold is about 1 minute of arc. There are also three different types of color-sensitive cone receptors in primates: those responding maximally to 440 nm (blue-violet sensitive, or short-wave pigment), 535 nm (green, middle-wave pigment) and 565 nm (roughly yellow but also red, long-wave pigment). For comparison, along the wavelength continuum blue is about 450-490 nm, green is about 490-575 nm, and red is about 650-720 nm. The end result is that there is a region of about 2 deg of arc (the macula, with the fovea at its center) where there is considerably higher spatial and color acuity, but only when there is sufficient light. The limits of color vision are about 60 deg to each side of the eye midline; however, this is not uniform for all colors (e.g., the blue range is wider than the red).
  • The rods have a light sensitivity threshold considerably lower than that of cones, and do not have dedicated cell lines. Rather, many rods converge on each bipolar cell. Their photosensitive pigment is rhodopsin (visual purple), with peak sensitivity at 505 nm (blue-green). Rods enable us to sense brightness and locate objects outside of the fovea (but with less clarity), plus provide night vision. Notice that rod color sensitivity is especially far removed from the spectrum for red, and thus reds (and yellows) are difficult to see in the dark. The overall range of the visual field is approximately 70 deg nasally and 100 deg laterally.
  • A good deal of image mapping occurs in the retina. Ganglion cells, for instance, provide a variety of receptive fields that range from circular regions with an excitatory center and inhibitory surround to cells that are sensitive to certain line orientations or colors.
  • The human eye can respond to a remarkable range of light intensity (luminance). Measured in millilamberts, rods that are adapted can sense intensities ranging from about 0.0000001 to 1, and cones from about 0.001 to 100,000,000. The region of overlap between the rods and cones is what we'd consider "normal" lighting.
  • The dynamics of light adaptation (mostly cones) take about 5 minutes. Dark adaptation occurs in two phases, with the first phase of about 5 min attributed mostly to the cones, and the next phase, which takes about an additional 15 min, due to the rods.
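
The receptor counts and intensity ranges listed above are easier to compare on a ratio or logarithmic scale. The following is a minimal sketch with hypothetical names; it takes the optic nerve fiber count to be 1.2 million, which is an assumption about the intended figure.

```python
import math

CONES = 6e6
RODS = 120e6
OPTIC_NERVE_FIBERS = 1.2e6  # assumption: the quoted 1.2 is in millions

# Sensitivity ranges from the text (same luminance units as the text).
ROD_RANGE = (1e-7, 1.0)
CONE_RANGE = (1e-3, 1e8)


def convergence_ratio(receptors, fibers):
    """Average number of photoreceptors feeding each optic nerve fiber."""
    return receptors / fibers


def decades(low, high):
    """Width of an intensity range in powers of ten (log units)."""
    return math.log10(high / low)


# Overlap of the rod and cone ranges, and the combined range.
overlap = (max(ROD_RANGE[0], CONE_RANGE[0]), min(ROD_RANGE[1], CONE_RANGE[1]))
combined = (min(ROD_RANGE[0], CONE_RANGE[0]), max(ROD_RANGE[1], CONE_RANGE[1]))

print("receptors per fiber:", convergence_ratio(CONES + RODS, OPTIC_NERVE_FIBERS))
print("rod range, log units:", round(decades(*ROD_RANGE), 1))
print("cone range, log units:", round(decades(*CONE_RANGE), 1))
print("overlap, log units:", round(decades(*overlap), 1))
print("combined, log units:", round(decades(*combined), 1))
```

On these figures, about 105 receptors share each fiber on average, even though foveal cones get dedicated lines, so most of the convergence must occur in the periphery.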

This is a remarkably well-designed visual sensing and tracking system. For example, consider the ability of the rods and subsequent circuitry to locate objects in the periphery:

  • the frog typically captures and eats the fast-moving fly, which includes both recognizing the fly and anticipating its trajectory and speed, but will ignore bigger or smaller objects,
  • a child playing a fast-paced video game typically makes a pretty accurate fast movement to a new target identified in the periphery,
  • a kid will nearly run while hopping across rocks during a hike, using mostly peripheral vision to select the next location for a foot.

While processing for such exquisite capabilities starts in the retinal neurocircuitry, this is just the first stage of a systematic process.

Beyond the Retina

An optic nerve fiber that leaves the retina will either cross over the midline (if nasal) and target the contralateral lateral geniculate body, or will connect with the ipsilateral lateral geniculate body (if lateral). This neural structure, part of the thalamus, serves mostly as an important relay "way station" that coordinates an integrated retinotopic spatial mapping of the two eyes. From here information passes on to other brain structures, most notably the occipital (visual) cortex.

The visual cortex possesses the 6-layer columnar structure that is common for cortical tissue. It is here that each fiber and various collections of fibers are processed in many ways. Receptive fields in the visual cortex can become remarkably selective, for instance to certain shapes traveling in a certain direction. But this is just the beginning of the story, as there is a degree of image processing via neurocircuitry that remains, despite decades of study, mind-boggling to scientists. In particular, the robustness of pattern recognition is impressive. Consider, for example, that you can often recognize people and objects from many distances and orientations, including a friend after a haircut or a change in clothing. Also, the system is actively engaged in recognizing and classifying objects.

Summary

For the purposes of this class, a key observation is that the visual system enables a remarkable capacity to actively adapt to new settings and recognize persons and objects, if the frequency and intensity of the incoming light are within the sensitive ranges. But this ability is a function of many factors, ranging from optics to the effectiveness of physiological control systems.

There are ways of designing environments that are more accommodating for anybody, such as providing adequate lighting. There are also sometimes accommodations that can be made for specific persons with sensory dysfunction, but these start with an understanding of the underlying sources of the visual dysfunction. "Seeing" is an active process, and before considering these sources, we will first develop an understanding of the eye movement and gaze control systems that are intricately intertwined with the sensory apparatus.

Gaze

Mechanics of Eyeball and Extraocular Eye Muscles

The eyeball can be thought of as a suspended sphere that is held in place by viscoelastic tissue that is grounded in a skeletal socket. This arrangement makes it relatively easy for a strategically-placed muscle to rotate the eyeball. Since muscles only pull, and since there is a desire to rotate the eyes both medio-laterally (left-right) and superior-inferiorly (up-down), one might expect two pairs of antagonistic muscles that tug on either side of the eyeball, one pair in the medio-lateral direction and the other in the superior-inferior direction. Indeed, this is the case, giving us the following four muscles:

  • lateral rectus (contraction rotates the eye outward)
  • medial rectus (contraction rotates the eye nasally)
  • superior rectus (contraction rotates the eye upward)
  • inferior rectus (contraction rotates the eye downward)

As suggested by their actions, these muscles insert on the eyeball about 90 deg from each other. Their origin sites are close to each other, with insertions making tangential connections on the eyeball. Thus they are nearly in parallel with each other, with each having a maximal moment arm relative to the axis of rotation (which is about the center of the sphere). [There are also two other, angled extraocular muscles that are nearly in an orthogonal plane, called the superior oblique (moves eye inward and downward) and inferior oblique (moves eye outward and upward), that play more minor axial stabilizing roles.]

The inertia of the eyeball is very small, and there is normally no external load on the eye other than perhaps a small contact lens. This has two implications: not much muscle force is needed to rotate the eyeball, and the speed of rotation is rate-limited by the mechanics of the muscles. Thus it is not surprising that the extraocular muscles are very slender, since the required force is small, and somewhat long with fast muscle fibers, since fast rotations are often desired. Indeed, these muscles have the highest proportion of fast muscle fibers of any in the human. The result is a very fast, predictable musculoskeletal system. As with all skeletal muscles, these have some key mechanical properties that as "systems engineers" we capture as a "tension-length" relation, a force-velocity property, a series elastic property, and a parallel elastic property. JW side note: as a graduate student, I published papers modeling this neuromuscular system, using six nonlinear differential equations in each plane: 2 for each muscle and 2 for the eyeball. The bottom line on these properties is that the parallel elastic and tension-length properties are tuned for an operating range of about ±60 deg, and the force-velocity properties enable eyeball speeds of over 10 rad/sec (about 570 deg/sec). That's fast, and can occur by maximally exciting one muscle while relaxing its antagonist.
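
Eye speeds in these notes are quoted in both rad/sec and deg/sec, so a small conversion helper is handy. This is only an illustrative sketch with hypothetical function names. The lower bound on movement time assumes the eye travels at peak speed for the whole movement, which real movements do not do.

```python
import math


def rad_per_s_to_deg_per_s(speed_rad_s):
    """Convert an angular speed from rad/sec to deg/sec."""
    return math.degrees(speed_rad_s)


def min_rotation_time_ms(angle_deg, peak_speed_deg_s):
    """Lower bound on movement time if the eye moved at peak speed throughout."""
    return 1000.0 * angle_deg / peak_speed_deg_s


peak = rad_per_s_to_deg_per_s(10.0)
print(round(peak), "deg/sec")                            # 573, i.e. roughly 570
print(round(min_rotation_time_ms(20.0, peak), 1), "ms")  # bound for a 20 deg rotation
```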

Remember from the previous section that we see only about 2 deg of arc with high clarity. Thus there is a need to rotate the eyeball to fixate on targets of interest. When the eyes are fixated straight ahead, the motoneuronal drive to the antagonistic muscles is about 10% of maximum. To hold fixation at an angle of, say, 10 deg laterally, the drive to the lateral muscle must be a few percent greater, and that of the medial muscle less. But we don't move from location to location with such step-like shifts in activation; if we did, our visual world would spin while we moved! Rather, nature has come up with a wonderful collection of stereotyped eye movements, each controlled by different parts of the brain that converge to the oculomotor nuclei in the brainstem. Thus we have the following four classic types of eye movements:

  • Saccadic eye movements. These are high-speed, near time-optimal movements that we take for granted. For these movements, motoneurons send a burst of activity to rotate the eyes quickly, followed by a new steady excitation drive to hold the new fixation position. The burst, often called a pulse, is near-maximum, and the magnitude of the change in direction is determined mostly by the duration of the burst. As an example, a 10 deg movement bursts for about 40 ms and reaches a peak speed of about 500 deg/sec, while a 20 deg movement reaches only a little higher speed but bursts for about 60 ms. Thus there is a near-maximal agonist "pulse" that causes a change in foveal orientation, and a "step" that helps hold the new position. The antagonist is first turned off, then often exhibits a brief pulse to help clamp (slow and stop) the movement before taking on the new step level. During the saccadic movement you do not perceive your world to be spinning at high speed because of "saccadic suppression" - you stop processing visual information during the 20-60 ms it takes to make a saccade. Once a saccade is made, you stay there for at least 200 ms, processing visual information, before you can make another saccade. This seems to be related to "locking" by the posterior eye fields. Thus there is a guarantee that nearly 90% of the time can be spent processing visual information. From an engineering perspective, this represents a sampled-data control system. Most saccades are voluntary, initiated from the frontal eye fields. This is the gaze control system you use to read a book, watch a movie, scan a picture, etc. These movements are so fast and so effective that when this control system works as desired, you have the illusion that you see your whole world with great clarity and acuity, when in fact it's really only a few degrees of arc! These movements are critical for optimizing visual extraction of movement, and there has been considerable research on how and why we make choices in where to look.
  • Smooth Pursuit eye movements. This is a velocity feedback system that enables tracking of slow- and moderate-speed objects that are moving across the retinal field (normally, specifically the fovea). This "retinal slip" tracking system often works in conjunction with the saccadic and VOR systems, with its gain "tuned" through the cerebellum. About 95% of individuals cannot make smooth pursuit movements voluntarily, requiring a visual stimulus. The neurocontrol input following a change in target velocity can be approximated as a step-ramp, with the step there to help get the eyeball accelerating and eliminate the retinal slip, and the ramp there to keep the eye at the right tracking velocity. Unlike the saccadic system, the smooth pursuit control system uses continuous feedback control. The smooth pursuit system is predictive, thus trying to anticipate so as to minimize the effects of neural time delays. The peak velocity for pursuit is about 1 rad/sec (57 deg/sec), and there is about a 25% decrease with age. A typical major league baseball player can maintain smooth pursuit on the baseball for about the first 1/3 of the distance to home plate, after which they can make perhaps one saccadic eye movement that really can't influence the present swing but perhaps affects future swings. Interestingly, many major league players can smooth pursue at higher than normal velocities - is this through practice, or were they born this way (thus giving them a competitive advantage)? We don't know. Saccadic and smooth pursuit movements are driven from different parts of the brain, but converge on the same motoneurons and recti muscles. They commonly work together, with the neural drive that sets the new desired eye position for the saccade seeming to anticipate the smooth pursuit movement.
  • Vestibulo-Ocular Reflex (VOR). When the head turns, whether due to a head-neck movement or a movement of the body, the eyes automatically move in the opposite direction. This is driven by a 3-neuron, roughly 12-ms "arc" that starts with the sensory neuron at the sensory apparatus of the semicircular canals, includes one interneuron with an axon from the vestibular to oculomotor nuclei, and then the final motoneuron. It normally has a gain of one (eye velocity equal and opposite to head velocity), and is so intrinsic that even persons who are blind have a significant VOR (but with a gain less than 1.0). If the gain drifts from 1.0, the smooth pursuit system needs to help out to eliminate any retinal slip during self-movement. If this happens a lot, the neural pathway through the cerebellum works to help adjust the gain. This is a wonderful system that helps keep images stabilized during self-movement. Without a normal VOR, basketball players wouldn't be able to drive through the lane with the coordination and grace that we are used to taking for granted. The VOR is effective to velocities of about 100 deg/sec. Try this experiment with a friend: have your friend rotate their head back and forth while looking at your nose. You'll see smooth eye movements, and your friend will feel a stabilized gaze that can see the details of your nose as long as they don't move their head super fast. Now have them try to look straight ahead while rotating their head. Should be trivial, right? Wrong. You'll notice that they can't, with their eyes jumping back-and-forth (they'll VOR then saccade, VOR then saccade, etc). As with smooth pursuit, the VOR works well with the saccadic system, and as a graduate student I published papers mathematically modeling this interaction.
  • Vergence (convergence/divergence) eye movements. These "in-out" movements work with the accommodation system to help foveate an image as it moves between near and far. They differ from the other eye movements, where Hering's Law applies (i.e., the two eyes move together, with tightly coupled neuromotor drives): in vergence the two eyes rotate in opposite directions. Thus for really near targets, the eyes are a bit "cross-eyed."
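
Two of the quantitative claims in this list can be sketched numerically: the fraction of time left for visual processing in the saccadic "sampled-data" scheme, and the retinal slip left over when the VOR gain is not exactly one. The function names are hypothetical. The VOR model is an idealized assumption with no dynamics: a stationary target, and eye velocity equal to minus the gain times head velocity.

```python
def fixation_fraction(fixation_ms, saccade_ms):
    """Fraction of each fixate-then-saccade cycle spent fixating (processing vision)."""
    return fixation_ms / (fixation_ms + saccade_ms)


def vor_eye_velocity(head_velocity_deg_s, gain=1.0):
    """Idealized VOR: the eye counter-rotates at gain times the head velocity."""
    return -gain * head_velocity_deg_s


def retinal_slip(head_velocity_deg_s, gain=1.0):
    """Image motion left on the retina for a stationary target (gaze velocity)."""
    return head_velocity_deg_s + vor_eye_velocity(head_velocity_deg_s, gain)


# With the minimum 200 ms fixation, short and long saccades bracket the duty cycle.
print(round(fixation_fraction(200, 20), 2))  # 20 ms saccade
print(round(fixation_fraction(200, 60), 2))  # 60 ms saccade

# A perfect VOR leaves no slip; a gain of 0.8 leaves 20% of the head velocity.
print(retinal_slip(50.0, gain=1.0))
print(round(retinal_slip(50.0, gain=0.8), 1))
```

Because 200 ms is a minimum fixation time, longer fixations push the processing fraction above these values. The leftover slip in the second case is what the smooth pursuit system has to remove.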

Integrated Use of These Movements

These four classic eye movements are fairly easy to recognize during inspection of experimental angle versus time data. Indeed, saccades are identified by high-speed "jumps" between regions of no movement or smooth movement. If head movement is also measured, one can also easily distinguish between VOR and smooth pursuit eye movements within the trace.
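
One simple way to pick out the high-speed "jumps" in angle versus time data is a velocity threshold. The sketch below is an illustration and not the method used in any particular lab. The function name, the default threshold of 100 deg/sec, and the synthetic trace with its linear ramp standing in for a saccade are all assumptions.

```python
def detect_saccades(angles_deg, dt_s, velocity_threshold_deg_s=100.0):
    """Return (start_index, end_index) pairs where sample-to-sample speed
    exceeds the threshold. A simple velocity-threshold detector."""
    saccades = []
    start = None
    for i in range(1, len(angles_deg)):
        velocity = (angles_deg[i] - angles_deg[i - 1]) / dt_s
        fast = abs(velocity) > velocity_threshold_deg_s
        if fast and start is None:
            start = i - 1                    # last slow sample before the jump
        elif not fast and start is not None:
            saccades.append((start, i - 1))  # last fast sample ends the jump
            start = None
    if start is not None:                    # trace ended mid-saccade
        saccades.append((start, len(angles_deg) - 1))
    return saccades


# Synthetic 1 kHz trace: fixation at 0 deg, a 10 deg jump over 40 ms, fixation at 10 deg.
dt = 0.001
trace = [0.0] * 100 + [10.0 * k / 40 for k in range(1, 41)] + [10.0] * 100

for start, end in detect_saccades(trace, dt):
    amplitude = trace[end] - trace[start]
    duration_ms = (end - start) * dt * 1000
    print(f"saccade: {amplitude:.1f} deg in {duration_ms:.0f} ms")
```

A threshold above the roughly 57 deg/sec pursuit peak quoted earlier keeps smooth pursuit from being flagged. Fast VOR segments could still exceed it, which is one reason measuring head movement as well helps.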

With appropriate mathematical mapping, one can also overplot eye movements onto the spatial images that the individual was looking at, such as a picture of art or a page that is to be read. These are commonly called scan paths, and often are displayed as lines connecting between dots. This tells us something about the sampling/processing part of the brain, and where the person chooses to focus their attention. For instance, an individual looking at a picture of a face will tend to focus their gaze primarily on key facial features such as the eyes and mouth, while occasionally jumping to seemingly random locations for a greater sampling of the image. In contrast, the gaze of many persons with aphasia display what appear to be suboptimal strategies, focusing on regions of contrast that are of less functional significance, such as ears or clothes. This is one of many examples where eye movements provide a "window to the brain"; another example is that persons with schizophrenia often display double-saccades.

Video/Codecs: Technological Video

Both the eye and the camera have i) a variable "aperture" for controlling the intensity of light, ii) a lens that includes mechanisms for focusing an image, and iii) photosensitive elements that can encode both intensity and color. In both, the density of the photosensitive elements, called "pixels" in a digital image from a camera, is a measure of resolution. In both, higher-level spatial and temporal filters are used to help remap the image to extract certain features, and intelligent algorithms are often used to recognize patterns. Furthermore, both have gone through an evolutionary process that yields multiple solutions: while different animals have different eye properties, camera resolutions and storage protocols also tend to be based on an evolutionary process, one that can be documented through the evolution of consensus standards that reflect a mixture of performance capabilities and a quasi-random economic process, similar to natural selection, that helps determine "winners" and "losers" by their success in the field.

There are, of course, also many subtle differences. For instance, in the still or video camera, the resolution is uniform across the field of view, unlike the strategy found in the eye of a dense region of foveal cones plus peripheral rods, which is integrated with the eye movements that we studied in the previous web page. This is an important difference, and ironically the images and video are seen through a pair of eyes that make saccadic eye movements to determine the clarity of the image.

Technical Building Blocks for Digital Image Representation

The building block for digital images is the pixel (picture element). An image is a grid of pixels, normally described by the horizontal by vertical number, for instance 640 x 480. Each pixel has a state that relates to brightness, color, etc. There are several common schemes, all having to do with the number of bits of information being coded to describe the state of the pixel. This can range from 1 bit (e.g., black or white) to a high number of bits, such as 24 or more. Very common is 8 bits, which gives 256 shades. For instance, using 8 bits (1 byte) gives nice "black-and-white" resolution, and good representation of "intensity" of light through shading. Pixels needn't be square in shape, but usually are.
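
The relation between bits per pixel, number of shades, and storage follows directly from the definitions in this paragraph. This is a minimal sketch with hypothetical function names; it assumes uncompressed storage with no header or row padding.

```python
def shades(bits_per_pixel):
    """Number of distinct pixel states that the given bit depth can encode."""
    return 2 ** bits_per_pixel


def image_bytes(width, height, bits_per_pixel):
    """Uncompressed image size in bytes (rounded up to a whole byte)."""
    return (width * height * bits_per_pixel + 7) // 8


for bits in (1, 8, 24):
    print(f"{bits:2d} bits: {shades(bits):>10,d} states, "
          f"{image_bytes(640, 480, bits):>9,d} bytes for 640 x 480")
```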

For image color representation, a common approach is to use 8 bits (1 byte) for each of three "RGB" (red-green-blue) colors, where each color has 256 shades. The three colors are combined to give a truly rich variety of colors, with the number of colors depending on the standard. For instance, there is RGB8 with 256 total colors, RGBH with 32768 colors (15 bits), and RGBT with over 16 million colors (24 bits). Many packages in Windows allow you to set colors, and if you try this out, you'll see that for shades of grey each of R, G and B are the same, e.g., (255,255,255) for pure white and (0,0,0) for pure black. Pure red is (255,0,0), pure yellow combines red and green (255,255,0), and a somewhat dark purple combines some red and blue (64,0,64). These are among the 48 "basic colors" that you can select from, and that you can depend on any monitor or program to reliably reproduce. But you can set each of these three to any value between 0 and 255, and furthermore, Windows also gives you "Hue," "Sat" and "Lum" settings to help with interactive RGB setting. For instance, "Lum" is tightly tied to the degree of white, since most people will naturally associate white with brightness or intensity. Of note is that designers often use only 48 or 256 colors simply because they want to assure that what they see is what their customer will see as well. There is another standard called the natural color (YUV) format that tries to separate brightness information from color information. The Y values are for brightness (luminance) and range from 16-235, and the U and V values are for color (chrominance) and range from 16-240. As with RGB, there are several variants on the format. There is also a mathematical mapping between the two standards:

  • From YUV to RGB (an approximation):
    • Let: Y' = (Y - 16)*255/219; U' = (U - 128)*127/112; V' = (V - 128)*127/112
    • Then: R = Y' + 1.402*V'; G = Y' - 0.344*U' - 0.714*V'; B = Y' + 1.772*U'
    • If necessary, clamp each of R, G and B to the [0, 255] interval
  • From RGB to YUV (an approximation):
    • Y = 0.299 R + 0.587 G + 0.114 B, then
    • U = 0.493 (B - Y)
    • V = 0.877 (R - Y)
    • where Y is a weighted sum of R, G and B; U is a scaled difference between blue and brightness (roughly blue versus yellow); and V is a scaled difference between red and brightness
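The two mappings above can be sketched directly in code. This is only an illustration of the formulas as written, with made-up function names. Note one assumption that matters: the YUV-to-RGB direction expects 8-bit values with offsets (Y from 16, U and V centered on 128), while the RGB-to-YUV direction as given produces U and V centered on zero, so the two functions are not exact inverses of each other.

```python
def clamp(value, low=0, high=255):
    """Round to the nearest integer and clamp to the [low, high] interval."""
    return max(low, min(high, int(round(value))))


def yuv_to_rgb(y, u, v):
    """Approximate mapping from 8-bit offset YUV to 8-bit RGB."""
    y_p = (y - 16) * 255 / 219
    u_p = (u - 128) * 127 / 112
    v_p = (v - 128) * 127 / 112
    r = y_p + 1.402 * v_p
    g = y_p - 0.344 * u_p - 0.714 * v_p
    b = y_p + 1.772 * u_p
    return clamp(r), clamp(g), clamp(b)


def rgb_to_yuv(r, g, b):
    """Approximate mapping from RGB to YUV, with U and V centered on zero."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.493 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


# Greys carry no color information: U' and V' are zero going one way,
# and U and V are (nearly) zero going the other way.
print(yuv_to_rgb(235, 128, 128))
print(rgb_to_yuv(128, 128, 128))
```

Pure red (255,0,0) gives a positive V and a negative U, which matches the reading of V as "red versus brightness" and U as "blue versus brightness."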

Grid sizes also tend to follow standards. Since pixels are usually square, the aspect ratio typically represents both the ratio of horizontal to vertical pixels and the shape of the whole field of view. Common for monitors is 4:3, such as 320x240, 640x480 or 1280x960. But there is complexity, due to human-inspired technical evolution. Let's start with TV. The U.S. and Japan use the NTSC standard, which started at 352x240, with a 1.47 aspect ratio and a sampling rate of 30 fps. Most of Europe uses the PAL standard, at 352x288, with a 1.22 aspect ratio, a lower sampling rate (25 fps), and YUV color. In both standards the picture quality is pretty good. Some people from the U.S. feel that European TV seems a bit choppy, but clearly images look smooth for most people at 25 fps, and even 15 fps is pretty good. Videoconferencing systems use the CIF standard, which is 352x288 (like PAL) but at 30 fps (like NTSC), a good choice. For lower-bandwidth videoconferencing such as H.324-compliant videophone systems, QCIF (quarter CIF, 176x144) is common, and typically peak sampling is at 15 fps. You'll see the difference in the lab.

What about DVDs and high-definition TV? DVDs roughly double these dimensions, with the NTSC format being 720x480 or 704x480, and the PAL format being 720x576 or 704x576. What if one wants to shift between formats? There are several options. One is to cut the sides and include only the "common" pixels. Another is to warp the pixels, typically stretching the vertical dimension. Still another is to mathematically re-calculate the pixels, possibly causing a decrease in resolution. You've probably seen all of these. By the way, cameras of higher resolution, such as one of our Sonys, are often used to collect data at lower resolution. Mapping then needs to be done using an algorithm, and this helps explain why hi-resolution cameras sometimes give just average quality for a given application. If you are planning to collect at, say, common web-cam grids of 640x480 or 320x240, sometimes a cheaper camera that is tuned to this protocol will actually provide a better image.

Digital Filters and CODECs

There are many types of digital filters for images. Such filters may extract and emphasize features, such as seen for some neurons in the visual cortex or in medical imaging products, or may provide another representation of the data. One obvious observation from the above is that digital images and video can lead to large files for storage, and a lot of information to transfer. Consider that if we multiply 352x288 pixels (CIF) by 24 bits/pixel (RGB) and then 30 fps, without compression we'd need to send 73 MBits/sec of information. Within 15 sec we'd have sent, and perhaps stored, over 1 GBits of data. That's a lot.
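The arithmetic in the paragraph above is easy to reproduce. A minimal sketch (the function name is made up for illustration):

```python
def raw_bitrate(width, height, bits_per_pixel, frames_per_sec):
    """Bits per second needed to send uncompressed video."""
    return width * height * bits_per_pixel * frames_per_sec


# CIF grid, 24-bit RGB, 30 fps.
cif_bits_per_sec = raw_bitrate(352, 288, 24, 30)
print(cif_bits_per_sec / 1e6, "Mbits/sec")         # roughly 73
print(cif_bits_per_sec * 15 / 1e9, "Gbits in 15 sec")  # just over 1
```

Changing the grid to QCIF (176x144) at 15 fps cuts the raw rate by a factor of 8, which is one reason lower-bandwidth systems use it.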

In reality it is not really necessary to store every pixel. Often the goal is to reduce the number of bits necessary to capture the essence of the image. This is called compression.

Spatial CODECs. As an example, an image often has regions with little change in color (e.g., a wall). One can smooth over the region, perhaps via a mathematical transformation. The end result is a smaller file. If one tried doing the reverse operation, the resulting "decompressed" file would not be quite the original, but might be awfully close, with the difference perhaps imperceptible to the eye. The algorithm involved in this process of compression-decompression is called a CODEC. There are many codecs; two common ones are GIF and JPEG. GIF is an example of a loss-less algorithm, in that the original image quality can be recovered, while JPEG is an example of a "lossy" algorithm that assumes that some details are less important than others.

An example of a "loss-less" algorithm is CompuServe's GIF (Graphics Interchange Format). It is based on LZW, an algorithm from the 1970s and 80s (covered by a patent from Unisys) that has roots in classic information theory and that added the ability to use variable-length codes for compression translations. GIF, introduced in the 1980s for online image exchange and now broadly supported by Internet browsers and development environments such as Microsoft's .NET, has constraints such as a maximum color palette of 256, but offers flexibility in areas such as selecting quality vs. file size, implementing transparency, interlacing (loading the image, then giving progressive display improvement by filling in more details), and single-file animation and looping. An alternative without the proprietary concerns is PNG (Portable Network Graphics), which has greater functionality in terms of image size and colors but doesn't implement animation and looping.
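To give a feel for why LZW is loss-less, here is a minimal textbook-style sketch of the dictionary idea, operating on a row of 8-bit pixel values. It is an illustration only, not GIF's actual bitstream: real GIF files pack the codes into variable-width bit fields and use special clear and end codes, none of which is shown here. The function names are made up.

```python
def lzw_compress(data):
    """Turn a bytes object into a list of integer dictionary codes."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate          # keep growing the matched run
        else:
            codes.append(table[current])
            table[candidate] = next_code  # learn the new, longer pattern
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes from the list of codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            entry = previous + previous[:1]  # pattern defined by this very step
        else:
            raise ValueError("invalid LZW code")
        pieces.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(pieces)


# One image row: a "wall" of 64 light pixels followed by 64 dark pixels.
row = bytes([200] * 64 + [10] * 64)
codes = lzw_compress(row)
print(len(row), "pixels ->", len(codes), "codes")
print(lzw_decompress(codes) == row)
```

Regions with little change compress very well, and the decompressed row is identical to the original, which is what "loss-less" means.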

An example of an effective "lossy" algorithm is JPEG (from the Joint Photographic Experts Group); files with a .jpg extension are associated with this CODEC. JPEG essentially applies a mathematical transformation to 8x8 pixel blocks of the image, using the Discrete Cosine Transform (DCT) and a quantization scheme that gets rid of higher-frequency content. The degree of compression ranges from about 2 to 17 times, depending on settings for the algorithm and the type of image, and of course the more compressed, the more risk of a worse representation of the original image (i.e., "lossy" compression). Each CODEC has advantages and disadvantages. For instance, JPEG is great at keeping representations of subtle shades in color, but not so good with sharp boundaries in images. JPEG falls under the collection of approaches that are part of the international SPIFF (Still Picture Interchange File Format, .spf files) standard (ISO/IEC 10918-3).
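The DCT-plus-quantization step can be sketched on a single 8x8 block. This is a simplified illustration, not the JPEG standard itself: it uses one uniform quantization step for all coefficients (an assumption made here for brevity; real JPEG uses a table with coarser steps at higher frequencies), and it leaves out the entropy coding that follows. The function names are made up.

```python
import math

N = 8  # block size used by JPEG


def _scale(k):
    """Normalization factor that makes the transform orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(i, k):
    """Cosine basis function k sampled at pixel position i."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct_2d(block):
    """Forward 2-D DCT of an 8x8 block (list of 8 rows of 8 numbers)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += block[x][y] * _basis(x, u) * _basis(y, v)
            out[u][v] = _scale(u) * _scale(v) * total
    return out


def idct_2d(coeffs):
    """Inverse 2-D DCT, recovering pixel values from coefficients."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (_scale(u) * _scale(v) * coeffs[u][v]
                              * _basis(x, u) * _basis(y, v))
            out[x][y] = total
    return out


def quantize(coeffs, step):
    """Snap each coefficient to a multiple of step (the lossy part)."""
    return [[round(c / step) * step for c in row] for row in coeffs]


def count_nonzero(coeffs):
    return sum(1 for row in coeffs for c in row if c != 0)


# A smooth block: brightness rises gently from left to right.
gradient = [[100 + 4 * y for y in range(N)] for x in range(N)]
kept = quantize(dct_2d(gradient), step=16)
print(count_nonzero(kept), "of 64 coefficients survive quantization")
```

For smooth regions only a handful of low-frequency coefficients survive, which is where the compression comes from; a block with a sharp edge needs many more, which is why JPEG struggles with sharp boundaries.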

Other, newer spatial coding schemes use wavelets and the Discrete Wavelet Transform. These are robust but, like JPEG, blur edges. Many other algorithms, including fractal-based ones, are being tried. Often these come out of the work of EE faculty and students, who love this challenge. So improvements (mostly incremental) will undoubtedly continue to be on the horizon.

Spatiotemporal CODECs. The popular AVI file format usually uses motion JPEG, or MJPEG, which compresses each frame separately. This is rather conservative, in that often only a small part of a video image actually changes between two frames. It's crazy to keep storing the same information on frame after frame. Thus combining spatial filtering with temporal filtering makes sense. Addressing this need was critical to the emerging videoconferencing field, and in 1990 the H.261 standard was approved. It targeted video operating on multiples of 64 Kbits/sec transmission (i.e., the maximum capacity of a dedicated phone line), and was based on the Discrete Cosine Transform for spatial compression and a block-based motion algorithm for temporal compression; this formed the basis for all subsequent video algorithms.

Two bodies have been involved in forming video compression standards: the ITU-T (for the H.26x standards) and the ISO/IEC (for the MPEG, or Moving Picture Experts Group, standards). While they overlap, the ITU-T is a bit more targeted on video for teleconferencing, and the ISO/IEC group on video for multimedia (both transmission and storage). The ITU-T standards also tend to target the CIF format (352x288) and more aggressive compression. For instance, the ITU realized the need for a standard that worked well below 64 Kbits/sec, and hence came H.263 (1995) and H.263+ (1998), which have much more flexibility (e.g., the algorithm adjusts modes for different connection speeds) and have replaced H.261 in most videoconferencing products. The mathematical foundation for the compression algorithm is a hybrid space-time filter: it does some preliminary work (e.g., maps to YUV if initially in RGB, and subsamples U and V), then uses the DCT on 8x8 image blocks (as does JPEG), and then checks correlations between subsequent frames and implements an algorithm across time for motion-compensated prediction between frames. When any of the DCT-based algorithms have trouble, for instance at low bandwidths, the familiar "checkerboard" effect of 8x8 blocks forms around movement transitions. You'll see that this happens often during human movement when using our lower-bandwidth H.324 systems. The rule of thumb is that for near-TV quality videoconferencing, the bandwidth needs to be at least 384 Kbits/sec (i.e., three ISDN lines), at which point checkerboard and blurring effects become rare. This is what we usually use, though we can go higher in that we have four ISDN lines in both of our videoconferencing rooms and thus can go as high as 512 Kbits/sec. 
Our Polycom systems also support IP conferencing, which we use mainly with a group from UC Berkeley; for IP the calls are free and there is not a dedicated line with guaranteed quality of service, and while our meetings are normally good-quality, on several occasions the transmission of both video and audio has been choppy.
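The "block-based motion algorithm" idea behind motion-compensated prediction can be sketched as a brute-force block search: for a block in the current frame, look for the best-matching block nearby in the previous frame, and send only the offset (the motion vector) plus whatever small difference remains. This is a simplified illustration, not the search used by any particular H.26x encoder; the function names, the sum-of-absolute-differences cost, and the search range of 4 pixels are choices made here for the example.

```python
def block_cost(curr, prev, cx, cy, px, py, size):
    """Sum of absolute differences between a block in curr and one in prev."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(curr[cy + dy][cx + dx] - prev[py + dy][px + dx])
    return total


def best_motion_vector(prev, curr, bx, by, size=8, search=4):
    """Find the offset (mx, my) into prev that best predicts the block
    whose top-left corner is (bx, by) in curr. Returns (vector, cost)."""
    height, width = len(prev), len(prev[0])
    best_vector = (0, 0)
    best_cost = block_cost(curr, prev, bx, by, bx, by, size)
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            px, py = bx + mx, by + my
            if px < 0 or py < 0 or px + size > width or py + size > height:
                continue  # candidate block would fall outside the frame
            cost = block_cost(curr, prev, bx, by, px, py, size)
            if cost < best_cost:
                best_cost = cost
                best_vector = (mx, my)
    return best_vector, best_cost


def make_frame(x0, y0, width=32, height=32, size=8):
    """Test frame: black background with one textured 8x8 patch."""
    frame = [[0] * width for _ in range(height)]
    for dy in range(size):
        for dx in range(size):
            frame[y0 + dy][x0 + dx] = 10 + 8 * dy + dx
    return frame


# The patch moves 3 pixels to the right between frames.
previous_frame = make_frame(8, 8)
current_frame = make_frame(11, 8)
vector, cost = best_motion_vector(previous_frame, current_frame, 11, 8)
print("motion vector:", vector, "residual cost:", cost)
```

When the search finds a perfect match, the encoder needs to send only the vector for that block. When motion is too fast or too complex for the search to follow at the available bandwidth, the residual is large and the 8x8 block structure becomes visible.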

The above lack the multimedia flexibility of the MPEG standards, which are now widely used for streaming video, DVDs, etc. For a summary of the well-known collection of MPEG standards, see UC Berkeley's Multimedia Lab. These also add smoothing through time to smoothing through space, and in our group we routinely see compression ratios of well over 30x. MPEG-1, approved as a standard in 1992, was the original standard for storage and retrieval of moving pictures, for rates up to 1.5 Mbits/sec, i.e. compression of about 50 times (but still a high rate compared to H.261). MPEG-2, which is also the ITU's H.262, added considerably in the area of scalability, motivated by digital TV. MPEG-4 is an impressive extension that includes a new video codec also known as the ITU's H.264. It adds some universal accessibility features and robustness (for a huge range of bandwidths), high multimedia interactive functionality, and compression efficiency. This latter codec has taken on great significance, and most of the key companies have implemented it, or are about to, in all or most of their product lines.

The recent H.264/MPEG-4 standard, the result of combined efforts by the two key bodies involved in codec standards, is the newest and is currently being rapidly implemented. It roughly doubles the quality for a given bandwidth, which obviously represents a significant improvement. All of the key videoconferencing companies (e.g., Polycom, Tandberg, VCON) now have H.264 embedded in some of their suite of products, with more to come. In addition to some videoconferencing systems, we now use an MPEG-4 codec (the DivX implementation) for compressing our digital video for our Mobile Usability Lab (MU-Lab) system, which you will be using for the lab associated with this module.

Microsoft is systematically supporting more and more video codecs, for instance as alternatives for encoding video within AVI files. The freely available Windows Media Video 8/9 encoder offers real-time encoding, accepts formats such as .avi and .mpg, and offers a variety of scalable capabilities, with default profiles such as DSL/cable delivery at 250-500 Kbits/sec with 320x240 pixels at 30 fps, and 56 Kbits/sec at 160x120 pixels at 15 fps.

You'll see many of these in action in the Telerehab and Human Performance Lab, for Module 2 (Sensorimotor) and especially Module 3 (Telerehab; see especially the section on videoconferencing standards).

Visual Impairments, Disability and Access/Accommodation Strategies

Now that we've developed a background on the visual system and the gift of sight, let's go through what can go wrong with the system (impairments), and possible accommodations. A good source for eye disease/dysfunction/disability is at McMaster. Of note is that the incidence of visual impairment in individuals with severe physical disabilities is higher than is often recognized, with these difficulties often not treated (e.g., most children with cerebral palsy have visual impairment). These impairments can affect visual acuity (due to motor and/or sensory sources), visual field (e.g., sampling surround, center), visual tracking and scanning (e.g., faulty saccades, smooth pursuit), and visual accommodation (poor focusing).

Here are a few highlights that you should know:

1. Dysfunction within the eye:

  • As an individual grows older, the lens gets larger, thicker and less compliant. The resulting loss of accommodation for near objects, called presbyopia, means a need for reading glasses.
  • The cornea, as the barrier to the environment, has problems related to allergies, swelling/itching (conjunctivitis, or pink eye), infections, various dystrophies, abrasions, and ocular herpes. Many affect the size/shape of the cornea, and thus its role in bending light.
  • A cloudy area within the lens of the eye is called a cataract. This causes light through the region to be blocked or scattered, with symptoms including blurred vision, sensitivity to bright lights, and poorer color sensitivity. By 65, about half of all adults have them.
  • High intraocular pressure within the eyeball, or ocular hypertension (e.g., due to increased outflow resistance), above about 30 mmHg, is called glaucoma. It is one of the most common causes of blindness, and can happen on a time course of hours to years.
  • Macular degeneration, or central vision loss, is an incurable eye disease and the leading cause of legal blindness, affecting more than 10 million Americans, especially those over 55. For "dry" macular degeneration (about 85% of cases), deterioration of the central part of the retina is associated with formation of small yellow deposits, leading to thinning and drying of the macula. There is gradual central vision loss, first blurry and then with blank spots forming.
  • Blind spots, e.g. due to eyeball debris or optic nerve damage, are called scotomata.

2. Neural:

  • Selective destruction of nerve fibers at the optic chiasm affects the signals from the nasal half of each retina, causing loss of the outer (temporal) visual fields; this is called bitemporal hemianopsia.
  • Lack of fusion of the two eyes (e.g., cross-eyedness) is called strabismus.
  • Involuntary, rhythmical, repeated oscillations or jerky movements of one or both eyes are termed nystagmus. There are many types, all related to neural dysfunction. Often the visual cortex adjusts somewhat.
  • Lack of sympathetic nerve drive (e.g., due to Horner's Syndrome) causes decreased or lack of ability to dilate the pupil.
  • Many common diseases with neuromotor impairment, or head trauma, show up in eye movements. For example, schizophrenics (e.g., double saccades) and autistic/aphasic kids (saccadic scanning) have already been mentioned.

For a more general (but common) term, poor visual development (with poor visual acuity) is called amblyopia. The source is usually within the eye itself, although "wandering" eyes are also often classified here.

Access/Accommodation Strategies

Strategies for accommodating visual impairment tend to fall into two categories: those that augment existing capabilities (e.g., assuming partial sight) and those that use an alternative form of communication intended to replace missing function (e.g., assuming the person is blind). In fact there can be a continuum between mild impairment and blindness, and many people are legally blind but have partial sight. Such a person may use both a technology that augments residual sensory abilities (e.g., a magnifier) and a technology that is intended for the blind (e.g., a screen reader). As another example, many persons with partial sight have a seeing eye dog. Here we classify technologies into those that are augmentative (beyond glasses or contact lenses) and those that are replacements for lost vision.

As we have seen, some people have poor foveal vision but adequate surround, others have significant cloudiness in all or certain parts of their visual field, others cannot see certain colors, others have variable vision that changes with disease changes or fatigue, others are hypersensitive to light, others see double, and still others cannot make normal voluntary eye movements. Thus the appropriate augmentative technology depends on the person's abilities and their needs in life. Examples of augmentative technologies include hand-held magnifiers, options for changing text size and/or colors on a computer monitor, environmental changes in room/device colors or lighting so that recognition is more likely, and tactile cues for button controls on devices, transitions on walkways and other physical objects.

Technologies that are intended as replacements for lost vision include Braille, canes, screen readers for text and other visual content (typically with audio used to communicate content), and specialized audio signals (e.g., orientation cues, communication cues). In many public areas there are now audio cues that communicate the color of a traffic light or the location of a bus stop. Non-technological approaches include human companions who make a special effort to describe visual events, and seeing eye dog companions.

Of note is that some persons who are blind have compensated by developing remarkable capacities with other senses, such as hearing and localizing sound, or touching. Thus another strategy is to take advantage of these other sensory abilities through alternative modes for providing information, working to make the alternative mode as equitable as possible.

 

 

©2003-2004 Jack Winters ... BIEN 167 Home