AI reshapes imaging: seeing the night as day and the subtle as clearly as fire

August 28, 2023 · Blog · Reporter: Luyao

“Color is joy,” said Ernst Haas, one of the most famous and influential photographers of the 20th century. In the 1960s, when the “most serious” photographers still insisted on seeing the world in black and white, this pioneer of color photography was among the first to use Kodak film to express the undeniable power of color.

 

At the same time, French volcanologists Katia and Maurice Krafft used 16mm film and Nikon F2 cameras to record the thrilling movements of active volcanoes, the orange-red magma gushing out like blood flowing from the heart of the earth.

 

 

The development of optical imaging technology has greatly expanded human vision and expression, yet we have never quite escaped the dilemma of color: we cannot see at night as freely as we do in daylight, even though most of the magic happens after dark.

 

1. Human Eye, ISP and Digital Imaging

 

Optical imaging technology takes its inspiration from observations of human vision. When photons entering the eye strike one or more of the roughly 130 million photoreceptor cells in the retina at the back of each eye, visual processing begins:

 

Humans have about 130 million rod cells. Using the pigment rhodopsin, they respond to faint light, help us perceive changes in brightness, and dominate our vision in the low light of dawn, dusk, and night.

 

The wavelength range of visible light is 380-790 nanometers, which is also the only wavelength range with color information.

 

Compared with the abundant rod cells, humans have only about 7 million cone cells. Cones rely on their own photosensitive pigments to distinguish colors, and they work properly only when light is sufficient.

 

In a particularly dark environment, the cones stop working and can no longer distinguish light of different wavelengths, so we see the scene only in gray.

 

The two types of cells are distributed differently across the retina and serve different functions: rod cells (blue) mainly sense light and dark, while cone cells (red) sense color.

 

The cameras in the mobile phones, surveillance systems, security equipment, and other devices around us are likewise visible-light cameras. One of their biggest differences from film photography is that the photosensitive medium has changed from film to an image sensor, such as the common CMOS sensor, a metal-oxide semiconductor device that converts light signals into electrical signals.

 

A CMOS sensor is covered with a regular grid of microscopic photosensitive elements. Like diligent little recorders, each registers the brightness at its own position; these positions are the pixels.

 

Like rod cells, a CMOS sensor can sense only the intensity of light, not its wavelength, which means it cannot record color. Scientists therefore added a filter layer in front of the image sensor and used special algorithms to calculate each pixel’s color from the filtered result (the RAW image) captured on the CMOS.

 

This is where the ISP (Image Signal Processor) comes in. It is dedicated to post-processing the voltage and current signals output by the front-end image sensor, striving to restore the scene’s details so that people can read the image.

 

The imaging pipeline of a digital camera. The ISP sits at the final stage, using image algorithms to post-process the photoelectric signals received from the image sensor.

In fact, all of this post-processing is built on image algorithms. For example, the algorithm that infers each pixel’s full color is called “demosaicing”.
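To make the idea concrete, here is a minimal demosaicing sketch in Python, assuming an RGGB Bayer mosaic; the function and its averaging strategy are illustrative, not any camera vendor’s actual algorithm:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def demosaic_bilinear(raw: np.ndarray) -> np.ndarray:
    """Toy demosaic for an RGGB Bayer mosaic: scatter each sample into its
    color channel, then fill the gaps by averaging the valid neighbors.
    Real demosaicing keeps the original samples and interpolates edge-aware."""
    h, w = raw.shape
    rgb = np.zeros((h, w, 3), dtype=np.float32)
    valid = np.zeros((h, w, 3), dtype=np.float32)
    rgb[0::2, 0::2, 0] = raw[0::2, 0::2]; valid[0::2, 0::2, 0] = 1  # R
    rgb[0::2, 1::2, 1] = raw[0::2, 1::2]; valid[0::2, 1::2, 1] = 1  # G
    rgb[1::2, 0::2, 1] = raw[1::2, 0::2]; valid[1::2, 0::2, 1] = 1  # G
    rgb[1::2, 1::2, 2] = raw[1::2, 1::2]; valid[1::2, 1::2, 2] = 1  # B
    for c in range(3):  # mean of the valid samples in each 3x3 window
        rgb[..., c] = uniform_filter(rgb[..., c], 3) / np.maximum(
            uniform_filter(valid[..., c], 3), 1e-8)
    return rgb
```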

 

 

Almost all devices now perform “linear correction” automatically by default: the ISP applies a linear transformation to the camera’s overly dark output to offset it, so that the final result matches what the naked eye actually sees.
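As a rough illustration of the linear correction just described, here is a sketch assuming a 10-bit sensor; the black and white levels are made-up values for illustration, not figures from the article:

```python
import numpy as np

def linear_correction(raw, black_level=64.0, white_level=1023.0):
    """Linear transform: subtract the sensor's black level and rescale the
    remaining range to [0, 1]. Levels are illustrative 10-bit values."""
    img = (raw.astype(np.float32) - black_level) / (white_level - black_level)
    return np.clip(img, 0.0, 1.0)
```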

 

 

In low light, it is difficult for the image sensor to gather enough light. A higher ISO (sensitivity) or a slower shutter speed is then needed to increase the number of photons reaching the photosensitive chip, but this often causes heating problems and produces noisy images. The ISP’s advanced noise-reduction algorithms can suppress various kinds of color and pattern noise while retaining texture detail.
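For a feel of what even a classical, non-AI denoiser does, here is a baseline sketch; the median filter below is a standard textbook technique, not the ISP algorithm the article describes:

```python
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))             # synthetic scene
noisy = np.clip(clean + rng.normal(0, 0.1, clean.shape), 0, 1)  # high-ISO-style noise

# A 3x3 median filter knocks out impulsive noise while roughly keeping edges;
# real ISP denoisers are far more elaborate (multi-frame, edge-aware).
denoised = median_filter(noisy, size=3)
print(np.abs(noisy - clean).mean(), np.abs(denoised - clean).mean())
```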

 

 

White balance aims to restore objects’ true colors across complex scenes, even when you are shooting a sheet of white paper under an incandescent lamp. Automatic exposure control analyzes brightness information from the sensor to set the aperture, shutter speed, and ISO so that image brightness comes out right.
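One classical heuristic behind automatic white balance is the “gray world” assumption: the average color of a scene should be neutral. A minimal sketch follows; production AWB engines blend many such cues rather than relying on this one alone:

```python
import numpy as np

def gray_world_white_balance(rgb: np.ndarray) -> np.ndarray:
    """Scale each channel so its mean matches the global mean, pulling the
    image's average color toward neutral gray. Expects float RGB in [0, 1]."""
    means = rgb.reshape(-1, 3).mean(axis=0)          # per-channel mean
    gains = means.mean() / np.maximum(means, 1e-8)   # pull each mean to gray
    return np.clip(rgb * gains, 0.0, 1.0)
```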

 

ISP technology therefore largely determines the imaging quality of digital cameras, which is why the ISP is called the “brain” of the camera.

 

2. The dilemma of the ISP and traditional night-vision solutions

 

However, visible light, that magician, has also handed the ISP problems it can hardly solve. In daytime, when light is too strong or contrast too extreme, as with backlighting or a vehicle emerging from a tunnel into sudden glare, the human eye struggles, and so does the camera.

 

And once the sun is swallowed by the horizon, in extremely weak light, a traditional ISP can hardly see anything at all.

 

“By military standards, a full moon gives about 0.1 lux (illuminance, luminous flux per unit area), a quarter moon about 0.01 lux, and a moonless, starlit sky about 0.001 lux. We define this starlight level of illuminance as extremely weak light,” explained Zhang Qining, CEO of Shenzhi Future.

 

Whenever night falls, poorly lit places such as the lake in Shenzhen’s city park sit in extremely weak light. Parking in a residential lot under dim street lamps is inconvenient, and reversing there is likewise a scene of weak or extremely weak light.

 

 

Since its founding at the end of 2017, Shenzhi Future has been committed to using its self-developed AI ISP technology to break through the limits of extremely weak light scenes (complex lighting such as low illumination, backlighting, and rain, snow, and fog) and achieve real-time full-color imaging in them.

 

Take outdoor sports: more and more people like to climb at night, and almost every week in Shenzhen one or two hikers get lost doing so. Rescue teams then fly drones over the mountains to search for them.

 

The night is also a natural umbrella for criminals. Nearly 70% of crimes occur at night, and the peak period of crime is from 7 pm to 5 am the next day.

 

There is also the garrisoning of more than 20,000 kilometers of borders, the monitoring of mountain and desert oil-field operating areas for illegal activity, the routine supervision of rivers under the Yangtze’s ten-year fishing ban, power-line inspection, wildlife monitoring, and so on. Because the light is so weak, traditional photographic equipment can barely see anything at night, and infrared cameras must be used.

 

In some national nature reserves you can see infrared cameras wired to tree trunks to monitor wildlife. Such a camera actively emits infrared beams (invisible light) to illuminate the target and converts the infrared image reflected back into a visible-light image for night observation.

 

Infrared cameras recorded leopard cats appearing at night in a nature reserve.

 

This kind of active infrared night-vision system can be used even in total darkness. However, since it receives only a single band of infrared light reflected by the object, the signal contains none of the visible primaries such as green and blue, so it cannot render color. After processing, infrared imaging yields only a black-and-white picture, which cannot capture finer target details such as color and stripes.

 

In urban surveillance scenarios, highly reflective objects such as license plates are easily overexposed under infrared fill light, yet clothing color, vehicle color, and license plates are often the key clues to solving a case and cannot be lost.

 

Laser night-vision devices work on principles similar to infrared ones and likewise belong to active sensing imaging, but their observation distance is farther, up to several kilometers. Besides the same susceptibility to signal interference, the modules are not cheap, which indirectly raises the total cost of a high-quality camera system combining visible and invisible light.

 

Beyond the common active sensing methods above, there are also passive infrared night-vision systems. Thermal imagers passively collect the invisible thermal radiation of every object in the scene and convert the heat-distribution data into video images. They, too, are widely used.

 

For example, they monitor transmission lines for poor contact, leakage, overheating, or encroaching trees; carried on drones, they observe the movements of elephants and watch for suspicious people, vehicles, and ships around oil fields and at sea.

 

 

Thermal imagers monitoring the movements of elephants. Thermal imaging works because every object above absolute zero constantly radiates infrared.

 

Like visible-light imaging, thermal imaging is a passive sensing method. However, the results lose a great deal of information such as features and textures, and targets look like ghost images.

 

In thermal imaging, Zhang Qining notes, a human face appears only as a fuzzy whole; facial details such as the eyes, nose, or even wrinkles are hard to distinguish, making high-quality visual imaging difficult.

 

Because of the low image quality, false alarms come easily when the target’s temperature contrast is small (as in wildfire monitoring). In addition, because thermal lenses cannot zoom optically, a thermal imager cannot see distant targets clearly, whereas a visible-light lens can detect much farther.

 

Recently, Nature reported that researchers from Purdue University and Los Alamos National Laboratory in the United States had developed a heat-assisted detection and ranging (HADAR) system. It trains AI to determine the temperature, energy signature, and physical texture of each pixel in a thermal image, and the resulting pictures are almost as clear as those a conventional camera takes in daylight.

 

That issue of Nature featured the HADAR study on its cover

The study proposed a method called HADAR, which combines thermal physics, infrared imaging, and machine learning to recover target textures and overcome the ghosting effect.

 

“This technology is really a kind of pseudo-color, predicting an object’s color from its material.” Zhang Qining had noticed the study too. “It’s like painting with crayons. The crayons are all made of the same material, but they come in different colors. It’s actually hard to predict which color a crayon is.”

 

From a commercial perspective, HADAR technology is even less advantageous.

 

When digital cameras were first invented, they had only 280,000 pixels. Ever since, people have worked to pack more photosensitive units into the tiny area of a CMOS sensor and to push resolution relentlessly upward, from 1 million and 5 million pixels to tens of millions, 35 million, even hundreds of millions, until the results fully rival traditional film cameras.

 

Today’s phone cameras commonly have tens of millions of pixels, while even high-end infrared thermal imagers have only a few million. Why? Because the pixels of the core component, the detector, cannot be made smaller:

 

The infrared light used in thermal imaging (8 to 14 microns) has a much longer wavelength, which forces the pixels on the detector to be large. A visible-light camera pixel is only 1-2 microns across, while each detector pixel of an infrared thermal imager measures 12-17 microns.

 

At the same sensor size, a thermal imager therefore has far fewer pixels than a visible-light sensor, so its image quality is naturally much worse.

The smaller the detector’s pixel pitch, the more pixels fit on the sensor, and the higher the resolution and the wider the field of view.
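A back-of-the-envelope check of that argument, using the pixel pitches quoted above; the sensor side length is an assumed figure for illustration only:

```python
# Pixel counts for the same (assumed) 7 mm sensor side at the two pitches
# quoted in the article: ~2 um for visible CMOS, ~15 um for thermal detectors.
sensor_side_mm = 7.0
for name, pitch_um in [("visible CMOS", 2.0), ("thermal detector", 15.0)]:
    n = sensor_side_mm * 1000 / pitch_um
    print(f"{name}: ~{n:.0f} x {n:.0f} pixels = {n * n / 1e6:.1f} MP")
# -> visible CMOS ~12 MP vs thermal ~0.2 MP on the same die area
```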

 

Thermal imaging chips are hard to shrink, and even sold in large volumes their cost cannot be spread thin enough to undercut CMOS. Zhang Qining believes thermal imaging will keep clear advantages in specific niches, such as detecting signs of life in completely dark scenes; in scenes that demand careful scrutiny of detail, its advantage is comparatively slight.

 

At present, cameras in extremely weak light have “basically stayed in the black-and-white era,” and no particularly good solution for color imaging has emerged. In extremely weak light, Zhang Qining said, there is almost no way to achieve imaging with a high signal-to-noise ratio.

 

In his memoirs, Sony co-founder Akio Morita recalled the Trinitron that Sony developed in-house in the 1960s, convinced that color TV was the way the industry was heading.

 

But humans will pay for an upgraded experience. It was true of color film and color movies, and of color TV displacing its black-and-white competitors. In 2016, Time magazine placed Sony’s Trinitron (color cathode-ray tube) TV, alongside the iPod, iPhone, Macintosh, and Google Glass, on its list of the 50 most influential gadgets of all time.

 

In terms of product features, recording, storage, and night vision have become standard on cameras. According to Loto Technology’s 2022 annual report, 97% of cameras support night vision and are gradually evolving toward color: the share of full-color day-and-night cameras rose from 20% in January to 31% in December.

 

“Moore’s Law is still in effect. In the future, when AI computing power is cheap enough and its power consumption low enough, we will be able to swap a night-vision engine into every camera at low cost,” Zhang Qining said.

 

If the cost-effectiveness and power consumption were comparable to today’s imaging chips, why wouldn’t we all use full-color night-vision cameras?

 

3. Another way: integrate AI and soften the ISP

 

“Our current technology really can do high-quality imaging in extremely weak light and carefully distinguish the details of people and things in the dark.” According to Zhang Qining, in many key imaging tasks it improves performance by a factor of hundreds.

 

Comparison of laser solution (left), thermal imaging (middle) and full-color night vision imaging effect (right) in extremely weak light.

 

For example, traditional ISP hardware can image down to only about 0.1 lux; enhanced by our AI ISP, imaging at 0.0001 lux becomes possible.

 

How clearly you can see depends on the size of the target, he explained. For a monitoring range of 10 kilometers, you can still make out tall buildings, bridges, and other huge structures in extremely weak light; for a range of 3-5 kilometers, it is basically ships at sea, large vehicles on the ground, and the like.

 

If you want to see a person clearly, the current optical range is only one or two hundred meters.

 

When Typhoon Doksuri passed through Quanzhou, Fujian, heavy rainfall caused secondary disasters such as urban flooding. The Nan’an Swift Rescue Team in Quanzhou used Shenzhi Future’s S2 full-color night-vision camera on drones to carry out rescues at night.

 

In 2018, Intel’s CVPR paper “Learning to See in the Dark” used a single model to fit the entire ISP pipeline, taking RAW sensor data as input and directly outputting an sRGB image, with striking results.
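In spirit, that pipeline looks like the toy PyTorch sketch below: pack the Bayer mosaic into four channels, multiply by an amplification ratio, and map straight to sRGB. The actual paper trains a U-Net on paired short/long-exposure RAW data; this skeleton only illustrates the RAW-in, sRGB-out idea:

```python
import torch
import torch.nn as nn

class TinySeeInTheDark(nn.Module):
    """Toy stand-in for the paper's U-Net: RGGB RAW in, sRGB out."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 12, 3, padding=1),  # 12 = 3 RGB x (2x2 subpixels)
            nn.PixelShuffle(2),               # back to full resolution
        )

    def forward(self, raw, gain=100.0):       # gain ~ desired exposure ratio
        packed = torch.stack([raw[:, 0::2, 0::2], raw[:, 0::2, 1::2],
                              raw[:, 1::2, 0::2], raw[:, 1::2, 1::2]], dim=1)
        return torch.sigmoid(self.net(packed * gain))

dim_raw = torch.rand(1, 128, 128) * 0.01      # very dark fake RAW frame
print(TinySeeInTheDark()(dim_raw).shape)      # torch.Size([1, 3, 128, 128])
```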

 

The paper drew wide attention. To some extent, it demonstrated that all of an ISP’s functions could be realized by a single neural network. From a commercial perspective especially, it extends the useful range of visible-light camera systems and opens the possibility of delivering real-time, full-color, clear images day and night at lower cost.

 

Subsequently, companies such as Shenzhi Future, Ambarella, Huawei HiSilicon, Aixin Yuanzhi, and Eye Engine Technology began exploring neural networks as visual imaging engines.

 

Lenses and CMOS sensors are mainly analog devices, so it is hard to add algorithms to them, and since both industries are already very mature, a breakthrough at the level of first principles is unlikely (short of new materials).

 

The ISP, however, is all about algorithms. It heavily processes the electrical signals it receives and erases a great deal of useful information along the way; trying to improve downstream recognition on top of that output means the opportunity is already lost. In low-light and high-dynamic-range scenes especially, image distortion and information loss are inevitable.

 

For example, some CMOS sensors have reached 160 dB of dynamic range, yet most traditional ISPs are still at 48 dB, like a highway emptying onto a country road. Because the country road (the ISP) carries limited traffic, it processes the incoming electrical signal by discarding a great deal of information, for instance clipping away the darkest and brightest parts.
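For scale: dynamic range in dB is 20·log10 of the ratio between the brightest and darkest recordable signal, so the gap between those two figures is enormous:

```python
import math

for db in (48, 160):  # ISP vs. high-end CMOS figures quoted above
    ratio = 10 ** (db / 20)          # dB = 20 * log10(max / min)
    stops = math.log2(ratio)         # photographic stops (doublings)
    print(f"{db} dB -> {ratio:,.0f}:1 contrast, ~{stops:.1f} stops")
# 48 dB is ~251:1 (~8 stops); 160 dB is 100,000,000:1 (~26.6 stops)
```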

 

If replacing the entire ISP in one stroke is unreliable, then, given the limited computing power on edge devices and the power and cost constraints of real deployments, could just the key image-quality stages be AI-fied, extracting more information directly from the raw data (such as the photosensitive chip’s output) and letting AI process it? For example, use one DNN for white balance and another for demosaicing, and let many neural networks work together?
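Sketched in PyTorch, that modular idea might look like the toy below; the stage names and sizes are hypothetical, not any vendor’s actual design:

```python
import torch
import torch.nn as nn

def tiny_stage(c_in, c_out):
    """Hypothetical stand-in for one AI-fied ISP stage."""
    return nn.Sequential(nn.Conv2d(c_in, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, c_out, 3, padding=1))

denoise  = tiny_stage(1, 1)   # cleans the RAW mosaic
demosaic = tiny_stage(1, 3)   # mosaic -> RGB
awb      = tiny_stage(3, 3)   # color correction / white balance

raw = torch.rand(1, 1, 64, 64)          # fake RAW frame
rgb = awb(demosaic(denoise(raw)))       # many small DNNs cooperating
print(rgb.shape)                        # torch.Size([1, 3, 64, 64])
```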

 

Following this line of thought, Huawei HiSilicon released the Yueying ISP chip in 2021, widely seen as driving the ISP shift across the security industry. The Yueying AI ISP can intelligently separate signal from noise in an image and achieve intelligent noise reduction in low-light scenes.

 

In 2022, Ambarella, with 17 years of ISP experience, also announced an AI ISP. It achieves color imaging at extremely low illumination with minimal noise, performs 10 to 100 times better than mainstream ISPs, and offers more natural color reproduction and stronger dynamic-range processing.

 

 

Shenzhi Future likewise uses deep learning to learn the distribution characteristics of noise and signal, training an AI algorithm that can separate the real signal from the noise in extremely weak light. While suppressing noise, it amplifies the true signal to the strength of a normally lit environment, improving the signal-to-noise ratio by up to 25 dB and achieving normal imaging in extremely weak light environments.
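To put the 25 dB figure in perspective: SNR in dB is 20·log10(signal/noise), so a 25 dB gain is roughly an 18-fold improvement in amplitude terms:

```python
# SNR(dB) = 20 * log10(signal / noise); a +25 dB gain therefore means
print(10 ** (25 / 20))   # ~17.8x higher signal-to-noise amplitude ratio
print(10 ** (25 / 10))   # ~316x in power terms
```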

 

With Shenzhi Future’s AI ISP technology, the signal-to-noise ratio can be improved by up to 25 dB, achieving normal imaging in extremely weak light environments.

 

The Kunming Fire Brigade in Yunnan Province mounted Shenzhi Future’s S3 night-vision camera on a drone for tests at night. This picture compares the S3 with other night-vision camera payloads.

 

The power of neural networks lies in their ability to model complex scenes, which lets the image quality surpass that of traditional ISPs, especially in noise reduction and contrast enhancement at extremely low illumination.

 

“All we do is collect enough data to strengthen the modeling capability, especially for the various corner cases.” Zhang Qining gave an example: people who grew up in Shenzhen may not be able to imagine how dark the Tibetan Plateau and its glaciers are at night; the team had never seen scenes that dark. What other extremes will imaging encounter across China, and even the world? Can the algorithm cover them all?

 

Collecting these bad cases and then training on them in a targeted way strengthens the model against complex scenes and lets ISP parameters be updated in real time; iterating the vision model quickly upgrades the chip’s image quality.

 

By contrast, traditional ISPs must run on an FPGA or ASIC with very strict timing hardware to keep latency under control, so they are solidified entirely into circuit logic. As a result, they cannot be meaningfully personalized and can never be upgraded.

 

At present, the trend of combining AI with the traditional ISP is most prominent among mobile phone makers, who use it to improve camera quality and deepen brand differentiation. Beyond that, AI ISPs are also entering security, drones, and even autonomous driving.

 

“Since last year, our largest commercial scenario has been full-color night-vision camera payloads for industrial drones,” Zhang Qining told us. The business model in the drone market has been validated, and there is now a mature series of product lines.

 

Industrial drones are in fact used very widely in China, covering public security, fishery administration, border and coastal defense, firefighting, emergency response, and more. There are currently more than 200 industrial-drone companies in the country, concentrated in three key areas: agricultural plant protection, power-line inspection, and police security.

 

At the same time, Shenzhi Future is also exploring the consumer market for night vision cameras, such as handheld night vision telescopes.

 

4. Moving towards 2.0: killing the ISP

In Zhang Qining’s view, we are still in the AI ISP 1.0 era, part traditional ISP pipeline and part neural network, which is essentially a transition.

 

The current solution needs not only the traditional ISP but also an NPU. In both cost and power consumption, it is certainly higher than the original, so a drop-in replacement is not so easy.

 

Chips are very expensive, yet a sizable block of die area must still be reserved for the ISP, sometimes larger than the NPU’s; at times its power consumption even exceeds the NPU’s.

 

And because the ISP and NPU must work in tandem, the data exchanges between them keep the NPU from running at full power; utilization is often only 10% or 20%.

 

That said, ISP technology is still evolving, and fusion with AI is only one direction. Some hold that, given AI ISP’s own shortcomings (power draw, edge computing budgets, training and inference costs, and so on), it cannot completely replace the traditional ISP.

 

In Zhang Qining’s view, the AI-fusion path must keep evolving until no one has any other choice; only then will large-scale replacement truly happen. “Next, we want to completely cut away every traditional ISP stage and replace it all with neural networks.”

 

Biological vision has used neural networks for imaging since its birth; it made trilobites the overlords of the Cambrian ocean and let them survive on Earth for nearly 300 million years before going extinct. Human vision itself is a very pure neural network.

 

A prototype of the 2.0 framework is expected by the end of this year. According to Zhang Qining, it is an all-in-one neural network that no longer relies on any traditional ISP pipeline.

 

You can think of it as one multitask neural network that handles many tasks at once, unlike the current solution, which requires many neural networks to cooperate.

 

“Only an NPU is needed. It is a brand-new species.”