
A detailed look at 3D vision system technical solutions: multi-scene applications may ignite the market
 
The development of 3D vision technology
 
        In recent years, advances in chip technology and the maturing of the surrounding software and hardware have made vision sensors ubiquitous. Society is becoming smarter: artificial intelligence and big data now let us actively use the images people record, rather than leaving photos and videos to gather dust on a shelf.
 
 
From film to CCD to the now mature and ubiquitous CMOS, our expectations of image sensors have gradually shifted. Mobile phones gained front and rear cameras, and rear camera arrays nicknamed "Yuba" (bathroom-heater) and "Gatling" layouts appeared. With algorithmic support, each camera in the array serves a different purpose.
 
        2018 and 2019 are shaping up to be the years when 3D image sensors take off. With 3D sensors, it becomes much easier to do event-based analysis and to act directly on the images around us: somatosensory games, face payment, robots that avoid obstacles automatically, and automatic industrial sorting.
 
        In 2016, AlphaGo became the first computer Go program to defeat the professional nine-dan player Lee Sedol without a handicap. The result caused a sensation and endless discussion, followed by a wave of publicity about artificial intelligence that gave countless people confidence that machine intelligence was arriving.
 
AI is now a buzzword. Many people want to work on AI, and many want to build on it. AI is like giving machines a smart brain: where earlier processors could only handle a specific, fixed scenario, AI brings self-learning and self-improvement, and for complex scenes it is far more "clever".
 
       But AI alone is not enough for autonomous driving; it also needs sensors such as cameras, lidar, and millimeter-wave radar.
 
       Face recognition is another excellent technology, enabling face-recognition gates and face payment. In many situations, however, it remains susceptible to environmental interference and spoofing attacks.
 
       Therefore, if we want AI to work well, sensors are essential to entering the era of intelligence. With a 3D sensor, a robot vacuum will not stumble, and a phone cannot be unlocked with just a photo or video. Autonomous vehicles can also detect oncoming pedestrians and cars, making driving safer.
 
       3D sensors apply to almost every area of AI: new retail, autonomous driving, personalized education, smart healthcare, smart security, smart monitoring, smart robots, and so on. In 2019 we should see 3D vision technology deployed widely across these fields.
 
3D vision technology solutions
 
1. Binocular vision
 
       When we speak of 3D vision, we mean that an image carries not only the two-dimensional X and Y coordinates, but also the distance and size of the object being photographed, that is, the spatial Z coordinate.
 
       Relying on our left and right eyes, we can estimate that the front door is 3 m away, the teacup on the table 1.5 m away, and a distant tree roughly 10 m away. Bionics applies this well: with two cameras, a UAV can accurately judge the distance to an obstacle such as a telegraph pole. In its left eye the object sits at coordinate A with field angle α, in its right eye at coordinate B with field angle β, and the baseline distance x is fixed in advance by the mechanical structure. From these, the formula below yields the Z-axis distance of the point in space.
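As a hedged reconstruction from the quantities just named (baseline x, with α and β the angles at the left and right cameras between the baseline and the line of sight to the point), simple triangulation gives:

```latex
% The horizontal offsets from the two cameras span the baseline:
%   z\cot\alpha + z\cot\beta = x
% so the depth of the observed point is
z = \frac{x}{\cot\alpha + \cot\beta}
```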
 
 
 This method has been used for many years. Technically, its weak point is matching: for points in the field of view it can be uncertain whether the two points seen by the left and right eyes are actually the same point. Its advantages are long observation distance, high accuracy, and relatively low cost. The disadvantage is that in a uniform scene, such as a white wall, rippling water, snow, or grass, all reference points are lost, and the drone or processor cannot compute an accurate depth.
 
       This is why binocular cameras are rarely used for mobile-phone face recognition and face unlocking. Another problem: to obtain a high-resolution depth map of an object's surface, the processor must first match image data point by point. The computing power this matching algorithm demands is beyond most people's imagination, and the subsequent depth formula involves trigonometric functions, which is more complicated still. Imagine modeling the depth of 1,000 points on a face surface and the amount of computation that requires.
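As a rough illustration of that matching step (not the author's setup), here is a minimal sketch of block-matching stereo depth with OpenCV; the file names, focal length, and baseline are hypothetical placeholders:

```python
import cv2
import numpy as np

# Hypothetical rectified grayscale stereo pair (file names are placeholders).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matcher: searches for each left-image patch's best match in the
# right image; this search is the expensive step the text describes.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

# Assumed calibration values (illustrative only).
f = 700.0   # focal length in pixels
b = 0.12    # baseline in meters

# Depth from disparity: z = f * b / d, valid where d > 0.
with np.errstate(divide="ignore"):
    depth = np.where(disparity > 0, f * b / disparity, 0.0)
```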
 
2. Structured light
 
        In 2017, the iPhone X launched, using 3D structured light to compute the 3D geometry of our faces, which triggered another technology wave.
 
       Structured light is actually a very old technique, but Apple managed to fit it into a mobile phone, which surprised everyone.
 
 
The picture on the left shows a simple experimental version of 3D structured light. A projector on the right casts black-and-white striped patterns; on a fox mask these stripes distort by a measurable amount. Once a CCD camera photographs the distorted stripes, their deformation can be used to compute the corresponding 3D relief of the mask: stripes bending left indicate bulges, for example, and stripes bending right indicate depressions.
 
       The basic principle of single-point structured-light triangulation is shown on the right. A laser source produces a very small, bright red dot; once the sensor receives it, the coordinates (x′, y′) of this especially bright point can be located on the sensor surface. Combining the projection angle of the light source, the baseline distance b, and the lens focal length f, the formula resolves the three-axis coordinates (x, y, z).
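The formula itself is not reproduced here; under an assumed standard pinhole model (camera at the origin, laser offset by baseline b along the sensor's x-axis, projecting at angle θ to the baseline), the triangulation works out to:

```latex
% Spot imaged at (x', y') with focal length f:
z = \frac{b\,f}{x' + f\cot\theta}, \qquad
x = \frac{x'\,z}{f}, \qquad
y = \frac{y'\,z}{f}
```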
 
       The iPhone X uses a 30,000-dot projector and a 1.4-megapixel infrared camera to collect the information from all of these projected dots. The complicated problem in the middle is matching every one of those 30,000 dots exactly: finding the precise ID of each dot landing on the face, that is, knowing the projection angle and baseline distance of that particular dot. This matching algorithm requires a very large amount of computation, and to reduce it, the arrangement of the 30,000 dots looks random to us but actually follows a definite mathematical and geometric pattern.
 
       As can be seen, the formula contains several geometric parameters, so the assembly process is very demanding; if the customer later drops or jolts the phone, the 3D measurement accuracy may suffer.
 
       In addition, patent protection makes it difficult for others to enter. The industry therefore still admires Apple for shipping this solution; Apple's engineering capability is quite strong. Because the iPhone X carries a good profit margin, Apple can afford it, whereas other manufacturers find it quite painful, limited by cost and technical difficulty.
 
3. ToF
 
       Time of flight means exactly that: the travel time of light. The earliest time-of-flight measurement was an experiment Galileo attempted in 1638, and scientists afterwards devised a series of methods to measure the speed of light.
 
 
Today we use the same principle in reverse: the speed of light is known, and modern methods measure it very accurately, so knowing only the delay of a signal is enough to measure the precise distance of an object.
 
        For example, light travels 60 centimeters in two nanoseconds, so a round trip over that delay corresponds to a one-way distance of 30 centimeters. But for face recognition or obstacle avoidance, the basic requirement is accuracy within 1 cm, and face recognition may demand more, around 3 mm. The timing involved is thus basically at the picosecond level, and the difficulty of this scheme is controlling the electrical and optical systems to achieve such high-precision timing.
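A tiny sketch of the arithmetic behind those numbers (pure back-of-the-envelope, no hardware API involved):

```python
C = 299_792_458.0  # speed of light, m/s

def tof_distance(delay_s: float) -> float:
    """One-way distance from a round-trip time-of-flight delay."""
    return C * delay_s / 2

def delay_for_accuracy(accuracy_m: float) -> float:
    """Round-trip timing precision needed to resolve a given distance."""
    return 2 * accuracy_m / C

print(tof_distance(2e-9))         # ~0.30 m, matching the example above
print(delay_for_accuracy(0.01))   # ~6.7e-11 s (~67 ps) for 1 cm
print(delay_for_accuracy(0.003))  # ~2.0e-11 s (~20 ps) for 3 mm
```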
 
 
In terms of software complexity, binocular vision is the highest, because it must find exactly matching point pairs in fairly ambiguous images.
 
       In terms of material cost, structured light demands very high assembly accuracy, which produces a great deal of scrap; overall, the material loss is large. On both of these points, ToF, which solves the problem largely in hardware, holds the advantage.
 
       Since ToF responds at the picosecond-to-nanosecond scale, both frame rate and processing speed are determined essentially by the hardware's computing power, so ToF can achieve very high frame rates.
 
       In ranging accuracy, when a binocular solution meets relatively uniform objects there is essentially no accuracy to speak of. Structured light is currently the most accurate, and some industrial applications may still be based on it. The ranging accuracy of ToF depends on the electrical chip's time-measurement precision: if picosecond delay measurement can be pushed toward the femtosecond level, ToF accuracy can improve further.
 
       Real scenes are complicated. For example, using a 3D camera at night is harder for a binocular system because it has no fill light of its own, whereas structured light and ToF both emit light actively and hold the advantage there.
 
       Conversely, in a bright summer outdoor scene, sunlight carries a lot of interfering energy. A binocular camera actually likes this kind of clear scene, but structured light's projected dots are easily drowned out in the sun's noise and background light, degrading its resolution or making ranging impossible.
 
       ToF instead modulates the light energy at a very high frequency; the modulated light can momentarily exceed the sun's energy, so ToF's dependence on sunlight conditions is greatly reduced.
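For reference (not spelled out in the original), the textbook continuous-wave ToF relation recovers distance from the phase shift Δφ of the reflected modulated light at modulation frequency f_mod:

```latex
\Delta\varphi = 2\pi f_{\mathrm{mod}}\cdot\frac{2d}{c}
\quad\Longrightarrow\quad
d = \frac{c\,\Delta\varphi}{4\pi f_{\mathrm{mod}}}
```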
 
       In power consumption, the three do not differ much; ToF sits between binocular and structured light. But considering the complete 3D vision system including the processor, ToF still has the edge, because its computation load is far smaller; as everyone knows, even the best current AI chips consume a fair amount of power.
 
       From the perspective of detection range, if relatively high accuracy must be kept, binocular range is not particularly far, because distant objects require the lateral coordinate difference (disparity) to remain resolvable; seeing 100 meters might take a 16- or 20-megapixel camera.
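A rough illustration of why: the standard stereo depth-error relation Δz ≈ z²·Δd / (f·b) shows how quickly depth resolution degrades with distance. All numbers below are assumed for illustration only:

```python
def stereo_depth_error(z_m, f_px, baseline_m, disparity_err_px=1.0):
    """Approximate depth error dz ~ z^2 * dd / (f * b) for matching stereo."""
    return (z_m ** 2) * disparity_err_px / (f_px * baseline_m)

# Assumed setup: 0.12 m baseline, one-pixel matching error.
print(stereo_depth_error(100.0, 1000.0, 0.12))  # ~83 m error at 100 m: useless
print(stereo_depth_error(100.0, 4000.0, 0.12))  # ~21 m with 4x the pixels: still poor
```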
 
        How far structured light can see depends mainly on whether its light spot reaches the object and whether the spot's energy can return to the lens.
 
       Because ToF modulates the light, the emitter's energy can be driven quite high, and range can then be tuned to the scene. Many lidars reach 100 meters, and the figure people watch most closely so far is 300 meters, suitable for highway driving.
 
        Among application scenes: 3D movies are shot with binocular cameras; 3D scanning now mostly uses structured light and ToF; for face recognition, gesture proximity, and gesture recognition, ToF is also quite widely applied. AR and VR applications come after the 3D spatial scene has been modeled.
 
 
3D vision applied in every field
 
 
In daily chats, you may feel an emoji pack cannot describe your current mood, that no picture quite fits. As shown on this page, you no longer need to hunt for a smiley: grin yourself and make a 3D emoticon. To measure furniture, just take out your phone and tap two points to get an object's length, width, and height. In games, a camera can capture a character's movements in real time and reproduce the desired action, something the Kinect already applied widely.
 
         Future technologies such as AR/VR/MR need a precise measurement of the 3D structure and 3D scene of the entire space. If the modeling is poor, it conflicts with the sense of reality the human eye expects and causes dizziness.
 
 
Autonomous driving and smart vehicles also need large numbers of 3D sensors. The upper-left picture corresponds to HUD detection: the road surface must be modeled, otherwise the projected arrow will drift off target.
 
        The lower-left picture shows gesture control of the audio volume, detecting the finger as it sweeps left or right. Some will say an ordinary camera can do this too, but a ToF sensor can separate the hand from its background, and at minimum that improves the stability and recognition rate of the in-vehicle system, because it supplies the third dimension of data.
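A minimal sketch of that background separation, assuming a depth frame as a NumPy array and a hypothetical range threshold:

```python
import numpy as np

def segment_foreground(depth_m: np.ndarray, max_range_m: float = 0.6) -> np.ndarray:
    """Keep only pixels closer than max_range_m (e.g. a hand near the console);
    zero depth marks invalid ToF returns and is excluded."""
    return (depth_m > 0) & (depth_m < max_range_m)

# Hypothetical 4x4 depth frame in meters: the two 0.4 m pixels are the "hand".
depth = np.array([[2.1, 2.0, 2.2, 2.1],
                  [2.0, 0.4, 0.4, 2.1],
                  [2.1, 2.0, 2.2, 0.0],
                  [2.2, 2.1, 2.0, 2.1]])
print(segment_foreground(depth))
```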
 
        The upper-right corner shows pedestrian detection and recognition: sensors can keep an autonomous vehicle from running into children at play.
 
        The lower-right corner is an example of smart reversing. With a 3D ToF model of the entire rear-view scene, the car can back into a narrow or cluttered space relatively easily, much more efficiently than a human.
 
 
3D vision has even more and bigger applications in the pan-IoT. Take the air conditioner pictured above: given a known distance and emitted light energy, a 3D ToF sensor can measure the reflectivity of a person's surface from the energy it receives back. If the person in red is hot and sweating, the reflectivity detected on their head will be relatively strong, and the air conditioner can redirect its airflow toward them, while people farther away who are not hot are spared the cold draft.
 
        Applications such as smart assistants, AR/VR games, smart-retail service robots, smart door locks, and smart dressing mirrors are in fact very extensive.
 
        Some system manufacturers will build a variety of application solutions for China's vast consumer market. Franklin believes the pan-IoT field will develop vigorously in 2019.
 

Of course, 3D vision systems still face many difficulties awaiting solutions one by one. Customers bring all kinds of ideas: outdoor use, high-speed moving objects, autonomous driving, and so on. Within the existing architecture, customers naturally want low cost, low power, high precision, and small volume all at once, but everyone knows how hard that combination is. Meeting such complex needs requires a deep understanding and analysis of each customer.
 
        As can be seen, 3D vision products are quite comprehensive undertakings, spanning electronics, algorithms, and optics, but also mechanical structures, user psychology, and laser-safety laws and regulations. Only systematic, end-to-end optimization can deliver a product the market will find reasonably satisfying.
 
 
The market forecast for 3D imaging & sensing from 2011 to 2023 shows a very clear rising curve; experts put the growth rate at about 44%. Franklin said the market should be viewed even more optimistically: "Because we are truly in the early stage of the explosion of 3D applications."