I've been beta-testing Tesla's Full Self-Driving (FSD) software for a few months now. It's been intriguing to watch how a computer figures out how to drive a car.
It's way more complicated than I had thought to get a computer to decide if a moving object is a potential threat, and the "release notes" accompanying this week's update give a lot more insight than previous updates about the complexity of problems they are trying to solve.
When new issues are identified during the software development, such as recognizing stop signs in all their myriad variations, developers can "query the fleet" to pull thousands of real-world recordings from their vehicles actually out on the roads to help train and improve the system. This one-minute video illustrates this well.
The car drives like a nervous teenager, faltering at crosswalks and left turns and making sudden jarring adjustments mid-turn, although this has improved with the latest update.
Even after several years of software updates, including a total rewrite from the ground up two years ago, there is still a lot of smoothing to be done. (The rewrite two years ago involved tracking nearby objects' movement across sequential images from multiple cameras to predict where they will go, rather than making decisions based on a single instantaneous still image. This is the so-called 4-D (fourth-dimension) approach, because it incorporates time into the prediction.)
The latest software update, which is generating more than the usual amount of online reaction, has been another big step forward. The release notes (below) are interesting to read because of the intensely abstruse jargon they use - so much so that it's like reading another language.
I've appended the software release notes below, and I hope you enjoy reading them as much as I did. [I've added my comments in square brackets after the paragraphs.]
Also, in the release notes they mention a "Chuck Cook" turn, named after a YouTuber who critiques new releases, which I had to look up:
A "Chuck Cook style unprotected left turn" is a left turn at a wide highway intersection with multiple lanes. To make it, the vehicle has to cross traffic going as fast as 60 mph.
https://www.torquenews.com/11826/elon-musk-praises-tesla-s-ai-team-after-solving-chuck-cook-style-unprotected-left-turn-fsd-beta-1069/amp
FSD Beta v10.69.2 Release Notes
- Added a new "deep lane guidance" module to the Vector Lanes neural network which fuses features extracted from the video streams with coarse map data, i.e. lane counts and lane connectivities. This architecture achieves a 44% lower error rate on lane topology compared to the previous model, enabling smoother control before lanes and their connectivities becomes visually apparent. This provides a way to make every Autopilot drive as good as someone driving their own commute, yet in a sufficiently general way that adapts for road changes.
[This is quite a break from tradition. Tesla has been proud of the fact that its vehicles see each intersection "for the first time, every time," rather than relying on a massive database of road layouts (as the Google competitor does), since this makes the Tesla much better able to adapt to unexpected changes such as construction. The new module has greatly improved the visualization graphics on the dashboard.]
- Improved overall driving smoothness, without sacrificing latency, through better modeling of system and actuation latency in trajectory planning. Trajectory planner now independently accounts for latency from steering commands to actual steering actuation, as well as acceleration and brake commands to actuation. This results in a trajectory that is a more accurate model of how the vehicle would drive. This allows better downstream controller tracking and smoothness while also allowing a more accurate response during harsh maneuvers.
[That is, the computer used to expect an immediate response to a steering correction, and got confused when the steering didn't respond immediately.]
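To make the latency idea concrete, here is a toy sketch in Python. It is my own illustration, not Tesla's code: it assumes a hypothetical fixed delay of a couple of time steps between issuing a steering (yaw-rate) command and the car actually responding, and the function name and numbers are invented.

```python
from collections import deque


def simulate_heading(commands, delay_steps, dt=0.1):
    """Integrate heading when each yaw-rate command takes effect delay_steps later.

    Hypothetical toy model: a pure transport delay, nothing Tesla-specific.
    """
    pending = deque([0.0] * delay_steps)  # commands still "in flight"
    heading = 0.0
    history = []
    for cmd in commands:
        pending.append(cmd)
        applied = pending.popleft()  # the command issued delay_steps ago
        heading += applied * dt
        history.append(heading)
    return history


commands = [1.0] * 5  # yaw-rate command in rad/s, held for five steps
ideal = simulate_heading(commands, delay_steps=0)    # planner assumes no latency
delayed = simulate_heading(commands, delay_steps=2)  # what the car actually does
```

A planner that assumes the "ideal" response expects the car to have turned further than it really has after five steps, so it keeps correcting. Planning against the delayed response is, as far as I can tell, the gist of the change.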
- Improved unprotected left turns with more appropriate speed profile when approaching and exiting median crossover regions, in the presence of high speed cross traffic ("Chuck Cook style" unprotected left turns). This was done by allowing optimisable initial jerk, to mimic the harsh pedal press by a human, when required to go in front of high speed objects. Also improved lateral profile approaching such safety regions to allow for better pose that aligns well for exiting the region. Finally, improved interaction with objects that are entering or waiting inside the median crossover region with better modeling of their future intent.
[Previously, it felt very unsafe crossing a fast multi-lane highway slowly, like a sitting duck, especially if the car hesitated because another car was already in the median turning lane.]
- Added control for arbitrary low-speed moving volumes from Occupancy Network. This also enables finer control for more precise object shapes that cannot be easily represented by a cuboid primitive. This required predicting velocity at every 3D voxel. We may now control for slow-moving UFOs.
- Upgraded Occupancy Network to use video instead of images from single time step. This temporal context allows the network to be robust to temporary occlusions and enables prediction of occupancy flow. Also, improved ground truth with semantics-driven outlier rejection, hard example mining, and increasing the dataset size by 2.4x.
[Previously, when a vehicle passed behind another vehicle or object, very strange jittery distortions of the vehicle flickered on the screen, showing how much difficulty the computer had in working out where the vehicle went while it was temporarily hidden from view. This update has improved the visualization on the dashboard a lot.]
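Here is a very rough sketch of why remembering previous frames helps with hidden objects. This is not how Tesla's neural network works; it is a hypothetical one-dimensional tracker that simply assumes a hidden object keeps moving at its last known speed. All names and numbers are made up for illustration.

```python
def track_positions(observations, dt=1.0):
    """Estimate positions over time; None marks frames where the object is hidden.

    Toy assumption: while hidden, the object keeps its last estimated velocity.
    """
    estimates = []
    position = None
    velocity = 0.0
    for obs in observations:
        if obs is not None:
            if position is not None:
                velocity = (obs - position) / dt  # update speed from motion
            position = obs
        elif position is not None:
            position = position + velocity * dt  # coast through the occlusion
        estimates.append(position)
    return estimates


# Object seen at 0 m and 1 m, hidden for two frames, then seen again at 4 m.
estimates = track_positions([0.0, 1.0, None, None, 4.0])
```

A system that looks only at the current still image has nothing to say during the two hidden frames, whereas one with temporal context can keep a steady estimate instead of flickering.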
- Upgraded to a new two-stage architecture to produce object kinematics (e.g. velocity, acceleration, yaw rate) where network compute is allocated O(objects) instead of O(space). This improved velocity estimates for far away crossing vehicles by 20%, while using one tenth of the compute.
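[O(objects) and O(space) are big-O notation, which describes how the amount of computation grows. I think this means the computing effort now scales with the number of objects detected rather than with the amount of space being scanned, which would explain how it gets by on one tenth of the compute.]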
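A toy illustration of the difference between work that scales with space and work that scales with objects, taking the O(...) in the note as big-O notation. The grid size, object count, and per-item costs below are invented; only the scaling behaviour is the point.

```python
def dense_cost(cells_x, cells_y, cells_z, ops_per_cell=1):
    """Work when every cell of a 3D grid is processed, occupied or not."""
    return cells_x * cells_y * cells_z * ops_per_cell


def per_object_cost(num_objects, ops_per_object=1000):
    """Work when only detected objects are processed."""
    return num_objects * ops_per_object


# Hypothetical scene: a 200 x 200 x 16 grid containing 30 objects.
space_work = dense_cost(200, 200, 16)
object_work = per_object_cost(30)
```

Even with each object costing a thousand times more than a grid cell, the per-object approach does far less total work here, because most of the space around a car is empty.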
- Increased smoothness for protected right turns by improving the association of traffic lights with slip lanes vs yield signs with slip lanes. This reduces false slowdowns when there are no relevant objects present and also improves yielding position when they are present.
- Reduced false slowdowns near crosswalks. This was done with improved understanding of pedestrian and bicyclist intent based on their motion.
[Unfortunately, there was a bug found at the last minute that has resulted in the current software release being overly cautious around crosswalks to an unnerving extent. I'm sure they'll release an update patch for this soon because it wasn't a problem in previous versions.]
- Improved geometry error of ego-relevant lanes by 34% and crossing lanes by 21% with a full Vector Lanes neural network update. Information bottlenecks in the network architecture were eliminated by increasing the size of the per-camera feature extractors, video modules, internals of the autoregressive decoder, and by adding a hard attention mechanism which greatly improved the fine position of lanes.
[I think this means that more processing power was focused on the lane directly ahead of the vehicle rather than on adjacent lanes, but I'm not sure because of all the jargon.]
- Made speed profile more comfortable when creeping for visibility, to allow for smoother stops when protecting for potentially occluded objects.
- Improved recall of animals by 34% by doubling the size of the auto-labeled training set.
- Enabled creeping for visibility at any intersection where objects might cross ego's path, regardless of presence of traffic controls.
[At a right turn on a red light, one still has to creep forward across the crosswalk to be able to see if any cars are coming. The way in which the Tesla creeps forward feels surprisingly natural.]
- Improved accuracy of stopping position in critical scenarios with crossing objects, by allowing dynamic resolution in trajectory optimization to focus more on areas where finer control is essential.
[I think this means that the computer is focusing more attention on whether a vehicle is oblivious or actually responding to an impending collision.]
- Increased recall of forking lanes by 36% by having topological tokens participate in the attention operations of the autoregressive decoder and by increasing the loss applied to fork tokens during training.
- Improved velocity error for pedestrians and bicyclists by 17%, especially when ego is making a turn, by improving the onboard trajectory estimation used as input to the neural network.
- Improved recall of object detection, eliminating 26% of missing detections for far away crossing vehicles by tuning the loss function used during training and improving label quality.
- Improved object future path prediction in scenarios with high yaw rate by incorporating yaw rate and lateral motion into the likelihood estimation. This helps with objects turning into or away from ego's lane, especially in intersections or cut-in scenarios.
[In other words, distinguishing whether the car turning onto the road ahead of you is coming across 3 lanes or staying in the curb lane. Or knowing whether it's safe to accelerate because the slow truck ahead of you is actually taking the exit lane or is continuing to block your lane. Previous versions have slowed way too much as the vehicle ahead takes an exit lane, being unsure that they have totally left the lane you're in.]
- Improved speed when entering highway by better handling of upcoming map speed changes, which increases the confidence of merging onto the highway.
- Reduced latency when starting from a stop by accounting for lead vehicle jerk.
[When the light turns green, people will start moving, stop, then go again. In previous versions, this would make the Tesla seem asleep at the wheel at a green light, waiting for traffic ahead to move.]
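"Jerk" here is the physics term: the rate of change of acceleration. As a rough sketch (my own, with made-up speed samples), it can be estimated by differencing the lead car's speed twice:

```python
def finite_difference(samples, dt):
    """Approximate the rate of change between consecutive samples."""
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]


dt = 0.5                            # seconds between samples (assumed)
speeds = [0.0, 0.0, 0.1, 0.4, 0.9]  # lead car speed in m/s (invented data)

accel = finite_difference(speeds, dt)  # m/s^2
jerk = finite_difference(accel, dt)    # m/s^3
```

In this example the jerk is already clearly positive while the lead car's speed is still only a fraction of a metre per second, so a planner watching jerk can start rolling earlier than one waiting for the speed itself to rise.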
- Enabled faster identification of red light runners by evaluating their current kinematic state against their expected braking profile.
[I think the human brain is very good at deciphering whether a crossing vehicle is going to stop abruptly or run a red light by recognizing a pattern of acceleration vs braking as they approach the stop line, but getting a computer to predict this is really difficult.]
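One simple way to picture "kinematic state against expected braking profile" is basic physics: to stop within a distance d from speed v, a car needs a constant deceleration of at least v^2 / (2d). The sketch below is only my guess at the flavour of the check, not Tesla's method; the function names and the 3 m/s^2 "comfortable braking" threshold are assumptions.

```python
def required_deceleration(speed_mps, distance_m):
    """Constant deceleration (m/s^2) needed to stop within distance_m."""
    if distance_m <= 0:
        return float("inf") if speed_mps > 0 else 0.0
    return speed_mps ** 2 / (2 * distance_m)


def looks_like_red_light_runner(speed_mps, current_decel_mps2, distance_m,
                                comfortable_decel_mps2=3.0):
    """Flag a vehicle that needs hard braking to stop but is not braking hard.

    comfortable_decel_mps2 is an assumed threshold, not a published value.
    """
    needed = required_deceleration(speed_mps, distance_m)
    return needed > comfortable_decel_mps2 and current_decel_mps2 < needed


# 20 m/s (about 45 mph), 30 m from the stop line, barely braking.
suspicious = looks_like_red_light_runner(20.0, 0.5, 30.0)
# Same speed but 100 m away: plenty of room to stop normally.
relaxed = looks_like_red_light_runner(20.0, 0.5, 100.0)
```

The first vehicle would need to brake at well over twice the comfortable rate and is not doing so, which is the kind of mismatch a human driver picks up on instinctively.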