Facebook has been using AI to audibly describe the contents of photos for the visually impaired for some time, but it’s improving those efforts this year. The latest version of the model promises to provide much more information than before.
In an apparent benefit of owning both Facebook and Instagram, Facebook is moving its AI away from a heavily supervised learning model. The company says the new model was instead trained on weakly supervised data: billions of public Instagram images and their accompanying hashtags.
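To make the idea of weak supervision concrete, here is a minimal illustrative sketch of how hashtags can stand in for human annotations as noisy multi-label targets. The concept vocabulary and hashtags below are hypothetical examples, not Facebook's actual taxonomy or pipeline.

```python
# Weak supervision sketch: public hashtags act as noisy multi-label targets,
# replacing a hand-annotated training set. Vocabulary here is hypothetical.

CONCEPTS = ["wedding", "dog", "beach", "food"]
CONCEPT_INDEX = {c: i for i, c in enumerate(CONCEPTS)}

def hashtags_to_target(hashtags):
    """Map an image's hashtags onto a binary multi-label target vector."""
    target = [0] * len(CONCEPTS)
    for tag in hashtags:
        idx = CONCEPT_INDEX.get(tag.lstrip("#").lower())
        if idx is not None:
            target[idx] = 1  # the hashtag is treated as a weak, possibly noisy label
    return target

# A post tagged "#Wedding #beach #tbt" yields targets for two known concepts;
# tags outside the vocabulary, like "#tbt", are simply ignored.
print(hashtags_to_target(["#Wedding", "#beach", "#tbt"]))  # [1, 0, 1, 0]
```

The labels are "weak" because people tag photos for many reasons, so a training pipeline built this way must tolerate mislabeled examples; the scale of billions of images is what makes the noise tolerable.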
Facebook explains: “To make our models work better for everyone, we fine-tuned them so that data was sampled from images across all geographies, and using translations of hashtags in many languages. We also evaluated our concepts along gender, skin tone, and age axes. The resulting models are both more accurate and culturally and demographically inclusive — for instance, they can identify weddings around the world based (in part) on traditional apparel instead of labeling only photos featuring white wedding dresses.”
Facebook says the new model can reliably recognize more than 1,200 concepts, over 10 times as many as the original version launched in 2016. While improved, Facebook stresses that the model is not perfect and that it values accuracy and clarity over breadth of understanding: it will still use hedged language such as “may be” and will lean toward excluding objects or concepts it cannot confidently recognize.
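That behavior, dropping low-confidence concepts rather than guessing, and hedging what remains, can be sketched in a few lines. The threshold value, prediction format, and wording below are illustrative assumptions, not Facebook's actual implementation.

```python
# Sketch of confidence-gated captioning: concepts scored below a threshold
# are excluded rather than guessed, and the rest are hedged with "may be".
# The threshold and phrasing are hypothetical choices for illustration.

CONFIDENCE_THRESHOLD = 0.8

def describe(predictions):
    """predictions: list of (concept, confidence) pairs from a recognizer."""
    kept = [concept for concept, conf in predictions if conf >= CONFIDENCE_THRESHOLD]
    if not kept:
        return "Photo"  # say nothing specific rather than something wrong
    return "May be an image of " + ", ".join(kept)

# "cake" at 0.55 falls below the threshold and is excluded from the description.
print(describe([("wedding", 0.95), ("cake", 0.55), ("two people", 0.91)]))
# May be an image of wedding, two people
```

The design trade-off is exactly the one Facebook describes: a higher threshold means fewer concepts mentioned, but each mention is more likely to be correct, which matters when the listener cannot verify the description against the image.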
This continued research and deployment is part of Facebook’s goal of making its platforms work “for everyone,” including people who cannot otherwise enjoy photographs. This commitment to accessibility is impressive and may considerably expand the number of people who can enjoy Facebook’s services. The model will be available in 45 languages so that it can serve more people around the world.