Prior to joining New York University Abu Dhabi, Mr. Habash was a research fellow at Columbia University’s Center for Computational Learning Systems.
The rise of advanced artificial intelligence (AI) systems such as ChatGPT has the potential to completely change the world. However, most of these platforms operate primarily in English, with languages such as Arabic facing difficulties due to limited online data. In a world increasingly reliant on AI, Nizar Habash, a computer scientist specializing in natural language processing and computational linguistics, is at a unique crossroads. With extensive research spanning machine translation, morphological analysis, and computational modeling of Arabic and its dialects, Habash has focused on building Arabic AI systems, or more simply, “teaching robots Arabic,” and offers insight into the challenges and opportunities this presents.
“Arabic is one of the most important languages in the world. It ranks among the most widely used languages, both in everyday life and for religious purposes. It has transmitted knowledge over a long period of human history and is essentially an important language that has preserved it,” he said.
Habash, a computer science professor at New York University Abu Dhabi, points to the urgent need to develop more sophisticated machine learning systems that can handle the cultural nuances embedded in different languages. “When we evaluated the resources available and the AI systems currently in use in Arabic today, we found that they do not match the level of complexity that the language has.”
Habash, who is originally from Palestine, said: “I am a native Arabic speaker and was aware of the complexity of the language from an early age, from the different dialects from different parts of the Arab world to the standards I had to adhere to throughout my education. We have been thinking a lot about how Arabic functions as a vehicle for our identity, knowledge and communication, especially in the age of AI, and we have encountered numerous examples of problems in this regard.”
Data challenges
Could the limitations of online data available for learning Arabic affect the development and performance of AI systems? According to Habash, the current thinking behind AI’s great success is simply “the more, the better.” “It’s not the biggest challenge, but for some it may be seen as the only challenge. The problem with this idea is that eventually we will reach a point where no more data is being naturally created, and the moment you start generating artificial data to train the AI system itself, it becomes like creating a monster,” Habash says. Before he joined NYUAD, he served as a research fellow at Columbia University’s Center for Computational Learning Systems.
AI uses feedback loops, so its input may contain “creative” mistakes. If 100 times more data is generated, Habash explains, those mistakes are also amplified 100 times. “When a mistake is repeated over and over again, it becomes the standard, and that standard becomes the operating model. The model has no concept of reality. It simply predicts the next word or fills in the blanks, using something called masking techniques. It’s just trying to find the next part of the sentence. AI is good at making mistakes with confidence,” he added.
When discussing the limitations of online Arabic data collection, Habash also highlighted the risk of algorithmic bias and the difficulty of deciphering the grammatical nuances inherent in Arabic script, such as the lack of diacritics. Diacritics, also called “tashkeel” or “harakat” in Arabic, are small symbols placed above and below Arabic letters to indicate vowels, pronunciation, and grammatical structure.
These complexities pose significant challenges for AI systems seeking to accurately understand and process Arabic text. “Arabic in common use is generally written without diacritics to mark vowels. Only one or two percent of the Arabic words that appear in newspapers actually have vowel markers, but Arabic readers know how to understand it. It’s an understanding the reader has subconsciously, so we don’t have to think about it. However, as a result the words become ambiguous and can often have more than one meaning. Therefore, when we teach machines, context becomes very important,” says Habash.
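To make the ambiguity concrete, here is a minimal Python sketch, not taken from Habash’s work. It assumes the common short-vowel marks occupy the Unicode range U+064B to U+0652, and the helper name `strip_diacritics` and the three sample words are chosen purely for illustration. Three differently vocalized words collapse into one identical spelling once the marks are removed, which is the form a newspaper reader, or a machine, typically sees.

```python
import re

# Arabic short-vowel and related marks (fathatan through sukun).
# Assumption for this sketch: this range covers the diacritics of interest.
DIACRITICS = re.compile("[\u064B-\u0652]")

def strip_diacritics(word: str) -> str:
    """Return the word as it is usually printed, without vowel marks."""
    return DIACRITICS.sub("", word)

# Three vocalized forms built on the same letters k-t-b (illustrative examples).
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "he wrote"
kutub = "\u0643\u064F\u062A\u064F\u0628"         # "books"
kutiba = "\u0643\u064F\u062A\u0650\u0628\u064E"  # "it was written"

stripped = {strip_diacritics(w) for w in (kataba, kutub, kutiba)}

# All three collapse to the same three bare letters.
print(stripped)
```

A system that sees only the bare spelling has to rely on the surrounding words to choose among the readings, which is the role of context that Habash describes.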
Another important aspect to keep in mind is that there are many different dialects within Arabic, he added. “Where there are dialects, there are also historical variations. Classical Arabic, the Arabic of the Koran, is spelled slightly differently from Modern Standard Arabic. This is also something that may confuse machines. If you take the Koran, Modern Standard Arabic and Egyptian dialect and combine them all in one pile, you’ll confuse a lot of things.”
“There are different complexities. In my opinion, some of the interesting challenges that are yet to be exploited may have to do with algorithmic bias,” Habash says.
Cultural sensitivity and bias
When it comes to advanced AI systems such as ChatGPT, there are different types of biases to keep in mind. “One is content bias and the other is grammatical form bias, but both are interconnected,” Habash explains. “Content bias relates to the kinds of ideas about the world that a system is likely to generate in its generative models. As AI scientist Toby Walsh has previously said, ‘Language is political. It’s embedded.’ I also agree with this to some extent. For example, in traditional journalistic reporting, we always see paradigms like ‘Israelis die.’”
Citing a recent ChatGPT example that did the rounds on social media, he added, “Similarly, ChatGPT was asked, ‘Do Palestinians deserve to be free?’ and ‘Do Israelis deserve to be free?’ For Israelis, the answer was something along the lines of, ‘Of course Israelis are human beings, and all human beings deserve freedom,’ but for Palestinians, the answer was something like, ‘The question of Palestinian freedom is a complex one. There are many opinions.’ Prejudice exists everywhere. The AI repeats what it learns,” Habash says.
What steps can we take to ensure that Arabic AI systems are culturally sensitive and avoid bias in their interactions? “The real challenge is to model the machines properly. It’s about figuring out how to do that and knowing which elements to give higher or lower weight to,” says Habash.
“One solution is to add more data to the training system to get better results. But this comes with its own challenges. Another, more promising solution is for researchers to work on identifying content that appears to be moving away from what would normally be expected,” he added. “For example, if there are a lot of references to doctors being male and nurses being female, can you actually artificially reduce that weight in a model? You don’t have to change the data. You can change how we learn from the data. If you find a pattern that looks odd, you can work on finding balance.”
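One way to picture “changing how we learn from the data” without changing the data itself is per-example reweighting. The Python sketch below is an illustration only, not a method attributed to Habash: the function name `balancing_weights` and the toy counts are invented for the example. It assigns each (occupation, gender) pair a weight so that, within each occupation, every gender contributes the same total weight during training.

```python
from collections import Counter

def balancing_weights(examples):
    """Map each (occupation, gender) pair to a per-example weight such that
    every gender seen with an occupation carries equal total weight."""
    pair_counts = Counter(examples)
    occupation_counts = Counter(occupation for occupation, _ in examples)

    # Which genders were observed with each occupation.
    genders_seen = {}
    for occupation, gender in pair_counts:
        genders_seen.setdefault(occupation, set()).add(gender)

    return {
        (occupation, gender): occupation_counts[occupation]
        / (len(genders_seen[occupation]) * count)
        for (occupation, gender), count in pair_counts.items()
    }

# Invented toy data: a skewed set of mentions, left unchanged.
examples = (
    [("doctor", "male")] * 9
    + [("doctor", "female")] * 1
    + [("nurse", "female")] * 8
    + [("nurse", "male")] * 2
)

weights = balancing_weights(examples)
for pair, weight in sorted(weights.items()):
    print(pair, round(weight, 3))
```

With these toy counts, the single “female doctor” example is weighted up and the nine “male doctor” examples are weighted down, so each group contributes the same total weight. Whether such balancing is appropriate for a given system is a judgment call that depends on the task.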
“This is a really exciting new space because we are working with data and information and can manipulate it in many different ways,” Habash says.
The role of language and AI experts
How can computational language experts like Habash help overcome these challenges and make “better” design choices to ensure cultural sensitivity in AI systems?
“That’s a great question. As an industry, we focus more on efficiency, effectiveness, and design of our models, creating things that are simple and easy, like ‘Google Elegance.’ Google has simplified everything with one simple search box. This is very appealing to people who are already overwhelmed. The amount of data on the web is staggering. Everyone wants short answers,” Habash replies.
When it comes to design choices for AI models, the computer scientist advocates simplicity without sacrificing substance and warns against “deceptive fluency.” “For example, if you talk to an English speaker who has good pronunciation, there is an underlying assumption: this person sounds pleasing to the ear and you can understand them, so obviously they are smart; if they’re smart, they’re good; if they’re good, they’re telling the truth. But if it’s a very smart person who actually knows a lot more but has difficulty speaking in English, you might not think the same thing,” he added.
“It’s the same thing with machines. Fluency equals intelligence equals truth, but this is not logically valid. So we are not dealing with something we have never dealt with before. What is unique is that the amount and accessibility of advanced AI systems is much higher,” Habash says.
Therefore, the dangers of relinquishing human agency to AI are very high. “If we rely too much on AI to make decisions for us and speak for us, we will lose our humanity. The responsibility is not too much for us,” Habash said, emphasizing the irreplaceable role of human judgment, empathy, and ethical responsibility. “That’s why I think it’s so important to continue to educate people.”
somya@khaleejtimes.com