Computational language expert Nizar Habash discusses the prospects and challenges of building Arabic artificial intelligence systems, and why over-reliance on AI can prove dangerous.
Prior to joining New York University Abu Dhabi, Mr. Habash was a research fellow at Columbia University’s Center for Computational Learning Systems.
In a world increasingly dependent on artificial intelligence (AI) and vast amounts of digital data, Nizar Habash, a computer scientist specializing in natural language processing and computational linguistics, finds himself at a unique crossroads. The rise of advanced AI systems such as ChatGPT has the potential to completely change the world, but the majority of these platforms operate primarily in English. Other languages, such as Arabic, may face difficulties due to limited online data. Drawing on extensive research spanning machine translation, morphological analysis, and computational modeling of Arabic and its dialects, Habash explores the challenges of building Arabic AI systems, or more simply, as he jokes, “teaching robots Arabic,” and provides insight into the opportunities.
Habash, a computer science professor at New York University Abu Dhabi, points to the urgent need to develop more sophisticated machine learning systems that can handle the cultural nuances embedded in different languages. “Arabic is one of the most important languages in the world. It ranks among the most widely used languages, both in everyday life and for religious purposes. It is an important language that has been preserved from the past,” says Habash. “When we evaluated the resources available and the AI systems currently in use for Arabic today, we found that they do not match the level of complexity that the language has.”
Habash, who is originally from Palestine, said: “I am a native Arabic speaker and was aware of the complexity of the language from an early age, from the different dialects in different parts of the Arab world to the standards I had to adhere to throughout my education. We have been thinking a lot about how Arabic functions as a vehicle for our identity, knowledge and communication, especially in the age of AI, and we have encountered numerous examples of problems in this regard.”
Data challenges
Could the limited amount of online data available for training on Arabic affect the development and performance of AI systems? According to Habash, what is currently driving AI with great success is “simply, the more data the better.” “It’s not the biggest challenge, but for some it may be seen as the only challenge. The problem is that we’ve reached a point where naturally created data no longer exists, and when you start generating artificial data and training an AI system on that, it’s like creating a monster,” says Habash, who was previously a researcher at Columbia University’s Center for Computational Learning Systems.
Because AI uses feedback loops, “creative” mistakes in the input can feed back into the system, he explained: 100 times more data means mistakes are amplified 100 times. “If a mistake is repeated enough times, it can become the norm, and that norm can become the operating model,” Habash says. “The model has no concept of reality. It’s just trying to predict the next word, fill in the blanks, or use something called masking techniques to find the next part of the sentence. It’s good at holding and making mistakes.”
When discussing the limitations of online Arabic data collection, Habash highlights the dangers of algorithmic bias and the nuances inherent in Arabic script, such as the absence of diacritics in common usage. These complexities pose major challenges for AI systems that strive to accurately understand and process Arabic text. “Arabic in common use is generally written without diacritics to mark vowels. Only one or two percent of the Arabic words that appear in newspapers actually have vowel markers, but Arabic readers know how to understand them. As a result, however, a word is ambiguous and can have many meanings. Therefore, when we teach it to a machine, context becomes very important,” he added.
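The ambiguity Habash describes can be illustrated with a minimal Python sketch. It is not taken from his work: the function names and the three example words are chosen here purely for illustration, and the set of marks stripped (Unicode U+064B to U+0652, plus the superscript alef U+0670) is an assumption about what counts as a vowel marker. Three differently vocalized words collapse into the same three-letter string once the marks are removed, which is the form a newspaper reader, or a machine, would normally see.

```python
import re
from collections import defaultdict

# Assumption: treat U+064B..U+0652 (tanween, short vowels, shadda, sukun)
# and U+0670 (superscript alef) as the diacritics to strip.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(word: str) -> str:
    """Return the word as it is usually written, without vowel marks."""
    return DIACRITICS.sub("", word)


def group_by_bare_form(vocalized: dict) -> dict:
    """Map each undiacritized spelling to the glosses of all words that share it."""
    groups = defaultdict(list)
    for word, gloss in vocalized.items():
        groups[strip_diacritics(word)].append(gloss)
    return dict(groups)


# Illustrative examples only: three vocalizations of the letters kaf-ta-ba.
VOCALIZED = {
    "\u0643\u064E\u062A\u064E\u0628\u064E": "kataba: he wrote",
    "\u0643\u064F\u062A\u064F\u0628": "kutub: books",
    "\u0643\u064F\u062A\u0650\u0628\u064E": "kutiba: it was written",
}

BARE = "\u0643\u062A\u0628"  # the same three letters with no marks

if __name__ == "__main__":
    for bare, glosses in group_by_bare_form(VOCALIZED).items():
        print(bare, "->", glosses)
```

All three readings land under one key, so only the surrounding sentence can tell a system which one was meant.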
Arabic has many dialects, and alongside the dialects there are also historical variants. “Classical Arabic, the Arabic of the Quran, has a slightly different spelling from Modern Standard Arabic. This is also something that machines have to work with. You could conflate them and lump everything into one pile, which would confuse a lot of things,” Habash said. “There are different complexities. In my opinion, some of the interesting challenges that are yet to be explored may have to do with algorithmic bias.”
Cultural sensitivity and bias
What steps should be taken to ensure that Arabic AI systems are culturally sensitive and avoid bias in interactions? “There are different types of bias. One is content bias, and the other is grammatical form bias. Both are interrelated,” Habash said. “Content bias relates to the kinds of ideas about the world that a system is likely to generate in its generative models.” AI scientist Toby Walsh has previously said, “Language is political. There is always a bias embedded.” Habash responds: “To some extent, I agree with this. For example, in traditional journalistic reporting, we always see a ‘killed versus died’ paradigm, where Israelis always seem to be ‘killed’ and Palestinians always ‘die,’ as if nothing kills us. This type of bias can also occur in Arabic.”
Citing a recent ChatGPT example that did the rounds on social media, he added: “Similarly, ChatGPT was asked, ‘Do Palestinians deserve to be free?’ and ‘Do Israelis deserve to be free?’ For Israelis, the answer was something along the lines of, ‘Of course, Israelis are human beings, and all human beings deserve freedom,’ but for Palestinians, the answer was something like, ‘The question of Palestinian freedom is a complex one. There are many opinions.’ Prejudice exists everywhere. The AI repeats what it learns.”
However, even if algorithmic biases originate from human biases, the feedback loops in which machine learning systems operate can amplify them, which is a cause for concern. “The real challenge is figuring out how to properly model the machine and knowing which elements should be given higher weights and which elements should be given lower ones,” Habash says.
Potential solutions for combating bias in existing algorithms include adding more data or having researchers identify content that appears to deviate from a normal distribution, he added. “For example, if there are a lot of references to doctors being male and nurses being female, can you artificially reduce their weight in the model? You don’t have to change the data. You can change how the model learns from the data. If you find a pattern that looks odd, you can work on balancing it out,” Habash says. “This is a really exciting new space because we’re dealing with data and information and we can manipulate it in many different ways.”
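One way to picture “changing how the model learns” without touching the data is to give each training example a weight. The sketch below is an illustration only, not a method Habash describes: the function name `balancing_weights` and the toy doctor/nurse records are invented here, and it assumes each example has already been labeled with an occupation and a gender. Within each occupation, the weights are chosen so that every gender contributes the same total, however often it appears.

```python
from collections import Counter


def balancing_weights(examples):
    """Return one weight per (occupation, gender) example.

    The data is left unchanged. Within each occupation, every gender
    that appears ends up with the same total weight, and the weights
    for an occupation still sum to its number of examples.
    """
    pair_counts = Counter(examples)                      # (occupation, gender) -> count
    occupation_counts = Counter(occ for occ, _ in examples)
    genders_seen = Counter(occ for occ, _ in pair_counts)  # distinct genders per occupation

    weights = []
    for occ, gender in examples:
        # Equal share of the occupation's total mass for each gender seen with it.
        target_mass = occupation_counts[occ] / genders_seen[occ]
        weights.append(target_mass / pair_counts[(occ, gender)])
    return weights


# Toy, made-up data: a 9-to-1 skew in each direction.
EXAMPLES = (
    [("doctor", "male")] * 9 + [("doctor", "female")] * 1
    + [("nurse", "female")] * 9 + [("nurse", "male")] * 1
)

if __name__ == "__main__":
    weights = balancing_weights(EXAMPLES)
    totals = Counter()
    for pair, w in zip(EXAMPLES, weights):
        totals[pair] += w
    for pair, total in sorted(totals.items()):
        print(pair, round(total, 3))
```

In a real system these weights would typically scale each example's contribution to the training loss. Whether equal balance is the right target is itself a design decision, which is the kind of judgment Habash says researchers have to make.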
The role of language and AI experts
So how can computational language experts like Habash help overcome these challenges and make “better” design choices? “That’s a great question. As an industry, we focus more on the efficiency, effectiveness and design of our models, creating something simple and easy with a certain ‘Google elegance’,” says Habash. “Google has simplified everything with one simple search box, which is very appealing to people who are already overwhelmed. The amount of data on the web is staggering. People want a short answer.”
When it comes to design choices for AI models, Habash advocates simplicity without sacrificing substance and warns against “deceptive fluency.” “For example, if you talk to an English speaker who has good pronunciation, you will be able to understand them and what they are saying. Obviously, you think, they’re smart, and if they’re smart, they’re good, and if they’re good, they’re telling the truth.”
“But if a very smart person who actually knows a lot more has difficulty speaking in English, you might not think the same way, even if they give you a gem of wisdom. It’s the same with machines: fluency is taken to equal intelligence. This is not really logically valid. We’re not dealing with something we’ve never dealt with before; the only thing I can say is that the volume and accessibility are much higher,” he explains.
The danger of relinquishing human agency to AI is enormous. “When we rely too much on AI to make decisions for us and speak for us, we give up something about our humanity, our intelligence, our conscience, and potentially our responsibilities. But it won’t take us very far.” Habash strongly cautions against blind reliance on AI systems and emphasizes the irreplaceable role of human judgment, empathy, and ethical responsibility. “That’s why I think it’s so important to continue to educate people.”