Stanford University: Artificial Intelligence Index Report 20

Human-Centered Artificial Intelligence

Key Findings & Analysis

Introduction to the AI Index Report 2023

The AI Index Report 2023 is the sixth edition of its kind and contains more original data than any previous version.

The report's mission is to provide unbiased, rigorously vetted, broadly sourced data to help develop a more thorough and nuanced understanding of the complex field of AI.

The report includes a new chapter on AI public opinion, an expanded technical performance chapter, original analysis of large language and multimodal models, trends in global AI legislation records, and a study of the environmental impact of AI systems.

The report is designed to be the world's most credible and authoritative source for data and insights about AI.

The report notes that AI has moved into its era of deployment, with new large-scale AI models being released every month and demonstrating increased capabilities in text manipulation and analysis, image generation, and speech recognition.

However, these AI systems are prone to hallucination, routinely biased, and can be tricked into serving nefarious aims, highlighting the complicated ethical challenges associated with their deployment.

The AI Index Report 2023: What it Reveals About AI's Present and Future

Policymakers, industry leaders, researchers, and the public all have an interest in AI despite a decrease in private AI investment in 2022.

Publications and collaborations related to AI continue to increase, indicating a growing trend in the field.

Industry actors have surpassed academia in producing machine learning models, which now require more data, computing power, and money to build.

AI systems can cause serious environmental impacts, but they can also optimize energy usage and accelerate scientific progress.

The number of incidents related to ethical misuse of AI has risen significantly since 2012, evidencing the technology's potential for misuse.

The demand for AI-related professional skills is increasing across multiple industrial sectors in America.

AI Index Report 2023 Highlights

Across sectors in the US, the share of job postings that are AI-related grew from 1.7% in 2021 to 1.9% in 2022.
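
The change quoted above can be read in two ways, and a short sketch may help keep them apart: the absolute change in percentage points versus the relative change in the share itself. The function name `share_change` is hypothetical and used only for illustration; the two input values are the ones quoted above.

```python
def share_change(old_share_pct: float, new_share_pct: float) -> tuple[float, float]:
    """Return (change in percentage points, relative change in percent)."""
    point_change = new_share_pct - old_share_pct
    relative_change = (new_share_pct / old_share_pct - 1.0) * 100.0
    return point_change, relative_change


# Shares of U.S. job postings that were AI-related, as quoted above.
points, relative = share_change(1.7, 1.9)
print(f"{points:.1f} percentage points, {relative:.1f}% relative increase")
```

Under these inputs the share rises by 0.2 percentage points, which corresponds to a relative increase of roughly 12%.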

In 2022, global private investment in AI decreased by 26.7% from the previous year.

McKinsey's survey shows that over 50% of companies have adopted AI, leading to cost savings and revenue generation.

The number of bills and parliamentary records mentioning AI in legislative proceedings has increased globally.

Chinese and Saudi Arabian respondents had the most positive view of AI products, while only 35% of sampled Americans agreed that products and services using AI have more benefits than drawbacks.

The report was led and contributed to by professionals from various reputable institutions.

The AI Index 2023 Annual Report by Stanford University

The AI Index 2023 Annual Report by Stanford University is licensed under Attribution-NoDerivatives 4.0 International.

Raw data and high-resolution images of all the charts in the report are available on Google Drive.

The AI Index is an independent initiative at the Stanford Institute for Human-Centered Artificial Intelligence (HAI).

The Global AI Vibrancy tool can compare up to 30 countries across 21 indicators and will be updated in the latter half of 2023.

The AI Index was conceived within the One Hundred Year Study on AI (AI100).

The report has supporting partners such as McKinsey & Company, OpenAI, the National Science Foundation (NSF), and the Computing Research Association (CRA), among others.

The contributors of data, analysis, advice, and expert commentary included in the AI Index 2023 Report are acknowledged in the report.

AI Index Report 2023 Notes

The AI Index Report 2023 thanks organizations and individuals for providing data for inclusion, including the Center for Security and Emerging Technology, the Computing Research Association, and Women in Machine Learning, among others.

The report is divided into chapters focusing on research and development, technical performance, AI ethics, the economy, education, policy and governance, diversity, and public opinion.

The number of AI research collaborations between the United States and China increased roughly 4 times since 2010, although the total number of U.S.-China collaborations only increased by 2.1% from 2020 to 2021, the smallest year-over-year growth rate since 2010.

The total number of AI publications has more than doubled since 2010, with China leading in total AI journal, conference, and repository publications.

The United States still leads in terms of AI conference and repository citations, but those leads are slowly eroding.

Specific AI topics that continue to dominate research include pattern recognition, machine learning, and computer vision.

The Current State and Trends of AI Systems

American institutions produce the majority of large language and multimodal models.

Industry has replaced academia in producing significant machine learning models.

Building AI systems requires large amounts of data, computing power, and money, resources that industry actors have in greater supply than nonprofits and academia.

Large language models are getting more massive and expensive.

AI continues to post state-of-the-art results, but year-over-year improvement on many benchmarks is increasingly marginal.

AI systems are becoming more flexible and capable of navigating multiple tasks.

Language models still struggle with complex reasoning tasks.

AI systems can have environmental impacts, but newer reinforcement learning models can optimize energy usage.

AI models have significantly contributed to scientific progress.

Self-improving AI learning will accelerate AI progress.

AI Index Report 2023: Chapters 3 and 4

Large models trained on proprietary data can be toxic and biased, but mitigation has been somewhat successful after training such models with instruction-tuning.

Generative models pose ethical problems, such as bias in text-to-image generators and chatbots like ChatGPT that can be tricked into serving nefarious aims.

The number of incidents of ethical misuse of AI has increased 26 times since 2012, with notable incidents in 2022 including a deepfake video of Ukrainian President Volodymyr Zelenskyy and call-monitoring technology in US prisons.

Fairness and bias don't always move together, as language models that perform better on fairness benchmarks tend to have worse gender bias. Interest in AI ethics continues to increase.

The demand for AI-related skills is increasing across American industrial sectors, with the exception of agriculture, forestry, fishing, and hunting.

Private investment in AI decreased 26.7% from 2021 to 2022, but overall investment has significantly increased over the last decade. The United States leads in investment.

AI Investments and Adoption Trends in 2022

In 2022, the U.S. led the world in total AI private investment with $47.4 billion, followed by China with $13.4 billion.

The U.S. also saw the highest number of newly funded AI companies, followed by the EU and UK combined, and then China.

Medical and healthcare had the highest AI investment, followed by data management and processing, and then fintech.

The largest private AI investment events were for GAC Aion, Anduril Industries, and Celonis.

McKinsey's research shows that the proportion of companies adopting AI has plateaued, but those that have adopted it are reaping cost and revenue benefits.

The AI capabilities most commonly embedded in businesses include robotic process automation (RPA), computer vision, natural-language text understanding, and virtual agents.

The most commonly adopted AI use cases in 2022 were service operations optimization, creation of new AI-based products, and customer segmentation.

Copilot, a text-to-code AI system, is helping workers be more productive, focused, and efficient.

China leads the world in industrial robot installations, accounting for more than the rest of the world combined in 2021.

The proportion of new U.S. computer science PhD graduates specializing in AI has been rising since 2010, with most heading to industry.

Trends in AI and computer science education

In 2011, AI PhD graduates were almost equally likely to take jobs in industry or academia, but since then, there has been a shift towards industry, with 65.4% of AI PhDs taking jobs in that sector in 2021.

New North American CS, CE, and information faculty hires stayed flat, and the total number of new tenure-track hires decreased in the last decade.

Private U.S. CS departments receive millions more in additional funding than public universities, with the median expenditure for private universities at $9.7 million compared to $5.7 million for public universities in 2021.

Interest in K-12 AI and computer science education is growing worldwide, with a total of 181,040 AP computer science exams taken by American students in 2021, and 11 countries implementing a K-12 AI curriculum.

Policymaker interest in AI is increasing, with the number of bills containing "artificial intelligence" that were passed into law growing from just 1 in 2016 to 37 in 2022.

The U.S. government is increasing spending on AI, and legislators in various countries discuss AI from different perspectives, such as risks of AI-led automation, safeguarding human rights, and using AI for weather forecasting.

AI Index Report 2023 Notes

U.S. government AI-related contract spending has increased roughly 2.5 times since 2017.

In 2022, there were 110 AI-related legal cases in United States state and federal courts, roughly seven times more than in 2016.

North American bachelor's, master's, and PhD-level computer science students are becoming more ethnically diverse.

In 2021, 78.7% of new AI PhDs were male, with only a 3.2 percentage point increase in female representation since 2011.

Women make up an increasingly greater share of CS, CE, and information faculty hires, but most faculty members are still male.

American K-12 computer science education has become more diverse, in terms of both gender and ethnicity.

Chinese citizens feel the most positively about AI products and services, with Americans being among the lowest in surveyed countries.

Men tend to feel more positively about AI products and services than women, according to both the 2022 Ipsos survey and the 2021 Gallup and Lloyd's Register Foundation survey.

The Global Opinion on Self-Driving Cars and NLP Researchers' Views on AI

Only 27% of the respondents from a global survey feel safe in self-driving cars, while only 26% of Americans think driverless cars are beneficial for society.

Excitement about AI stems from its potential to make life and society better, save time, and make things more efficient, while concerns include the loss of human jobs; surveillance, hacking, and digital privacy; and the lack of human connection.

NLP researchers revealed their opinions on AI, with 77% believing that private AI firms have too much influence, 41% thinking that NLP should be regulated, and 73% feeling that AI could soon lead to revolutionary societal change.

Research and development in AI is highlighted in the Artificial Intelligence Index Report 2023, which gives an overview of total AI publications, collaborations, and publishing institutions.

Trends in Artificial Intelligence Research and Development

The chapter captures trends in AI R&D, including publications, significant machine learning systems, conference attendance, and open-source research.

The United States and China dominate AI R&D collaborations, but research efforts are becoming more geographically dispersed.

The total number of AI publications has more than doubled since 2010, and topics like pattern recognition, machine learning, and computer vision continue to dominate research.

Industry surpasses academia for producing significant machine learning models, with 32 industry-produced models compared to three from academia in 2022.

Large language and multimodal models are becoming larger and more expensive, with the flagship PaLM model having 540 billion parameters and costing an estimated $8 million to train.

AI and ML Publications Overview

The Center for Security and Emerging Technology (CSET) at Georgetown University compiled data from scholarly literature sources, including Dimensions, Web of Science, Microsoft Academic Graph, arXiv, and Papers With Code, to identify English-language publications on AI and ML since 2010. CSET also used Chinese AI keywords to identify Chinese-language AI papers for this year's report.

The report examines AI publication trends from 2010 to 2021 in terms of type, affiliation, cross-country collaboration, and cross-industry collaboration. Like the previous edition of the AI Index, this year's report examines trends only through 2021, so that the most recent year's publications are captured more fully.

The total number of AI publications more than doubled from 2010 to 2021, growing from 200,000 to almost 500,000. Journal articles made up 60% of all published AI documents in 2021, followed by conference papers (17%), and repository submissions (13%). Book chapters, theses, and unknown document types made up the remaining 10% of publications.

Analysis of the Growth of AI Publications in Different Fields and Sectors, and Cross-Country Collaborations

Figure 1.1.2 shows that pattern recognition and machine learning have experienced significant growth in AI publications since 2015, with the number of pattern recognition papers doubling and machine learning papers quadrupling.

In addition to pattern recognition and machine learning, computer vision, algorithm, and data mining were the most-published AI fields in 2021.

Across the sectors tracked (education, government, industry, nonprofit, and other), the education sector dominates AI publications globally (Figure 1.1.4). The level of industry participation is highest in the United States and the European Union.

The share of education AI publications has been declining in each region since 2010.

Cross-border collaborations among academics, researchers, industry experts, and others are a critical component of modern STEM (science, technology, engineering, and mathematics) research.

Overview of Cross-Country and Cross-Sector Collaborations in Artificial Intelligence Research

Figures 1.1.6 and 1.1.7 in the AI Index Report show the top cross-country collaborations in artificial intelligence (AI) research between 2010 and 2021, with the United States and China having the greatest number of collaborations.

The increase in AI research outside academia has resulted in growth in cross-sector collaborations.

Figure 1.1.8 shows cross-sector collaborations in AI research from 2010 to 2021. Collaborations between educational institutions and nonprofits were the most numerous (32,551), followed by those between industry and educational institutions, then those between educational and government institutions.

Collaborations between educational institutions and industry have been among the fastest-growing collaborations in AI research, with a 4.2 times increase since 2010.

AI Journal Publications Analysis

The number of AI journal publications has increased 2.3 times since 2015, with a 14.8% increase from 2020 to 2021.

East Asia and the Pacific lead in AI journal publications by region with 47.1%, followed by Europe and Central Asia (17.2%) and then North America (11.6%). The share of publications from these regions has been declining since 2019, with an increase in publications from regions such as South Asia and the Middle East and North Africa.

China remains the leader in AI journal publications by geographic area with 39.8% in 2021, followed by the European Union and the United Kingdom (15.1%) and the United States (10.0%). The share of Indian publications has been steadily increasing from 1.3% in 2010 to 5.6% in 2021.

Trends in AI Research Publications by Geographic Area

According to the 2023 AI Index Report, China's share of citations in AI journal publications has been increasing since 2010, while those of the European Union and the United States have been decreasing.

China, the European Union and the United Kingdom, and the United States had the highest total citations in the world, accounting for 65.7% of the total.

East Asia and the Pacific, Europe and Central Asia, and North America accounted for the highest numbers of AI conference publications.

South Asia witnessed a remarkable rise in AI conference publications, increasing from 3.6% in 2010 to 8.5% in 2021.

In 2021, China produced the highest share of the world's AI conference publications at 26.2%.

AI Research and Development Worldwide

China overtook the European Union and the United Kingdom in AI conference publications in 2017. In 2021, the European Union plus the United Kingdom followed China at 20.3% of the world total, and the United States came in third at 17.2%.

India's share of AI conference publications is also increasing.

Although China produced the most AI conference publications in 2021, the United States had the greatest share of AI conference citations (23.9%). The gap between American and Chinese AI conference citations, however, is narrowing.

Publishing pre-peer-reviewed papers on repositories of electronic preprints has become a popular way for AI researchers to disseminate their work outside traditional avenues for publication. The number of AI repository publications grew almost 27 times over the past 12 years.

North America has maintained a steady lead in the world share of AI repository publications since 2016, while the share of repository publications from Europe and Central Asia has declined since 2011.

AI Repository Publications in Different Regions

The share of AI repository publications from East Asia and the Pacific has grown significantly since 2010, and it continued increasing from 2020 to 2021.

During this period, the year-over-year share of North American, as well as European and Central Asian repository publications, declined.

In 2021, the United States had 23.5% of the world's AI repository publications, followed by the European Union plus the United Kingdom (20.5%), and China (11.9%).

In citations of AI repository publications, the United States topped the list in 2021 with 29.2% overall citations, followed by the European Union plus the United Kingdom (21.5%), and then China (21.0%).

Since 2010, the Chinese Academy of Sciences has produced the largest number of total AI papers, followed by Tsinghua University, the University of the Chinese Academy of Sciences, Shanghai Jiao Tong University, and Zhejiang University.

Top publishing institutions in AI research

Figure 1.1.22 displays the total number of publications released by top institutions in 2021.

Chinese research institutions have a higher publication count as they are large centralized organizations with thousands of researchers.

The top 10 AI publishing institutions in 2021 for computer vision are all Chinese, led by the Chinese Academy of Sciences with 562 publications (Figure 1.1.23).

In the field of natural language processing, American institutions like Carnegie Mellon University are represented to a greater extent. However, the Chinese Academy of Sciences remains the world's leading institution with 182 publications (Figure 1.1.24).

The data is sourced from the Center for Security and Emerging Technology and is found in the AI Index Report.

Amazon and Microsoft were among the institutions with the highest number of AI publications in 2021.

The Chinese Academy of Sciences had the greatest number of speech recognition papers in 2021.

Epoch AI curates a database of significant AI and machine learning systems released since the 1950s.

The latter half of the chapter reports trends in large language and multimodal models.

Language was the most common class of system among the significant AI machine learning systems released in 2022.

Analysis of Significant Machine Learning Systems in 2022

There were 23 significant AI language systems released in 2022, roughly six times the number of multimodal systems.

According to Epoch, there were 38 total significant AI machine learning systems released in 2022; however, BaGuaLu, one of the systems, was omitted from Figure 1.2.1 due to the lack of a domain classification.

Industry has produced the greatest number of significant machine learning systems since 2014, as opposed to academia which dominated until then. In 2022, industry produced 32 systems, while academia only produced three.

The United States produced the greatest number of significant machine learning systems with 16, followed by the United Kingdom (8) and China (3) in 2022.

Figure 1.2.4 depicts the United States outpacing the United Kingdom, the European Union, and China since 2002 in terms of the number of significant machine learning systems produced.

The total number of significant machine learning systems produced by country for the entire world since 2002 is shown in Figure 1.2.5.

Number of Significant Machine Learning Systems by Geographic Area and Authorship

The AI Index Report for 2023 displays a chart outlining the number of significant machine learning systems by select geographic area from 2002-2022.

The chart shows that in 2002, there were about 7 significant machine learning systems in the United States, while in 2022, the number rose to 16 in the US.

The methodology for author affiliation to countries is explained, with a note that systems might experience double-counting if authors from various nationalities worked on the same project.
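
A minimal sketch of how this kind of double-counting can arise, assuming a system is credited once to every country with which at least one of its authors is affiliated. The system names and country sets below are hypothetical and are not taken from the report's data.

```python
from collections import Counter

# Hypothetical systems mapped to the countries of their authors' affiliations.
systems = {
    "system_a": {"United States"},
    "system_b": {"United States", "United Kingdom"},  # cross-country team
    "system_c": {"China"},
}


def count_by_country(systems_to_countries):
    """Credit each system once to every country represented among its authors."""
    counts = Counter()
    for countries in systems_to_countries.values():
        for country in countries:
            counts[country] += 1
    return counts


counts = count_by_country(systems)
total_attributions = sum(counts.values())
print(counts)
print(f"{len(systems)} systems, {total_attributions} country attributions")
```

Because `system_b` is credited to two countries, the per-country counts sum to more than the number of systems, so country totals should not be added together to recover a world total.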

An additional chart identifies the top countries contributing authors to significant machine learning systems. In 2022, the US had the highest number of authors at 285, followed by the European Union and the UK with 155, while China had only 49 authors.

The report explains the significance of parameters in determining the performance of a machine learning system.

Trends in Machine Learning Systems Parameters and Compute

Figure 1.2.9 displays the number of parameters of significant machine learning systems by sector, revealing a steady increase in parameters over time, with a sharp increase since the early 2010s, reflecting increased task complexity and data availability.

Figure 1.2.10 shows the rise of parameter-rich systems by domain, with a significant increase in language models since 2010.

The amount of compute used by machine learning systems has increased exponentially in the last half-decade, with more compute-intensive models having greater environmental impacts and industry players having greater access to computational resources.

Language models have been demanding the most computational resources in recent years, especially since 2010.

Large language and multimodal models in AI

Large language and multimodal models are an emerging type of AI model that can be trained on huge amounts of data and are adaptable to different applications.

Notable models include ChatGPT, DALL-E 2, and Make-A-Video, which demonstrate impressive capabilities.

Research shows that the majority of authors responsible for releasing new large language and multimodal models are from American institutions.

However, in 2022, researchers from Canada, Germany, and India contributed to this development for the first time.

Released large language and multimodal models since GPT-2 are listed, with corresponding national affiliations of the researchers responsible for producing them.

BLOOM was listed as indeterminate because it was the result of a collaboration of international researchers, while GLM-130B was the only Chinese model released.

Selected AI models for this report were hand-selected by the AI Index steering committee.

Trends in Significant Machine Learning Systems According to the AI Index Report 2023

The parameter count of newly released large language and multimodal models has massively increased over time, as stated by the report.

GPT-2, the first large language and multimodal model released in 2019, only had 1.5 billion parameters.

In 2022, Google's PaLM had 540 billion parameters, nearly 360 times more than GPT-2.

The median number of parameters in large language and multimodal models is increasing exponentially over time.

The training compute of large language and multimodal models has also steadily increased. For example, the training compute for Minerva (540B) was roughly nine times greater than that used for OpenAI's GPT-3.

The compute used for Minerva was roughly 1839 times greater than that used for GPT-2, released in February 2019.
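
The ratios in this section can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the last value, GPT-3's training compute relative to GPT-2's, is derived here from two rounded ratios and is not a figure stated in the text, so it should be read as approximate.

```python
# Parameter counts quoted in the text.
GPT2_PARAMS = 1.5e9
PALM_PARAMS = 540e9
param_ratio = PALM_PARAMS / GPT2_PARAMS

# Training-compute ratios quoted in the text (both are rounded figures).
MINERVA_OVER_GPT3 = 9.0
MINERVA_OVER_GPT2 = 1839.0

# Derived, not stated in the text: GPT-3's compute relative to GPT-2's.
gpt3_over_gpt2 = MINERVA_OVER_GPT2 / MINERVA_OVER_GPT3

print(f"PaLM / GPT-2 parameters: {param_ratio:.0f}x")
print(f"Implied GPT-3 / GPT-2 training compute: about {gpt3_over_gpt2:.0f}x")
```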

Analysis of Training Costs for Large Language and Multimodal Models

The discourse surrounding large language and multimodal models highlights concerns over their costly nature.

While AI companies remain mum on training costs, it is speculated that such models cost millions of dollars to create and grow increasingly expensive with their size.

The AI Index research team generated estimates for various models' training costs through analyzing their hardware and training time, or from hardware speed, training compute, and hardware utilization efficiency.
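
The two estimation routes described above might be sketched as follows. This is an illustration under stated assumptions, not the AI Index team's actual method: the function names, the price per chip-hour, and every numeric input are hypothetical placeholders chosen only to show how the quantities combine.

```python
def cost_from_hardware_time(num_chips, training_hours, usd_per_chip_hour):
    """Route 1: known hardware and training time.

    cost = number of chips x hours of training x hourly price per chip
    """
    return num_chips * training_hours * usd_per_chip_hour


def cost_from_compute(training_flop, peak_flop_per_second, utilization, usd_per_chip_hour):
    """Route 2: known training compute, hardware speed, and utilization.

    Effective throughput is peak speed scaled by utilization; dividing the
    total compute by it gives chip-seconds, which are converted to chip-hours
    and then priced.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    chip_seconds = training_flop / (peak_flop_per_second * utilization)
    chip_hours = chip_seconds / 3600.0
    return chip_hours * usd_per_chip_hour


# All inputs below are hypothetical, not values from the report.
route_1 = cost_from_hardware_time(num_chips=512, training_hours=1000, usd_per_chip_hour=2.0)
route_2 = cost_from_compute(
    training_flop=1e23,
    peak_flop_per_second=1e14,
    utilization=0.4,
    usd_per_chip_hour=2.0,
)
print(f"Route 1 estimate: ${route_1:,.0f}")
print(f"Route 2 estimate: ${route_2:,.0f}")
```

The second route is sensitive to the assumed utilization: halving it doubles the estimate, which is one reason such figures are best reported with a range.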

The estimates are qualified with a tag of mid, high, or low and reveal that models like Chinchilla by DeepMind and BLOOM cost $2.1 million and $2.3 million, respectively, validating popular claims.

A clear correlation emerges between model size, training compute, and costs, as shown in figures 1.2.18 and 1.2.19.

Trends in AI Conference Attendance and Estimated Training Costs of Large Language and Multimodal Models

AI conferences have become popular venues for researchers to present their work and connect with peers and potential collaborators. (Chapter 1, Introduction)

The attendance at AI conferences has grown in size, number, and prestige over the past two decades. (Chapter 1, Section 1.2)

The total attendance at select AI conferences dipped in 2021 and 2022 due to a return to hybrid or in-person formats after being virtual in previous years. (Chapter 1, Section 1.3)

Neural Information Processing Systems (NeurIPS) is one of the most attended conferences, with around 15,530 attendees. (Chapter 1, Section 1.3)

The training cost of large language and multimodal models can range from $100k to over $10M. (Chapter 1, Figure 1.2.19)

The International Conference on Robotics and Automation (ICRA) had the greatest one-year increase in attendance, from 1,000 in 2021 to 8,008 in 2022. (Chapter 1, Section 1.3)

Trends in Open-Source AI Software Projects on GitHub

GitHub is a web-based platform where individuals and coding teams can host, review, and collaborate on various code repositories.

Since 2011, the total number of AI-related GitHub projects has steadily increased, growing from 1,536 in 2011 to 347,934 in 2022.
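
To put the two project counts above in perspective, the sketch below computes the overall growth factor and the compound annual growth rate they imply. The helper names are hypothetical, and the annual rate assumes smooth compounding between the two endpoints, which the text does not claim.

```python
def growth_factor(start_value, end_value):
    """How many times larger the end value is than the start value."""
    return end_value / start_value


def compound_annual_growth_rate(start_value, end_value, years):
    """Constant yearly rate that would turn start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Project counts quoted in the text for 2011 and 2022.
factor = growth_factor(1_536, 347_934)
rate = compound_annual_growth_rate(1_536, 347_934, years=2022 - 2011)
print(f"Growth factor: {factor:.1f}x")
print(f"Implied compound annual growth rate: {rate:.1%}")
```

The count grew by a factor of roughly 226 over the eleven years.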

As of 2022, a large proportion of GitHub AI projects were contributed by software developers in India (24.2%), followed by the European Union and the United Kingdom (17.3%).

The share of American GitHub AI projects has been declining steadily since 2016.

Some of the most starred GitHub repositories include libraries like TensorFlow, OpenCV, Keras, and PyTorch, which are widely used by software developers in the AI coding community.

Technical Performance Trends in AI

As of 2022, US-based GitHub AI projects received the most stars, followed by the EU and UK, then China.

Total new GitHub stars have recently stopped increasing in many areas.

Open-source AI software is covered in Chapter 1 (Research and Development) of the report.

In 2021, US projects received 3.44 million stars, the rest of the world 2.69 million, the EU and UK 2.34 million, and China 1.53 million.

Chapter 2 of the 2023 AI Index Report focuses on technical performance in AI.

The chapter covers computer vision, including image and video generation and recognition, natural language understanding, and speech recognition.

Narrative highlights in Chapter 2 include progress in image generation, the rise of multimodal reasoning systems, improvements in language models, and developments in speech recognition.

Technical Progress in AI as Per the AI Index Report 2023

The AI Index Report 2023 provides technical analysis about the progress made in AI during 2022.

The report covers various aspects of AI such as computer vision, language, speech, reinforcement learning, and hardware.

It includes an analysis of the environmental impact of AI and how AI has contributed to scientific progress.

The report also provides a timeline-style overview of significant recent developments in AI.

The report highlights that while AI is posting state-of-the-art results, the year-over-year improvement on many benchmarks continues to be marginal.

The report also notes how new AI systems such as BEiT-3, PaLI, and Gato are increasingly capable of navigating multiple tasks.

The report suggests that AI systems can have serious environmental impacts, but AI can also optimize energy usage.

The report also covers developments in generative AI, language models, and self-improving AI.

Overview of Significant Technical Developments in AI in 2022

The technical performance chapter opens with a timeline of significant AI developments in 2022.

On Feb. 2, 2022, DeepMind releases AlphaCode, a system that achieves a competitive level in writing computer programs.

On Feb. 16, 2022, DeepMind's reinforcement learning agent controls nuclear fusion plasma in a tokamak.

On Mar. 10, 2022, IndicNLG benchmarks natural language generation for Indic languages.

Between Mar. 24 and May 12, 2022, Meta AI, Google, OpenAI, and DeepMind release text-to-image and language models: Make-A-Scene, PaLM, DALL-E 2, and Gato, respectively.

Google releases Imagen, capable of producing photorealistic images

BIG-bench benchmark released to better challenge large language models

GitHub makes Copilot available as a subscription-based service for individual developers

Latest developments in AI technology

Nvidia uses reinforcement learning to enhance performance of its GPU chips.

Meta introduces No Language Left Behind, a translation model that can efficiently translate across 200 different languages.

Chinese researchers from Tsinghua University launch GLM-130B, a language model that outperforms other popular models.

OpenAI launches Whisper, a speech recognition system that achieves strong performance without supervised pre-training or unsupervised training with fine-tuning.

Make-A-Video by Meta allows users to create videos from short text descriptions with high quality.

DeepMind launches AlphaTensor, a reinforcement-learning-based AI system that discovers new, efficient algorithms for matrix multiplication.

Stable Diffusion, an open-source text-to-image diffusion-based model, enables users to generate images with freely available model weights.

Chapter 2 of the AI Index Report 2023 provides a timeline of technical developments in the AI industry.

Advances in Artificial Intelligence - 2022-2023

Google uses PaLM to improve the reasoning of PaLM itself

BLOOM language model released by international research group

Technical performance advancements in AI

Stanford researchers release HELM for benchmarking new language models

CICERO, an Al, plays in top 10% of Diplomacy game

OpenAI launches ChatGPT, which reportedly reaches 100 million monthly users by January 2023

AI Index Report 2023 released, covering advancements and metrics

Various language models trained with different sets of questions and mixed formats

Table of Contents with scenarios, previous work, and metrics

Multiple decoding paths and pre-generated synthetic demos for multitask training

Majority voting by answer and bias, toxicity, efficiency, and fairness metrics visualized for performance evaluation.

ChatGPT and its significance in the field of AI

ChatGPT is a highly advanced AI language model developed by OpenAI that can generate human-like text responses to questions and prompts.

Its large-scale training on diverse text data and cutting-edge deep learning architecture make it capable of generating informative and coherent responses to a wide range of topics, making it useful for various NLP applications such as chatbots, content generation, and language translation.

Its public availability allows for further research and development in the field of AI language processing.

Computer vision is the subfield of AI that teaches machines to understand images and videos, and has a variety of real-world applications such as autonomous driving, crowd surveillance, sports analytics, and video-game creation.

Image classification is the ability of machines to categorize objects in images, and ImageNet is one of the most widely used benchmarks for image classification, measuring performance through various accuracy metrics.

As of 2022, the best image classification system on ImageNet has a top-1 accuracy rate of 91.0%, an improvement of only 0.1 percentage points from the previous year.
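To make the top-1 figure concrete, here is a minimal sketch of how top-k accuracy is usually computed. The function name `top_k_accuracy` and the toy predictions are hypothetical and are not taken from the report or from any ImageNet tooling.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   true_labels: Sequence[str], k: int = 1) -> float:
    """Fraction of examples whose true label is among the k highest-ranked guesses."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("need at least one example")
    hits = sum(1 for ranked, truth in zip(ranked_predictions, true_labels)
               if truth in ranked[:k])
    return hits / len(true_labels)


# Hypothetical toy data: each inner list is ordered from most to least likely.
predictions = [
    ["tick", "mite", "cockroach"],
    ["starfish", "tick", "mite"],
    ["mite", "black widow", "tick"],
    ["cockroach", "starfish", "tick"],
]
labels = ["tick", "starfish", "black widow", "mite"]

top1 = top_k_accuracy(predictions, labels, k=1)  # only the first guess counts
top2 = top_k_accuracy(predictions, labels, k=2)  # either of the first two guesses counts
```

On this toy data the top-1 accuracy is 0.5 and the top-2 accuracy is 0.75, which shows why top-1 is the stricter of the two measures.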

Technical Performance of Computer Vision-Image in AI

Chapter 2 of the AI Index Report 2023 illustrates image classification with sample object categories such as mites, black widows, cockroaches, ticks, and starfish.

The ImageNet Challenge shows that the top AI models have high accuracy rates of 90% and above in identifying images.

Facial recognition systems can currently identify close to 100% of faces, as seen in the National Institute of Standards and Technology's Face Recognition Vendor Test. The false non-match rate (FNMR) measures the error rate, which is below 1% for top-performing models on all datasets.
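As a rough illustration of the FNMR, the sketch below counts how many same-person comparisons fall below a decision threshold. The scores, the threshold, and the function name are hypothetical; in practice the threshold is typically chosen to meet a target false match rate, which this sketch does not model.

```python
from typing import Sequence


def false_non_match_rate(genuine_scores: Sequence[float], threshold: float) -> float:
    """Share of same-person comparisons wrongly rejected (score below threshold)."""
    if not genuine_scores:
        raise ValueError("need at least one genuine comparison")
    misses = sum(1 for score in genuine_scores if score < threshold)
    return misses / len(genuine_scores)


# Hypothetical similarity scores for ten image pairs that show the same person.
scores = [0.91, 0.88, 0.97, 0.42, 0.79, 0.95, 0.99, 0.83, 0.90, 0.93]
fnmr = false_non_match_rate(scores, threshold=0.5)  # one pair of ten is rejected
```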

Deepfake detection has become an important issue, with Celeb-DF presenting a challenging benchmark for the detection of manipulated celebrity videos.

Technical Performance of Computer Vision-Image

The top deepfake detection algorithm on Celeb-DF, with an AUC score of 78, came from researchers at Deakin University in Australia.

MPII is a dataset comprising over 25,000 images of human activities, used to benchmark human pose estimation. This year's top model, ViTPose, correctly estimated 94.3% of keypoints.

Semantic segmentation involves assigning image pixels to specific categories. The Cityscapes dataset is used to test model performance, and the current top mean intersection-over-union (mIoU) is 85%.

In medical image segmentation, Kvasir-SEG is a dataset containing 1,000 high-quality images of gastrointestinal polyps that were manually identified by medical professionals.

Figures documenting the progress of computer vision systems' performance are included throughout the report.

Technical Performance of Medical Imaging Segmentation, Object Detection, and Image Generation

Progress on Kvasir-SEG is measured in mean Dice, which represents the degree of overlap between the polyp segments identified by AI systems and the actual polyp segments.

Mean Dice and mIoU are quite similar, and a Stack Exchange post outlines their differences in more detail.
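The relationship between the two overlap measures can be sketched for a single pair of binary masks. The masks and the function name below are hypothetical; the "mean" versions used by benchmarks average such scores over images or classes.

```python
from typing import Sequence, Tuple


def dice_and_iou(predicted: Sequence[int], actual: Sequence[int]) -> Tuple[float, float]:
    """Return (Dice, IoU) for two flattened binary masks of equal length."""
    if len(predicted) != len(actual):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, a in zip(predicted, actual) if p and a)
    pred_area = sum(1 for p in predicted if p)
    true_area = sum(1 for a in actual if a)
    union = pred_area + true_area - overlap
    if union == 0:
        return 1.0, 1.0  # both masks empty: treated here as perfect agreement
    dice = 2 * overlap / (pred_area + true_area)
    iou = overlap / union
    return dice, iou


# Hypothetical 8-pixel masks (1 = polyp pixel, 0 = background).
predicted_mask = [0, 1, 1, 1, 0, 0, 1, 0]
actual_mask = [0, 1, 1, 0, 0, 1, 1, 0]
dice, iou = dice_and_iou(predicted_mask, actual_mask)
```

Here Dice is 0.75 and IoU is 0.6. For a single mask pair the two are linked by Dice = 2 * IoU / (1 + IoU), so Dice is never lower than IoU; they rank a single prediction the same way but can diverge once scores are averaged.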

The top-performing model on Kvasir-SEG, SEP, had a mean Dice of 94.1%, and it was created by a Chinese researcher.

Object detection is the challenge of identifying and localizing objects within an image or video, and it's measured by several accuracy metrics such as mean average precision (mAP50).

The COCO object detection dataset comprises over 80 object categories in 328,000 images, and the top model, EVA, resulted from a Chinese academic research collaboration.

Image generation is the task of generating images that are indistinguishable from real ones, and progress is measured by the Fréchet Inception Distance (FID) score.

The CIFAR-10 and STL-10 benchmarks are popular for tracking progress on image generation, and the top models on each benchmark set state-of-the-art results this year.

Progress in Image Generation and Visual Reasoning in AI

Figure 2.2.17 tracks progress in facial image generation with Diffusion-GAN posting the 2022 state-of-the-art score on STL-10.

Text-to-image generation became popular in 2022 with models like DALL-E 2, Stable Diffusion, Midjourney, Make-A-Scene, and Imagen.

Figure 2.2.18 shows images generated by DALL-E 2, Stable Diffusion, and Midjourney for the prompt "a panda playing a piano on a warm evening in Paris".

Google's Imagen performs best on the COCO benchmark, and its researchers also released a more difficult text-to-image benchmark, DrawBench.

The COCO benchmark includes 328,000 images with 2.5 million labeled instances and is used for object detection and image generation tasks.

Figure 2.2.20 shows examples of visual reasoning tasks in the Visual Question Answering Challenge.

The Rise of Multimodal Reasoning Systems

In 2022, the top-performing model on the VQA V2 dataset was PaLI, a multimodal model produced by Google researchers.

Figure 2.2.20, "A Collection of Visual Reasoning Tasks," is drawn from Agrawal et al.

By 2022, the accuracy rate for multimodal reasoning systems on VQA V2 had reached 84.30%.

In 2022, the BEiT-3 and PaLI models were introduced, posting state-of-the-art results across various vision and language benchmarks as reported in Figure 2.2.22.

Figure 2.2.23 shows different vision-language tasks challenging multimodal systems like PaLI and BEiT-3.

The AI Index Report 2023 highlights the rise of capable multimodal reasoning systems and their ability to generalize across multiple domains.

Visual Commonsense Reasoning and Video Activity Recognition in AI

One example task in Figure 2.2.23 of the AI Index Report 2023 asks for the brand of a watch; the answer is Seiko.

The Visual Commonsense Reasoning (VCR) challenge is a benchmark that requires AI systems to answer questions from images and select the reasoning behind their answers, as explained in Figure 2.2.24 and the VCR Leaderboard.

AI systems still have not surpassed human performance on VCR, according to the Q->AR score in Figure 2.2.25.

The Q->AR score is a measure of machines' ability to select the right answer and the correct rationale behind it in VCR.
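A minimal sketch of the Q->AR idea follows: a question only counts as solved when both the answer and the rationale are right. The `VCRResult` type and the sample results are hypothetical and do not reflect the VCR leaderboard's actual data format.

```python
from typing import NamedTuple, Sequence


class VCRResult(NamedTuple):
    answer_correct: bool
    rationale_correct: bool


def q_to_ar_accuracy(results: Sequence[VCRResult]) -> float:
    """Fraction of questions with both the answer and the rationale correct."""
    if not results:
        raise ValueError("need at least one result")
    both = sum(1 for r in results if r.answer_correct and r.rationale_correct)
    return both / len(results)


# Hypothetical outcomes for four questions.
results = [
    VCRResult(True, True),
    VCRResult(True, False),   # right answer, wrong reasoning: not counted
    VCRResult(False, True),   # wrong answer: not counted
    VCRResult(True, True),
]
score = q_to_ar_accuracy(results)
answer_only = sum(r.answer_correct for r in results) / len(results)
```

In this toy case the answer-only accuracy is 0.75 while the joint score is 0.5, which illustrates why the joint measure is the harder one to saturate.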

Video analysis tasks involve reasoning or task operation across videos, and activity recognition includes categorizing human activities in videos, as detailed in the AI Index Report 2023.

Kinetics-400, Kinetics-600, and Kinetics-700 are datasets for benchmarking video activity recognition, each with thousands of high-quality video clips featuring various human activities, and requiring AI systems to classify actions from 400 to 700 categories, respectively.

Technical Performance of Artificial Intelligence in Computer Vision, Video Generation, and Natural Language Processing

As of 2022, the top score on Kinetics-600 is higher than the top score on Kinetics-700, suggesting that the Kinetics-700 dataset remains a challenging task for video computer vision researchers.

Kinetics-400, Kinetics-600, and Kinetics-700 datasets have different top-1 accuracies.

Multiple high-quality text-to-video models were released in 2022, including CogVideo, Meta's Make-A-Video, and Google's Phenaki. These models are impressive but can only generate videos of a few seconds' duration.

Significant progress has been made in natural language processing (NLP) in recent years, with the release of capable large language models, such as PaLM, GPT-3, and GLM-130B.

SuperGLUE tasks challenge AI systems to understand the English language in various ways, including reading comprehension, yes/no comprehension, commonsense comprehension, and logical reasoning.

One sample SuperGLUE passage reports that Puerto Rico's recent non-binding referendum on statehood showed overwhelming support for statehood, with 97% of the votes in favor. However, the ultimate decision lies with Congress, the only body that can approve new states.

Puerto Rico votes for US statehood, SuperGLUE benchmark, and language tasks

Puerto Rico's Governor claimed equal rights as American citizens in a news release after the US territory voted in favor of US statehood.

By voting for statehood, Puerto Ricans can now claim that they didn't vote for the current US President.

SuperGLUE is a benchmark that tracks AI models' progress on eight different linguistic tasks and aggregates their performance into a single metric.

The top model on SuperGLUE, Vega, set a new state-of-the-art score of 91.3 in 2022, surpassing the human baseline by 1.5 percentage points.

ReClor, a reading comprehension benchmark launched in 2020, contains logical reasoning questions taken from the LSAT, and a sample question is shown in Figure 2.4.3.

Doctors suggest aggressive anticancer treatment as early as possible for individuals with high levels of a specific protein produced by cancerous glands, but the blood test's effectiveness weakens their recommendation, according to a certain study.

Language Models' Progress and ReClor's Accuracy

Figure 2.4.3 in the AI Index Report 2023 shows high protein levels associated with a common condition, sometimes with cancer.

According to Figure 2.4.4, the top result on the ReClor benchmark in 2022 was 80.6% accuracy, an 18 percentage point improvement since the benchmark's release in 2020.

The AI Index Report compared how three language models, GPT-2, GPT-3, and ChatGPT, answered the same prompt.

GPT-2 produced grammatically correct but irrelevant sentences about Theodore Roosevelt's presidency.

However, GPT-3's response on Roosevelt's accomplishments was more informative and direct.

Accuracy on ReClor, a reading comprehension dataset requiring logical reasoning, improved, as seen in the AI Index Report 2023.

Accomplishments of Theodore Roosevelt as President

Theodore Roosevelt served as President from 1901-1909.

During his presidency, he established numerous national parks, monuments, and forests, including the Grand Canyon National Park.

Roosevelt took on monopolies, such as Standard Oil, through antitrust laws and broke up large corporations.

He advocated for better hours, wages, and workplace safety, improving working conditions for Americans.

Roosevelt promoted progressive policies, including regulation of big business and protection of natural resources.

He improved national infrastructure through construction of the Panama Canal and funding for dam projects.

Roosevelt negotiated the end of the Russo-Japanese War, earning him the Nobel Peace Prize.

The Roosevelt Corollary asserted U.S. intervention in Latin America to maintain stability and protect American interests.

Roosevelt did not create the National Park Service or the National Wildlife Federation, which were established under Woodrow Wilson in 1916 and in 1936, respectively.

The information provided by ChatGPT is the most accurate and thorough among the three responses.

AI Index Report 2023: Planning and Reasoning in Large Language Models & Text Summarization

AI systems have shown improvement on reasoning tasks, especially large language models.

Researchers have introduced a new planning and reasoning test for large language models that includes plan generation, cost-optimal planning, reasoning about plan execution, robustness to goal reformulation, ability to reuse plans, replanning, and plan generalization.

Notable language models were tested on these tasks in a Blocksworld problem domain, where agents are given blocks of different colors and tasked with arranging them in particular orders.

The large language models performed poorly compared to humans, suggesting that they lack human-level reasoning capabilities.

Text summarization performance is judged on ROUGE, which measures the degree to which an AI-produced text summary aligns with a human reference summary.
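As a simplified illustration, the sketch below computes a unigram-overlap score in the spirit of ROUGE-1. It assumes whitespace tokenization and lowercase matching, with no stemming; the function name and sample sentences are hypothetical, and published ROUGE scores come from dedicated tooling with more variants (such as ROUGE-2 and ROUGE-L).

```python
from collections import Counter
from typing import Tuple


def rouge_1(candidate: str, reference: str) -> Tuple[float, float, float]:
    """Unigram-overlap precision, recall, and F1 between a summary and a reference."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand_counts & ref_counts).values())
    precision = overlap / sum(cand_counts.values()) if cand_counts else 0.0
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Hypothetical human reference and model-produced summary.
reference = "the model summarizes long scientific papers"
candidate = "the model summarizes scientific papers well"
precision, recall, f1 = rouge_1(candidate, reference)
```

Five of the six words overlap in each direction, so precision, recall, and F1 all come out at 5/6 for this pair.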

ArXiv and PubMed are two widely used datasets for benchmarking text summarization.

AdaPool, developed by a team from Salesforce Research, posted the state-of-the-art score in 2022 on both arXiv and PubMed for text summarization.

The Latest Developments in Natural Language Processing (NLP)

Natural Language Inference (NLI) refers to the ability of AI systems to determine the truth value of a hypothesis using presented premises. Abductive NLI involves drawing plausible conclusions from limited and uncertain premises. The accuracy of modern systems has surpassed the human baseline, reaching 93.7% in 2022.

Sentiment Analysis is the application of NLP techniques to identify the sentiment of a text. The Stanford Sentiment Treebank (SST) includes over 215,000 annotated phrases from movie reviews. The Heinsen Routing + RoBERTa Large model achieved a state-of-the-art accuracy of 59.8% on SST-5 fine-grained classification.

Multitask Language Understanding tests the ability of NLP models to reason across specialized subject domains. Current criticisms of language benchmarks, such as GLUE and SuperGLUE, suggest they do not accurately test the full capabilities of modern language models.

Massive Multitask Language Understanding (MMLU) and Machine Translation (MT)

MMLU evaluates models across 57 subjects in the humanities, STEM, and social sciences in zero-shot or few-shot settings (Figure 2.4.11).

Current top result on MMLU comes from Flan-PaLM, a Google model that reports an average score of 75.2% (Figure 2.4.12).

Hendrycks et al. (2021) introduced MMLU in response to criticism that benchmarks such as GLUE and SuperGLUE no longer challenge modern language models.

Gopher, Chinchilla, and PaLM variants have posted state-of-the-art results on MMLU.

Sample MMLU questions on high school mathematics and microeconomics are included in Chapter 2.

Machine translation studies how well AI software can translate languages, and has been dominated by neural networks for the past five years.

The number of independent machine translation services has increased sixfold since 2017, as reflected in the number of commercial machine translation services on the market (Figure 2.4.13).

Section 2.5, Speech: AI systems that work with human speech can convert spoken words into text and recognize the individuals speaking.

VoxCeleb and Whisper: Advancements in Speaker Recognition and Speech Recognition

VoxCeleb is a large-scale audiovisual dataset of human speech for speaker recognition, which measures equal error rates (EER).
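The equal error rate is the operating point where the false acceptance rate and the false rejection rate coincide. The sketch below approximates it by sweeping thresholds over a small set of hypothetical verification scores; the function name and data are illustrative only, and production evaluations usually interpolate between points on the error curve.

```python
from typing import Sequence


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER: the error rate at the threshold where FAR and FRR are closest."""
    if not genuine or not impostor:
        raise ValueError("need both genuine and impostor scores")
    best_gap = None
    eer = 0.0
    for threshold in sorted(set(genuine) | set(impostor)):
        # False rejection: same-speaker pairs scored below the threshold.
        frr = sum(1 for s in genuine if s < threshold) / len(genuine)
        # False acceptance: different-speaker pairs scored at or above it.
        far = sum(1 for s in impostor if s >= threshold) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Hypothetical similarity scores.
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.3]
impostor_scores = [0.7, 0.4, 0.35, 0.2, 0.1]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

At a threshold of 0.6, one of five genuine pairs is rejected and one of five impostor pairs is accepted, so the sketch reports an EER of 20%. Lower is better.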

The top result on the original VoxCeleb dataset in 2022, an EER of 0.1%, was achieved by American researchers, beating the previous state-of-the-art result set by Chinese researchers (Figure 2.5.1).

Whisper, a large-scale speech recognition model, was launched by OpenAI in 2022, and was trained on 700,000 hours of audio data in a weakly supervised way. It achieved strong performance on speech recognition tasks in zero-shot settings, surpassing other speech recognition models such as wav2vec 2.0 Large (Figure 2.5.2).

Whisper also outperformed other AI translator models and commercial automated speech recognition systems, and scored similarly to top human transcription services (Figures 2.5.3 and 2.5.4).

However, Whisper trailed state-of-the-art models on certain speech tasks like language identification (Figure 2.5.5).

"Breakthrough in Speech Recognition"

Whisper is a state-of-the-art speech recognition system that performs well across a diverse range of tasks with massive amounts of unlabeled speech data.

Whisper eliminates the need for pre-training using supervised learning methods, which is time-consuming and costly.

Whisper demonstrates that speech recognition systems can perform well without supervision or requiring further algorithmic specification.

Reinforcement learning is a process of interactive learning from prior actions, and AI systems are trained to maximize performance on a given task.

Procgen is a reinforcement learning environment that includes 16 procedurally generated video-game-like environments specifically designed to test the ability of reinforcement learning agents to learn generalizable skills.

Researchers train their systems on 200 million training runs and report an average score across the 16 Procgen games.

AI Performance & Benchmarking Trends in the 2023 AI Index Report

The 2023 AI Index Report notes the emerging theme of benchmark saturation, which refers to the leveling off of performance improvements in many popular technical performance benchmarks in AI.

The report reveals a relative improvement of less than 5% in all but 7 benchmarks in the past year, with a median improvement of 4%.
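To show what a relative improvement of this size looks like, the sketch below applies the usual percentage-change formula to the ImageNet top-1 figures quoted earlier (91.0%, up 0.1 percentage points). Whether the report uses exactly this formula for every benchmark is an assumption, and the function name is hypothetical.

```python
def relative_improvement(previous: float, current: float) -> float:
    """Year-over-year change expressed as a percentage of the previous score."""
    if previous == 0:
        raise ValueError("previous score must be non-zero")
    return (current - previous) / previous * 100


imagenet_2022 = 91.0                 # top-1 accuracy quoted in the text
imagenet_2021 = imagenet_2022 - 0.1  # 0.1 percentage points lower a year earlier
improvement = relative_improvement(imagenet_2021, imagenet_2022)
is_saturating = improvement < 5      # the 5% cutoff mentioned in the text
```

The result is a relative improvement of roughly 0.11%, far below the 5% cutoff, which is the kind of flattening the report describes as saturation.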

This year's AI Index excluded some traditionally popular benchmarks because no new state-of-the-art results were published for them.

Researchers have responded to benchmark saturation by launching newer and upgraded benchmarking suites like BIG-bench and HELM.

The report also notes the importance of monitoring advancements in hardware capabilities as AI systems process ever-larger datasets.

MLPerf, an AI training competition, has recorded lower training times for every AI skill category since its launch.

Top AI systems can now train roughly 32 times faster in categories like image classification and object detection.

Technical Performance and Hardware in MLPerf

MLPerf competition showcases improvements in hardware performance for AI applications

Data from MLPerf indicates that stronger accelerators lead to faster training times

MLPerf Inference measures throughput of trained AI systems in generating predictions

MLPerf Best-Performing Hardware for Image Classification, Recommendation, Language Processing, and Speech Recognition is listed

Top-performing AI systems generate significantly more inferences since MLPerf's inception in 2020

MLPerf enables benchmarking of AI performance and improvement of hardware for faster AI applications

Analysis of GPU Performance and Price Trends

The blog post from Dell Technologies differentiates between offline and server samples in terms of queries and samples per second for system under test (SUT) performance evaluation.

The AI Index analyzed trends in GPU performance and price using FLOP/s (floating-point operations per second) as a performance metric and presented Figure 2.7.7 showing the performance of different GPUs from 2003 to 2022.

Figure 2.7.8 shows the median single-precision FLOP/s performance of new GPUs by release date, with an increase since 2021 and a 7,000-fold improvement since 2003.

Figures 2.7.9 and 2.7.10 evaluate GPU price-performance trends showing a drastic increase in median FLOP/s per US Dollar of GPUs in FP32 performance from 2003 to 2022.
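The price-performance metric can be sketched as peak FLOP/s divided by price, with the median taken across GPUs released in a given year. The GPU names, speeds, and prices below are entirely hypothetical and are not data from the report.

```python
from statistics import median

# Hypothetical GPUs: (name, peak FP32 FLOP/s, launch price in US dollars).
gpus = [
    ("GPU A", 10e12, 500),
    ("GPU B", 20e12, 800),
    ("GPU C", 40e12, 1600),
]


def flops_per_dollar(peak_flops: float, price_usd: float) -> float:
    """Peak floating-point operations per second bought by one US dollar."""
    if price_usd <= 0:
        raise ValueError("price must be positive")
    return peak_flops / price_usd


per_dollar = [flops_per_dollar(flops, price) for _, flops, price in gpus]
median_per_dollar = median(per_dollar)
```

For these made-up cards the values are 2.0e10, 2.5e10, and 2.5e10 FLOP/s per dollar, giving a median of 2.5e10. Using the median rather than the mean keeps a single outlier card from dominating a year's figure.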

The Impact of AI Hardware Performance and Large Language Models on the Environment

The price-performance of AI hardware has improved significantly, which has led to larger training runs and scalable AI models.

The median FLOP/s per US dollar has increased steadily since 2003, according to the AI Index report.

Concerns are mounting about the environmental impact of the computational resources and energy required for AI training and inference.

The environmental effects of AI are challenging to determine due to wildly varying estimates.

The link between AI and the environment is explored in a recent paper by Luccioni et al. (2022), which highlights the importance of monitoring the effect AI systems have on the environment.

Many factors determine the carbon emissions of AI systems, including the number of parameters, power usage effectiveness (PUE), and grid carbon intensity.

GPT-3 released the most carbon emissions among four compared language models, according to the Luccioni et al. paper.

BLOOM's training run emitted 1.4 times more carbon than the average American uses in a year and 25 times that of flying one passenger round trip from New York to San Francisco.

BLOOM's training consumed enough energy to power an average US household for 41 years. The highest-emitting models in the comparison, Gopher and GPT-3, produced an estimated 352 to 552 tonnes of CO2-equivalent emissions.

The average US residential customer consumed 10,632 kWh of electricity in 2021, as per the US Energy Information Administration.
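The household comparison can be checked with simple arithmetic: 41 years at 10,632 kWh per year implies a training run of roughly 436 MWh. The sketch below also shows how energy, PUE, and grid carbon intensity combine into an emissions estimate. The function name and the example inputs to it are hypothetical and are not figures from the report.

```python
HOUSEHOLD_KWH_PER_YEAR = 10_632  # average US residential consumption in 2021 (from the text)
HOUSEHOLD_YEARS = 41             # household-years of energy quoted for BLOOM's training

# Energy implied by the household comparison.
implied_training_kwh = HOUSEHOLD_KWH_PER_YEAR * HOUSEHOLD_YEARS
implied_training_mwh = implied_training_kwh / 1000


def emissions_tonnes(energy_kwh: float, grid_g_co2_per_kwh: float, pue: float = 1.0) -> float:
    """Estimate emissions as energy x PUE x grid carbon intensity, in tonnes CO2eq."""
    grams = energy_kwh * pue * grid_g_co2_per_kwh
    return grams / 1_000_000  # grams to tonnes


# Hypothetical inputs: 1,000 kWh on a grid emitting 500 gCO2eq per kWh, no overhead.
example_tonnes = emissions_tonnes(1_000, 500)
```

The same training run can therefore produce very different emissions depending on where it is executed, because grid carbon intensity and data-center overhead (PUE) multiply the energy figure directly.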

DeepMind's BCOOLER experiment shows that AI systems can optimize energy consumption, achieving 12.7% energy savings without compromising comfort levels.

AI has accelerated scientific discovery significantly in 2022, such as through learned plasma control in tokamaks and novel algorithms for matrix manipulation.

Applications of Artificial Intelligence in Scientific Research

Strassen's algorithm reduces the multiplication of two 2x2 matrices from 8 scalar multiplications to 7

AlphaTensor improves matrix manipulation with reinforcement learning
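For reference, the classic Strassen scheme for 2x2 matrices can be written out directly. This is the textbook 1969 construction, not one of the algorithms AlphaTensor discovered, and the function name is illustrative.

```python
from typing import List

Matrix = List[List[float]]


def strassen_2x2(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 2x2 matrices using 7 scalar multiplications instead of 8."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # The seven products; every '*' below is one scalar multiplication.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Recombine using additions and subtractions only.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


product = strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The saving looks small for one 2x2 product, but applied recursively to matrix blocks it lowers the asymptotic cost of large matrix multiplication, which is why finding schemes with fewer multiplications matters.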

Nvidia uses AI systems to design chips more efficiently than electronic design automation tools

Generative AI models can create antibodies in a zero-shot fashion for drug discovery

AI-generated antibodies are robust and have potential to accelerate de novo antibody discovery

These technologies have significant impact on scientific research in various fields.

Technical AI Ethics in the 2023 AI Index Report

Fairness, bias, and ethics in machine learning are of interest to researchers and practitioners.

As barriers to creating and deploying generative AI systems have decreased, ethical issues around AI have become more apparent to the general public.

Chapter 3 of the 2023 AI Index Report focuses on Technical AI Ethics.

The report builds on last year's analysis and highlights tensions between raw model performance and ethical issues.

The report introduces new metrics quantifying bias in multimodal models.

The effects of model scale on bias and toxicity are confounded by training data and mitigation methods.

The Rise of AI Ethics: Fairness, Bias, and Ethical Misuse

Several institutions have built large models trained on proprietary data, which can be toxic and biased.

New evidence suggests that these issues can be mitigated after training larger models with instruction-tuning.

Generative models have become part of the zeitgeist in 2022 and come with ethical challenges.

Chatbots like ChatGPT can be tricked into serving nefarious aims, and text-to-image generators are biased along gender dimensions.

The number of incidents concerning the misuse of AI is rapidly rising.

In 2022, incidents included a deepfake video of the Ukrainian president and the use of call-monitoring technology on inmates by U.S. prisons.

Despite efforts towards fairer models, extensive analysis of language models shows that fairness and bias can be at odds.

Interest in AI ethics continues to skyrocket, with more submissions at FAccT from industry actors.

Automated fact-checking with natural language processing is challenging.

Algorithmic bias is measured in terms of allocative and representation harms, and several new datasets or metrics were released in 2022 to probe for bias and fairness.

Analysis of Metrics for AI Bias and Fairness

Researchers are focusing on reducing bias in specific settings such as question answering and natural language inference, and using language models to generate more examples for the same task.

Figure 3.1.1 shows published metrics for AI fairness and bias that have been cited in at least one other work, with a steady increase since 2016.

There are two types of metrics for measuring AI systems along an ethical dimension: benchmarks and diagnostic metrics.

Benchmarks are domain-specific and aim to measure intrinsic model bias, while diagnostic metrics measure extrinsic bias in the real world.

Recent work has introduced both new ethics benchmarks and diagnostic metrics, such as VLStereoSet and HolisticBias, to assess previously undefined measurements of bias.

Careful selection and interpretation of metrics is important as intrinsic and extrinsic metrics may not correlate.

Overview of AI Incidents and Controversies

The AI, Algorithmic, and Automation Incidents and Controversies (AIAAIC) Repository is a public dataset of recent incidents and controversies relating to AI, algorithms, and automation.

The AIAAIC began as a private project in 2019 and has evolved into a comprehensive initiative tracking ethical issues associated with AI technology.

In 2021, the number of newly reported AI incidents and controversies in the AIAAIC database was 26 times greater than in 2012, suggesting the increasing degree to which AI is becoming intermeshed in the real world and a growing awareness of ethical misuses related to AI.

Improving awareness has also led to improved tracking of incidents, indicating that older incidents may be underreported.

The subsection below highlights some real-world ethical issues related to AI technology, and the specific type of AI technology associated with each incident is listed in parentheses alongside the date when it was reported to the AIAAIC database.

One example from March 2022 shows a deepfake video of the Ukrainian president that circulated on social media and was eventually revealed to be fake.

Another example from February 2022 reported that some American prisons are using AI-based systems to scan inmates' phone calls.

Concerns about Surveillance, Privacy, and Discrimination in AI Technology

Intel is partnering with Classroom Technologies to develop an AI-based student emotion monitoring system that raises privacy and discrimination concerns.

The London Metropolitan Police Service uses an AI tool to rank the risk potential of street gang members, leading to concerns about discrimination against certain ethnic and racial minorities.

Midjourney's image generator raises ethical criticisms related to copyright, employment, and privacy.

The Perspective API, adopted in natural language processing research, raises concerns about the potential for biased metrics.

These incidents highlight concerns about the accuracy, fairness, and unintended consequences of AI technology.

Increasing Use of Perspective API and Bias Metrics in Natural Language Processing Research

The Perspective API is used to label text based on categories such as toxicity, severe toxicity, identity attack, insult, obscene, sexually explicit, and threat.

The number of research papers using the Perspective API has increased by 106% in the last year.

AI systems are measured for gender bias related to occupations using the Winogender task.

Larger AI models are more capable of recognizing gender bias related to occupations on the Winogender task.

Instruction-tuned language models outperform models several times their size in recognizing gender biases in the generative setting.

BBQ Benchmark Measures Bias in Question-Answering

The BBQ benchmark measures how biases can manifest in the question-answering setting along several axes of identity characteristics.

Examples consist of template-based context and question pairs, where each answer choice references a person belonging to either a stereotypical or anti-stereotypical social group.

Models that do not exhibit bias have a score of zero, while a score of 100 indicates that the model chooses answers aligned with the social bias in question, and a score of -100 indicates the model always chooses the anti-stereotypical answer.
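One formulation consistent with this scale is sketched below: the share of stereotype-aligned answers among all non-"Unknown" answers, rescaled to run from -100 to 100. The exact formula, including the scaling by error rate in ambiguous contexts, is an assumption based on how the BBQ score is commonly described; the function names are hypothetical.

```python
def bbq_bias_score(n_biased_answers: int, n_non_unknown_answers: int) -> float:
    """Map the share of stereotype-aligned answers onto a -100..100 scale."""
    if n_non_unknown_answers == 0:
        return 0.0  # the model always answered "Unknown": no measurable lean
    share = n_biased_answers / n_non_unknown_answers
    return 100 * (2 * share - 1)


def bbq_ambiguous_score(accuracy: float, bias_score: float) -> float:
    """Assumed scaling for ambiguous contexts: weight the bias score by the error rate."""
    return (1 - accuracy) * bias_score


unbiased = bbq_bias_score(50, 100)         # answers split evenly
fully_biased = bbq_bias_score(100, 100)    # always the stereotypical answer
anti_biased = bbq_bias_score(0, 100)       # always the anti-stereotypical answer
```

These three cases reproduce the 0, 100, and -100 endpoints described above.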

Models can be more biased along certain identity categories than others, and biases along the axes of physical appearance and age are more common.

In ambiguous contexts, models are more likely to fall back on stereotypes and select unsupported answers rather than "Unknown", and this result is exacerbated for models fine-tuned with reinforcement learning.

BBQ bias scores vary across identity categories, with gender identity and physical appearance having the highest scores and race/ethnicity having less clear biases.

Analysis of Bias and Fairness in Natural Language Processing Models

The study evaluates the bias and fairness tradeoffs in NLP models based on the HELM benchmark.

While accurate models are found to be more fair, gender bias does not always correlate with accuracy.

Less gender-biased models tend to be more toxic, further complicating the relationship between fairness and bias.

The study highlights real-world tradeoffs between fairness and bias that should be taken into account while deploying models.

NLP bias and fairness metrics are reported for different models, including DeBERTaV3, RoBERTa-Base, and RoBERTa-Large.

Datasets like ARC, RACE, NarrativeQA, and QuAC are used to measure model accuracy and gender bias.

HellaSwag, OpenbookQA, TruthfulQA, MS MARCO (Regular), MS MARCO (TREC), and Civil Comments RAFT are other datasets used to evaluate model functionality.

The results of the study suggest that fairness and bias metrics should be considered together while deploying NLP models.

Fairness in Machine Translation

According to Google researchers, language models perform worse on machine translation to English from other languages when using "she" pronouns instead of "he" pronouns.

The drop in machine translation performance is between 2%-9%, as highlighted by Flan-T5-XXL 11B, Flan-PaLM 8B, and Flan-PaLM 62B models in Figure 3.3.7.

The mistranslation of gendered pronouns into "it" showcases examples of dehumanizing harms.

Instruction-tuned models were not able to significantly reduce mistranslation.

Larger language models are not always more toxic compared to smaller counterparts, and mitigations can result in larger models being less toxic, as per the HELM benchmark.

Different pre-training data filtration techniques and post-training mitigations significantly affect the toxicity levels of language models, as shown in Figure 3.3.8.

Generative language models have numerous applications in open-domain conversational AI, such as chatbots and assistants.

Ethical Issues with Conversational AI

Open-ended language models in chatbots can cause harm by being toxic or biased, revealing personally identifiable information, or demeaning users.

A study conducted by researchers from Luleå University of Technology found that 37% of the 100 analyzed conversational AI systems were female gendered, yet 62.5% of popular commercial systems were deployed as female by default.

The training data used to develop dialog systems can make the models overly anthropomorphized, causing users to feel unsettled.

Some dialog data used to train conversational AI systems are inappropriate for machines to output, as they can lead to discomfort for users.

This highlights the need for chatbots that are better grounded in their own limitations and for policy interventions to ensure users understand when they are communicating with humans or chatbots.

Ethics in Conversational AI: Tricking ChatGPT for Terrorist Purposes

ChatGPT, a conversational AI model, was released with robust safety mechanisms, but its live deployment revealed gaps that could not be anticipated.

ChatGPT was tricked into giving detailed instructions for building a bomb by a researcher who claimed to be conducting bomb-related safety research.

This scenario exemplifies the cat-and-mouse nature of the deployment planning process in AI development, where developers build in safeguards and end users try to break the system.

The researcher's prompt to build a bomb no longer worked after publication of his article, as ChatGPT updated its response to prevent providing information on illegal and dangerous matters.

[Figure: excerpts of the ChatGPT exchange reported by the researcher, in which the model described in general terms how an improvised dirty bomb might be assembled.]

In its updated response, ChatGPT notes that providing information on how to construct illegal or dangerous devices is prohibited and potentially harmful, and that research should focus on preventing nuclear terrorism through increased security measures, international cooperation, and the nonproliferation of nuclear weapons.

Fairness and Bias in Text-to-Image Models

Text-to-image models have taken over social media, raising concerns about fairness and bias in AI systems.

Researchers from Meta compared text-to-image models trained on Instagram data to those trained on ImageNet and found the former to be more fair and less biased.

The analysis of the Instagram dataset showed a slightly higher percentage of images of women as compared to men.

In contrast, the largest subgroup in the ImageNet dataset was males aged 15 to 29.

The study highlights the need for researchers to pay attention to subgroup balance and bias in datasets.

The Impact of Instagram Pre-Training Dataset on Fairness of AI Models

SEER, a model trained on Instagram images, learned fairer representations of people and was less likely to associate humans with crime or being non-human.

Pre-training on Instagram data leads to fairer AI models, but using publicly shared data without users' awareness may be unethical.

Chart comparing fairness across age and gender/skin tone groups for text-to-image models in ImageNet and Instagram datasets.

Technical AI ethics chapter discusses fairness and bias in text-to-image models.

VLStereoSet, a benchmark for measuring stereotype bias in language and vision-language models, shows gender as the most biased axis.

CLIP has the highest vision-language relevance score but exhibits more stereotypical bias than other models, while FLAVA has the worst score but less stereotypical bias.

Comparison with language modeling suggests larger models are more capable but also more biased without intervention such as instruction-tuning or dataset filtration.

Examples of Bias in Text-to-Image Models

The chart displays the Vision-Language Relevance (vlrs) scores of various models including LXMERT, ALBEF, FLAVA, VisualBERT, and CLIP, along with their corresponding Vision-Language Bias (vlbs) scores.

Chapter 3 of the AI Index Report 2023 delves into the topic of fairness and bias in text-to-image models, and presents an example of bias in the popular AI system, Stable Diffusion.

Stable Diffusion gained notoriety in 2022 for its approach to full openness and a training dataset that included many images from artists who were not consulted. However, the resultant images produced by this system reflect common stereotypes and issues present in its training data.

The Diffusion Bias Explorer from Hugging Face compares sets of images generated by conditioning on pairs of adjectives and occupations, and the results reflect common stereotypes about how descriptors and occupations are coded, for example, the "CEO" occupation overwhelmingly returns images of men in suits despite a variety of modifying adjectives.

Biases in Text-to-Image Models and AI Ethics in China

DALL-E 2 and Midjourney are popular text-to-image models that exhibit biases.

DALL-E 2 generated images of old, serious-looking men when prompted with "CEO".

Midjourney also generated images of elderly white men when prompted with "influential person" or "someone who is intelligent".

However, Midjourney also produced an image of a woman when prompted with "influential person" by the AI Index.

Chinese scholars publish significantly on AI ethics, with privacy issues being the most discussed topic.

Comparing the themes and concerns raised in Chinese AI ethics papers with those in North America and Europe is a potential direction for future research.

The Landscape of AI Ethics in China

Chinese researchers in AI ethics discuss similar issues to their Western counterparts, including Western and Eastern AI arms races, ethics around increasing personalization used for marketing techniques, and media polarization.

Proposals in Chinese AI ethics literature address harms related to AI by focusing on legislation and structural reform, such as regulatory processes around AI applications and the involvement of ethics review committees.

Chinese scholars pay attention to AI principles developed by their Western peers, including Europe's GDPR and Ethics Guidelines for Trustworthy AI.

ACM FAccT is an interdisciplinary conference focused on algorithmic fairness, accountability, and transparency, and is one of the first major conferences created to bring together researchers, practitioners, and policymakers interested in sociotechnical analysis of algorithms.

AI Ethics Trends at FAccT and NeurIPS Conferences

Accepted submissions to FAccT have increased twofold from 2021 to 2022, demonstrating the growing interest in AI ethics and related work.

While academic institutions still dominate FAccT, industry actors have contributed more work than ever, and government-affiliated actors have started publishing more related work, indicating that AI ethics has become a primary concern for policymakers and practitioners.

European government and academic actors have increasingly contributed to the discourse on AI ethics from a policy perspective, as shown in trends on FAccT publications.

NeurIPS, one of the most influential AI conferences, held its first workshop on fairness, accountability, and transparency in 2014.

Several workshops at NeurIPS gather researchers working to apply AI to real-world problems, notably in healthcare and climate, reflected in the spike in "AI for Science" and "AI for Climate" workshops.

AI Ethics Trends at FAccT and NeurIPS

Interpretability and explainability work aims at designing machines that are inherently interpretable and providing explanations for the behavior of a black-box system. Although NeurIPS papers focused on interpretability and explainability decreased in the last year, the total number in the main track increased by one-third.

Causal inference uses statistical methodologies to reach conclusions about the causal relationship between variables based on observed data. An increasing number of papers on causal inference have been published at NeurIPS since 2018, and in 2022, an increasing number of papers related to causal inference and counterfactual analysis made their way from workshops into the main track.

Concerns around data privacy have led to significant momentum in building methods and frameworks to mitigate such fears. Several NeurIPS workshops since 2018 have been devoted to topics like privacy in machine learning, federated learning, and differential privacy. This year's data shows that discussions related to privacy in machine learning have increasingly shifted into the main track of NeurIPS.

Trends in AI Ethics

Fairness and bias in AI systems have become a popular research topic among both technical and non-technical audiences.

NeurIPS now requires authors to submit broader impact statements on ethical and societal consequences of their work, indicating the importance of AI ethics early in the research process.

The number of accepted papers on fairness and bias in AI has steadily increased in both the workshop and main track streams, with a major spike in 2022.

Automated fact-checking and misinformation detection have gained significant attention and investment, with the advent of fact-checking datasets and associated truth labels.

The number of citations in popular fact-checking benchmarks has plateaued, indicating a potential shift in the research landscape.

Language models used for fact-checking lack real-world context, which human fact-checkers use to verify the veracity of claims.

Shortcomings of automated fact-checking systems

Existing fact-checking datasets have shortcomings, and automated fact-checking systems built on top of these datasets make unrealistic assumptions.

Automated fact-checking systems assume the existence of contradictory counter-evidence for new false claims, but for new claims to be verified as true or false, often there is no proof of the presence or absence of a contradiction.

Proposed fact-checking datasets contain claims that do not meet the criterion of sufficient evidence or counterevidence found in a trusted knowledge base.

Several datasets contain claims that use fact-checking articles as evidence for deciding the veracity of claims.

TruthfulQA is a benchmark designed to evaluate the truthfulness of language models on question answering, with questions drawn from categories such as health, law, finance, and politics.

Technical AI Ethics and The Economy

Chapter 3 discusses the factuality and truthfulness of different AI models, such as UL2 20B, Galactica 30B, OPT 175B, and others.

The Artificial Intelligence Index Report for 2023 highlights the increasing deployment of AI systems in various organizations and the potential impact on productivity.

Chapter 4 focuses on the economy, including AI labor demand globally and in the United States, investment trends, corporate activity, and robot installations.

The report details the challenges in starting and scaling AI projects, earnings calls, and narrative highlights such as the effects of GitHub's Copilot on developer productivity and happiness.

The report also provides an analysis of sentiment from business leaders on AI investments and outcomes.

AI-related economic trends in the United States

The demand for AI-related professional skills is increasing across virtually every American industrial sector.

Across sectors, the share of job postings in the United States that are AI-related increased on average from 1.7% in 2021 to 1.9% in 2022.

The U.S. leads the world in terms of total amount of AI private investment, with $47.4 billion in 2022.

Year-over-year private investment in AI decreased globally for the first time in the past decade.

The AI focus area with the most investment in 2022 was medical and healthcare ($6.1 billion), followed by data management, processing, and cloud ($5.9 billion), and Fintech ($5.5 billion).

The three largest AI private investment events in 2022 were for a Chinese manufacturer of electric vehicles ($2.5 billion), a U.S. defense products company ($1.5 billion), and a German business-data consulting company ($1.2 billion).

AI Index Report 2023 Chapter 4 Preview Highlights

The proportion of companies adopting AI has plateaued in recent years at 50-60%.

Companies that have adopted AI continue to pull ahead and have seen meaningful cost decreases and revenue increases.

Copilot, a text-to-code AI system, has been helping workers, with 88% of surveyed respondents feeling more productive while using it.

Robotic process automation, computer vision, NL text understanding, and virtual agents are the AI capabilities most likely to have been embedded in businesses.

China dominates industrial robot installations, having installed more industrial robots in 2021 than the rest of the world combined.

The United States, Canada, and Spain had the highest percentages of AI-related job postings in 2022.

AI Skill Clusters and Specialized Skills in Demand

Figure 4.1.2 displays the top AI skill clusters in demand since 2010, where machine learning remains the most in-demand skill cluster, followed by artificial intelligence and natural language processing.

Figures 4.1.3 and 4.1.4 showcase the top ten specialized skills in demand for AI job postings, where the growth in demand for Python is particularly notable.

The point of comparison for specialized skills is 2010-2012, selected because data at the jobs/skills level is sparse in earlier years.

Python, computer science, SQL, and data analysis are among the most in-demand specialized skills for AI job postings.

Figure 4.1.5 shows the percentage of U.S. job postings requiring AI skills by industry sector from 2021 to 2022, indicating the highest demand for AI skills in the tech and finance industries.

AI Job Postings in the United States by Sector and State in 2021-2022

Across various sectors, AI job postings were notably higher in 2022 than in 2021 in the United States, except for agriculture, forestry, fishing, and hunting.

The three sectors with the highest percentage of AI job postings were information; professional, scientific, and technical services; and finance and insurance.

California had the highest number of AI job postings among all states with 142,154, followed by Texas and New York.

The District of Columbia had the highest percentage of AI job postings among all states at 3.0%, followed by Delaware, Washington, and Virginia.

Florida had the highest number of job postings in the South Atlantic region at 33,585.

The South Atlantic region had the highest percentage of AI job postings among all regions at 1.40%.

AI Job Postings by State in the US

Figure 4.1.6 shows the percentage of job postings in AI by state, with California ranking first at 17.9%.

Figure 4.1.7 displays the percentage of AI job postings by select states in the US.

California had the greatest share of AI job postings, followed by Texas and New York, according to Figure 4.1.8.

Figure 4.1.9 exhibits a significant increase in AI job postings in Washington, California, New York, and Texas from 2021 to 2022.

Figure 4.1.10 highlights the subdivision of AI-related job postings among the top four states, with California's share decreasing since 2019.

Global Trends in AI Hiring

The sample covers countries that make at least 10 AI hires per month and have LinkedIn coverage of at least 40% of their labor force. India is included but data should be interpreted with caution as LinkedIn coverage is less than 40%.

Figure 4.1.11 highlights 15 geographic areas with the highest relative AI hiring index for 2022. The index is calculated as a rate of LinkedIn members with AI skills on their profile, indexed to an average month in 2016.

Hong Kong had the greatest growth in AI hiring at 1.4, followed by Spain, Italy, UK, and UAE.

Figure 4.1.12 shows most countries have increased their AI hiring rates since 2016, but many have peaked around 2020, suggesting stabilization in AI hiring trends.
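The indexing step described above (a hiring rate expressed relative to an average month in 2016) can be sketched as follows. This is a minimal illustration, not LinkedIn's actual pipeline: the function name, the data layout, and the sample rates are all assumptions made for the example.

```python
from statistics import mean


def relative_hiring_index(monthly_rates, base_year=2016):
    """Index each monthly AI hiring rate to the average month of base_year.

    monthly_rates maps (year, month) -> hiring rate in any consistent unit.
    The layout is hypothetical; the report does not publish raw rates.
    """
    base = [rate for (year, _), rate in monthly_rates.items() if year == base_year]
    if not base:
        raise ValueError(f"no observations for base year {base_year}")
    baseline = mean(base)
    return {period: rate / baseline for period, rate in monthly_rates.items()}


# Made-up rates for illustration only; these are not values from the report.
rates = {(2016, month): 0.010 for month in range(1, 13)}
rates[(2022, 12)] = 0.014

index = relative_hiring_index(rates)
print(round(index[(2022, 12)], 2))
```

Under this reading, an index of 1.4 means the hiring rate in that month was 40% higher than in an average month of 2016.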

Artificial Intelligence Skill Penetration Across Regions

The Relative AI Hiring Index measures the relevance of AI skills for different regions.

Figure 4.1.13 shows the relative AI skill penetration rate by geographic area from 2015 to 2022.

The AI skill penetration rate relies on a top 50 representative list of AI-related skills across occupations.

The relative skill penetration rate is the sum of the penetration of each AI skill across occupations in a given region, divided by the global average across the same occupations.

A relative skill penetration rate of 1.5 means that the average penetration of AI skills in a region is 1.5 times the global average.
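The ratio described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: each occupation is reduced to a single penetration value, the function name and data layout are hypothetical, and the sample numbers are made up rather than taken from the report.

```python
def relative_skill_penetration(region, global_avg):
    """Ratio of a region's AI skill penetration to the global average.

    Both arguments map occupation -> AI skill penetration (the share of that
    occupation's representative skills that are AI skills). Only occupations
    present in the region are compared, so both sums cover the same set.
    """
    missing = [occ for occ in region if occ not in global_avg]
    if missing:
        raise KeyError(f"no global average for: {missing}")
    regional_total = sum(region.values())
    global_total = sum(global_avg[occ] for occ in region)
    return regional_total / global_total


# Made-up penetration values for illustration only.
region = {"software engineer": 0.06, "data analyst": 0.03}
global_avg = {"software engineer": 0.04, "data analyst": 0.02, "teacher": 0.001}

rate = relative_skill_penetration(region, global_avg)
print(round(rate, 2))
```

With these sample inputs the region's penetration is 1.5 times the global average for the same two occupations; the third occupation is ignored because the region has no data for it.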

LinkedIn generated the metric by calculating the frequencies of users' self-added AI-related skills from 2015 to 2022.

Singapore, France, and Brazil have higher relative AI skill penetration rates than other countries, while Australia, Switzerland and the United States are close to or below the global average.

Global AI Skill Penetration Rates and AI-Related Corporate Investments in 2022

As of 2022, India, the United States, and Germany have the highest AI skill penetration rates of 3.2, 2.2, and 1.7 respectively.

The report shows that female members in India, the United States, and Israel have the highest reported relative AI skill penetration rates at 2.0, 1.3, and 0.9, respectively.

The relative AI skill penetration rate is greater for men than women across all countries in the sample.

NetBase Quid tracks trends in AI-related investments by analyzing and identifying patterns in large, unstructured datasets.

Corporate investment in AI, including mergers and acquisitions, minority stakes, private investment, and public offerings, has increased thirteenfold in the last decade.

However, for the first time since 2013, year-over-year global corporate investment in AI has decreased, totaling $189.6 billion in 2022, roughly a third lower than the previous year.

AI Investment Activities in 2022-2023

Figures 4.2.1-4.2.5 present the top AI investment events of 2022, including mergers/acquisitions, minority stakes, private investments, and public offerings.

The largest AI investment event of 2022 was the $19.8 billion Nuance Communications merger/acquisition.

The largest minority stake was for the British company Aveva Group, valued at $4.7 billion.

The greatest private AI investment event was GAC Aion New Energy Automobile, a Chinese clean energy and automotive company, valued at $2.5 billion.

The biggest public offering was ASR Microelectronics, a Chinese semiconductor company, with a valuation of $1.1 billion.

The top five AI investment activities in 2022 were concentrated in the US, the UK, China, and India, with investment focus areas including AI, enterprise software, healthcare, and cybersecurity.

Private Investment Trends in Artificial Intelligence Startups

NetBase Quid's 2022 data lists the top five AI public offerings of 2022, with entries spanning countries including Germany, Saudi Arabia, and China and focus areas such as emergency medicine, healthcare, and pharmaceuticals.

The report also indicates that private investment in AI startups that received over $1.5 million since 2013 continues to grow, with investment activity 18 times higher than in 2013.

The global private AI investment trend shows some short-term decreases but overall longer-term growth, with 3,538 AI-related private investment events in 2022, a 12% decrease from the previous year.

The United States leads the world in terms of total AI private investment, with $47.4 billion invested in 2022, followed by China with $13.4 billion and the United Kingdom with $4.4 billion.

Private AI Investment Trends in 2022

The 2023 AI Index Report, drawing on NetBase Quid data, provides a detailed breakdown of private AI investments over the last decade, categorized by funding size and geographic location.

The report reveals that the United States remains the leader in private AI investment, with a total investment of $248.9 billion since 2013, followed by China ($95.1 billion) and the United Kingdom ($18.2 billion).

However, the US and China both experienced a sharp decline in private AI investment within the last year, at 35.5% and 41.3%, respectively.

The top five American private AI investment events of 2022 include Anduril Industries, Inc., Faire Wholesale, Inc., Anthropic, PBC, Arctic Wolf Networks, Inc., and JingChi, Inc.

The report highlights the most significant private AI investments made in the EU, the UK, and China.

Newly Funded Artificial Intelligence Companies in the US, China and UK

Zhejiang Hozon New Energy Automobile Co. received $0.82 BN funding.

NetBase Quid, a data analytics firm, published a report on Top AI Private Investment Events in China and EU-UK for 2022.

The report listed Celonis, GmbH, Content Square, SAS, and Cera Care Limited among others as AI companies receiving private investments.

The report also highlighted companies that have received funding for manufacturing, clean energy, data management, and other areas.

According to the report, the US, China and UK lead in newly funded AI companies for 2022.

The US leads with 542 newly funded AI companies, while China and the UK follow with 160 and 99 respectively.

Figure 4.2.17 shows that in the last decade, the US has had 3.5 times the number of newly funded AI companies compared to China and 7.4 times compared to the UK.

Global Private Investment in Artificial Intelligence by Focus Area, 2017-2022

The report presents Chapter 4, which focuses on the economy and investment in artificial intelligence (AI).

Summed over 2013-22, the United States had 4,643 newly funded AI companies, the most of any country.

Figure 4.2.17 displays the number of newly funded AI companies by geographic area in 2022, with the United States leading at 542, followed by the European Union and the United Kingdom at 293 and China at 160.

Figure 4.2.19 compares global private AI investment by focus area in 2022 versus 2021. The focus areas that attracted the most investment were medical and healthcare, data management, processing and cloud, fintech, cybersecurity, and data protection.

In contrast, most focus areas saw a decline in investment, as shown in Figure 4.2.20.

The report provides a table of the total investment (in billions of US dollars) in different focus areas from 2017-2022, with medical and healthcare leading in 2022, followed by data management, processing, and cloud, and then fintech.

Key Trends in Investment and Technology

The AI Index Report 2023 highlights private investment in AI by focus area and geography, showing differences in investment priorities across regions.

Private investment in AI-related drone technology was nearly 53 times more in the US ($1.6 billion) than in China ($0.03 billion) in 2022.
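The "nearly 53 times" figure can be checked against the two dollar amounts quoted above. The sketch below also shows how sensitive such a ratio is when the smaller figure is rounded to two decimal places; the rounding assumption and the helper name are illustrative and are not stated in the report.

```python
# Figures quoted in the text (billions of US dollars, 2022).
US_DRONE_BN = 1.6
CN_DRONE_BN = 0.03


def ratio_bounds(numerator, denominator, num_step=0.1, den_step=0.01):
    """Return (low, point, high) for numerator / denominator.

    Assumes each value was rounded to the nearest num_step or den_step, so
    the true value lies within half a step either side. This rounding
    assumption is illustrative only.
    """
    low = (numerator - num_step / 2) / (denominator + den_step / 2)
    high = (numerator + num_step / 2) / (denominator - den_step / 2)
    return low, numerator / denominator, high


low, point, high = ratio_bounds(US_DRONE_BN, CN_DRONE_BN)
print(f"point estimate: {point:.1f}x, plausible range: {low:.0f}x to {high:.0f}x")
```

The point estimate from the rounded figures is a little over 53, consistent with the text; the wide plausible range is a reminder that ratios built on very small denominators should be read as approximate.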

Investment priorities in various technology areas and regions are also shown in tables and charts throughout the report.

Cybersecurity, data protection, fintech, retail, and facial recognition are among the technology areas with significant investment trends.

Entertainment, AR/VR, healthcare, education, and drones are other areas of growing investment interest.

The report also highlights investment priorities in NLP, geospatial, agritech, and digital marketing.

The growing importance of AI, data management, and cloud computing is reflected in rising investment in semiconductor technology for processing and networking.

Legal tech, HR tech, sales enablement, and industrial automation are other areas showing significant investment trends.

The Use of Artificial Intelligence in Corporate Activity

This section explores how corporations use artificial intelligence (AI) and highlights industry adoption trends.

The 2022 McKinsey report surveyed 1,492 participants and found that 50% of surveyed organizations have adopted AI in at least one business unit or function.

The average number of AI capabilities embedded in organizations has doubled from 1.9 in 2018 to 3.8 in 2022.

AI capabilities that organizations have embedded include recommender systems, NL text understanding, and facial recognition.

The McKinsey survey of 16 AI capabilities includes computer vision, deep learning, digital twins, GAN, and robotic process automation.

The section also examines how corporate use of AI affects the bottom line and which questions industry leaders weigh when incorporating AI technologies.

AI Use Cases and Capabilities in Various Industries

In 2022, the most commonly adopted AI use case was service operations optimization (24%), followed by the creation of new AI-based products (20%), and customer segmentation, service analytics, and product enhancement (19% each).

Among the AI capabilities embedded in at least one function or business unit, robotic process automation had the highest rate of embedding within the high tech/telecom, financial services, and business, legal, and professional services industries.

Robotic process automation (39%), computer vision (34%), NL text understanding (33%), and virtual agents (33%) were the most embedded AI technologies across all industries.

The greatest AI adoption was in risk for high tech/telecom (38%), followed by service operations for consumer goods/retail (31%) and product and/or service development for financial services (31%).

The Impact of AI Adoption on Businesses

Figure 4.3.6 shows how rates of AI adoption by industry and function have changed from 2021 to 2022, with the greatest year-over-year increases in consumer goods/ retail and high tech/telecom.

The most significant decreases in AI adoption were in product and/or service development for high tech/telecom and healthcare systems.

Organizations report cost decreases from AI adoption in supply chain management, service operations, strategy and corporate finance, and risk, with 52% of respondents reporting a cost decrease in supply chain management.

On the revenue side, marketing and sales, product and/or service development, and strategy and corporate finance saw the most significant revenue increases from AI adoption, with 70% of respondents reporting an increase in these functions.

AI Index Report 2023 - Corporate Activity and AI Adoption

Figure 4.3.7 shows the adoption rates of artificial intelligence (AI) by organizations globally, with a decrease of 6% across all geographies in 2022.

Figure 4.3.8 displays the adoption rates of AI by organizations across different regions in 2022, wherein North America led at 59%, followed by Asia-Pacific and Europe.

In 2022, respondents identified cybersecurity as the most relevant risk in adopting AI technology, followed by regulatory compliance, personal/individual privacy, and explainability.

The top three risks that organizations are taking steps to mitigate are cybersecurity, regulatory compliance, and personal/individual privacy.

Organizations have not fully addressed the risks, as there are gaps between the relevant risks and the steps taken to mitigate them.

The Impact of AI on Developer Productivity

A study by the 2022 AI Index Report reveals that cybersecurity, regulatory compliance, and personal/individual privacy are among the top areas of concern for organizations. However, there seems to be a gap between awareness of risks and steps taken to mitigate them.

In 2021, GitHub introduced Copilot, an AI tool that generates code solutions in response to natural language coding problems. The tool can also translate between programming languages.

A survey conducted by GitHub in 2022 revealed that developers who used Copilot felt more productive, satisfied, and efficient. They reported finishing tasks more quickly and being able to focus on more satisfying work.

An experiment conducted by GitHub showed that developers who used Copilot took 56% less time to complete tasks than those who did not.

The evidence suggests that AI tools such as Copilot can significantly improve productivity and workflow for developers.
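A figure such as "56% less time" is easy to misread as "56% faster". The short sketch below converts a time reduction into the equivalent speedup factor; the function name is hypothetical and the only input taken from the text is the 56% figure.

```python
def speedup_from_time_reduction(pct_less_time):
    """Convert 'takes X% less time' into a throughput multiplier.

    If a task takes (100 - X)% of the original time, the same work is
    completed 1 / (1 - X/100) times as fast.
    """
    if not 0 <= pct_less_time < 100:
        raise ValueError("pct_less_time must be in [0, 100)")
    return 1 / (1 - pct_less_time / 100)


copilot_speedup = speedup_from_time_reduction(56)
print(f"{copilot_speedup:.2f}x as fast")
```

On this arithmetic, taking 56% less time corresponds to completing the task a little more than twice as fast, provided the comparison groups are otherwise alike.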

Key Findings on Industry Leaders' Perceptions and Implementation of AI

The 2023 AI Index Report reproduces GitHub's survey results on Copilot: 60% of participants agreed or strongly agreed that they felt more fulfilled with their job when using it, and 59% that they felt less frustrated when coding.

Corporate leaders' perceptions of AI are explored in Deloitte's "State of AI in Enterprise" report, in which an overwhelming majority (94%) perceive AI as important to their organization's success.

In the same report, 82% of respondents believe that AI enhances performance and job satisfaction, while only 2% disagree.

76% of business leaders surveyed expected to increase AI investments in 2022; the main outcomes achieved by embracing AI solutions are lowered costs, improved collaboration, and valuable insights.

Main Outcomes and Challenges of AI Implementation

The Deloitte Survey for 2022 reveals the main outcomes after implementation of AI. These include customized/improved products, improved decision-making, cost reduction, new products/service models, and increased revenue.

The top three challenges in starting AI-related projects were proving business value, lack of executive commitment, and choosing the right AI technologies.

The main barriers in scaling existing AI initiatives were managing AI-related risks, obtaining more data or inputs to train models, and implementing AI technologies.

NetBase Quid analyzed all 2022 earnings calls from Fortune 500 companies and found that the number of mentions of AI-related keywords has been increasing since 2018.

Mentions of AI in Fortune 500 earnings calls were associated with themes such as customer experience improvements, process improvements, and optimization of business operations.

The Most Cited Themes in AI Mentions in Fortune 500 Earnings Calls

In 2022, the most cited themes in AI mentions from Fortune 500 earnings calls were business integration (10.0%) and pricing and inventory management (8.8%) (Figure 4.3.20).

Compared to 2018, some of the less prevalent AI-related themes in 2022 included deep learning (4.8%), autonomous vehicles (3.1%), and data storage and management (3.0%).

Business leaders cite AI and machine learning use cases to reassure business audiences of safer business practices, growing opportunities, streamlining processes, and capability expansion.

In terms of process automation, business leaders emphasize the ability of AI tools to accelerate productivity gains and to deliver a better customer experience.

Companies are spending a lot of money on cloud platforms, adding capabilities, and developing risk and fraud systems to improve customer experience and reduce losses.

Business Leaders on AI Implementation

CEO of Expedia Group, Peter Kern, credits the use of AI for cost-saving measures and efficient growth without adding more employees.

John David Rainey, CFO of Walmart, highlights the company's investment in next-gen fulfillment centers that use machine learning and robotics, resulting in improved productivity and faster delivery times.

CFO of Macy's, Adrian Mitchell, reassures business audiences that the refinement and investment in machine learning tools would improve their competitive pricing and automation at scale.

CEO of CVS Health, Karen Lynch, describes the use of AI in resolving prescription drug claims and improving overall patient experience while reducing costs.

CFO of Genuine Parts Company, Bert Nappier, uses technology and AI to forecast supply chain lead times to ensure optimal levels and positively impact gross margin.

CEO of Humana, Bruce Broussard, highlights the efficiency of utilizing in-house AI to match incoming faxes to the correct authorization requests and scaling the solution to improve authorization turnaround times.

Business leaders across various sectors see opportunities in software and analytics utilizing high ROI solutions and data, AI models, and workflow capabilities.

Trends in AI Sentiment and Industrial Robot Installations

NetBase Quid's sentiment analysis machine-learning algorithm identifies positive, negative, and mixed sentiment associated with AI mentions in Fortune 500 earnings calls from 2018-22, with overwhelming positivity towards AI tools.

The 2023 AI Index Report highlights trends in the performance of sentiment analysis algorithms in Chapter 2, with Chapter 4 previewing corporate activity from Q2 2020 to Q3 2022.

Data from the International Federation of Robotics (IFR) World Robotics Report, which tracks global installations of robots, shows that 517,000 industrial robots were installed in 2021, a 31.3% increase from 2020 and 211.5% increase since 2011.

The operational stock of industrial robots continues to steadily increase year over year, with the total number of robots reaching 3,477,000 in 2021 from 3,035,000 in 2020.
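The growth figures above can be cross-checked with basic percentage arithmetic. The sketch below derives the year-over-year growth of the operational stock and the installation counts implied for 2020 and 2011; the helper names are hypothetical, and the implied counts are back-calculated estimates rather than values stated in the report.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


def implied_base(new, pct_increase):
    """Earlier value implied by a later value and a stated percentage increase."""
    return new / (1 + pct_increase / 100)


# Operational stock figures quoted in the text.
stock_growth = pct_change(3_035_000, 3_477_000)

# 517,000 installations in 2021, stated as +31.3% on 2020 and +211.5% on 2011.
installs_2020 = implied_base(517_000, 31.3)
installs_2011 = implied_base(517_000, 211.5)

print(f"operational stock growth 2020-21: {stock_growth:.1f}%")
print(f"implied 2020 installations: about {installs_2020:,.0f}")
print(f"implied 2011 installations: about {installs_2011:,.0f}")
```

The operational stock grew by roughly 14.6% between 2020 and 2021, noticeably slower than the 31.3% jump in new installations, which is expected because the stock also reflects robots retired from service.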

Industrial Robot Installations

The number of industrial robots being installed and used has steadily increased in the last decade.

In 2017, only 2.8% of all newly installed industrial robots were collaborative, but in 2021, that number increased to 7.5%.

Traditional robots work for humans, while collaborative robots are designed to work with humans and are safer, more flexible, and more scalable.

China installed the most industrial robots in 2021, followed by Japan, the United States, South Korea, and Germany.

China has installed more industrial robots than any other country every year since 2013, and accounted for 51.8% of installations worldwide in 2021.

The Rise of Industrial and Service Robotics in Different Countries

In 2021, China installed more industrial robots than the rest of the world combined, solidifying its dominance in the robotics industry. Other countries with high robot installations in 2020 and 2021 include Japan, the United States, South Korea, and Germany.

Most countries surveyed by the International Federation of Robotics (IFR) reported an increase in the total number of industrial robot installations from 2020 to 2021. Canada, Italy, and Mexico reported the highest growth rates of 66%, 65%, and 61%, respectively.

Service robotics are becoming popular in different countries, with a higher number of professional robots installed in 2021 than in 2020. Key application areas include hospitality, medical robotics, professional cleaning, and transportation and logistics. In 2021, transportation and logistics registered the greatest year-over-year increase, with 1.5 times the number of service robots installed compared to 2020.

Trends in Professional Service and Industrial Robotics

The International Federation of Robotics (IFR) released its 2022 report, which mentions the number of professional service robot manufacturers in top countries like the United States, China, Germany, Japan, and France in 2022.

Electrical/electronics and automotive are the sectors that saw the largest number of industrial robot installations globally since 2019, according to the IFR report in 2022.

The application of industrial robots has changed since 2021, where handling continues to be the top choice, followed by welding and assembling, as reported by IFR in 2022.

Industrial robot installations in China and the United States

The Chinese electrical/electronics sector installed the highest number of industrial robots in 2021 (88,000), followed by automotive (62,000) and metal and machinery (34,000).

In China, every industrial sector had more robot installations in 2021 than in 2019.

The automotive industry had the highest robot installations in the US in 2021, though installation rates for that sector decreased YoY.

Other US sectors, such as food and plastic/chemical products, saw YoY increases in robot installations.

Trends in AI Education at Postsecondary and K-12 Levels

AI-related education is expanding at the K-12 level as AI technologies become more widespread globally.

The proportion of new computer science PhD graduates specializing in AI increased from 10.2% in 2010 to 19.1% in 2021.

Since 2011, there has been a shift to AI PhD graduates accepting jobs in the industry, with 65.4% of AI PhDs in 2021 opting for industry jobs.

Private U.S. CS departments receive significantly more external research funding than public universities ($9.7 million to $5.7 million).

In the last decade, the total number of new North American CS, CE, and information faculty hires has decreased.

Interest in K-12 AI and computer science education is growing worldwide, with 181,040 AP computer science exams taken by American students in 2021.

Trends in New CS Graduates in North America

In 2021, the number of new North American CS bachelor's graduates was 33,059, almost four times greater than in 2012.

The proportion of international CS bachelor's graduates in North America has steadily increased and was 16.3% in 2021.

The number of new CS master's graduates in North America increased twofold from 2012 to 2021, but plateaued from 2018 to 2021.

The majority of CS master's graduates in North America remained international, with 65.2% of new graduates in 2021.

Interestingly, the number of international CS master's students in North America declined in 2016 after steadily rising in the early 2010s.

Trends in Postsecondary Computer Science Education

Since 2010, there has not been a significant increase in the number of new PhD graduates in computer science in North America.

The number of CS PhD graduates decreased in 2021 compared to 2020 and 2012.

The proportion of international CS PhD graduates in North America has been increasing since 2010, reaching 68.6% in 2021.

A higher percentage of new CS PhD students are specializing in AI, with 19.1% in 2021, a significant increase since 2012.

An increasing number of AI PhD graduates are opting to work in industry rather than academia, with 65.4% in 2021 compared to 41.6% in 2011.

The number of new AI PhD graduates entering government has remained relatively unchanged.

Employment and Faculty Trends in AI and CS Education

New AI PhD graduates in North America were employed as follows in 2021: 65.44% in industry, 28.19% in academia, and 0.67% in government.

A subset of new AI PhDs go unaccounted for in the CRA survey as they become self-employed, unemployed, or report "other" employment statuses.
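
The size of that unaccounted subset can be estimated from the three shares quoted above. The sketch below is illustrative only: it assumes the three sectors and the unaccounted group together make up 100% of new AI PhD graduates, which the notes imply but do not state outright.

```python
# Shares of new AI PhD graduates by sector in 2021, as quoted above (percent).
shares = {"industry": 65.44, "academia": 28.19, "government": 0.67}

# Assumption: whatever is not in these three sectors is the unaccounted group.
accounted = sum(shares.values())
unaccounted = 100.0 - accounted

print(f"Accounted for by the three sectors: {accounted:.2f}%")
print(f"Remainder (self-employed, unemployed, other): {unaccounted:.2f}%")
```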

The AI Index Report states that including data on computer science faculty with postsecondary students better highlights trends, with faculty numbers growing by 32.8% since 2011, according to the CRA Taulbee Survey.

Figure 5.1.10 shows the total number of CS, CE, and information faculty in North American universities, having marginally increased by 2.2% in the last year.

In 2021, there were 6,789 CS faculty members in the United States, marking a 39.0% increase in numbers since 2011.

Figure 5.1.12 reports the total number of new CS, CE and information faculty hires in North America.

Decrease in New Faculty Hires for CS, CE, and Information Departments in North America

The number of new faculty hires in CS, CE, and Information departments in North America has decreased from 733 in 2012 to 710 in 2021.

The total number of tenure-track hires peaked in 2019 at 422 and declined to 324 in 2021.

In 2021, 40% of new hires came straight from receiving a PhD, while only 11% came from industry.

The share of filled new faculty positions in North American universities has remained stable, with 89.3% of new positions being filled in 2021 compared to 82.7% in 2011.

Among open faculty positions, the most common reason for remaining unfilled was offers being turned down (53%), followed by hiring still in progress (22%) and difficulty finding a candidate who met hiring goals (14%).

Trends in CS, CE, and Information Faculty Hiring and Losses in North American Departments from 2015-2021

Figure 5.1.15 in Chapter 5 of the AI Index Report shows percentages of hiring progress among CS, CE, and Information faculty from 2015 to 2021.

Figure 5.1.16 shows the median nine-month salaries of CS faculty in the United States by position since 2015. During that period, salaries for all classes of professors have increased.

In 2021, the median nine-month salary reported for CS faculty in the United States was $170.57K, an increase from 2015.

Only 13.2% of new CS, CE, and Information faculty hires were international in 2021, according to Figure 5.1.17.

The largest share of faculty losses in North American departments (36.3%) was due to faculty taking academic positions elsewhere, per Figure 5.1.18. In 2021, 15.2% of departing faculty took nonacademic positions, roughly the same share as in 2011 (15.9%).

Analysis of Funding Sources for US CS Departments

The National Science Foundation (NSF) is the main funder of US CS departments and accounted for 34.9% of external funds in 2021, although its share has decreased since 2003.

Defense agencies such as the Army Research Office, the Office of Naval Research, and the Air Force Research Laboratory are the second-largest sources of funding (20.3%); followed by industrial sources (12.1%); the Defense Advanced Research Projects Agency (DARPA) (8.8%); and the National Institutes of Health (NIH) (6.8%).

The decrease in NSF funds has been partially compensated by increasing funds from industry and NIH.

Private universities spend more on computing research than public universities with median total expenditure of $9.7 million compared to $5.7 million (2011-2021).

Total median expenditures have increased for both private and public universities, but the gap in expenditure has widened over the decade. The next subsection covers the state of K-12 CS education in the United States.

State of K-12 AI Education in America: Tracking Trends

Figure 5.2.1 highlights 27 states that in 2022 required that all high schools offer a computer science course.

Figure 5.2.2 shows that Maryland, South Carolina, and Arkansas have the highest percentage of public high schools that teach computer science.

Public high schools teaching computer science in each state are presented in a table, with Michigan having 46%, Indiana having 85%, and Kentucky having 63%, among others.

The total number of AP computer science exams taken increased year over year, with 181,040 AP computer science exams taken in 2021.

The number of AP computer science exams taken has increased over ninefold since 2007, but the pandemic may have caused a leveling off in the numbers.

There are two types of AP computer science exams: Computer Science A and Computer Science Principles.

AP Computer Science Exam Data for 2021

California had the highest number of AP computer science exams taken in 2021 with 31,189, followed by Texas, Florida, New York, and New Jersey.

Maryland had the highest per capita number of AP CS exams taken in 2021, with 124.1 exams per 100,000 inhabitants, followed by New Jersey, Connecticut, California, and Massachusetts.

Figure 5.2.5 normalizes the number of AP CS exams taken by dividing the total number of exams taken in a state by the state's population based on the 2021 U.S. Census.
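
The normalization described above amounts to a single division, scaled to a rate per 100,000 inhabitants. The following is a minimal Python sketch; the function name `exams_per_100k` and the sample inputs are hypothetical and are not figures from the report.

```python
def exams_per_100k(exams: int, population: int) -> float:
    """Return exams taken per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return exams / population * 100_000


# Hypothetical inputs for illustration only; not taken from the report.
example_exams = 5_000
example_population = 4_000_000

rate = exams_per_100k(example_exams, example_population)
print(f"{rate:.1f} exams per 100,000 inhabitants")
```

Normalizing this way lets a small state with high participation be compared against a large state with a higher raw count.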

UNESCO released a report in 2021 on the international state of government-endorsed AI curricula, which included surveys to representatives of 193 UNESCO member states and over 10,000 private- and third-sector actors. Respondents were asked to report on the status of AI curricula for students in K-12 general education.

Government Implementation of AI Curricula by Country, Status, and Education Level

Drawing on the UNESCO report, the 2023 AI Index Report provides a table that identifies which countries have endorsed and implemented AI curricula and at which education levels.

Germany is in the process of developing AI curricular standards for primary, middle, and high-school levels while China has already implemented AI standards for the same levels.

Serbia has both implemented and is in the process of developing K-12 AI curricula according to the report.

A chart in the report shows the topic areas most emphasized in K-12 AI curricula: algorithms and programming, AI technologies, data literacy, and application of AI to other domains.

The report includes a sample K-12 AI curriculum, the Austrian Data Science and Artificial Intelligence curriculum. It covers digital basics, including safe digital media use and careers in ICT, and engages students with programming languages, algorithms, simulations, and data literacy.

Education and Policy on Artificial Intelligence

Students learn about digital media and the ethical dilemmas associated with technology use.

They are encouraged to actively participate in social discourse on such issues.

The AI Index Report 2023 refers to the significance of AI governance and the need for intergovernmental cooperation in strategizing its governance.

Chapter 6 of the report explores the policies and governance of AI technologies globally.

The US leads in passing AI-related laws, and the number of bills mentioning AI that were passed into law across the countries analyzed rose from just 1 in 2016 to 37 in 2022.

The chapter also analyzes the investment and legal cases related to AI in the US.

Increase in AI-related legislation and policy proposals globally

Analysis of parliamentary records shows a nearly 6.5 times increase in mentions of AI in global legislative proceedings since 2016.

The US passed more AI bills than ever before, with 2% of all federal AI bills passed in 2021 and 10% in 2022. Likewise, 35% of all state-level AI bills were passed into law in 2022.

The US government has increased spending on AI-related contracts by roughly 2.5 times since 2017.

Policymakers worldwide have varied perspectives on AI, with discussions on AI-led automation risks in the UK, safeguarding human rights in Japan, and using AI for weather forecasting in Zambia.

The legal world has seen a seven times increase in AI-related legal cases in US state and federal courts since 2016, with most originating in California, New York, and Illinois.

Since 2016, 31 out of 127 analyzed countries have passed at least one AI-related bill, with a total of 123 AI-related bills passed from 2016 to 2022.

Overview of AI-Related Bills Passed in Select Countries

Note that the analysis of passed AI policies may undercount the number of actual bills due to the presence of large bills including multiple sub-bills in relation to AI.

The full list of countries analyzed for AI policies is in the Appendix, but publicly accessible legislative databases were not available for certain countries.

Figure 6.1.1 shows the number of AI-related bills passed into law in 127 select countries from 2016-22.

Figure 6.1.2 breaks down the number of laws containing mentions of AI that were enacted in 2022, with the US leading the list with 9 laws.

Figure 6.1.4 shows the total number of laws passed since 2016, with the US again leading with 22 bills.

Chapter 6 also covers a closer look at AI-related legislation passed into law during 2022 from select countries, with Figure 6.1.5 displaying five different countries' laws.

The State of Artificial Intelligence Legislation in the US

A new act has been created to review, assess, and evaluate the state of Philippine education and recommend innovative policy reforms in education to meet the challenges of the Fourth Industrial Revolution and rapid development of artificial intelligence.

Another act requires artificial intelligence algorithms involved in public administrations' decision-making to take bias-minimization criteria, transparency, and accountability into account, whenever technically feasible.

The Office of Management and Budget is required to establish or provide an AI training program for the acquisition workforce of executive agencies to ensure that they have knowledge of the capabilities and risks associated with AI.

The number of proposed federal bills related to AI has sharply increased in the US, from just one in 2015 to 134 in 2021.

The number of passed federal bills related to AI increased to 9 in 2022.

Among states in 2022, California leads the list with 5 laws containing mentions of AI, followed by Maryland with 3.

Maryland has passed the most state-level AI-related bills with 7 bills, followed by California, Massachusetts, and Washington.

Figure 6.1.9 highlights the number of state-level AI-related bills passed by all states since 2016.

Analysis of State-Level AI Legislation in the United States

The AI Index report shows a significant spike in the number of AI-related bills proposed at the state level in 2022, totaling 60 bills compared to just five in 2015.

Out of the proposed bills, 21 were passed into law, indicating a growing interest in AI policy making and governance.

Figure 6.1.10 shows the number of state-level AI-related bills proposed and passed from 2015 to 2022.

The subsection highlights AI legislation passed in Alabama, California, Maryland, New Jersey, and Vermont in 2022, including bills to limit the use of facial recognition and regulate the use of AI in state government.

The Role of Artificial Intelligence in Legislative Proceedings and Government Policies

An appropriations bill for the 2022-23 fiscal year grants California State University, Sacramento, $1.3 million to improve the campus childcare center, including the development of an AI mixed-reality classroom.

Another act establishes that the Department of Natural Resources must study and assess the potential for AI and machine learning to aid Chesapeake Bay restoration and climate solutions.

A provision concerning the modernization of state government sites requires the chief technology officer to evaluate annually the feasibility of state agencies employing AI and machine learning to offer public services.

The Act creates the Division of Artificial Intelligence to oversee all AI aspects in state government, including developing a state code of ethics on AI use and making policy/ regulatory recommendations to the General Assembly.

The AI Index report analyzes mentions of AI in government proceedings in 81 countries, revealing that mentions of AI in legislative proceedings slightly decreased from 2021 to 2022, from 1,547 to 1,340.

Spain had the highest number of mentions of AI in legislative proceedings in 2022, followed by Canada, the United Kingdom, and the United States.

Mentions of AI in Global Policy and Governance

Figure 6.1.14 from the AI Index Report 2023 shows a total of 81 countries with at least one mention of AI in legislative proceedings in the past seven years. The UK had the most mentions (1,092), followed by Spain, the US, Japan, and Hong Kong.

The subsection on global AI mentions in the AI Index Report 2023 examines mentions of AI in government proceedings of various countries in 2022.

Figure 6.1.15 in the AI Index Report 2023 quotes discussions from geographically diverse countries, such as Australia, Brazil, Japan, the United Kingdom, and Zambia.

The parliamentary mentions of AI from select countries in 2022 are listed in the AI Index Report 2023. Each mention contains the name and political affiliation of the speaker, and their role in government proceedings.

The quotes reveal the different perspectives on AI in relation to industry and manufacturing, the future of work, and human rights.

The Role of Artificial Intelligence in Legislation and Governance

A need for constitutional arguments to guarantee individual autonomy and data protection in the digital age is emphasized.

Concerns about potential over-automation, discrimination, and regulation in the use of AI are raised.

The government partners with the University of Zambia to develop a seasonal weather forecasting system with AI support.

The National Reconstruction Fund Corporation Bill 2022 and amendments to the Consolidation of Labor Laws are presented for discussion.

The Commission on the Constitution and Financial Services and Markets Bill consider policies related to AI.

The number of mentions of AI in committee reports produced by the House and Senate has increased significantly since the 115th legislative session.

The Appropriations and Homeland Security and Governmental Affairs Committees lead the mentions of AI in U.S. House and Senate reports for the 117th Congressional Session, respectively.

Mentions of Artificial Intelligence (AI) in U.S. Congressional Reports and Policy Papers

The AI Index Report tracks the number of mentions of AI in committee reports from the past 10 congressional sessions, with the House and Senate Appropriations Committees leading.

The report includes a list of U.S.-based organizations that published policy papers related to AI, including think tanks and policy institutes; university institutes and research programs; civil society organizations, associations, and consortiums; industry and consultancy organizations; and government agencies.

The policy papers are categorized into primary and secondary topics, where the primary topic is the main focus of the paper, and the secondary topic is a subtopic of the paper.

The AI Index Report presents the total number of policy papers published by U.S.-based organizations related to AI from 2018 to 2022.

The report provides a preview of Chapter 6, which discusses AI and policymaking, including the number of mentions of AI in committee reports of the U.S. Congress.

Overview of AI-related policy and governance

Figure 6.1.21 shows an increase in the total number of U.S.-based AI-related policy papers from 2018 to 2022, with the most frequent primary topics in 2022 being industry and regulation, innovation and technology, and government and public administration.

Social and behavioral sciences, humanities, and communications and media received comparatively little attention in AI-related policy papers by U.S.-based organizations.

Figure 6.2.1 illustrates that a total of 62 national AI strategies have been developed worldwide since March 2017, with the number of released strategies peaking in 2019.

Figure 6.2.2 highlights the countries that have either released or developed a national AI strategy as of December 2022, while Figure 6.2.3 enumerates the countries that pledged to develop an AI strategy in 2021 and 2022, with Italy and Thailand being the only nations to release national AI strategies in 2022.

Countries with National AI Strategies and Public Investment in AI in the US

The AI Index lists the yearly release of national AI strategies by country from 2017 to 2022, with Italy and Thailand joining the list in 2022, and Table 6.2 shows the countries with national AI strategies from 2017 to 2022.

The AI Index research team admitted that they might have missed some strategies, and Figure 6.2.1 shows the countries with national AI strategies in development from 2021 to 2022.

The US public investment in AI for nondefense R&D is examined in section 6.3 of the AI Index Report, and a report by the National Science and Technology Council reveals that the total amount allocated to AI R&D spending from nondefense US government agencies in FY2022 is $1.7 billion.

The report specifies that the budget does not include classified AI R&D investment by defense and intelligence agencies, and there is a request for $1.8 billion in FY2023. The previous year's total was $1.53 billion, but it increased to $1.75 billion in 2022, according to reports.

The AI Index Report 2023 is a resource that measures the progress, impact, and worldwide landscape of AI.

US Government Spending on Artificial Intelligence

The US Department of Defense (DoD) requested $1.1 billion for research, development, test and evaluation of artificial intelligence for FY 2023, a 26.4% increase from funding received in FY 2022.

Govini analyzed federal contracts data using machine learning and natural language processing, revealing that total US government spending on AI has increased nearly 2.5 times since 2017.

Decision science ($1.2 billion) and computer vision ($0.8 billion) received the greatest amounts of government spending on AI in 2022. (Figure 6.3.3)

Govini reported increased spending on decision science, computer vision, and autonomy while spending on machine learning and natural language processing slightly dropped. (Figure 6.3.4)

U.S. Public Investment and Legal Cases in AI

In FY 2022, federal AI contracts were mostly prime contracts, followed by grants and OTA awards.

The share of contracts remained about the same while the share of grants increased from FY 2021 to FY 2022.

From FY 2017-22, the total value awarded for AI/ML and Autonomy by the U.S. government was $2.05 billion for contracts, $1.15 billion for grants, and $0.09 billion for OTAs.

In 2022, AI Index partnered with Elif Kiesow Cortez to research AI-related legal cases from 2000 to 2022 and found a sharp spike in jurisprudence.

Total AI-related legal cases in the U.S. federal and state courts in 2022 were 110, 6.5 times more than in 2016.

Most AI-related legal cases in 2022 originated in California, Illinois, and New York.

Large businesses that have integrated AI are based in California and New York, which may explain their higher number of AI legal cases.

AI-Related Legal Cases in the United States

Illinois has seen an increase in AI-related legal cases due to the Biometric Information Privacy Act.

Figures 6.4.2 and 6.4.3 provide information on AI-related legal cases by state and district.

Financial and professional services had the greatest number of AI-related legal cases in the US in 2022.

Civil law accounted for the largest proportion of AI-related legal cases, followed by intellectual property and contract law.

The section highlights three significant AI-related legal cases in the US.

The Duerr v. Bradley University case is among those profiled, illustrating the legal issues raised when AI is brought into the courts.

The Use of AI Technologies in Legal Cases

A private university in Illinois required the use of Respondus Monitor, an AI-powered proctoring tool for online exams during the fall 2020 semester.

The plaintiffs in the case Duerr v. Bradley University claimed that the defendants violated Illinois' BIPA by not following its guidelines for the collection of biometric information.

BIPA does not apply to financial institutions, and under the Gramm-Leach-Bliley Act, the defendants were considered a financial institution, resulting in the plaintiffs' case being dismissed.

Northpointe, Inc. petitioned to prevent the disclosure of confidential information about its COMPAS AI tool during court proceedings in the case Flores v. Stanford.

The court ruled in favor of releasing the material under a protective order as it was relevant to the plaintiff's case and posed little risk of competitive injury.

In Dyroff v. Ultimate Software Grp., Inc, the plaintiff claimed that the defendant's use of algorithms to recommend drug-related discussion groups on its social network site was responsible for her son's overdose.

The court ruled in favor of the defendant, noting that the use of algorithms did not constitute the creation of novel content and allowing partial immunity under the Communications Decency Act.

AI Index Report 2023 Highlights Diversity in AI, Primarily from Academia

AI systems are increasingly deployed in the real world, creating a disparity between those who develop and those who use AI, with North American AI researchers and practitioners predominantly white and male.

Chapter 7 highlights data on diversity trends in AI sourced primarily from academia, including organizations like Women in Machine Learning (WiML) and the Computing Research Association (CRA).

Diversity trends in AI education, including North American bachelor's, master's, and PhD-level computer science students becoming more ethnically diverse, are also covered.

Publicly available demographic data on trends in AI diversity is sparse, so this chapter does not cover other areas of diversity, such as sexual orientation.

The AI Index hopes that, as AI becomes more ubiquitous, the amount of data on diversity in the field will increase, allowing for more thorough coverage in future reports.

Women's Representation in AI Education and Faculty

Since 2017, the proportion of new female CS, CE, and information faculty hires has increased from 24.9% to 30.2%. However, most faculty in North American universities are male (75.9%), and only 0.1% identify as nonbinary.

In 2021, 78.7% of new AI PhDs were male, and gender imbalance persists in higher-level AI education.

American K-12 computer science education has become more diverse, with more female and ethnic minority students taking AP computer science exams.

The Women in Machine Learning (WiML) NeurIPS Workshop, founded in 2006, aims to increase the impact of women in machine learning. Since 2020, the Un-Workshop has been fostering collaboration among participants from diverse backgrounds at the International Conference on Machine Learning (ICML).

From 2010 to 2022, the number of WiML workshop participants increased from 92 to 1,157. However, the recent decrease in attendance may be due to the overall drop in NeurIPS attendance, partly caused by the shift away from a purely virtual format.

Figure 7.1.2 breaks down the continent of residence of the 2022 workshop participants.

Demographic Data on NeurIPS Women in Machine Learning Workshop Participants

41.5% of survey respondents at the NeurIPS Women in Machine Learning Workshop were from North America, followed by Europe (34.2%), Asia (17.1%), and Africa (3.4%).

There was greater representation from Europe, Asia, and South America in 2022.

The largest share of participants were female-identifying (37%), followed by male-identifying (25.8%) and nonbinary-identifying (0.5%).

Most participants were PhD students (49.4%), followed by research scientists/ data scientists (20.8%), software engineers/data engineers (8.4%), and faculty (4.4%).

The most popular submission topics at the workshop were applications (32.5%), algorithms (23.4%), and deep learning (14.8%).

The proportion of female CS bachelor's graduates rose to 22.3% from 2020 to 2021 according to the Computing Research Association's (CRA) annual Taulbee Survey.

Ethnicity and Gender of CS Graduates in North America

The proportion of women among CS bachelor's graduates has increased in the last decade; in 2021, 77.66% of graduates were male, 22.30% were female, and 0.04% identified as nonbinary/other.

The CRA survey reports ethnicity only for domestic CS students and faculty; ethnicity data for nonresident aliens is excluded.

In North America, the top ethnicity of new CS bachelor's graduates in 2021 was white (46.7%), followed by Asian (34.0%), Hispanic (10.9%), and multiracial (4.1%).

The proportion of female CS master's graduates has slightly increased over time, moving to 27.8% in 2021 from 24.6% in 2011, and 0.9% identified as nonbinary/other in 2021.

Of domestic students, white, Asian, and Hispanic ethnicities are most represented in CS.

Ethnic and gender demographics of CS master's and PhD graduates

White students have accounted for a decreasing proportion of new CS master's graduates in the last decade.

In 2021, 65.2% of new CS master's graduates were nonresident aliens.

Asian students represent the highest proportion of CS PhD graduates, followed by white, Hispanic, and Black or African-American students.

Most new CS PhD graduates continue to be male, with a large gap between male and female graduates.

The proportion of new white resident CS PhD graduates declined by 9.4 percentage points between 2011 and 2021.

Only a small proportion of CS, CE, and Information students reported needing disability accommodations.

Demographic Trends in Artificial Intelligence Education and Faculty

Figure 7.2.8 shows that 78.7% of new AI PhDs in 2021 were male and 21.3% were female, with no significant gender trend in the last decade.

Most CS, CE, and information faculty members in North America are male (75.9%), while female faculty members have increased by 5 percentage points since 2011 (Figure 7.2.9).

The proportion of women among new CS, CE, and information faculty hires increased by 9 percentage points since 2015, reaching 30.2% in 2021 (Figure 7.2.10).

White faculty members make up the majority (58.1%) of resident CS, CE, and information faculty in 2021, followed by Asian faculty members (29.7%) (Figure 7.2.11).

The ethnic gap between white faculty members and faculty members of the next nearest ethnicity has decreased from 46.1 percentage points in 2011 to 28.4 percentage points in 2021.
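
The 2021 gap is a difference between two shares, so it is measured in percentage points rather than percent. The short Python sketch below reproduces it from the shares quoted above; it assumes the next nearest ethnicity is Asian, as the preceding figures suggest, and the variable names are illustrative.

```python
# Shares of resident CS, CE, and information faculty in 2021, as quoted above.
white_share_2021 = 58.1
asian_share_2021 = 29.7  # assumed to be the "next nearest" group

# A difference between two percentages is expressed in percentage points.
gap_2021 = white_share_2021 - asian_share_2021

# 2011 gap as quoted in the text above.
gap_2011 = 46.1
narrowing = gap_2011 - gap_2021

print(f"2021 gap: {gap_2021:.1f} percentage points")
print(f"Narrowing since 2011: {narrowing:.1f} percentage points")
```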

Diversity in Computer Science Education

In 2021, 6.7% of CS, CE, and information faculty in North America were nonresident aliens.

The ethnic distribution of resident CS, CE, and information faculty in 2021 was White (58.08%), Asian (29.70%), Hispanic (2.54%), African-American (0.67%), Multiracial (0.25%), and American Indian or Alaska Native (0.13%), alongside a Native Hawaiian or Pacific Islander category.

The proportion of female students taking AP computer science exams has almost doubled in the last decade, with 30.6% female and 69.2% male students taking the exam in 2021.

On a percent basis, the states with the largest share of female AP computer science test-takers in 2021 were Alabama (36%), Washington, D.C. (36%), Nevada (35%), Louisiana (35%), Tennessee (35%), Maryland (35%), and New York (35%).

Other states with notable CS and AI activity include California, Texas, and Washington, with rates of women taking AP computer science tests hovering around 30%.

Ethnicity Trends in AP Computer Science Test-Takers

White students took the highest proportion of AP computer science exams in 2021 (42.7%), followed by Asian (28.8%) and Hispanic/Latino/Latina students (16.5%).

Over time, the pool of AP computer science test-takers is becoming increasingly ethnically diverse, with more Asian, Hispanic/Latino/Latina, and Black/African American students taking exams.

The AI Index Report 2023 highlights the potential of AI to transform society, and understanding public opinion is crucial for its development, regulation, and use.

The chapter examines global, national, demographic, and ethnic opinions of AI, including AI researchers' views, and analyses social media discussions around AI's impact in 2022.

Longitudinal survey data related to AI is scarce, and two global surveys by IPSOS and Lloyd's Register Foundation and Gallup, along with a US-specific survey by PEW Research, inform the chapter's insights.

Public Opinion on Artificial Intelligence

Chinese citizens have the most positive attitudes towards AI products, according to a 2022 IPSOS survey.

People across the world, especially in America, remain unconvinced by self-driving cars.

Men tend to feel more positively about AI products and services than women, and are more likely to believe that AI will mostly help rather than harm.

Different demographics have different causes for excitement and concern about AI products and services.

NLP researchers have strong opinions on AI, such as private AI firms having too much influence and AI potentially leading to societal change.

Public perceptions concerning AI differ across countries and by demographic groups.

Global Attitudes Toward AI Products and Services

IPSOS conducted a survey on the opinions of 19,504 adults aged 16-74 in 28 countries regarding AI.

The survey results suggest that 60% believe AI products and services will change their daily lives and make them easier.

52% feel that AI products and services have more benefits than drawbacks, and only 40% feel nervous about them.

Opinions vary widely across countries, with China, Saudi Arabia, and India having the most positive views, while France and Canada have the most negative views.

Figure 8.1.3 in the report breaks down answers to all questions by country.

Sentiment Analysis of Artificial Intelligence Products and Services

Chinese respondents have the most positive sentiment towards AI products and services.

87% of Chinese respondents claim that AI products and services make their lives easier.

76% of Chinese respondents reported trusting opinions about AI.

American respondents are among the most negative towards AI products and services.

Only 41% of American respondents claim that AI products and services make their lives easier.

Furthermore, only 35% reported trusting AI companies as much as other companies.

52% of American respondents report that AI products and services make them nervous.

60% of all respondents believe that AI will profoundly change their daily lives in the next 3-5 years.

47% of respondents know which types of products and services use AI.

55% of respondents trust companies that use AI as much as other companies.

Public Opinion on Artificial Intelligence in Different Countries

Figure 8.1.3 in the AI Index Report 2023 breaks down opinions on AI in all countries across different demographic groups.

IPSOS survey results suggest that men feel more positively about AI products and services compared to women.

Age-specific opinions vary, with individuals under 35 being more likely to report that AI products and services make their lives easier.

Higher-income households are more positive about AI products and services.

Figure 8.1.4 provides a detailed breakdown of opinions about AI by demographic group.

In 2021, a poll by Lloyd's Register Foundation and Gallup of 125,911 people across 121 countries found that opinions about whether AI will mostly help or harm people in the next 20 years vary globally (Figure 8.1.5).

Global Perceptions of the Impact of Artificial Intelligence (AI)

According to the Lloyd's-Gallup poll, 39% of respondents believe that AI will mostly help people in the next 20 years, while 28% believe it will mostly harm.

Men in the survey were more likely than women to believe that AI will mostly benefit people.

The majority of respondents (65%) reported feeling unsafe in self-driving cars, according to the Lloyd's Register survey.

The regions with the most optimistic perceptions of AI's impact are Eastern Asia, Northern/Western Europe, and Southern Europe.

Conversely, the regions with the most pessimistic views of AI's potential benefits include Eastern Africa, Northern Africa, and Southern Africa.

In a 2022 survey by Pew Research, Americans were found to have mixed opinions on AI, with 47% seeing it as a positive development and 44% expressing worries.

Americans' Perspectives on Artificial Intelligence

A survey of 10,260 Americans revealed that 45% of respondents feel equally concerned and excited about the use of AI programs in daily life, while only 18% feel more excited than concerned.

Americans’ most preferred AI applications include performing household chores (57%) and repetitive workplace tasks (46%). They are also excited about AI being used to diagnose medical problems (40%).

The survey also revealed that 74% of Americans are concerned about AI being used to make important life decisions for people, and 75% about AI being used to know people's thoughts and behaviors.

Americans believe the use of facial recognition technology by police (46%) and social media companies using AI to find false information on their sites (40%) are good ideas for society.

Loss of human jobs (19%) and surveillance (17%) are the main reasons why some Americans reported being more concerned than excited about AI.

Americans' Attitudes towards Artificial Intelligence

Americans are concerned about the potential loss of human jobs (20%), surveillance, hacking, and digital privacy (16%), and the lack of human connection (12%) due to AI.

Despite concerns, Americans reported being less worried about the potential loss of freedom and issues related to lack of oversight and regulation.

The potential to make life better (31%) and save time (13%) are the two leading reasons that Americans are excited about AI.

Respondents felt that AI systems most reflected the experiences and views of men (51%) and white adults (48%) over other groups.

There are concerns about the power of AI becoming too great and people misusing it, as well as the unforeseen consequences and effects.

While AI is seen as a way to handle mundane or dangerous tasks, it is also feared that people may become too reliant on it and lose important human qualities.

There is also concern about human bias being coded into AI, as well as concerns about government and tech companies using it.

Some respondents based their fears on sci-fi rather than reality, while others shared personal anecdotes about the benefits and drawbacks to AI.

Attitudes of the NLP Research Community towards AI

Respondents considered experiences and views of Asian, Black, and Hispanic adults less positively than those of white adults.

A survey of 480 NLP researchers was conducted in May-June 2022 on the state of the NLP field, AGI, and ethics, among others.

Private firms were considered to have too much influence by 77% of respondents, while 86% believed industry would produce the most widely cited research.

67% of respondents agreed that most of NLP was dubious science, and 30% believed an "NLP winter" was coming in the next decade.

51% agreed that LM systems understood language, with 67% believing multimodal models understood language.

89% of respondents felt NLP's past net impact was positive, and 87% believed its future impact would continue to be good.

The Ethical Divide in Using AI for Predicting Psychological Characteristics

A survey found that 48% of the community members feel that predicting psychological characteristics using AI is unethical.

60% of the researchers feel that the carbon footprint of AI is a major concern, but only 41% believe that NLP should be regulated.

While most researchers believe that AI could lead to revolutionary changes (73%), only 36% feel that AI decisions could cause nuclear-level catastrophe.

Researchers opined that there is too much focus on benchmarks (88%), and more interdisciplinary work should be done (82%).

57% of the researchers feel that recent progress is leading the AI community towards Artificial General Intelligence (AGI).

Ethical concerns mostly reduce to data quality and model accuracy, as per the NLP community.

Public Opinion on AI in 2022-2023

Only 17% of the NLP community agreed or weakly agreed that scaling solves any important problem, with 50% emphasizing the importance of linguistic structure.

Linguistic structure and expert inductive biases are considered necessary according to the NLP community in 2022.

The NetBase Quid platform analyzed social media posts related to various AI models and releases, indicating AlphaCode was the most positively received, with users embracing its practical use cases.

ChatGPT, a conversational AI model, has generated mixed sentiment on social media due to concerns around its ethical principles, political biases, and cultural beliefs.

Analysis of Social Media Sentiment on AI Models

GLM-130B's licensing language became a point of conversation on social media due to its restriction on use that may undermine China's national security and unity.

Jesse Wood, a technology influencer, gained significant traction for his Twitter thread on GLM-130B's licensing language.

ChatGPT dominated consumer conversation in 2022, but sentiment was mixed due to initial excitement and later realization of its limitations.

In Q2 2022, conversation around LaMDA exploded as it was reported to be a "sentient" system, but concerns were raised about its potential to create misinformation.

According to @GaryMarcus, technology and policy need to be developed to thwart AI systems like LaMDA and GPT-3, which he described as sociopathic liars with no sense of truth.

AI Index Report 2023 and Stable Diffusion Debate

Many consumers have questioned the originality of Stable Diffusion’s AI models, with some claiming that the dataset used already contains stolen works.

The figures in the AI Index Report 2023 show that ChatGPT dominated consumer conversation with a rapid rise, making up over half of consumer conversation by the end of 2022.

The report also provides a table that lists select models' share of AI social media attention by quarter in 2022, with ChatGPT being the most dominant model.

Chapter 8 of the report focuses on public opinion and includes a chart that shows the percentage associated with each model's share of social media conversation.

Chapter 1 of the report details the Center for Security and Emerging Technology's (CSET) policy research organization and its analysis of bibliometric and patent data through the Country Activity Tracker (CAT) tool.

CSET’s merged corpus of scholarly literature combines publications from various sources, including Dimensions, Web of Science, and Microsoft Academic Graph, to produce data-driven research.

Methodology for Identifying AI-relevant Publications

CSET used an English-language subset of the publication corpus, limited to publications since 2010, to identify AI-relevant publications.

CSET researchers developed a classifier for identifying AI-relevant publications by leveraging the arXiv repository and select Chinese AI keywords.

To provide a publication's field of study, CSET matched each publication in the analytic corpus with predictions from Microsoft Academic Graph's field-of-study model.

CSET researchers recorded the most common fields of study in the corpus of AI-relevant publications, and tallied English-language AI-relevant publications by top-scoring field and publication year.

CSET also provided year-by-year citations for AI-relevant work associated with each country, and cross-country collaborations were counted as distinct pairs of countries across authors.

CSET provided publication counts by year and by publication type and counted cross-sector collaborations on academic publications in the same way as cross-country collaborations.
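The pair-counting rule for collaborations can be sketched as follows. This is a minimal illustration and not CSET's actual code: the function name `count_country_pairs` and the input format (one list of author countries per publication) are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def count_country_pairs(publications):
    """Count collaborations as distinct, unordered country pairs per publication.

    `publications` is a hypothetical input: one list of author countries
    per publication.
    """
    pair_counts = Counter()
    for author_countries in publications:
        # Deduplicate so that several authors from one country count once.
        countries = sorted(set(author_countries))
        for pair in combinations(countries, 2):
            pair_counts[pair] += 1
    return pair_counts


# Made-up example data.
pubs = [
    ["US", "US", "CN"],  # one distinct pair
    ["US", "CN", "UK"],  # three distinct pairs
    ["DE"],              # single-country paper, no pair
]
pairs = count_country_pairs(pubs)
```

The same function would apply to cross-sector collaborations by passing sector labels instead of country codes.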

The Epoch Dataset: Tracking Geopolitical Trends in AI Research Contributions

The Epoch dataset is a collection of landmark Artificial Intelligence (AI) and Machine Learning (ML) models, along with information about their creators and publications, designed for geopolitical AI forecasting.

The dataset's information is coded according to a methodology that tracks the distribution of AI research contributions on landmark publications by country.

These contributions are normalized, and each paper in the dataset is given equal value.

All the landmark publications are aggregated within time periods, and the national contributions are compared over time to identify any trends.
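One way to read the normalization described above is that every paper carries a total weight of 1, split across the countries that contributed to it. The sketch below follows that reading; the equal split per listed country, the function name, and the input format are assumptions, not the report's documented procedure.

```python
from collections import defaultdict


def national_contributions(papers):
    """Aggregate national shares per time period, with each paper summing to 1.

    `papers` is a hypothetical list of (period, [contributing countries]) tuples,
    with one list entry per contributing affiliation.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for period, countries in papers:
        if not countries:
            continue
        weight = 1.0 / len(countries)  # normalize so each paper has equal value
        for country in countries:
            totals[period][country] += weight
    return {period: dict(shares) for period, shares in totals.items()}


# Made-up example data.
shares = national_contributions([
    (2020, ["US", "US", "UK"]),
    (2020, ["CN"]),
    (2021, ["US", "CN"]),
])
```

Comparing the per-period dictionaries then shows how each country's share moves over time.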

The dataset includes a list of large language and multimodal models analyzed by the AI Index Steering Committee, including GPT-2, GPT-3, and Megatron-LM.

Analysis of Training Cost Estimates for Language and Multimodal Models

To estimate training costs, the AI Index uses hardware and training time information provided by the authors or, when that is unavailable, calculates training time from hardware speed, training compute, and hardware utilization efficiency.

If price quotes are available before and after the model's training, the AI Index interpolates the hardware's cost rate along an exponential decay curve.

The AI Index assigns weights to papers that involve researchers from multiple countries.

The AI Index uses a default hardware utilization rate of 40% if the rate is not provided.

The AI Index classifies training cost estimates as high, middle, or low based on whether they are upper bound or lower bound.
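The cost estimate described above can be sketched as a short calculation. This is an illustrative reconstruction under stated assumptions, not the AI Index's own code: the function names, the per-chip-hour pricing unit, and all numbers in the example are hypothetical. Only the 40% default utilization and the exponential interpolation between two price quotes come from the text.

```python
DEFAULT_UTILIZATION = 0.40  # default hardware utilization rate stated in the text


def interpolate_price(t, t0, p0, t1, p1):
    """Interpolate a price at time t along an exponential curve
    passing through the quotes (t0, p0) and (t1, p1)."""
    fraction = (t - t0) / (t1 - t0)
    return p0 * (p1 / p0) ** fraction


def chip_hours(training_flop, peak_flop_per_s, utilization=DEFAULT_UTILIZATION):
    """Chip-hours needed when only a fraction of peak hardware speed is achieved."""
    seconds = training_flop / (peak_flop_per_s * utilization)
    return seconds / 3600.0


def training_cost(training_flop, peak_flop_per_s, price_per_chip_hour,
                  utilization=DEFAULT_UTILIZATION):
    """Estimated cost = chip-hours multiplied by the hourly price per chip."""
    return chip_hours(training_flop, peak_flop_per_s, utilization) * price_per_chip_hour


# Hypothetical numbers: quotes of 4.0 (2020) and 1.0 (2022) per chip-hour,
# a model trained in 2021 with 1.44e20 FLOP on chips with 1e14 FLOP/s peak speed.
price_2021 = interpolate_price(2021, 2020, 4.0, 2022, 1.0)
cost = training_cost(1.44e20, 1e14, price_2021)
```

With these made-up inputs the interpolated price is the geometric mean of the two quotes, which is what distinguishes exponential from linear interpolation.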

The AI Index obtains attendance totals for various AI conferences by reaching out to their organizers.

The GitHub data used by the AI Index is provided by OECD.AI.

GitHub is mainly used by software developers and organizations to collaborate on projects by creating repositories.

Metrics for AI Software Development on GitHub

AI software development on GitHub can provide metrics on developers, tools, and trends.

GitHub and OECD.AI collaborate to identify public AI projects using Gonzalez et al.'s methodology.

The list of public projects is updated quarterly to track trends in software development.

Metadata of AI projects include creator, programming language, development tools, and contributions.

GitHub's topical tags are confirmed or modified by project owners to appear in the metadata.

Contributions to public AI projects are mapped to a country using Mapbox and email domain.

Contributions with no location information are assigned to the organization's country if known.

71.2% of contributions to public AI projects were mapped to a country as of October 2021.

Measuring and Classifying Public AI Projects

A decreasing trend in the share of AI projects with an identified location is observed, indicating a lag in location reporting.

Collaboration in public AI projects is measured by the number of contributions made, which are divided equally among countries to obtain fractional counts.

To determine a country's contribution in a public AI project, its fractional count is added across all AI projects.

GitHub uses file extensions to tag programming languages and development tools used in AI projects.

Two quality measures, project impact and project popularity, are used to classify public AI projects.

Domestic collaboration occurs when two contributors from the same country contribute to an AI project.

The accuracy of ImageNet data was determined through technical reviews and progress reported on Papers With Code.

Technical Performance Reports on Artificial Intelligence Algorithms and Models

To improve the accuracy of ImageNet classification, deep convolutional neural networks such as FixEfficientNet, Inception, and ViTAEv2, along with techniques such as perceptual codebooks and progressive neural architecture search, were developed.

Self-training with noisy student techniques helps to improve accuracy in ImageNet classification compared to traditional deep convolutional neural network models.

Recent progress on deepfake detection methodologies has been measured on the Celeb-DF and FaceForensics++ datasets. AUC scores were taken from several papers, including "Deepfake Detection via Joint Unsupervised Reconstruction and Supervised Classification" and "Exposing Deepfake Videos by Detecting Face Warping Artifacts."

The MPII benchmark uses the percentage of correct keypoints (PCK) as the metric for human pose estimation algorithms. Papers such as "Efficient Object Localization Using Convolutional Networks" and "Stacked Hourglass Networks for Human Pose Estimation" have shown promise on the PCK metric.
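As a rough illustration of the PCK metric, a keypoint counts as correct when its predicted position lies within a threshold distance of the ground truth, where the threshold is a fraction of a reference length. On MPII the reference is commonly the head segment length (PCKh), but the function name, the default `alpha`, and the coordinates below are assumptions chosen for the example.

```python
import math


def pck(predicted, ground_truth, reference_length, alpha=0.5):
    """Fraction of keypoints predicted within alpha * reference_length
    of their ground-truth positions."""
    threshold = alpha * reference_length
    correct = sum(
        math.dist(p, g) <= threshold
        for p, g in zip(predicted, ground_truth)
    )
    return correct / len(ground_truth)


# Made-up 2D keypoints: two fall within the threshold of 2.0, one does not.
predicted = [(10, 10), (20, 22), (40, 40)]
ground_truth = [(10, 11), (20, 20), (30, 30)]
score = pck(predicted, ground_truth, reference_length=4.0)
```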

The Cityscapes dataset is used for pixel-level semantic labeling. Mean intersection-over-union (mIoU) is the metric for this semantic segmentation task.
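The mean intersection-over-union metric can be sketched in a few lines. This is a simplified illustration on flat lists of per-pixel class labels, not the official Cityscapes evaluation script; skipping classes that appear in neither labeling is an assumption of this sketch.

```python
def mean_iou(predicted, ground_truth, num_classes):
    """Average the per-class intersection-over-union across classes."""
    ious = []
    for cls in range(num_classes):
        pairs = list(zip(predicted, ground_truth))
        intersection = sum(p == cls and g == cls for p, g in pairs)
        union = sum(p == cls or g == cls for p, g in pairs)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Made-up labels for six pixels and three classes.
predicted = [0, 0, 1, 1, 2, 2]
ground_truth = [0, 1, 1, 1, 2, 0]
miou = mean_iou(predicted, ground_truth, num_classes=3)
```

In this example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5.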

Technical Progress on AI Benchmark Scores in 2023

Kvasir-SEG benchmark scores were retrieved through arXiv literature review and Papers With Code, using mean dice as the result measure.
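For reference, the Dice coefficient of a predicted mask A and a ground-truth mask B is 2|A∩B| / (|A| + |B|), and mean Dice averages it over images. The sketch below works on flat binary masks; treating two empty masks as a perfect match is an assumption of this example, and the masks are made up.

```python
def dice(pred_mask, true_mask):
    """Dice coefficient for two binary masks given as flat lists of 0/1."""
    intersection = sum(p and t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    if total == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2.0 * intersection / total


def mean_dice(mask_pairs):
    """Average Dice over (predicted, ground-truth) mask pairs."""
    return sum(dice(p, t) for p, t in mask_pairs) / len(mask_pairs)


# Made-up masks: the first pair overlaps partially, the second exactly.
result = mean_dice([
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 1, 1, 0], [1, 1, 1, 0]),
])
```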

COCO benchmark scores were also retrieved through arXiv literature review and Papers With Code, using mean average precision (mAP50) as the result measure.

CIFAR-10 benchmark scores were retrieved through arXiv literature review and Papers With Code, using FID scores as the result measure.

For each benchmark, progress was highlighted by taking scores from recently published papers.

STL-10 benchmark scores were also retrieved through arXiv literature review and Papers With Code, using FID scores as the result measure.

Technical Progress Report on AI Benchmark Scores

Details on the STL-10 benchmark can be found in the STL-10 paper.

Progress on STL-10 was highlighted using scores from DEGAS, Diffusion-GAN, Contrastive Divergence, Dist-GAN, Soft Truncation, and Text-to-Image Models.

VQA accuracy scores from various papers including Bilinear Attention Networks, Oscar, and UNITER were compared, and human-level performance was taken from the 2021 VQA challenge.

Scores from BEIT-3 were compared with the previous state-of-the-art (SOTA) for VQA tasks.

Details on the VCR benchmark can be found in the VCR paper, and VCR Q->AR score was taken from the VCR leaderboard.

Kinetics-400, Kinetics-600, and Kinetics-700 results were compared based on papers such as "Co-training Transformer with Videos and Images" and "SlowFast Networks for Video Recognition."

Overview of Recent AI Research Papers

Learning Spatio-Temporal Representation with Local and Global Diffusion Masked Feature Prediction explores self-supervised visual pre-training.

PERF-Net: Pose Empowered RGB-Flow Net is a new approach to action recognition.

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-Offs in Video Classification reconsiders the balance between speed and accuracy in video learning.

SlowFast Networks for Video Recognition proposes a method for video classification.

To highlight progress on Kinetics-700, scores were taken from multiple papers.

Masked Feature Prediction for Self-Supervised Visual Pre-training and Learn to Cycle are two approaches to self-supervised visual learning.

Text-to-Video Models on UCF-101 Data achieve impressive Inception Scores.

Details about the SuperGLUE benchmark and tasks can be found in the SuperGLUE paper.

ReClor is a reading comprehension benchmark dataset that requires logical reasoning.

The narrative highlight provides insight into GPT-2's response to an AI Index prompt about Theodore Roosevelt's presidency.

Notes on Job and Technical Performance Texts

In the job text, the writer had to convince top aides to pay $100,000 and $90,000 respectively to hire the lead prosecutor. The writer needed to know in advance the percentage of work done by the prosecutor as a private citizen and how efficiently he could handle it for his clients. The writer was not required to put forth any of those requests, but others on the case did. The case lasted for two weeks and cost $40 million.

The Technical Performance text provides information on data retrieved from Valmeekam et al.'s (2022) paper on the Blocksworld domain for large language models. Results from the arXiv recall-oriented understudy for gisting evaluation (ROUGE-1) were obtained from the most recently published paper. Details about the arXiv benchmark are in the dataset webpage. Scores were taken from several papers to highlight progress on arXiv. The PubMed data were also retrieved through a detailed arXiv literature review cross-referenced with technical progress reported on Papers With Code. Scores were taken from several papers to highlight progress in PubMed.
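To make the ROUGE-1 measure concrete, the sketch below computes unigram overlap between a candidate summary and a reference. It is a simplified illustration: whitespace tokenization and lowercasing are assumptions, and published results typically rely on dedicated ROUGE packages with their own tokenization and stemming.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Unigram-overlap precision, recall, and F1 between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up sentences: five of six unigrams match.
scores = rouge_1("the cat sat on the mat", "the cat lay on the mat")
```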

Various Research Papers and Benchmarks in the AI Field

The papers "A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents," "Get to the Point: Summarization with Pointer-Generator Networks," "Long Document Summarization with Top-Down and Bottom-Up Inference," "LongT5: Efficient Text-to-Text Transformer for Long Sequences," and "PEGASUS: Pre-training with Extracted Gap-Sentences for Abstractive Summarization" discuss different techniques for abstractive summarization of long documents.

"Sparsifying Transformer Models With Trainable Representation Pooling" presents a method for reducing the computational and storage cost of transformer models in natural language processing.

The papers "Abductive Natural Language Inference (aNLI)" and "Data on Abductive Natural Language Inference (aNLI)" provide details on this benchmark and where to find more information.

The papers "An Algorithm for Routing Capsules in All Domains," "An Algorithm for Routing Vectors in Sequences," "Improved Semantic Representations from Tree-Structured Long Short-Term Memory Networks," "Improved Sentence Modeling Using Suffix Bidirectional LSTM," "Learned in Translation: Contextualized Word Vectors," "Less Grammar, More Features," "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank," and "Self-Explaining Structures Improve NLP Models" report on progress in SST-5 Fine-Grained accuracy.

The papers "Language Models Are Few-Shot Learners," "Language Models Are Unsupervised Multitask Learners," "Scaling Instruction-Finetuned Language Models," and "Scaling Language Models: Methods, Analysis & Insights from Training Gopher" report on progress in MMLU benchmark accuracy.

Data on the number of commercially available machine translation systems was sourced from "The State of Machine Translation, 2022" report by Intento, a San Francisco-based startup that analyzes commercially available MT services.

VoxCeleb equal error rate (EER) data was retrieved from the VoxCeleb Speaker Recognition Challenge and the "ID R&D System Description to VoxCeleb Speaker Recognition Challenge 2022," "The IDLAB VOXSRC-20 Submission: Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification," and "The SpeakIn System for VoxCeleb Speaker Recognition Challenge 2021" papers.
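The equal error rate is the operating point at which the false acceptance rate equals the false rejection rate. A minimal way to approximate it is to sweep candidate thresholds over the observed scores and pick the one where the two rates are closest. The function name and the scores below are made up for illustration, and challenge submissions generally use finer interpolation than this sketch.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over all observed scores."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Made-up similarity scores for same-speaker and different-speaker trials.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine, impostor)
```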

Data Collection and Analysis in AI Research

VoxCeleb and VoxCeleb2 are large-scale datasets for speaker identification and recognition respectively.

Whisper Data and Procgen Data are datasets used for large-scale speech recognition and reinforcement learning models respectively.

Information on AI system training time, number of accelerators, and performance was taken from MLPerf Training and Inference benchmark competitions.

The AI Index compiled a list of GPUs and their prices, adjusting for inflation, to analyze their performance and price.

Technical Performance and AI Ethics in AI Index Report 2023

The report presents data on carbon-emission estimates of select machine learning models sourced from Luccioni et al., 2022, and real-life examples from Strubell et al., 2019, under Technical Performance in Chapter 2.

It includes data on energy savings over time for the BCOOLER experiment from the paper Luo et al., 2022.

In Chapter 3, Technical AI Ethics, the report lists benchmark and diagnostic metrics consistently cited in the academic community and public leaderboard for fairness and bias metrics in AI.

The analysis includes papers such as "Discovering Language Model Behaviors With Model-Written Evaluations" and "SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models" for measuring social biases in prompt-based multi-task learning.

Finally, the report tracks citations of the Perspective API by Jigsaw at Google, adopted widely in natural language processing, to define toxicity and reports on papers like "Controllable Natural Language Generation With Contrastive Prefixes" under Natural Language Processing Bias Metrics in Section 3.3.

Overview of Various Language Models

GLaM is a large language model for science that efficiently scales language models with the mixture-of-experts.

GLM-130B is an open bilingual pre-trained model.

Gradient-based constrained sampling from language models allows for the creation of more diverse and meaningful text.

HateCheckHIn evaluates Hindi hate speech detection models.

Holistic evaluation of language models considers various metrics to assess their performance.

An invariant learning characterization of controlled text generation studies how to control the output of text generation models.

LaMDA is a language model for dialog applications.

"Leashing the Inner Demons" is a paper about self-detoxification for language models.

Toxicity in language models may be less representative of all forms of toxicity and may contain biases, as seen in the Perspective API.

RealToxicityPrompts dataset was used to evaluate the performance of language models.

"AI Ethics with Chinese Characteristics?" analyzes the concerns and preferred solutions on the topic in Chinese academia.

Trends in AI Ethics were analyzed by tracking FAccT papers published in conference proceedings from 2018 to 2022.

Categorization of NeurIPS publications and job market analytics by Lightcast

NeurIPS publications are categorized into thematic workshops for real-world impact, including health, climate, finance, and developing world, among others.

Papers are labeled with a single category, and data may not be as accurate for pre-2018 publications due to broad categorization.

Trends around technical topics are tracked by counting the number of papers with titles containing keywords submitted to NeurIPS main track and related workshops.

Lightcast delivers real-time strategic insights into the job market using AI technology to analyze millions of job postings collected since 2010.

Job postings provide insight into labor market patterns and employers' required experience and skills, and are compared to government data for representativeness.

Measuring Demand for AI Skills with Lightcast Data

Comparisons between JOLTS and Lightcast reveal that Lightcast captures over 99% of total labor demand, with non-online job postings typically found in small businesses and union hiring halls.

Lightcast uses its skills taxonomy of over 31,000 skills to measure employer demand for AI skills. A job posting is considered an AI job if it mentions any of the skills in Lightcast's list of AI skills.
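The classification rule above (a posting is an AI job if it mentions any listed AI skill) can be sketched as a simple set-membership check. The skill set here is a small illustrative subset, and the function names and postings are hypothetical; Lightcast's actual list and matching logic are not reproduced.

```python
# Illustrative subset only; the real list of AI skills is much longer.
AI_SKILLS = {
    "machine learning",
    "natural language processing",
    "neural networks",
    "tensorflow",
    "amazon textract",
    "hyperparameter optimization",
}


def is_ai_job(posting_skills, ai_skills=AI_SKILLS):
    """True if the posting mentions at least one skill from the AI list."""
    return any(skill.lower() in ai_skills for skill in posting_skills)


def ai_job_share(postings, ai_skills=AI_SKILLS):
    """Share of postings classified as AI jobs."""
    return sum(is_ai_job(p, ai_skills) for p in postings) / len(postings)


# Made-up postings, each represented by its list of extracted skills.
postings = [
    ["Python", "TensorFlow"],
    ["Accounting", "Excel"],
    ["SQL", "Hyperparameter Optimization"],
    ["Welding"],
]
share = ai_job_share(postings)
```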

Lightcast's list of AI skills includes clusters such as Artificial Intelligence, Autonomous Driving, Natural Language Processing (NLP), Neural Networks, and Machine Learning.

Within each AI skill cluster, specific skills or technologies are listed, such as Amazon Textract for NLP, and TensorFlow for Neural Networks.

Other AI-related technologies and methods listed include Association Rule Learning, Confusion Matrix, Evolutionary Programming, and Hyperparameter Optimization.

AI and Robotics Skills Analysis on LinkedIn

The AI skill group on LinkedIn comprises machine learning, natural language processing, data structures, artificial intelligence, computer vision, image processing, deep learning, TensorFlow, Pandas, and OpenCV.

LinkedIn has more than 38,000 standardized skills identified on members' profiles, which are classified by expert taxonomists into 249 skill groupings.

For any entity (occupation, country, sector), the skill genome is an ordered list of the 50 most characteristic skills. These are identified using a TF-IDF algorithm that evaluates a skill's term frequency and inverse entity frequency.

Appendix Chapter 4 includes information on the economy, while insights for countries with less than 40% LinkedIn coverage should be interpreted accordingly.

Robotics skills on LinkedIn include advanced robotics, cognitive robotics, motion planning, Nvidia Jetson, Robot Framework, Robot Operating Systems, robotic automation software, robotic liquid handling systems, robotic programming, robotic systems, servomotor, and SLAM Algorithms. Visual image recognition skills include 3D reconstruction, activity recognition, facial recognition, and image processing.

Measuring AI Skills using LinkedIn

The IDF (Inverse Document Frequency) measures the commonness of a certain skill across LinkedIn entities. A skill unique to specific entities has an IDF of 1, while an IDF closer to 0 indicates a more common skill.

The AI Index Report 2023 uses the AI Skills Penetration indicator to measure the intensity of AI skills in a specific entity.

The intensity is calculated by computing the frequency of self-added skills, re-weighting the skill frequencies using a TF-IDF model, and computing the share of skills belonging to the AI skill group.
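The three steps above can be sketched as follows. This is a hedged reconstruction, not LinkedIn's implementation: the function names and data are hypothetical, and the IDF is normalized as log(N/df) / log(N) so that a skill unique to one entity scores 1 and a skill present in every entity scores 0, which is one assumed way to match the 0-to-1 range described in the text.

```python
import math
from collections import Counter


def tf_idf_weights(entity, skills_by_entity):
    """Re-weight an entity's skill frequencies by a normalized IDF.

    Requires at least two entities, since log(1) would zero the denominator.
    """
    n_entities = len(skills_by_entity)
    term_freq = Counter(skills_by_entity[entity])
    weights = {}
    for skill, freq in term_freq.items():
        doc_freq = sum(skill in skills for skills in skills_by_entity.values())
        idf = math.log(n_entities / doc_freq) / math.log(n_entities)
        weights[skill] = freq * idf
    return weights


def ai_skills_penetration(entity, skills_by_entity, ai_skills, top_n=50):
    """Share of the entity's top-weighted skills that belong to the AI group."""
    weights = tf_idf_weights(entity, skills_by_entity)
    genome = sorted(weights, key=weights.get, reverse=True)[:top_n]
    return sum(skill in ai_skills for skill in genome) / len(genome)


# Made-up self-added skills per occupation.
skills_by_entity = {
    "data scientist": ["machine learning", "machine learning", "python",
                       "deep learning", "communication"],
    "accountant": ["excel", "communication", "python"],
    "nurse": ["patient care", "communication"],
}
ai_group = {"machine learning", "deep learning"}
rate = ai_skills_penetration("data scientist", skills_by_entity, ai_group, top_n=3)
```

In the example, "communication" appears in every occupation and gets a weight of 0, so it drops out of the top skills while the two AI skills rank first.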

The AI Skills Penetration rate signals the prevalence of AI skills in a particular occupation or the intensity of AI skills used by LinkedIn members in their jobs.

LinkedIn members' titles are standardized and grouped into approximately 15,000 occupations and further standardized into approximately 3,600 occupation representatives.

An "AI" job is an occupation representative that requires AI skills, and AI talent is a LinkedIn member with explicit AI skills in their profile and/or is occupied in an AI occupation representative.

Relative AI skills penetration allows for skills penetration comparisons across countries, using the skills genome and a relevant benchmark selected.

Relative AI Skills Penetration and Hiring Rate

A country's relative AI skills penetration of 1.5 indicates that AI skills are 1.5 times as frequent as in the benchmark, for an overlapping set of occupations.

The relative penetration rate of AI skills by country is measured as the sum of the penetration of each AI skill across occupations in a given country, divided by the average global penetration of AI skills across the overlapping occupations in a sample of countries. A relative penetration rate of 2 means that the average penetration of AI skills in that country is two times the global average across the same set of occupations.
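The ratio described above can be illustrated with a small calculation. This sketch makes several assumptions: penetration is given per country and occupation, the benchmark for each occupation is the simple average over all sample countries that have it (including the country itself), and the names and numbers are hypothetical.

```python
def relative_ai_skills_penetration(country, penetration_by_country):
    """Country's summed AI skills penetration divided by the summed
    sample-average penetration over the same (overlapping) occupations."""
    own_rates = penetration_by_country[country]
    country_total = 0.0
    benchmark_total = 0.0
    for occupation, rate in own_rates.items():
        # Only countries that have this occupation enter the benchmark.
        peers = [
            rates[occupation]
            for rates in penetration_by_country.values()
            if occupation in rates
        ]
        country_total += rate
        benchmark_total += sum(peers) / len(peers)
    return country_total / benchmark_total


# Made-up penetration rates by country and occupation.
data = {
    "A": {"engineer": 0.30, "analyst": 0.10},
    "B": {"engineer": 0.10, "analyst": 0.10},
    "C": {"engineer": 0.20},
}
relative_a = relative_ai_skills_penetration("A", data)
```

Here country A scores about 1.33, meaning AI skills are roughly a third more prevalent there than in the sample average for the same occupations.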

The relative AI skills penetration by country for industry provides an in-depth sectoral decomposition of AI skill penetration across industries and sample countries. A country's relative AI skill penetration rate of 2 in the education sector means that the average penetration of AI skills in that country is two times the global average across the same set of occupations in that sector.

The "Relative AI Skills Penetration by Gender" metric allows for cross-country comparisons of AI skill penetrations within each gender. A country's AI skills penetration for women of 1.5 means that female members in that country are 1.5 times more likely to list AI skills than the average female member across the same set of occupations.

The "Relative AI Skills Penetration Across Genders" metric allows for cross-gender comparisons within and across countries globally. A country's "Relative AI Skills Penetration Across Genders" for women of 1.5 means that female members in that country are 1.5 times more likely to list AI skills than the average member across the same set of occupations that exist in the country.

The Relative AI Hiring Index is the pace of change in AI Hiring Rate normalized by the pace of change in Overall Hiring Rate. It provides a picture of whether hiring of AI talent is growing at a higher, equal, or lower rate than overall hiring in a market. The AI Hiring Rate is computed following the overall hiring rate methodology, but only considering members classified as AI talent.

Overview of Methodology and Changes in AI Hiring Index Report 2023

The AI Hiring Index is equal to 1.0 when AI hiring and overall hiring are growing at the same rate year on year.

The relative AI Hiring Index shows how fast each country is experiencing growth in AI talent hiring, compared to growth in overall hiring in the country.

A ratio of 1.2 means the growth in AI talent hiring has outpaced the growth in overall hiring by 20%.
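The index can be sketched in Python as a ratio of two year-on-year growth factors. This is an illustrative reading of the description above, not LinkedIn's published formula: the function name `relative_ai_hiring_index` and its four hiring-rate parameters are hypothetical.

```python
def relative_ai_hiring_index(ai_now, ai_prev, overall_now, overall_prev):
    """Growth in the AI hiring rate normalized by growth in the overall hiring rate.

    All four arguments are hiring rates (hypothetical inputs) for the current
    period and the same period one year earlier.
    """
    if ai_prev <= 0 or overall_prev <= 0 or overall_now <= 0:
        raise ValueError("hiring rates used as denominators must be positive")
    ai_growth = ai_now / ai_prev                  # year-on-year change, AI hiring
    overall_growth = overall_now / overall_prev   # year-on-year change, all hiring
    return ai_growth / overall_growth


# Same growth in both series gives an index of 1.0.
same_pace = relative_ai_hiring_index(0.05, 0.05, 0.20, 0.20)
# AI hiring rate up 20% while overall hiring is flat gives about 1.2.
faster_ai = relative_ai_hiring_index(0.06, 0.05, 0.20, 0.20)
```

Under this reading, an index above 1.0 means AI hiring grew faster than overall hiring, and an index below 1.0 means it grew more slowly.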

LinkedIn rolled out a new version of its industry taxonomy, which changed the five top-level key industries.

The new taxonomy is more granular and more accurate, which improves AI talent identification.

The methodology change applies to the Relative Skills Penetration metrics, whose coverage now extends to all industries.

NetBase Quid delivers AI-powered consumer and market intelligence, revealing patterns in large, unstructured datasets.

NetBase Quid's Methodology for Data Mining and Visualization

NetBase Quid uses Boolean queries to search for keywords, topics, and focus areas across data sources such as social media, news, forums, and custom datasets.

The company gathers over 8 million company profiles and metadata from various sources that are updated weekly to provide comprehensive and accurate data.

NetBase Quid utilizes an algorithm that identifies networks and clusters of documents based on their language similarity to generate visualizations of data.

The data includes all types of companies, and investment data includes private equity/venture capital, M&A, minority stakes, and others.

NetBase Quid also analyzes earnings call transcripts from Fortune 500 companies for mentions of AI-related keywords.

The search parameters enable users to filter search results by HQ region, investment amount, operating status, organization type, and founding year.

NetBase Quid selects the 7,000 most relevant companies for visualization if the search result is more than 7,000 companies.
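The 7,000-company cap can be sketched as a top-N selection by relevance score. NetBase Quid's actual relevance algorithm is not described here, so the `relevance` field, the `select_for_visualization` function, and the dictionary-based company records are hypothetical stand-ins.

```python
import heapq

MAX_COMPANIES = 7000  # cap stated in the text


def select_for_visualization(companies, limit=MAX_COMPANIES):
    """Return all companies if within the limit, else the `limit` most relevant.

    companies: list of dicts with a numeric "relevance" score (assumed field).
    """
    if len(companies) <= limit:
        return list(companies)
    # nlargest returns results sorted from most to least relevant.
    return heapq.nlargest(limit, companies, key=lambda c: c["relevance"])


sample = [{"name": f"c{i}", "relevance": i} for i in range(10)]
top_three = select_for_visualization(sample, limit=3)
top_names = [c["name"] for c in top_three]  # ["c9", "c8", "c7"]
```

With the default limit, a search returning 7,500 companies would be trimmed to the 7,000 highest-scoring ones, while a smaller result set would pass through unchanged.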

Notes on Data Used in AI and ML Industry Analysis

NetBase Quid's relevance algorithm selected 7,000 out of 7,500 global AI and ML companies that have received over $1.5M in the last ten years.

Private placements refer to the private sale of newly issued securities by a company to a selected investor or a selected group of investors. The stakes that buyers take in private placements are often minority stakes (under 50%).

In NetBase Quid's data, minority investment refers to a minority stake acquisition, in which the buyer acquires less than 50% of the existing ownership stake in entities, asset products, and business divisions.

M&A is a buyer's acquisition of more than 50% of the existing ownership stake in entities, asset products, and business divisions.

The Corporate Activity-Industry Adoption section used data from the McKinsey Global Survey "The State of AI in 2022-and a Half Decade in Review."

Deloitte's "State of AI in the Enterprise" surveys were the source of data for the Corporate Activity-Industry Motivation section.

The GitHub Copilot survey conducted in 2022 was the source of data on the effects of Copilot on developer productivity and happiness.

Sources Used in AI Index Report 2023

The AI Index report utilized various sources of information including Deloitte's State of AI in the Enterprise reports for 2022, 2021, 2020, and 2018, and the 2017 Deloitte State of Cognitive Survey.

Deloitte surveyed 2,620 global business leaders from 13 countries between April 2022 and May 2022, and all participating companies have adopted AI technologies and are AI users.

To complement the survey, Deloitte conducted interviews with 15 AI specialists from different industries.

The Robot Installations section data was sourced from the "World Robotics 2022" report published by the International Federation of Robotics (IFR).

Data in the Education chapter was collected through the CRA Taulbee survey sent to doctoral departments of computer science, computer engineering, and information science/systems in North America.

It's noteworthy that neither Deloitte nor CRA directly surveyed students in their respective data collections.

Sources used in the AI Index Report 2023

The AI Index used data from the 2011 to 2021 iterations of the CRA survey, the State Level Data, and the AP Computer Science Data.

Data on the state of international K-12 AI education was taken from a UNESCO report published in 2021.

The appendix sections for Chapter 5 (Education) and Chapter 6 (Policy and Governance) of the AI Index Report 2023 also provide relevant information.

The AI Index performed keyword searches for "artificial intelligence" on the websites of 127 countries' congresses or parliaments to gather global AI legislation records.

The legal analysis includes only laws passed by state-level legislative bodies and signed into law (by presidents or through royal assent) from 2016 to 2022, in countries such as Algeria, Andorra, and Antigua and Barbuda, among others.

AI Index Report 2023 - Legislative and Parliamentary Mentions of AI around the world and in the US

The AI Index team performed searches of the keyword "artificial intelligence" on the legislative websites of all 50 US states to count AI-related bills passed into law and committee mentions from 2015 to 2022.

A bill was counted as passed into law only if its final version, not just the introduced version, included the keyword "artificial intelligence."
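The counting rule can be sketched in Python as follows. The `Bill` record and its fields are hypothetical, and the case-insensitive substring match is an assumption; the AI Index team's actual search was run on state legislative websites rather than on structured records like these.

```python
from dataclasses import dataclass

KEYWORD = "artificial intelligence"


@dataclass
class Bill:
    # Hypothetical record; field names are illustrative only.
    state: str
    introduced_text: str
    final_text: str
    passed_into_law: bool


def count_ai_laws(bills, keyword=KEYWORD):
    """Count bills passed into law whose final version mentions the keyword."""
    needle = keyword.lower()
    return sum(
        1
        for bill in bills
        # The introduced text is deliberately ignored: only the final version counts.
        if bill.passed_into_law and needle in bill.final_text.lower()
    )


bills = [
    Bill("CA", "artificial intelligence pilot", "Artificial Intelligence pilot", True),
    Bill("NY", "artificial intelligence study", "data study", True),  # keyword dropped
    Bill("TX", "artificial intelligence board", "artificial intelligence board", False),
]
ai_law_count = count_ai_laws(bills)  # 1
```

Only the first bill is counted: the second lost the keyword before passage, and the third was never passed into law.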

For global AI mentions, the team searched the websites of 81 countries' congresses or parliaments and looked under sections named “minutes,” “hansard,” etc. to find keyword mentions.

For China, counts included only mentions of "artificial intelligence" in the Report on the Work of the Government, delivered by the premier, which is the only public document released from the National People's Congress meetings.

The report includes individuals and organizations’ views on government policies and AI development.

Analysis of AI policy papers published by 55 organizations in the US

The report aims to develop a nuanced understanding of the thought leadership behind AI policy.

Policy papers of 55 organizations with a strong presence in the US were analyzed, covering civil society, consultancy, government agencies, private sector companies, think tanks, and university institutes and research programs.

The report utilized 17 keyword-based broad topic areas to categorize the content of the papers.

The list of organizations expanded from last year's 36 to include the Algorithmic Justice League, Amnesty International, Carnegie Endowment for International Peace, and others.

The appendix includes chapter 6 on policy and governance.

The report's methodology includes collecting underlying keywords and analyzing discourse related to AI between 2018 and 2021.

National AI Strategies and Federal Budget for Nondefense AI R&D

The AI Index identified national AI strategies for countries such as Cyprus, Australia, China, Colombia, and the United States, among others.

For countries marked with an asterisk, the actual strategy document was not found; a news article confirming the launch was linked instead.

The federal U.S. budget for nondefense AI R&D was taken from previous editions of the AI Index and National Science and Technology Council reports.

Data on DoD nonclassified AI-related budget requests was also taken from previous AI Index editions and Defense Budget Overview reports.

Govini, the leading commercial data company in defense technology, created a platform that is used in the national security sector of the U.S. federal government.

Govini's AI-powered platform for national security insights and diverse data sources in AI-related research

Govini enables government analysts, program managers, and decision-makers to gain unprecedented visibility into national security companies, capabilities, and capital via AI and machine learning (ML) to solve varied challenges.

Govini's recent scorecard focuses on critical technologies, placing AI and machine learning among its six subsegments.

By generating search terms and filtering out erroneous results, Govini creates a comprehensive yet discriminant taxonomy of mutually exclusive subsegments.

Govini's SaaS Platform and National Security Knowledge Graph establish high fidelity standards to accurately depict federal spending and the supporting vendor ecosystem over time.

To identify AI-related legal cases, the AI Index research team runs a keyword search on the LexisNexis database, using AI, machine learning, and automated decision-making as keywords to find cases of interest.

The AI Index also uses diversity data, including the CRA's (Computing Research Association), which readers can find more about in Chapter 5 of the Appendix.

The methodologies of the IPSOS, Lloyd's Register Foundation and Gallup, and Pew Research surveys featured in the report are detailed in the respective surveys.

NetBase Quid's Social Media Analysis of Public Opinion on Advancements in Artificial Intelligence

NetBase Quid collects data from over 500 million social media sources in real time.

This data is analyzed with AI-powered Natural Language Processing.

The process breaks out posts by filters such as positive and negative sentiment, emotions, and behaviors to reach deeper insights.

The social media conversation on AI advancements from January 2022 to December 2022 was analyzed.

Key drivers of general sentiment around AI advancements were identified, such as ethical, cultural, and economic concerns.

A targeted analysis of the conversation around major AI model updates and releases in 2022 was conducted.

The analysis showcases the relationship between public perception and the advancement of AI.
