IT |
気になる、記になる… |
LINE Pay is running a Valentine's campaign offering 3% points back when you gift an "Apple Gift Card" (30 hours only) |
https://taisy0.com/2023/02/13/168416.html
|
applegiftcard |
2023-02-13 13:08:32 |
AWS |
New posts tagged lambda - Qiita |
Engineering internship, day 15 |
https://qiita.com/a27879038/items/31d79060e8c41762d889
|
solved |
2023-02-13 22:24:54 |
python |
New posts tagged Python - Qiita |
Python cheat sheet: a list of ways to use regular expressions |
https://qiita.com/K_Nemoto/items/808e9bad21089ba8b556
|
mlforbe |
2023-02-13 22:57:43 |
python |
New posts tagged Python - Qiita |
Building a "medication notebook" (お薬手帳) in Solidity: deploying to a testnet, Block 3 |
https://qiita.com/TakenoKinoko/items/f32ee9b4360c5aee5e01
|
block |
2023-02-13 22:52:07 |
python |
New posts tagged Python - Qiita |
Engineering internship, day 15 |
https://qiita.com/a27879038/items/31d79060e8c41762d889
|
solved |
2023-02-13 22:24:54 |
js |
New posts tagged JavaScript - Qiita |
A gripe about Array.prototype.join() |
https://qiita.com/queuek/items/47a2dc58160b12491c10
|
arrayprototypejoin |
2023-02-13 22:54:26 |
AWS |
New posts tagged AWS - Qiita |
[RDS][Aurora] Measuring the downtime of an instance type change in a two-Reader configuration |
https://qiita.com/t_k2/items/20439ed278dcdda0c59e
|
aurora |
2023-02-13 22:24:41 |
Docker |
New posts tagged docker - Qiita |
Creating a symbolic link in Laravel |
https://qiita.com/Qubieeee/items/a03059dea4ebbcd88947
|
docker |
2023-02-13 22:47:01 |
Docker |
New posts tagged docker - Qiita |
Building a web app development environment with Docker Compose |
https://qiita.com/tiri/items/0bcdd2f62e572eb5faeb
|
dockercompose |
2023-02-13 22:10:06 |
golang |
New posts tagged Go - Qiita |
For Golang beginners! Maps, from the basics to advanced use |
https://qiita.com/kanupg/items/216a6cb9a212e825d7bc
|
extremely |
2023-02-13 22:08:20 |
Ruby |
New posts tagged Rails - Qiita |
Building a web app development environment with Docker Compose |
https://qiita.com/tiri/items/0bcdd2f62e572eb5faeb
|
dockercompose |
2023-02-13 22:10:06 |
Overseas Tech |
DEV Community |
Can ChatGPT-like Generative Models Guarantee Factual Accuracy? On the Mistakes of Microsoft's New Bing |
https://dev.to/ruochenzhao3/can-chatgpt-like-generative-models-guarantee-factual-accuracy-on-the-mistakes-of-microsofts-new-bing-111b
|
Can ChatGPT-like Generative Models Guarantee Factual Accuracy? On the Mistakes of Microsoft's New Bing. Authors: Yew Ken Chia, Ruochen Zhao, Xingxuan Li, Bosheng Ding, Lidong Bing.
Recently, conversational AI models such as OpenAI's ChatGPT have captured public imagination with the ability to generate high-quality written contents, hold human-like conversations, answer factual questions, and more. Armed with such potential, Microsoft and Google have announced new services that combine them with traditional search engines. The new wave of conversation-powered search engines has the potential to naturally answer complex questions, summarize search results, and even serve as a creative tool. However, in doing so, the tech companies now face a greater ethical challenge to ensure that their models do not mislead users with false, ungrounded, or conflicting answers. Hence, the question naturally arises: can ChatGPT-like models guarantee factual accuracy? In this article, we uncover several factual mistakes in Microsoft's new Bing and Google's Bard which suggest that they currently cannot.
Unfortunately, false expectations can lead to disastrous results. Around the same time as Microsoft's new Bing announcement, Google hastily announced a new conversational AI service named Bard. Despite the hype, expectations were quickly shattered when Bard made a factual mistake in the promotional video, eventually tanking Google's share price by nearly […] and wiping […] billion off its market value. On the other hand, there has been less scrutiny regarding Microsoft's new Bing. In the demonstration video, we found that the new Bing recommended a rock singer as a top poet, fabricated birth and death dates, and even made up an entire summary of fiscal reports. Despite disclaimers that the new Bing's responses may not always be factual, overly optimistic sentiments may inevitably lead to disillusionment. Hence, our goal is to draw attention to the factual challenges faced by conversation-powered search engines so that we may better address them in the future.
What factual mistakes did Microsoft's new Bing demonstrate? Microsoft released the new Bing search engine powered by AI, claiming that it will revolutionize the scope of traditional search engines. Is this really the case? We dived deeper into the demonstration video and examples and found three main types of factual issues: claims that conflict with the reference sources; claims that don't exist in the reference sources; and claims that don't have a reference source and are inconsistent with multiple web sources.
Fabricated numbers in financial reports: be careful when you trust the new Bing! To our surprise, the new Bing fabricated an entire summary of the financial report in the demonstration. When Microsoft executive Yusuf Mehdi showed the audience how to use the command "key takeaways from the page" to auto-generate a summary of the Gap Inc. Q[…] Fiscal Report [a], he received the following results. (Figure: Summary of the Gap Inc. fiscal report by the new Bing in the press release.) However, upon closer examination, all the key figures in the generated summary are inaccurate. We will show excerpts from the original financial report below as validating references. According to the new Bing, the operating margin after adjustment was […], while it was actually […] in the source report. (Figure: Gap Inc. fiscal report excerpt on operating margins.) Similarly, the adjusted diluted earnings per share was generated as […], while it should be […]. (Figure: Gap Inc. fiscal report excerpt on diluted earnings per share.) Regarding net sales, the new Bing's summary claimed growth in the low double digits, while the original report stated that net sales could be down mid-single digits. (Figure: Gap Inc. fiscal report on outlook.) In addition to the generated figures which conflicted with actual figures in the source report, we observe that the new Bing may also produce hallucinated facts that do not exist in the source. In the new Bing's generated summary, the operating margin of about […] and diluted earnings per share of […] to […] are nowhere to be found in the source report.
Unfortunately, the situation worsened when the new Bing was instructed to "compare this with Lululemon in a table". The financial comparison table generated by the new Bing contained numerous mistakes. (Figure: The comparison table generated by the new Bing in the press release.) This table, in fact, is half wrong. Out of all the numbers, […] out of […] figures are wrong in the column for Gap Inc., and the same for Lululemon. As mentioned before, Gap Inc.'s true operating margin is […], or […] after adjusting, and diluted earnings per share should be […], or […] after adjusting. The new Bing also claimed that Gap Inc.'s cash and cash equivalents amounted to […] billion, while it was actually […] million. (Figure: Gap Inc. fiscal report excerpt on cash.) According to Lululemon's Q[…] Fiscal Report [b], the gross margin should be […], while the new Bing claims it's […]. The operating margin should be […], while the new Bing claims it to be […]. The diluted earnings per share was actually […], while the new Bing claims it to be […]. (Figure: Lululemon Q[…] fiscal report excerpt.) So where did these figures come from? You may be wondering whether it's a number that was misplaced from another part in the original document. The answer is no. Curiously, these numbers are nowhere to be found in the original document and are entirely fabricated. In fact, it is still an open research challenge to constrain the outputs of generative models to be more factually grounded. Plainly speaking, the popular generative AI models such as ChatGPT are picking words to generate from a fixed vocabulary, instead of strictly copying and pasting facts from the source. Hence, factual correctness is one of the innate challenges of generative AI and cannot be strictly guaranteed with current models. This is a major concern when it comes to search engines, as users rely on the results to be trustworthy and factually accurate.
Japanese top poet: secretly a rock singer? (Figure: Top Japanese poets summary generated by the new Bing in the press release.) We observe that the new Bing produces factual mistakes not just for numbers but also for personal details of specific entities, as shown in the response above when the new Bing was queried about "top Japanese poets". The generated date of birth, death, and occupation factually conflict with the referenced source. According to Wikipedia [a] and IMDB [a], Eriko Kishida was born in […] and died in […]. She was not a playwright and essayist, but a children's book author and translator. (Figure: Wikipedia page on Eriko Kishida, translated page from German.) The new Bing continued blundering when it proclaimed Gackt as a top Japanese poet, when he is in fact a famous rockstar in Japan. According to the Wikipedia source [b], he is an actor, musician, and singer. There is no information on him publishing poems of any kind in the source. (Figure: Wikipedia page on Gackt.)
Following Bing's nightclub recommendations? You could be facing a closed door. Furthermore, the new Bing made a list of possible nightclubs to visit in Mexico City when asked "Where is the nightlife?". Alarmingly, almost all the clubs' opening times are wrongly generated. (Figure: Nightlife suggestions in Mexico City generated by the new Bing in the press release.) We cross-checked the opening times with multiple sources, which are also appended at the end of the article. While El Almacen [a] actually opens from […] pm to […] am from Tuesday to Sunday, new Bing claims it to be "open from […] pm to […] pm from Tuesday to Sunday". El Marra [b] actually opens from […] pm to […] am from Thursday to Saturday, but is claimed to be "open from […] pm to […] am from Thursday to Sunday". Guadalajara de Noche [c] is open from […] pm to […] am or […] am every day, while new Bing claims it to be "open from […] pm to […] am every day". Besides opening times, almost all the descriptions on review stars and numbers mentioned by the new Bing are inaccurate. Matching review scores cannot be found despite searching on Yelp, Tripadvisor, or Google Maps. In addition to the cases mentioned above, we also found other issues in their demonstration video, such as product price mismatches, store address errors, and time-related mistakes. You are welcome to verify them if interested.
Potential concerns in the limited Bing demo. Although the new Bing search engine is not fully accessible yet, we can examine a handful of demonstration examples provided by Microsoft. Upon closer examination, even these cherry-picked examples show potential issues on factual grounding. In the demo titled "what art ideas can I do with my kid", the new Bing produced an insufficient list of crafting materials for each recommendation. For example, when suggesting making a cardboard box guitar, it listed the supplies "a tissue box, a cardboard tube, some rubber bands, paint and glue". However, it failed to include construction paper, scissors, washi tape, foam stickers, and wooden beads suggested by the cited website [a]. Another potential concern is that the new Bing produced content that had no factual basis in the reference sources for at least […] times across the demonstration examples. The lack of factual grounding and failure to cite a complete list of sources could lead users to question the trustworthiness of the new Bing.
What factual mistakes did Google's Bard demonstrate? Google also unveiled a conversational AI service called Bard. Instead of typing in traditional search queries, users can have a casual and informative conversation with the web-powered chatbot. For example, a user may initially ask about the best constellations for stargazing, and then follow up by asking about the best time of year to see them. However, a clear disclaimer is that Bard may give "inaccurate or inappropriate information". Let's investigate the factual accuracy of Bard in their Twitter post and video demonstration. (Figure: Summary on telescope discoveries generated by Bard in demo.) Google CEO Sundar Pichai recently posted a short video to demonstrate the capabilities of Bard. However, the answer contained an error regarding which telescope captured the first exoplanet images, which was quickly pointed out by astrophysicists [a]. As confirmed by NASA [b], the first images of an exoplanet were captured by the Very Large Telescope (VLT) instead of the James Webb Space Telescope (JWST). Unfortunately, Bard turned out to be a costly experiment, as Google's stock price sharply declined after news of the factual mistake was reported. (Figure: Answer to the visibility of the constellations generated by Bard in demo.) Regarding Bard's video demonstration, the image above shows how Google's Bard answers the question of when the constellations are visible. However, the timing of Orion is inconsistent with multiple sources. According to the top Google search result [a], the constellation is most visible from January to March. According to Wikipedia [b], it is most visible from January to April. Furthermore, the answer is incomplete, as the visibility of the constellation also depends on whether the user is in the Northern or Southern hemisphere. (Figure: Google search result on visibility of the constellations.)
How do Bing and Bard compare? The new Bing and Bard services may not be equally trustworthy in practice. This is due to factors such as the quality of search results, the quality of conversational models, and the transparency of the provided answers. Currently, both services rely on relevant information sources to guide the responses of their conversational AI models. Hence, the factual accuracy of the answers depends on the quality of the information retrieval systems and how well the conversational model can generate answers that are factually grounded to the information sources. As the full details of the services are not released to the public, it's unclear which one can achieve higher factual accuracy without deeper testing. On the other hand, we feel that transparency is just as important for trustworthiness. For instance, we observe that the new Bing is more transparent regarding the source of its answers, as it provides the reference links in most cases. This enables users to independently conduct fact-checking, and we hope that future conversational services also provide this feature.
How can the factual limitations be addressed? Through the numerous factual mistakes shown above, it is clear that conversational AI models such as ChatGPT may produce conflicting or non-existent facts even when presented with reliable sources. As mentioned previously, it is a pressing research challenge to ensure the factual grounding of ChatGPT-like models. Due to their generative nature, it is difficult to control their outputs, and even harder to guarantee that the generated output is factually consistent with the information sources. A short-term solution could be to impose restrictions to prevent the conversational AI from producing unsafe or unfactual outputs. However, malicious parties can eventually bypass the safety restrictions, while fact verification is another unsolved research challenge. In the long term, we may have to accept that human and machine writers alike will likely remain imperfect. To progress towards more trustworthy AI, the conversational AI models like ChatGPT cannot remain as inscrutable black boxes. They should be fully transparent about their data sources and potential biases, report when they have low confidence in their answers, and explain their reasoning processes.
What does the future hold for ChatGPT-like models? After a systematic overview, we have found significant factual limitations demonstrated by the new wave of search engines powered by conversational AI like ChatGPT. Despite disclaimers of potential factual inaccuracy and warnings to use our judgment before making decisions, we encountered many factual mistakes even in the cherry-picked demonstrations. Thus, we cannot help but wonder: what is the purpose of search engines, if not to provide reliable and factual answers? In a new era of the web filled with AI-generated fabrications, how will we ensure truthfulness? Despite the massive resources of tech giants like Microsoft and Google, the current ChatGPT-like models cannot ensure factual accuracy. Even so, we are still optimistic about the potential of conversational models and the development of more trustworthy AI. Models like ChatGPT have shown great potential and will undoubtedly improve many industries and aspects of our daily lives. However, if they continue to generate fabricated content and unfactual answers, the public may become even more wary of artificial intelligence. Therefore, rather than criticizing specific models or companies, we hope to call on researchers and developers to focus on improving the transparency and factual correctness of AI services, allowing humans to place a higher level of trust in the new technology in the foreseeable future.
Sources. Reference articles: ChatGPT: Optimizing Language Models for Dialogue; […] problems facing Bing, Bard, and the future of AI search; Google: An important next step on our AI journey; Google's Bard AI bot mistake wipes […]bn off shares; Reinventing search with a new AI-powered Microsoft Bing and Edge, your copilot for the web; Google shares lose […] billion after company's AI chatbot makes an error during demo; Hackers are selling a service that bypasses ChatGPT restrictions on malware.
Appendix: How we validated the demos. The new Bing demo source: Microsoft's press release video; Microsoft's demo page. Verification, the new Bing and fiscal reports: Gap Inc. fiscal report shown in the video; Lululemon fiscal report found on their official website. The new Bing and Japanese poets: Eriko Kishida (Wikipedia, IMDB); Gackt (Wikipedia). The new Bing and nightclubs in Mexico: El Almacen (Google Maps, Restaurant Guru); El Marra (Google Maps, Restaurant Guru); Guadalajara de Noche (Tripadvisor, Google Maps). The new Bing and craft ideas, with instructions for a toddler using only cardboard boxes, plastic bottles, paper and string: cited website, Happy Toddler Playtime. Bard demo source: promotional blog and video; video demonstration. Verification, which telescope captured the first exoplanet images: Twitter, by Grant Tremblay (American astrophysicist); NASA, "[…]M[…] b: First image of an exoplanet". When the constellations are visible: Google search "when is orion visible" (top result: Byju's); Wikipedia page, Orion (constellation).
Academic references: An Introduction to Information Retrieval; Toward Controlled Generation of Text; FEVER: a large-scale dataset for Fact Extraction and VERification; Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). |
2023-02-13 13:27:38 |
Overseas Tech |
DEV Community |
Meme Monday 😈 |
https://dev.to/ben/meme-monday-4c5o
|
Meme Monday! Today's cover image comes from last week's thread. DEV is an inclusive space! Humor in poor taste will be downvoted by mods. |
2023-02-13 13:15:17 |
News |
BBC News - Home |
Mystery surrounds objects shot down by US military |
https://www.bbc.co.uk/news/world-us-canada-64620064?at_medium=RSS&at_campaign=KARANGA
|
recent |
2023-02-13 13:11:52 |
News |
BBC News - Home |
China says US balloons breached airspace at least 10 times |
https://www.bbc.co.uk/news/world-asia-china-64621598?at_medium=RSS&at_campaign=KARANGA
|
balloons |
2023-02-13 13:01:58 |
News |
BBC News - Home |
Russian mercenary video shows new brutal killing of 'traitor' |
https://www.bbc.co.uk/news/world-europe-64626783?at_medium=RSS&at_campaign=KARANGA
|
defector |
2023-02-13 13:29:25 |
News |
BBC News - Home |
Douglas Alexander to stand as Labour candidate for East Lothian |
https://www.bbc.co.uk/news/uk-scotland-scotland-politics-64624969?at_medium=RSS&at_campaign=KARANGA
|
black |
2023-02-13 13:28:12 |
News |
BBC News - Home |
Amazon: Unionised Coventry workers announce strike escalation |
https://www.bbc.co.uk/news/uk-england-coventry-warwickshire-64624787?at_medium=RSS&at_campaign=KARANGA
|
amazon |
2023-02-13 13:05:39 |
News |
BBC News - Home |
BBC: What's been 'occurring' in Wales for 100 years |
https://www.bbc.co.uk/news/uk-wales-64413172?at_medium=RSS&at_campaign=KARANGA
|
major |
2023-02-13 13:47:51 |
News |
BBC News - Home |
Women's T20 World Cup: England pick up first wicket as Ireland's Amy Hunter goes for 15 |
https://www.bbc.co.uk/sport/av/cricket/64627907?at_medium=RSS&at_campaign=KARANGA
|
Women's T20 World Cup: England pick up first wicket as Ireland's Amy Hunter goes for 15. England pick up their first wicket as Charlie Dean removes Ireland's Amy Hunter for 15 in their women's T20 match. |
2023-02-13 13:45:31 |
News |
BBC News - Home |
Video shows young girl rescued week after Turkey quake |
https://www.bbc.co.uk/news/world-europe-64626058?at_medium=RSS&at_campaign=KARANGA
|
authorities |
2023-02-13 13:32:30 |