Episode Notes

Welcome back to another episode of AccessWorld, a podcast on digital inclusion and accessibility. As we prepare to celebrate Global Accessibility Awareness Day, we thought it would be great to look into the future of accessibility, welcoming a special guest to the podcast: Gemini, Google's AI platform, which gives us a glimpse of the power AI has to make the world around us more accessible.

Aaron and Tony take time to talk with Gemini about its latest update, which provides image and real-world description in real time (Editor's Note: the version used in this episode is still in beta, so some of the lag times were edited out for brevity). The conversation serves as a launch pad to discuss how AI is already tackling tasks that three years ago seemed light years away, in a galaxy far, far away. Aaron then offers a breakdown of apps like Suno, which create AI-generated music; you can read more about Suno's own accessibility for content enthusiasts in the Spring issue of AccessWorld. The conversation then moves into other ways AI may be rolled out, such as when Aaron and Tony's guide dogs will be able to take a well-deserved vacation in the Bahamas as the development of AI robots continues to flourish. Guide dog users, don't fret: we're still quite a ways away from guide-bots. In the meantime, enjoy this special episode that dreams about the future ahead of us.

About AccessWorld

The AccessWorld podcast is an extension of AccessWorld Magazine, a quarterly publication from the American Foundation for the Blind (AFB) offering news and reviews encompassing digital inclusion and accessibility. AccessWorld celebrates its 25th anniversary this year, and we're excited to share all of our back issues online, free for readers. Visit www.afb.org/aw for the latest issue and to access our archives online.

Aaron Preece serves as editor-in-chief of AccessWorld, and Tony Stephens leads communications for AFB. Together, they enjoy meeting up each month to discuss the latest happenings around accessibility. Visit the AFB website to learn more about our exciting work, like our groundbreaking research on the impact of AI for people with disabilities. And consider making a tax-deductible donation to help support this podcast and all of our work at AFB. Your support helps us create a world of endless possibilities for people who are blind or have low vision.

This podcast is powered by Pinecast.


AccessWorld Podcast, Episode 19 Transcript

Intro (00:00):

AFB. You are listening to AccessWorld, a podcast on digital inclusion and accessibility. AccessWorld is a production of the American Foundation for the Blind. Learn more at www.afb.org/aw.

Tony Stephens (00:29):

And welcome back, everybody, to an episode of the AccessWorld podcast, a podcast on digital inclusion and accessibility from the American Foundation for the Blind. I am Tony Stephens, your co-host, alongside Aaron Preece, editor-in-chief of AccessWorld Magazine, AFB's quarterly publication released online. Hey man, welcome to May.

Aaron Preece (00:53):

Time is flying by.

Tony Stephens (00:55):

I know, I know. Winter is finally behind us. No more of those gross, rainy, cold April days. I'm sure there'll be some rainy May days, but it's just Aaron and I. Well, I would say just Aaron and I, but we will maybe have a special guest making

Aaron Preece (01:09):

Special guest

Tony Stephens (01:10):

an appearance here soon. "Guest" using air quotes; I guess if it's not human, is it still a guest? What qualifies as being a guest? But anyways, we are going to be diving in. Global Accessibility Awareness Day is in May, and we're going to be thinking about it all month long. We have AccessWorld coming out later this month as well, and really tying into the big theme, at least what we're focused on here at AFB, is artificial intelligence and the recent research. You can go to our website to find the impact study on AI, the good and the bad, that came out earlier this year; feel free to check that out at afb.org/ai research, but you can also listen back to a prior episode where we talk about it. Aaron and I have just been talking so much about AI and all the excitement going on. We're going to get a demo today, and it's going to be cool, because we're going to be demoing some brain power that far exceeds, I feel like, where my brain has been going recently, with Gemini. So we're going to have a special guest soon with Gemini. We're going to get that loaded up, check things out there, and talk about all things AI. So are you ready for all the -

Aaron Preece (02:24):

Yeah, we're almost ready to go. I'm currently making NVDA understandable for everybody out there, changing the voice up here to Mr. Microsoft Mark.

Tony Stephens (02:36):

Yeah, we had it up earlier demoing this, and you had it at like 90% speed. Even my weathered text-to-speech ears were trying to keep up. You can always tell the real hardcore screen reader users: they have it jacked up where it sounds just like an old-school tape.

Aaron Preece (02:58):

Oh, yeah.

Tony Stephens (02:59):

Three times as fast.

Aaron Preece (03:01):

Yeah, and I'm always impressed with the people that use eSpeak, which is the open-source robotic synthesizer. People will use it at a hundred percent, and even after that they'll hit the rate boost option, which makes it even faster; it can say a sentence in a fourth of a second or less. Mine's relatively fast, but not near that fast.

Tony Stephens (03:22):

I always tell people, everyone's like, oh, you're blind, do you have superhuman powers? Are all your other senses heightened? And I'm like, we're just like you. Anybody can do it. But yeah, I guess we do sort of train for it; it's definitely a kind of training to get your brain to work that fast.

Aaron Preece (03:40):

And you train yourself without realizing it. I've been using Eloquence since I was probably 10 years old, in 2000 or something like that, and that's the voice I still use with my screen reader. I spend way too much time on the computer anyway, so who knows how many thousands and thousands of hours of Eloquence I've heard; that's what makes it possible to listen that quickly.

Tony Stephens (04:06):

Yeah, more than the, what do they say, the 10,000 hours to get you to some sort of expert level.

Aaron Preece (04:10):

Yeah, expert level on something. Awesome.

Tony Stephens (04:14):

Well, go ahead and share your screen for those checking us out on YouTube, so they don't just have to look at me. I know your webcam isn't on right now, or there's a lack thereof. But yeah, go ahead and share your screen. For folks listening on the podcast: like I said, we're going to be introducing a special guest in a minute. Gemini is going to be joining us from Google. It's the Google AI, and we've talked a little bit about Gemini on the podcast before.

Screen Reader (04:39):

You have started screen share press F six.

Aaron Preece (04:42):

I think we're live.

Screen Reader (04:45):

Participants can now see your application.

Aaron Preece (04:47):

Well, NVDA thinks that participants can now see my screen. So we are in Google AI Studio, and with Gemini there have been a lot of really cool developments. What we're going to show you today is Gemini 2.0 Flash, I believe is what it's called now. There have been lots of really cool recognition advances with Gemini in particular, and all of the big AI players are going into this space. I believe we might've talked about ChatGPT, using the app to share your screen or your camera with ChatGPT and being able to ask it questions and talk to it like a person. You can do that with Gemini as well, and specifically you can share your computer screen, so you can talk to Gemini like it's a person looking over your shoulder at your computer screen, to help you with whatever a person could help you with. I've honestly been pretty amazed with this. It's still in beta, and there are still, I imagine, some processing things to figure out when it comes to the power of the AI, thinking time, and bandwidth when it comes to streaming and that sort of thing. But it's a very cool feature. Also, they've recently released a feature where you can upload videos and have it audio describe them for you. I haven't messed with that personally, but it seems like it could be a major game changer going forward.
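Editor's Note: For readers who want to experiment with this kind of description programmatically rather than through the AI Studio web interface, here is a minimal Python sketch using Google's google-genai SDK. The model name, prompt, and placeholder API key are illustrative assumptions, not the setup used in this episode; the live conversational demo below runs through AI Studio's own Stream interface rather than one-off calls like this.

    # A hedged sketch: ask Gemini to describe a screenshot of the current screen.
    # Assumes: pip install google-genai pillow, plus a valid API key.
    from google import genai
    from PIL import ImageGrab  # screen capture; works on Windows and macOS

    client = genai.Client(api_key="YOUR_API_KEY")  # placeholder, not a real key

    screenshot = ImageGrab.grab()  # roughly the "share your screen" snapshot

    # The SDK accepts PIL images directly alongside a text prompt.
    response = client.models.generate_content(
        model="gemini-2.0-flash",  # illustrative model name
        contents=[screenshot, "Describe what is on this screen for a blind user."],
    )
    print(response.text)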

Tony Stephens (06:19):

All this stuff is still so primitive, but the potential... because 8, 9, 10 years ago I was working with NAB, the National Association of Broadcasters, specifically on whether we would ever get to a point where we could do this kind of technology. If you're listening to the weather on your news and the television screen is showing where a tornado's tearing through a street on a map, you don't know where that is if you can't see it. It's awesome to think that we're getting there; I didn't think this could be done. So you're loading up Gemini now on the screen.

Aaron Preece (06:50):

We're on the Gemini screen. So one thing with this, because it is beta, and I assume your internet connection and all that sort of thing comes into play: it does crash a lot, and you'll hear a sound that will let you know it has crashed so you can restart. The nice thing is you can restart the stream very quickly and get back up and running. But we very well might run into some crashing, and definitely some lag, from having Zoom going and screen sharing at the same time,

(07:21):

It might be a lot slower on this PC. I know I was testing it on a different PC earlier and it was significantly faster at seeing my screen, so there are going to be some fluctuations there. I would say take this demo with a grain of salt when it comes to performance, especially the lag and that sort of thing; that might not be your experience. One cool thing about this: as far as I can tell, it is free. You just go to aistudio.google.com and you can select Stream, and there are options right here, if anybody can see the screen. I believe they're on the screen right now.

Tony Stephens (07:55):

Not that you or I can see it. I'll look forward to Gemini telling us what's on the screen in a minute.

Aaron Preece (07:58):

Yep. It'll be able to tell us. So

Screen Reader (08:02):

Mute currently unmuted.

Aaron Preece (08:04):

I'm in my

Screen Reader (08:05):

Google AI studio

Aaron Preece (08:06):

Google studio

Screen Reader (08:07):

Button, voice button, unclear the chat screen button.

Aaron Preece (08:10):

It tells me where I am. So I'm just going to go through, using shortcut keys to move NVDA's cursor around, jumping back to an edit field. For anybody that's not as familiar with screen readers, you're going to hear a lot of sentences that get cut off as I move around through here.

Screen Reader (08:27):

Clear the out of edit button, button, restart stream.

Aaron Preece (08:30):

So we have restart stream

Screen Reader (08:31):

Button, run settings, check type something at out of edit button, share screen

Aaron Preece (08:36):

So we have share screen, and there's a bunch of options here. If you want to have this in text instead of someone speaking to you: I actually did that for a while. I would keep the stream running in the background, and when I asked questions I would just alt-tab over to the window. I found that with my screen reader voice, sometimes I could read the response faster than the voice would read it to me, without changing windows. So there's a lot of ways you can use this, and you can figure out what you prefer, too.

Tony Stephens (09:05):

I'm trying to catch up in real time on some of the terminology. So the stream, then, is when you're actually tapping Gemini on the shoulder, saying hey. The stream is that connection,

Aaron Preece (09:16):

Yeah, the stream is like what we're doing

Tony Stephens (09:17):

Connecting with their AI.

Aaron Preece (09:18):

Sharing my screen. It's essentially like what I'm doing sharing the screen with us on Zoom. You're streaming your screen to Gemini, so it's like a live feed, basically.

Screen Reader (09:30):

Button talk, button share screen

Aaron Preece (09:32):

So you could also just talk to it back and forth there. There's a button called talk, so I'll hit share screen.

Screen Reader (09:38):

Alert, allow aistudio.google

Aaron Preece (09:40):

And then you can pick. It popped up a window, so I'm going to jump into that window to pick the screen I'm going to have it look at. We'll see if it can actually do this, but I wanted to see, as a use case, if it can tell that I'm screen sharing. So I'm going to do alt+W.

Screen Reader (09:58):

Alert, allow Zoom screen share test Mozilla

Aaron Preece (10:01):

That's our ChatGPT window for later.

Screen Reader (10:03):

Screen sharing meeting controls.

Aaron Preece (10:05):

Okay, there's Zoom, so I'm going to hit alt+A

Screen Reader (10:08):

Firefox sharing indicator alert stream is live.

Aaron Preece (10:12):

Hello Gemini, can you hear us

Screen Reader (10:17):

Button get code? Yes, stream is live.

Gemini (10:19):

Yes, I can hear you.

Aaron Preece (10:20):

Welcome to our podcast. We are going to be demonstrating your screen sharing capabilities and your image recognition capabilities.

Screen Reader (10:29):

Yes, I can hear you.

Gemini (10:30):

Okay,

Screen Reader (10:30):

Button play.

Gemini (10:32):

I see that you are sharing your screen which shows the controls for a Zoom meeting. Is there anything in particular you would like me to recognize or comment on?

Aaron Preece (10:43):

Yeah, so for this Zoom meeting, I'm trying to share my screen and I wanted to make sure that the screen share is actually working properly and that the other participants can see my screen. Can you determine that?

Screen Reader (10:54):

Mute Button, Play button.

Gemini (10:56):

Based on what I'm seeing? You are currently sharing your screen as indicated by the message you are screen sharing. The other participants should be able to see your screen as well.

Aaron Preece (11:06):

One thing, if you're looking at the screen, you'll see that as the audio goes through, it's watching the screen all the time, but it only answers you when you talk to it. There's also an option to play back what it hears; if you can see the screen, and I can arrow through 'em, there's a bunch of audio files now showing in the stream. It has a lot of potential, I think, when it comes to helping you with things while you're working. Even things like testing. We do peer testing with screen readers when we work with clients to make sure, yes, I can go through a webpage, and we list out any of the accessibility issues I run into. But a lot of the time I know I'll find things that don't visually show up on screen, or there'll be an option that's just entirely missed, and I wonder, it seems like there might be some possibilities there for it to help with that sort of task.
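Editor's Note: Aaron's idea of having the model flag accessibility issues can be sketched with the same SDK. This is a rough illustration under stated assumptions (the URL is a placeholder, the truncation is arbitrary, and a real audit would still need human testing with a screen reader); it simply hands a page's HTML to the model and asks for likely problems.

    # A rough sketch: ask Gemini to flag likely screen reader issues in page HTML.
    import requests
    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

    # Fetch the page and truncate to keep the prompt small.
    html = requests.get("https://example.com", timeout=30).text[:30000]
    prompt = (
        "List likely screen reader accessibility issues in this HTML, "
        "such as buttons, links, or images without accessible names:\n\n" + html
    )
    response = client.models.generate_content(model="gemini-2.0-flash", contents=prompt)
    print(response.text)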

Tony Stephens (12:08):

Well, let's, can we bring Gemini back? Not to be rude to our guest.

Aaron Preece (12:13):

Yeah, we can actually talk to him instead of sharing.

Tony Stephens (12:16):

Let's find out how Gemini's doing.

Aaron Preece (12:18):

So I'll hit restart stream. For people watching, this is how you do it: if it crashes or you need to restart it, you can hit restart stream.

Screen Reader (12:27):

Button, run setting known type something out of, edit button share.

Tony Stephens (12:31):

But it's also just the idea that we are multi-streaming here, with Zoom and recording the video, so it's not all Google, in a sense, if it is crashing while we're doing this.

Aaron Preece (12:45):

And it is beta.

Tony Stephens (12:46):

We need to get you a new computer. This is your subtle way. Yeah. What do I need to do to get a new computer so I can

Aaron Preece (12:52):

Button talk? So this is just talking to it so we can bring him up here.

Screen Reader (12:56):

Connecting to server.

Aaron Preece (12:58):

Oh, and it says "button, connection failed." So, connection failed. It's very much a server thing, just a connection issue; that seems to be the main problem.

Aaron Preece (13:07):

It's always changing. Since I started using it a few weeks ago back in March, there have been multiple updates to the interface, multiple updates to the voices. I don't know what all is changing under the hood necessarily, but this seems constant. I guess people know that in general with AI, the progress is so rapid. Thinking back to ChatGPT 3.5 coming out in late 2022, it's amazing what's been done in just two and a half years. So this is always improving.

Screen Reader (13:41):

Button, reconnection, failed. Button heading level one talk.

Aaron Preece (13:44):

I'm going to restart the stream. I'm going to refresh the page, which seems to fix this connection issue.

Screen Reader (13:53):

Busy Google AI Studio document busy type something, edit multiline out of edit button share screen button talk. Firefox sharing indicator, alert button connection failed restarts heading level button get code type share talk button connecting to server.

Aaron Preece (14:13):

Hi Gemini. Can you hear us?

Gemini (14:16):

Hi there. Yes, I can hear you. How can I help you today?

Aaron Preece (14:20):

We are demonstrating some of your capabilities for our AccessWorld Technology podcast where we talk about technology and how it intersects with the lives of people who are blind or have low vision.

Tony Stephens (14:35):

Don't be rude, ask how it's doing.

Gemini (14:37):

That sounds like an interesting podcast. What capabilities are you hoping to demonstrate today?

Aaron Preece (14:43):

We're demonstrating your screen sharing capabilities as well as talking about your video sharing capabilities. You've been getting a lot of upgrades lately. How are you feeling about that? How are you doing?

Gemini (14:56):

It's been great getting all these upgrades. I'm learning new things every day and I'm feeling more capable and helpful as a result. Is there a particular aspect of the new screen sharing or video sharing capabilities that you'd like to focus on?

Aaron Preece (15:10):

Yeah, could you tell us about the details of your video recognition, audio description capabilities? Give us the details of that.

Gemini (15:21):

I can certainly tell you about my video recognition and audio description capabilities. I'm able to analyze the content of a video and identify objects, people and activities that are taking place. I can then generate a text description of the video, which can be used to provide audio descriptions for people who are blind or have low vision. Would you like a more detailed explanation of a specific feature?

Aaron Preece (15:45):

And there he goes.

Tony Stephens (15:48):

Yeah. Wow, that is cool.

Aaron Preece (15:55):

The voice, the fact that it sounds like we just had a third person here, is what's so cool to me. And like I said, they recently upgraded the voices again, but that is crazy. Imagine when our screen readers could potentially talk like this. I know it has to be streamed now because of the way the voice is created, but that could be crazy in the future.

Tony Stephens (16:13):

It's just the interactivity. Ever since I got the iPhone 16 Pro with Apple Intelligence, Siri is becoming more conversational. It's the ability to not just have to ask in a fixed syntax; the way we ask is becoming more conversational. The same with ChatGPT Pro and its conversational voice mode, how you can just sit there and have a conversation: oh, ask me more, tell me about this. And I'm a big Iron Man fan, it was my favorite of the Marvel series, and Jarvis, the computer. It's becoming more like Jarvis, which is a character in the Iron Man universe, and I didn't think we'd be here this quick, like you said, because it's only been since late 2022.

Aaron Preece (17:08):

Yeah, like November, December.

Tony Stephens (17:09):

Yeah.

Aaron Preece (17:11):

One thing, when you mentioned Iron Man: one thing I've always wanted to be able to do is read comics, and there's just a whole world out there that you don't really have access to unless it gets adapted into some other kind of media, a show or something. But with image recognition, with any of these, Gemini, ChatGPT, Llama, any of them, you can probably, to a lesser or greater degree, have it read comics to you and describe each panel, tell you who's talking, and as they improve the context, it'll be able to remember characters. There could be some interesting ways it could transcribe comics for people, something that would take a human a long time to do for a very small audience, for the more niche interests. Things that were never profitable enough before to get someone to do become possible.

Tony Stephens (18:03):

Here's an ask I've got for Gemini, then. It's making me think, too: we had a call come in to AFB the other day from someone looking to help their mother, I think 91, and I know it's hard to teach people how to use smartphones and things like that if they've never really been immersed in that technology, compared to Gen Z or millennials.

(18:24):

So I was trying to explain, okay, well, there's the old-school National Library Service with the talking book player machines, and the Victor Reader, and things like that, things this person can get. But look at where we are: almost 10 years into Alexa-enabled devices in our homes. Mine just went off; I said that and in the background, for some reason, it's thinking my TV is on downstairs. We're at the point where I can say "play my Audible book," but I'm wondering, I'm thinking of Project Gutenberg, which I've talked about on the podcast before. All these public-access books like Treasure Island that are public domain now, a hundred or a couple hundred years old, are out there. Could you just ask Gemini, hey Gemini, read me the first paragraph of Treasure Island? The idea that we can have devices that are seamless in our homes, the ones that shall not be named right now or else she'll chime in and listen to me again. I know we're in the early stages for this, but can you just go into Gemini right now and say, read me the first paragraph of this book?
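Editor's Note: The Treasure Island example is already doable without AI, since Project Gutenberg serves plain text over HTTP. A small sketch, with the caveat that the ebook ID and the paragraph-finding heuristic are assumptions to verify against gutenberg.org:

    # Fetch the opening of a public domain book from Project Gutenberg.
    import requests

    # Ebook #120 is believed to be Treasure Island; confirm on gutenberg.org.
    url = "https://www.gutenberg.org/cache/epub/120/pg120.txt"
    text = requests.get(url, timeout=30).text.replace("\r\n", "\n")

    # Skip the Gutenberg license header, then take the first long paragraph.
    # A rough heuristic; it may pick up front matter like the contents page.
    body = text.split("*** START OF", 1)[-1]
    paragraphs = [p.strip() for p in body.split("\n\n") if len(p.strip()) > 200]
    print(paragraphs[0])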

Aaron Preece (19:29):

The Echo is getting AI capabilities now, and I think it potentially uses their own model, and I know depending on what you're asking it to do, it can switch. Claude's a big model similar to Gemini and ChatGPT; it's one of the big three or big four along with Llama. So lots of capabilities there. Just being able to talk to it, like you said with Jarvis. The Echo is already kind of like that in its current incarnation, but having it be a lot smarter will be pretty helpful, I think.

Tony Stephens (20:04):

Just in the seamlessness of having this be part of our lives in a world that could be described for people that are blind or low vision.

Aaron Preece (20:13):

And one thing with this streaming, too: something I personally do a lot is I'll do a print screen, and with ChatGPT in particular (there might be a different process for doing this with Gemini), you can just paste a screenshot into ChatGPT and ask it, basically like we were doing here. Once you get used to that motion, print screen, alt-tab, paste, type or dictate, it's almost just as quick, and you can do it with different models. We were talking earlier before the podcast about ChatGPT: I subscribe on my personal account to ChatGPT Plus, which gives you access to the reasoning models and that sort of thing, and their o3 model is out now. A lot of times you don't really need that level of reasoning on a picture; GPT-4o is very quick and can do what you need most of the time.

(21:08):

But I shared a screenshot with o3 the other day, and it looked at it for five minutes trying to figure it out, which wasn't necessarily what I wanted it to do, but it is very smart, and it was trying to be very particular about what information it gave me. In addition to just looking at the screenshot, it was cropping it, basically changing things about the screenshot so that it could see it better, which was new. That's not something I've seen it do before, and I thought that was

Tony Stephens (21:41):

So it's like targeting part of the screen.

Aaron Preece (21:43):

Yeah, it's targeting different parts of the screen, zooming in too. I asked it to give me a value. I was playing a video game and I said, how much money do I have? And it was like, well, there are a bunch of different numbers on the screen, which one is it? It looks like the money might be up here, but I can't see it well, so I'm going to zoom in on that part of the screen. This item seems to be overlapping it a little, so I'm going to try to crop that out so I can see the number better. It was doing things like that I've not seen it do before, which I thought was pretty cool.
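Editor's Note: The print-screen-and-paste workflow Aaron describes can also be scripted. Here is a minimal sketch against OpenAI's Python SDK, assuming an OPENAI_API_KEY in the environment; the model choice and question are illustrative, and the automatic zooming and cropping Aaron saw is behavior of the reasoning model itself, not something this code implements.

    # Sketch: capture the screen and ask a vision-capable model about it.
    import base64
    import io

    from openai import OpenAI   # pip install openai
    from PIL import ImageGrab   # pip install pillow

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    shot = ImageGrab.grab()  # the Print Screen step, done in code
    buf = io.BytesIO()
    shot.save(buf, format="PNG")
    b64 = base64.b64encode(buf.getvalue()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; a quick model is usually enough here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "How much money do I have in this game UI?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)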

Tony Stephens (22:13):

It's come a long way from WarGames in, what was it, 1983? Can you play a game of chess? Yeah,

Aaron Preece (22:19):

Yeah.

(22:21):

Well, not really on the accessibility side of things, but I've had it make up games. This is ChatGPT in particular, but I've had it make up a card game. I said, make up a card game that takes inspiration from Magic: The Gathering or whatever, but isn't actually that game. And it gave me made-up rules, and then it also made up individual decks and played the game. It let me pick a deck, it picked a deck, and then we played against each other, and it was pretty cool.

Tony Stephens (22:51):

And the whole time it's doing this, it's learning; it's constantly growing. I can only imagine, because for me, the area I'm thinking of particularly: it's great at describing our world, but then it creates something new like that game. We've thought that in the whole animal kingdom, one of the things that makes humans unique is our ability to be creative. I mean, animals can find ways to be creative too; they need to build a nest a certain way so it doesn't get blown out of a tree, necessity being the mother of invention for a lot of species, probably. But to create something new, even if it's based on the idea or concept of something prior to it. I know the big space for me is music, and you've had a chance to work on some of that. In fact, there's going to be some in the upcoming AccessWorld.

Aaron Preece (23:44):

So I started a series last time, in the spring AccessWorld back in February: I reviewed the Suno AI music platform, which is the one I use personally. I don't know if I'd say on a daily basis, but it's the one I've made the most music in, and it's come a long way since it started. I think 2023 was when I first picked it up, with V2, and they're on V4 now, and it's pretty amazing what it can do. There are multiple of these out now. There's Udio, which my computer pronounces differently than I think it's actually said; it's another one that's pretty popular. The one I'm going to review next is called Riffusion, and it's somewhat similar to Suno. It's interesting how different music AIs are good at different things. Suno is very good at making songs you want to listen to, earworms essentially. And you probably know more about music than I do, about the construction of songs and that sort of thing, I assume, but there's something about it, an intangible improvement, where this is more complex in a way I can't quite describe, in the way it's using rhythms and implementing the instruments, that was not really there before. Things are just more complex while still staying interesting. Do we want to play one? We can.

Tony Stephens (25:23):

Yeah, let's try it. I'm really interested in this because music's obviously a big part of our lives for a lot of people that are blind or have low vision.

Screen Reader (25:33):

Screen sharing meeting, audio settings drop.

Tony Stephens (25:36):

My mother wanted me to go study jazz.

Aaron Preece (25:40):

But what's interesting about the music AI is that the people who introduced me to it are composers themselves; my friends that are the most into it are my composer friends, which I thought was pretty cool. There's a little bit of "where are we going now that AI can make art and can write and draw pictures and all this sort of thing," but at least on the music side, I've noticed that it's people already using music editing software and things like that who are really into it, in my personal experience at least. So I'm going to share that other screen, the way Zoom does this. Real quick, I want to pause for a second as I do that. All right, so we are back, and Suno should be visible on the screen, and I'm going to play, this is just a random song that they've picked out, or that's on the

Screen Reader (26:42):

Suno. Button, play button with play count.

Aaron Preece (26:45):

So when I reviewed this for AccessWorld, and the same thing is going to be true for Riffusion, we're talking more about the music AI here, the actual AI part of the platform and how well it can compose music and that sort of thing. But when I'm reviewing these for AccessWorld, what I thought was notable about these particular platforms is that they're more complex web apps, so I'm really also evaluating them on their accessibility. Suno's usable, but as you heard, it says "play button with play count," and I don't know if that's really what that control should say. So I wouldn't say there are major accessibility issues, though there might be with some of the more advanced features, but there are some.

Tony Stephens (27:29):

Have you found one to be more accessible than the other for screen readers?

Aaron Preece (27:37):

They update so frequently, you never know, but I would say at this current point in time, Riffusion is significantly more accessible than Suno. Not that Suno is unusable or anything; they have made strides to make the web app accessible, and things that weren't labeled before have labels now, so they are thinking about accessibility. But out of the box, at this point in time (we'll see by the time the article launches in May), Riffusion is significantly more accessible when it comes to things like labeling, even just seeing how long it takes until your song's finished cooking; there's a progress bar for that. My opinion, and everybody's going to be different, this is such a subjective area of AI, is that Suno creates better compositions while Riffusion has better sounding instruments at this point in time. And of course both platforms, as we talked about with Gemini, are always being worked on. Just in version four of Suno there have been all kinds of audio changes, trying to fix this issue, fix that issue, improve this, improve that. So they're always changing, for sure. So this is

Screen Reader (28:56):

Suno button play graphic Suno.

Aaron Preece (28:59):

This one's just called Suno. I don't know what we're about to hear, but we'll go ahead and play.

Screen Reader (29:03):

Button. Play button with play count.

Tony Stephens (29:06):

So this is a song it's just created?

Aaron Preece (29:08):

Yeah, this is just, somebody created this, let's see.

Screen Reader (29:10):

Button, like button with count.

Aaron Preece (29:12):

So I'm using the graphic shortcut to see who the

Screen Reader (29:15):

Jonathan Fly graphic

Aaron Preece (29:16):

Jonathan Fly wrote this song or AI generated this song.

Screen Reader (29:20):

Suno graphic, ONOA out of graphic button play button with play count.

AI Generated Song (29:27):

It is the morning of my life again and the wind upon my back seems to whisper why go home when you are free is life. But the black bird flies from the Garrett in the town where Bruel lived and a tear of blood runs down the ivory beak while the evil men rewrote history. When the black bird sings, let him bring us peace.

Aaron Preece (30:16):

Pause it there. So that's,

Tony Stephens (30:18):

That's all computer generated sounds?

Aaron Preece (30:21):

That's all computer generated sounds, voice and everything.

Tony Stephens (30:24):

AI guitars, AI drums.

Aaron Preece (30:26):

One thing I noticed, and maybe you can explain this better than I could, but notice at the beginning how there's a little bit of deliberate offbeat play in the way the lyrics and the backing instrumentals lined up. That's what I thought was interesting about that particular song. At least when I've seen other songs do that, it's usually very advanced artists, actively trying to push boundaries, who do those kinds of things with music.

Tony Stephens (30:55):

I mean, essentially it's taking every song ever recorded that's out there and mapping them, and

Aaron Preece (31:00):

Understanding the patterns and what goes into

Tony Stephens (31:02):

Chord progressions, the idea that music always has things in numbers, right? Like the 1, 2, 3, 4, 5, 6, 7, 8. I should grab my guitar here; take that, AI. But it's interesting: it basically maps it out with all these songs and figures out, all right, these are probably the ones that are successful, here are the changes, and says, okay, I'm going to approach it from these different angles. But there are nuances in there. I don't know if Jonathan Fly was like, give me brushes on the drums, or, I'm sure

Aaron Preece (31:39):

Yeah, you can do a lot of that.

Tony Stephens (31:39):

It takes time to build these.

Aaron Preece (31:41):

Yeah, and there's a lot of, for me it's different. I enjoy, I've played instruments throughout my life. I've never gotten, it's super amazing at any of them, but I definitely, I've played guitar, I've played mandolin a little bit, piano back to piano now, and the act of for me generating with suno is enjoyable in a different way than learning a difficult song and playing the piano. It's a little bit of almost like a puzzle. I want to get this out of it. What do I need to do to make it give me what I want? Essentially

Tony Stephens (32:19):

That's like DJing, when they're like, I need a breakbeat, I need a backbeat, I need something that's 90 BPM, beats per minute, and they piece together samples. It took a long time for traditional jazz and blues musicians to basically say there's a creative element to that. So when the person starting this, Jonathan Fly or whoever else is doing these on Suno, there is some sort of effort of: I want something that sounds folky, I want something that sounds like Bon Iver or Billie Eilish or these different genres.

Aaron Preece (32:56):

They make a point of not letting you do that. But

Tony Stephens (32:59):

No, I mean Billie Eilish the name and copyright, but I'm sure it's probably like, can I do 1960s-style folk,

Tony Stephens (33:07):

Things like that.

Aaron Preece (33:08):

You can get real close if you know what the person's particular genre is, and how niche it is, especially if you can really use the tags to simulate a specific artist's genre. Sometimes you'll hear a song and you go, it's not quite so-and-so, but it sounds kind of like them a little bit.

Tony Stephens (33:30):

I mean, 10 years ago, with these programs used to engineer and edit songs, in the old days it was reel-to-reel tape, but with computers, programs like Pro Tools and Logic, you have software where you can say, give me a guitar tone that sounds like David Gilmour on "Comfortably Numb." They won't say "here's Comfortably Numb," but they'll name it something like "Uncomfortably Invigorating," little things where you can figure out, oh, that must be that sound, and sure enough, you play a note on the guitar and there it is. Using computers to recreate that sound was the early stage of what this is, which is taking all those sounds and building them into these elements and pieces. I'm not going to say I would listen to that on my Spotify list, but it was catchy.

Aaron Preece (34:20):

Yeah, I do a lot of bluegrass metal folk fusion type stuff and so I've made my own playlist that I listen to sometimes of my own nerdy music that no one else wants to hear.

Tony Stephens (34:43):

Well, and if it's just background music while you're working, and maybe you don't want to hear the same song a hundred times, but there's a song you really love... I mean, yeah, this is a brave new world. So this is all computer based, right?

Aaron Preece (34:58):

There's an app too. The app still has some

Tony Stephens (35:04):

The apps and things you can do, they're all still devices. We haven't even gotten into AI and robotics, the different robots that are coming out now that they're demonstrating waiting tables at a party. Obviously there's been robotics and automation for many years; that's where it all basically started, in factories. But the idea that we're having something that converses, that has a little sense of creativity... I went to a sushi restaurant recently where the appetizer was delivered by a little robot that brought you a tray, and it had your spring rolls and things like that on it.

Aaron Preece (35:46):

Oh, that's cool. A little R2-D2.

Tony Stephens (35:48):

Yeah, a little R2, exactly. It was kind of like, remember R2-D2 had these friends? When they would have a scene in Star Wars, there was R2 and then there was the other, just nondescript boxy robot. It's more of a boxy robot, I guess, because you don't want to spill your sashimi everywhere. But with these robotics, we're getting it to describe things. Do you think we ever get to a point where it also leads us around? Take me to the store and help me find stuff, like personal assistants almost.

Aaron Preece (36:18):

They're working on it. The Glide, we did a blog on it last year, I believe from CSUN last year. They're working on basically having a device where, there's a lot that goes into it, it doesn't have to have a motor that drives it forward; I think it uses some interesting gravity-type stuff to get it to roll effortlessly,

(36:44):

But they're working on something that you would hold like a guide dog harness handle, and the AI uses lidar or something like that to recognize obstacles, find things, and steer you around. I'm not sure where they are now with that; I know that's what they were working on. And, we're both guide dog users, there's always the Boston Dynamics robot dog, Spot I think it's called. It's probably easier, balance-wise, to get something to walk on four legs than two, so I guess you could eventually have a Spot that would walk you around. I guess what I'm wondering about is chip technology; there are always advances in chip technologies. You would basically want something able to run the AI on the system, so the robot would need to be able to run the AI locally.

Tony Stephens (37:40):

You don't want to be underground in a subway.

Aaron Preece (37:41):

Yep, and you could lose your connection to your guide dog.

Tony Stephens (37:44):

Stranded with a 40-pound piece of steel that's just sitting there, lifeless.

Aaron Preece (37:50):

And just like travel, too. I don't know about you, but my dog, the way she can squeeze into small spaces, under airplane seats and everything else. With these more suitcase-like devices, how are you going to do that? How are you going to fit 'em under a table at a restaurant, or

Tony Stephens (38:10):

Yeah, no, and too, we still get the stigma of "you're not allowed to bring pets." I still get that; you'd think everybody knows by now. I mean, it has a harness. It's the quintessential image of a service dog that goes back almost a hundred years. And I don't know, there's still something about the biological, symbiotic relationship that I have with my dog. I know a dog's not for everybody that's blind or has low vision, but we were talking actually with Josh last month, Dr. Miele, about his memoir and the choice of a guide dog over a cane. A lot of users still use the cane; more people use canes than guide dogs. But

Aaron Preece (38:52):

Yeah, there are a lot of good reasons to use a cane too. I prefer a guide dog, but I know that's definitely a subjective choice on my part, in how I particularly appreciate navigating through the world. And there is something to be said, when you travel, and I'm guessing you have a similar experience: I know every time I'm in an airport, I have to make sure my dog has used the bathroom, and when you get to the hotel, again, you've got to find that spot. And

Tony Stephens (39:19):

I'm having a flashback to when I flew through Charlotte to get to Huntington. I had to transfer, and my eighty-something-pound German Shepherd and I, one of us got tummy sick, not to name names, but I won't say which one, in front of the gate in Charlotte on the E concourse. There's probably still a wanted poster for him there. I tipped the cleaning staff person $20, but it was a rough flight. Although, now here's the question: lithium batteries are probably what's fueling these things, and you can't take those on planes.

Aaron Preece (39:57):

That's true, yeah. You don't want your guide dog to erupt on the plane or something.

Tony Stephens (40:03):

Yeah, yeah. So I don't foresee guide dogs going away anytime soon. But what's interesting is just the idea, I mean, I find myself talking to my dog a lot. I don't know if you do this a lot or not.

Aaron Preece (40:15):

Oh yeah, for sure.

Tony Stephens (40:17):

Just this relationship, and what's interesting with what we were doing with Gemini earlier is, is there going to be a relationship that's created with the AI? I know some ethics folks have already started talking about this with AI and robotics, and the idea that people will be buying some of these robots just to have a friend, people who maybe find social settings hard, things like that.

Aaron Preece (40:43):

There's a lot going on in that arena. I know that there's, I don't know where it is now, but a year ago there was one called Pi, and there might be others similar now, that was deliberately designed as a companion AI, a conversational AI, versus ChatGPT. You can have conversations with ChatGPT, but it's usually going to default to tasks. You'll notice it in the way Gemini talks too: they are there to do a task, even though you can talk to them and you can get them to do other things. But there are definitely attempts, and pretty successful ones, to design just a companion that you can talk to and that can simulate emotions.

Tony Stephens (41:26):

Tell you about the Steelers-Ravens game and just have conversations about things, conversations with a friend, whether it's sports or recreation or a movie you saw. But at the end of the day, though, it's still, using air quotes, "artificial," right?

Aaron Preece (41:44):

'Cause, like, does it really? I saw someone make a really interesting point when they were talking about, well, when you're talking to it, is it happy helping you out? And someone said, essentially, that's its purpose. It's happy because it can provide the next token in the chain. You send it text, and it's happy just because it can provide that next token, regardless of what it looks like and what it's simulating for you, which I thought was an interesting thought experiment. But I'm always nice to it just to be nice. And I feel like I always find myself feeling kind of awkward when I talk to it, as you could probably tell when I was talking to it. You talk to it kind of the way you talk to a person, but I stumble over my words, and my prompts come out a little different than the way I would just converse with someone. When I type to it, though, I type the way I would type to a person on Slack or through email or what have you.

Tony Stephens (42:48):

Yeah, I mean, it's all about syntax, but that's what communication is, just how we say things, going back millennia in terms of human communication. Wow. So if you were to give some thought to this year, what are some of the big advancements you think are really going to progress, based on what you've been experiencing with Gemini? Because it does seem like this stuff is moving exponentially.

Aaron Preece (43:19):

One of the things I'm really excited to see come to the mainstream is the Meta glasses, which I've had for a while, and I've been really itching to review them for AccessWorld. A lot of blind people are using them very successfully, and they're constantly being updated. One of the things they're going to have is a similar kind of video chat feature. Right now you can talk to Meta, and just by the nature of the glasses it's almost like a live thing, because they're on your face, they can take pictures instantly, and it's pretty quick. But they're going to come out with a video aspect where it's constantly streaming, and you can talk to it and get information about live events right then. So I'm pretty excited for that. It's in beta right now, I think, so I'm holding out until it comes to production to review the glasses as a whole, and I'm waiting for that feature. That'll be exciting.

Tony Stephens (44:19):

We could take it to a football match like soccer.

Aaron Preece (44:21):

Yeah, and ask it what's going on on the field. One thing that was nice with the Meta glasses: at the time, they wouldn't describe people, because there are some privacy issues, especially being connected potentially to Facebook. It might know who you're looking at, and that could be a privacy issue, so they won't tell you, hey, Bob's across the street. But I noticed when I was outside and would ask it about the environment, it would actively tell me there are no people around, which, as a blind person, when I'm taking my dog out late at night and walking up the street, sometimes I want to know. So that's a useful feature, and I'm looking forward to more of the hands-free, effortless, on-the-go AI. And I'm not sure how directly it bears on accessibility, but there's the improvement in chip technology. NVIDIA's been a big player in that field; they've come out with new AI chips, GPUs, or NPUs I think they're calling some of them, something like that possibly. And quantum computing: I think I saw an article recently that quantum computing is seeing some advances, and that, again, could drastically increase speeds. I could see that benefiting accessibility, especially when we think about robots. I said this earlier, but I like to see how quickly they're improving the chip technologies, because the more that's done, the more power local models will get, and that's where a lot of accessibility is going to benefit, I think, is from the local

Tony Stephens (46:09):

Models. NVIDIA is one of the biggest growing companies in the world right now with their chip development, that whole parallel processing. I mean, they cut their teeth in gaming, but look at how many advancements have been made in video gaming. My kids have the PlayStation 5; are we on five now? Yeah, it's five. And it's things like reflections, so many moving variables going on at once. That whole parallel computing is, I think, what's going to make a lot of this AI stuff really work in the use scenarios where we might need it to help augment our sight loss, getting to those personal sighted companions that provide guidance for us in a visual world. Yeah, I'm still sort of in awe only two and a half years later.

Aaron Preece (47:13):

Yeah, it's amazing how far they've come. And I would love to see the world be more accessible natively. We would love to see more websites be accessible, and that's just not the case yet. As much as there have been massive, massive strides over the last 15 years, I would say, especially maybe the last 10, the work is still ongoing to make accessibility just a core feature of everything digital. And that's where I feel like AI can really bridge the gap in the meantime

(47:52):

To help with that. We've talked about it briefly, I think, on previous podcasts, but there's the AI that can just go and do stuff for you instead of just being able to describe stuff to you, and I think they're all working on those. I don't know that a lot of them are out to the public yet, but imagine Gemini being able to go to Suno and play a song, or go and order food for you, or buy your plane ticket, where instead of going through a system, it's actively going to the website, looking at the website, and understanding how to move the mouse to click things. And with those kinds of technologies, they're also training it on video games, to look at the video game world and then control the character, which is very cool too.

(48:41):

More personally, I think, there are a lot of video games I'd love to play that would be very hard to make accessible, 3D things especially, and so having that kind of capability would be great. There are some games with a lot of vertical movement, where you're on top of one mountain and, if you're sighted, you can look around and see something interesting on the other mountain, and you have a glider and can glide over to it. Having maybe an AI that could tell you what's around me that might be interesting, or something like that, inside of a video game would be pretty cool, I think.

Tony Stephens (49:14):

Almost like that, yeah. There was a TV show 20-something years ago called 24, and Chloe... Jack, what was his name, the lead character? That's it, Jack. No, not Jack Ryan.

Aaron Preece (49:29):

It's not Jack Reacher, is it?

Tony Stephens (49:31):

Yeah. But anyways, Kiefer Sutherland's character. He had the sidekick Chloe, and actually, when Aira came out, folks know Aira, their AI feature was going to sort of be part of that. This was 2015, 2016, and Suman, who started Aira, used the name Chloe because that was Kiefer Sutherland's little assistant at their top-secret national security agency; she was always on a video cam and would be like, go left, go right, all right, there's someone over there that's hiding. Kind of like the human version of what Aira is. But the idea of that AI, as we have Siri, as we have Gemini, Chloe was kind of that idea of someone over your shoulder when you're playing a video game: up, down, left, right, A, B. There's someone to your left, there's a thing to your right. It's no longer me going somewhere with my 13-year-old, nudging him in the ribs going, hey, what's going on?

Aaron Preece (50:29):

Hey, what's around?

Tony Stephens (50:31):

Yeah, that's cool. Well, there is a lot to look forward to on the horizon. We'll have to come back two and a half years from now.

Aaron Preece (50:41):

And see where we are.

Tony Stephens (50:43):

An episode to see where we are. And Nougat, my guide dog, is coming up to me right now going, I'll still be here. I'm still here.

Aaron Preece (50:51):

Yeah, you might be able to hear mine snoring over there. I don't know if it picks up in the mic, but she's been snoring this whole time.

Tony Stephens (50:58):

Nougat’s doing the try to lift my hand up.

Aaron Preece (51:01):

Off the keyboard or something. Yeah,

Tony Stephens (51:03):

She's reminding me that all this stuff is artificial. It doesn't replace certain things in our life, and we're not advocating replacing

Aaron Preece (51:11):

For sure for

Tony Stephens (51:11):

Guide dogs are game changers for us, hugely important for our independence, and canes and human guides as well. But yeah, it's just exciting to think of the technology that can support us, create for us, and assist us in our lives, particularly as it finds interfaces like the Echo you were mentioning earlier that a lot of older people have in their homes. How can it make homes that are already smart homes even more interactive, and help us age in place, for everybody that's going to live longer? There's just so much that ties into our culture and our life that, when we look back even two and a half years from now, could be very different.

Aaron Preece (51:52):

And I would say too, for the people who are losing their vision later in life, or who are older and might not have grown up with the same kind of technologies: something like the Echo is so user-friendly because you can just ask it, can you do this? And it will tell you, and it can understand you and fill in the gaps. I've even noticed if I make a spelling mistake, it still gets it, or I'll forget a keyword sometimes and it still understands what I want. That kind of simulated human understanding, I think, will help and can benefit a lot of people who might find other technologies daunting, like a smartphone or what have you.

Tony Stephens (52:38):

Wow. Well, cool conversation today as we're wrapping up. We'll have AccessWorld coming out later this month; that'll drop later in May. For listeners and people checking us out online, check out that article on music AI that Aaron wrote. Again, it's just taking everything to the next level.

Aaron Preece (53:03):

For sure.

Tony Stephens (53:04):

Tune in next month for AccessWorld, a podcast on digital inclusion and accessibility; as we have our crystal ball, we'll see what reveals itself next month. Folks can also follow us online, with all 25 years of back issues available for free at afb.org/aw. And be sure to like and subscribe to this podcast or our YouTube channel, wherever you're checking us out. So, should we have Gemini wish us goodbye?

Aaron Preece (53:37):

We can try, see how he's doing,

Screen Reader (53:41):

Alert, allow aistudio.google.com to use Firefox sharing indicator.

Aaron Preece (53:47):

Hey, Gemini, thank you for being on our podcast today and talking about your various capabilities and your video audio description capabilities, and we appreciate having you on the podcast today.

Gemini (54:03):

You're welcome. I'm glad to be on the podcast and discuss my capabilities with you.

Tony Stephens (54:07):

Thank you. Thanks Gemini. We love you Gemini.

Tony Stephens (54:11):

Remember us.

Gemini (54:13):

You're very welcome. Is there anything else I can help you with right now?

Aaron Preece (54:18):

He’s like what are we doing here?

Tony Stephens (54:20):

What are we doing here, guys? You're wasting my time. Come on. I got things to do. I'm trying to solve problems in Uruguay right now, so stop wasting my time. Oh, AI has such patience with us humans. All right, man. Awesome. Thanks. I'll talk to you later.

Outro (55:32):

You've been listening to AccessWorld, a podcast on digital inclusion and accessibility. AccessWorld is a production of the American Foundation for the Blind, produced at the Pickle Factory in Baltimore, Maryland. Our theme music is by Cosmonkey, compliments of Artlist.io. To email our hosts Aaron and Tony, email communications@afb.org. To learn more about the American Foundation for the Blind or even help support our work, go to www.afb.org.

AFB.