Aaron Preece

AI-powered image recognition, both in the real world and in digital technology, has been a massive boost to accessibility for people who are blind or have low vision. It is helpful in the real world for identifying objects or locations, and it is just as useful for identifying inaccessible items on a computer or smartphone. For example, with the flood of image-based social media, it is now possible to independently examine visual memes and other screenshots. It is also helpful when testing content for accessibility, since you may run across inaccessible software or a website where you believe a control exists but is not exposed to your screen reader. Using image recognition, you can get a much clearer picture of your screen’s contents.

I personally value this technology because of the independence it gives me. I often have access to sighted assistance for inaccessible items on my screen, but being able to identify things independently is very freeing. I also often want information about images that would otherwise feel too frivolous to ask someone about, such as the memes mentioned above.

Historically, to access AI-powered image recognition, you needed to take a screenshot and upload it to your AI of choice through its own interface. Now, on Windows PCs, a program called Viewpoint, developed by Nibble Nerds, lets you use AI image recognition with a single keystroke, making access significantly more efficient and embedding it directly in your workflow. In addition to recognizing images you send it, Viewpoint has a groundbreaking feature that uses AI image recognition to let you attempt to operate completely inaccessible interfaces. The program sends a screenshot to the AI, which returns specific mouse coordinates to Viewpoint. Viewpoint then constructs a list of the recognized UI elements and allows you to tab through them and activate them with the keyboard. Activating an element simulates a mouse click on the region of the screen that the AI identified for it. Much like Apple’s Screen Recognition feature on iOS, this can be hit or miss depending on the program, but it greatly increases the value of the application. In this article, we detail setting up Viewpoint as well as the benefits of its different features.

Viewpoint functions by sending information to Google’s Gemini AI. To use it, you will need to acquire an API key from Google AI Studio. Helpfully, Nibble Nerds has included a link in their documentation that goes directly to the page in AI Studio where you can acquire keys. The process is fairly straightforward but has changed slightly since I first started using the program.

Every key must be associated with a project. Unless you're a Gemini developer already, you're not going to have any projects, so you'll need to create one before generating the key. Find “Projects” in the navigation menu, and from there select “Create Project.” Simply name your project and press Create. After a few seconds, you will be informed that the project is created. Now press the “Get API Key” button and select “Create a Key.” Name your key, select your newly created project from the drop-down, and press Create. Once the key is created, find it in the table and, in the final column, press “Copy API Key” to copy it. When you first launch Viewpoint, a dialog box will pop up asking for your key, and you can simply paste it there. Once you have entered your key and pressed OK, Viewpoint will be ready to go.

Once launched, Viewpoint waits in the background for you to send it queries for Gemini. One keystroke cycles through the different Viewpoint modes, while another activates the selected mode. The modes are scanning UI elements, performing optical character recognition (OCR) on the entire screen, opening a query box where you can send Gemini a question along with your screenshot, and opening a PDF reader where you can select a PDF file to send to Gemini so it can extract the text. These keyboard commands are configurable in the Viewpoint Settings menu. You can open Viewpoint Settings by pressing CTRL+SHIFT+ALT+V by default, or from the Viewpoint icon in the system tray, where you can also exit Viewpoint. You may need to expand the overflow area to find the icon.

Note that, unlike in many programs, you adjust a key binding by editing the line of text that contains it, following the syntax provided. For example, I changed my Viewpoint activation keys from CTRL+SHIFT+SLASH to ALT+SHIFT+SLASH, as the original combination was interfering with a program I commonly use with Viewpoint. Depending on your use cases, you may want to edit all or none of these.
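To illustrate what a text-based binding like this involves, here is a minimal sketch in Python that splits a binding string into modifiers and a key. The MODIFIER+MODIFIER+KEY shape is inferred only from the examples above; the function name, the accepted modifier names, and the validation rules are assumptions for illustration, not Viewpoint's actual parser.

```python
# Hypothetical parser for binding strings shaped like "CTRL+SHIFT+SLASH".
# The set of modifier names is an assumption, not taken from Viewpoint.
MODIFIERS = {"CTRL", "SHIFT", "ALT", "WIN"}


def parse_binding(text):
    """Return (frozenset of modifiers, key name) for a binding string."""
    parts = [part.strip().upper() for part in text.split("+")]
    if any(not part for part in parts):
        raise ValueError(f"empty segment in binding: {text!r}")
    *mods, key = parts
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    if key in MODIFIERS:
        raise ValueError("a binding must end with a non-modifier key")
    return frozenset(mods), key


def conflicts(binding_a, binding_b):
    """Two bindings conflict when they use the same modifiers and key."""
    return parse_binding(binding_a) == parse_binding(binding_b)
```

A check like `conflicts("CTRL+SHIFT+SLASH", "ALT+SHIFT+SLASH")` returns False, which is the point of swapping one modifier when another program already claims the original combination.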

The Settings menu also allows you to change how Viewpoint handles UI mode, such as automatically regenerating the UI after an item has been selected, as well as turning sounds on or off. You can also choose which Gemini model to use for your queries. 2.0 Flash is the default due to its balance of response speed and accuracy, with 2.5 Flash and 2.0 Flash Lite also available. You can also customize your prompt groups here. These are queries that you save as presets so you don't have to type them out every time you ask Gemini for a particular kind of information. For example, I frequently use Viewpoint to play an RPG video game that is otherwise inaccessible. I have presets for asking Gemini about the condition of my party, describing battle information, and detailing shop screens. I have Gemini structure these responses in particular ways, so having them as presets is very useful.
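As a rough picture of how model choice and prompt presets fit together, the Python sketch below maps the three model names from Settings to Gemini API model identifiers and stores a few presets modeled on the RPG example. The preset names, the prompt wording, and the `resolve_query` helper are all hypothetical; how Viewpoint stores its prompt groups is not described here.

```python
# Display names from the Settings menu mapped to Gemini API model IDs.
MODELS = {
    "2.0 Flash": "gemini-2.0-flash",
    "2.5 Flash": "gemini-2.5-flash",
    "2.0 Flash Lite": "gemini-2.0-flash-lite",
}
DEFAULT_MODEL = "2.0 Flash"

# Hypothetical presets; the wording is illustrative only.
PROMPT_PRESETS = {
    "party status": "List each party member with current HP and status effects.",
    "battle info": "Describe the enemies on screen and the available commands.",
    "shop screen": "List each item for sale with its price, one per line.",
}


def resolve_query(preset_name, model_label=DEFAULT_MODEL):
    """Return (model_id, prompt_text) for a saved preset."""
    if model_label not in MODELS:
        raise KeyError(f"unknown model: {model_label}")
    if preset_name not in PROMPT_PRESETS:
        raise KeyError(f"unknown preset: {preset_name}")
    return MODELS[model_label], PROMPT_PRESETS[preset_name]
```

Keeping the requested response structure inside the preset text is what makes the answers come back in the same layout every time.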

Using Viewpoint is straightforward. Cycle to the mode you want with the cycling keystroke, and then press the activation keystroke. You'll hear a camera-shutter sound to let you know the mode has been triggered, and your screen reader will announce which mode has been activated. You'll then hear a loading or “please wait” sound while your image is sent to Gemini and the response comes back. Once the response arrives, it is displayed as plain text in a dialog box, where you can copy it before closing the dialog. Note that this dialog does not always receive focus, so you might have to navigate to the window containing the response.
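For readers curious about what "sending the image to Gemini" looks like, here is a minimal Python sketch that builds a request for the Gemini API's public `generateContent` REST endpoint from a PNG screenshot and a prompt. It only constructs the URL, headers, and JSON body and does not send anything. The function name is hypothetical, and this is an assumption about the general shape of such a call, not a description of Viewpoint's own code.

```python
import base64

API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"


def build_gemini_request(png_bytes, prompt, api_key, model="gemini-2.0-flash"):
    """Build (url, headers, body) for an image-plus-text Gemini query."""
    url = f"{API_BASE}/{model}:generateContent"
    headers = {
        "x-goog-api-key": api_key,  # the key created in Google AI Studio
        "Content-Type": "application/json",
    }
    body = {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            "mime_type": "image/png",
                            # Images travel as base64 text inside the JSON body.
                            "data": base64.b64encode(png_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }
    return url, headers, body
```

The body would then be serialized as JSON and sent with an HTTP POST, and the "please wait" sound corresponds to the time spent waiting on that round trip.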

UI mode is slightly different. Once a UI snapshot is taken, you can use the Tab and Shift+Tab keys to cycle through the detected UI elements and press Enter to activate the current element. Depending on your settings, Viewpoint may then wait and generate a new snapshot, on the assumption that the UI has changed. To exit this mode, cycle to another Viewpoint feature, which will close UI navigation.
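The navigation model described here can be sketched in a few lines of Python: a snapshot holds the detected elements, Tab and Shift+Tab move through them with wraparound, and activating an element yields the screen point to click. The class names, the bounding-box fields, the wraparound behavior, and the choice of the box's center as the click point are all assumptions for illustration; Viewpoint's internals may differ.

```python
from dataclasses import dataclass


@dataclass
class UIElement:
    """A detected control with a hypothetical bounding box in screen pixels."""
    label: str
    left: int
    top: int
    width: int
    height: int

    def click_point(self):
        # Assumption: click the center of the region the AI reported.
        return (self.left + self.width // 2, self.top + self.height // 2)


class UISnapshot:
    """Keyboard focus ring over the elements found in one screenshot."""

    def __init__(self, elements):
        if not elements:
            raise ValueError("snapshot contains no elements")
        self.elements = list(elements)
        self.index = 0  # focus starts on the first element

    def current(self):
        return self.elements[self.index]

    def tab(self):
        self.index = (self.index + 1) % len(self.elements)
        return self.current()

    def shift_tab(self):
        self.index = (self.index - 1) % len(self.elements)
        return self.current()

    def activate(self):
        """Return the point where a simulated mouse click would land."""
        return self.current().click_point()
```

Because the coordinates come from a single screenshot, they go stale as soon as the interface changes, which is why regenerating the snapshot after each activation is a sensible option.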

Generally, Viewpoint works very well with few errors, but there are some persistent ones you may need to work around. One that I've noticed is that if you leave Viewpoint running for too long without using it, it will stop responding to certain keystrokes or stop sending information to Gemini. The fix is to shut Viewpoint down and relaunch it. There is a keystroke for shutting Viewpoint down, but you may need to exit from the system tray instead, as the shutdown keystroke often becomes unresponsive when this occurs. In addition, be sure to close any open dialogs before requesting new information from the AI, as an open dialog can sometimes cause odd behavior.

In other cases, Gemini itself may return an error. In many cases, simply making the request again will result in Gemini sending back a proper response. If not, you can usually cycle to a different mode and use that one. For example, if you get a Gemini error in Query mode, just cycle to OCR instead. The same goes for the PDF reader: instead of uploading your PDF, you can use OCR to gather the content.
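In Viewpoint this workaround is something you do by hand, but the logic of retrying and then falling back to another mode can be written out as a short Python sketch. The function name, the use of `RuntimeError` to stand in for a Gemini error, and the default of two attempts per mode are assumptions made for the example.

```python
def ask_with_fallback(modes, attempts=2):
    """Try each (name, callable) in order, retrying before moving on.

    Returns (mode_name, result) from the first call that succeeds.
    RuntimeError stands in for an error returned by Gemini.
    """
    last_error = None
    for name, call in modes:
        for _ in range(attempts):
            try:
                return name, call()
            except RuntimeError as err:
                last_error = err  # remember the failure and try again
    raise RuntimeError("every mode failed") from last_error
```

Passing Query mode first and OCR second mirrors the order described above: ask the richer question, repeat it once, and settle for the plain text on screen if the error persists.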

Overall, I find the efficiency and ease of use of Viewpoint to be extremely helpful, especially given how often I ask AI to describe images. I particularly like that an OCR feature is available, as gathering slightly more accurate OCR results is often all you need depending on your use case. Having that as an instant option without needing to enter a prompt is very helpful.

Viewpoint itself is free to download and use, and you get fairly generous usage quotas with the free tier of Gemini’s API. I highly recommend giving it a try if you benefit from AI image description. If you would like to see Viewpoint in action, check out the episode of the AccessWorld podcast in which I demonstrate its use, or find the YouTube video in which a developer at Nibble Nerds demonstrates the program.
