How to get an element's parameters in the browser

When creating workflows for scraping or manipulating web data, you are very likely to use activities from the 'Browser' group. Most likely, you will want to extract data from certain HTML elements, or from elements with specific classes or IDs. ElectroNeek uses CSS selectors or XPath expressions to specify elements. Both are capable of finding almost any HTML element on a web page.

Hands-on

Let's take the google.com page as an example. If we intend to create a simple bot that opens google.com in a browser and searches for some particular information, the bot should at least be able to:

  • Locate the input area in the HTML code

  • Type the request into the area

  • Locate the Google Search button in the HTML code

  • Press the button

Assume we first want to locate the main input element on the page and get its selector.

We use the Google Chrome browser in this example.

  1. Manually open the browser and navigate to google.com.

  2. Press F12 to inspect the HTML code.

  3. Go to the Elements tab.

You should see the HTML structure of the web page. The next step is to navigate to a particular element and get the attributes that identify it. To do so, click the element-picker icon in the upper-left corner of the inspector.

After activating this mode, move the cursor over an element on the left side, and its corresponding code is highlighted on the right.

This way, we can see the code for any element on the page just by hovering the mouse cursor over it.

CSS selectors

Cascading Style Sheets (CSS) is a style sheet language used for describing the look and formatting of a document written in HTML or XML. In CSS, selectors are patterns used to select the styled element(s).

CSS selectors are the better choice when dealing with classes, IDs, and tag names: they are shorter and easier to read.

The CSS selector is shown on the screen as soon as you hover the mouse over an element (see the picture above).

In the google.com example, the main input field had the following CSS selector

Use CSS Selector Tester to play with the different selectors.

XPath selectors

XPath, the XML Path Language, is a query language for selecting nodes from an XML document. Locating elements with XPath works well and offers a lot of flexibility. XPath uses path expressions to navigate through elements and attributes in an XML document.

Some cases are hard to handle with CSS selectors alone. Take a look at the following HTML code:

<p> First </p>
<p> Second </p>
<p> Third. Some text in Paragraph </p>

The XPath for selecting the third <p> tag by its content is:

//p[contains(text(), 'Some text in Paragraph')]
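To see what this content test does, here is a minimal Python sketch. It assumes only the standard library, whose xml.etree.ElementTree module supports a subset of XPath that does not include contains(), so the substring check is emulated with a plain Python filter. The <body> wrapper element is an assumption added so that the three paragraphs parse as a single XML document.

```python
import xml.etree.ElementTree as ET

# The three paragraphs from the text, wrapped in a root element
# (the <body> wrapper is an assumption made for parsing).
DOC = """<body>
<p> First </p>
<p> Second </p>
<p> Third. Some text in Paragraph </p>
</body>"""

root = ET.fromstring(DOC)

# ElementTree's XPath subset has no contains(), so select every <p>
# and apply the substring test in Python instead.
needle = "Some text in Paragraph"
matches = [p for p in root.findall(".//p") if needle in (p.text or "")]

for p in matches:
    print(p.text.strip())
```

Only the third paragraph passes the filter. A full XPath engine, such as the one built into the browser, evaluates the original expression directly without this workaround.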

There is no way, however, to match the content inside a <p> tag with a pure CSS selector.

There are no content selectors in the CSS3 specification. We can match on an element, on the name of an attribute in the element, and on the value of a named attribute in the element. There is nothing for matching the content within an element, though.
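The three kinds of match can be illustrated with a rough Python sketch. It is not a CSS engine: it only records the tag name and attributes of each start tag and then applies the same three tests that the selectors input, input[name], and input[type="submit"] would apply. The sample markup and the StartTagCollector class are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample markup, not taken from any real page.
SAMPLE = (
    '<form>'
    '<input name="q" type="text">'
    '<input type="submit" value="Search">'
    '</form>'
)


class StartTagCollector(HTMLParser):
    """Records (tag, attributes) for every start tag; text content is ignored."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        self.elements.append((tag, dict(attrs)))


collector = StartTagCollector()
collector.feed(SAMPLE)
inputs = [attrs for tag, attrs in collector.elements if tag == "input"]

# CSS `input`: match on the element name.
by_tag = inputs
# CSS `input[name]`: match on the presence of an attribute.
by_attr_name = [attrs for attrs in inputs if "name" in attrs]
# CSS `input[type="submit"]`: match on the value of a named attribute.
by_attr_value = [attrs for attrs in inputs if attrs.get("type") == "submit"]

print(len(by_tag), len(by_attr_name), len(by_attr_value))
```

All three tests need only the information in the start tag, which is why none of them can look at the text between the opening and closing tags.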

So what if we need a more complex query that takes into account the content of the element we are trying to find? In that case, XPath is the only option.

To get an element's XPath quickly, right-click the highlighted part of the code and choose Copy > Copy XPath.

In the google.com example, the main input field had the following XPath:

//*[@id="tsf"]/div[2]/div[1]/div[1]/div/div[2]/input
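A copied XPath like this one is a chain of positional steps. The Python sketch below shows how such a chain is resolved, assuming a made-up document whose nesting mirrors the path; the markup and the name="q" attribute are hypothetical and are not the real google.com source. Python's xml.etree.ElementTree requires the leading dot because it evaluates paths relative to an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup whose nesting mirrors the copied XPath.
DOC = """<html><body>
<form id="tsf">
  <div></div>
  <div>
    <div>
      <div>
        <div>
          <div></div>
          <div><input name="q"/></div>
        </div>
      </div>
    </div>
  </div>
</form>
</body></html>"""

root = ET.fromstring(DOC)

# Same steps as the copied XPath; ElementTree needs a relative path,
# hence the leading "." before "//".
PATH = ".//*[@id='tsf']/div[2]/div[1]/div[1]/div/div[2]/input"
field = root.find(PATH)

print(field.tag, field.attrib)
```

Each numeric step depends on the order of sibling elements, so a path of this kind stops matching if the page layout changes.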