Deleted member 131833
can someone explain what a robot is as in people logged onto site, guests, visitors and robots
Are we being watched?
Google, Bing, Yandex and all the other search engines, plus government agencies (looking for terrorist activity, for example) and some private individuals, deploy software programmes called robots (running on computers attached to the Internet) whose job it is to index the web. Indexing is the process of crawling all publicly accessible websites and then storing a list of everything that's been found, along with its location, in the search engine's huge database(s).
Important or busy sites can be indexed hundreds of times a day by any given robot, while smaller sites, blogs etc. might only be indexed every few days. The better the index (and there can be hundreds describing any given data set), the more quickly an item can be found when someone types a query into a search engine.
To achieve the above, robots are fed a list of websites, navigate to each site in turn and try to find a pair of files called robots.txt and sitemap.xml. These files, if present, tell the robot what the site owner does or does not want indexed. Even if these files don't exist, the search engine robot will visit the home page, read any links it finds there (in text or navigation menus), then read those pages in turn, and any other pages linked from those pages... and so on. Before you know it, every page on your website has been indexed simply because the robot followed all the links. Not every file or page on a website is pointed to by a link, which is why the sitemap file is used to tell the robot where to find non-obvious content.
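To make that concrete, here's a minimal sketch of how a well-behaved robot checks robots.txt before fetching a page, using Python's standard library. The site address and rules below are made up for illustration; a real robot would fetch the file from the site itself.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt body. A real robot would download this
# from https://example.com/robots.txt before crawling the site.
robots_txt = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The robot asks: am I allowed to fetch this URL?
print(parser.can_fetch("MyRobot", "https://example.com/index.html"))          # True
print(parser.can_fetch("MyRobot", "https://example.com/private/secret.html")) # False
```

Note that this is purely a politeness convention: the parser tells the robot what the site owner asked for, but nothing physically stops a badly behaved robot from ignoring it, which is exactly the problem described below.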
But lots of sites get caught out by unscrupulous robots: the site indicates where a page is, followed by a noindex request (don't add this to your index, please!). The robot can decide whether to obey the noindex request or do the complete opposite, sometimes listing these pages first in search results on the assumption that they might be hiding something 'interesting'. Google and the other big engines don't do this, but sites are visited by hundreds of robots, and not all of them have good intentions.
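The noindex request itself usually takes the form of a robots meta tag in the page's HTML. A sketch of how a robot might spot it, again with Python's standard library (the page below is an invented example):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Scans a page for <meta name="robots" content="noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            if "noindex" in attrs.get("content", "").lower():
                self.noindex = True

# A made-up page asking robots not to index it.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
parser = RobotsMetaParser()
parser.feed(page)
print(parser.noindex)  # True
```

A polite robot sees `noindex` and drops the page from its results; a rogue one has just been told exactly which page the owner would rather keep quiet.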
Additionally, some sites use snippets: short pieces of structured code and content which the robot can interpret as instructions on how to display the site's content within search results, without the searcher having to visit the site itself. These are most widely used for things like shopping, so for example when you search for a camera, Google can show who's selling it, whether it's in stock, the price, and ratings for both the product and the shop, as long as the website has gone to the trouble of 'marking up' its content using a set of predefined rules which the robot/search engine understands (defined at schema.org). Alongside product listings, things like flight times, trains, cinemas, videos etc. all push their products and information into the engines in this way.
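Those snippets are commonly embedded as JSON-LD in the page. Here's an invented schema.org Product example, and how an engine could pull the price and stock status straight out of it with no page rendering needed (the product and values are made up):

```python
import json

# A hypothetical schema.org Product snippet, as a shop might embed it
# in a <script type="application/ld+json"> tag for search engines.
snippet = """
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Camera",
  "offers": {
    "@type": "Offer",
    "price": "299.00",
    "priceCurrency": "GBP",
    "availability": "https://schema.org/InStock"
  }
}
"""

data = json.loads(snippet)
# The search engine can now display name, price and stock directly
# in its results, without visiting the shop's pages.
print(data["name"], data["offers"]["price"])  # Example Camera 299.00
```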
It's not just pages that are indexed; anything that can be retrieved via a URL (web address) can be found and indexed for use in search results. Incidentally, there's a big industry centred on something called SEO (search engine optimisation). Companies set themselves up as 'experts' and sell services to website owners, promising to get them onto page 1, or better still towards the top of page 1, of search results. They do this by adding keywords and meta information (data describing data) to web pages, and by tuning robots.txt and sitemap.xml based on so-called 'knowledge' of the search engines' indexing algorithms.
There's much more to it than I've set out above, but that's the basics.
To see what Google 'knows' about MBC, click the first link for pages and documents and the second for images.
Pages: site:forums.mbclub.co.uk
Images: site:forums.mbclub.co.uk
To try this yourself, go to Google, type site: followed by the address of a website, hit Return/Enter and you'll see what Google has indexed for that site. Click the Images tab to see the images.
As someone who appears to know sumfink about computas, one question: is it worth having a VPN? Not really for security; it's just that I travel overseas a lot, and whenever I want to watch ITV (rugby), BBC iPlayer (for unbiased political news reports... not) or Sky Go, I'm in 'the wrong' region. Would having a VPN sort this? Is there a monthly cost? Cheers.