[10x project] Exceeding 10M and announcing new capabilities
Hi everyone, a few updates related to the 10x project. Read last month’s announcement for the context. Exceeding 10M home pages in the July crawl The capacity timing could not have been better—it’s...
Proportion of browsers with 3rd party cookies off
Hi, I’d like to know what proportion of browsers have 3rd party cookies switched off. My searches haven’t come up with anything yet. Can anybody share links to worthwhile research on that?
URL-Level Data
Hi there - I stumbled upon this site, and the report and forums are great! Lots of awesome reports and insights to dig through and digest. Do you have URL-level detail available for the reports? I’m...
Core Web Vitals by HTTP Version?
almanac.httparchive.org Performance | 2021 | The Web Almanac by HTTP Archive Performance chapter of the 2021 Web Almanac covering Core Web Vitals (Largest Contentful Paint, Cumulative Layout Shift,...
Missing websites in April 2022
Dear HTTP Archive Community, We are researchers investigating trends in website request data using HTTP Archive and came across a strange issue we would like to have more clarity about. It has been...
Httparchive.technologies Nov 22 missing
Hey, why are the `httparchive.technologies.2022_11_01_*` tables missing? Thanks, Mor Topaz (wix.com)
Historical decline in WWW subdomain use?
It’s commonly said that the classic ‘www.’ subdomain is in steep decline, prompted by general lack of need and catering to mobile users. I agree that it does seem like you see more ‘foo.com’ rather...
Find all web sites which share the same Google Analytics ID
Hi, I have imported the httparchive dataset into BigQuery. I have a question: is it possible to retrieve all of the web pages which share the same Google Analytics ID?
How to find websites that use Early Hints
For my master’s thesis on Early Hints I’d like to find websites that return Early Hints (HTTP | 2022 | The Web Almanac by HTTP Archive) to do performance experiments with them. How to find URLs that...
How many https websites are using Mixed Content images
We (Mozilla) wanted to find out how the landscape has changed since 2020 wrt usage of mixed content images on HTTPS websites, and how many of those images are broken anyway (fail to fetch). Here’s...
Help with BigQuery to extract all datapoints for top 1,000 sites
Hi, would anyone be interested in helping me to extract the full data points for the top 1,000 domains on HTTP Archive for page size, and the worst 1,000 for size? Happy to pay for this, so please...
Understanding embeds in WordPress
I am researching lazy loading of embeds on WordPress sites. In particular, I want to look into the “Embed block” WordPress places into a post when a user pastes in a URL from a supported oEmbed...
Why are videos not appearing on the Median page weight by content type figure?
Is it because videos are mainly third-party assets? How can video weight be compared with other content types in that case?
Help finding list of home pages with specific http response header
I’m new to the HTTP Archive dataset as well as BigQuery, but have some very basic SQL knowledge. I’m hoping to get help with a certain query. I’m looking to generate a list of home page URLs where...
Querying the HTTP Archive with DuckDB
I needed to explore ETag response headers locally and I’ve come up with a workflow for querying HTTP Archive data on my laptop in Parquet format using DuckDB. You may have to create Google Cloud...
What is the most popular feed format: RSS, Atom, or JSON?
Hi all. I am new to the HTTP Archive and BigQuery world. I am trying to determine the popularity of feed formats (RSS/Atom/JSON). The first idea that comes to mind is to scan the body of...
Next Monthly Upload?
Apologies if this has been answered before, but I couldn’t find an answer. When is fresh monthly data typically added to BigQuery? When should we expect the June '23 crawl data to be accessible?...
Downloading HAR-Datasets later than May 2022?
Hi everyone, for scientific research, we would like to download the HAR or summary data from the httparchive folder as stated in the tutorial. However, when checking the available datasets for...
Accessing Web Almanac 2022's raw data?
Hi! Is there a way to download all 42TB of Web Almanac’s data?
Websites with cache-control: no-store, by CDN
I recently queried the HTTP Archive to get the number of websites (pages) that serve their HTML with a cache-control response header containing no-store. My query: SELECT _TABLE_SUFFIX AS client,...
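As a rough illustration of the header check this topic describes (not the poster's BigQuery query, which is truncated in this excerpt), the following Python sketch tests whether a Cache-Control header value carries the no-store directive. The function name `has_no_store` is hypothetical, and matching whole directives rather than substrings is an assumption about what "contains no-store" should mean.

```python
def has_no_store(cache_control: str) -> bool:
    """Return True if a Cache-Control header value includes the no-store directive.

    Hypothetical helper for illustration; it is not part of any HTTP Archive tooling.
    """
    for part in cache_control.split(","):
        # Directive names are case-insensitive; drop any "=value" argument.
        name = part.split("=", 1)[0].strip().lower()
        if name == "no-store":
            return True
    return False


if __name__ == "__main__":
    for value in ("no-cache, no-store, must-revalidate", "public, max-age=3600"):
        print(value, "->", has_no_store(value))
```

Matching on directive names avoids counting header values that merely contain the text "no-store" inside some other token.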
Page weight in 1995?
It’s very nice to see here the evolution of page weight (and other elements) over time. I’m interested in having the same data for before 2010. Here’s what WebSiteOptimization wrote in 2014 for the...
Pages and Requests later than Apr 2022 in gs://httparchive/downloads/
Hello, I used to download the list of pages and requests from gs://httparchive/downloads/, where there are files like: httparchive_Sep_1_2021_requests.gz However, there is no data newer than April...
Higher level stats for structured data
I looked up the Structured Data chapter to cite data on what % of sites use structured data at all, and what % specify any type of metadata like description, image etc, but it’s currently far more...
Web vitals score
Hello, I have a question about how to get data for the Web Vitals score. I am using Google Lighthouse and I wanted to update the score values to the latest ones. Right in the source files there are links like...
Translating JavaScript API names to feature names
I’m hoping to order a bunch of JavaScript APIs by usage, as a way to understand which of the APIs might be more/less important. Is there a recommended way to translate JavaScript API names (e.g....
Warning: $14,000 BigQuery Charge in 2 Hours
This website makes it seem like this “public” dataset is for the community to use, but it is instead a for-profit money maker for Google Cloud and you can lose tens of thousands of dollars. Last week...