Channel: HTTP Archive - Latest topics
Browsing latest articles
Browse All 168 View Live

[10x project] Exceeding 10M and announcing new capabilities

Hi everyone, a few updates related to the 10x project. Read last month’s announcement for the context. Exceeding 10M home pages in the July crawl The capacity timing could not have been better—it’s...




Proportion of browsers with 3rd party cookies off

Hi, I’d like to know what proportion of browsers have 3rd party cookies switched off. My searches haven’t come up with anything yet. Can anybody share links to worthwhile research on that? 1 post - 1...


URL-Level Data

Hi there - I stumbled upon this site, and the report and forums are great! Lots of awesome reports and insights to dig through and digest. Do you have URL-level detail available for the reports? I’m...


Core Web Vitals by HTTP Version?

almanac.httparchive.org Performance | 2021 | The Web Almanac by HTTP Archive Performance chapter of the 2021 Web Almanac covering Core Web Vitals (Largest Contentful Paint, Cumulative Layout Shift,...


Missing websites in April 2022

Dear HTTP Archive Community, we are researchers investigating trends in website request data using HTTP Archive and came across a strange issue that we would like clarified. It has been...



Httparchive.technologies Nov 22 missing

Hey, why are the `httparchive.technologies.2022_11_01_*` tables missing? Thanks, Mor Topaz (wix.com) 1 post - 1 participant Read full topic


Historical decline in WWW subdomain use?

It’s commonly said that the classic ‘www.’ subdomain is in steep decline, prompted by general lack of need and catering to mobile users. I agree that it does seem like you see more ‘foo.com’ rather...


Find all web sites which share the same Google Analytics ID

Hi, I have imported the httparchive dataset into BigQuery. I have a question: is it possible to retrieve all of the web pages which share the same Google Analytics ID? 1 post - 1 participant Read full...


How to find websites that use Early Hints

For my master’s thesis on Early Hints I’d like to find websites that return Early Hints (HTTP | 2022 | The Web Almanac by HTTP Archive) to do performance experiments with them. How to find URLs that...



How many https websites are using Mixed Content images

We (Mozilla) wanted to find out how the landscape has changed since 2020 with respect to the usage of mixed content images on HTTPS websites, and how many of those images are broken anyway (fail to fetch). Here’s...


Help with BigQuery to extract all datapoints for top 1,000 sites

Hi, would anyone be interested in helping me to extract the full data points for the top 1,000 domains on HTTP Archive for page size, and the worst 1,000 for size? Happy to pay for this, so please...



Understanding embeds in WordPress

I am researching lazy loading of embeds on WordPress sites. In particular, I want to look into the “Embed block” WordPress places into a post when a user pastes in a URL from a supported oEmbed...


Why are videos not appearing in the Median page weight by content type figure?

Is it because videos are mainly third-party assets? How can video weight be compared with other content types in that case? 4 posts - 2 participants Read full topic



Help finding list of home pages with specific http response header

I’m new to the HTTP Archive dataset as well as BigQuery, but have some very basic SQL knowledge. I’m hoping to get help with a certain query. I’m looking to generate a list of home page URLs where...


Querying the HTTP Archive with DuckDB

I needed to explore ETag response headers locally and I’ve come up with a workflow for querying HTTP Archive data on my laptop in Parquet format using DuckDB. You may have to create Google Cloud...


What is the most popular feed format: rss, atom, or JSON?

Hi all. I am new to HTTP Archive and BigQuery world. I am trying to determine the popularity of feed formats (rss/atom/json)? First idea that comes to my mind is to scan the body of...


Next Monthly Upload?

Apologies if this has been answered before, but I couldn’t find an answer. When is fresh monthly data typically added to BigQuery? When should we expect the June '23 crawl data to be accessible?...



Downloading HAR-Datasets later than May 2022?

Hi everyone, for scientific research, we would like to download the HAR or summary data from the httparchive folder as stated in the tutorial 1. However, when checking the available datasets for...



Accessing Web Almanac 2022's raw data?

Hi! Is there a way to download all 42TB of Web Almanac’s data? 3 posts - 3 participants Read full topic


Websites with cache-control: no-store, by CDN

I recently queried the HTTP Archive to get the number of websites (pages) that serve HTML with a cache-control response header containing no-store. My query: SELECT _TABLE_SUFFIX AS client,...


Page weight in 1995?

It’s very nice to see here the evolution of page weight (and other elements) over time. I’m interested in having the same data from before 2010. Here’s what WebSiteOptimization wrote in 2014 for the...



Pages and Requests later than Apr 2022 in gs://httparchive/downloads/

Hello, I used to download the list of pages and requests from gs://httparchive/downloads/, where there are files like httparchive_Sep_1_2021_requests.gz. However, there is no data newer than April...


Higher level stats for structured data

I looked up the Structured Data chapter to cite data on what % of sites use structured data at all, and what % specify any type of metadata like description, image etc, but it’s currently far more...


Web vitals score

Hello, I have a question about how to get data for the Web Vitals score. I am using Google Lighthouse and I wanted to update the score values to the latest ones. Right in the source files there are links like...


Translating JavaScript API names to feature names

I’m hoping to order a bunch of JavaScript APIs by usage, as a way to understand which of the APIs might be more/less important. Is there a recommended way to translate JavaScript API names (e.g....




Warning: $14,000 BigQuery Charge in 2 Hours

This website makes it seem like this “public” dataset is for the community to use, but it is instead a for-profit money maker for Google Cloud and you can lose tens of thousands of dollars. Last week...
