Commit

Merge pull request #2608 from komaljoshi-SSK/ocbl169-notebook-updates
OCBL169 - Notebook updates
damonarunion authored May 10, 2024
2 parents 7cec0f8 + 31b94d7 commit bd8362a
Showing 2 changed files with 12 additions and 12 deletions.
@@ -74,10 +74,10 @@
 "source": [
 "## Create a Dataset from BigQuery \n",
 "\n",
-"Hacker news headlines are available as a BigQuery public dataset. The [dataset](https://console.cloud.google.com/bigquery?project=bigquery-public-data&page=table&t=stories&d=hacker_news&p=bigquery-public-data&redirect_from_classic=true) contains all headlines from the sites inception in October 2006 until October 2015. \n",
+"Hacker news headlines are available as a BigQuery public dataset. The [dataset](https://console.cloud.google.com/bigquery?project=bigquery-public-data&page=table&t=full&d=hacker_news&p=bigquery-public-data&redirect_from_classic=true) contains all headlines from the sites inception in October 2006 until October 2015. \n",
 "\n",
 "### Lab Task 1a: \n",
-"Complete the query below to create a sample dataset containing the `url`, `title`, and `score` of articles from the public dataset `bigquery-public-data.hacker_news.stories`. Use a WHERE clause to restrict to only those articles with\n",
+"Complete the query below to create a sample dataset containing the `url`, `title`, and `score` of articles from the public dataset `bigquery-public-data.hacker_news.full`. Use a WHERE clause to restrict to only those articles with\n",
 "* title length greater than 10 characters\n",
 "* score greater than 10\n",
 "* url length greater than 0 characters"
@@ -126,10 +126,10 @@
 "%%bigquery --project $PROJECT\n",
 "\n",
 "SELECT\n",
-" ARRAY_REVERSE(SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'), '.'))[OFFSET(1)] AS source,\n",
+" ARRAY_REVERSE(SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'), '.'))[safe_offset (1)] AS source,\n",
 " # TODO: Your code goes here.\n",
 "FROM\n",
-" `bigquery-public-data.hacker_news.stories`\n",
+" `bigquery-public-data.hacker_news.full`\n",
 "WHERE\n",
 " REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'), '.com$')\n",
 " # TODO: Your code goes here.\n",
@@ -158,10 +158,10 @@
 "sub_query = \"\"\"\n",
 "SELECT\n",
 " title,\n",
-" ARRAY_REVERSE(SPLIT(REGEXP_EXTRACT(url, '{0}'), '.'))[OFFSET(1)] AS source\n",
+" ARRAY_REVERSE(SPLIT(REGEXP_EXTRACT(url, '{0}'), '.'))[safe_offset (1)] AS source\n",
 " \n",
 "FROM\n",
-" `bigquery-public-data.hacker_news.stories`\n",
+" `bigquery-public-data.hacker_news.full`\n",
 "WHERE\n",
 " REGEXP_CONTAINS(REGEXP_EXTRACT(url, '{0}'), '.com$')\n",
 " AND LENGTH(title) > 10\n",
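Note on the `[OFFSET(1)]` to `[safe_offset (1)]` change in this file: in BigQuery, an array subscript with `OFFSET` raises an error when the index is out of range, while `SAFE_OFFSET` returns NULL instead. The commit does not state its motivation, so the following is only a rough Python sketch of that difference. The helper names `offset`, `safe_offset`, and `reversed_parts` are hypothetical and do not appear in the notebook.

```python
def reversed_parts(host):
    """Mimic ARRAY_REVERSE(SPLIT(host, '.')) for a host name string."""
    return host.split(".")[::-1]


def offset(items, index):
    """Mimic items[OFFSET(index)]: an out-of-range index is an error."""
    if index < 0 or index >= len(items):
        raise IndexError(f"index {index} is out of range for {len(items)} items")
    return items[index]


def safe_offset(items, index):
    """Mimic items[SAFE_OFFSET(index)]: an out-of-range index yields None (NULL)."""
    if 0 <= index < len(items):
        return items[index]
    return None


if __name__ == "__main__":
    # A host with several labels has an element at position 1 either way.
    print(safe_offset(reversed_parts("news.ycombinator.com"), 1))
    # A host without any dot has only one element, so position 1 does not exist.
    print(safe_offset(reversed_parts("localhost"), 1))
```

With `offset`, the second call would raise instead of returning `None`, which is roughly how a single malformed row can fail a whole query that uses `OFFSET`.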
@@ -74,7 +74,7 @@
 "source": [
 "## Create a Dataset from BigQuery \n",
 "\n",
-"Hacker news headlines are available as a BigQuery public dataset. The [dataset](https://console.cloud.google.com/bigquery?project=bigquery-public-data&page=table&t=stories&d=hacker_news&p=bigquery-public-data&redirect_from_classic=true) contains all headlines from the sites inception in October 2006 until October 2015. \n",
+"Hacker news headlines are available as a BigQuery public dataset. The [dataset](https://console.cloud.google.com/bigquery?project=bigquery-public-data&page=table&t=full&d=hacker_news&p=bigquery-public-data&redirect_from_classic=true) contains all headlines from the sites inception in October 2006 until October 2015. \n",
 "\n",
 "Here is a sample of the dataset:"
 ]
@@ -90,7 +90,7 @@
 "SELECT\n",
 " url, title, score\n",
 "FROM\n",
-" `bigquery-public-data.hacker_news.stories`\n",
+" `bigquery-public-data.hacker_news.full`\n",
 "WHERE\n",
 " LENGTH(title) > 10\n",
 " AND score > 10\n",
@@ -114,10 +114,10 @@
 "%%bigquery --project $PROJECT\n",
 "\n",
 "SELECT\n",
-" ARRAY_REVERSE(SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'), '.'))[OFFSET(1)] AS source,\n",
+" ARRAY_REVERSE(SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'), '.'))[safe_offset (1)] AS source,\n",
 " COUNT(title) AS num_articles\n",
 "FROM\n",
-" `bigquery-public-data.hacker_news.stories`\n",
+" `bigquery-public-data.hacker_news.full`\n",
 "WHERE\n",
 " REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'), '.com$')\n",
 " AND LENGTH(title) > 10\n",
@@ -146,10 +146,10 @@
 "sub_query = \"\"\"\n",
 "SELECT\n",
 " title,\n",
-" ARRAY_REVERSE(SPLIT(REGEXP_EXTRACT(url, '{0}'), '.'))[OFFSET(1)] AS source\n",
+" ARRAY_REVERSE(SPLIT(REGEXP_EXTRACT(url, '{0}'), '.'))[safe_offset (1)] AS source\n",
 " \n",
 "FROM\n",
-" `bigquery-public-data.hacker_news.stories`\n",
+" `bigquery-public-data.hacker_news.full`\n",
 "WHERE\n",
 " REGEXP_CONTAINS(REGEXP_EXTRACT(url, '{0}'), '.com$')\n",
 " AND LENGTH(title) > 10\n",
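For reviewers who want to sanity-check the `source` expression without running BigQuery, the sketch below approximates it in plain Python: it extracts the host with the same pattern `.*://(.[^/]+)/`, applies the `.com$` filter, then takes element 1 of the reversed, dot-split host. It assumes Python's `re` behaves like BigQuery's regex engine for these simple patterns. The function name `extract_source` is hypothetical, and `None` stands in both for a row the WHERE clause would drop and for a NULL from `SAFE_OFFSET`.

```python
import re

# Same patterns as in the notebook queries.
HOST_PATTERN = re.compile(r".*://(.[^/]+)/")
COM_PATTERN = re.compile(r".com$")  # the dot is unescaped, so it matches any character


def extract_source(url):
    """Approximate the SELECT/WHERE logic for one URL; None means 'no source'."""
    match = HOST_PATTERN.search(url)
    if match is None:
        # REGEXP_EXTRACT would return NULL, so the WHERE clause drops the row.
        return None
    host = match.group(1)
    if COM_PATTERN.search(host) is None:
        # Host does not end in "<any char>com": filtered out by WHERE.
        return None
    parts = host.split(".")[::-1]  # ARRAY_REVERSE(SPLIT(host, '.'))
    # SAFE_OFFSET(1): None instead of an error when there is no second element.
    return parts[1] if len(parts) > 1 else None


if __name__ == "__main__":
    for url in ("https://www.github.com/a/b", "http://foocom/x", "http://example.org/x"):
        print(url, "->", extract_source(url))
```

Because the filter pattern's dot is unescaped, a dot-free host such as `foocom` passes the `.com$` check yet has no element at position 1, which is one case where `OFFSET(1)` would fail and `SAFE_OFFSET(1)` would not.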
