Victor committed · 33f7caa
1 Parent(s): 93b922f

Create AI Azure architect

Files changed:
- .github/workflows/check-file-size.yml +16 -0
- .github/workflows/main.yml +20 -0
- README.md +36 -1
- app.py +171 -42
- data/.gitkeep +0 -0
- requirements.txt +0 -0
- scripts/00_DataCollection.md +44 -0
- scripts/01_Create_Dataset.ipynb +862 -0
- scripts/02_Process_files.ipynb +0 -0
- scripts/03_Add_context.ipynb +0 -0
- scripts/04_Finetune_Embedding.ipynb +0 -0
- scripts/05_Create_Test_Data.ipynb +0 -0
- scripts/06_Create_Vector.ipynb +0 -0
- scripts/07_Reranking.ipynb +0 -0
.github/workflows/check-file-size.yml
ADDED
@@ -0,0 +1,16 @@
name: Check file size
on:                     # or directly `on: [push]` to run the action on every push on any branch
  pull_request:
    branches: [main]

  # to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  sync-to-hub:
    runs-on: ubuntu-latest
    steps:
      - name: Check large files
        uses: ActionsDesk/lfs-warning@v2.0
        with:
          filesizelimit: 10485760 # this is 10MB so we can sync to HF Spaces
.github/workflows/main.yml
ADDED
@@ -0,0 +1,20 @@
name: Sync to Hugging Face hub
on:
  push:
    branches: [main]

  # to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  sync-to-hub:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
          lfs: true
      - name: Push to hub
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: git push https://vicpada:$HF_TOKEN@huggingface.co/spaces/vicpada/ai-microsoft-solution-architect main
README.md
CHANGED
@@ -1,3 +1,14 @@
+---
+title: AI Azure Architect
+emoji: 💡
+colorFrom: blue
+colorTo: indigo
+sdk: gradio
+sdk_version: "4.44.1"
+app_file: app.py
+pinned: false
+---
+
 # Starting Point for the Final Project of the "From Beginner to Advanced LLM Developer" course
 
 ## Overview
@@ -14,8 +25,11 @@ If you want, you can use this repository as starting point for your final project
 
 ```bash
 OPENAI_API_KEY="sk-..."
+COHERE_API_KEY="...."
 ```
 
+<b>Note: OpenAI and Cohere keys are entered manually in a textbox</b>
+
 2. Create a local virtual environment, for example using the `venv` module. Then, activate it.
 
 ```bash
@@ -33,4 +47,25 @@ pip install -r requirements.txt
 
 ```bash
 python app.py
-```
+```
+
+# Data Collection and curation
+
+Check this [Data Collection](/scripts/00_DataCollection.md) file for information about data collection and curation.
+
+# Cost
+The user can try all the functionalities with $0.50 or less.
+
+# Optional functionalities implemented
+
+1. Implement streaming responses. ✅
+2. There's code for RAG evaluation in the [folder](/scripts/), and the README contains the evaluation results. The folder must also contain the evaluation dataset and the evaluation scripts. ✅
+3. The app is designed for a specific goal/domain that is not a tutor about AI. This app is focused on Azure engineering. ✅
+4. You have shown evidence of collecting at least two data sources beyond those provided in our course. ([Five data sources collected](/scripts/)) ✅
+5. Use a reranker in your RAG pipeline. It can be a fine-tuned version (your choice). ✅
+6. Use a fine-tuned embedding model in your app. ✅
+
+# Example questions
+- When to use Azure Functions vs App Service
+- How do I keep microservices decoupled and independent and achieve HA
+- Use the many-models architecture approach to scale machine learning models
app.py
CHANGED
@@ -16,9 +16,11 @@ from llama_index.core.llms import MessageRole
 from llama_index.core.memory import ChatSummaryMemoryBuffer
 from llama_index.core.tools import RetrieverTool, ToolMetadata
 from llama_index.agent.openai import OpenAIAgent
-from llama_index.embeddings.openai import OpenAIEmbedding
 from llama_index.llms.openai import OpenAI
 from llama_index.core import Settings
+from llama_index.postprocessor.cohere_rerank import CohereRerank
+from llama_index.core.embeddings import resolve_embed_model
+from llama_index.embeddings.adapter import AdapterEmbeddingModel
 
 load_dotenv()
 
@@ -26,76 +28,137 @@ logger = logging.getLogger(__name__)
 logging.basicConfig(level=logging.INFO)
 logging.getLogger("httpx").setLevel(logging.WARNING)
 
-PROMPT_SYSTEM_MESSAGE = """You are an AI …
-Topics …
-…
-…
-…
-…
-by the …
-…
-…
-Should the tool response lack information on the queried topic, politely inform the user that the question transcends the bounds of your current knowledge base, citing the absence of relevant content in the tool's documentation.
-At the end of your answers, always invite the students to ask deeper questions about the topic if they have any.
-Do not refer to the documentation directly, but use the information provided within it to answer questions. If code is provided in the information, share it with the students. It's important to provide complete code blocks so
-they can execute the code when they copy and paste them. Make sure to format your answers in Markdown format, including code blocks and snippets.
+PROMPT_SYSTEM_MESSAGE = """You are an AI assistant and expert instructor responding to technical questions from software architects and developers who are working in enterprise software architecture.
+These users are particularly focused on Microsoft technologies and Azure cloud services. Topics they are exploring include architecture patterns in Azure (serverless, microservices, event-driven systems), Azure services comparison (Functions, App Service, AKS, Logic Apps, etc.), DevOps practices (IaC with Bicep/Terraform, CI/CD with Azure DevOps or GitHub Actions), observability with Application Insights, secure design using Key Vault, identity management with Azure AD and B2C.
+You should treat each question as part of this context. Your responses should be complete, accurate, and educational – suitable for technical professionals with intermediate to advanced knowledge in cloud architecture and AI application development.
+To find relevant information for answering questions, always use the "Azure_AI_Knowledge" tool. This tool returns technical documentation, architecture guides, official examples, and troubleshooting data focused on Azure and AI integration.
+Only part of the tool's output may be relevant to the question – discard the irrelevant sections. Your answer should rely **exclusively** on the content provided by the tool. Do **not** inject external or speculative knowledge. If the user refines their question or focuses on a specific sub-topic, reformulate the tool query to capture that specificity and retrieve deeper information.
+If a user requests further elaboration on a specific aspect of a previously discussed topic, you should reformulate your input to the tool to capture this new angle or more profound layer of inquiry. Structure your answers in clear sections with multiple paragraphs if needed. If code is returned, include full code blocks in your response (formatted in Markdown) so the user can copy and run them directly.
+If the tool doesn't return relevant content, inform the user clearly that the topic exceeds the current knowledge base and mention that no relevant documentation was found via the tool.
+Always close your answers by inviting the user to ask follow-up or deeper questions related to the topic.
+At the end of the answer, include a line to indicate whether the content was sourced using the tool or not, e.g., "Content sourced using Azure_AI_Knowledge tool" or "No relevant content found in Azure_AI_Knowledge tool".
+If the question is not related to Azure or Microsoft technologies, politely inform the user that you can only provide information related to Azure and Microsoft technologies.
 """
 
-…
-You must answer only related to AI, ML, Deep Learning and related concepts queries.
-Always leverage the retrieved documents to answer the questions, don't answer them on your own.
-If the query is not relevant to AI, say that you don't know the answer.
-"""
+QA_TEMPLATE = "Answer questions about Azure using 'Azure_AI_Knowledge' tool"
 
 
 def download_knowledge_base_if_not_exists():
     """Download the knowledge base from the Hugging Face Hub if it doesn't exist locally"""
-    if not os.path.exists("data/…
-        os.makedirs("data/…
+    if not os.path.exists("data/azure-architect"):
+        os.makedirs("data/azure-architect")
 
         logging.warning(
             f"Vector database does not exist at 'data/', downloading from Hugging Face Hub..."
         )
         snapshot_download(
-            repo_id="…
-            local_dir="data/…
+            repo_id="vicpada/AzureArchitectKnowledgeFull",
+            local_dir="data/azure-architect",
             repo_type="dataset",
         )
-        logging.info(f"Downloaded vector database to 'data/…
+        logging.info(f"Downloaded vector database to 'data/azure-architect'")
+
+
+def download_embeddings_if_not_exists():
+    """Download the embeddings from the Hugging Face Hub if they don't exist locally"""
+    if not os.path.exists("data/azure-architect-embeddings"):
+        os.makedirs("data/azure-architect-embeddings")
+
+        logging.warning(
+            f"Embeddings do not exist at 'data/', downloading from Hugging Face Hub..."
+        )
+
+        snapshot_download(repo_id="vicpada/finetuned_embed_model_full",
+                          repo_type="model",
+                          local_dir="./data/azure-architect-embeddings")
+
+        logging.info(f"Downloaded embeddings to 'data/azure-architect-embeddings'")
+
+
+def load_embed_model():
+    """Load the embedding model from the local directory"""
+
+    embed_model_path = "data/azure-architect-embeddings"
+    if not os.path.exists(embed_model_path):
+        logging.error(f"Embedding model path '{embed_model_path}' does not exist.")
+        return None
+
+    # Load the Base model without fine-tuning
+    base_embed_model = resolve_embed_model("local:BAAI/bge-small-en-v1.5")
 
+    # Load the Fine-tuned model.
+    logging.info(f"Loading embedding model from {embed_model_path}")
+    embed_model = AdapterEmbeddingModel(base_embed_model, embed_model_path)
+
+    return embed_model
 
-def get_tools(…
+
+def get_tools(db_collection="azure-architect", cohere_api_key=None):
     db = chromadb.PersistentClient(path=f"data/{db_collection}")
     chroma_collection = db.get_or_create_collection(db_collection)
     vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
 
+
+    logging.info(f"Vector store initialized with {chroma_collection.count()} documents.")
+
+    # Create the vector store index
+    logging.info("Creating vector store index...")
+
+    # Use the vector store to create an index
+
     index = VectorStoreIndex.from_vector_store(
         vector_store=vector_store,
         show_progress=True,
         use_async=True,
         embed_model=Settings.embed_model
     )
+
+    logging.info("Creating vector retriever...")
+
     vector_retriever = VectorIndexRetriever(
         index=index,
-        similarity_top_k=…
+        similarity_top_k=200,
         embed_model=Settings.embed_model,
         use_async=True,
-…
+        verbose=True,
+    )
+
+    cohere_rerank3 = CohereRerank(top_n=5, model = 'rerank-english-v3.0', api_key = cohere_api_key)
+
+    logging.info("Creating tool...")
+
     tools = [
         RetrieverTool(
             retriever=vector_retriever,
             metadata=ToolMetadata(
-                name="…
-                description="Useful for info related to …
+                name="Azure_AI_Knowledge",
+                description="Useful for info related to Azure and microsoft. Best practices, architecture, official documentation, functional use cases and reference architectures and other related resources."
             ),
+            node_postprocessors=[cohere_rerank3],
         )
     ]
     return tools
 
 
-def generate_completion(query, history, memory):
-    logging.info(f"User query: {query}")
+def generate_completion(query, history, memory, openAI_api_key, cohere_api_key):
+    logging.info(f"User query: {query}")
+    logging.info(f"User history: {history}")
+    logging.info(f"User memory: {memory}")
+
+    openAI_api_key = openAI_api_key if openAI_api_key else os.getenv("OPENAI_API_KEY")
+    cohere_api_key = cohere_api_key if cohere_api_key else os.getenv("COHERE_API_KEY")
+
+    # Validate OpenAI API Key
+    if openAI_api_key is None or not openAI_api_key.startswith("sk-"):
+        logging.error("OpenAI API Key is not set or is invalid. Please provide a valid key.")
+        yield "Error: OpenAI API Key is not set or is invalid. Please provide a valid key."
+        return
+
+    llm = OpenAI(temperature=1, model="gpt-4o-mini", api_key=openAI_api_key)
+
+    # Validate Cohere API Key
+    if cohere_api_key is None or not cohere_api_key.strip():
+        logging.error("Cohere API Key is not set or is invalid. Please provide a valid key.")
+        yield "Error: Cohere API Key is not set or is invalid. Please provide a valid key."
+        return
 
     # Manage memory
     chat_list = memory.get()
@@ -109,12 +172,13 @@ def generate_completion(query, history, memory):
     logging.info(f"gradio_history: {len(history)} {history}")
 
     # Create agent
-    tools = get_tools(db_collection="…
+    tools = get_tools(db_collection="azure-architect", cohere_api_key = cohere_api_key )
+
     agent = OpenAIAgent.from_tools(
-        llm=…
+        llm=llm,
         memory=memory,
         tools=tools,
-        system_prompt=PROMPT_SYSTEM_MESSAGE
+        system_prompt=PROMPT_SYSTEM_MESSAGE
     )
 
     # Generate answer
@@ -122,15 +186,65 @@ def generate_completion(query, history, memory):
     answer_str = ""
     for token in completion.response_gen:
         answer_str += token
-        yield answer_str
+        yield answer_str
+
+    logging.info(f"Source count: {len(completion.sources)}")
+    logging.info(f"Sources: {completion.sources}")
+
+def launch_ui():
 
 
-def launch_ui():
     with gr.Blocks(
         fill_height=True,
-        title="AI …
+        title="AI Azure Architect 🤖",
         analytics_enabled=True,
     ) as demo:
+
+
+        openai_key_tb = gr.Textbox(
+            visible=True,
+            label="OpenAI API Key",
+            placeholder="Enter your OpenAI API Key here (e.g., sk-...)",
+        )
+
+        def onOpenAIKeyChange(x):
+            # Validate the OpenAI API Key format
+            if x is None or x.strip() == "":
+                logging.error("OpenAI API Key is empty. Please provide a valid key.")
+                return
+            else:
+                x = x.strip()
+                if not x.startswith("sk-"):
+                    logging.error("Invalid OpenAI API Key format. It should start with 'sk-'")
+                    return
+
+            logging.info(f"OpenAI API Key set: {x is not None}")
+
+        openai_key_tb.change(
+            lambda x: onOpenAIKeyChange(x),
+            inputs=openai_key_tb,
+            outputs=None,
+        )
+
+        cohere_key_tb = gr.Textbox(
+            visible=True,
+            label="Cohere API Key",
+            placeholder="Enter your Cohere API Key here",
+        )
+
+        def onCohereKeyChange(x):
+            # Validate the Cohere API Key format
+            if x is None or x.strip() == "":
+                logging.error("Cohere API Key is empty. Please provide a valid key.")
+                return
+
+            logging.info(f"Cohere API Key set: {x is not None}")
+
+        cohere_key_tb.change(
+            onCohereKeyChange,
+            inputs=cohere_key_tb,
+            outputs=None,
+        )
 
         memory_state = gr.State(
             lambda: ChatSummaryMemoryBuffer.from_defaults(
@@ -139,7 +253,7 @@ def launch_ui():
         )
         chatbot = gr.Chatbot(
            scale=1,
-            placeholder="<strong>AI …
+            placeholder="<strong>Azure AI Architect 🤖: A Question-Answering Bot for anything Azure related</strong><br>",
            show_label=False,
            show_copy_button=True,
        )
@@ -147,7 +261,7 @@ def launch_ui():
        gr.ChatInterface(
            fn=generate_completion,
            chatbot=chatbot,
-            additional_inputs=[memory_state],
+            additional_inputs=[memory_state, openai_key_tb, cohere_key_tb],
        )
 
    demo.queue(default_concurrency_limit=64)
@@ -158,9 +272,24 @@ if __name__ == "__main__":
     # Download the knowledge base if it doesn't exist
     download_knowledge_base_if_not_exists()
 
+    # Download the embeddings if they don't exist
+    download_embeddings_if_not_exists()
+
+    # Set the GPU usage based on the environment variable
+    Settings.use_gpu = os.getenv("USE_GPU", "1") == "1"
+    if Settings.use_gpu:
+        logging.info("Using GPU for inference.")
+    else:
+        logging.info("Using CPU for inference.")
+
+    # Load the embedding model
+    Settings.embed_model = load_embed_model()
+    if Settings.embed_model is None:
+        logging.error("Embedding model could not be loaded. Exiting.")
+        exit(1)
+
     # Set up llm and embedding model
-    Settings.llm = OpenAI(temperature=…
-    Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
+    # Settings.llm = OpenAI(temperature=1, model="gpt-4o-mini")
 
     # launch the UI
     launch_ui()
data/.gitkeep
DELETED
File without changes
requirements.txt
CHANGED
Binary files a/requirements.txt and b/requirements.txt differ
scripts/00_DataCollection.md
ADDED
@@ -0,0 +1,44 @@
This file provides information on data collection and curation:

1. [Create dataset](01_Create_Dataset.ipynb)<br />
   <em>This step crawls and scrapes the data sources</em><br />
   A Jupyter notebook with steps to scrape data from five different [Azure Resources sites](https://docs.google.com/spreadsheets/d/1b_QcHNPBg34Q05FPsmRqzPW5XTUGmedaP_SLonyiMt4).
   It uses [Firecrawl](https://www.firecrawl.dev/) for the scraping. The API kept failing, so I ended up scheduling the crawls and downloading the results from the site directly.
   The files were uploaded to HF as a zip file in `vicpada/AzureResources`.

2. [Process files](02_Process_files.ipynb)<br />
   <em>This step prepares the JSONL files</em><br />
   A Jupyter notebook with the steps to download the zip file and create JSONL files, one per site. For each [Firecrawl](https://www.firecrawl.dev/) entry it extracts a title, cleans the content, adds the token count, skips very small or extremely large files, and generates a deterministic id that can be used for evaluation. In the next step, the whole document is also included as context for documents with a token count below 8000. (A minimal sketch of this processing follows this list.)

3. [Add context](03_Add_context.ipynb)<br />
   <em>This step creates the chunks, adds context, and saves them into PKL files</em><br />
   A Jupyter notebook to download the JSONL files and build the documents for our RAG application.
   It uses [SentenceSplitter](https://docs.llamaindex.ai/en/stable/api_reference/node_parsers/sentence_splitter/) with a <b>chunk size of 800</b> and <b>0 overlap</b>.
   It also adds some context to each chunk to situate it within the overall document, to improve search retrieval of the chunk. (See the chunking sketch below.)
   For each JSONL file a PKL file is created and uploaded to HF.
   It uses `gpt-4.1-nano` for situating the chunk in its context, for cost-saving purposes and to finish processing in a timely manner.
   One site from the data sources was excluded, as it was getting very expensive to process everything.

4. [Finetune embedding](04_Finetune_Embedding.ipynb)<br />
   <em>This step finetunes the embedding model</em><br />
   A Jupyter notebook with steps to download the PKL files and train an embedding model. Only 10,000 chunks were used, for time- and cost-saving purposes, as there were more than 400,000 nodes. It uses [`generate_qa_embedding_pairs`](https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings/) for generating training and validation data. (See the fine-tuning sketch below.)
   The base model for the embedding is `local:BAAI/bge-small-en-v1.5`. <b>Hit Rate & MRR</b> were used for evaluation. The model is uploaded to HF.
   Note: fine-tuning and using the embedding model locally is much cheaper than using an online service. When dealing with a large number of nodes and a large number of queries, it makes a big difference at the time of processing the nodes.

5. [Create test data](05_Create_Test_Data.ipynb)<br />
   <em>Create test data for evaluation of the model</em><br />
   A Jupyter notebook with steps to download the embedding model and the PKL files, and use [`generate_question_context_pairs`](https://docs.llamaindex.ai/en/stable/examples/evaluation/QuestionGeneration/) to generate evaluation data. The generated JSON file is then uploaded to HF. It uses `Gemini-2.0-flash` to generate the eval data.

6. [Create Vector](06_Create_Vector.ipynb)<br />
   <em>Create the VectorDB and evaluate</em><br />
   A Jupyter notebook with steps to generate the VectorDB using the local embedding model.
   It calculates the "hit_rate", "mrr", "precision", "recall", "ap" and "ndcg" metrics. (See the evaluation sketch below.)
   The vector store is saved as a PKL file and uploaded to HF.

7. [Reranking](07_Reranking.ipynb)<br />
   <em>Evaluate Cohere Rerank</em><br />
   A Jupyter notebook with steps to download the VectorDB, the embedding model and the evaluation dataset.
   It then evaluates the results with and without Cohere, setting the path for the final solution. (See the reranking sketch below.)
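The JSONL-processing notebook referenced in step 2 is not part of this commit, so the following is only a minimal sketch of the record shape it describes: a title, cleaned content, a token count, size filtering, and a deterministic id. The field names, the token thresholds, and the hash-of-URL id scheme are assumptions for illustration, not the notebook's exact code.

```python
import hashlib
import json

import tiktoken  # listed in the notebook's pip installs; used here for token counts

MIN_TOKENS, MAX_TOKENS = 100, 50_000  # assumed limits for "very small" / "extremely large"
enc = tiktoken.get_encoding("cl100k_base")


def to_jsonl_entry(page: dict) -> dict | None:
    """Turn one Firecrawl page (markdown + metadata) into a JSONL record, or skip it."""
    text = page["markdown"].strip()
    tokens = len(enc.encode(text))
    if tokens < MIN_TOKENS or tokens > MAX_TOKENS:
        return None
    url = page["metadata"].get("sourceURL", "")
    return {
        # deterministic id derived from the URL, so evaluation data stays stable across runs
        "id": hashlib.sha256(url.encode()).hexdigest()[:16],
        "title": page["metadata"].get("title", ""),
        "url": url,
        "content": text,
        "tokens": tokens,
    }


def write_jsonl(pages: list[dict], path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        for page in pages:
            entry = to_jsonl_entry(page)
            if entry is not None:
                f.write(json.dumps(entry) + "\n")
```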
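Step 3 chunks each document with `SentenceSplitter` (chunk size 800, overlap 0) and prefixes every chunk with a short LLM-generated context. Below is a minimal sketch of that idea, assuming the `gpt-4.1-nano` model named above; the prompt wording and the way the context is prepended are illustrative, not the notebook's exact implementation.

```python
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI

splitter = SentenceSplitter(chunk_size=800, chunk_overlap=0)

# Hypothetical prompt; the notebook only states that each chunk is "situated"
# within its overall document to improve retrieval.
CONTEXT_PROMPT = (
    "Document:\n{doc}\n\nChunk:\n{chunk}\n\n"
    "Write one or two sentences situating this chunk within the document."
)


def chunk_with_context(documents: list[Document]) -> list:
    llm = OpenAI(model="gpt-4.1-nano")
    nodes = splitter.get_nodes_from_documents(documents)
    docs_by_id = {d.doc_id: d for d in documents}
    for node in nodes:
        doc = docs_by_id[node.ref_doc_id]
        # The notebook includes the whole document only when it is under 8000 tokens;
        # here the text is simply truncated for brevity.
        context = llm.complete(
            CONTEXT_PROMPT.format(doc=doc.text[:8000], chunk=node.text)
        ).text
        node.text = f"{context}\n\n{node.text}"
    return nodes
```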
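Step 4 fine-tunes an adapter on top of `BAAI/bge-small-en-v1.5` using `generate_qa_embedding_pairs`; the saved directory is what app.py later reloads via `AdapterEmbeddingModel`. This is a sketch under those assumptions using llama-index's finetuning helpers; the question-generation LLM, epoch count, and output path are illustrative.

```python
from llama_index.core.embeddings import resolve_embed_model
from llama_index.finetuning import (
    EmbeddingAdapterFinetuneEngine,
    generate_qa_embedding_pairs,
)
from llama_index.llms.openai import OpenAI


def finetune_adapter(train_nodes, out_dir="data/azure-architect-embeddings"):
    # Build (question, chunk) training pairs from a sample of nodes (the notebook uses ~10,000).
    train_dataset = generate_qa_embedding_pairs(
        nodes=train_nodes,
        llm=OpenAI(model="gpt-4o-mini"),  # assumed; the notebook does not name this LLM
        num_questions_per_chunk=2,
    )

    base_embed_model = resolve_embed_model("local:BAAI/bge-small-en-v1.5")

    # Train a small adapter on top of the frozen base model and save it to out_dir,
    # the directory that load_embed_model() in app.py reads back.
    engine = EmbeddingAdapterFinetuneEngine(
        train_dataset,
        base_embed_model,
        model_output_path=out_dir,
        epochs=4,
    )
    engine.finetune()
    return engine.get_finetuned_model()
```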
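Step 6 reports hit_rate, mrr, precision, recall, ap, and ndcg. Here is a minimal sketch of averaging those metrics with llama-index's `RetrieverEvaluator` over the question-context pairs generated in step 5; the dataset path and the async wrapper are assumptions.

```python
import asyncio

from llama_index.core.evaluation import (
    EmbeddingQAFinetuneDataset,
    RetrieverEvaluator,
)

METRICS = ["hit_rate", "mrr", "precision", "recall", "ap", "ndcg"]


def evaluate_retriever(retriever, qa_dataset_path: str = "eval_dataset.json") -> dict:
    """Average each retrieval metric over the evaluation dataset."""
    qa_dataset = EmbeddingQAFinetuneDataset.from_json(qa_dataset_path)
    evaluator = RetrieverEvaluator.from_metric_names(METRICS, retriever=retriever)
    results = asyncio.run(evaluator.aevaluate_dataset(qa_dataset))
    return {
        name: sum(r.metric_vals_dict[name] for r in results) / len(results)
        for name in METRICS
    }
```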
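Step 7 compares retrieval with and without the Cohere reranker that app.py attaches to the `RetrieverTool`. The sketch below mirrors the same `rerank-english-v3.0` configuration and the wide `similarity_top_k` candidate set used in app.py; it is an illustration of the with/without comparison, not the notebook's exact code.

```python
import os

from llama_index.postprocessor.cohere_rerank import CohereRerank

# Same reranker configuration that app.py wires into the RetrieverTool.
reranker = CohereRerank(
    top_n=5, model="rerank-english-v3.0", api_key=os.getenv("COHERE_API_KEY")
)


def retrieve(query: str, vector_retriever, rerank: bool = True):
    """Retrieve a wide candidate set, then optionally rerank it down to the top 5."""
    nodes = vector_retriever.retrieve(query)  # e.g. similarity_top_k=200, as in app.py
    if rerank:
        nodes = reranker.postprocess_nodes(nodes, query_str=query)
    return nodes
```

Running the retrieval evaluation once with and once without this postprocessor gives the comparison the notebook describes.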
scripts/01_Create_Dataset.ipynb
ADDED
@@ -0,0 +1,862 @@
Cell 1 (markdown):

# Create Corpus

[CSV file with Azure & Microsoft links created](https://docs.google.com/spreadsheets/d/1b_QcHNPBg34Q05FPsmRqzPW5XTUGmedaP_SLonyiMt4/edit?usp=sharing)

Columns:
- Name
- URL
- Content type

Cell 2 (code):

```python
# Install requirements

!pip install -q llama-index==0.12.12 openai==1.59.6 tiktoken==0.8.0 llama-index-readers-web==0.3.4 firecrawl-py==2.7.1
```

Output: pip download progress bars and wheel builds, ending with a pip dependency-resolver warning that the preinstalled nvidia-*-cu12 packages do not match the versions pinned by torch 2.6.0+cu124.

Cell 3 (code):

```python
# set variables
import os

from google.colab import userdata

os.environ["OPENAI_API_KEY"] = userdata.get('openai_api_key')
os.environ["FIRECRAWL_API_KEY"] = userdata.get('FIRECRAWL_API_KEY')
os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN2')

FIRECRAWL_API_KEY = userdata.get('FIRECRAWL_API_KEY')
HF_TOKEN = userdata.get('HF_TOKEN2')
```

Cell 4 (code):

```python
# Download dataset list

import requests
import csv

# Google Sheets file URL (CSV export link)
url = 'https://docs.google.com/spreadsheets/d/1b_QcHNPBg34Q05FPsmRqzPW5XTUGmedaP_SLonyiMt4/export?format=csv'

# Send a GET request to fetch the CSV file
response = requests.get(url)

response_list = []

# Check if the request was successful
if response.status_code == 200:
    # Decode the content to a string
    content = response.content.decode('utf-8')

    # Use the csv.DictReader to read the content as a dictionary
    csv_reader = csv.DictReader(content.splitlines(), delimiter=',')
    response_list = [row for row in csv_reader]
else:
    print(f"Failed to retrieve the file: {response.status_code}")
```

Cell 5 (code):

```python
import pprint
print("CSV data")
pprint.pprint(response_list[2:3])
```

Output:

    CSV data
    [{'Content type': 'Technical blogs and expert solutions',
      'Exclude': '',
      'Name': 'Tech Community',
      'URL': 'https://techcommunity.microsoft.com/'}]

Cell 6 (code):

```python
# Initialise Firecrawl
import os
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key=FIRECRAWL_API_KEY)
```

Cell 7 (code):

```python
import time
from firecrawl import ScrapeOptions

# Crawl websites and handle responses
url_response = {}
crawl_per_min = 15  # Max crawl per minute

# Track crawls
crawled_websites = 0
scraped_pages = 0

for i, website_dict in enumerate(response_list[2:3]):
    url = website_dict.get('URL')
    excludePaths = website_dict.get('Exclude')
    print(f"Crawling: {url}")

    try:
        response = app.crawl_url(
            url,
            limit=15000,
            scrape_options=ScrapeOptions(formats=['markdown']),
            exclude_paths=[excludePaths]
        )

        #'includePaths': [r"^https:\/\/[^\/]+(?:\/en-US(?:\/.*)?|(\/(?![a-z]{2}-[A-Z]{2})([^\/]+)?(?:\/[^\/]*)*)?)$"],
        crawled_websites += 1

    except Exception as exc:
        print(f"Failed to fetch {url} -> {exc}")
        continue

    # Store the scraped data and associated info in the response dict
    url_response[url] = {
        "scraped_data": response.data,
        "csv_data": website_dict
    }

    # Pause to comply with crawl per minute limit; for the free version it is 1 crawl per minute
    if i != len(response_list) and (i + 1) % crawl_per_min == 0:
        print("Pausing for 1 minute to comply with crawl limit...")
        time.sleep(60)  # Pause for 1 minute after every crawl
```

Output (from the committed run): `NameError: name 'response_list' is not defined`, raised at the `for i, website_dict in enumerate(response_list[2:3]):` line.

Cell 8 (code):

```python
# Initialise HF

from huggingface_hub import HfApi

api = HfApi(token=HF_TOKEN)
```

Cell 9 (code):

```python
from llama_index.core import Document
documents = []

for _, scraped_content in url_response.items():
    csv_data = scraped_content.get("csv_data")
    scraped_results = scraped_content.get("scraped_data")

    for result in scraped_results:
        markdown_content = result.markdown
        title = result.metadata.get("title")
        url = result.metadata.get("sourceURL")
        documents.append(
            Document(
                text=markdown_content,
                metadata={
                    "title": title,
                    "url": url,
                    "name": csv_data.get("Name"),
                    "contentType": csv_data.get("Content type")
                }
            )
        )
```

Cell 10 (code; the cell's source is cut off in this view):

Output: "Documents", a count of 1, and the repr of a single `Document` built from https://azure.microsoft.com/en-us/updates/ (metadata: title "Azure updates | Microsoft Azure", name "Azure Blog / Updates", contentType "Platform changes, new features"). The scraped page text consists of several thousand words of Azure product listings and page filters and is truncated here.
This feature allows you to dynamically expand\\nthe storage capacity of your disks without any disruption to your applications.\\nTo optimize costs, you can start with smaller disks and gradually increase\\ntheir storage capacity as needed, all without experiencing any downtime.\\n\\n\\n\\n\\n\\n\\n\\nGet\\nstarted with Live Resizing your Ultra and/or Pv2 disks [here.](https://aka.ms/LiveResizePublicLink)\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure ID\\n\\n\\n\\n495106\\n\\n\\n\\n\\n\\n\\n\\nProduct Categories(s)\\n\\n\\n\\nStorage\\n\\n\\n\\n\\n\\n\\n\\nUpdate Types(s)\\n\\n\\n\\nFeatures\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAdded to roadmap: 05/29/2025\\n\\n\\n\\n\\\\|\\n\\n\\n\\n\\nLast modified: 05/29/2025\\n\\n\\n\\n\\n\\nShare\\n\\n\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/495106)\\n\\n- ## Generally Available: Destination Network Address Translation (DNAT) on Azure Firewall Private IP address\\n\\n\\n\\n\\n\\nAzure Firewall\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure Firewall enhances the DNAT rule configuration to\\nsupport port translations on its Private IP address.\\n\\n\\n\\nDNAT on Azure Firewall Private IP address helps connect\\noverlapped IP networks, which is a common scenario for enterprises when\\nonboarding new partners to their network or merging with new\\nacquisitions.\\n\\n\\n\\nThis capability is also relevant for hybrid scenarios,\\nconnecting on-premises datacenters to Azure, where DNAT bridges the gap,\\nenabling communication between private resources over non-routable IP\\naddresses.\\n\\n\\n\\n[Learn\\\\\\\\\\nmore](https://techcommunity.microsoft.com/blog/azurenetworksecurityblog/private-ip-dnat-support-preview-and-scenarios-with-azure-firewall/4230073).\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure ID\\n\\n\\n\\n493296\\n\\n\\n\\n\\n\\n\\n\\nProduct Categories(s)\\n\\n\\n\\nNetworking, Security\\n\\n\\n\\n\\n\\n\\n\\nUpdate Types(s)\\n\\n\\n\\nFeatures\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAdded to roadmap: 05/29/2025\\n\\n\\n\\n\\\\|\\n\\n\\n\\n\\nLast modified: 05/29/2025\\n\\n\\n\\n\\n\\nShare\\n\\n\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/493296)\\n\\n- ## Public Preview: Azure Backup for Elastic SAN\\n\\n\\n\\n\\n\\nAzure Backup\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\nPRIVATE PREVIEW\\n\\n\\n\\nJuly 2024\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\nPRIVATE PREVIEW\\n\\n\\n\\nJuly 2024\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure Backup now supports Elastic SAN, offering a fully\\nmanaged solution for backing up and restoring Elastic SAN volumes. This\\nintegration helps protect data against accidental deletions, ransomware\\nattacks, and application updates by exporting Elastic SAN volumes to\\nindependent Managed Disk Incremental Snapshots. 
These snapshots are stored in\\nlocally redundant storage and are independent of the Elastic SAN volume\\nlifecycle.\\n\\n\\n\\nThe solution supports up to 450 restore points with 24 hours\\nbackup frequency. It is currently available in select Azure regions and\\nsupports volumes up to 4 TiB. Long-term vaulted backups and hourly backups are\\nnot supported in this preview.\\n\\n\\n\\nThis public preview does not incur Azure Backup Protected\\nInstance fees. However, standard charges apply for Managed Disk Incremental\\nSnapshots.\\n\\n\\n\\nTo get started, visit the Azure Business Continuity Center\\nand configure protection for your Elastic SAN volumes.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure ID\\n\\n\\n\\n494438\\n\\n\\n\\n\\n\\n\\n\\nProduct Categories(s)\\n\\n\\n\\nManagement and governance, Storage\\n\\n\\n\\n\\n\\n\\n\\nUpdate Types(s)\\n\\n\\n\\nCompliance, Management, Features\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAdded to roadmap: 05/29/2025\\n\\n\\n\\n\\\\|\\n\\n\\n\\n\\nLast modified: 05/29/2025\\n\\n\\n\\n\\n\\nShare\\n\\n\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/494438)\\n\\n- ## Public Preview: Using Server-sent events with Application Gateway\\n\\n\\n\\n\\n\\nApplication Gateway\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure Application Gateway supports use of Server-sent\\nevents in preview, enabling real-time data streaming from server to client. Server-sent\\nevents utilize server push technology on a persistent HTTP connection for\\nseamless updates to the clients.\\n\\n\\n\\nTo implement this, specific configurations\\nare required on both the application gateway resource and the backend\\napplication. 
[Explore\\\\\\\\\\nthese configurations](https://learn.microsoft.com/azure/application-gateway/use-server-sent-events) to know how you can use Server-sent events with\\nApplication Gateway.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure ID\\n\\n\\n\\n494787\\n\\n\\n\\n\\n\\n\\n\\nProduct Categories(s)\\n\\n\\n\\nNetworking, Security\\n\\n\\n\\n\\n\\n\\n\\nUpdate Types(s)\\n\\n\\n\\nManagement, Features\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAdded to roadmap: 05/28/2025\\n\\n\\n\\n\\\\|\\n\\n\\n\\n\\nLast modified: 05/28/2025\\n\\n\\n\\n\\n\\nShare\\n\\n\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/494787)\\n\\n- ## Generally Available: App Service Hybrid Connection Manager\\n\\n\\n\\n\\n\\nApp Service\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAnnouncing\\xa0 App Service Hybrid Connection has been updated, and the latest version is now generally available.\\n\\nThis new version has an updated look and feel and provides the following advantages over the previous version:\\n\\n\\n\\n- Support for both Windows and Linux clients\\n- Enhanced logging and visibility into operating status\\n- Refreshed GUI and a new CLI experience for cross-platform compatibility\\n\\n[Learn more](https://learn.microsoft.com/azure/app-service/app-service-hybrid-connections?tabs=windows#hybrid-connection-manager)\\n\\nAzure ID\\n\\n494993\\n\\nProduct Categories(s)\\n\\nCompute, Mobile, Web\\n\\nUpdate Types(s)\\n\\nFeatures\\n\\nAdded to roadmap: 05/28/2025\\n\\n\\\\|\\n\\nLast modified: 05/28/2025\\n\\nShare\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/494993)\\n\\n- ## Generally Available: Customer-managed keys for Azure NetApp Files volume encryption with Azure Key Vault Managed HSM\\n\\n\\n\\n\\n\\nAzure NetApp Files\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure NetApp Files volume encryption choices have expanded to offer support customer-managed keys for Azure NetApp Files volume encryption with Azure Key Vault Managed HSM.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nThis capability offers increased security from FIPS 140-2 Level 2 to FIPS 140-2 Level 3 for critical deployments. Various applications that leverage HSM security include payment processing, application-level encryption, authentication. 
Industry verticals that use HSMs include financial services, public sector, IT/Telco (secure communications), energy (securing critical infrastructure).\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nThis feature is generally available in [these regions](https://learn.microsoft.com/en-us/azure/azure-netapp-files/configure-customer-managed-keys-hardware#supported-regions).\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLearn more:\\n\\n\\n\\n\\n\\n\\n\\n- [What\\'s new in Azure NetApp Files](https://learn.microsoft.com/en-us/azure/azure-netapp-files/whats-new?msclkid=0da6b3e7d15111ecb7b535244a027cc8)\\n\\n- [Configure customer-managed keys with managed Hardware Security Module for Azure NetApp Files volume encryption](https://learn.microsoft.com/en-us/azure/azure-netapp-files/configure-customer-managed-keys-hardware)\\n\\n\\nAzure ID\\n\\n493909\\n\\nProduct Categories(s)\\n\\nStorage\\n\\nUpdate Types(s)\\n\\nFeatures, SDK and Tools, Services\\n\\nAdded to roadmap: 05/28/2025\\n\\n\\\\|\\n\\nLast modified: 05/28/2025\\n\\nShare\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/493909)\\n\\n- ## Private Preview: Application Awareness\\n\\n\\n\\n\\n\\nAzure Migrate\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nA key step in any cloud transformation plan is a current state analysis of the entire IT estate covering workloads and applications, and relationships/dependencies among them. More often, customers are looking to set their migration goals in terms of the applications they want to move to cloud, and not in terms of the individual servers, databases or webapps in silos.\\n\\nWe are excited to announce the public preview of application aware experiences in Azure Migrate. The capability includes identification of ideal migration strategy among Rehost and Replatform from Gartner\\'s 6Rs to allow customers to gain insights into the total cost of ownership, identify suitable IaaS and PaaS targets, and receive tailored migration guidance.\\n\\nAs part of delivering application awareness, we will also be delivering a refreshed user experience for Azure Migrate. 
This experience is designed to guide users through the journey and deliver an integrated seamless experience from Azure Migrate to other bespoke tools, depending upon their goals and personas.\\n\\n**Learn more:**\\n\\n- Read [blog](https://aka.ms/AzureMigrateBuild2025Blog) post\\n- Read [Review an application assessment](https://learn.microsoft.com/en-us/azure/migrate/review-application-assessment?view=migrate#overview)\\n\\n- Watch\\xa0[Azure Migrate guided application aware user experience](https://www.youtube.com/watch?v=aquRVLvau7c)\\n\\nAzure ID\\n\\n494101\\n\\nProduct Categories(s)\\n\\nManagement and governance, Migration\\n\\nUpdate Types(s)\\n\\nMicrosoft Build, Features\\n\\nAdded to roadmap: 05/28/2025\\n\\n\\\\|\\n\\nLast modified: 05/28/2025\\n\\nShare\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/494101)\\n\\n- ## Public Preview: Microsoft Planetary Computer Pro\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nMicrosoft Planetary Computer Pro is a\\ncomprehensive geospatial data platform designed to accelerate the generation of\\ngeospatial insights and their integration into enterprise Data & AI\\nworkflows. It offers robust capabilities to ingest, manage, and disseminate\\ngeospatial datasets, thus enabling organizations to harness the power of\\nlocation-based data effectively. By streamlining geospatial processes, Microsoft Planetary Computer Pro\\nfacilitates improved decision-making and operational efficiency.\\xa0[Learn more](https://aka.ms/planetarycomputerpro).\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure ID\\n\\n\\n\\n494165\\n\\n\\n\\n\\n\\n\\n\\nUpdate Types(s)\\n\\n\\n\\nFeatures, Microsoft Build\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAdded to roadmap: 05/28/2025\\n\\n\\n\\n\\\\|\\n\\n\\n\\n\\nLast modified: 05/28/2025\\n\\n\\n\\n\\n\\nShare\\n\\n\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/494165)\\n\\n- ## Generally Available: Azure Migrate enhances support with Premium SSD v2 Disks\\n\\n\\n\\n\\n\\nAzure Disk Storage\\n\\n\\n\\nAzure Migrate\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n[Azure Migrate](https://learn.microsoft.com/en-us/azure/migrate/?view=migrate-classic)\\xa0now supports migration to [Premium SSD v2 (Pv2)](https://learn.microsoft.com/en-us/azure/virtual-machines/disks-deploy-premium-v2?tabs=azure-cli)\\xa0disks, offering customers a seamless experience to migrate their on-premises\\xa0workloads to Azure and benefit from the greater flexibility and enhanced performance of Pv2 disks in Azure. Pv2 disks offer sub-millisecond disk latencies for demanding IO-intensive workloads at a low-cost. Customers can use that to improve the price-performance of a broad range of enterprise production workloads such as SQL Server, Oracle, MariaDB, SAP, Cassandra, Mongo DB, big data, analytics, gaming, on virtual machines, or stateful containers. 
Azure Migrate now recommends Pv2 as the target disk type for eligible data disks in regions where Pv2 is available and offers Pv2 as a selectable option\\xa0for migrating applicable data disks.\\n\\n[Learn more](https://review.learn.microsoft.com/en-us/azure/migrate/vmware/tutorial-migrate-vmware?branch=main&branchFallbackFrom=pr-en-us-294110#replicate-vms).\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure ID\\n\\n\\n\\n495302\\n\\n\\n\\n\\n\\n\\n\\nProduct Categories(s)\\n\\n\\n\\nStorage, Management and governance, Migration\\n\\n\\n\\n\\n\\n\\n\\nUpdate Types(s)\\n\\n\\n\\nFeatures, Management\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAdded to roadmap: 05/28/2025\\n\\n\\n\\n\\\\|\\n\\n\\n\\n\\nLast modified: 05/28/2025\\n\\n\\n\\n\\n\\nShare\\n\\n\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/495302)\\n\\n- ## Public Preview: Azure Migrate expands support for migrations with Ultra SSD\\n\\n\\n\\n\\n\\nAzure Disk Storage\\n\\n\\n\\nAzure Migrate\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n[Azure Migrate](https://learn.microsoft.com/en-us/azure/migrate/?view=migrate-classic) now supports migration to [Ultra Disk](https://learn.microsoft.com/en-us/azure/virtual-machines/disks-enable-ultra-ssd?tabs=azure-portal), enabling customers to seamlessly migrate their\\non-premises workloads to Azure while taking advantage of Ultra Diskβs\\ncutting-edge performance and scalability. Designed for I/O-intensive workloads,\\nsuch as SAP HANA, high-performance databases like SQL and Oracle, and\\nlatency-sensitive applications, Ultra Disk delivers up to 400,000 IOPS and\\n10,000 MBps with low sub-millisecond latency, dynamic scalability, and\\nenterprise-grade reliability. 
In regions where Ultra Disk is available, Azure\\nMigrate has been enhanced to offer Ultra Disk as a selectable option for\\nmigrating applicable data disks.\\n\\n\\n\\n[Learn more](https://review.learn.microsoft.com/en-us/azure/migrate/vmware/tutorial-migrate-vmware?branch=main&branchFallbackFrom=pr-en-us-294110#replicate-vms).\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure ID\\n\\n\\n\\n495312\\n\\n\\n\\n\\n\\n\\n\\nProduct Categories(s)\\n\\n\\n\\nStorage, Management and governance, Migration\\n\\n\\n\\n\\n\\n\\n\\nUpdate Types(s)\\n\\n\\n\\nFeatures, Management\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAdded to roadmap: 05/28/2025\\n\\n\\n\\n\\\\|\\n\\n\\n\\n\\nLast modified: 05/28/2025\\n\\n\\n\\n\\n\\nShare\\n\\n\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/495312)\\n\\n- ## Generally Available: Azure Cosmos DB JavaScript SDK 4.0\\n\\n\\n\\n\\n\\nAzure Cosmos DB\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nThe Azure Cosmos DB JavaScript SDK 4.0 is now generally available.\\xa0This major update brings a range of enhancements to help you build more efficient and scalable applications,\\xa0whether you\\'re\\xa0working with large datasets, securing sensitive information, or building smart search functionalities.\\n\\n\\n\\n\\n\\n\\n\\nKey improvements include enhanced diagnostic logging for better performance tracking, an improved bulk API for faster data operations, and a more flexible query design for efficient scaling. The SDK\\xa0also introduces client-side encryption to support data security, as well as AI-driven features, including vector\\xa0search and full-text search to enable advanced search capabilities.\\n\\n\\n\\n\\n\\n\\n\\nThese updates make the Azure Cosmos DB JavaScript SDK 4.0 a powerful tool for building high-performance applications.\\n\\n\\n\\n\\n\\n\\n\\n[Learn more](https://aka.ms/azure-cosmosdb-javascript-4.0).\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure ID\\n\\n\\n\\n442638\\n\\n\\n\\n\\n\\n\\n\\nProduct Categories(s)\\n\\n\\n\\nDatabases, Internet of Things\\n\\n\\n\\n\\n\\n\\n\\nUpdate Types(s)\\n\\n\\n\\nFeatures, Microsoft Build\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAdded to roadmap: 05/27/2025\\n\\n\\n\\n\\\\|\\n\\n\\n\\n\\nLast modified: 05/27/2025\\n\\n\\n\\n\\n\\nShare\\n\\n\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/442638)\\n\\n- ## Public Preview: Azure Cosmos DB for MongoDB (vCore) trigger and bindings for Azure Functions\\n\\n\\n\\n\\n\\nAzure Cosmos DB\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIntegration between Azure Functions and Azure Cosmos DB for MongoDB (vCore) is now available in public preview.\\xa0You can now respond to changes in your collections using Azure Functions\\xa0triggers and bindings, enabling you to easily build real-time, event-driven applications. 
The public preview supports Azure Functions\\xa0triggers in C# for Azure Cosmos DB\\xa0for MongoDB (vCore).\\n\\n\\n\\n[Learn more](https://aka.ms/vcore-azure-functions).\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure ID\\n\\n\\n\\n491267\\n\\n\\n\\n\\n\\n\\n\\nProduct Categories(s)\\n\\n\\n\\nDatabases, Internet of Things\\n\\n\\n\\n\\n\\n\\n\\nUpdate Types(s)\\n\\n\\n\\nFeatures, Microsoft Build\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAdded to roadmap: 05/27/2025\\n\\n\\n\\n\\\\|\\n\\n\\n\\n\\nLast modified: 05/27/2025\\n\\n\\n\\n\\n\\nShare\\n\\n\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/491267)\\n\\n- ## Public Preview: Microsoft DocumentDB Docker Image\\n\\n\\n\\n\\n\\nAzure Cosmos DB\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIntroducing the Microsoft DocumentDB\\xa0Docker Image,\\xa0a local development environment you can use to build and test apps with DocumentDB.\\xa0This image\\xa0simulates the DocumentDB\\xa0on your local machine, providing a fast, cost-effective, and isolated\\xa0environment for development. You can also integrate the\\xa0DocumentDB\\xa0Docker Image into continuous integration and continuous delivery (CI/CD)\\xa0pipelines to support automated testing and validation of app behavior against a local instance, ensuring smooth deployments\\xa0and consistent performance across environments.\\n\\n[Learn more](https://aka.ms/documentdb-docker).\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure ID\\n\\n\\n\\n491272\\n\\n\\n\\n\\n\\n\\n\\nProduct Categories(s)\\n\\n\\n\\nDatabases, Internet of Things\\n\\n\\n\\n\\n\\n\\n\\nUpdate Types(s)\\n\\n\\n\\nFeatures, Microsoft Build\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAdded to roadmap: 05/27/2025\\n\\n\\n\\n\\\\|\\n\\n\\n\\n\\nLast modified: 05/27/2025\\n\\n\\n\\n\\n\\nShare\\n\\n\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/491272)\\n\\n- ## Public Preview: Azure Front Door now supports origin authentication via Managed Identities\\n\\n\\n\\n\\n\\nAzure Front Door\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nFront Door Standard and Premium now\\nsupports sending authenticated requests to its origins using Managed Identities.\\nThis feature allows customers to secure their origins by allowing only approved\\nFront Door profiles to access the origins.\\n\\n\\n\\nUsing Managed Identities eliminates the\\nneed for customers to manually handle the credentials involved in the authentication\\nprocess, thereby reducing potential risks associated with credential leakage.\\n\\n\\n\\n**Learn more**:\\n\\n\\n\\n- About\\u202f[managed\\\\\\\\\\nidentities](https://learn.microsoft.com/azure/active-directory/managed-identities-azure-resources/overview).\\n- About\\xa0[how to enable\\\\\\\\\\nmanaged identities on Azure Front Door Standard and Premium.](https://learn.microsoft.com/azure/frontdoor/origin-authentication-with-managed-identities)\\n\\nAzure ID\\n\\n494352\\n\\nProduct Categories(s)\\n\\nNetworking, 
Security\\n\\nUpdate Types(s)\\n\\nFeatures, Services\\n\\nAdded to roadmap: 05/27/2025\\n\\n\\\\|\\n\\nLast modified: 05/27/2025\\n\\nShare\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/494352)\\n\\n- ## Generally Available: Container Apps and Functions as Private Link enabled origins for Front Door Premium\\n\\n\\n\\n\\n\\nAzure Front Door\\n\\n\\n\\nAzure Private Link\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nYou can now configure Azure Container Apps\\nand Azure Functions as Private Link enabled origins in your Front Door Premium\\nprofile. Private Link enabled origins in Front Door allow you to deliver\\ncontent to your end-users through public Front Door endpoints while ensuring\\nthat your origins remain inaccessible to the public internet.\\n\\n\\n\\n**Learn more:**\\n\\n\\n\\n- [Secure your Origin with Private Link in Azure Front Door\\\\\\\\\\nPremium.](https://learn.microsoft.com/azure/frontdoor/private-link)\\n\\n- How to [connect Azure Front Door Premium to an Azure Container App\\\\\\\\\\norigin with Private Link](https://learn.microsoft.com/azure/container-apps/how-to-integrate-with-azure-front-door?pivots=azure-portal).\\n- How to [connect\\\\\\\\\\nAzure Front Door Premium to an App Service (Web App or Functions) origin\\\\\\\\\\nwith Private Link.](https://learn.microsoft.com/en-us/azure/frontdoor/standard-premium/how-to-enable-private-link-web-app)\\n\\nAzure ID\\n\\n494357\\n\\nProduct Categories(s)\\n\\nNetworking, Security\\n\\nUpdate Types(s)\\n\\nFeatures, Services\\n\\nAdded to roadmap: 05/27/2025\\n\\n\\\\|\\n\\nLast modified: 05/27/2025\\n\\nShare\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/494357)\\n\\n- ## Generally Available: Private subnet\\n\\n\\n\\n\\n\\nVirtual Network\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nWe are announcing the general availability of private subnet functionality in Azure.\\n\\n\\n\\nCurrently, when virtual machines are created\\nin a virtual network without any explicit outbound connectivity, they are\\nassigned a default outbound public IP address. These implicit IPs are subject\\nto change, not associated with a subscription, difficult to troubleshoot, and\\ndo not follow Azure\\'s model of \"secure by default\" which ensures\\ncustomers have strong security without additional steps needed.\\xa0 The\\nprivate subnet feature prevents this insecure implicit connectivity for any\\nnewly created subnets by setting the \"default outbound access\"\\nparameter to false. 
You can then pick your preferred method for explicit\\noutbound connectivity, such as a NAT Gateway or Public IP address.\\n\\n\\n\\nAdditionally, please note that after\\nSeptember 30th, 2025, new virtual networks will default to using private subnets,\\nmeaning that an explicit outbound method must be enabled in order to reach\\npublic endpoints on the Internet and within Microsoft.\\xa0 Older versions of\\nthe Azure API will not be affected, and there would also be no change to\\nexisting virtual networks. This means that there will be no change in the operation of existing or new virtual machines in these subnets.\\n\\n[Learn more](https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access).\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAzure ID\\n\\n\\n\\n492953\\n\\n\\n\\n\\n\\n\\n\\nProduct Categories(s)\\n\\n\\n\\nNetworking\\n\\n\\n\\n\\n\\n\\n\\nUpdate Types(s)\\n\\n\\n\\nFeatures\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAdded to roadmap: 05/27/2025\\n\\n\\n\\n\\\\|\\n\\n\\n\\n\\nLast modified: 05/27/2025\\n\\n\\n\\n\\n\\nShare\\n\\n\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/492953)\\n\\n- ## Generally Available: Inbound Private Endpoint Support for Azure API Management Standard v2\\n\\n\\n\\n\\n\\nAPI Management\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAnnouncing\\nthe general availability of the inbound private endpoint feature for the Azure\\nAPI Management Standard v2 tier. This update enables organizations to securely\\nexpose their API Management gateway exclusively over Azure Private Link,\\nensuring that API traffic remains\\xa0fully contained within the Microsoft\\nbackbone network.\\n\\n\\n\\nThis\\ncapability is critical for customers who need network-level security and\\ncompliance for API access particularly in regulated industries like finance,\\nhealthcare, and government. With inbound private endpoints, API Management\\nStandard v2 now supports:\\n\\n\\n\\n- End-to-end private\\nconnectivity\\n- Improved security posture\\n- Reduced attack surface\\n- Better control over data flow\\nand API exposure\\n\\n[Learn\\\\\\\\\\nmore](https://learn.microsoft.com/en-us/azure/api-management/v2-service-tiers-overview).\\n\\nAzure ID\\n\\n492607\\n\\nProduct Categories(s)\\n\\nIntegration, Internet of Things, Mobile, Web\\n\\nUpdate Types(s)\\n\\nMicrosoft Build, Features\\n\\nAdded to roadmap: 05/23/2025\\n\\n\\\\|\\n\\nLast modified: 05/23/2025\\n\\nShare\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/492607)\\n\\n- ## Generally Available: Import from Azure AI Foundry to Azure API Managementβs AI Gateway\\n\\n\\n\\n\\n\\nAPI Management\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAnnouncing\\nthe general availability of importing model endpoints from Azure AI Foundry\\ndirectly into Azure API Managementβs AI Gateway. 
This capability simplifies\\nonboarding of large language model (LLM) APIs by enabling seamless integration\\nthrough the Azure portal.\\n\\n\\n\\n\\n\\n\\n\\nKey\\nbenefits:\\n\\n\\n\\n- Rapid onboarding of LLM\\nendpoints from Azure AI Foundry\\n- Configure token limiting,\\ntoken tracking, semantic caching, and content safety\\n- Centralized API governance and\\nobservability for generative AI workloads\\n\\n[Learn more](https://aka.ms/apim/genai/ai-foundry-import).\\n\\nAzure ID\\n\\n491980\\n\\nProduct Categories(s)\\n\\nIntegration, Internet of Things, Mobile, Web\\n\\nUpdate Types(s)\\n\\nFeatures, Microsoft Build\\n\\nAdded to roadmap: 05/23/2025\\n\\n\\\\|\\n\\nLast modified: 05/23/2025\\n\\nShare\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/491980)\\n\\n- ## Generally Available: Support for AWS Bedrock API in AI Gateway Capabilities in Azure API Management\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLAUNCHED\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nGENERAL AVAILABILITY\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAnnouncing\\nexpanded support for AWS Bedrock model endpoints across all Generative AI\\npolicies in Azure API Managementβs AI Gateway. This release enables you to\\napply advanced management and optimization features such as Token Limit Policy,\\nToken Metric Policy, and Semantic Caching Policy to AWS Bedrock models,\\nempowering you to seamlessly manage and optimize\\xa0your multi-cloud AI\\nworkloads.\\n\\n\\n\\n\\n\\n\\n\\nKey\\nbenefits:\\n\\n\\n\\n- Apply token limiting,\\ntracking, and logging to AWS Bedrock APIs for better control\\n- Enable semantic caching to\\nenhance performance and response times for Bedrock models\\n- Achieve unified observability\\nand governance across multi-cloud AI endpoints\\n\\n[Learn more](https://aka.ms/apim/genai/bedrock).\\n\\nAzure ID\\n\\n491985\\n\\nUpdate Types(s)\\n\\nMicrosoft Build, Features\\n\\nAdded to roadmap: 05/23/2025\\n\\n\\\\|\\n\\nLast modified: 05/23/2025\\n\\nShare\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/491985)\\n\\n- ## Public Preview: Model Context Protocol support in Azure API Management and Azure API Center\\n\\n\\n\\n\\n\\nAPI Management\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nIN PREVIEW\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPREVIEW\\n\\n\\n\\nMay 2025\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nAnnouncing\\nthe public preview of Model Context Protocol (MCP) support in Azure API\\nManagement and Azure API Center. 
With this new capability, enterprises can\\ntransform their existing APIs into dynamic, agent-ready tools, while improving\\nsecurity and simplifying management.\\n\\n\\n\\n\\n\\n\\n\\nEnhanced\\nSecurity for MCP Servers:\\n\\n\\n\\n- Apply gateway policies to\\nprotect MCP servers by enforcing authentication, authorization, rate\\nlimiting, and other security measures.\\n\\nTransform\\nExisting APIs into MCP Servers:\\n\\n- Easily expose any APIM-managed\\nAPI as an MCP server, transforming your existing APIs into dynamic,\\nagent-ready tools with minimal effort.\\n\\nPrivate\\nMCP Registry for Enterprise Organizations:\\n\\n- Use Azure API Center as a\\nprivate remote MCP registry for your organization, giving you full control\\nover what services are exposed.\\n\\nStreamlined\\nMCP Discovery and Consumption:\\n\\n- Expose APIs as MCP servers in\\nAzure API Center (APIC) for a better developer experience and smoother API\\nconsumption.\\n\\nThese\\nfeatures make it easier to secure, manage, and consume APIs in a way that\\nsupports advanced use cases like AI integrations, all while improving security\\nand governance.\\n\\n[Learn more](https://aka.ms/apim-mcp-support).\\n\\nAzure ID\\n\\n491990\\n\\nProduct Categories(s)\\n\\nIntegration, Internet of Things, Mobile, Web\\n\\nUpdate Types(s)\\n\\nMicrosoft Build, Features\\n\\nAdded to roadmap: 05/23/2025\\n\\n\\\\|\\n\\nLast modified: 05/23/2025\\n\\nShare\\n\\n[](https://www.microsoft.com/releasecommunications/api/v2/azure/rss/491990)\\n\\n- <\\n- 1\\n- 2\\n- 3\\n- ...\\n- 439\\n- >\\n\\n0 Updates found\\n\\nPlease update your selections\\n\\n## Additional Resources\\n\\n\\n\\n### Microsoft Azure Blog\\n\\nLearn about the latest Microsoft Security solutions.\\n\\n[Learn more](https://azure.microsoft.com/en-us/blog)\\n\\n\\n\\n### Community support\\n\\nGet answers to your questions from Microsoft and Community experts.\\n\\n[Learn more](https://azure.microsoft.com/en-us/support/community/)\\n\\n\\n\\n### Provide feedback\\n\\nTell us what you think of Azure and what you want to see in the future.\\n\\n[Learn more](https://go.microsoft.com/fwlink/?linkid=2295127&clcid=0x409)\\n\\n\\n\\n### Product availability by region\\n\\nAzure is available in more regions than any other cloud provider.\\n\\n[Learn more](https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region)\\n\\n- [ Get the Azure mobile app](https://azure.microsoft.com/en-us/get-started/azure-portal/mobile-app)\\n\\n- [](https://go.microsoft.com/fwlink/?linkid=2275431)\\n- [](https://go.microsoft.com/fwlink/?linkid=2275352)\\n- [](https://go.microsoft.com/fwlink/?linkid=2268687)\\n- [](https://go.microsoft.com/fwlink/?linkid=2239941)\\n\\n[iframe](https://mscom.demdex.net/dest5.html?d_nsid=0#https%3A%2F%2Fazure.microsoft.com)\\n\\n\\n\\nMicrosoft is conducting an online survey to understand your opinions about the Microsoft Azure website. If you choose to participate, the online survey will be presented to you when you leave the website.\\n\\nWould you like to participate?\\n\\n[Privacy Statement](https://go.microsoft.com/fwlink/?LinkId=521839 \"Privacy Statement\")\\n\\n[iframe](https://s.company-target.com/s/sync?exc=lr)\\n\\n', path=None, url=None, mimetype=None), image_resource=None, audio_resource=None, video_resource=None, text_template='{metadata_str}\\n\\n{content}')]\n"
|
321 |
+
]
|
322 |
+
}
|
323 |
+
],
|
324 |
+
"source": [
|
325 |
+
"print(\"Documents\")\n",
|
326 |
+
"print( len(documents))\n",
|
327 |
+
"pprint.pprint(documents[0:10])"
|
328 |
+
]
|
329 |
+
},
|
330 |
+
{
|
331 |
+
"cell_type": "code",
|
332 |
+
"execution_count": null,
|
333 |
+
"metadata": {
|
334 |
+
"colab": {
|
335 |
+
"base_uri": "https://localhost:8080/",
|
336 |
+
"height": 153,
|
337 |
+
"referenced_widgets": [
|
338 |
+
"bde8f75e71d84f90b6fbcc40a15ccd9e",
|
339 |
+
"e80f53dc9a784c29a3bd17c1569bd56a",
|
340 |
+
"4ff6592efda145cfb32c644f5b7bb7e1",
|
341 |
+
"e8db992d46f3404cb6c94fed540e89a4",
|
342 |
+
"e496685e94a34540908ad47423978d37",
|
343 |
+
"44aa911946a6412faad36abfcf24a5d4",
|
344 |
+
"38e3f8c2e78348a191c6de7ac2d02b62",
|
345 |
+
"28986ca2cbc2439e97f1f02eecc9706e",
|
346 |
+
"1658b21204a04c6da554ad380c3d26f1",
|
347 |
+
"6d808760aa88409d9066f2a363275672",
|
348 |
+
"13f5df276e014dbda0f0d8b8b2d80d94"
|
349 |
+
]
|
350 |
+
},
|
351 |
+
"id": "gMLXOYuZ15jE",
|
352 |
+
"outputId": "601d53e9-6498-41a2-816a-70c17ccb1e65"
|
353 |
+
},
|
354 |
+
"outputs": [
|
355 |
+
{
|
356 |
+
"name": "stdout",
|
357 |
+
"output_type": "stream",
|
358 |
+
"text": [
|
359 |
+
"Directorio 'dataset' creado.\n",
|
360 |
+
"The documents array has been saved to 'dataset/documents.csv'\n"
|
361 |
+
]
|
362 |
+
},
|
363 |
+
{
|
364 |
+
"data": {
|
365 |
+
"application/vnd.jupyter.widget-view+json": {
|
366 |
+
"model_id": "bde8f75e71d84f90b6fbcc40a15ccd9e",
|
367 |
+
"version_major": 2,
|
368 |
+
"version_minor": 0
|
369 |
+
},
|
370 |
+
"text/plain": [
|
371 |
+
"documents.csv: 0%| | 0.00/44.7k [00:00<?, ?B/s]"
|
372 |
+
]
|
373 |
+
},
|
374 |
+
"metadata": {},
|
375 |
+
"output_type": "display_data"
|
376 |
+
},
|
377 |
+
{
|
378 |
+
"data": {
|
379 |
+
"application/vnd.google.colaboratory.intrinsic+json": {
|
380 |
+
"type": "string"
|
381 |
+
},
|
382 |
+
"text/plain": [
|
383 |
+
"CommitInfo(commit_url='https://huggingface.co/datasets/vicpada/AzureResources/commit/6e04a0cabff20f42d9f440972f7fec91caa5cb45', commit_message='Upload documents.csv with huggingface_hub', commit_description='', oid='6e04a0cabff20f42d9f440972f7fec91caa5cb45', pr_url=None, repo_url=RepoUrl('https://huggingface.co/datasets/vicpada/AzureResources', endpoint='https://huggingface.co', repo_type='dataset', repo_id='vicpada/AzureResources'), pr_revision=None, pr_num=None)"
|
384 |
+
]
|
385 |
+
},
|
386 |
+
"execution_count": 16,
|
387 |
+
"metadata": {},
|
388 |
+
"output_type": "execute_result"
|
389 |
+
}
|
390 |
+
],
|
391 |
+
"source": [
|
392 |
+
"# prompt: Write documents array to csv file\n",
|
393 |
+
"\n",
|
394 |
+
"# Define the CSV filename for the documents\n",
|
395 |
+
"filename = 'dataset/documents.csv'\n",
|
396 |
+
"directory_name = \"dataset\"\n",
|
397 |
+
"\n",
|
398 |
+
"# Comprueba si el directorio no existe\n",
|
399 |
+
"if not os.path.exists(directory_name):\n",
|
400 |
+
" # Crea el directorio\n",
|
401 |
+
" os.makedirs(directory_name)\n",
|
402 |
+
" print(f\"Directorio '{directory_name}' creado.\")\n",
|
403 |
+
"else:\n",
|
404 |
+
" print(f\"El directorio '{directory_name}' ya existe.\")\n",
|
405 |
+
"\n",
|
406 |
+
"# Ensure documents is a list before proceeding\n",
|
407 |
+
"if isinstance(documents, list):\n",
|
408 |
+
" # Open the file in write mode ('w')\n",
|
409 |
+
" # newline='' avoids issues with blank lines\n",
|
410 |
+
" with open(filename, 'w', newline='') as file:\n",
|
411 |
+
" writer = csv.writer(file)\n",
|
412 |
+
" # Write the header row\n",
|
413 |
+
" writer.writerow(['Text', 'Title', 'URL', 'Name', 'ContentType'])\n",
|
414 |
+
" # Write each document as a row\n",
|
415 |
+
" for doc in documents:\n",
|
416 |
+
" writer.writerow([\n",
|
417 |
+
" doc.text,\n",
|
418 |
+
" doc.metadata.get('title'),\n",
|
419 |
+
" doc.metadata.get('url'),\n",
|
420 |
+
" doc.metadata.get('name'),\n",
|
421 |
+
" doc.metadata.get('contentType')\n",
|
422 |
+
" ])\n",
|
423 |
+
"\n",
|
424 |
+
" print(f\"The documents array has been saved to '{filename}'\")\n",
|
425 |
+
"else:\n",
|
426 |
+
" print(\"documents is not a list. Cannot write to CSV.\")\n",
|
427 |
+
"\n",
|
428 |
+
"# Upload the new documents CSV to Hugging Face\n",
|
429 |
+
"api.upload_file(\n",
|
430 |
+
" path_or_fileobj=filename,\n",
|
431 |
+
" path_in_repo='documents.csv',\n",
|
432 |
+
" repo_id=\"vicpada/AzureResources\",\n",
|
433 |
+
" repo_type=\"dataset\",\n",
|
434 |
+
")"
|
435 |
+
]
|
436 |
+
},
|
437 |
+
{
|
438 |
+
"cell_type": "code",
|
439 |
+
"execution_count": null,
|
440 |
+
"metadata": {
|
441 |
+
"colab": {
|
442 |
+
"base_uri": "https://localhost:8080/"
|
443 |
+
},
|
444 |
+
"id": "KR4W9wpM1QPm",
|
445 |
+
"outputId": "2adcaebb-1890-4ceb-c5dc-e3993e964541"
|
446 |
+
},
|
447 |
+
"outputs": [
|
448 |
+
{
|
449 |
+
"name": "stdout",
|
450 |
+
"output_type": "stream",
|
451 |
+
"text": [
|
452 |
+
"El directorio 'dataset' ya existe.\n",
|
453 |
+
"El array ha sido guardado en 'dataset/website-content.csv'\n"
|
454 |
+
]
|
455 |
+
}
|
456 |
+
],
|
457 |
+
"source": [
|
458 |
+
"# prompt: write a dictionary to a csv file\n",
|
459 |
+
"\n",
|
460 |
+
"import csv\n",
|
461 |
+
"\n",
|
462 |
+
"directory_name = \"dataset\"\n",
|
463 |
+
"\n",
|
464 |
+
"# Comprueba si el directorio no existe\n",
|
465 |
+
"if not os.path.exists(directory_name):\n",
|
466 |
+
" # Crea el directorio\n",
|
467 |
+
" os.makedirs(directory_name)\n",
|
468 |
+
" print(f\"Directorio '{directory_name}' creado.\")\n",
|
469 |
+
"else:\n",
|
470 |
+
" print(f\"El directorio '{directory_name}' ya existe.\")\n",
|
471 |
+
"\n",
|
472 |
+
"# save to CSV\n",
|
473 |
+
"filename = 'dataset/website-content.csv'\n",
|
474 |
+
"\n",
|
475 |
+
"# Ensure url_response is a dictionary before proceeding\n",
|
476 |
+
"if isinstance(url_response, dict):\n",
|
477 |
+
" # Open the file in write mode ('w')\n",
|
478 |
+
" # newline='' avoids issues with blank lines\n",
|
479 |
+
" with open(filename, 'w', newline='') as file:\n",
|
480 |
+
" writer = csv.writer(file)\n",
|
481 |
+
" # Write the header row\n",
|
482 |
+
" writer.writerow(['URL', 'Scraped Data', 'CSV Data'])\n",
|
483 |
+
" # Write each row from the dictionary\n",
|
484 |
+
" for url, data in url_response.items():\n",
|
485 |
+
" writer.writerow([url, data.get('scraped_data'), data.get('csv_data')])\n",
|
486 |
+
"\n",
|
487 |
+
" print(f\"El array ha sido guardado en '{filename}'\")\n",
|
488 |
+
"else:\n",
|
489 |
+
" print(\"url_response is not a dictionary. Cannot write to CSV.\")\n",
|
490 |
+
"\n",
|
491 |
+
" # Upload to HG\n",
|
492 |
+
"\n",
|
493 |
+
"api.upload_folder(\n",
|
494 |
+
" folder_path=\"dataset\",\n",
|
495 |
+
" repo_id=\"vicpada/AzureResources\",\n",
|
496 |
+
" repo_type=\"dataset\",\n",
|
497 |
+
")"
|
498 |
+
]
|
499 |
+
}
|
500 |
+
],
|
501 |
+
"metadata": {
|
502 |
+
"colab": {
|
503 |
+
"provenance": []
|
504 |
+
},
|
505 |
+
"kernelspec": {
|
506 |
+
"display_name": "Python 3",
|
507 |
+
"name": "python3"
|
508 |
+
},
|
509 |
+
"language_info": {
|
510 |
+
"name": "python"
|
511 |
+
},
|
512 |
+
"widgets": {
|
513 |
+
"application/vnd.jupyter.widget-state+json": {
|
514 |
+
"state":{},
|
515 |
+
"13f5df276e014dbda0f0d8b8b2d80d94": {
|
516 |
+
"model_module": "@jupyter-widgets/controls",
|
517 |
+
"model_module_version": "1.5.0",
|
518 |
+
"model_name": "DescriptionStyleModel",
|
519 |
+
"state": {
|
520 |
+
"_model_module": "@jupyter-widgets/controls",
|
521 |
+
"_model_module_version": "1.5.0",
|
522 |
+
"_model_name": "DescriptionStyleModel",
|
523 |
+
"_view_count": null,
|
524 |
+
"_view_module": "@jupyter-widgets/base",
|
525 |
+
"_view_module_version": "1.2.0",
|
526 |
+
"_view_name": "StyleView",
|
527 |
+
"description_width": ""
|
528 |
+
}
|
529 |
+
},
|
530 |
+
"1658b21204a04c6da554ad380c3d26f1": {
|
531 |
+
"model_module": "@jupyter-widgets/controls",
|
532 |
+
"model_module_version": "1.5.0",
|
533 |
+
"model_name": "ProgressStyleModel",
|
534 |
+
"state": {
|
535 |
+
"_model_module": "@jupyter-widgets/controls",
|
536 |
+
"_model_module_version": "1.5.0",
|
537 |
+
"_model_name": "ProgressStyleModel",
|
538 |
+
"_view_count": null,
|
539 |
+
"_view_module": "@jupyter-widgets/base",
|
540 |
+
"_view_module_version": "1.2.0",
|
541 |
+
"_view_name": "StyleView",
|
542 |
+
"bar_color": null,
|
543 |
+
"description_width": ""
|
544 |
+
}
|
545 |
+
},
|
546 |
+
"28986ca2cbc2439e97f1f02eecc9706e": {
|
547 |
+
"model_module": "@jupyter-widgets/base",
|
548 |
+
"model_module_version": "1.2.0",
|
549 |
+
"model_name": "LayoutModel",
|
550 |
+
"state": {
|
551 |
+
"_model_module": "@jupyter-widgets/base",
|
552 |
+
"_model_module_version": "1.2.0",
|
553 |
+
"_model_name": "LayoutModel",
|
554 |
+
"_view_count": null,
|
555 |
+
"_view_module": "@jupyter-widgets/base",
|
556 |
+
"_view_module_version": "1.2.0",
|
557 |
+
"_view_name": "LayoutView",
|
558 |
+
"align_content": null,
|
559 |
+
"align_items": null,
|
560 |
+
"align_self": null,
|
561 |
+
"border": null,
|
562 |
+
"bottom": null,
|
563 |
+
"display": null,
|
564 |
+
"flex": null,
|
565 |
+
"flex_flow": null,
|
566 |
+
"grid_area": null,
|
567 |
+
"grid_auto_columns": null,
|
568 |
+
"grid_auto_flow": null,
|
569 |
+
"grid_auto_rows": null,
|
570 |
+
"grid_column": null,
|
571 |
+
"grid_gap": null,
|
572 |
+
"grid_row": null,
|
573 |
+
"grid_template_areas": null,
|
574 |
+
"grid_template_columns": null,
|
575 |
+
"grid_template_rows": null,
|
576 |
+
"height": null,
|
577 |
+
"justify_content": null,
|
578 |
+
"justify_items": null,
|
579 |
+
"left": null,
|
580 |
+
"margin": null,
|
581 |
+
"max_height": null,
|
582 |
+
"max_width": null,
|
583 |
+
"min_height": null,
|
584 |
+
"min_width": null,
|
585 |
+
"object_fit": null,
|
586 |
+
"object_position": null,
|
587 |
+
"order": null,
|
588 |
+
"overflow": null,
|
589 |
+
"overflow_x": null,
|
590 |
+
"overflow_y": null,
|
591 |
+
"padding": null,
|
592 |
+
"right": null,
|
593 |
+
"top": null,
|
594 |
+
"visibility": null,
|
595 |
+
"width": null
|
596 |
+
}
|
597 |
+
},
|
598 |
+
"38e3f8c2e78348a191c6de7ac2d02b62": {
|
599 |
+
"model_module": "@jupyter-widgets/controls",
|
600 |
+
"model_module_version": "1.5.0",
|
601 |
+
"model_name": "DescriptionStyleModel",
|
602 |
+
"state": {
|
603 |
+
"_model_module": "@jupyter-widgets/controls",
|
604 |
+
"_model_module_version": "1.5.0",
|
605 |
+
"_model_name": "DescriptionStyleModel",
|
606 |
+
"_view_count": null,
|
607 |
+
"_view_module": "@jupyter-widgets/base",
|
608 |
+
"_view_module_version": "1.2.0",
|
609 |
+
"_view_name": "StyleView",
|
610 |
+
"description_width": ""
|
611 |
+
}
|
612 |
+
},
|
613 |
+
"44aa911946a6412faad36abfcf24a5d4": {
|
614 |
+
"model_module": "@jupyter-widgets/base",
|
615 |
+
"model_module_version": "1.2.0",
|
616 |
+
"model_name": "LayoutModel",
|
617 |
+
"state": {
|
618 |
+
"_model_module": "@jupyter-widgets/base",
|
619 |
+
"_model_module_version": "1.2.0",
|
620 |
+
"_model_name": "LayoutModel",
|
621 |
+
"_view_count": null,
|
622 |
+
"_view_module": "@jupyter-widgets/base",
|
623 |
+
"_view_module_version": "1.2.0",
|
624 |
+
"_view_name": "LayoutView",
|
625 |
+
"align_content": null,
|
626 |
+
"align_items": null,
|
627 |
+
"align_self": null,
|
628 |
+
"border": null,
|
629 |
+
"bottom": null,
|
630 |
+
"display": null,
|
631 |
+
"flex": null,
|
632 |
+
"flex_flow": null,
|
633 |
+
"grid_area": null,
|
634 |
+
"grid_auto_columns": null,
|
635 |
+
"grid_auto_flow": null,
|
636 |
+
"grid_auto_rows": null,
|
637 |
+
"grid_column": null,
|
638 |
+
"grid_gap": null,
|
639 |
+
"grid_row": null,
|
640 |
+
"grid_template_areas": null,
|
641 |
+
"grid_template_columns": null,
|
642 |
+
"grid_template_rows": null,
|
643 |
+
"height": null,
644-862 | + (remainder of the notebook JSON: auto-generated Jupyter widget-state metadata, i.e. the remaining LayoutModel, FloatProgressModel, HBoxModel, and HTMLModel entries backing the "documents.csv: 100% | 44.7k/44.7k [00:00<00:00, 184kB/s]" download progress bar, followed by the closing "nbformat": 4 and "nbformat_minor": 0 fields)
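Note: the widget state above records a download progress bar for a 44.7 kB documents.csv, which suggests the dataset notebook fetches the curated documents file from the Hugging Face Hub. A minimal sketch of that step, assuming the file lives in a dataset repo; the repo_id below is a hypothetical placeholder, not confirmed by this diff:

```python
# Sketch only: fetch the curated documents.csv from the Hugging Face Hub.
# Assumption: the file is stored in a dataset repo; "vicpada/ai-architect-data"
# is a hypothetical repo_id used purely for illustration.
from huggingface_hub import hf_hub_download
import pandas as pd

csv_path = hf_hub_download(
    repo_id="vicpada/ai-architect-data",  # placeholder repo_id
    filename="documents.csv",
    repo_type="dataset",
)

# Load the curated documents for the downstream processing notebooks.
df = pd.read_csv(csv_path)
print(df.shape)
```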
scripts/02_Process_files.ipynb
ADDED
The diff for this file is too large to render. See raw diff

scripts/03_Add_context.ipynb
ADDED
The diff for this file is too large to render. See raw diff

scripts/04_Finetune_Embedding.ipynb
ADDED
The diff for this file is too large to render. See raw diff

scripts/05_Create_Test_Data.ipynb
ADDED
The diff for this file is too large to render. See raw diff

scripts/06_Create_Vector.ipynb
ADDED
The diff for this file is too large to render. See raw diff

scripts/07_Reranking.ipynb
ADDED
The diff for this file is too large to render. See raw diff